I’m an AI Researcher in Audio, NLP, and Music

I am a final-year PhD candidate in the Sound and Music Computing Lab, School of Computing, National University of Singapore, advised by Prof. Ye Wang (王晔). My research focuses on advancing music information retrieval (MIR), audio and speech processing, and natural language processing (NLP) through deep learning methods. I am particularly interested in the automatic transcription, generation, and translation of music and lyrics, with a strong focus on self-supervised learning, transfer learning, and controlled generation for real-world applications. Prior to joining NUS, I earned my Bachelor's degree in Computer Science with honors from the Harbin Institute of Technology and completed my bachelor's thesis on piano music transcription at the Auditory Intelligence Research Center, advised by Prof. Jiqing Han (韩纪庆).

I am also a professional-level violinist and a self-taught guitarist. Please visit the "As Musician" page to explore my music portfolio. I feel incredibly fortunate to have discovered my passion for music and to have the opportunity to work and conduct research in a music-related field. My love for music energizes and motivates me to keep growing in this area.

Email · Google Scholar · LinkedIn · Full CV · Twitter · WeChat

Publications

(* indicates equal contribution)

Audio Content Analysis

Lead Instrument Detection from Multitrack Music
Longshen Ou, Yu Takahashi, and Ye Wang
2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2025)
[ Code + Model + Dataset | Presentation | ArXiv ]
Transfer Learning of wav2vec 2.0 for Automatic Lyric Transcription
Longshen Ou*, Xiangming Gu*, and Ye Wang
Proceedings of the 23rd International Society for Music Information Retrieval Conference (ISMIR 2022)
[ Code | ArXiv ]
Exploring Transformer’s Potential on Automatic Piano Transcription
Longshen Ou, Ziyi Guo, Emmanouil Benetos, Jiqing Han, and Ye Wang
2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2022)

Symbolic Music Generation

PhraseVAE and PhraseLDM: Latent Diffusion for Full-Song Multitrack Symbolic Music Generation
Longshen Ou, Ye Wang
Technical Report
[ Demo | Code | ArXiv ]
Unifying Symbolic Music Arrangement: Track-Aware Reconstruction and Structured Tokenization
Longshen Ou, Jingwei Zhao, Ziyu Wang, Gus Xia, Qihao Liang, Torin Hopkins, and Ye Wang
The 39th Annual Conference on Neural Information Processing Systems (NeurIPS 2025)
[ Demo | Code | REMI-z MIDI Toolkit | Slides | ArXiv ]

Lyric Generation (Text & Symbolic Music)

Songs Across Borders: Singable and Controllable Neural Lyric Translation
Longshen Ou, Xichu Ma, Min-Yen Kan, Ye Wang
The 61st Annual Meeting of the Association for Computational Linguistics (ACL 2023)
[ Demo | Code | ISMIR 2023 Late Breaking Demo | ArXiv ]
Joint Learning of Wording and Formatting for Singable Melody-to-Lyric Generation
Longshen Ou, Xichu Ma, and Ye Wang
Journal of New Music Research 2023 submission

Multimodal Content Understanding

Automatic Lyric Transcription and Automatic Music Transcription from Multimodal Singing
Xiangming Gu, Longshen Ou, Wei Zeng, Jianan Zhang, Nicholas Wong, Ye Wang
ACM Transactions on Multimedia Computing, Communications and Applications (TOMM 2024)
[ Code | ArXiv ]
MM-ALT: A Multimodal Automatic Lyric Transcription System (Oral, Top Paper Award)
Xiangming Gu*, Longshen Ou*, Danielle Ong, and Ye Wang
Proceedings of the 30th ACM International Conference on Multimedia (ACM Multimedia 2022)
[ Demo | Code | Dataset | Press | ArXiv ]

Applied Sequence Modeling: DNA

DNA Storage Toolkit: A Modular End-to-End DNA Data Storage Codec and Simulator
Puru Sharma, Gary Goh, Bin Gao, Longshen Ou, Dehui Lin, Deepak Sharma, Djordje Jevdjic
2024 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS 2024)
[ Slides ]

Internships

Sea AI Lab (SAIL), Singapore

Associate Member (Research Collaboration) · Mar 2025 – Jun 2025
Mentor: Dr. Longxu Dou, Dr. Bin Wang

I am integrating stronger audio understanding capabilities into LLMs.

Huawei Singapore Research Center, Singapore

Algorithm Engineer Intern · Mar 2025 – Jun 2025
Mentor: Dr. Hao Jia

I am working on text-to-speech (TTS) projects.

Sony Computer Science Laboratories (Sony CSL)

Research Intern · Aug 2024 – Nov 2024
Mentor: Dr. Taketo Akama

Designed a tokenization scheme for Whisper to represent melody note sequences, enabling efficient transcription of musical elements.
Fine-tuned Whisper for lyric and melody transcription using frame-level and end-to-end approaches, achieving performance comparable to state-of-the-art models.
Implemented semi-supervised techniques, including auxiliary transposition-equivariant loss, pseudo-labeling, noisy student model, and FixMatch, to enhance transcription performance.
Applied diverse data augmentation strategies (time shift, time stretch, SpecAugment, and pitch shift) to improve the model’s ability to learn music-specific concepts and handle noisy labels.
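Two of the augmentations listed above, time shift and SpecAugment-style masking, can be illustrated with a minimal sketch. It assumes a spectrogram stored as a plain list of frequency rows; the function names, mask sizes, and the circular shift are illustrative assumptions, not the code used during the internship.

```python
import random

def time_shift(spec, shift):
    """Circularly shift every frequency row of a (freq x time) spectrogram by `shift` frames."""
    n_time = len(spec[0])
    k = shift % n_time
    return [row[-k:] + row[:-k] if k else list(row) for row in spec]

def spec_augment(spec, rng, max_freq_mask=2, max_time_mask=3):
    """Zero out one random frequency band and one random time span (SpecAugment-style masking)."""
    n_freq, n_time = len(spec), len(spec[0])
    out = [list(row) for row in spec]  # copy so the input is left untouched
    f = rng.randint(0, min(max_freq_mask, n_freq))  # band height, may be 0
    f0 = rng.randint(0, n_freq - f)                 # band start
    t = rng.randint(0, min(max_time_mask, n_time))  # span width, may be 0
    t0 = rng.randint(0, n_time - t)                 # span start
    for i in range(n_freq):
        for j in range(n_time):
            if f0 <= i < f0 + f or t0 <= j < t0 + t:
                out[i][j] = 0.0
    return out

# Hypothetical usage on a tiny 4 x 6 "spectrogram" of ones.
spec = [[1.0] * 6 for _ in range(4)]
augmented = spec_augment(time_shift(spec, 2), random.Random(0))
```

Time stretch and pitch shift are not shown. For transcription tasks they would also require adjusting the note labels (onset times or pitches) to stay consistent with the audio.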

Yamaha Corporation, MixerAI Team, Hamamatsu

Research Intern · May 2024 – Aug 2024
Mentor: Dr. Yu Takahashi

Conducted the first work on lead instrument detection from multitrack music, providing annotated datasets and benchmark models. The work was published at ICASSP 2025.

Harvard University, Data Systems Laboratory

Research Intern · Aug 2020 – Sep 2020
Mentor: Prof. Stratos Idreos

Contributed to the paper "More or Less: When and How to Build Convolutional Neural Network Ensembles," accepted at ICLR 2021.

Tencent, CDG Department, Shenzhen

Back End Developer · Jul 2020 – Aug 2020
Mentor: Luo Yunlong

Enhanced a distributed online analytical processing system for user engagement data analysis.
Accelerated query processing speed by 120x through a distributed storage caching strategy.
Skills: Docker, Kubernetes, OLAP, Hadoop, Spark, Presto, Tencent Cloud, Go

Other Projects

REMI-z Tokenizer and MultiTrack music data structure
This tool converts music between MIDI and REMI-z, an efficient sequence representation of multitrack music, and makes it easy to manipulate the music at the bar level.
[ github | PyPI ]
Guitar Arranger
This tool generates solo guitar arrangements from any MIDI song. It models both left-hand fingering and right-hand picking patterns. The system consists of two main modules: Voicer, which finds economical fretboard positions to play melody and harmony simultaneously across all musical blocks; and Arpeggiator, which generates picking patterns that simulate the rhythmic feel of the original piece.
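The fretboard search behind a voicing step like the one described above can be sketched as follows. This is a brute-force illustration only, assuming standard tuning and treating "economical" as the smallest fret span among fretted notes; the function names and the cost are assumptions, not the Voicer's actual algorithm.

```python
from itertools import product

# MIDI pitches of the open strings in standard tuning: E2 A2 D3 G3 B3 E4.
STANDARD_TUNING = (40, 45, 50, 55, 59, 64)

def positions(pitch, max_fret=20):
    """All (string_index, fret) pairs that sound `pitch`."""
    return [(s, pitch - open_pitch)
            for s, open_pitch in enumerate(STANDARD_TUNING)
            if 0 <= pitch - open_pitch <= max_fret]

def most_compact_voicing(pitches, max_fret=20):
    """Pick one position per pitch, on distinct strings, minimizing the fret span.

    Open strings (fret 0) are ignored when measuring the span, since they
    need no left-hand finger. Returns None if no voicing exists.
    """
    best, best_span = None, None
    for combo in product(*(positions(p, max_fret) for p in pitches)):
        strings = [s for s, _ in combo]
        if len(set(strings)) != len(strings):
            continue  # two notes cannot share a string
        fretted = [f for _, f in combo if f > 0]
        span = max(fretted) - min(fretted) if fretted else 0
        if best_span is None or span < best_span:
            best, best_span = combo, span
    return best

# Hypothetical usage: a C major triad (C3, E3, G3).
voicing = most_compact_voicing([48, 52, 55])
```

A practical arranger would also need to account for finger assignment and transitions between consecutive blocks, which this sketch leaves out.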
Unofficial MuseCoco
This is a Hugging Face Transformers implementation of MuseCoco’s attribute-to-music generation model. The original was built on Fairseq, which has not been actively maintained for years. It tightly coupled the model, configuration, task, and data pipeline, making it difficult to modify, and it re-generated all prefixes at inference, causing unnecessary slowdowns. This revamped version, based on Hugging Face GPT-2, is much more flexible, allowing easier customization and fine-tuning, and it significantly improves inference speed by continuing generation from a given prefix instead of recomputing everything.
GuitarFret
With this guitar fretboard simulator on your laptop, never worry about composing without a guitar around you!
DNA Storage Simulation
DNA-based storage systems present unique challenges, as reading and writing operations can sometimes alter the original information. To model the changes introduced by such storage systems in a wet lab environment, we designed a simulation system to emulate DNA behavioral changes. This system includes a rule-based method, a Multi-Layer Perceptron (MLP) method, and a sequence-to-sequence attention-based Recurrent Neural Network (RNN). Experiments on the Microsoft Nanopore dataset show that the sequence-to-sequence method is highly effective.
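A rule-based simulator of the kind mentioned above can be as simple as applying independent insertion, deletion, and substitution errors at each base. The sketch below is a minimal illustration under that assumption; the function name and the error rates are hypothetical and are not taken from the project.

```python
import random

BASES = "ACGT"

def noisy_read(strand, p_sub=0.01, p_ins=0.01, p_del=0.01, rng=None):
    """Simulate one noisy read of `strand` with independent per-base errors."""
    rng = rng or random.Random()
    out = []
    for base in strand:
        # Insertion: a random base appears before the current one.
        if rng.random() < p_ins:
            out.append(rng.choice(BASES))
        r = rng.random()
        if r < p_del:
            continue  # deletion: the base is dropped
        if r < p_del + p_sub:
            # Substitution: replace with one of the three other bases.
            out.append(rng.choice([b for b in BASES if b != base]))
        else:
            out.append(base)
    return "".join(out)

# Hypothetical usage with exaggerated error rates.
read = noisy_read("ACGTACGTAC", p_sub=0.1, p_ins=0.05, p_del=0.05,
                  rng=random.Random(42))
```

Fixed, position-independent rates cannot capture context-dependent error patterns, which is one reason a learned sequence-to-sequence model may fit real channel behavior better.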
GNN-based Music Recommender
This project tackles the music artist recommendation challenge using Graph Convolutional Networks (GCNs). By modeling artist and user identities through their interactions, the network predicts affinity scores between users and previously unexplored artists to generate personalized recommendations. I implemented the original GCN as a baseline and proposed three enhancements: incorporating edge weights into aggregation, augmenting edge weights with attention mechanisms, and applying data augmentation by adding noise to edge values.
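The first and third enhancements above, edge-weighted aggregation and noisy edge values, can be sketched in a few lines. The code assumes a weight-normalized mean over neighbors and Gaussian noise clipped at zero; both choices, and all names, are illustrative assumptions rather than the project's implementation. It shows the aggregation step only, without the learned linear layer and nonlinearity of a full GCN.

```python
import random

def weighted_aggregate(features, edges):
    """Weight-normalized mean of neighbor features on an undirected graph.

    features: {node: [float, ...]}; edges: list of (u, v, weight).
    Nodes without neighbors keep their own features.
    """
    dim = len(next(iter(features.values())))
    sums = {n: [0.0] * dim for n in features}
    totals = {n: 0.0 for n in features}
    for u, v, w in edges:
        for a, b in ((u, v), (v, u)):  # propagate messages in both directions
            totals[a] += w
            for k in range(dim):
                sums[a][k] += w * features[b][k]
    return {n: [x / totals[n] for x in sums[n]] if totals[n] > 0
            else list(features[n]) for n in features}

def jitter_edges(edges, sigma, rng):
    """Augment the graph by adding Gaussian noise to edge weights (clipped at 0)."""
    return [(u, v, max(0.0, w + rng.gauss(0.0, sigma))) for u, v, w in edges]

# Hypothetical usage: one user, two artists, play counts as edge weights.
features = {"user": [1.0, 0.0], "artist_a": [0.0, 2.0], "artist_b": [4.0, 0.0]}
edges = [("user", "artist_a", 1.0), ("user", "artist_b", 3.0)]
aggregated = weighted_aggregate(features, jitter_edges(edges, 0.1, random.Random(0)))
```

With the edge weights as given, the user's aggregated vector leans toward artist_b because that edge carries three times the weight.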

Honors and Awards

SoC Research Incentive Award, issued by School of Computing, NUS, 2023.10.
Research Achievement Award (2022/2023), issued by School of Computing, NUS, 2023.5.
Top Paper Award (2% of accepted full papers), issued by ACM Multimedia 2022, 2022.11.
Honor Degree of Bachelor of Engineering, issued by Harbin Institute of Technology Honors School, 2021.6.
People Scholarship (6%), issued by Harbin Institute of Technology, 2020.6.
Third Prize, Sogou Innovative Practice Project for College Student, 2018.10.

Invited Talks

AI in Music: From Understanding to Creation, invited talk at Huawei Hisilicon, 2025.08.27.
[ Slides ]
My Research in Music AI: Audio, Lyrics, and Symbolic Generation, invited talk at Sony CSL, 2024.11.21.

Teaching

Teaching Assistant, CS4347/5647 Sound and Music Computing (2022/2023 sem 1, 2023/2024 sem 1).
Teaching Assistant, CS4248 Natural Language Processing (2022/2023 sem 2).

Academic Reviewing

NeurIPS 2025
IEEE TAFFC (2024)
EAI ArtsIT 2024
ACM TOMM (2024, 2025)
ACM Multimedia 2023, 2024
ACL Rolling Review (2024)
TASLP (2024, 2025)
ISMIR 2022, 2023