
I’m an AI Researcher in Audio, NLP, and Music


I am currently a final-year PhD candidate in the Sound and Music Computing Lab, School of Computing, National University of Singapore, advised by Prof. Ye Wang (王晔). My research focuses on advancing music information retrieval (MIR), audio and speech processing, and natural language processing (NLP) through deep learning techniques. I am particularly interested in the automatic transcription, generation, and translation of music and lyrics, with a strong focus on self-supervised learning, transfer learning, and controlled generation for real-world applications. Prior to joining NUS, I earned my Bachelor's degree in Computer Science with honors from the Harbin Institute of Technology, where I completed my bachelor's thesis on piano music transcription at the Auditory Intelligence Research Center, advised by Prof. Jiqing Han (韩纪庆).

Moreover, I am a professional-level violinist and guitarist. Please visit the "As Musician" page to explore my music portfolio. I feel incredibly fortunate to have discovered my passion for music and to have the opportunity to work and conduct research in a music-related field. My love for music energizes and motivates me to continually grow and excel in this area.

Email · Google Scholar · LinkedIn · Full CV · Twitter · WeChat


Recent News

  • [2025.9] The paper Unifying Symbolic Music Arrangement: Track-Aware Reconstruction and Structured Tokenization is accepted by NeurIPS 2025.
  • [2024.12] The paper Lead Instrument Detection from Multitrack Music is accepted by ICASSP 2025.
  • [2024.12] The REMI-z tokenizer and MultiTrack music data structure are now available on PyPI. This is my first open-source project on pip :)
  • [2024.8] I started my internship at Sony Computer Science Laboratories.
  • [2024.5] I started my internship at Yamaha in Hamamatsu, Shizuoka, Japan.
  • [2024.3] The paper DNA Storage Toolkit: A Modular End-to-End DNA Data Storage Codec and Simulator is accepted by ISPASS 2024. Congratulations to Puru Sharma!
  • [2024.3] The paper Automatic Lyric Transcription and Automatic Music Transcription from Multimodal Singing is accepted by ACM TOMM. Congratulations to my colleague Xiangming!
  • [2023.10] My short paper Singable and Controllable Neural Lyric Translation: a Late-Breaking Showcase is accepted by ISMIR 2023 Late Breaking Demo.
  • [2023.6] One full paper was rejected by ISMIR 2023. Sadge!
  • [2023.5] I passed the Qualification Exam. Now I am a PhD candidate!
  • [2023.5] My paper Songs Across Borders: Singable and Controllable Neural Lyric Translation is accepted by ACL 2023.
  • [2023.1] I received the Research Achievement Award (2022/2023) from the School of Computing, NUS.
  • [2022.12] I'm attending ISMIR 2022 in Bengaluru, India.
  • [2022.11] Our ACM Multimedia paper received the Top Paper Award (2% of accepted full papers).
  • [2022.10] I'm attending ACM Multimedia in Lisbon, Portugal.
  • [2022.7] An extension of our previous work, Transfer Learning of wav2vec 2.0 for Automatic Lyric Transcription, is accepted by ISMIR 2022.
  • [2022.7] My paper with Xiangming Gu, MM-ALT: A Multimodal Automatic Lyric Transcription System, is accepted by ACM Multimedia 2022.
  • [2022.5] I'm attending ICASSP 2022 in Singapore.
  • [2022.1] My first paper, which achieves a new state-of-the-art result on piano music transcription, is accepted by ICASSP 2022.
  • [2022.1] I started my PhD journey at NUS SMCL, advised by Prof. Ye Wang.
  • [2021.8] I joined the National University of Singapore as a student in the Master of Computing program (AI track) and started my research in the Sound and Music Computing Lab.


Publications

(* indicates equal contribution)

Audio Content Analysis

Controlled Generation (Text & Symbolic Music)

Multimodal Content Understanding

Other Sequence Transformation


Internships

Sea AI Lab (SAIL), Singapore

Associate Member (Research Collaboration) · Mar 2025 – Jun 2025
Mentors: Dr. Longxu Dou, Dr. Bin Wang

  • I am integrating stronger audio understanding capabilities into LLMs.

Huawei Singapore Research Center, Singapore

Algorithm Engineer Intern · Mar 2025 – Jun 2025
Mentor: Dr. Hao Jia

  • I am working on text-to-speech (TTS) projects.

Sony Computer Science Laboratories (Sony CSL)

Research Intern · Aug 2024 – Nov 2024
Mentor: Dr. Taketo Akama

  • Designed a tokenization scheme for Whisper to represent melody note sequences, enabling efficient transcription of musical elements.
  • Fine-tuned Whisper for lyric and melody transcription using frame-level and end-to-end approaches, achieving performance comparable to state-of-the-art models.
  • Implemented semi-supervised techniques, including an auxiliary transposition-equivariant loss, pseudo-labeling, a noisy student model, and FixMatch, to enhance transcription performance.
  • Applied diverse data augmentation strategies (time shift, time stretch, SpecAugment, and pitch shift) to improve the model’s ability to learn music-specific concepts and handle noisy labels.

Yamaha Corporation, MixerAI Team, Hamamatsu

Research Intern · May 2024 – Aug 2024
Mentor: Dr. Yu Takahashi

  • Conducted the first work on lead instrument detection from multitrack music, providing annotated datasets and benchmark models. Published at ICASSP 2025.

Harvard University, Data Systems Laboratory

Research Intern · Aug 2020 – Sep 2020
Mentor: Prof. Stratos Idreos

Tencent, CDG Department, Shenzhen

Back End Developer · Jul 2020 – Aug 2020
Mentor: Luo Yunlong

  • Enhanced a distributed online analytical processing system for user engagement data analysis.
  • Accelerated query processing by 120x through a distributed storage caching strategy.
  • Skills: Docker, Kubernetes, OLAP, Hadoop, Spark, Presto, Tencent Cloud, Go

Other Projects

  • REMI-z Tokenizer and MultiTrack music data structure
    This tool converts music between MIDI and the REMI-z representation, an efficient sequence representation of multitrack music, and makes it easy to manipulate music at the bar level.
    [ GitHub | PyPI ]

  • Unofficial MuseCoco
    This is a Hugging Face Transformers implementation of MuseCoco’s attribute-to-music generation model, which was originally built on Fairseq, a framework that has not been actively maintained for years. The original version tightly coupled the model, configuration, task, and data pipeline, making it difficult to modify, and it re-generated all prefixes at inference, causing unnecessary slowdowns. This revamped version, based on Hugging Face GPT-2, is much more flexible: it allows easier customization and fine-tuning, and it significantly improves inference speed by continuing generation from a given prefix instead of recomputing everything. The result is a faster, more modular, and more user-friendly implementation.

  • GuitarFret
    With this guitar fretboard simulator on your laptop, never worry about composing without a guitar around you!

  • DNA Storage Simulation
    DNA-based storage systems present unique challenges, as reading and writing operations can alter the original information. To model the changes such storage systems introduce in a wet lab environment, we designed a simulation system that emulates these alterations. The system includes a rule-based method, a Multi-Layer Perceptron (MLP) method, and a sequence-to-sequence attention-based Recurrent Neural Network (RNN). Experiments on the Microsoft Nanopore dataset show that the sequence-to-sequence method is highly effective.

  • GNN-based Music Recommender
    This project tackles the music artist recommendation challenge using Graph Convolutional Networks (GCNs). By modeling artist and user identities through their interactions, the network predicts affinity scores between users and previously unexplored artists to generate personalized recommendations. I implemented the original GCN as a baseline and proposed three enhancements: incorporating edge weights into aggregation, augmenting edge weights with attention mechanisms, and applying data augmentation by adding noise to edge values.


Honors and Awards

  • SoC Research Incentive Award, issued by School of Computing, NUS, 2023.10.
  • Research Achievement Award (2022/2023), issued by School of Computing, NUS, 2023.5.
  • Top Paper Award (2% of accepted full papers), issued by ACM Multimedia 2022, 2022.11.
  • Honor Degree of Bachelor of Engineering, issued by Harbin Institute of Technology Honors School, 2021.6.
  • People's Scholarship (6%), issued by Harbin Institute of Technology, 2020.6.
  • Third Prize, Sogou Innovative Practice Project for College Student, 2018.10.


Invited Talks

  • AI in Music: From Understanding to Creation, invited talk at Huawei HiSilicon, 2025.08.27.
    [ Slides ]
  • My Research in Music AI: Audio, Lyrics, and Symbolic Generation, invited talk at Sony CSL, 2024.11.21.


Teaching

  • Teaching Assistant, CS4347/5647 Sound and Music Computing (2022/2023 sem 1, 2023/2024 sem 1).
  • Teaching Assistant, CS4248 Natural Language Processing (2022/2023 sem 2).


Academic Reviewer

  • NeurIPS 2025
  • IEEE TAFFC (2024)
  • EAI ArtsIT 2024
  • ACM TOMM (2024, 2025)
  • ACM Multimedia 2023, 2024
  • ACL Rolling Review (2024)
  • TASLP (2024, 2025)
  • ISMIR 2022, 2023