I’m an AI Researcher in NLP and Audio Processing.

I am currently a final-year PhD candidate in the Sound and Music Computing Lab, School of Computing, National University of Singapore, advised by Prof. Ye Wang (王晔). My research focuses on advancing music information retrieval (MIR), audio and speech processing, and natural language processing (NLP) through deep learning techniques. I am particularly interested in the automatic transcription, generation, and translation of music and lyrics, with a strong focus on self-supervised learning, transfer learning, and controlled generation for real-world applications. Prior to joining NUS, I earned my Bachelor's degree in Computer Science with honors from the Harbin Institute of Technology and completed my bachelor's thesis on piano music transcription at the Auditory Intelligence Research Center, advised by Prof. Jiqing Han (韩纪庆).

Moreover, I am a professional-level violinist and guitarist. Please visit the "As Musician" page to explore my music portfolio. I feel incredibly fortunate to have discovered my passion for music and to have the opportunity to work and conduct research in a music-related field. My love for music energizes and motivates me to continually grow and excel in this area.

Email · Google Scholar · LinkedIn · Full CV · Twitter · WeChat


Recent News

  • [2024.12] The paper Lead Instrument Detection from Multitrack Music is accepted by ICASSP 2025.
  • [2024.12] The REMI-z tokenizer and MultiTrack music data structure are now available on PyPI. This is my first open source project on pip :)
  • [2024.8] I started my internship at Sony Computer Science Laboratories.
  • [2024.5] I started my internship at Yamaha in Hamamatsu, Shizuoka, Japan.
  • [2024.3] The paper DNA Storage Toolkit: A Modular End-to-End DNA Data Storage Codec and Simulator is accepted by ISPASS 2024. Congratulations to Puru Sharma!
  • [2024.3] The paper Automatic Lyric Transcription and Automatic Music Transcription from Multimodal Singing is accepted by ACM TOMM. Congratulations to my colleague Xiangming!
  • [2023.10] My short paper Singable and Controllable Neural Lyric Translation: a Late-Breaking Showcase is accepted by ISMIR 2023 Late Breaking Demo.
  • [2023.6] One full paper was rejected by ISMIR 2023. Sadge!
  • [2023.5] I passed the Qualification Exam. Now I am a PhD candidate!
  • [2023.5] My paper Songs Across Borders: Singable and Controllable Neural Lyric Translation is accepted by ACL 2023.
  • [2023.1] I received the Research Achievement Award (2022/2023) from the School of Computing, NUS.
  • [2022.12] I'm attending ISMIR 2022 in Bengaluru, India.
  • [2022.11] Our ACM Multimedia paper receives the top paper award (2% of accepted full papers).
  • [2022.10] I'm attending ACM Multimedia in Lisbon, Portugal.
  • [2022.7] An extension of our previous work, Transfer Learning of wav2vec 2.0 for Automatic Lyric Transcription, is accepted by ISMIR 2022.
  • [2022.7] My paper with Xiangming Gu, MM-ALT: A multimodal automatic lyric transcription system, is accepted by ACM Multimedia 2022.
  • [2022.5] I'm attending ICASSP 2022 in Singapore.
  • [2022.1] My first paper, which achieves another SOTA on piano music transcription, is accepted by ICASSP 2022.
  • [2022.1] I start my PhD journey at NUS SMCL, advised by Prof. Ye Wang.
  • [2021.8] I join the National University of Singapore as a student in the Master of Computing program (AI track) and start my research in the Sound and Music Computing Lab.


Publications

(* indicates equal contribution)

Audio Content Analysis

Controlled Generation (Text & Symbolic Music)

Multimodal Content Understanding

Other Sequence Transformation


Internships

Associate Member (Research Collaboration), Sea AI Lab (SAIL)

📍 Singapore · Mar 2025 – Ongoing
👨‍🏫 Mentor: Dr. Longxu Dou, Dr. Bin Wang

  • I am integrating stronger audio understanding capabilities into LLMs.

Algorithm Engineer Intern, Huawei Singapore Research Center

📍 Singapore · Mar 2025 – Ongoing
👨‍🏫 Mentor: Dr. Hao Jia

  • I am working on text-to-speech projects.

Research Intern, Sony Computer Science Laboratories (Sony CSL)

📍 Tokyo, Japan · Aug 2024 – Nov 2024
👨‍🏫 Mentor: Dr. Taketo Akama

  • Designed a tokenization scheme for Whisper to represent melody note sequences, enabling efficient transcription of musical elements.
  • Fine-tuned Whisper for lyric and melody transcription using frame-level and end-to-end approaches, achieving performance comparable to state-of-the-art models.
  • Implemented semi-supervised techniques, including auxiliary transposition-equivariant loss, pseudo-labeling, noisy student model, and FixMatch, to enhance transcription performance.
  • Applied diverse data augmentation strategies (time shift, time stretch, SpecAugment, and pitch shift) to improve the model’s ability to learn music-specific concepts and handle noisy labels.
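As a rough illustration of two of the augmentations listed above (time shift and SpecAugment-style masking), the sketch below operates on a spectrogram stored as a plain list of frames. The function names, the circular shift, and the one-mask-per-axis policy are assumptions made for this example, not the configuration used in the internship project.

```python
import random

def time_shift(spec, shift):
    """Circularly shift a spectrogram (list of frames) by `shift` frames."""
    shift %= len(spec)
    # A shift of 0 returns an unchanged shallow copy.
    return spec[-shift:] + spec[:-shift] if shift else list(spec)

def spec_augment(spec, max_time_mask, max_freq_mask, rng):
    """Zero out one random block of frames and one random band of bins."""
    n_frames, n_bins = len(spec), len(spec[0])
    # Mask widths are drawn uniformly, capped by the spectrogram size.
    t_width = rng.randint(0, min(max_time_mask, n_frames))
    t_start = rng.randint(0, n_frames - t_width)
    f_width = rng.randint(0, min(max_freq_mask, n_bins))
    f_start = rng.randint(0, n_bins - f_width)
    out = [list(frame) for frame in spec]  # copy so the input stays intact
    for t in range(n_frames):
        for f in range(n_bins):
            in_time_mask = t_start <= t < t_start + t_width
            in_freq_mask = f_start <= f < f_start + f_width
            if in_time_mask or in_freq_mask:
                out[t][f] = 0.0
    return out
```

For transcription, a time shift on the input would also need to be applied to the note or lyric timestamps so that audio and labels stay aligned.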

Research Intern, Yamaha Corporation, MixerAI Team

📍 Hamamatsu, Japan · May 2024 – Aug 2024
👨‍🏫 Mentor: Dr. Yu Takahashi

  • Conducted the first work on lead instrument detection from multitrack music, providing annotated datasets and benchmark models. Published at ICASSP 2025.

Research Intern, Harvard University, Data Systems Laboratory

📍 Remote · Aug 2020 – Sep 2020
👨‍🏫 Mentor: Prof. Stratos Idreos

Back End Developer, Tencent, CDG Department

📍 Shenzhen, China · Jul 2020 – Aug 2020
👨‍🏫 Mentor: Luo Yunlong

  • Enhanced a distributed online analytical processing system for user engagement data analysis.
  • Accelerated query processing speed by 120x through a distributed storage caching strategy.
  • Skills: Docker, Kubernetes, OLAP, Hadoop, Spark, Presto, Tencent Cloud, Go

Other Projects

  • REMI-z Tokenizer and MultiTrack music data structure
    This tool converts music between MIDI and the REMI-z representation, an efficient sequence representation of multitrack music, and makes it easy to manipulate music at the bar level.
    [ github | PyPI ]

  • Unofficial MuseCoco
    This is a Hugging Face Transformers implementation of MuseCoco’s attribute-to-music generation model. The original was built on Fairseq, which has not been actively maintained for years. It tightly coupled the model, configuration, task, and data pipeline, making it difficult to modify, and it re-generated all prefixes at inference, causing unnecessary slowdowns. This revamped version, based on Hugging Face GPT-2, is much more flexible, allowing easier customization and fine-tuning, and it significantly improves inference speed by continuing generation from a given prefix instead of recomputing everything. The result is a faster, more modular, and more user-friendly implementation.

  • GuitarFret
    With this guitar fretboard simulator on your laptop, you never have to worry about composing without a guitar nearby!

  • DNA Storage Simulation
    DNA-based storage systems present unique challenges, as reading and writing operations can sometimes alter the original information. To model the changes introduced by such storage systems in a wet lab environment, we designed a simulation system to emulate DNA behavioral changes. This system includes a rule-based method, a Multi-Layer Perceptron (MLP) method, and a sequence-to-sequence attention-based Recurrent Neural Network (RNN). Experiments on the Microsoft Nanopore dataset show that the sequence-to-sequence method is highly effective.

  • GNN-based Music Recommender
    This project aims to tackle the music artist recommendation challenge using Graph Convolutional Networks (GCNs). By modeling artist and user identities through their interactive relationships, the network predicts affinity scores between users and previously unexplored artists to generate personalized recommendations. I implemented the original GCN as a baseline and proposed three enhancements: incorporating edge weight for aggregation, augmenting edge weight with attention mechanisms, and implementing data augmentation by introducing noise to edge values.
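To make the idea of edge-weighted aggregation concrete, here is a minimal sketch of one aggregation round on a small user-artist graph, followed by a dot-product affinity score. The function names, the self-loop weight of 1.0, the symmetric normalization, and the dot-product scoring are assumptions chosen for illustration; the sketch omits learned weight matrices and is not the project's actual model.

```python
from math import sqrt

def aggregate(features, edges):
    """One round of edge-weighted neighbourhood aggregation.

    features: dict mapping node id -> list of floats
    edges: list of (node_a, node_b, weight) undirected interactions
    """
    # Weighted adjacency with a self-loop of weight 1.0 on every node.
    adj = {node: {node: 1.0} for node in features}
    for a, b, w in edges:
        adj[a][b] = w
        adj[b][a] = w
    # Weighted degree, used for symmetric normalization.
    degree = {node: sum(nbrs.values()) for node, nbrs in adj.items()}
    dim = len(next(iter(features.values())))
    out = {}
    for node, nbrs in adj.items():
        vec = [0.0] * dim
        for nbr, w in nbrs.items():
            coeff = w / sqrt(degree[node] * degree[nbr])
            for k in range(dim):
                vec[k] += coeff * features[nbr][k]
        out[node] = vec
    return out

def affinity(embeddings, user, artist):
    """Score a user-artist pair as the dot product of their embeddings."""
    return sum(u * a for u, a in zip(embeddings[user], embeddings[artist]))
```

Calling `aggregate` on a graph where each edge weight is, say, a listening count lets heavily played artists contribute more to a user's embedding than artists heard once.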


Honors and Awards

  • SoC Research Incentive Award, issued by School of Computing, NUS, 2023.10.
  • Research Achievement Award (2022/2023), issued by School of Computing, NUS, 2023.5.
  • Top Paper Award (2% of accepted full papers), issued by ACM Multimedia 2022, 2022.11.
  • Honor Degree of Bachelor of Engineering, issued by Harbin Institute of Technology Honors School, 2021.6.
  • People Scholarship (6%), issued by Harbin Institute of Technology, 2020.6.
  • Third Prize, Sogou Innovative Practice Project for College Student, 2018.10.


Teaching

  • Teaching Assistant, CS4347/5647 Sound and Music Computing (2022/2023 sem 1, 2023/2024 sem 1).
  • Teaching Assistant, CS4248 Natural Language Processing (2022/2023 sem 2).


Academic Reviewer

  • IEEE TAFFC (2024)
  • EAI ArtsIT 2024
  • ACM TOMM (2024, 2025)
  • ACM Multimedia 2023, 2024
  • ACL Rolling Review (2024)
  • TASLP (2024, 2025)
  • ISMIR 2022, 2023