Full Publication List


2026

  • J. Zhao, W. Zeng, T. Lyu, and Y. Wang, “CoMelSinger: Discrete Token-Based Zero-Shot Singing Synthesis with Structured Melody Control and Guidance,” IEEE/ACM Transactions on Audio, Speech and Language Processing (TASLP), early access, 2026, doi: 10.1109/TASLPRO.2026.3664643.

    © 2026 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, including reprinting/republishing this material for advertising or promotional purposes, collecting new collected works for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.

  • T. Lyu, J. Zhao, and Y. Wang, “KSDIFF: Keyframe-Augmented Speech-Aware Dual-Path Diffusion for Facial Animation,” in Proceedings of the 2026 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2026). 2026.

    © 2026 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, including reprinting/republishing this material for advertising or promotional purposes, collecting new collected works for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.

  • H. Liu, Z. Cui, X. Gu, and Y. Wang, “Unlocking Large Audio-Language Models for Interactive Language Learning,” in Findings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2026). 2026.

2025

2024

2023

2022

2021

2020

  • H. Huang, F. Xue, H. Wang, and Y. Wang, “Deep Graph Random Process for Relational-Thinking-Based Speech Recognition,” in Proceedings of the 37th International Conference on Machine Learning (ICML 2020). PMLR, 2020, pp. 4531-4541. [supplementary] [slides]

  • W. Wei, H. Zhu, E. Benetos, and Y. Wang, “A-CRNN: A Domain Adaptation Model for Sound Event Detection,” in Proceedings of the 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2020). IEEE, 2020, pp. 276-280.

    © 2020 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, including reprinting/republishing this material for advertising or promotional purposes, collecting new collected works for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.

  • B. Sharma and Y. Wang, “Automatic Evaluation of Song Intelligibility using Singing Adapted STOI and Vocal-specific Features,” IEEE/ACM Trans. Audio Speech Lang. Process. (TASLP), vol. 28, pp. 319-331, 2020. [code] [data]

    © 2020 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, including reprinting/republishing this material for advertising or promotional purposes, collecting new collected works for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.

  • C. Gupta, H. Li, and Y. Wang, “Automatic Leaderboard: Evaluation of Singing Quality without a Standard Reference,” IEEE/ACM Trans. Audio Speech Lang. Process. (TASLP), vol. 28, pp. 13-26, 2020.

    © 2020 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, including reprinting/republishing this material for advertising or promotional purposes, collecting new collected works for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.

2019

  • B. Anderson, M. Shi, V. Y. F. Tan, and Y. Wang, “Mobile Gait Analysis Using Foot-Mounted UWB Sensors,” Proc. ACM Interact. Mob. Wearable Ubiquitous Technol., vol. 3, no. 3, pp. 73:1-73:22, 2019.

  • S. S. R. Phaye, E. Benetos, and Y. Wang, “SubSpectralNet - Using Sub-Spectrogram Based Convolutional Neural Networks for Acoustic Scene Classification,” in Proceedings of the 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2019). IEEE, 2019, pp. 825-829.

    © 2019 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, including reprinting/republishing this material for advertising or promotional purposes, collecting new collected works for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.

  • B. Sharma, C. Gupta, H. Li, and Y. Wang, “Automatic Lyrics-to-Audio Alignment on Polyphonic Music Using Singing-Adapted Acoustic Models,” in Proceedings of the 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2019). IEEE, 2019, pp. 396-400.

    © 2019 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, including reprinting/republishing this material for advertising or promotional purposes, collecting new collected works for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.

  • Y. Wang, “Singing Voice Modelling for Language Learning,” Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik, 2019.

2018

2017

  • C. Gupta, H. Li, and Y. Wang, “Perceptual Evaluation of Singing Quality,” in Proceedings of the 2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC 2017). IEEE, 2017, pp. 577-586. (Best Student Paper Award)

    © 2017 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, including reprinting/republishing this material for advertising or promotional purposes, collecting new collected works for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.

  • D. Turnbull, C. Gupta, D. Murad, M. Barone, and Y. Wang, “Using Music Technology to Motivate Foreign Language Learning,” in Proceedings of the 2017 International Conference on Orange Technologies (ICOT 2017). IEEE, 2017, pp. 218-221.

    © 2017 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, including reprinting/republishing this material for advertising or promotional purposes, collecting new collected works for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.

  • D. Murad, F. Ye, M. Barone, and Y. Wang, “Motion Initiated Music Ensemble with Sensors for Motor Rehabilitation,” in Proceedings of the 2017 International Conference on Orange Technologies (ICOT 2017). IEEE, 2017, pp. 87-90.

    © 2017 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, including reprinting/republishing this material for advertising or promotional purposes, collecting new collected works for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.

  • J. Fang, D. Grunberg, S. Lui, and Y. Wang, “Development of a Music Recommendation System for Motivating Exercise,” in Proceedings of the 2017 International Conference on Orange Technologies (ICOT 2017). IEEE, 2017, pp. 83-86.

    © 2017 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, including reprinting/republishing this material for advertising or promotional purposes, collecting new collected works for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.

  • J. Fang, D. Grunberg, D. J. Litman, and Y. Wang, “Discourse Analysis of Lyric and Lyric-Based Classification of Music,” in Proceedings of the 18th International Society for Music Information Retrieval Conference (ISMIR 2017). 2017, pp. 464-471.

  • C. Gupta, D. Grunberg, P. Rao, and Y. Wang, “Towards Automatic Mispronunciation Detection in Singing,” in Proceedings of the 18th International Society for Music Information Retrieval Conference (ISMIR 2017). 2017, pp. 390-396.

  • K. M. Ibrahim, D. Grunberg, K. Agres, C. Gupta, and Y. Wang, “Intelligibility of Sung Lyrics: A Pilot Study,” in Proceedings of the 18th International Society for Music Information Retrieval Conference (ISMIR 2017). 2017, pp. 686-693. [data]

  • Z. Duan, C. Gupta, G. Percival, D. Grunberg, and Y. Wang, “SECCIMA: Singing and Ear Training for Children with Cochlear Implants via a Mobile Application,” in Proceedings of the 14th Sound and Music Computing Conference (SMC 2017). 2017, pp. 200-207.

2016

2015

2014

2013

  • Y. Yu, R. Zimmermann, Y. Wang, and V. Oria, “Scalable Content-Based Music Retrieval Using Chord Progression Histogram and Tree-Structure LSH,” IEEE Trans. Multim. (TMM), vol. 15, no. 8, pp. 1969-1981, 2013.

    © 2013 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, including reprinting/republishing this material for advertising or promotional purposes, collecting new collected works for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.

  • Z. Duan, H. Fang, B. Li, K. C. Sim, and Y. Wang, “The NUS Sung and Spoken Lyrics Corpus: A Quantitative Comparison of Singing and Speech,” in Proceedings of the 2013 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC 2013). IEEE, 2013, pp. 1-9. [data]

    © 2013 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, including reprinting/republishing this material for advertising or promotional purposes, collecting new collected works for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.

  • Z. Li, B. Zhang, Y. Yu, J. Shen, and Y. Wang, “Query-Document-Dependent Fusion: A Case Study of Multimodal Music Retrieval,” IEEE Trans. Multim. (TMM), vol. 15, no. 8, pp. 1830-1842, 2013.

    © 2013 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, including reprinting/republishing this material for advertising or promotional purposes, collecting new collected works for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.

  • Z. Cai, R. J. Ellis, Z. Duan, H. Lu, and Y. Wang, “Basic Evaluation of Auditory Temporal Stability (BEATS): A Novel Rationale and Implementation,” in Proceedings of the 14th International Society for Music Information Retrieval Conference (ISMIR 2013). 2013, pp. 541-546.

  • Z. Li, J. Wang, J. Cai, Z. Duan, H. Wang, and Y. Wang, “Non-Reference Audio Quality Assessment for Online Live Music Recordings,” in Proceedings of the 21st ACM International Conference on Multimedia (MM 2013). ACM, 2013, pp. 63-72.

2012

2011

2010

2009

2008

2007

2006

  • D. Iskandar, Y. Wang, M. Kan, and H. Li, “Syllabic Level Automatic Synchronization of Music Signals and Text Lyrics,” in Proceedings of the 14th ACM International Conference on Multimedia (MM 2006). ACM, 2006, pp. 659–662.

  • A. Loscos, Y. Wang, and W. J. J. Boo, “Low Level Descriptors for Automatic Violin Transcription,” in Proceedings of the 7th International Conference on Music Information Retrieval (ISMIR 2006). 2006, pp. 164–167.

  • W. J. J. Boo, Y. Wang, and A. Loscos, “A Violin Music Transcriber for Personalized Learning,” in Proceedings of the 2006 IEEE International Conference on Multimedia and Expo (ICME 2006). IEEE Computer Society, 2006, pp. 2081–2084.

    © 2006 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, including reprinting/republishing this material for advertising or promotional purposes, collecting new collected works for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.

  • W. Huang and Y. Wang, “Efficient Partial Spectrum Reconstruction Using an Asymmetric PQMF Algorithm for MPEG-Coded Stereo Audio,” in Proceedings of the 2006 IEEE International Conference on Multimedia and Expo (ICME 2006). IEEE Computer Society, 2006, pp. 901–904.

    © 2006 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, including reprinting/republishing this material for advertising or promotional purposes, collecting new collected works for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.

  • J. Korhonen, Y. Huang, and Y. Wang, “Generic Forward Error Correction of Short Frames for IP Streaming Applications,” Multim. Tools Appl., vol. 29, no. 3, pp. 305–323, 2006.

2005

  • A. Shenoy and Y. Wang, “Key, Chord, and Rhythm Tracking of Popular Music Recordings,” Comput. Music. J., vol. 29, no. 3, pp. 75–86, 2005.

  • A. Shenoy, Y. Wu, and Y. Wang, “Singing Voice Detection for Karaoke Application,” in Proceedings of Visual Communications and Image Processing 2005 (VCIP 2005). SPIE, 2005, pp. 752–762.

  • J. Yin, T. Sim, Y. Wang, and A. Shenoy, “Music Transcription Using an Instrument Model,” in Proceedings of the 2005 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2005). IEEE, 2005, pp. 217–220.

    © 2005 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, including reprinting/republishing this material for advertising or promotional purposes, collecting new collected works for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.

  • W. Huang and Y. Wang, “A Method for Separating Drum Objects from Polyphonic Musical Signals,” in Proceedings of the IEEE Workshop on Applications of Signal Processing to Audio and Acoustics 2005 (WASPAA 2005). IEEE, 2005, pp. 307–310.

    © 2005 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, including reprinting/republishing this material for advertising or promotional purposes, collecting new collected works for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.

  • J. Yin, Y. Wang, and D. Hsu, “Digital Violin Tutor: An Integrated System for Beginning Violin Learners,” in Proceedings of the 13th ACM International Conference on Multimedia (MM 2005). ACM, 2005, pp. 976–985.

  • Y. Huang, S. Chakraborty, and Y. Wang, “Using Offline Bitstream Analysis for Power-Aware Video Decoding in Portable Devices,” in Proceedings of the 13th ACM International Conference on Multimedia (MM 2005). ACM, 2005, pp. 299–302.

  • J. Korhonen and Y. Wang, “Power-Efficient Streaming for Mobile Terminals,” in Proceedings of the 15th International Workshop on Network and Operating System Support for Digital Audio and Video (NOSSDAV 2005). ACM, 2005, pp. 39–44.

  • W. Huang, Y. Wang, and S. Chakraborty, “Power-aware Bandwidth and Stereo-image Scalable Audio Decoding,” in Proceedings of the 13th ACM International Conference on Multimedia (MM 2005). ACM, 2005, pp. 291–294.

  • S. Chakraborty, Y. Wang, and W. Huang, “A Perception-Aware Low-Power Software Audio Decoder for Portable Devices,” in Proceedings of the 3rd Workshop on Embedded Systems for Real-Time Multimedia (ESTIMedia 2005). IEEE Computer Society, 2005, pp. 13–18.

  • J. Korhonen, Y. Wang, and D. Isherwood, “Toward Bandwidth-Efficient and Error-Robust Audio Streaming over Lossy Packet Networks,” Multim. Syst., vol. 10, no. 5, pp. 402–412, 2005.

  • Y. Huang, J. Korhonen, and Y. Wang, “Optimization of Source and Channel Coding for Voice Over IP,” in Proceedings of the 2005 IEEE International Conference on Multimedia and Expo (ICME 2005). IEEE Computer Society, 2005, pp. 173–176.

    © 2005 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, including reprinting/republishing this material for advertising or promotional purposes, collecting new collected works for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.

  • J. Korhonen and Y. Wang, “Effect of Packet Size on Loss Rate and Delay in Wireless Links,” in Proceedings of the IEEE Wireless Communications and Networking Conference (WCNC 2005). IEEE, 2005, pp. 1608–1613.

    © 2005 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, including reprinting/republishing this material for advertising or promotional purposes, collecting new collected works for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.

2004

  • T. L. Nwe, A. Shenoy, and Y. Wang, “Singing Voice Detection in Popular Music,” in Proceedings of the 12th ACM International Conference on Multimedia (MM 2004). ACM, 2004, pp. 324–327.

  • N. C. Maddage, C. Xu, and Y. Wang, “Singer Identification Based on Vocal and Instrumental Models,” in Proceedings of the 17th International Conference on Pattern Recognition (ICPR 2004). IEEE Computer Society, 2004, pp. 375–378.

    © 2004 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, including reprinting/republishing this material for advertising or promotional purposes, collecting new collected works for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.

  • Y. Wang, M. Kan, T. L. Nwe, A. Shenoy, and J. Yin, “LyricAlly: Automatic Synchronization of Acoustic Musical Signals and Textual Lyrics,” in Proceedings of the 12th ACM International Conference on Multimedia (MM 2004). ACM, 2004, pp. 212–219. (Best Student Award)

  • A. Shenoy, R. Mohapatra, and Y. Wang, “Key Determination of Acoustic Musical Signals,” in Proceedings of the 2004 IEEE International Conference on Multimedia and Expo (ICME 2004). IEEE Computer Society, 2004, pp. 1771–1774.

    © 2004 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, including reprinting/republishing this material for advertising or promotional purposes, collecting new collected works for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.

  • X. Shao, C. Xu, Y. Wang, and M. S. Kankanhalli, “Automatic Music Summarization in Compressed Domain,” in Proceedings of the 2004 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2004). IEEE, 2004, pp. 261–264.

    © 2004 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, including reprinting/republishing this material for advertising or promotional purposes, collecting new collected works for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.

  • T. L. Nwe and Y. Wang, “Automatic Detection of Vocal Segments in Popular Songs,” in Proceedings of the 5th International Conference on Music Information Retrieval (ISMIR 2004). 2004.

  • J. Yin, A. Dhanik, D. Hsu, and Y. Wang, “The Creation of a Music-Driven Digital Violinist,” in Proceedings of the 12th ACM International Conference on Multimedia (MM 2004). ACM, 2004, pp. 476–479.

  • Y. Wang, W. Huang, and J. Korhonen, “A Framework for Robust and Scalable Audio Streaming,” in Proceedings of the 12th ACM International Conference on Multimedia (MM 2004). ACM, 2004, pp. 144–151.

2003

2002

  • Y. Wang and S. Streich, “A Drumbeat-Pattern Based Error Concealment Method for Music Streaming Applications,” in Proceedings of the 2002 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2002). IEEE, 2002, pp. 2817–2820.

    © 2002 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, including reprinting/republishing this material for advertising or promotional purposes, collecting new collected works for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.

2001


2000

  • Y. Wang, M. Vilermo, and L. Yaroslavsky, “Energy Compaction Property of the MDCT in Comparison with Other Transforms,” in Proceedings of the Audio Engineering Society Convention 109 (AESC 2000). 2000.

  • Y. Wang, L. Yaroslavsky, M. Vilermo, and M. Väänänen, “Some Peculiar Properties of the MDCT,” in Proceedings of the 5th International Conference on Signal Processing, 16th World Computer Congress (WCC 2000), vol. 1. IEEE, 2000, pp. 61–64.

    © 2000 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, including reprinting/republishing this material for advertising or promotional purposes, collecting new collected works for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.

  • Y. Wang, L. Yaroslavsky, and M. Vilermo, “On the Relationship between MDCT, SDPT and DFT,” in Proceedings of the 5th International Conference on Signal Processing, 16th World Computer Congress (WCC 2000), vol. 1. IEEE, 2000, pp. 44–47.

    © 2000 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, including reprinting/republishing this material for advertising or promotional purposes, collecting new collected works for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.

  • L. Yaroslavsky and Y. Wang, “DFT, DCT, MDCT, DST and Signal Fourier Spectrum Analysis,” in Proceedings of the 10th European Signal Processing Conference (EUSIPCO 2000). IEEE, 2000, pp. 1–4.

  • Y. Wang, M. Vilermo, M. Väänänen, and L. Yaroslavsky, “Restructured Audio Encoder for Improved Computational Efficiency,” in Proceedings of the Audio Engineering Society Convention 108 (AESC 2000). 2000.

1999