Transcription of Mandarin singing based on human-computer interaction
12 January 2023
Wanglong Ren, Zebin Huang, Xiangjian Zeng, Zhen Liu
Proceedings Volume 12509, Third International Conference on Intelligent Computing and Human-Computer Interaction (ICHCI 2022); 125091Y (2023) https://doi.org/10.1117/12.2657477
Event: Third International Conference on Intelligent Computing and Human-Computer Interaction (ICHCI 2022), 2022, Guangzhou, China
Abstract
Lyric transcription is similar to speech recognition: both identify content from sound clips. Speech recognition technology is maturing, and related application systems are widely used in the software industry. Singing content, however, has received far less attention, and there is still little research on identifying words and sentences from the singing voice. More seriously, compared with lyrics transcription for English, there are almost no related academic papers for Mandarin. On the one hand, speech recognition has high-quality datasets in multiple languages that are large enough to train large-scale models, whereas the singing domain lacks data resources. On the other hand, singing differs from speech through distinctive pronunciation techniques, reflected in musical characteristics such as pitch and rhythm. To address these problems, this paper provides a dataset that can be used for Mandarin lyrics transcription and builds a transcription model on this dataset. Our model can address some deficiencies of existing models and achieves promising results on our dataset.
© (2023) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Wanglong Ren, Zebin Huang, Xiangjian Zeng, and Zhen Liu "Transcription of Mandarin singing based on human-computer interaction", Proc. SPIE 12509, Third International Conference on Intelligent Computing and Human-Computer Interaction (ICHCI 2022), 125091Y (12 January 2023); https://doi.org/10.1117/12.2657477
Keywords
Transformers, Convolution, Data modeling, Speech recognition, Signal processing, Artificial intelligence, Human-computer interaction