End-to-end speech recognition with reinforcement learning
20 June 2023
Zilong Chen, Wenlin Zhang
Proceedings Volume 12715, Eighth International Conference on Electronic Technology and Information Science (ICETIS 2023); 127151K (2023) https://doi.org/10.1117/12.2682509
Event: Eighth International Conference on Electronic Technology and Information Science (ICETIS 2023), 2023, Dalian, China
Abstract
In recent years, end-to-end automatic speech recognition (ASR) based on deep neural networks has become popular because of its simple pipeline and excellent performance. However, there is a key mismatch between training and testing that can degrade performance: in the training stage, existing methods use the maximum likelihood criterion, which aims to maximize the log-likelihood of the training data, while in the testing stage performance is evaluated by word error rate (WER), not log-likelihood. In this paper, we propose an alternative method based on reinforcement learning to make the goals of training and testing more consistent. Viewing speech recognition as a sequential decision process, we use an encoder-decoder neural network as the policy function. The encoder is a pre-trained speech representation model (Wav2vec2.0), which generates the encoding of the environment state. The decoder is trained using a policy gradient algorithm with a mixed reward function that reflects both the word error rate and the language model score. Experimental results on the LibriSpeech corpus show that our proposed method achieves a 4% relative improvement in WER over the baseline with a language model.
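The abstract does not give the exact form of the reward or the policy gradient update, so the following Python sketch is only an illustration of the general idea under stated assumptions. It computes WER as a word-level edit distance divided by the reference length, combines it with a language model score in a hypothetical `mixed_reward` (the weight `lm_weight` and the length normalization are assumptions, not the paper's formula), and shows a generic REINFORCE-style surrogate loss rather than the authors' specific algorithm.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] is the edit distance between the reference prefix processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)


def mixed_reward(reference: str, hypothesis: str,
                 lm_log_prob: float, lm_weight: float = 0.1) -> float:
    """Hypothetical mixed reward: negative WER plus a weighted LM score.

    lm_log_prob is assumed to be the log-probability that some external
    language model assigns to the hypothesis; it is normalized by the
    hypothesis length here (an assumption made for this sketch).
    """
    n_words = max(len(hypothesis.split()), 1)
    return -word_error_rate(reference, hypothesis) + lm_weight * lm_log_prob / n_words


def reinforce_loss(token_log_probs, reward: float, baseline: float = 0.0) -> float:
    """Generic REINFORCE surrogate loss for one sampled transcript.

    Minimizing it increases the probability of sampled sequences whose
    reward exceeds the baseline.
    """
    return -(reward - baseline) * sum(token_log_probs)
```

In an actual training loop, `token_log_probs` would come from the decoder's output distribution for a sampled transcript, and the loss would be backpropagated through an automatic differentiation framework; plain floats are used here only to keep the sketch self-contained.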
© (2023) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Zilong Chen and Wenlin Zhang "End-to-end speech recognition with reinforcement learning", Proc. SPIE 12715, Eighth International Conference on Electronic Technology and Information Science (ICETIS 2023), 127151K (20 June 2023); https://doi.org/10.1117/12.2682509
KEYWORDS
Education and training
Speech recognition
Machine learning
Detection and tracking algorithms
Performance modeling
Systems modeling
Data modeling