Speech recognition has long been a research focus in the field of human-computer communication and interaction. The main purpose of automatic speech recognition (ASR) is to convert speech waveform signals into text. The acoustic model is a core component of ASR; it links the observed features of the speech signal to the speech modeling units. In recent years, deep learning has become the mainstream technology in speech recognition. In this paper, a convolutional neural network architecture based on VGG and trained with the Connectionist Temporal Classification (CTC) loss function was proposed as the acoustic model for speech recognition. Traditional acoustic model training relies on frame-level labels with the cross-entropy criterion, which requires a tedious label alignment procedure. The CTC loss was adopted to automatically learn the alignments between speech frames and label sequences, so that training is end-to-end. The architecture can exploit the temporal and spectral structure of speech signals simultaneously. Batch normalization (BN) was applied to normalize each layer's inputs and reduce internal covariate shift. To prevent overfitting, dropout was used during training to improve the network's generalization ability. The speech signal was transformed into a spectral image through a series of processing steps and used as the input to the neural network. The input features have 200 dimensions, and the output labels of the acoustic model are 415 Chinese pronunciation units without tones. The experimental results demonstrate that the proposed model achieves character error rates (CER) of 17.97% and 23.86% on the public Mandarin speech corpora AISHELL-1 and ST-CMDS-20170001_1, respectively.
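As a concrete illustration of the kind of model the abstract describes, the sketch below assembles a VGG-style CNN acoustic model with batch normalization, dropout, and CTC loss in PyTorch. The layer counts, channel widths, and pooling schedule are illustrative assumptions rather than the authors' exact configuration; only the 200-dimensional spectral input and the 415 toneless syllable labels (plus a CTC blank) come from the abstract.

# Minimal sketch (not the paper's exact architecture): VGG-style CNN acoustic
# model trained with CTC. Layer counts, channel widths, and the 2x2 pooling
# schedule are assumptions; the 200-dim spectral input and 415 output labels
# (+1 CTC blank) follow the abstract.
import torch
import torch.nn as nn

class VGGCTCAcousticModel(nn.Module):
    def __init__(self, n_freq=200, n_labels=415, dropout=0.2):
        super().__init__()
        def vgg_block(c_in, c_out):
            return nn.Sequential(
                nn.Conv2d(c_in, c_out, kernel_size=3, padding=1),
                nn.BatchNorm2d(c_out),          # reduce internal covariate shift
                nn.ReLU(inplace=True),
                nn.Conv2d(c_out, c_out, kernel_size=3, padding=1),
                nn.BatchNorm2d(c_out),
                nn.ReLU(inplace=True),
                nn.MaxPool2d(2),                # halves the time and frequency axes
                nn.Dropout(dropout),            # regularization against overfitting
            )
        self.features = nn.Sequential(
            vgg_block(1, 32), vgg_block(32, 64), vgg_block(64, 128)
        )
        # after three 2x2 poolings the 200-dim frequency axis shrinks to 25
        self.classifier = nn.Linear(128 * (n_freq // 8), n_labels + 1)  # +1 for CTC blank

    def forward(self, spec):                    # spec: (batch, 1, time, freq)
        x = self.features(spec)                 # (batch, 128, time/8, freq/8)
        b, c, t, f = x.shape
        x = x.permute(0, 2, 1, 3).reshape(b, t, c * f)
        return self.classifier(x).log_softmax(dim=-1)  # (batch, time/8, 416)

# Training step with CTC loss; no frame-level alignment is needed, only the
# label sequence per utterance and the (downsampled) input lengths.
model = VGGCTCAcousticModel()
ctc = nn.CTCLoss(blank=415, zero_infinity=True)
spec = torch.randn(2, 1, 400, 200)              # 2 utterances, 400 frames, 200-dim spectra
logp = model(spec).transpose(0, 1)              # CTCLoss expects (time, batch, classes)
targets = torch.randint(0, 415, (2, 20))        # dummy toneless-syllable label sequences
input_lens = torch.full((2,), logp.size(0), dtype=torch.long)
target_lens = torch.full((2,), 20, dtype=torch.long)
loss = ctc(logp, targets, input_lens, target_lens)
loss.backward()

The key point the sketch illustrates is that CTC only requires the unaligned label sequence per utterance, which removes the frame-level alignment step of cross-entropy training.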