Automatic Speech Recognition (ASR) is an important technology in modern society: it enables people to communicate with computers and serves as an assistive tool for visually impaired and deaf people. However, existing speech recognition methods still face several problems. Some require large models with an excessive number of parameters, while others cannot achieve reliable accuracy. Therefore, our study applies an efficient Transformer model combined with a convolutional network to the ASR task. Our model significantly improves speech recognition accuracy while keeping the model compact, without a large number of parameters.