Speech recognition has achieved excellent performance in recent years, but as is well known, this kind of AI task remains supervised: it needs labelled data both to train models effectively and to compute evaluation metrics at inference time. In real application scenarios, verifying the correctness of recognition results often requires substantial manual effort because labelled data are scarce, and labelling massive amounts of online audio is unrealistic. Moreover, manual checking cannot guarantee comprehensive coverage. Confidence estimation algorithms address this problem: they evaluate a speech recognition model's outputs and automatically flag likely transcription errors. This paper proposes a confidence estimation model based on a multi-feature fusion mechanism, focusing on Chinese end-to-end speech recognition tasks in complex application scenarios. Experiments on the Aishell-1 dataset and China Telecom's internal dataset show that the model achieves good performance.
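The abstract describes fusing multiple features into a single confidence score per recognized token. As a minimal sketch of the general idea (not the paper's actual model), the following combines hypothetical per-token features, such as the softmax posterior of the emitted token, the entropy of the output distribution, and the token duration, through a linear layer and sigmoid; the feature set, weights, and bias are illustrative assumptions, not values from the paper:

```python
import math

# Illustrative fusion weights and bias; in practice these would be
# learned from data labelled with token-level correctness.
WEIGHTS = {"posterior": 3.0, "neg_entropy": 1.5, "duration": 0.1}
BIAS = -1.0

def entropy(dist):
    """Shannon entropy of a probability distribution (natural log)."""
    return -sum(p * math.log(p) for p in dist if p > 0)

def token_confidence(dist, token_idx, duration_frames):
    """Fuse per-token features into a confidence score in (0, 1).

    dist            -- softmax output distribution for this token step
    token_idx       -- index of the emitted token in that distribution
    duration_frames -- token duration in acoustic frames
    """
    feats = {
        "posterior": dist[token_idx],     # probability of emitted token
        "neg_entropy": -entropy(dist),    # peaked distribution -> higher
        "duration": duration_frames,      # very short tokens are suspect
    }
    z = BIAS + sum(WEIGHTS[k] * v for k, v in feats.items())
    return 1.0 / (1.0 + math.exp(-z))     # sigmoid squashes to (0, 1)

# A peaked output distribution should yield higher confidence than a
# near-uniform one, marking the latter token for human review.
peaked = [0.9, 0.05, 0.05]
flat = [0.34, 0.33, 0.33]
print(token_confidence(peaked, 0, 10) > token_confidence(flat, 0, 10))
```

Tokens whose fused score falls below a threshold would be flagged as likely transcription errors, replacing exhaustive manual checking with targeted review.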