Voice is the main means of communicating and sharing information with others, and it brings great convenience to human life. Existing speech classification methods suffer considerable performance degradation under environmental noise and accent variation. Most of these problems can be mitigated by training on large amounts of data. However, collecting large, high-quality datasets in real life is time-consuming and expensive. To address this problem, this paper proposes a data augmentation method suited to expanding small-sample speech image datasets. S-GAN is used to generate data that conform to the real sample distribution, and the GAN-train and GAN-test methods are used to evaluate the quality and diversity of the generated images. In addition, a spatial transformer network (STN) is combined with a CNN framework to extract the informative part of the data for classification. The results show that this method can significantly improve the classification accuracy of speech recognition and lays a foundation for small-sample data augmentation.
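The GAN-train and GAN-test evaluation mentioned above can be illustrated with a minimal sketch. It assumes the protocol as it is commonly defined: GAN-train fits a classifier on generated samples and scores it on real ones, while GAN-test fits on real samples and scores on generated ones. Everything in the code is hypothetical and chosen for illustration only: the nearest-centroid classifier stands in for the paper's CNN, the 2-D Gaussian toy data stands in for speech images, and the names `gan_train_score`, `gan_test_score`, and `toy_samples` do not come from the paper.

```python
import numpy as np


def fit_centroids(x, y):
    """Fit a nearest-centroid classifier: one mean vector per class."""
    labels = np.unique(y)
    centroids = np.stack([x[y == c].mean(axis=0) for c in labels])
    return labels, centroids


def accuracy(model, x, y):
    """Fraction of samples whose nearest centroid carries the true label."""
    labels, centroids = model
    # Squared Euclidean distance from every sample to every centroid.
    dist = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    pred = labels[dist.argmin(axis=1)]
    return float((pred == y).mean())


def gan_train_score(x_gen, y_gen, x_real, y_real):
    """GAN-train: train on generated samples, evaluate on real samples."""
    return accuracy(fit_centroids(x_gen, y_gen), x_real, y_real)


def gan_test_score(x_real, y_real, x_gen, y_gen):
    """GAN-test: train on real samples, evaluate on generated samples."""
    return accuracy(fit_centroids(x_real, y_real), x_gen, y_gen)


def toy_samples(rng, n_per_class, spread):
    """Hypothetical two-class data; a placeholder for real or generated images."""
    means = np.array([[0.0, 0.0], [4.0, 4.0]])
    x = np.concatenate(
        [rng.normal(m, spread, size=(n_per_class, 2)) for m in means]
    )
    y = np.repeat(np.arange(2), n_per_class)
    return x, y


rng = np.random.default_rng(0)
x_real, y_real = toy_samples(rng, 200, spread=1.0)  # stands in for real data
x_gen, y_gen = toy_samples(rng, 200, spread=0.5)    # stands in for generator output

gan_train = gan_train_score(x_gen, y_gen, x_real, y_real)
gan_test = gan_test_score(x_real, y_real, x_gen, y_gen)
print(f"GAN-train accuracy: {gan_train:.3f}")
print(f"GAN-test accuracy:  {gan_test:.3f}")
```

Under this reading, a GAN-train score close to the accuracy of a classifier trained on real data suggests the generated samples are both realistic and diverse, while a high GAN-test score with a low GAN-train score would point to realistic samples that cover only part of the real distribution.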