13 January 2023 Speech emotion recognition based on SVM and CNN using MFCC feature extraction
Zheng Zhong
Proceedings Volume 12510, International Conference on Statistics, Data Science, and Computational Intelligence (CSDSCI 2022); 125101S (2023) https://doi.org/10.1117/12.2657244
Event: International Conference on Statistics, Data Science, and Computational Intelligence (CSDSCI 2022), 2022, Qingdao, China
Abstract
This research aims to perform speech emotion recognition for Chinese speech using machine learning. The CASIA Chinese emotional speech database is used as the training data. For classification, the researcher compares two methods: SVM (support vector machine) and CNN (convolutional neural network). In data preprocessing, the researcher used the MFCC (Mel-frequency cepstral coefficient) implementation from librosa to extract features from the audio in the CASIA database. After both models were trained on these features, the accuracy of each was low. To improve accuracy, the researcher adjusted the penalty coefficient and gamma value of the SVM. For the CNN, the researcher added a dropout layer to the network structure and changed the L2 regularizer and the number of epochs. However, because of the limits of the SVM's classification ability, its accuracy and performance proved very difficult to improve. Therefore, this report focuses mainly on speech emotion recognition using the CNN. Once the model was well trained, the researcher tested it on audio recorded by the researcher and a classmate. After the model's accuracy was improved and audio with more clearly identifiable emotional features was recorded, the researcher obtained a satisfactory result.
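As a rough illustration of why the gamma value matters when tuning an SVM, the sketch below computes a kernel similarity between two feature vectors for several gamma values. It assumes an RBF kernel, which the abstract does not state, and the two short vectors are made-up stand-ins for MFCC features, not CASIA data. The function name rbf_kernel is hypothetical.

```python
import math


def rbf_kernel(x, y, gamma):
    """Return exp(-gamma * ||x - y||^2) for two equal-length feature vectors.

    Assumption: an RBF kernel is used; the abstract only says that the
    gamma value was tuned, not which kernel it belongs to.
    """
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


# Made-up 3-dimensional vectors standing in for MFCC features (not real data).
utterance_a = [1.0, 0.0, 2.0]
utterance_b = [0.0, 1.0, 2.0]

# A larger gamma makes the similarity fall off faster with distance, so each
# training sample influences a smaller neighbourhood of the feature space.
for gamma in (0.01, 0.1, 1.0):
    similarity = rbf_kernel(utterance_a, utterance_b, gamma)
    print(f"gamma={gamma}: similarity={similarity:.4f}")
```

Under this assumption, a very large gamma tends toward overfitting, because only near-identical samples count as similar, while a very small gamma makes all samples look alike. The penalty coefficient separately controls how strongly misclassified training samples are penalised.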
© (2023) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Zheng Zhong "Speech emotion recognition based on SVM and CNN using MFCC feature extraction", Proc. SPIE 12510, International Conference on Statistics, Data Science, and Computational Intelligence (CSDSCI 2022), 125101S (13 January 2023); https://doi.org/10.1117/12.2657244
KEYWORDS
Education and training

Emotion

Feature extraction

Databases

Overfitting

Machine learning

Speech recognition
