A lightweight, fast method for detecting and recognizing skewed spray codes is proposed to address low detection accuracy, slow speed, and the inability to detect skewed spray codes on packaging with complex backgrounds. Built on the YOLOv5-obb network, the approach uses the Ghost module to slim down the backbone network, reducing parameters and computation. Introducing the Slim-neck lightweight feature-fusion structure in the neck further simplifies the model while improving detection accuracy. The SimAM attention module is added to both the backbone and the neck to improve overall detection and recognition rates. In post-processing, a method for merging scene text characters is proposed to handle the merging of skewed text. The final model is 39.3% smaller and achieves a recognition rate, recall, and mean average precision of 99.0%, 99.8%, and 99.2%, respectively, for spray code character detection. The algorithm enables fast and accurate detection of skewed spray codes, providing support for rapid detection in related fields.
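To give a rough sense of why a Ghost module reduces parameters, the sketch below compares the weight count of an ordinary convolution with that of a Ghost module as it is generally described: a smaller primary convolution produces a fraction of the output channels, and cheap depthwise operations generate the rest. The function names, the ratio `s`, the cheap-operation kernel size `d`, and the layer sizes are illustrative assumptions, not values taken from the paper, and biases and normalization layers are ignored.

```python
def standard_conv_params(c_in: int, c_out: int, k: int) -> int:
    """Weights in an ordinary k x k convolution (bias ignored)."""
    return c_in * c_out * k * k


def ghost_module_params(c_in: int, c_out: int, k: int, s: int = 2, d: int = 3) -> int:
    """Weights in a Ghost module (bias ignored).

    Assumed layout, for illustration only:
      * a primary k x k convolution produces c_out / s "intrinsic" maps;
      * each intrinsic map yields (s - 1) extra "ghost" maps through a
        cheap d x d depthwise operation.
    """
    if c_out % s != 0:
        raise ValueError("c_out must be divisible by the ratio s")
    intrinsic = c_out // s
    primary = c_in * intrinsic * k * k      # ordinary convolution, fewer outputs
    cheap = intrinsic * (s - 1) * d * d     # depthwise: one d x d filter per ghost map
    return primary + cheap


if __name__ == "__main__":
    # Hypothetical layer sizes chosen only to show the effect.
    plain = standard_conv_params(c_in=64, c_out=128, k=3)
    ghost = ghost_module_params(c_in=64, c_out=128, k=3, s=2, d=3)
    print(f"standard conv: {plain} weights")
    print(f"ghost module:  {ghost} weights")
    print(f"reduction:     {plain / ghost:.2f}x")
```

With a ratio of `s = 2`, the weight count for such a layer falls to roughly half. The 39.3% figure reported above is for the whole model and also reflects the Slim-neck changes, so it is not expected to match a per-layer estimate like this one.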
To address low recognition accuracy in silent speech reconstruction from facial electromyographic signals, this paper combines a residual network (ResNet) with the Transformer model to design a Res Transformer model. The model consists of three connected ResNet structures and several Transformer modules. The ResNet structures extract features, and the Transformer structure converts the electromyographic signal into an 80-band Mel spectrum, which is sent to the HiFiGAN network for speech reconstruction, ultimately yielding an audible speech signal from silent articulation. In addition, the work integrates acoustic speech signals, extracting the Mel spectrum of the synchronized acoustic speech as features so that the time-frequency features of both the electromyographic and speech signals are fully used. Experimental results show that the proposed Res-Conformer model achieves a phoneme recognition rate of 91.86% and a word error rate of 25.6%. Compared with structures such as Transformer and LSTM, the Res-Conformer model effectively improves recognition accuracy.
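The abstract does not say how the word error rate is computed. A minimal sketch under the usual definition, (substitutions + deletions + insertions) divided by the number of reference words and obtained from a word-level edit distance, might look like the following. The function name and the example sentences are hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word error rate under the conventional edit-distance definition.

    WER = (substitutions + deletions + insertions) / number of reference words.
    Words are split on whitespace; no text normalization is applied.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1        # reference word missing from hypothesis
            insertion = curr[j - 1] + 1   # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Hypothetical transcripts: one substitution and one deletion over six words.
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

Because insertions count as errors, this measure can exceed 1.0 when the hypothesis is much longer than the reference.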