Lightweight network based on the high-resolution structure for acoustic scene classification

Haiyue Zhang; Wenkai Liu; Xichang Cai; Menglong Wu

doi:10.1117/12.3038120

7 August 2024

Proceedings Volume 13229, Seventh International Conference on Advanced Electronic Materials, Computers, and Software Engineering (AEMCSE 2024); 132292T (2024) https://doi.org/10.1117/12.3038120
Event: Seventh International Conference on Advanced Electronic Materials, Computers, and Software Engineering (AEMCSE 2024), 2024, Nanchang, China

Abstract

Acoustic scene classification is the task of assigning a scene label to an audio recording based on the environment in which it was recorded. Although deep learning models perform well on this task, they usually rely on hardware platforms with high computing power, which limits their adoption in practical applications. To address this issue, we propose a lightweight High-Resolution network (Lite-HRNet) with only 76.224K trainable parameters. The structure builds on Lite-Net, a lightweight model constructed from inverted residual modules. On top of Lite-Net, we introduce the High-Resolution (HR) structure, which maintains high resolution along the frequency axis and fuses high-resolution and low-resolution features in parallel while keeping complexity low. In addition, a coordinate attention (CA) mechanism is introduced to direct the network's focus toward critical information. Experimental results show that Lite-HRNet improves the classification accuracy of various scenes on the TAU Urban Acoustic Scenes 2022 Mobile Development dataset, achieving an average accuracy of 52.4%, which is 9.5 percentage points higher than the DCASE baseline system.
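To give a feel for why inverted residual modules suit a parameter budget in the tens of thousands, the following is a minimal sketch that counts convolution weights for one such module (1x1 expansion, 3x3 depthwise convolution, 1x1 projection) and compares it with a dense 3x3 convolution at the same expanded width. The channel width, expansion factor, and helper names are hypothetical choices for illustration; the abstract does not state the configuration used in Lite-Net or Lite-HRNet, and batch normalization and bias terms are ignored.

```python
def conv_params(c_in, c_out, k, groups=1):
    """Number of weights in a k x k 2-D convolution without bias."""
    return (c_in // groups) * c_out * k * k


def inverted_residual_params(c_in, c_out, expand, k=3):
    """Weights in a 1x1 expand -> k x k depthwise -> 1x1 project module."""
    hidden = c_in * expand
    expand_pw = conv_params(c_in, hidden, 1)                  # pointwise expansion
    depthwise = conv_params(hidden, hidden, k, groups=hidden)  # one filter per channel
    project_pw = conv_params(hidden, c_out, 1)                # pointwise projection
    return expand_pw + depthwise + project_pw


# Hypothetical sizes, NOT taken from the paper.
c_in, c_out, expand = 32, 32, 6
hidden = c_in * expand

block = inverted_residual_params(c_in, c_out, expand)
dense_at_hidden_width = conv_params(hidden, hidden, 3)

print(f"inverted residual module: {block} weights")
print(f"dense 3x3 conv at width {hidden}: {dense_at_hidden_width} weights")
```

Under these assumed sizes, the spatial filtering is done by the depthwise layer, which needs only 9 weights per channel, so the module can work at a wide hidden dimension for a small fraction of what a dense 3x3 convolution at that width would cost.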

(2024) Published by SPIE. Downloading of the abstract is permitted for personal use only.

Citation

Haiyue Zhang, Wenkai Liu, Xichang Cai, and Menglong Wu "Lightweight network based on the high-resolution structure for acoustic scene classification", Proc. SPIE 13229, Seventh International Conference on Advanced Electronic Materials, Computers, and Software Engineering (AEMCSE 2024), 132292T (7 August 2024); https://doi.org/10.1117/12.3038120
