A recognition method based on an enhanced Transformer model is proposed for recognizing abnormal human actions in surveillance videos. Video Swin Transformer (VST) extracts the video features, and a 3D Adaptive Spatial Pyramid Pooling (3DASPP) module enhances them. An object detection algorithm detects human bodies in the key frame of the video, and the video features corresponding to each detected person are extracted. Finally, the human action category in the video sequence is identified and judged as normal or abnormal. The mean Average Precision (mAP) is used as the evaluation metric. Experimental results show that the proposed algorithm can effectively recognize abnormal human actions in videos, providing technical support for intelligent surveillance, intelligent security, and related fields.
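For readers unfamiliar with the evaluation metric, the following is a minimal sketch of how mAP can be computed from per-class confidence scores and ground-truth labels. The abstract does not state which mAP variant is used, so this sketch assumes the plain non-interpolated form (the mean of precision values at each correctly ranked positive), not necessarily the authors' protocol. The function names and the class names `fall` and `fight` are hypothetical and chosen only for illustration.

```python
def average_precision(scores, labels):
    """Non-interpolated average precision for a single action class.

    scores: confidence that each sample belongs to the class.
    labels: 1 if the sample truly belongs to the class, else 0.
    Returns None when the class has no positive samples.
    """
    # Rank samples from most to least confident.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    num_positives = sum(1 for _, label in ranked if label)
    if num_positives == 0:
        return None

    hits = 0
    precision_sum = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label:
            hits += 1
            # Precision measured at the rank of each true positive.
            precision_sum += hits / rank
    return precision_sum / num_positives


def mean_average_precision(per_class):
    """Mean of the per-class APs, skipping classes without positives.

    per_class: dict mapping class name -> (scores, labels).
    """
    aps = [average_precision(s, l) for s, l in per_class.values()]
    aps = [ap for ap in aps if ap is not None]
    return sum(aps) / len(aps) if aps else 0.0


# Hypothetical predictions for two illustrative action classes.
predictions = {
    "fall": ([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 0]),
    "fight": ([0.9, 0.1], [1, 0]),
}
print(mean_average_precision(predictions))
```

Averaging per class rather than per sample keeps rare abnormal actions from being swamped by frequent normal ones, which is one reason mAP is a common choice for this kind of task.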