Understanding classroom activities can help parents and education experts analyze teaching quality. However, employing staff to supervise classroom events is labor-intensive, so deploying surveillance video systems is considered a practical solution. Converting the captured videos into textual descriptions can further reduce data transmission and storage costs. In this paper, we propose a new task named Classroom Video Captioning (CVC), which aims to describe the events in classroom videos with natural language. We collect classroom videos and annotate them with sentences. To tackle the task, we employ an effective architecture called the rethinking network to encode visual features and generate descriptions. Extensive experiments on our dataset demonstrate that our method describes the events in classroom videos satisfactorily.
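The pipeline sketched in the abstract, encoding per-frame visual features and then generating a word sequence, follows the standard encoder-decoder recipe for video captioning. The toy NumPy sketch below illustrates that recipe only; it is not the paper's rethinking network, and the vocabulary, dimensions, and weights are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy vocabulary and feature size (not from the paper).
VOCAB = ["<bos>", "<eos>", "the", "teacher", "writes", "students", "listen"]
D = 8  # dimensionality of per-frame visual features

def encode(frames):
    """Mean-pool per-frame visual features into a single clip vector."""
    return frames.mean(axis=0)

def decode(clip_vec, W_h, W_o, max_len=6):
    """Greedy decoding with a toy recurrent update (illustrative only)."""
    h = np.tanh(clip_vec)
    tokens = []
    for _ in range(max_len):
        logits = W_o @ h                  # project hidden state to vocab scores
        tok = VOCAB[int(np.argmax(logits))]
        if tok == "<eos>":
            break
        tokens.append(tok)
        h = np.tanh(W_h @ h)              # advance the decoder state
    return " ".join(tokens)

# 16 frames of D-dim features, standing in for CNN outputs.
frames = rng.standard_normal((16, D))
W_h = rng.standard_normal((D, D)) * 0.5
W_o = rng.standard_normal((len(VOCAB), D)) * 0.5
caption = decode(encode(frames), W_h, W_o)
print(caption)
```

With random weights the output is meaningless; in a trained system the encoder would be a pretrained CNN and the decoder a learned language model conditioned on the clip representation.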
Mingjian Zhu, Chenrui Duan, Changbin Yu, "Rethinking network for classroom video captioning," Proc. SPIE 11719, Twelfth International Conference on Signal Processing Systems, 117190L (20 January 2021); https://doi.org/10.1117/12.2589435