28 December 2022 Driver abnormal behavior detection based on dual-channel attention mechanism
Zhiwen Feng, Jun Zhao
Proceedings Volume 12506, Third International Conference on Computer Science and Communication Technology (ICCSCT 2022); 125066H (2022) https://doi.org/10.1117/12.2662537
Event: International Conference on Computer Science and Communication Technology (ICCSCT 2022), 2022, Beijing, China
Abstract
To detect abnormal driving behavior effectively and reduce the incidence of traffic accidents, a YOLOv3-based abnormal behavior detection algorithm with a dual-channel attention mechanism is proposed. First, the K-means clustering algorithm is used to re-cluster the anchors, yielding nine prior boxes suited to abnormal behavior. Then, a dual-channel attention module is introduced into the YOLOv3 feature extraction network to increase the weight of important information in each channel of the feature map. Finally, the loss function is improved: GIoU loss is used as the bounding box loss to achieve better bounding box regression and improve the accuracy of the model. Experimental results show that the mAP of the improved YOLOv3 algorithm for driver abnormal behavior detection reaches 95.91%, higher than that of the original YOLOv3 model (90.22%) and of the other target detection models compared.

1. INTRODUCTION

With the development of society and accelerating urbanization, especially over the past 20 years, the rail and road transportation industries have grown rapidly, and the number of vehicles in China has risen sharply. Although this makes travel more convenient, it has also contributed to a rising rate of traffic accidents, resulting in huge economic losses and casualties. According to statistics, more than 90% of traffic accidents are caused by human drivers, and 30% of them are caused by drivers’ irregular behaviors [1-3]. Therefore, detecting the driver’s behavior, analyzing abnormal behavior, and giving early warning will greatly increase the safety of vehicle operation.

In the 1960s, scholars began to explore the mechanism of abnormal behavior from the perspective of the behavioral environment [4]. Research on abnormal driving behavior initially relied on manual review, which consumes a lot of manpower and is inefficient. Some scholars judge whether the driver is driving normally by monitoring changes in vehicle parameters [5-8]. However, this approach is expensive and difficult to deploy on a large scale, and because driving habits differ between drivers, the modeling analysis is prone to deviation. Other researchers analyze changes in the driver’s psychological state during driving to judge whether the driver is driving normally [9-12]. However, the professional equipment worn by drivers affects their driving status and psychological state, and the cost is high. In recent years, driver behavior research based on computer vision has become a hot topic [13]. Berri and Silva use support vector machines to classify features [14]; Craye and Karray proposed an AdaBoost classifier and a hidden Markov model [15]. Because traditional detection techniques have poor robustness and highly redundant detection windows, their accuracy and speed are limited. With the emergence of convolutional neural networks and their strong learning ability, research on abnormal behavior detection has gradually shifted to convolutional neural networks.

Target detection algorithms can generally be divided into two types: one-stage and two-stage. A two-stage algorithm, such as R-CNN or Faster R-CNN, first generates candidate regions and then classifies them. A one-stage algorithm, such as SSD or YOLOv3, divides the image into grid cells, assigns M anchor boxes to each cell, and sends all anchors to the classifier to output the classes and detection positions directly. Because it skips the candidate-region step, a one-stage detector such as YOLOv3 is significantly faster than a two-stage detector. Based on this, YOLOv3 is used as the base network in this paper: an attention mechanism is added to the feature layers to improve the model’s ability to extract key features, and the loss function is improved to achieve better bounding box regression for detecting abnormal driver behavior.

2. YOLOV3 AND ATTENTION MECHANISM

2.1 YOLOv3

YOLOv1 was proposed in 2015, marking the beginning of the YOLO series of algorithms. Building on it, the YOLOv3 algorithm was proposed in 2018; it uses Darknet-53 as the backbone network and introduces an FPN-style feature pyramid architecture to achieve multi-scale prediction. Figure 1 shows the YOLOv3 network structure.

Figure 1. YOLOv3 network structure.

2.1.1 Darknet-53 Network Structure.

As the feature extraction network of YOLOv3, Darknet-53 incorporates many ResNet-style residual structures. Its network structure is shown in Figure 2. It is a fully convolutional network with 53 convolutional layers and no pooling layers; downsampling is performed by convolutions with a stride of 2. The network is built from 3×3 and 1×1 convolution units, each followed by batch normalization and an activation function, which helps prevent overfitting.

Figure 2. Darknet-53 network structure.

2.1.2 Multi-scale Feature Prediction.

To address the low detection accuracy for small targets in earlier YOLO models, YOLOv3 draws on the idea of FPN and performs multi-scale feature prediction, generating three feature maps of different sizes. The network features are shown in the figure. The input image is downsampled by factors of 8, 16, and 32, and feature maps of different sizes are fused to enhance the shallow features. For example, the 13×13 feature map is upsampled to 26×26 and fused with the 26×26 feature map, which strengthens the features and improves the detection accuracy of the YOLOv3 model.
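As a quick check of the three scales described above, the sketch below computes the feature map sizes produced by 8×, 16×, and 32× downsampling. The 416×416 input resolution is an assumption (it is the value that yields the 52×52, 26×26, and 13×13 maps mentioned in this paper), and the function name is illustrative.

```python
def feature_map_sizes(input_size=416, strides=(8, 16, 32)):
    """Return the side length of the feature map at each downsampling stride."""
    return [input_size // s for s in strides]


sizes = feature_map_sizes()  # assumed 416x416 input
print(sizes)

# Upsampling the coarsest map by 2 gives the size of the next finer map,
# which is what allows the two maps to be fused.
coarse_upsampled = sizes[2] * 2
print(coarse_upsampled == sizes[1])
```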

2.1.3 Loss Function.

The loss function of the YOLOv3 model consists of the coordinate prediction loss, the confidence prediction loss, and the category prediction loss. For the confidence and category losses, the sum of squared errors used in YOLOv1 and YOLOv2 is replaced by cross entropy, which works better for category and confidence prediction. The loss function formula is:

00219_PSISDG12506_125066H_page_3_1.jpg

where (x, y, w, h) are the center coordinates, width, and height of the predicted box relative to the network, I denotes the anchor box in the grid cell, C is the true confidence value of the object, and classes denotes the object categories.
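The switch from squared error to cross entropy for the confidence and category terms can be illustrated with a small sketch. It is not the paper's full loss (the formula image is not reproduced in this text); the function names and the example probabilities are assumptions chosen for illustration.

```python
import math


def squared_error(p, y):
    """Squared error between a predicted probability p and a 0/1 label y."""
    return (p - y) ** 2


def binary_cross_entropy(p, y, eps=1e-7):
    """Binary cross entropy; p is clipped away from 0 and 1 to avoid log(0)."""
    p = min(max(p, eps), 1.0 - eps)
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))


# Compare both losses for a positive label (y = 1) at three predicted values.
for p in (0.9, 0.5, 0.01):
    print(p, round(squared_error(p, 1), 4), round(binary_cross_entropy(p, 1), 4))
```

For a confident wrong prediction, squared error stays below 1, whereas cross entropy keeps growing as the prediction approaches the wrong extreme, so such errors are penalized more strongly.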

3. IMPROVED YOLOV3 ALGORITHM

3.1 Add attention mechanism module

In a real driving environment, the external environment, the driver’s state, and the driver’s habits all interfere with the detection of abnormal behavior. Obtaining more, and more accurate, feature information from the scene improves the detection accuracy of the YOLOv3 algorithm.

To extract features, the multi-scale prediction layer of YOLOv3 adopts a structure similar to the Feature Pyramid Network (FPN) [16], which fuses the feature information extracted at three scales and then performs object detection. This paper introduces the dual-channel attention mechanism CBAM module, which applies attention along both the channel and spatial dimensions. It is added to each branch of the multi-scale prediction layer of the feature extraction network to suppress useless information and extract useful information more efficiently, improving the detection ability of the algorithm. Channel attention filters the channels of the feature F, while spatial attention emphasizes the salient regions of the feature map. The specific implementation process is shown in Figure 3. The input is the original feature F with dimension w×h×m, and the output is the attention feature F″. The calculation steps are as follows:

  • (1) The channel information is adjusted. The input feature F of dimension w×h×m enters the channel attention network, where global maximum pooling and global average pooling are applied separately to generate two tensors of dimension 1×1×m. The two tensors are fused into a single 1×1×m tensor. After activation by the Sigmoid function, it is multiplied element-wise with the original input feature F to obtain the intermediate feature F′.

  • (2) Spatial and channel attention information is fused. After entering the spatial attention network, the intermediate feature F′ is subjected to channel maximum pooling and channel average pooling, respectively, to form two matrices of dimension w×h. These two matrices are concatenated and fused into a single w×h tensor. After activation by the Sigmoid function, it is multiplied element-wise with the intermediate feature F′ to form the final attention feature F″ for the target detection area.
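The two steps above can be sketched in plain Python as follows. This is a parameter-free simplification, not the module used in the paper: the learned layers of the published CBAM (a shared MLP in the channel branch and a convolution in the spatial branch) are replaced here by a simple sum of the two pooled descriptors, and all function names are illustrative. The feature map is a nested list indexed as F[row][column][channel].

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def channel_attention(F):
    """Step (1): gate each channel by sigmoid(global max pool + global avg pool)."""
    h, w, m = len(F), len(F[0]), len(F[0][0])
    weights = []
    for c in range(m):
        values = [F[i][j][c] for i in range(h) for j in range(w)]
        fused = max(values) + sum(values) / len(values)  # assumed fusion: plain sum
        weights.append(sigmoid(fused))
    return [[[F[i][j][c] * weights[c] for c in range(m)]
             for j in range(w)] for i in range(h)]


def spatial_attention(F):
    """Step (2): gate each position by sigmoid(channel max pool + channel avg pool)."""
    out = []
    for row in F:
        out_row = []
        for pixel in row:
            gate = sigmoid(max(pixel) + sum(pixel) / len(pixel))
            out_row.append([v * gate for v in pixel])
        out.append(out_row)
    return out


def attention_block(F):
    F1 = channel_attention(F)     # intermediate feature F'
    return spatial_attention(F1)  # final attention feature F''


# Toy feature map with h = 2, w = 2, m = 2 (values are arbitrary).
F = [[[1.0, -2.0], [0.5, 3.0]],
     [[0.0, 1.0], [2.0, -1.0]]]
out = attention_block(F)
print(len(out), len(out[0]), len(out[0][0]))
```

Both gates lie strictly between 0 and 1, so the block rescales the input feature without changing its shape, which is why it can be inserted into each prediction branch without altering the surrounding layers.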

Figure 3. Attention mechanism module.

3.2 Improved loss function

Intersection over Union (IoU) is a commonly used metric for evaluating the performance of target detection networks. Its value increases with the degree of overlap between the predicted box and the ground-truth box. The calculation formula is as follows:

00219_PSISDG12506_125066H_page_4_2.jpg

where A represents the ground-truth box and B represents the predicted box.
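The formula image is not reproduced in this text. The standard definition of IoU, written with the symbols A and B defined above, is:

```latex
\mathrm{IoU} = \frac{|A \cap B|}{|A \cup B|}
```

Here |·| denotes the area of a region, so IoU ranges from 0 (no overlap) to 1 (identical boxes).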

When IoU is used as the loss function and the ground-truth box and the predicted box do not overlap, IoU is 0 regardless of how far apart the boxes are. The distance between the two boxes therefore cannot be measured, and the gradient of the loss is 0, so the loss cannot be optimized. To address this problem, the GIoU loss function is used instead of IoU to measure the error.
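The zero-overlap problem can be seen in a small sketch. It implements the standard IoU and GIoU definitions for axis-aligned boxes given as (x1, y1, x2, y2); the box format, function names, and example coordinates are assumptions made for illustration, and degenerate zero-area boxes are not handled.

```python
def area(box):
    """Area of an axis-aligned box (x1, y1, x2, y2); empty boxes have area 0."""
    x1, y1, x2, y2 = box
    return max(0.0, x2 - x1) * max(0.0, y2 - y1)


def iou_and_giou(a, b):
    """Return (IoU, GIoU) for two boxes with positive area."""
    inter = area((max(a[0], b[0]), max(a[1], b[1]),
                  min(a[2], b[2]), min(a[3], b[3])))
    union = area(a) + area(b) - inter
    # Smallest axis-aligned rectangle enclosing both boxes.
    enclosing = area((min(a[0], b[0]), min(a[1], b[1]),
                      max(a[2], b[2]), max(a[3], b[3])))
    iou = inter / union
    giou = iou - (enclosing - union) / enclosing
    return iou, giou


truth = (0.0, 0.0, 2.0, 2.0)
near = (3.0, 0.0, 5.0, 2.0)   # disjoint from truth, close to it
far = (8.0, 0.0, 10.0, 2.0)   # disjoint from truth, farther away

print(iou_and_giou(truth, near))
print(iou_and_giou(truth, far))
```

IoU is 0 for both disjoint boxes, so it cannot tell which prediction is closer, whereas GIoU is less negative for the nearer box and therefore still provides a useful training signal.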

Using GIoU as the loss function overcomes IoU’s inability to measure the distance between non-overlapping ground-truth and predicted boxes, achieves better bounding box regression, and steers network optimization in a better direction. Its calculation formula is as follows:

00219_PSISDG12506_125066H_page_4_3.jpg

where C represents the smallest rectangle that encloses both the ground-truth box and the predicted box.
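The formula image is not reproduced in this text. The standard GIoU definition, consistent with the symbols A, B, and C used above, is given below; writing the loss as 1 − GIoU is the usual choice and is an assumption about the exact form used in this paper.

```latex
\mathrm{GIoU} = \mathrm{IoU} - \frac{|C \setminus (A \cup B)|}{|C|},
\qquad
L_{\mathrm{GIoU}} = 1 - \mathrm{GIoU}
```

GIoU ranges from −1 to 1: it equals IoU when the enclosing rectangle C is completely covered by A ∪ B, and it becomes more negative as disjoint boxes move farther apart.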

The IoU is replaced by the GIoU loss function, and the new loss function of the improved YOLOv3 model is obtained as follows:

00219_PSISDG12506_125066H_page_4_4.jpg

3.3 Improved prior box

An anchor is one of a set of reference boxes with different, fixed sizes. In target detection, the prior box is usually used as the initial prediction and is then gradually adjusted by regression. Its calculation formula is as follows:

00219_PSISDG12506_125066H_page_5_1.jpg
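The formula image is not reproduced in this text. In the original YOLOv3 formulation, which this passage most likely refers to (an assumption), a prior box is adjusted into a predicted box as follows:

```latex
b_x = \sigma(t_x) + c_x, \qquad
b_y = \sigma(t_y) + c_y, \qquad
b_w = p_w \, e^{t_w}, \qquad
b_h = p_h \, e^{t_h}
```

Here (t_x, t_y, t_w, t_h) are the raw network outputs, (c_x, c_y) is the top-left corner of the grid cell, (p_w, p_h) are the width and height of the prior box, σ is the Sigmoid function, and (b_x, b_y, b_w, b_h) is the resulting predicted box.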

Using anchors suited to abnormal behavior detection not only speeds up the convergence of the model but also further improves detection accuracy. The nine anchors used by the original YOLOv3 model were obtained by applying the K-means clustering algorithm to the COCO dataset.

This paper studies abnormal driving behavior, whose target sizes differ from those in the COCO dataset. The nine prior boxes obtained by K-means clustering on the COCO dataset are therefore unsuitable and can even impair the model’s localization and identification of abnormal behavior. For the abnormal behavior dataset, this paper re-clusters with the K-means algorithm to obtain nine anchors of different sizes suited to abnormal behavior detection, as shown in Table 1:

Table 1. Anchor prior boxes.

Feature map | Receptive field | Number | Anchor
52×52 | Small | 0, 1, 2 | 104×149, 143×115, 154×198
26×26 | Middle | 3, 4, 5 | 60×83, 83×113, 97×79
13×13 | Big | 6, 7, 8 | 22×31, 29×37, 52×54
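The re-clustering step can be sketched as follows. The paper states only that K-means is used; the distance d = 1 − IoU between box sizes is the choice commonly used for YOLO anchors and is an assumption here, as are the function names and the synthetic box sizes, which stand in for the labelled boxes and are not data from the paper.

```python
import random


def iou_wh(box, centroid):
    """IoU of two boxes given as (w, h) and aligned at the same corner."""
    inter = min(box[0], centroid[0]) * min(box[1], centroid[1])
    return inter / (box[0] * box[1] + centroid[0] * centroid[1] - inter)


def kmeans_anchors(boxes, k, iterations=100, seed=0):
    """Cluster (w, h) pairs with distance 1 - IoU; return k anchors sorted by area."""
    rng = random.Random(seed)
    centroids = rng.sample(boxes, k)
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for box in boxes:
            # The highest IoU corresponds to the smallest 1 - IoU distance.
            best = max(range(k), key=lambda i: iou_wh(box, centroids[i]))
            clusters[best].append(box)
        new_centroids = []
        for i, members in enumerate(clusters):
            if members:
                new_centroids.append((sum(w for w, _ in members) / len(members),
                                      sum(h for _, h in members) / len(members)))
            else:
                new_centroids.append(centroids[i])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return sorted(centroids, key=lambda wh: wh[0] * wh[1])


# Synthetic (w, h) sizes in pixels; placeholders, not measurements from the paper.
rng = random.Random(1)
boxes = [(rng.uniform(20, 160), rng.uniform(30, 200)) for _ in range(300)]
anchors = kmeans_anchors(boxes, k=9)
for w, h in anchors:
    print(round(w), round(h))
```

Sorting the nine anchors by area makes it straightforward to split them into three groups of three, one group per prediction scale.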

4. EXPERIMENT AND ANALYSIS OF DRIVER ABNORMAL BEHAVIOR DETECTION

The experiments were run on the Windows 10 operating system with CUDA 10.1 and cuDNN 7.6.5 installed for GPU-accelerated training. The algorithm was implemented in Python 3.8.12 with TensorFlow 2.3.

4.1 Dataset

The research object of this experiment is the driver. Under traffic regulations on driving behavior, smoking, answering the phone, drinking water, and eating while driving are all abnormal driver behaviors. Because no public dataset of abnormal driver behavior is available in China, the dataset for this experiment was constructed by the authors. The experimental data come from three main sources: self-photographed data, pictures collected from the Internet, and images from existing datasets. Some examples are shown in Figure 4.

Figure 4. Abnormal behavior dataset.

The dataset is built in the VOC2007 dataset format. Images are first labelled with LabelImg and classified as driverphone (calling), driversmoke (smoking), drivercup (drinking water), and drivereat (eating), and are then divided into a training set and a test set. Table 2 shows the number of images for each type of abnormal behavior.

Table 2. The number of different types of abnormal behavior.

Classification | Make and receive calls | Smoke | Drink water | Eat
Quantity | 1612 | 1062 | 1156 | 1085
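As an illustration of the VOC-style annotations mentioned above, the sketch below parses one annotation file with Python's standard library and counts the objects per class. The four class names come from this paper; the sample XML, its file name, and its coordinates are invented placeholders, and the function name is illustrative.

```python
import xml.etree.ElementTree as ET
from collections import Counter

CLASSES = ("driverphone", "driversmoke", "drivercup", "drivereat")

# Hypothetical annotation in the PASCAL VOC layout written by LabelImg.
SAMPLE_XML = """<annotation>
  <filename>example.jpg</filename>
  <size><width>640</width><height>480</height><depth>3</depth></size>
  <object>
    <name>driverphone</name>
    <bndbox><xmin>210</xmin><ymin>95</ymin><xmax>330</xmax><ymax>260</ymax></bndbox>
  </object>
</annotation>"""


def parse_voc(xml_text):
    """Return a list of (class_name, xmin, ymin, xmax, ymax) from one annotation."""
    root = ET.fromstring(xml_text)
    boxes = []
    for obj in root.findall("object"):
        name = obj.findtext("name")
        bb = obj.find("bndbox")
        coords = tuple(int(bb.findtext(tag))
                       for tag in ("xmin", "ymin", "xmax", "ymax"))
        boxes.append((name,) + coords)
    return boxes


boxes = parse_voc(SAMPLE_XML)
counts = Counter(name for name, *_ in boxes if name in CLASSES)
print(boxes)
print(dict(counts))
```

Running the same function over every annotation file and summing the counters would give per-class totals of the kind reported in Table 2.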

4.2 Model abnormal behavior detection and result analysis

4.2.1 Evaluation Indicators.

In this paper, the detection success rate and mAP are selected as evaluation indicators of the abnormal behavior detection model. The detection success rate is the proportion of images of a given abnormal behavior that are correctly identified among all images of that type in the dataset. The commonly used quantities are as follows: TP (True Positives) indicates that an abnormality is detected and the picture is also labelled as abnormal; FP (False Positives) indicates that an abnormality is detected but the picture is not labelled as abnormal; TN (True Negatives) indicates that no abnormality is detected and the picture is not labelled as abnormal; FN (False Negatives) indicates that no abnormal behavior is detected although abnormal behavior is present in the picture. Precision (P) is the proportion of images detected as abnormal that are indeed abnormal [17,18], and recall (R) is the proportion of images labelled as abnormal that are detected as abnormal. Average Precision (AP) combines recall (R) and precision (P), while mAP is the mean of the average precision (AP) values and serves as the direct evaluation indicator; m represents the number of samples in the test set. The calculation formulas are as follows:

00219_PSISDG12506_125066H_page_6_1.jpg
00219_PSISDG12506_125066H_page_6_2.jpg
00219_PSISDG12506_125066H_page_6_3.jpg
00219_PSISDG12506_125066H_page_6_4.jpg
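The four formula images are not reproduced in this text. The standard definitions of these indicators, written with the symbols introduced above, are:

```latex
P = \frac{TP}{TP + FP}, \qquad
R = \frac{TP}{TP + FN}, \qquad
AP = \int_0^1 P(R)\,\mathrm{d}R, \qquad
mAP = \frac{1}{m}\sum_{i=1}^{m} AP_i
```

The symbol m follows the text above. Note that in common usage the average in mAP runs over the object categories (four in this dataset), so reading m as the number of categories is the conventional interpretation; the exact form in the original formula images is an assumption.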

4.2.2 Optimizing Model Fusion Experiments.

To verify the individual effects of the CBAM attention module and the improved loss function, and to demonstrate the advantage of the final model, this paper designs a model fusion experiment; the results are shown in Table 3. Model 1 introduces only the CBAM attention mechanism, Model 2 improves only the loss function, and the improved YOLOv3 model is the final model of this paper. The results show that introducing only the attention mechanism gives an mAP of 92.58%, an increase of 2.36 percentage points over the YOLOv3 model; improving only the loss function gives an mAP of 91.25%, an increase of only 1.03 percentage points; and applying both improvements, as in the model of this paper, gives an mAP of 95.91%, an increase of 5.69 percentage points over the YOLOv3 model. Compared with Model 1 and Model 2, the gains are 3.33 and 4.66 percentage points, respectively. The model in this paper therefore outperforms both the original YOLOv3 and each single-improvement model.

Table 3. Model fusion experiment.

Model | Attention mechanism | Improved loss function | mAP/%
YOLOv3 | No | No | 90.22
Model 1 | Yes | No | 92.58
Model 2 | No | Yes | 91.25
Improved YOLOv3 (Our model) | Yes | Yes | 95.91
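The gains discussed for this experiment can be recomputed directly from the mAP column of Table 3. The short sketch below does this arithmetic; the dictionary and function names are illustrative, and the differences are expressed in percentage points.

```python
# mAP values (%) copied from Table 3.
map_scores = {
    "YOLOv3": 90.22,
    "Model 1": 92.58,
    "Model 2": 91.25,
    "Improved YOLOv3": 95.91,
}


def gain(model, baseline):
    """Difference in mAP, in percentage points, rounded to two decimals."""
    return round(map_scores[model] - map_scores[baseline], 2)


for name in ("Model 1", "Model 2", "Improved YOLOv3"):
    print(name, "vs YOLOv3:", gain(name, "YOLOv3"))
print("Improved YOLOv3 vs Model 1:", gain("Improved YOLOv3", "Model 1"))
print("Improved YOLOv3 vs Model 2:", gain("Improved YOLOv3", "Model 2"))
```

The combined gain over YOLOv3 is larger than the sum of the two single-improvement gains, which suggests that the attention module and the GIoU loss complement each other on this dataset.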

4.2.3 Object Detection Comparison Experiment.

The improved YOLOv3 model in this paper and the Faster R-CNN, SSD, RetinaNet, and YOLOv3 models are trained and tested on the same dataset and compared with each other. The detection success rates and mAP values are shown in Table 4. The mAP values of Faster R-CNN, SSD, and RetinaNet are all below 85%, and their detection success rates range from 70% to 87%, whereas the YOLOv3 model reaches a detection success rate of up to 91% and an overall mAP of 90.22%, which shows the good performance of YOLOv3 in detecting abnormal driver behavior. This paper therefore selects YOLOv3 as the base model and improves it. Compared with YOLOv3, the improved YOLOv3 model raises the detection success rates for drinking water, answering calls, eating, and smoking by 6, 7, 11, and 8 percentage points, respectively. The success rates for drinking water and smoking reach 97%, and the mAP increases by 5.69 percentage points, indicating that the model in this paper detects abnormal behavior more accurately than the models compared.

Table 4. Model detection success rate and mAP comparison experiment.

Model | Drink | Make and receive calls | Eat | Smoke | mAP/%
Faster R-CNN | 85% | 70% | 82% | 87% | 81.69
SSD | 80% | 85% | 79% | 85% | 80.54
RetinaNet | 87% | 82% | 87% | 83% | 84.55
YOLOv3 | 91% | 88% | 80% | 89% | 90.22
Improved YOLOv3 (Our model) | 97% | 95% | 91% | 97% | 95.91

4.2.4 Comparison of Detection Effects.

To verify the effect of the improvements, both the original YOLOv3 and the improved YOLOv3 are evaluated on the test set. One test image is randomly selected for each behavior type and the results are compared; the detection results are shown in Figures 5 and 6. The detection confidences of the improved YOLOv3 are 0.96, 0.96, 0.98, and 0.90, respectively, higher than those of the original YOLOv3 model. Overall, as the model is improved, its detection confidence also increases, which verifies the improved detection performance of the model.

Figure 5. Abnormal behavior detection results with YOLOv3.

Figure 6. Abnormal behavior detection results with improved YOLOv3.

5. CONCLUSION

This paper proposes a driver abnormal behavior detection method based on an improved YOLOv3 algorithm. An abnormal behavior dataset was collected, the Darknet-53 framework was used to train the recognition model, an attention mechanism was added, and the loss function was improved, which raises the recognition rate compared with the original method and improves the robustness of the algorithm. For the four abnormal behaviors of drinking water, answering calls, smoking, and eating, the mAP reaches 95.91%. The experimental results show that the algorithm is effective.

REFERENCES

[1] Deng, M. Y., Fu, R. and Chen, G. J., “Collection and analysis about background factors of road traffic accidents index,” Journal of Chongqing Jiaotong University, 31 (4), 852–856 (2012).

[2] Chen, G. H. and Zhen, H., “Statistical analysis of major road accidents in Guangdong Province and their countermeasures,” Chinese Safety Science Journal, 20 (10), 106–12 (2008).

[3] He, Y., et al., “A comparison of statistical survey methods of traffic accident data between China and the United States,” Journal of Transport Information and Safety, 36 (1), 1–9+27 (2008).

[4] Hua, B., Liang, X., Liu, S. and Sheng, J. C., “Renovated abnormal passenger crowd behavior detection system based on the speeding-up and squeezing in the public places,” Journal of Safety and Environment, 17 (3), 1043–1048 (2017).

[5] Hua, B., Liang, X. and Liu, S. J., “Renovated abnormal passenger crowd behavior detection system based on the speeding-up and squeezing in the public places,” Journal of Safety and Environment, 1043 (2017).

[6] Ji, Q., Zhu, Z. and Lan, P. J., “Real-time nonintrusive monitoring and prediction of driver fatigue,” IEEE Transactions on Vehicular Technology, 53 (4), 1052–1068 (2004). https://doi.org/10.1109/TVT.2004.830974

[7] Chen, D. D., “Research on Abnormal Driving Behavior Recognition Technology Based on Vehicle Dynamic Monitoring Data,” Master’s thesis, Beijing Jiaotong University (2012).

[8] Liang, Y. J. and Xiang, H. K., “Vehicle posture discrimination method based on FNN,” Journal of Liaoning Technical University, 37 (2), 416–421.

[9] Tian, H. C., Mo, L. F. and Yan, R. Q., “Driving behavior identification in electric vehicle based on information fusion of vehicle information,” Chinese Journal of Sensors and Actuators, 31 (3), 355–362 (2018).

[10] Awais, M., Badruddin, N. and Drieberg, M. C., “EEG brain connectivity analysis to detect driver drowsiness using coherence,” in Inter. Conf. on Frontiers of Information Technology, (2017).

[11] Sun, Y. and Yu, X. B. J., “An innovation nonintrusive driver assistance system for vital signal monitoring,” IEEE Journal of Biomedical and Health Informatics, 18, 1932–1939 (2014). https://doi.org/10.1109/JBHI.6221020

[12] Lee, B. G. and Chung, W. Y., “Driver alertness monitoring using fusion of facial features and biosignals,” IEEE Sensors Journal, 12 (7), 2416–2422 (2012). https://doi.org/10.1109/JSEN.2012.2190505

[13] Haouij, N. E., Poggi, J. M. and Ghozi, R. J., “Random forest-based approach for physiological functional variable selection for driver’s stress level classification,” Statistical Methods & Applications, 28, 157–185 (2018). https://doi.org/10.1007/s10260-018-0423-5

[14] Das, N., Ohn-Bar, E. and Trivedi, M. M. C., “On performance evaluation of driver hand detection algorithms: Challenges, dataset, and metrics,” in 2015 IEEE 18th Inter. Conf. on Intelligent Transportation Systems, (2015).

[15] Berri, R. A., Silva, A. G. and Parpinelli, R. S. C., “A pattern recognition system for detecting use of mobile phones while driving,” in 2014 Inter. Conf. on Computer Vision Theory and Applications, (2014).

[16] Ragab, A., Craye, C. and Kamel, M. S. C., “A visual-based driver distraction recognition and detection using random forest,” in Inter. Conf. on Image Analysis and Recognition, (2014).

[17] Lin, T. Y., Dollar, P. and Girshick, R. C., “Feature pyramid networks for object detection,” in IEEE Conf. on Computer Vision and Pattern Recognition, (2017).

[18] Li, Q. D., Wang, T. C., Cui, J. W. and Mu, B., “Detection on dangerous operation behavior of forklift based on deep learning algorithm,” Journal of Safety Science and Technology, 16 (5), 155–159 (2019).

[19] Zhang, J. Y., Zhou, Y. L. and Chen, J. W., “Research on pilot behavior detection algorithm based on deep learning,” China’s New Technologies and New Products, (4), 26–28 (2019).
© (2022) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Zhiwen Feng and Jun Zhao "Driver abnormal behavior detection based on dual-channel attention mechanism", Proc. SPIE 12506, Third International Conference on Computer Science and Communication Technology (ICCSCT 2022), 125066H (28 December 2022); https://doi.org/10.1117/12.2662537
KEYWORDS: Detection and tracking algorithms; Data modeling; Target detection; Performance modeling; Analytical research; Feature extraction; Water