1. INTRODUCTION
With the development of society and the acceleration of urbanization, especially over the past 20 years, rail and road transportation have grown rapidly, and the number of vehicles in China has risen sharply. Although this makes travel more convenient, it has also contributed to a rising rate of traffic accidents, causing heavy economic losses and casualties. According to statistics, more than 90% of traffic accidents are caused by human drivers, and 30% of them are caused by drivers’ irregular behaviors [1-3]. Detecting the driver’s behavior, analyzing abnormal behavior and giving early warning will therefore greatly increase the safety of vehicle operation. In the 1960s, scholars began to explore the mechanism of abnormal behavior from the perspective of behavioral environment [4]. Abnormal driving behavior was initially identified by manual review, which consumes a lot of manpower and is inefficient. Some scholars judge whether the driver is driving normally by monitoring changes in vehicle parameters [5-8]. However, this approach is expensive and difficult to deploy on a large scale, and because driving habits differ between drivers, the resulting models are prone to deviation. Other researchers analyze changes in the driver’s psychological state during driving to judge whether the driver is driving normally [9-12]. However, the professional equipment worn by drivers affects their driving status and psychological state, and the cost is high. In recent years, research on driver behavior based on computer vision has become a hot topic [13]. Berri and Silva used support vector machines to classify features [14]; Craye and Karray proposed an AdaBoost classifier combined with a hidden Markov model [15].
Traditional detection techniques are not robust and produce many redundant detection windows, which limits the accuracy and speed of target detection. With the emergence of convolutional neural networks and their strong learning ability, research on abnormal behavior detection has gradually shifted to them. Target detection algorithms are generally divided into two types: one-stage and two-stage. A two-stage algorithm, such as R-CNN or Faster R-CNN, first generates candidate target regions and then passes them to a classifier. A one-stage algorithm, such as SSD or YOLOv3, first divides the image into patches, assigns M anchor boxes to each patch, and sends all anchors to the classifier, which outputs the classes and detection positions. One-stage detectors such as YOLOv3 are significantly faster than two-stage detectors. Based on this, YOLOv3 is used as the initial network in this paper: an attention mechanism is added to the feature layers of the model to improve its ability to extract key feature information, and the loss function is improved to achieve better bounding box regression when detecting abnormal driver behavior.
2. YOLOV3 AND ATTENTION MECHANISM
2.1 YOLOv3
YOLOv1 was proposed in 2015 and marked the beginning of the YOLO series of algorithms. Building on it, the author proposed the YOLOv3 algorithm in 2018, which adopted Darknet-53 as the backbone network and introduced a feature pyramid (FPN) architecture to achieve multi-scale prediction. Figure 1 shows the YOLOv3 network structure.
2.1.1 Darknet-53 Network Structure. As the feature extraction network of YOLOv3, Darknet-53 incorporates many ResNet-style residual structures. Its network structure is shown in Figure 2. It is a fully convolutional neural network with 53 convolutional layers and no pooling layers.
It uses 3×3 and 1×1 convolution units, with stride-2 convolutions taking the place of pooling for downsampling, and each convolution is followed by batch normalization and an activation function to help prevent the network from overfitting.
2.1.2 Multi-scale Feature Prediction. To address the low detection accuracy for small targets in the earlier YOLO models, YOLOv3 draws on the idea of FPN, establishes multi-scale feature prediction, and generates 3 feature maps of different sizes, as shown in the figure. The input image is downsampled by factors of 8, 16 and 32, and feature maps of different sizes are fused to enhance the shallow features. For example, the 13×13 feature map is upsampled to 26×26 and fused with the 26×26 feature map, which strengthens the features and improves the detection accuracy of the YOLOv3 model.
2.1.3 Loss Function. The loss function of the YOLOv3 model consists of coordinate prediction loss, confidence prediction loss, and category prediction loss. Of these, the confidence loss and the category prediction loss are changed from the sum of squared errors used in YOLOv1 and YOLOv2 to cross entropy, which works better for category and confidence prediction. In the loss function, (x, y, w, h) denote the center coordinates, width and height of the prediction frame relative to the grid, I denotes the anchor frame in the grid, C denotes the real value of the object, and classes denotes the different object categories.
3. IMPROVED YOLOV3 ALGORITHM
3.1 Add attention mechanism module
In real driving, the external environment, the driver’s condition and the driver’s habits all interfere with the detection of abnormal behavior.
Obtaining richer and more accurate feature information from the scene improves the detection accuracy of the YOLOv3 algorithm. To extract features, the multi-scale prediction layer of YOLOv3 adopts a structure similar to the Feature Pyramid Network (FPN) [16], which fuses the feature information extracted at three scales and then performs object detection. This paper introduces the Convolutional Block Attention Module (CBAM), which applies attention along both the channel and the spatial dimensions, and adds it to each branch of the multi-scale prediction layer of the feature extraction network. The module suppresses useless information and extracts useful information more efficiently, which improves the detection ability of the algorithm. Channel attention reweights the channels of the feature F, while spatial attention emphasizes the salient regions of the feature map. The specific implementation process is shown in Figure 3. The input is the original feature F with dimension w×h×m, and the output is the attention-refined feature.
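
As an illustration, the standard CBAM formulation from the original CBAM paper (Woo et al., 2018) can be written as below. It is assumed, not confirmed, that this paper follows the same steps; the symbols M_c, M_s, F', F'' and the 7×7 kernel size come from that formulation rather than from the text above.

```latex
\begin{aligned}
M_c(F)  &= \sigma\big(\mathrm{MLP}(\mathrm{AvgPool}(F)) + \mathrm{MLP}(\mathrm{MaxPool}(F))\big), \\
F'      &= M_c(F) \otimes F, \\
M_s(F') &= \sigma\big(f^{7\times 7}\big([\mathrm{AvgPool}(F');\ \mathrm{MaxPool}(F')]\big)\big), \\
F''     &= M_s(F') \otimes F'.
\end{aligned}
```

Here σ is the sigmoid function, MLP is a two-layer perceptron shared by both pooled descriptors, and ⊗ is element-wise multiplication with broadcasting. In the channel step the pooling runs over the w×h spatial positions, giving a 1×1×m weight per channel. In the spatial step the pooling runs along the channel axis, giving two w×h×1 maps that are concatenated and passed through a 7×7 convolution f.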
3.2 Improved loss function
Intersection over Union (IoU) is a commonly used indicator for evaluating the performance of target detection networks. The value of IoU is positively correlated with the degree of overlap between the predicted frame and the real frame. It is calculated as

$$\mathrm{IoU} = \frac{|A \cap B|}{|A \cup B|}$$

where A represents the real frame and B represents the predicted frame. When IoU is used as the loss function and the real frame and the predicted frame do not overlap, IoU is 0 regardless of how far apart the two frames are. The distance between them cannot be measured, the gradient of the loss function is 0, and the network cannot be optimized. To address this problem, the GIoU loss function is used instead of IoU to determine the error. GIoU overcomes the inability of IoU to measure the distance between the real frame and the predicted frame, achieves better bounding box regression, and steers the optimization of the network in a better direction. It is calculated as

$$\mathrm{GIoU} = \mathrm{IoU} - \frac{|C \setminus (A \cup B)|}{|C|}$$

where C represents the smallest rectangle enclosing both the real frame and the predicted frame. Replacing IoU with the GIoU loss gives the new loss function of the improved YOLOv3 model.
3.3 Improved prior box
Anchors are a set of reference frames with different, fixed sizes. In target detection, the prior box is usually used as the initial prediction and is then gradually regressed and adjusted. Using anchors suited to abnormal behavior detection not only speeds up the convergence of the model but also further improves the accuracy of object detection. The 9 anchors used by the YOLOv3 model were obtained with the K-means clustering algorithm on the COCO dataset. The research content of this paper is the abnormal driving behavior of drivers.
The target sizes of the abnormal behaviors differ from those in the COCO dataset, so the nine prior boxes obtained by K-means clustering on COCO are inappropriate and can even harm the model’s localization and identification of abnormal behavior. Therefore, this paper re-clusters the abnormal behavior dataset with the K-means algorithm to obtain 9 anchors of different sizes suited to abnormal behavior detection, as shown in Table 1.
Table 1. Anchor prior box.
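
The overlap measures of Section 3.2 and the anchor re-clustering of Section 3.3 both rest on box overlap, so they can be sketched together. The following is a minimal illustration, not the implementation used in this paper: the function names (`iou`, `giou`, `wh_iou`, `kmeans_anchors`), the corner-coordinate box format, the use of IoU between width-height pairs as the clustering similarity (a common choice for YOLO anchors, assumed here), and the toy box sizes are all assumptions.

```python
import random


def _inter_union(a, b):
    """Intersection and union areas of boxes (x1, y1, x2, y2) with positive area."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter, union


def iou(a, b):
    inter, union = _inter_union(a, b)
    return inter / union


def giou(a, b):
    """GIoU = IoU - |C minus (A union B)| / |C|, with C the smallest enclosing box."""
    inter, union = _inter_union(a, b)
    c_area = (max(a[2], b[2]) - min(a[0], b[0])) * (max(a[3], b[3]) - min(a[1], b[1]))
    return inter / union - (c_area - union) / c_area


def wh_iou(wh, center):
    """IoU of two boxes that share a corner, described only by (width, height)."""
    inter = min(wh[0], center[0]) * min(wh[1], center[1])
    return inter / (wh[0] * wh[1] + center[0] * center[1] - inter)


def kmeans_anchors(boxes, k, iters=100, seed=0):
    """Cluster (width, height) pairs; each box joins the center with the highest IoU."""
    rng = random.Random(seed)
    centers = rng.sample(boxes, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for wh in boxes:
            best = max(range(k), key=lambda j: wh_iou(wh, centers[j]))
            clusters[best].append(wh)
        # Mean width and height per cluster; an empty cluster keeps its old center.
        new_centers = [
            (sum(w for w, _ in c) / len(c), sum(h for _, h in c) / len(c)) if c else centers[j]
            for j, c in enumerate(clusters)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return sorted(centers, key=lambda c: c[0] * c[1])


# Toy usage: two well-separated size groups and two disjoint boxes.
toy_boxes = [(10, 10), (12, 12), (11, 11), (100, 100), (98, 102), (102, 98)]
print(kmeans_anchors(toy_boxes, k=2))
print(iou((0, 0, 1, 1), (2, 0, 3, 1)), giou((0, 0, 1, 1), (2, 0, 3, 1)))
```

For the two disjoint boxes, IoU is 0 while GIoU is negative and moves toward 0 as the boxes approach each other, which is the property that gives the GIoU loss a usable gradient. For the paper's setting, `k` would be 9 and `boxes` would hold the labelled box sizes of the abnormal behavior dataset.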
4. EXPERIMENT AND ANALYSIS OF DRIVER ABNORMAL BEHAVIOR DETECTION
This experiment runs on the Windows 10 operating system, with CUDA 10.1 and cuDNN 7.6.5 installed so that the GPU can be used to accelerate training. The algorithm is implemented in Python 3.8.12 with TensorFlow 2.3.
4.1 Dataset
The research object of this experiment is the driver. According to traffic regulations on driver behavior, smoking, answering the phone, drinking water and eating while driving are all abnormal driver behaviors. Since there is no public dataset for abnormal driving behavior of drivers in China, this experiment uses a self-built dataset. The experimental data come from three main sources: self-photographed data, pictures collected on the Internet, and images available in existing datasets. Some examples are shown in Figure 4. The dataset follows the VOC2007 dataset format. Images are first labelled with LabelImg and classified as driverphone (calling), driversmoke (smoking), drivercup (drinking water) and drivereat (eating), and are then divided into a training set and a test set. Table 2 shows the number of samples for each type of abnormal behavior.
Table 2. The number of different types of abnormal behavior.
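
As a rough illustration of the VOC2007-style annotations described in Section 4.1, the sketch below parses one annotation and counts the labelled objects per class. Only the four class names come from the text; the file name, image size and box coordinates in `SAMPLE_XML`, and the helper name `parse_voc`, are invented for the example and do not describe the actual dataset.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# The four class labels named in Section 4.1.
CLASSES = ("driverphone", "driversmoke", "drivercup", "drivereat")

# Hypothetical annotation in the layout that LabelImg writes for Pascal VOC.
SAMPLE_XML = """<annotation>
  <filename>example_0001.jpg</filename>
  <size><width>640</width><height>480</height><depth>3</depth></size>
  <object>
    <name>driverphone</name>
    <bndbox><xmin>210</xmin><ymin>95</ymin><xmax>330</xmax><ymax>240</ymax></bndbox>
  </object>
  <object>
    <name>driversmoke</name>
    <bndbox><xmin>250</xmin><ymin>180</ymin><xmax>310</xmax><ymax>230</ymax></bndbox>
  </object>
</annotation>"""


def parse_voc(xml_text):
    """Return a list of (class_name, (xmin, ymin, xmax, ymax)) for known classes."""
    root = ET.fromstring(xml_text)
    objects = []
    for obj in root.iter("object"):
        name = obj.findtext("name")
        if name not in CLASSES:
            continue  # ignore labels outside the four behavior classes
        bndbox = obj.find("bndbox")
        box = tuple(int(bndbox.findtext(tag)) for tag in ("xmin", "ymin", "xmax", "ymax"))
        objects.append((name, box))
    return objects


objects = parse_voc(SAMPLE_XML)
print(objects)
print(Counter(name for name, _ in objects))
```

Running the same counter over every annotation file of a dataset gives per-class totals of the kind a table such as Table 2 reports.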
4.2 Model abnormal behavior detection and result analysis
4.2.1 Evaluation Indicators. In this paper, the detection success rate and mAP are selected as evaluation indicators to judge the quality of the abnormal behavior detection model. The detection success rate of a behavior is the proportion of the images of that behavior in the dataset that are correctly identified. The commonly used parameters for evaluating abnormal driver behavior detection are as follows: TP (True Positives) indicates that an abnormality is detected and the picture is also marked as abnormal; TN (True Negatives) indicates that no abnormality is detected and the picture is not marked as abnormal; FP (False Positives) indicates that an abnormality is detected but the picture is not marked as abnormal; FN (False Negatives) indicates that no abnormal behavior is detected although the picture does contain abnormal behavior. Precision (P) is the proportion of images detected as abnormal that are indeed abnormal [17,18], and recall (R) is the proportion of images annotated as abnormal that are detected as abnormal:

$$P = \frac{TP}{TP + FP}, \qquad R = \frac{TP}{TP + FN}$$

Average Precision (AP) combines recall (R) and precision (P), while mAP is the mean of the AP values and is the indicator used for direct evaluation; in its formula, m represents the number of samples in the test set.
4.2.2 Optimizing Model Fusion Experiments. To verify the individual effects of the CBAM attention mechanism module and the improved loss function, and to demonstrate the superiority of the final model, this paper designs a model fusion experiment, whose results are shown in Table 3. Model 1 means that only the CBAM attention mechanism is introduced.
Model 2 means that only the loss function is improved, and the improved YOLOv3 model is the final model of this paper. The results in the table show that introducing only the attention mechanism gives an mAP of 92.58%, an increase of 2.36% over the YOLOv3 model; improving only the loss function gives an mAP of 91.25%, an increase of only 1.03%; and applying both improvements together, as in the model of this paper, gives an mAP of 95.51%, an increase of 5.69% over the YOLOv3 model. Compared with Model 1 and Model 2, the increases are 3.33% and 4.69%, respectively. Whether compared with YOLOv3 or with either single-improvement model, the model in this paper is markedly better, which demonstrates its effectiveness.
Table 3. Model fusion experiment.
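
The evaluation indicators of Section 4.2.1 can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the evaluation code behind Table 3: AP is computed here as the all-point interpolated area under the precision-recall curve (the PASCAL VOC convention), mAP is taken as the mean of per-class AP values, and the function names and toy detections are invented for the example.

```python
def precision(tp, fp):
    """P = TP / (TP + FP); returns 0.0 when nothing was detected."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp, fn):
    """R = TP / (TP + FN); returns 0.0 when there are no positives."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def average_precision(detections, num_gt):
    """All-point interpolated AP for one class.

    detections: list of (confidence, is_true_positive) pairs.
    num_gt: number of annotated objects of this class (assumed > 0).
    """
    ranked = sorted(detections, key=lambda d: d[0], reverse=True)
    tp = fp = 0
    points = []  # (recall, precision) after each detection, in rank order
    for _, hit in ranked:
        if hit:
            tp += 1
        else:
            fp += 1
        points.append((tp / num_gt, precision(tp, fp)))
    ap, prev_r = 0.0, 0.0
    for i, (r, _) in enumerate(points):
        # Interpolated precision: best precision at this recall or any higher one.
        p_interp = max(p for _, p in points[i:])
        ap += (r - prev_r) * p_interp
        prev_r = r
    return ap


def mean_average_precision(ap_per_class):
    """Mean of the per-class AP values."""
    return sum(ap_per_class.values()) / len(ap_per_class)


# Toy usage with invented detections for two of the behavior classes.
aps = {
    "driverphone": average_precision([(0.9, True), (0.8, False), (0.7, True)], num_gt=2),
    "driversmoke": average_precision([(0.95, True), (0.6, True)], num_gt=2),
}
print(aps, mean_average_precision(aps))
```

The interpolation step makes the precision curve non-increasing in recall before the area is summed, so a single low-ranked false positive does not produce a sawtooth in the curve.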
4.2.3 Object Detection Comparison Experiment. The improved YOLOv3 model of this paper and the Faster R-CNN, SSD, RetinaNet, and YOLOv3 models are trained and tested on the same dataset and compared with each other. The detection success rates and mAP values are shown in Table 4. The detection success rates and mAP values of Faster R-CNN, SSD, and RetinaNet are generally below 85%, while those of the YOLOv3 model are much higher: its detection success rate reaches 91% and its overall mAP is 90.22%, which shows the good performance of YOLOv3 in detecting abnormal driver behavior. Therefore, this paper selects YOLOv3 as the original model and improves it. The table also shows that, compared with YOLOv3, the improved YOLOv3 model raises the detection success rates for drinking water, answering calls, eating, and smoking by 6%, 7%, 11%, and 8%, respectively. The success rates for drinking water and smoking reach 97%, and the mAP increases by 5.69%, indicating that the model of this paper detects abnormal behavior markedly better.
Table 4. Model detection success rate and mAP comparison experiment.
4.2.4 Comparison of Detection Effects. To verify the effect of improving the YOLOv3 model, the improved YOLOv3 and the original YOLOv3 are both evaluated on the test set. One test image is randomly selected for each behavior type and the results are compared. The detection results are shown in Figures 5 and 6. The detection confidences of the improved YOLOv3 are 0.96, 0.96, 0.98, and 0.90, respectively, which are much higher than those of the original YOLOv3 model. Overall, the detection confidence rises as the model is improved, which verifies the better detection performance of the improved YOLOv3.
5. CONCLUSION
This paper proposes a driver abnormal behavior monitoring method based on an improved YOLOv3 algorithm. An abnormal behavior dataset is collected, the Darknet-53 framework is used to train the recognition model, an attention mechanism is added, and the loss function is improved, which raises the recognition rate over the previous method and improves the robustness of the algorithm. For the four abnormal behaviors of drinking water, answering calls, smoking, and eating, the mAP reaches 95.51%. The experimental results show that the algorithm is effective.
REFERENCES
[1] Deng, M. Y., Fu, R. and Chen, G. J., “Collection and analysis about background factors of road traffic accidents index,” Journal of Chongqing Jiaotong University, 31(4), 852–856 (2012).
[2] Chen, G. H. and Zhen, H., “Statistical analysis of major road accidents in Guangdong Province and their countermeasures,” Chinese Safety Science Journal, 20(10), 106–12 (2008).
[3] He, Y., et al., “A comparison of statistical survey methods of traffic accident data between China and the United States,” Journal of Transport Information and Safety, 36(1), 1–9+27 (2008).
[4] Hua, B., Liang, X., Liu, S. and Sheng, J. C., “Renovated abnormal passenger crowd behavior detection system based on the speeding-up and squeezing in the public places,” Journal of Safety and Environment, 17(3), 1043–1048 (2017).
[5] Ji, Q., Zhu, Z. and Lan, P., “Real-time nonintrusive monitoring and prediction of driver fatigue,” IEEE Transactions on Vehicular Technology, 53(4), 1052–1068 (2004). https://doi.org/10.1109/TVT.2004.830974
[6] Chen, D. D., “[Research on Abnormal Driving Behavior Recognition Technology Based on Vehicle Dynamic Monitoring Data],” Master’s thesis, Beijing Jiaotong University (2012).
[7] Liang, Y. J. and Xiang, H. K., “Vehicle posture discrimination method based on FNN,” Journal of Liaoning Technical University, 37(2), 416–421.
[8] Tian, H. C., Mo, L. F. and Yan, R. Q., “Driving behavior identification in electric vehicle based on information fusion of vehicle information,” Chinese Journal of Sensors and Actuators, 31(3), 355–362 (2018).
[9] Awais, M., Badruddin, N. and Drieberg, M., “EEG brain connectivity analysis to detect driver drowsiness using coherence,” in Inter. Conf. on Frontiers of Information Technology, (2017).
[10] Sun, Y. and Yu, X. B., “An innovative nonintrusive driver assistance system for vital signal monitoring,” IEEE Journal of Biomedical and Health Informatics, 18, 1932–1939 (2014). https://doi.org/10.1109/JBHI.6221020
[11] Lee, B. G. and Chung, W. Y., “Driver alertness monitoring using fusion of facial features and biosignals,” IEEE Sensors Journal, 12(7), 2416–2422 (2012). https://doi.org/10.1109/JSEN.2012.2190505
[12] Haouij, N. E., Poggi, J. M. and Ghozi, R., “Random forest-based approach for physiological functional variable selection for driver’s stress level classification,” Statistical Methods & Applications, 28, 157–185 (2018). https://doi.org/10.1007/s10260-018-0423-5
[13] Das, N., Ohn-Bar, E. and Trivedi, M. M., “On performance evaluation of driver hand detection algorithms: Challenges, dataset, and metrics,” in 2015 IEEE 18th Inter. Conf. on Intelligent Transportation Systems, (2015).
[14] Berri, R. A., Silva, A. G. and Parpinelli, R. S., “A pattern recognition system for detecting use of mobile phones while driving,” in 2014 Inter. Conf. on Computer Vision Theory and Applications, (2014).
[15] Ragab, A., Craye, C. and Kamel, M. S., “A visual-based driver distraction recognition and detection using random forest,” in Inter. Conf. on Image Analysis and Recognition, (2014).
[16] Lin, T. Y., Dollár, P. and Girshick, R., “Feature pyramid networks for object detection,” in IEEE Conf. on Computer Vision and Pattern Recognition, (2017).
[17] Li, Q. D., Wang, T. C., Cui, J. W. and Mu, B., “Detection on dangerous operation behavior of forklift based on deep learning algorithm,” Journal of Safety Science and Technology, 16(5), 155–159 (2019).
[18] Zhang, J. Y., Zhou, Y. L. and Chen, J. W., “Research on pilot behavior detection algorithm based on deep learning,” China’s New Technologies and New Products, (4), 26–28 (2019).