Open Access Paper
28 December 2022
A smart bus lane monitoring system based on object detection
Ziyao Meng, Zhigang Wen
Proceedings Volume 12506, Third International Conference on Computer Science and Communication Technology (ICCSCT 2022); 125065N (2022). https://doi.org/10.1117/12.2662596
Event: International Conference on Computer Science and Communication Technology (ICCSCT 2022), 2022, Beijing, China
Abstract
In this paper, we introduce a lightweight object detection system that helps detect illegal occupation of bus lanes and can thereby make the city smarter. The system classifies vehicles into social cars and buses and detects and recognizes license plates. We first review relevant work by other researchers, which proved inspiring. We then structure the system into three parts: a central service module, servers, and an embedded device, each playing a different role. The central service module implements the front-end pages and the back-end of the platform using a micro-service architecture. The servers handle model training and the communication between the embedded device and the cloud. The embedded device is mounted on buses, where it detects and records illegal occupation of the bus lane by running the trained object detection model; the system can realize different functions by deploying different models. We used deep learning to realize the expected functions. First, we built a dataset covering two kinds of vehicles, license plates, the 10 Arabic numerals, the 26 English letters, and the Chinese abbreviations of 6 provincial-level districts. We then trained the model with YoloV5 and evaluated it. The results indicate that the model meets our expectations.

1. INTRODUCTION

With the rapid urbanization of China, more and more cities are establishing bus lanes to encourage people to take public transport and relieve traffic congestion. Regulations on bus lanes vary between Chinese cities, but comparing them shows that they are broadly similar1-3. First, parking is not allowed near bus stops. Second, during rush hours, social vehicles may not use the bus lane unless necessary. Special vehicles such as ambulances and firefighting trucks may use the bus lane under any conditions, and in emergencies or other abnormal situations, all vehicles may use it under the direction of traffic police.

In this situation, systems that can detect and record illegal vehicles in the bus lane are significant, and they form part of the smart city. Since bus lanes are usually long and complex, fixed cameras along the road leave relatively large blind zones. Researchers have developed many creative solutions to this problem. In 1996, Eastman applied cameras to deter the illegal use of bus lanes in Birmingham4; due to the technological limits of the time, however, that system was complex and its performance unsatisfying. Today, advances in computer science and microprocessors give us better algorithms and more powerful processors, enabling real-time, high-accuracy detection. Li used an FPGA to detect social vehicles in the bus lane and remind staff to deal with them5. Ketcham used an SVM and a linear regression model to recognize illegal parking in front of bus stops6; their method reaches relatively high accuracy and has the potential to be commercialized.

Yonetsu did impressive work on license-plate detection, proposing a two-stage detection method based on YoloV27. In the first stage, vehicles are detected and classified; in the second, the license plate on each vehicle is located. To realize this, they created a database of cars and license plates in Japan. Their model works well at night, on blurry images, and under other imperfect conditions. Liu proposed an edge-based license-plate recognition method8.

SSD-MobileNet is used to locate the license plate first; an end-to-end method then detects the characters on it. Their method is lightweight and relatively fast, but still cannot achieve real-time detection.

Mo proposed a method to classify vehicles on multi-lane roads9. They detect the colour of the license plate to separate normal cars from other kinds of vehicles, then extract features from the remaining vehicles to classify them into buses and trucks. In our project, what we need is to divide all vehicles into two categories: social vehicles, which are normally not allowed to use the bus lane, and public transport such as buses, which can use it legally.

Inspired by the research above, we decided to realize real-time vehicle classification together with license-plate detection and recognition in a single model. This research has the potential to be deployed on buses to record vehicles that illegally occupy the bus lane. To achieve this, we must realize vehicle classification, license-plate detection, and character recognition.

2. PROBLEMS

Currently, more and more cities in China have bus lanes. During specified hours, the bus lane should be used only by buses, to increase the efficiency of public transport at peak times. However, some social vehicles occupy the bus lane during peak hours to save time, which significantly reduces the efficiency of public transportation and renders the bus lane useless. Although regulations and laws exist to punish this behaviour, it is hard to detect and enforce with fixed cameras along the road, because a city's bus lanes are usually long and complex. Fixed cameras cannot cover the bus lane completely, so some drivers choose to occupy the stretches the monitoring system does not cover. The laws and regulations governing bus lanes are effective only if most illegal occupation can be detected and punished, and building complete coverage from fixed cameras is expensive. The problem, then, is how to detect and record illegal occupation of the bus lane with fewer blind spots and at reasonable cost. Our research develops a system to solve this problem.

3. OVERALL APPLICATION SCENARIO AND SYSTEM ARCHITECTURE

3.1 Overall platform design

We decided to mount the system on buses. This helps eliminate the blind zones of a fixed roadside monitoring system. The advantage of mounting the system on buses is obvious: it can record most illegal occupations of the bus lane that actually affect buses.

The whole structure can be divided into three parts: the central service module, the servers (a PC server and a cloud server), and the embedded device.

The central service module mainly implements the front-end pages and back-end services. Here we apply a micro-service architecture, which keeps each service independent and prevents the failure of a single node from bringing down the whole system. Data storage and caching are handled by a database and Redis, and the main logic of the central module is built on the Spring Cloud framework.

The servers comprise a PC server and a cloud server. The PC server is mainly responsible for providing the hardware and software environment the platform requires. In addition, we use it to quantize and compress the model file, adjust and interpret the model to be flashed, and flash the model onto the device. The PC server also uses serial communication to transfer and control the real-time results. The cloud server is responsible for training the lightweight object detection algorithm, evaluating the model, and converting its format.

The embedded device we use is based on the Kendryte PaddlePi-K210 platform, built around the K210 chip. In this paper, unless otherwise noted, we refer to all vehicle-mounted devices in the intelligent public-transport supervision scenario as embedded devices, because we want to design a lightweight embedded object detection platform that is not limited to one particular development board. Other deep-learning computer-vision scenarios required by intelligent public-transport supervision can also use our platform: other embedded applications can design their own algorithms and execute the relevant functions through the platform's unified service process. The platform is not customized for this project alone; it is a universal platform for deep-learning computer-vision projects. By building it, we can deploy different trained models to realize different functions at reasonable cost: since the hardware is universal, all we need to change is the software on the embedded device. This platform can help make the city more intelligent.

3.2 Central service module

The main function of the central service module is to provide the services the platform requires. In the main architecture of this module, we apply the micro-service method to provide back-end services. Based on an analysis of the system's functions, we designed five micro-services: dataset organization, model training and converting, model training evaluation, model quantization and device deployment, and real-time result detection. This approach reduces the coupling between micro-services while preserving the function and completeness of each. Figure 1 visualizes how we divide the micro-service modules.

Figure 1. Structure of micro-service modules.
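The micro-service split described above can be sketched as independent handlers behind a single dispatcher. This is only an illustration of the design principle (service names and payloads are invented here, not the authors' actual API):

```python
# Hypothetical sketch of the five back-end micro-services as independent
# handlers; names and return values are illustrative only.

def organize_datasets(payload):
    return {"service": "datasets", "status": "ok"}

def train_and_convert(payload):
    return {"service": "train-convert", "status": "ok"}

def evaluate_training(payload):
    return {"service": "evaluate", "status": "ok"}

def quantize_and_deploy(payload):
    return {"service": "quantize-deploy", "status": "ok"}

def detect_realtime_results(payload):
    return {"service": "realtime", "status": "ok"}

SERVICES = {
    "datasets": organize_datasets,
    "train-convert": train_and_convert,
    "evaluate": evaluate_training,
    "quantize-deploy": quantize_and_deploy,
    "realtime": detect_realtime_results,
}

def dispatch(name, payload):
    # Each service is looked up independently, so a missing or failed
    # handler does not take down the others -- the motivation for the
    # micro-service split.
    handler = SERVICES.get(name)
    if handler is None:
        return {"error": f"unknown service: {name}"}
    return handler(payload)
```

In a real deployment each handler would be a separate Spring Cloud service; the dictionary here only mirrors the low-coupling decomposition.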

3.3 Server

The cloud server, on the one hand, performs operations on the datasets, including batch-renaming files, changing the relative paths in the XML annotation files, and augmenting the datasets according to the application scenario. On the other hand, we build the lightweight object detection environment on the cloud server, train the detection model in it, and evaluate and convert the resulting model.
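The two dataset operations named above, batch renaming and fixing the paths recorded in the XML annotations, can be done with the standard library alone. A minimal sketch, assuming Pascal VOC-style annotations with `<filename>` and `<path>` elements (the naming scheme is our own illustration):

```python
import os
import xml.etree.ElementTree as ET

def batch_rename(folder, prefix):
    """Rename every file in `folder` to prefix_0000.ext, prefix_0001.ext, ..."""
    for i, name in enumerate(sorted(os.listdir(folder))):
        ext = os.path.splitext(name)[1]
        os.rename(os.path.join(folder, name),
                  os.path.join(folder, f"{prefix}_{i:04d}{ext}"))

def fix_xml_path(xml_file, new_dir):
    """Point the <path> element of a VOC annotation at new_dir."""
    tree = ET.parse(xml_file)
    root = tree.getroot()
    filename = root.findtext("filename")
    path_elem = root.find("path")
    if path_elem is not None:
        path_elem.text = os.path.join(new_dir, filename)
    tree.write(xml_file)
```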

The main function of the PC server is to act as a communication bridge between the cloud server and the embedded device and to process the pixel data. The embedded device uses the serial port (SCOM) to transmit the detection results and real-time snapshots to the PC server, which translates the data into pictures and the required results and uploads them to the platform for publication.

3.4 Embedded device

By running the trained model, the embedded device performs lightweight object detection. The detection results and real-time image data are transferred through a serial port to the PC server, which interprets the received pixel values into images. Functionally, the embedded device can be divided into three sections: video capture and processing, lightweight object detection inference, and communication. The video capture and processing section uses a camera to capture real-time video; the DVP camera interface module forwards the addresses of the captured frames to the AI module or RAM frame by frame. The core of the inference section is the knowledge processing unit (KPU), which contains convolution, activation, pooling, and batch-normalization units. After quantization, compression, and the other required conversion steps, models trained with mainstream frameworks are well supported by the KPU, whose performance is sufficient for real-time object detection. The communication section uses the UART protocol to transmit detection results and real-time images; we also define our own communication protocol based on the demands of the public-transport application scenario.
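The paper does not publish its custom UART protocol, but a framing scheme of this kind typically carries a header, a payload, and a checksum. A hypothetical sketch (the layout, magic word, and field sizes are our own assumptions):

```python
import struct

# Hypothetical frame layout: 2-byte magic word, 1-byte class id,
# four 2-byte bounding-box fields (x, y, w, h), 1-byte additive checksum.
MAGIC = 0xA55A

def encode_frame(class_id, box):
    """Pack one detection result into a big-endian byte frame."""
    body = struct.pack(">HB4H", MAGIC, class_id, *box)
    checksum = sum(body) & 0xFF
    return body + bytes([checksum])

def decode_frame(frame):
    """Validate checksum and magic word, then unpack the detection."""
    body, checksum = frame[:-1], frame[-1]
    if sum(body) & 0xFF != checksum:
        raise ValueError("checksum mismatch")
    magic, class_id, x, y, w, h = struct.unpack(">HB4H", body)
    if magic != MAGIC:
        raise ValueError("bad magic word")
    return class_id, (x, y, w, h)
```

On the device side the same framing would be written to the UART; the checksum lets the PC server drop frames corrupted on the serial line.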

In this project we chose the Kendryte K210 as the core processor of the embedded device. The K210 has relatively low power consumption: 0.3 W minimum and about 1 W with external devices connected. It also contains a KPU, which gives it much better performance on deep-learning models than comparable processors, and it supports a Linux operating system. Compared with its competitors the K210 is cheap, costing only about $3, which makes it a cost-effective choice. Moreover, it provides the necessary ports, such as UART, DVP, and SPI.

From the circuit design of the embedded device, we divide the hardware into four modules. The first is the storage module. The total memory is 8 MB, split into 6 MB of general-purpose SRAM and 2 MB of AI SRAM for the KPU's operations. It can be accessed through the CPU's cached interface as well as a non-cached interface; this design shortens the data path, increases transfer efficiency, and reduces the chip's power consumption. The second is the image capture module: a GC0328 HD camera connected to the K210's DVP interface. The camera sends data through DVP to SRAM or AI SRAM, from which it is passed to the KPU or to the LCD screen as required. The third is the human-machine interaction module, whose main components are an LCD screen and control buttons; it displays the detection results and other relevant data and is used to control the device. The fourth is the communication module, which uses UART ports for full-duplex asynchronous communication. The embedded development platform, including the necessary peripherals, is shown in Figure 2.

Figure 2. K210 development platform.

4. MODEL DESIGN AND TRAINING

Considering our requirements, we decided to use YoloV510 as our training framework. Our aim is to classify vehicles into cars and buses and to detect and recognize license plates. Since YoloV5 can detect many classes, we use independent classes for license plates, the 10 Arabic numerals, the 26 English letters, and the abbreviations of the 34 provincial-level divisions of China.
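The class scheme described above maps every vehicle type, plate, and plate character to its own detection class. A small sketch of how such a class index might be assembled (the vehicle/plate names and the province subset are illustrative, not the authors' actual label set):

```python
import string

# Placeholder subset of provincial abbreviations (e.g. Beijing, Shanghai,
# Tianjin, Chongqing, Hebei, Zhejiang); the full scheme would list all 34.
PROVINCES = ["京", "沪", "津", "渝", "冀", "浙"]

CLASSES = (
    ["car", "bus", "plate"]            # 2 vehicle classes + license plate
    + list(string.digits)              # the 10 Arabic numerals
    + list(string.ascii_uppercase)     # the 26 English letters
    + PROVINCES                        # provincial abbreviations
)

# Map each class name to the integer id YoloV5 labels would use.
CLASS_TO_ID = {name: i for i, name in enumerate(CLASSES)}
```

A plate string such as 京A12345 can then be read off directly from the per-character detections by looking up each detected class id in `CLASSES`.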

4.1 Dataset preparation

Before training the model, we need to prepare our own datasets. Based on our requirements, there are three kinds of data. The first is images of different cars and buses, in which the license plates are masked with mosaic, because the second kind of data consists of license plates and the characters on them. The third is images of single characters: the 26 English letters, the 10 Arabic numerals, and the abbreviations of some of China's provincial-level administrative regions. The first and second kinds of data each have two parts: the original images, and images generated from them through processing strategies such as blurring, rotating, stretching, and filtering. This improves the robustness of the model, ensuring that it performs normally in extreme conditions such as foggy weather or a dark environment.
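As a rough stand-in for the blur/rotate/stretch/filter pipeline above, the following sketch generates simple variants of an image array. It assumes NumPy is available; a real pipeline would typically use OpenCV or Pillow for true blurring and affine transforms:

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so augmentation is reproducible

def augment(image):
    """Return simple variants of an HxWx3 uint8 image: a horizontal flip,
    a darkened copy (simulating a dark environment), and a noisy copy
    (simulating sensor noise / poor weather)."""
    flipped = image[:, ::-1].copy()
    dark = (image * 0.5).astype(np.uint8)
    noisy = np.clip(
        image.astype(int) + rng.integers(-20, 20, image.shape), 0, 255
    ).astype(np.uint8)
    return [flipped, dark, noisy]
```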

Once the images were prepared, we labelled them with the "labelling" tool. Since YoloV5 requires labels in txt format, we wrote a program to transform the XML label files generated by "labelling" into txt files. Another program then renames the label and image files so that every pair shares the same name. After that, the dataset is ready.
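The XML-to-txt conversion step can be sketched with the standard library. YoloV5's txt format is one line per object, `class x_center y_center width height`, with coordinates normalized to [0, 1]; the sketch below assumes Pascal VOC-style XML as written by common labelling tools:

```python
import xml.etree.ElementTree as ET

def voc_to_yolo(xml_text, class_to_id):
    """Convert one Pascal VOC annotation string into YoloV5 txt lines."""
    root = ET.fromstring(xml_text)
    w = int(root.findtext("size/width"))
    h = int(root.findtext("size/height"))
    lines = []
    for obj in root.iter("object"):
        cls = class_to_id[obj.findtext("name")]
        box = obj.find("bndbox")
        xmin, ymin = int(box.findtext("xmin")), int(box.findtext("ymin"))
        xmax, ymax = int(box.findtext("xmax")), int(box.findtext("ymax"))
        # Corner coordinates -> normalized center/size, as YoloV5 expects.
        xc = (xmin + xmax) / 2 / w
        yc = (ymin + ymax) / 2 / h
        lines.append(f"{cls} {xc:.6f} {yc:.6f} "
                     f"{(xmax - xmin) / w:.6f} {(ymax - ymin) / h:.6f}")
    return "\n".join(lines)
```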

4.2 Training model

We chose YoloV5 to train our model. The framework has excellent real-time object detection performance, and it is relatively small compared with its competitors, which suits the embedded device. We applied YoloV5 to our datasets and trained the model on an NVIDIA RTX 3090. The result is acceptable and matches our needs.
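YoloV5 training is driven by a dataset description file. A hypothetical `data.yaml` for a project of this shape might look as follows; the paths are illustrative and the class list is truncated to three entries for brevity (`nc` must always equal the length of `names`):

```yaml
# Hypothetical YoloV5 dataset config -- not the authors' actual file.
train: ../datasets/buslane/images/train
val: ../datasets/buslane/images/val
nc: 3
names: ["car", "bus", "plate"]
```

Training would then be launched with `python train.py --data data.yaml --weights yolov5s.pt`, picking a small pretrained checkpoint suited to later deployment on the embedded device.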

5. RESULTS

Figure 3 shows the precision curve during training: the precision reaches 97%. The three training loss curves are shown in Figure 4. The loss related to object detection is about 0.018, the bounding-box loss about 0.029, and the classification loss below 0.01. The loss curves on the validation dataset are shown in Figure 5: the object detection loss is about 0.021, the bounding-box loss about 0.005, and the classification loss about 0.009. The mAP curve observed during training is shown in Figure 6; the model's mAP reaches 0.958. We notice a decline after the maximum, probably caused by errors in the dataset and by the dataset not being large enough. Finally, we tested the model on images outside our dataset; the result, shown in Figure 7, matches our expectations.

Figure 3. Precision curve.

Figure 4. Training loss.

Figure 5. Validation loss.

Figure 6. mAP curve.

Figure 7. Running result.

6. CONCLUSION

In this paper, we presented the design of a system that classifies buses and cars while simultaneously detecting and recognizing their license plates. The system can help regularize a city's bus lanes by recognizing illegal occupation and capturing the license plates of the offending vehicles. It comprises three parts, the central service module, the servers, and the embedded device, each playing its own role in keeping the system operating. For object detection, we prepared our own datasets and trained a model with YoloV5; the training results indicate that the model meets our expectations.

However, our model has limitations. First, the dataset used to train it is limited: it includes only six Chinese province abbreviations, whereas the Chinese characters that may appear on a license plate cover 34 province abbreviations, the characters that mark special vehicles such as embassy and army cars, and other Chinese characters. To deploy the system, we would need to enlarge the dataset. Also, since the dataset may contain errors, it needs to be cleaned. Finally, time permitting, field experiments mounting the embedded device on buses would be necessary to test the system.

REFERENCES

[1] Method to implement the "Law of the People's Republic of China on Road Traffic Safety" in Beijing, (2022). http://www.beijing.gov.cn/zhengce/zhengcefagui/202005/t20200518_1899681.html
[2] Regulations on urban road traffic administration in Hangzhou, (2022). http://www.hangzhou.gov.cn/art/2019/8/12/art_1674220_6290.html
[3] Regulations on traffic administration in Shanghai, (2022). https://law.sfj.sh.gov.cn/#/detail?id=6ff30b8f3a7752629c01941728a3a273
[4] Eastman, C., Hewitt, R. and Slinn, M., "Using cameras to deter the illegal use of bus lanes in Birmingham," IEE Colloquium on Camera Enforcement of Traffic Regulations, 31-35 (1996).
[5] Li, J. H., Tian, Y. N. and Xu, X. J., "Design and implementation of bus lane video image monitor system based on FPGA," 29th Chinese Control and Decision Conf. (CCDC), 4868-4872 (2017).
[6] Ketcham, M., Getkhaw, E., Piyaneeranart, M., Ganokratanaa, T. and Yimyam, "Recognizing the illegal parking patterns of cars on the road in front of the bus stop using the support vector machine," 15th Int. Conf. on Signal-Image Technology & Internet-Based Systems (SITIS), 538-542 (2019).
[7] Yonetsu, S., Iwamoto, Y. and Chen, Y., "Two-stage YOLOv2 for accurate license-plate detection in complex scenes," IEEE Int. Conf. on Consumer Electronics (ICCE), 1-4 (2019).
[8] Liu, Y. Z., Li, Y. F., Chen, G. and Gao, H. L., "An edge-end based fast car license plate recognition method," Int. Conf. on Big Data & Artificial Intelligence & Software Engineering (ICBASE), 394-397 (2020).
[9] Mo, S. Q., Liu, Z. G., Zhang, J. and Wu, C., "Real-time vehicle classification method for multi-lanes roads," 4th IEEE Conf. on Industrial Electronics and Applications, 960-964 (2009).
[10] GitHub - ultralytics/yolov5: YOLOv5 in PyTorch > ONNX > CoreML > TFLite, (2022). https://github.com/ultralytics/yolov5