|
1.INTRODUCTIONAccurate and reliable vehicle trajectory data can provide information such as vehicle position, vehicle speed, acceleration, queue length and driving time, which is of great significance for the development of intelligent transportation system and urban transportation system management. In previous studies, vehicle trajectory data has been widely used in traffic state estimation, traffic flow modeling, signal optimization, vehicle fuel consumption and emissions [1][2]. At present, two types of traffic sensors are widely used to collect vehicle trajectory information, namely fixed and mobile sensors. Fixed sensors have the advantages of simple operation, ensure consistency of detection target and monitoring environment, and can provide efficient target position and speed under various conditions. Perhaps the best-known vehicle trajectory dataset is the Next Generation Simulation Program (NGSIM) dataset [3], which is also widely used in vehicle trajectory reconstruction, vehicle lane change, and performance evaluation of traffic flow operation, etc. [4][5][6][7][10][11][12][13][14]. However, the error of trajectory will affect the accuracy of driving behaviour evaluation [15]. The erroneous position data will affect the estimation of kinematic quantities, and the measurement errors will also have adverse effects on the calibration of the car-following models [8][17]. As Thiemann et al [8] have already pointed out, the distribution of acceleration investigated in NGSIM datasets is unrealistic. Coifman and Li [17] critically evaluated vehicle trajectory video data from NGSIM video. They found that the magnitude of the errors depends on vehicle speed, location, and length. According to the study conducted by Ossen and Hoogendoorn [16], the influence of noise of trajectory data on the estimated parameters of car-following models is non-negligible. It is important for experimentalists to remove data noise as much as possible without compromising data quality [18]. Yet the quality of trajectory data and its impact on the reliability of related studies were a vastly underestimated problem in the traffic literature even before the availability of NGSIM data [9]. In recent years, the research on optimizing trajectory data quality mainly focuses on noise (outlier) recognition and vehicle trajectory reconstruction. Fitters et al [19] propose a novel framework for traffic flow prediction, OE-LSTM (Outlier-Enriched LSTM), to identify outliers that deviate from regular traffic flow. Fard et al [20] proposed a two-step technique based on wavelet analysis, primarily by using a wavelet transform with unique processing methods to identify and modify outliers. Montanino and Punzo [4] presented a multistep filtering method, which aimed to eliminate outliers leading to nonphysical acceleration values by local reconstruction of the vehicle trajectory. Wei et al [21] adopted a particle filter (PF)-based approach to reconstruct the missing trajectory between the continuous updates of vehicles on the main road of single and multiple intersections. Although many studies have made great contributions to the noise elimination and reconstruction of vehicle trajectory data, few studies have combined the two processes of outlier recognition and vehicle trajectory reconstruction into a unified system, most studies focus on just one aspect. This paper proposes a two-step data reconstruction strategy to detect and reconstruct the target data set. The first step: local outlier identify algorithm based on k-means clustering is used to perform cluster analysis on the target vehicle trajectory data set and then identify the outliers in the data. The second step: using Savitzky-Golay filter to smooth and de-noise the data stream which containing outliers, it can not only improve the smoothness of the target data, but also reduce the interference of noise to the trajectory data. 2.TWO-STEP VEHICLE TRAJECTORY DATA RECONSTRUCTION STRATEGYThe main objective of the two-step vehicle trajectory data reconstruction strategy proposed in this paper is to effectively identify the outliers of the target data set, select a suitable filter processor to smooth the data set which containing noise, and ensure that the data is not distorted in the process of noise processing. 2.1Vehicle trajectory data outlier recognition based on the K-CBLOF algorithm(1) Definition of cluster-based local outlier. The cluster-based local outlier factor (CBLOF) detection method, is a cluster-based concept on the idea for local outlier factor detection. The basic idea of CBLOF is to cluster the data first, and then divide the data into large clusters and small clusters, with a small number of outliers, which tend to have a large deviation from most normal data. Therefore, it is necessary to calculate the distance between the data and the large clusters to measure the abnormal degree of the data. The larger the distance, the more abnormal the data. This paper is based on the CBLOF proposed by He et al [22], which is, the target data set is divided into different clusters, the data belonging to small clusters is defined as outliers, and the cluster-based outlier should satisfy that they are “local” to specified clusters. In order to identify the physical significance of outliers, an outlier factor named cluster-based local outlier factor was assigned to each object, which is measured by both the size of the cluster the object belongs to and the distance between the object and its closest cluster (assume the object lies in a small cluster). Then, the concept of cluster is introduced. Define |S| as the size of S, which usually refers to a set containing some element
(2)K-means clustering algorithm. In this paper, the clustering algorithm used is the k-means algorithm [24], which can produce good clustering results and has good stability. Then have an introduction on k-means clustering algorithm. The process of k-means is shown in Figure 1. Suppose the target object is x, xi, which is indicate the average of cluster Ci, the sum of the squared error of all objects is used Euclidean distance E, and the criterion function is define as follows: The Euclidean distance between two vectors x = (x1,…,xn) and y = (y1,…,yn), the Euclidean distance can be obtained as follows: In the k-means algorithm, the distance between the target and the cluster center are determined by executing several loops. When the cluster center is no longer updated, the k-means algorithm converges. The exact value of the iteration number t of k-means algorithm depends on the starting center of the initial cluster, and the distribution of data points in the target data set has a certain relationship with the new cluster center. n is the number of all data objects, k is the number of clusters, t is the iterations of algorithm. Usually requiring k ≪ n and t ≪ n. The algorithm K-CBLOF for detecting outliers is listed in Figure 2. 2.2Vehicle trajectory data smoothing based on Savitzky-Golay filterSavitzky-Golay filter [18] [23], originally proposed by SavitzkyA and GolayM in 1964, is widely used in smoothing data flows by noise removal. It is a method of best fitting in the time domain based on polynomials and using the least square method by moving windows. This method is relatively simple, fast, and preserves the relative maximum, minimum, and width distribution characteristics better than other similar averaging methods. Set a set of data as u(i), and the value of i is 2m + 1 consecutive integer value, that is, i = −m, …,0, …, m. Construct an n-order polynomial (n ≤ 2m + 1) to fit this set of data. The error of squares of points is: To minimize M, the derivative of M with respect to each coefficient can be 0, Given the number of unilateral points m to be fitted, the order n of the polynomial and the data u(−m), …,u(0), …, u(m), F can be solved. By substituting Sk+r into , the coefficient hn0, hn1, …, hnn can be found, and the polynomial fi can be determined. When k + r is odd, So only when the k + r is an even number, , r = 0,1,2,…, n is existing. When both n and s are even or odd: In practical, it is often not necessary to calculate all the coefficients hn0,hn1,…, hnn, then can be obtained: 3.CASE STUDYResearchers for the NGSIM program collected detailed vehicle trajectory data on eastbound I-80 in the San Francisco Bay area in Emeryville, CA, on April 13, 2005. As shown in Figure 3, the study area was approximately 500 meters (1,640 feet) in length and consisted of six freeway lanes, including a high-occupancy vehicle (HOV) lane. An onramp also was located within the study area. Seven synchronized digital video cameras, mounted from the top of a 30-story building adjacent to the freeway, recorded vehicles passing through the study area. This vehicle trajectory data provided the precise location of each vehicle within the study area every one-tenth of a second, resulting in detailed lane positions and locations relative to other vehicles. A total of 45 minutes of data are available in the full dataset, segmented into three 15-minute periods: 4:00 p.m. to 4:15 p.m.; 5:00 p.m. to 5:15 p.m.; and 5:15 p.m. to 5:30 p.m. These periods represent the buildup of congestion, or the transition between uncongested and congested conditions, and full congestion during the peak period. In order to observe the error existing in the original data more clearly, the data from 4:00 to 4:15 was chosen to verify the strategy proposed in this paper. In the prototype data set, two-thirds of all accelerations are beyond ±3m/s2. The example trajectory shows that the driver is allegedly changing between hard acceleration and hard deceleration several times a second, which is unrealistic [6]. Table 1 shows the field meanings of NGSIM trajectory data set. Table 1.Parameter naming and definition
3.1Vehicle trajectory data set outlier detectionThe CBLOF algorithm of PyOD in Python was used to identify outliers in the I-80 data set from 4:00-4:15 p.m. With K-Means clustering, α is assigned a value of 0.9 means that the number and proportion of the data of the large cluster is more than 90%, and β is assigned a value of 5, which indicates that the data flow of the former cluster is 5 times the number of the latter cluster. When the data point belongs to a large cluster, its distance from the cluster center of the current cluster is calculated. When the data point belongs to a small cluster, its distance from the cluster center of the nearest large cluster is calculated. By sorting the outlier factor from largest to smallest, outliers can be picked out. The analysis results of vehicle speed and acceleration are shown in Figure 4. Through analysis, it can be seen that there are obvious mutations in the speed and acceleration of the vehicle, which is unreasonable in real life, which is consistent with the previous research. Compared with the speed, the abrupt value of vehicle acceleration is relatively large, which is also the main goal of data smoothing in this paper. 3.2Vehicle trajectory data smoothingIn this paper, the Savitzky-Golay filter is used to smooth the vehicle speed and acceleration. There are three main approaches that can be used to smooth the dataset, which are:(ⅰ) Smooth all four variables i.e. X, Y, velocities and accelerations; (ⅱ) Smooth X, and Y, then differentiate velocities and acceleration; (ⅲ) Smooth X and Y, then differentiate velocities, smooth velocities, differentiate acceleration, and finally smooth acceleration. This paper relies on the second approach and apply the filter on the Local_X and Local_Y and not the Global_X and Global_Y values since they are based on the California state plane coordinate system. The width of the filter window is n = 2m + 1, this paper use separately a window of 11 and a window of 21, because they represent, respectively, 10 and 20 data points excluding the target data point. In other words, we use the trajectory data of 1 seconds and 2 seconds to smooth the data. Both datasets can be found in the folder window-11 and window-21 respectively. Taking vehicle 1 as an example, the smooth results of speed and acceleration data are shown in Figure 5. As can be seen from Figure 5, through the analysis of vehicle speed and acceleration, vehicle acceleration noise in the original data is more obvious, which is consistent with the above analysis results. By using SG filtering to smooth the speed and acceleration data, it is obvious that the smoothed data has greatly improved compared with the prototype data. In Figure 3, the black line represents the prototype data, and the red line and the blue line represent the results after data smoothing when the window length is 11 and 21 respectively. The analysis results show that when the filter window length is 21, the data quality will have a great improvement after smoothing. That is, when the window length is larger, the SG filter has a more significant effect on eliminating data noise. 4.CONCLUSIONThis study points out the root mechanism of errors in NGSIM data, a widely used prototype dataset. It turns out that some of the data are seriously affected by errors in vehicle spatial coordinate measurement, which are further amplified during the differential calculation of velocity and acceleration values. Based on previous studies, it can be seen that the main cause of measurement errors in velocity and acceleration values is the poor positioning of points in space. Therefore, a two-step vehicle trajectory data reconstruction strategy is proposed in this paper to eliminate outliers in vehicle velocity and acceleration and remove residual random interference through local reconstruction of vehicle trajectory. Firstly, CBLOF algorithm based on k-means clustering is used to identify the outliers of vehicle speed and acceleration. This step is mainly to divide the data into large clusters and small clusters, and determine the outlier value by calculating the distance between the target data and the large cluster. Then, the data with noise is smoothed locally based on SG filtering, and the vehicle speed and acceleration are smoothed at Local_X and Local_Y using Windows of length 11 and 21 respectively, and compared with the prototype data. Through the analysis, it can be seen that the two-step vehicle trajectory data reconstruction strategy proposed in this paper can effectively identify the outliers existing in the data set, and the data quality after SG filtering and smoothing is also significantly improved. In addition, in the case of this paper, the larger the filtering window, the better the data smoothing effect. In summary, the strategy proposed in this paper can effectively eliminate outliers and reconstruct data without losing the truth. REFERENCESGuo, Q., Li, L., Ban, X.J.,
“Urban traffic signal control with connected and automated vehicles: a survey,”
Transp. Res. Part C Emerg. Technol., 101 313
–334
(2019). https://doi.org/10.1016/j.trc.2019.01.026 Google Scholar
Li, L., Jiang, R., He, Z., Chen, X.M., Zhou, X.,
“Trajectory data-based traffic flow studies: a revisit,”
Transp. Res. Part C Emerg. Technol., 114 225
–240
(2020). https://doi.org/10.1016/j.trc.2020.02.016 Google Scholar
. (NGSIM, Next Generation Simulation (NGSIM) Vehicle Trajectories and Supporting Data [Online],”
(2016) https://doi.org/https://data.transportation.gov/Automobiles/Next-Generation-Simulation-NGSIM-Vehicle-Trajector/8ect-6jqj Google Scholar
Montanino, M., Punzo, V.,
“Making NGSIM data usable for studies on traffic flow theory,”
Transp. Res. Rec., 2390
(1), 99
–111
(2013). https://doi.org/10.3141/2390-11 Google Scholar
Altche, F., de La Fortelle, A.,
“An LSTM Network for Highway Trajectory Prediction,”
in In: Proceedings of the 2017 IEEE 20th International Conference on Intelligent Transportation Systems (ITSC).,
353
–359
(2017). https://doi.org/10.1109/itsc.2017.8317913 Google Scholar
Thiemann, C., Treiber, M., Kesting, A.,
“Estimating acceleration and lane-changing dynamics from next generation simulation trajectory data,”
Transp. Res. Rec., 2088
(1), 90
–101
(2008). https://doi.org/10.3141/2088-10 Google Scholar
Lei, C., Zhao, C., Ji, Y., Shen, Y., Du, Y.,
“Identifying and correcting the errors of vehicle trajectories from roadside millimetre-wave radars,”
IET Intell. Transp. Syst., 17 418
–434
(2023). https://doi.org/10.1049/itr2.v17.2 Google Scholar
Chen, X., Yin, J., Qin, G., Tang, K., Wang, Y., Sun, J.,
“Integrated macro-micro modelling for individual vehicle trajectory reconstruction using fixed and mobile sensor data,”
Transp. Res. Part C Emerg. Technol., 145 103929
(2022). https://doi.org/10.1016/j.trc.2022.103929 Google Scholar
Punzo, V., Borzacchiello, M. T., Ciuffo, B.,
“On the assessment of vehicle trajectory data accuracy and application to the Next Generation SIMulation (NGSIM) program data,”
Transp. Res. Part C Emerg. Technol., 19
(6), 1243
–1262
(2011). https://doi.org/10.1016/j.trc.2010.12.007 Google Scholar
Yang, L., Fang S., Wu, G. Y., Sheng, H., Xu, Z. G., Zhang, M. X., Zhao, X. M.,
“Physical model versus artificial neural network (ann) model: a comparative study on modeling car-following behavior at signalized intersections,”
J. Adv. Transp.,
(3), 1
–18
(2022). https://doi.org/10.1155/2022/8482846 Google Scholar
X. Hu, Z. Zheng, D. Chen, X. Zhang, and J. Sun,
“Processing, assessing, and enhancing the waymo autonomous vehicle open dataset for driving behavior research,”
Transp. Res. Part C Emerg. Technol., 134
(103490), 1
(2022). https://doi.org/10.1016/j.trc.2021.103490 Google Scholar
Montanino, M., Punzo, V.,
“Trajectory data reconstruction and simulation-based validation against macroscopic traffic patterns,”
Transp. Res. Part B Methodol., 80 82
–106.
(2015). https://doi.org/10.1016/j.trb.2015.06.010 Google Scholar
Chen, P., Wang, T., Zheng, N.,
“Reconstructing vehicle trajectories on freeways based on motion detection data of connected and automated vehicles,”
J. Intell. Transp. Sys., 26
(6), 1
–16
(2021). https://doi.org/10.1080/15472450.2021.1955211 Google Scholar
Venthuruthiyil, S. P., Chunchu, M.,
“Trajectory reconstruction using locally weighted regression: a new methodology to identify the optimum window size and polynomial order,”
Transportmetrica A Transp. Sci., 1
–20
(2018). https://doi.org/10.1080/23249935.2018.1449032 Google Scholar
van Beinum, A.; Farah, H.; Wegman, F.; Hoogendoorn, S.,
“Driving behaviour at motorway ramps and weaving segments based on empirical trajectory data,”
Transp. Res. Part C Emerg. Technol., 92 426
–441
(2018). https://doi.org/10.1016/j.trc.2018.05.018 Google Scholar
Ossen, S., Hoogendoorn, S.P.,
“Validity of trajectory-based calibration approach of car-following models in presence of measurement errors,”
Transp. Res. Rec.,
(2088), 117
–125
(2008). https://doi.org/10.3141/2088-13 Google Scholar
Coifman, B., Li, L.,
“A critical evaluation of the Next Generation Simulation (NGSIM) vehicle trajectory dataset,”
Transp. Res. Part B Methodol., 105 362
–377
(2017). https://doi.org/10.1016/j.trb.2017.09.018 Google Scholar
Savitzky, Abraham, Golay, M.J.E.,
“Smoothing and differentiation of data by simplified least squares procedures,”
Anal. Chem., 36 1627
–1639
(1964). https://doi.org/10.1021/ac60214a047 Google Scholar
Fitters, W, Cuzzocrea, A., Hassani, M.,
“Enhancing LSTM Prediction of Vehicle Traffic Flow Data via Outlier Correlations,”
in Proceedings of the 2021 IEEE 45th Annual Computers, Software, and Applications Conference (COMPSAC),
210
–217
(2021). https://doi.org/10.1109/COMPSAC51774.2021.00039 Google Scholar
Fard, M. R., Mohaymany, A. Shariat., Shahri. M.,
“A new methodology for vehicle trajectory reconstruction based on wavelet analysis,”
Transp. Res. Part C Emerg. Technol., 74 150
–167
(2017). https://doi.org/10.1016/j.trc.2016.11.010 Google Scholar
Wei, L., Wang, Y., Chen, P.,
“A particle filter-based approach for vehicle trajectory reconstruction using sparse probe data,”
IEEE Trans. Intell. Transp. Syst., 22
(5), 2878
–2890
(2020). https://doi.org/10.1109/TITS.2020.2976671 Google Scholar
He Z, Xu X, Deng S.,
“Discovering cluster-based local outliers,”
Pattern Recogn Lett., 24
(9-10), 1641
–1650
(2003). https://doi.org/10.1016/S0167-8655(03)00003-5 Google Scholar
Shi, N., Liu, X. M., Guan, Y.,
“Research on K-means Clustering Algorithm: An Improved K-means Clustering Algorithm,”
in In: Third International Symposium on Intelligent Information Technology and Security Informatics, IITSI 2010,
63
–67
(2010). https://doi.org/10.1109/IITSI.2010.74 Google Scholar
|