Open Access Paper
28 December 2022 A bagging ensemble learning traffic demand prediction model based on improved LSTM and transformer
Tianhuai Wang, Yipeng Li, Wenbing Chang, Shenghan Zhou
Proceedings Volume 12506, Third International Conference on Computer Science and Communication Technology (ICCSCT 2022); 125061Y (2022) https://doi.org/10.1117/12.2662894
Event: International Conference on Computer Science and Communication Technology (ICCSCT 2022), 2022, Beijing, China
Abstract
The paper aims to build a traffic prediction model for online car-hailing demand with an improved LSTM and Transformer. Many factors affect demand prediction, such as temporal features, spatial features, and a low signal-to-noise ratio. In this study, LSTM and Transformer are used to extract the temporal and spatial features of the data, respectively. The temporal and spatial features are used for bagging ensemble learning to predict online car-hailing orders. An improved LSTM suitable for traffic data sets is proposed in this paper. In LSTM, we use wavelet decomposition and reconstruction to extract the small-amplitude high-frequency signal from the data. The extracted signal is translated and superimposed on the results of the model. An average MAE of 21.24 is obtained on the Online Car-hailing Information Data Sets. Compared with other methods, the results suggest that the proposed traffic demand prediction model has better accuracy.

1. INTRODUCTION

In the 21st century, with the rise of mobile Internet, online car-hailing has become an important part of the smart city. Due to weather, holidays, geographical location, and other factors, orders in different regions are difficult to predict in the business scenario of online car-hailing. At present, traffic demand prediction based on big data analysis has become the main research direction. With the help of big data analysis, features from the data are extracted by machine learning to build the traffic demand prediction model.

The traffic demand prediction model uses time series prediction methods, which include linear models and nonlinear models. Linear models, such as ARIMA and ARIMAX, are widely used and fit linear time series well. However, traffic demand is usually a nonlinear time series, for which linear models are ineffective. Therefore, many nonlinear models based on machine learning, such as RNN and LSTM, have been proposed in recent years. Shahid et al. [1] showed that LSTM has better performance as a prediction model. Song et al. [2] divided LSTM into two subnetworks: a temporal attention subnetwork and a spatial attention subnetwork. Hu et al. [3] proposed AB-ConvLSTM to predict traffic speed; the model is based on LSTM and includes two Bi-LSTMs, a convolutional LSTM, and an attention mechanism. Tian et al. [4] used a multiscale time smoothing method for missing data inference and learned the residual with LSTM. Zhang et al. [5] proposed LSTM-XGBoost, which combines LSTM and XGBoost for short-term road traffic flow prediction. Dogan [6] showed that enlarging the data set improves the performance of LSTM in traffic flow prediction and proposed a clustering model to adjust the size of the data set. Ma et al. [7] proposed a convolutional LSTM to predict short-term traffic flow. Mou et al. [8] proposed an improved LSTM based on temporal information enhancement for traffic flow prediction. Luo et al. [9] proposed a traffic flow prediction model that combines LSTM and KNN for spatiotemporal prediction; in that model, KNN extracts the spatial features of traffic flow and selects neighboring stations. Liu et al. [10] proposed a novel ocean-temperature prediction model in which the matrix is fused based on historical observation data. Wang et al. [11] proposed a novel model, consisting of a seven-layer Bi-LSTM and an output layer, for respiration motion prediction. Chen et al. [12] proposed an improved LSTM for rainfall-runoff prediction, in which trend features of the time series are extracted and the nonlinear mapping relationship is identified. Zheng et al. [13] selected the important influencing factors by grey correlation analysis and improved LSTM with a linear structure for temperature prediction. Chen [14] proposed an improved LSTM to predict deformable mirror voltages.

The Online Car-hailing Information Data Sets contain both temporal and spatial features, which LSTM and Transformer extract respectively. The spatial features of the data include weather, temperature, PM2.5, traffic jam information, POI value, holidays, and so on. The temporal and spatial features are then used for ensemble learning to predict online car-hailing orders. In LSTM, the high-frequency signal in the data affects the feature extraction of the model. We use wavelet decomposition and reconstruction to extract the small-amplitude high-frequency signal in the data, which is similar to a random Gaussian signal. This signal does not need to be predicted; it only needs to be translated and superimposed on the prediction. Predicting the low-frequency signal and superimposing the high-frequency signal overcomes the problem of the low signal-to-noise ratio in the data sets. The proposed method is verified on the Online Car-hailing Information Data Sets, and the results suggest that the proposed prediction model performs better than the compared methods.

2. ONLINE CAR-HAILING INFORMATION DATA SETS AND DATA PROCESSING

2.1 Data overview

The traffic data sets used in this study, named Online Car-hailing Information Data Sets, come from the DIDI Taxi GAIA Initiative and record the online car-hailing information of a city in January 2016. They include Area Data Sets, Order Data Sets, Weather Data Sets, Traffic Jam Information Data Sets, and POI Data Sets. Area Data Sets contain the area list and area hash values. Order Data Sets record the order data of online car-hailing. Because traffic demand is affected by other factors, the weather is recorded in Weather Data Sets and the traffic jam information is recorded in Traffic Jam Information Data Sets. The POI value, which represents the attributes of an area, is recorded in POI Data Sets. In total, the Online Car-hailing Information Data Sets contain about 8.5 million data records; the information contained in each data set is shown in Table 1.

Table 1. The information contained in each data set.

| Data sets | Information |
| --- | --- |
| Area data sets | Area hash value, Area ID |
| Order data sets | Order ID, passenger ID, driver ID, order price, order start area hash value, order destination area hash value, time |
| Weather data sets | Time, weather information, temperature, PM2.5 |
| Traffic jam information data sets | Area hash value, traffic jam information, time |
| POI data sets | Area hash value, POI value |

2.2 Data cleaning

We analyzed the data sets and eliminated invalid and unreasonable data. For example, some area hash values in the Order Data Sets have no corresponding area ID.
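The cleaning rule in the example above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the field names `order_id` and `start_area_hash`, the hash strings, and the helper `drop_unknown_areas` are all hypothetical.

```python
def drop_unknown_areas(orders, area_ids):
    """Keep only orders whose start-area hash has a known area ID."""
    return [order for order in orders if order["start_area_hash"] in area_ids]


# Hypothetical lookup from area hash value to area ID.
area_ids = {"a1f3": 1, "9c2e": 2}
orders = [
    {"order_id": "o1", "start_area_hash": "a1f3"},
    {"order_id": "o2", "start_area_hash": "ffff"},  # no matching area ID
    {"order_id": "o3", "start_area_hash": "9c2e"},
]
clean = drop_unknown_areas(orders, area_ids)
print([o["order_id"] for o in clean])  # ['o1', 'o3']
```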

2.3 Data preprocessing

For data analysis, the data is normalized by the min-max method as:

$$x_i' = \frac{x_i - \min(x)}{\max(x) - \min(x)}$$

where xi′ is the normalized data, xi is the original data in the data sets, min(*) is a function, which returns the minimum value of the input value, and max (*) is a function, which returns the maximum value of the input value.
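As a small worked example of min-max normalization, the following Python sketch scales a made-up list of values into [0, 1]. The handling of a constant column (all values equal) is an assumption of this sketch; the text does not say how that case is treated.

```python
def min_max_normalize(values):
    """Scale values linearly into [0, 1] using the min-max method."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Constant column: the formula would divide by zero, so map to 0.0
        # (a choice made for this sketch, not stated in the paper).
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


temperatures = [2.0, 4.0, 6.0, 10.0]  # made-up sample values
normalized = min_max_normalize(temperatures)
print(normalized)  # [0.0, 0.25, 0.5, 1.0]
```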

In Weather Data Sets, Traffic Jam Information Data Sets, and POI Data Sets, multiple data exist at the same time or in the same area. We take the average of multiple data as:

$$x' = \frac{1}{n}\sum_{i=1}^{n} x_i$$

where x1, x2, ⋯, xn are the multiple values recorded at the same time or in the same area, n is the number of values, and x′ is their average. In Order Data Sets, an order whose driver ID is NULL represents a gap order. We count the number of gap orders at each time; this count is the quantity to be predicted. If a value is NULL, we replace it with the average of the values near it.
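The three preprocessing rules in this subsection (averaging duplicates, counting gap orders, and filling NULL values) might look as follows in Python. This is a hedged sketch: the record layouts are invented, and "the values near it" is interpreted here as the nearest non-missing value on each side, which the text does not specify.

```python
from collections import defaultdict


def average_by_key(records):
    """Average all values that share the same key (a time slot or an area)."""
    groups = defaultdict(list)
    for key, value in records:
        groups[key].append(value)
    return {key: sum(vals) / len(vals) for key, vals in groups.items()}


def count_gaps(orders):
    """Count orders per time slot whose driver ID is missing (None)."""
    gaps = defaultdict(int)
    for time_slot, driver_id in orders:
        if driver_id is None:
            gaps[time_slot] += 1
    return dict(gaps)


def fill_missing(series):
    """Replace None with the mean of the nearest non-missing neighbours."""
    filled = list(series)
    for i, value in enumerate(series):
        if value is None:
            left = next((series[j] for j in range(i - 1, -1, -1)
                         if series[j] is not None), None)
            right = next((series[j] for j in range(i + 1, len(series))
                          if series[j] is not None), None)
            neighbours = [v for v in (left, right) if v is not None]
            # Leave the gap in place if the whole series is missing.
            filled[i] = sum(neighbours) / len(neighbours) if neighbours else None
    return filled


print(average_by_key([("t1", 10.0), ("t1", 20.0), ("t2", 5.0)]))
print(count_gaps([("t1", None), ("t1", "d7"), ("t2", None), ("t1", None)]))
print(fill_missing([1.0, None, 3.0, None]))
```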

2.4 Data augmentation

After data cleaning and data preprocessing, 183032 groups of data are obtained. To enlarge the features of the traffic data sets and improve the generalization ability of the proposed traffic demand prediction model, data augmentation is used. Each group of data includes gap, weather information, temperature, PM2.5, traffic jam information, POI value, and holiday. The MixUp method is used, and the coefficients α and β of the Beta distribution are taken as:

00090_PSISDG12506_125061Y_page_3_1.jpg

After data augmentation, 3666064 groups of data can be obtained.
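For readers unfamiliar with MixUp, the sketch below blends two groups of data with a weight drawn from a Beta distribution, which is the general form of the method. The values `alpha=0.5` and `beta=0.5` are placeholders chosen for this illustration only, and the two sample groups are made up; neither comes from the paper.

```python
import random


def mixup(group_a, group_b, alpha, beta, rng):
    """Blend two feature groups with a weight drawn from Beta(alpha, beta)."""
    lam = rng.betavariate(alpha, beta)
    return [lam * a + (1.0 - lam) * b for a, b in zip(group_a, group_b)]


rng = random.Random(0)  # fixed seed so the sketch is reproducible
# Made-up groups: [gap, weather, temperature, PM2.5, traffic jam, POI, holiday]
g1 = [12.0, 1.0, 4.0, 80.0, 2.0, 0.3, 0.0]
g2 = [30.0, 2.0, 6.0, 120.0, 3.0, 0.5, 1.0]
# alpha and beta are placeholder values, not the coefficients used in the paper.
mixed = mixup(g1, g2, alpha=0.5, beta=0.5, rng=rng)
print(mixed)
```

Each mixed feature lies between the two original values, so repeated sampling produces new groups that interpolate between existing ones.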

3. IMPROVED LSTM BASED ON WAVELET ANALYSIS

In the traffic prediction model, the high-frequency signal in the data affects the extraction of data features by LSTM. Wavelet analysis can effectively extract the high-frequency signal in the data and separate signals with different frequencies. Therefore, we use wavelet analysis to decompose and reconstruct the signal and propose a new traffic prediction model based on LSTM.

3.1 LSTM

LSTM [14] is used for time series prediction and effectively solves the long-term dependency problem of RNN. LSTM includes an output gate, an input gate, and a forget gate. The specific structure of LSTM is shown in Figure 1.

Figure 1. LSTM specific structure.

where ht is the output value at time t, Xt is the input value at time t, Ct is the state value at time t, sigmoid() represents the feedforward neural network with a sigmoid activation function, tanh() represents the feedforward neural network with a tanh activation function.
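For reference, the standard LSTM update that such a structure usually depicts can be written as below. The weight matrices W and biases b, the candidate state, and the gate symbols f, i, and o are notation assumed here rather than taken from the paper; σ denotes the sigmoid function and ⊙ element-wise multiplication.

```latex
\begin{aligned}
f_t &= \sigma\left(W_f [h_{t-1}, X_t] + b_f\right) \\
i_t &= \sigma\left(W_i [h_{t-1}, X_t] + b_i\right) \\
o_t &= \sigma\left(W_o [h_{t-1}, X_t] + b_o\right) \\
\tilde{C}_t &= \tanh\left(W_C [h_{t-1}, X_t] + b_C\right) \\
C_t &= f_t \odot C_{t-1} + i_t \odot \tilde{C}_t \\
h_t &= o_t \odot \tanh(C_t)
\end{aligned}
```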

3.2 Wavelet decomposition and reconstruction

Wavelet analysis [15] is used to effectively extract important information from data. Through the operations of stretching and translation, the signal can be analysed in multi-scale detail, so that its detailed features can be brought into focus. In recent years, wavelet analysis has been increasingly used to process time series.

The time series is decomposed by wavelet decomposition, which is divided into two parts: the low-frequency coefficient and the high-frequency coefficient. The wavelet decomposition of time series is as follows:

$$y_t = h_0 * X_t, \qquad z_t = h_1 * X_t$$

where h1 is the high-pass decomposition filter, h0 is the low-pass decomposition filter, zt is the high-frequency coefficient, yt is the low-frequency coefficient, and * is the convolution operator.

Time series can be reconstructed by wavelet reconstruction. Wavelet reconstruction can calculate the low-frequency component from the low-frequency coefficients and calculate the high-frequency component from the high-frequency coefficients. The wavelet reconstruction of time series is as follows:

$$Y_t = g_0 * y_t, \qquad Z_t = g_1 * z_t$$

where g1 is the high-pass reconstruction filter, g0 is the low-pass reconstruction filter, Zt is the high-frequency signal, and Yt is the low-frequency signal.

Using wavelet decomposition and reconstruction, the relationship between the low-frequency signal Yt, the high-frequency signal Zt and time series Xt is as follows:

$$X_t = Y_t + Z_t$$
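The additive relationship between the low-frequency signal, the high-frequency signal, and the original series can be checked with a one-level Haar wavelet, the simplest wavelet basis. Haar is swapped in here purely to keep the sketch short and dependency-free; it is not the wavelet basis used in this paper, and the sample series is made up.

```python
import math

SQRT2 = math.sqrt(2.0)


def haar_decompose(x):
    """One-level Haar decomposition of an even-length series.

    Returns (y, z): the low-frequency and high-frequency coefficients.
    """
    y = [(x[i] + x[i + 1]) / SQRT2 for i in range(0, len(x), 2)]
    z = [(x[i] - x[i + 1]) / SQRT2 for i in range(0, len(x), 2)]
    return y, z


def haar_reconstruct(y, z):
    """Rebuild the low-frequency signal Y and the high-frequency signal Z."""
    Y, Z = [], []
    for yk, zk in zip(y, z):
        Y += [yk / SQRT2, yk / SQRT2]    # low-pass branch only
        Z += [zk / SQRT2, -zk / SQRT2]   # high-pass branch only
    return Y, Z


x = [4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0]  # made-up series
y, z = haar_decompose(x)
Y, Z = haar_reconstruct(y, z)
# The two reconstructed signals add back up to the original series.
print(all(abs(a + b - c) < 1e-9 for a, b, c in zip(Y, Z, x)))
```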

3.3 Improved LSTM

The paper improves LSTM by wavelet decomposition and reconstruction, which extract the high-frequency signal from the data sets. The Db10 wavelet basis and a soft threshold function are used, and the number of wavelet decomposition layers is 3. The process of wavelet decomposition and reconstruction is shown in Figure 2.

Figure 2. The process of wavelet decomposition and reconstruction.

After three-layer wavelet decomposition and reconstruction, the high-frequency signal Z3 is obtained. The soft threshold function is used to filter the high-frequency signal D3. The soft threshold function is as follows:

00090_PSISDG12506_125061Y_page_4_4.jpg

where λ is the threshold, the left-hand side is the estimated wavelet coefficient, Wj,k is the wavelet coefficient after decomposition, T is taken between 0 and 1, and sgn(*) is the sign function.
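As a point of reference, the textbook soft-threshold rule shrinks every coefficient toward zero by the threshold and zeroes those whose magnitude does not exceed it. The sketch below implements only that textbook form; any additional parameter, such as T in the text, is not modelled here, and the sample coefficients are made up.

```python
def soft_threshold(w, lam):
    """Textbook soft threshold: shrink |w| by lam, zeroing small coefficients."""
    if abs(w) <= lam:
        return 0.0
    sign = 1.0 if w > 0 else -1.0
    return sign * (abs(w) - lam)


coeffs = [0.2, -0.5, 1.5, -3.0]  # made-up wavelet coefficients
shrunk = [soft_threshold(w, lam=1.0) for w in coeffs]
print(shrunk)  # [0.0, 0.0, 0.5, -2.0]
```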

Part of the data is selected as an example. Wavelet decomposition and reconstruction divides the signal into two parts: the low-frequency signal and the high-frequency signal. We believe that the small-amplitude high-frequency signal should not be focused on by the prediction model. Therefore, it is filtered to prevent it from affecting temporal features extracted by LSTM. The time series and the low-frequency signal are shown in Figures 3a and 3b respectively.

Figure 3. Time series and the low-frequency signal.

The filtered small-amplitude high-frequency signal is shown in Figure 4.

Figure 4. The filtered small-amplitude high-frequency signal.

From Figure 4, the small-amplitude high-frequency signal is similar to a Gaussian signal. It contains few valuable features to extract, and it interferes with the extraction of temporal features from the signal. In contrast, the low-frequency signal contains almost all the features of the data and directly determines the trend of the predicted signal. Therefore, LSTM should focus on the low-frequency signal, and the high-frequency signal only needs to be translated and superimposed on the prediction results. The flow chart of the proposed Improved LSTM is shown in Figure 5.

Figure 5. The flow chart of the proposed Improved LSTM.

The steps of Improved LSTM are as follows:

Step 1: The time series is decomposed and reconstructed by wavelet as:

$$X_{input} = Y_{input} + Z_{input}$$

where Xinput is the time series, which is input into the model, Yinput and Zinput are the low-frequency signal and the high-frequency signal respectively.

Step 2: LSTM is used to predict the low-frequency signal as:

00090_PSISDG12506_125061Y_page_6_3.jpg

where f (*) is LSTM, n is the length of the prediction signal, and Youtput is the prediction signal output by LSTM.

Step 3: The high-frequency signal is translated as:

00090_PSISDG12506_125061Y_page_6_4.jpg

Step 4: The two signals obtained in the previous two steps are superimposed as:

00090_PSISDG12506_125061Y_page_6_5.jpg

where Xoutput is the prediction series.
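The four steps above can be sketched end to end in plain Python. Several parts are assumptions made for illustration: pairwise means (a one-level Haar split) stand in for the three-layer Db10 transform, `predict_low_frequency` is a hypothetical placeholder where a trained LSTM would go, and "translate" is interpreted as reusing the last n high-frequency samples on the forecast window.

```python
def split_frequencies(x):
    """Step 1: split an even-length series into low- and high-frequency parts.

    Uses pairwise means (one-level Haar) as a stand-in for the Db10 transform.
    """
    low, high = [], []
    for i in range(0, len(x), 2):
        mean = (x[i] + x[i + 1]) / 2.0
        low += [mean, mean]
        high += [x[i] - mean, x[i + 1] - mean]
    return low, high


def predict_low_frequency(low, n):
    """Step 2 placeholder: a trained LSTM would go here.

    This stand-in simply repeats the last observed value n times.
    """
    return [low[-1]] * n


def translate_high_frequency(high, n):
    """Step 3: shift the last n high-frequency samples onto the forecast window."""
    return high[-n:]


def improved_forecast(x, n):
    """Step 4: superimpose the predicted low part and the translated high part."""
    low, high = split_frequencies(x)
    y_out = predict_low_frequency(low, n)
    z_out = translate_high_frequency(high, n)
    return [y + z for y, z in zip(y_out, z_out)]


series = [10.0, 12.0, 20.0, 18.0, 30.0, 34.0]  # made-up gap counts
forecast = improved_forecast(series, n=2)
print(forecast)  # [30.0, 34.0]
```

The point of the structure is that only the smooth component passes through the learned predictor, while the noise-like component bypasses it and is added back at the end.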

The pseudo-code of Improved LSTM is shown in Table 2.

Table 2. The pseudo-code of Improved LSTM.

4. BAGGING ENSEMBLE LEARNING

Traffic data sets have significant temporal and spatial features. To make the best use of features, the paper proposes a traffic demand prediction method based on Bagging Ensemble Learning. Improved LSTM proposed above is used in the method.

The time series in the data is used, and the time series prediction model is proposed as:

00090_PSISDG12506_125061Y_page_7_1.jpg

where xt is the gap at time t and f(*) is the time series prediction model, for which the proposed Improved LSTM can be used. The temporal features of the data are extracted by LSTM.

The features of other dimensions in the data are used, and the regression model is proposed as:

$$x_t = h(p_1, p_2, \cdots, p_6)$$

where xt is the gap at time t; p1, p2, ⋯, p6 are weather information, temperature, PM2.5, traffic jam information, POI value, and holiday; and h(*) is the regression model, for which Transformer can be used. The spatial features of the data are extracted by Transformer. Online car-hailing demand is related to these factors, and their impact should be considered. Transformer [16] consists of N self-attention modules, and each module includes a multi-head self-attention model and a feedforward network layer. The multi-head self-attention model is:

00090_PSISDG12506_125061Y_page_7_3.jpg

where Q is a query value, {ki, vi}M is the set of a key-value pair, attention(*) is the multi-head self-attention model, and V is the output.
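To make the attention step concrete, the sketch below computes scaled dot-product attention for a single query over M key-value pairs, which is the building block that a multi-head model runs several times in parallel. It is a single-head illustration in plain Python with made-up vectors, not the configuration used in this paper.

```python
import math


def attention(query, keys, values):
    """Scaled dot-product attention for one query over M key-value pairs."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
              for key in keys]
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]  # softmax over the M keys
    dim = len(values[0])
    output = [sum(w * v[j] for w, v in zip(weights, values)) for j in range(dim)]
    return output, weights


q = [1.0, 0.0]
ks = [[1.0, 0.0], [0.0, 1.0]]
vs = [[10.0, 0.0], [0.0, 10.0]]
out, w = attention(q, ks, vs)
print(out, w)
```

The query matches the first key more closely, so the first value receives the larger weight in the output.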

Each of the two models above has limitations on its own and cannot analyze all the information in the data sets. To use as much information as possible, we establish the time series prediction model and the regression model separately, so that together they extract the temporal and spatial features of the data. To combine their advantages and obtain better results, the two models are combined through Bagging Ensemble Learning as:

$$X(t) = \lambda X_{LSTM}(t) + (1 - \lambda) X_{Transformer}(t)$$

where λ is the coefficient, which determines the proportion of the two models, XLSTM(t) is the result of the time series prediction model, XTransformer(t) is the result of the regression model, and X(t) is the result of Bagging Ensemble Learning. The flow chart of the proposed traffic demand prediction model is shown in Figure 6.
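A weighted combination of two model outputs can be sketched as follows. Two things are assumptions of this sketch: λ is taken as the share given to the LSTM output with 1 − λ going to the Transformer output, and λ is chosen by a simple grid search on validation MAE. The text does not state how the coefficient is selected, and all numbers below are made up.

```python
def mae(pred, target):
    """Mean absolute error between two equal-length sequences."""
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(target)


def combine(lstm_pred, transformer_pred, lam):
    """Weighted average; lam is the share assigned to the LSTM output."""
    return [lam * a + (1.0 - lam) * b
            for a, b in zip(lstm_pred, transformer_pred)]


def best_lambda(lstm_pred, transformer_pred, target, steps=10):
    """Grid-search lam in [0, 1] for the lowest validation MAE."""
    candidates = [i / steps for i in range(steps + 1)]
    return min(
        candidates,
        key=lambda lam: mae(combine(lstm_pred, transformer_pred, lam), target),
    )


# Made-up validation values, for illustration only.
target = [20.0, 25.0, 30.0]
lstm_pred = [22.0, 27.0, 32.0]         # over-predicts by 2
transformer_pred = [18.0, 23.0, 28.0]  # under-predicts by 2
lam = best_lambda(lstm_pred, transformer_pred, target)
print(lam, mae(combine(lstm_pred, transformer_pred, lam), target))  # 0.5 0.0
```

In this toy case the two models err in opposite directions, so an equal weighting cancels the errors exactly; on real data the best coefficient would generally differ.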

Figure 6. The flow chart of the proposed traffic demand prediction model.

5. EXPERIMENT AND RESULTS

To verify the effectiveness of the proposed traffic demand prediction model, experiments are conducted on the Online Car-hailing Information Data Sets. The training of the model achieved good results. The parameters set for the Improved LSTM and Transformer are shown in Table 3.

Table 3. The parameters set for the Improved LSTM and Transformer.

| Parameter | LSTM | Transformer |
| --- | --- | --- |
| Epochs | 30 | 100 |
| Batch size | 16 | 16 |
| Loss function | MAE | MAE |
| Optimizer | Adam | Adam |
| Learning rate | 0.003 | 0.0002 |

The results of the two models are used for Bagging Ensemble Learning as:

00090_PSISDG12506_125061Y_page_7_6.jpg

The results on the validation set are shown in Figure 7a. The correlation between variables is shown in Figure 7b.

Figure 7. The results of the validation set and the correlation between variables.

In Figure 7b, the labels 1, 2, 3, 4, 5, and 6 represent weather information, temperature, PM2.5, traffic jam information, POI value, and holiday respectively, and 7 represents gap. From Figure 7b, gap has a strong relationship with PM2.5 and POI value. More attention should be paid to these variables in data analysis.

To verify the effectiveness of the proposed method, we compare it with a variety of time series prediction models on the Online Car-hailing Information Data Sets, including ARIMA, ARIMAX, RNN, and LSTM. MAE is used as the evaluation standard. The verification results on Online Car-hailing Information Data Sets are shown in Table 4.

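Table 4. The verification results on Online Car-hailing Information Data Sets.

| Method | Area 1 | Area 2 | Area 3 | Area 65 | Area 66 | Average |
| --- | --- | --- | --- | --- | --- | --- |
| ARIMA | 28.94 | 24.39 | 25.42 | 28.78 | 26.43 | 27.45 |
| ARIMAX | 26.53 | 22.89 | 23.21 | 23.31 | 26.32 | 24.51 |
| RNN | 23.34 | 24.52 | 34.96 | 32.32 | 25.31 | 23.52 |
| LSTM | 20.43 | 21.13 | 21.23 | 22.32 | 24.53 | 22.43 |
| The proposed method | 19.31 | 22.34 | 20.42 | 21.31 | 22.13 | 21.24 |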

From Table 4, the proposed method achieves the lowest error in four of the five listed areas (LSTM is lower in Area 2) and the lowest average error (21.24).
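The comparison can be reproduced directly from the values in Table 4 with a short script that finds the best (lowest-error) method in each column. The ARIMA entry for Area 1 is read as 28.94, and the dictionary layout is just one convenient way to hold the table.

```python
table4 = {
    "ARIMA": [28.94, 24.39, 25.42, 28.78, 26.43, 27.45],
    "ARIMAX": [26.53, 22.89, 23.21, 23.31, 26.32, 24.51],
    "RNN": [23.34, 24.52, 34.96, 32.32, 25.31, 23.52],
    "LSTM": [20.43, 21.13, 21.23, 22.32, 24.53, 22.43],
    "Proposed": [19.31, 22.34, 20.42, 21.31, 22.13, 21.24],
}
columns = ["Area 1", "Area 2", "Area 3", "Area 65", "Area 66", "Average"]

# For each column, pick the method with the smallest error.
best = {
    col: min(table4, key=lambda method: table4[method][i])
    for i, col in enumerate(columns)
}
for col, method in best.items():
    print(f"{col}: {method}")
```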

6. CONCLUSIONS

In this study, a novel traffic demand prediction model based on improved LSTM and Transformer is proposed. Compared with the other research, this study provides the following innovations:

  • Multiple features of data are used, and different models are proposed to extract temporal and spatial features. Bagging Ensemble Learning is used to predict online car-hailing orders.

  • An Improved LSTM based on wavelet decomposition and reconstruction is proposed, which is suitable for data sets with the high-frequency signal.

In summary, this study analyzes the temporal and spatial features of Traffic data sets and proposes a novel traffic demand prediction model. On the Online Car-hailing Information Data Sets, the proposed model achieves a lower average error (21.24) than ARIMA, ARIMAX, RNN, and LSTM.

ACKNOWLEDGMENTS

This research was funded by the National Natural Science Foundation of China (Grant No.71971013) and the Fundamental Research Funds for the Central Universities (YWF-22-L-943). The study was also sponsored by the Graduate Student Education & Development Foundation of Beihang University.

REFERENCES

[1] Shahid, F., Zameer, A. and Muneeb, M., “Predictions for COVID-19 with deep learning models of LSTM, GRU and Bi-LSTM,” Chaos, Solitons & Fractals, 140 110212 (2020). https://doi.org/10.1016/j.chaos.2020.110212

[2] Song, S., Lan, C., Xing, J., Zeng, W. and Liu, J., “An end-to-end spatio-temporal attention model for human action recognition from skeleton data,” Association for the Advancement of Artificial Intelligence, 1–7 (2017).

[3] Hu, X., Liu, T., Hao, X. and Lin, C., “Attention-based Conv-LSTM and Bi-LSTM networks for large-scale traffic speed prediction,” The Journal of Supercomputing, 10 1–24 (2022).

[4] Tian, Y., Zhang, K., Li, J., Lin, X. and Yang, B., “LSTM-based traffic flow prediction with missing data,” Neurocomputing, 318 297–305 (2018). https://doi.org/10.1016/j.neucom.2018.08.067

[5] Zhang, X. and Zhang, Q., “Short-term traffic flow prediction based on LSTM-XGBoost combination model,” Computer Modeling in Engineering & Sciences, 125 95–109 (2020). https://doi.org/10.32604/cmes.2020.011013

[6] Dogan, E., “LSTM training set analysis and clustering model development for short-term traffic flow prediction,” Neural Computing and Applications, 33 11175–11188 (2021). https://doi.org/10.1007/s00521-020-05564-5

[7] Ma, Y., Zhang, Z. and Ihler, A., “Multi-lane short-term traffic forecasting with convolutional LSTM network,” IEEE Access, 8 34629–34643 (2020). https://doi.org/10.1109/Access.6287639

[8] Mou, L., Zhao, P., Xie, H. and Chen, Y., “T-LSTM: A long short-term memory neural network enhanced by temporal information for traffic flow prediction,” IEEE Access, 7 98053–98060 (2019). https://doi.org/10.1109/Access.6287639

[9] Luo, X., Li, D., Yang, Y. and Zhang, S., “Spatiotemporal traffic flow prediction with KNN and LSTM,” Journal of Advanced Transportation, 1 537–546 (2019).

[10] Liu, J., Zhang, T., Han, G. and Gou, Y., “TD-LSTM: Temporal dependence-based LSTM networks for marine temperature prediction,” Sensors, 18 3797 (2018). https://doi.org/10.3390/s18113797

[11] Wang, R., Liang, X., Zhu, X. and Xie, Y., “A feasibility of respiration prediction based on deep Bi-LSTM for real-time tumor tracking,” IEEE Access, 6 51262–51268 (2018). https://doi.org/10.1109/ACCESS.2018.2869780

[12] Chen, Y. and Xu, J., “Rainfall-runoff short-term forecasting method based on LSTM,” in Journal of Physics: Conference Series, 012005 (2021).

[13] Zheng, L., Ye, X. and Chen, F., “Research on main steam temperature prediction model based on improved LSTM algorithm,” in Journal of Physics: Conference Series, 012055 (2020).

[14] Chen, Y., “Voltages prediction algorithm based on LSTM recurrent neural network,” Optik-International Journal for Light and Electron Optics, 220 164869 (2020). https://doi.org/10.1016/j.ijleo.2020.164869

[15] Lu, J., Hong, L., Dong, Y. and Zhang, Y., “A new wavelet threshold function and denoising application,” Mathematical Problems in Engineering, 5 1–8 (2016).

[16] Zheng, S., Lu, J., Zhao, H. and Zhu, X., “Rethinking semantic segmentation from a sequence-to-sequence perspective with transformers,” in IEEE Conf. on Computer Vision and Pattern Recognition, 6881–6890 (2021).
© (2022) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Tianhuai Wang, Yipeng Li, Wenbing Chang, and Shenghan Zhou "A bagging ensemble learning traffic demand prediction model based on improved LSTM and transformer", Proc. SPIE 12506, Third International Conference on Computer Science and Communication Technology (ICCSCT 2022), 125061Y (28 December 2022); https://doi.org/10.1117/12.2662894
KEYWORDS
Data modeling

Wavelets

Transformers

Feature extraction

Data analysis

Systems modeling

Electronic filtering
