Paper
1 June 2023 Risk prediction model of bank telecommunication fraud based on XGBoost
Siyuan Wu, Derong Yang, Wenjun Ge, Baoqin Chen
Proceedings Volume 12718, International Conference on Cyber Security, Artificial Intelligence, and Digital Economy (CSAIDE 2023); 127182H (2023) https://doi.org/10.1117/12.2681646
Event: International Conference on Cyber Security, Artificial Intelligence, and Digital Economy (CSAIDE 2023), 2023, Nanjing, China
Abstract
The digital economy is booming, but cybercrime and telecommunication fraud continue to emerge. Detecting fraudulent behaviours and preventing such crimes is a significant challenge. This paper conducts data mining and analysis on a bank card telecommunication fraud data set. First, data mining and feature engineering are performed on the given data set: data integrity is analyzed, overall descriptive statistics are computed, the data are standardized with the Z-Score method, feature correlation is explored with the Pearson correlation coefficient, the data set is balanced with the SMOTE method, and finally the data are divided into a training set and a test set. Subsequently, four machine learning classification models (logistic regression, KNN, decision tree and XGBoost) are established to make preliminary predictions and classifications of fraudulent behaviours. To mine the data set further, the optimal configuration of each of the four models is obtained by grid-search tuning and cross-validation. In the experiments, the prediction accuracy rates of the logistic regression, KNN, decision tree and XGBoost classification models on the test set are 93.45%, 99.85%, 99.92%, and 99.94%, respectively. On this basis, it is preliminarily concluded that the XGBoost and decision tree classification models have excellent classification capabilities. The four optimal models are then evaluated further on the test set using three performance indicators: prediction accuracy, recall rate and F1 value. Comparative analysis shows that the XGBoost classification model performs best.
Owing to its classification ability, strong generalization ability and robustness, XGBoost is selected as the final bank card telecommunication fraud prediction model. In addition, the P-R curve and ROC curve of the classification results are drawn so that the performance evaluation can be read intuitively. Analysis of these curves further shows that XGBoost has better generalization ability.
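The abstract names three computations by formula: Z-Score standardization, the Pearson correlation coefficient, and the accuracy, recall and F1 indicators. The sketch below illustrates them in plain Python. It is not the authors' code: the function names, the choice of population standard deviation, and the toy labels in the usage note are assumptions made for illustration only.

```python
import math
from statistics import mean, pstdev


def z_score(values):
    """Standardize values to zero mean and unit standard deviation.

    Assumption: the population standard deviation is used; the paper
    does not say which variant it applies.
    """
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A constant feature carries no spread; map it to zeros.
        return [0.0] * len(values)
    return [(v - mu) / sigma for v in values]


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        # Correlation is undefined for a constant sequence; report 0.0.
        return 0.0
    return cov / (sx * sy)


def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for binary labels (1 = fraud)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }
```

As a usage example on made-up labels, `classification_metrics([1, 1, 0, 0, 0, 0, 0, 0, 0, 0], [1, 0, 0, 0, 0, 0, 0, 0, 0, 0])` gives an accuracy of 0.9 but a recall of only 0.5, which is one reason to report recall and F1 alongside accuracy when fraud cases are rare.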
© (2023) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Siyuan Wu, Derong Yang, Wenjun Ge, and Baoqin Chen "Risk prediction model of bank telecommunication fraud based on XGBoost", Proc. SPIE 12718, International Conference on Cyber Security, Artificial Intelligence, and Digital Economy (CSAIDE 2023), 127182H (1 June 2023); https://doi.org/10.1117/12.2681646
KEYWORDS
Data modeling
Visual process modeling
Telecommunications
Analytic models
Performance modeling
Decision trees
Statistical modeling