Risk prediction model of bank telecommunication fraud based on XGBoost

Siyuan Wu; Derong Yang; Wenjun Ge; Baoqin Chen

doi:10.1117/12.2681646

1 June 2023

Proceedings Volume 12718, International Conference on Cyber Security, Artificial Intelligence, and Digital Economy (CSAIDE 2023); 127182H (2023) https://doi.org/10.1117/12.2681646
Event: International Conference on Cyber Security, Artificial Intelligence, and Digital Economy (CSAIDE 2023), 2023, Nanjing, China

Abstract

The digital economy is booming, but cybercrime and telecommunication fraud continue to emerge. Detecting fraudulent behaviours and preventing such crimes is a significant challenge. This paper conducts data mining and analysis on a bank card telecommunication fraud data set. First, data mining and feature engineering are performed on the data set: data integrity is analyzed, overall descriptive statistics are computed, the data are standardized using the Z-score method, feature correlation is explored using the Pearson correlation coefficient, the data set is balanced using the SMOTE method, and the data are finally divided into a training set and a test set. Subsequently, four machine learning classification models (logistic regression, KNN, decision tree and XGBoost) are established to make a preliminary prediction and classification of fraudulent behaviours. To further mine the data set, the optimal configuration of each of the four models is obtained by grid search tuning with cross-validation. In the experiments, the prediction accuracy rates of the logistic regression, KNN, decision tree and XGBoost classification models on the test set are 93.45%, 99.85%, 99.92% and 99.94%, respectively, which preliminarily indicates that the XGBoost and decision tree classification models have excellent classification capabilities. The four optimal models are then used to compute three performance evaluation indicators on the test set (prediction accuracy, recall rate and F1 value) to evaluate the four machine learning models further. Comparative analysis shows that the XGBoost classification model performs best. Owing to its classification ability, strong generalization ability and robustness, it is selected as the final bank card telecommunication fraud prediction model. In addition, the P-R curve and ROC curve of the classification results are drawn from the performance evaluation indicators to present the models' performance intuitively. This analysis further shows that XGBoost has better generalization ability.
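
The preprocessing and evaluation quantities named in the abstract can be illustrated with a minimal sketch. It uses only the Python standard library and does not reproduce the authors' pipeline: SMOTE, grid search and the four classifiers are omitted, and the function names (`z_score`, `pearson`, `classification_metrics`) and the toy data are hypothetical. Precision is computed alongside accuracy, recall and F1 because F1 is defined from precision and recall.

```python
import math
from statistics import mean, pstdev


def z_score(values):
    """Standardize a feature column to zero mean and unit population variance."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return [0.0] * len(values)  # a constant column has no spread to scale
    return [(v - mu) / sigma for v in values]


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length columns."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # correlation is undefined for a constant column
    return cov / math.sqrt(var_x * var_y)


def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1 for the positive (fraud) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical toy data: 10 transactions, 2 of them fraudulent (label 1).
amounts = [12.0, 15.5, 9.9, 480.0, 11.2, 13.4, 520.0, 10.1, 14.8, 12.7]
y_true = [0, 0, 0, 1, 0, 0, 1, 0, 0, 0]
y_pred = [0, 0, 0, 1, 0, 0, 0, 0, 0, 0]  # one fraud case is missed

scaled = z_score(amounts)                 # standardized transaction amounts
corr = pearson(amounts, y_true)           # feature-to-label correlation
metrics = classification_metrics(y_true, y_pred)
```

On this toy data the classifier reaches an accuracy of 0.9 while its recall is only 0.5, since one of the two fraud cases is missed. This is the usual reason for reporting recall and F1 next to accuracy on imbalanced fraud data.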

Citation

Siyuan Wu, Derong Yang, Wenjun Ge, and Baoqin Chen "Risk prediction model of bank telecommunication fraud based on XGBoost", Proc. SPIE 12718, International Conference on Cyber Security, Artificial Intelligence, and Digital Economy (CSAIDE 2023), 127182H (1 June 2023); https://doi.org/10.1117/12.2681646

PROCEEDINGS
13 PAGES

KEYWORDS

Data modeling

Visual process modeling

Telecommunications

Analytic models

Performance modeling

Decision trees

Statistical modeling
