Open Access Paper
Bidirectional sampling method for imbalanced data
11 September 2023
Proceedings Volume 12779, Seventh International Conference on Mechatronics and Intelligent Robotics (ICMIR 2023); 127792H (2023) https://doi.org/10.1117/12.2688900
Event: Seventh International Conference on Mechatronics and Intelligent Robotics (ICMIR 2023), 2023, Kunming, China
Abstract
Traditional over-sampling and under-sampling algorithms suffer from overfitting and high noise when the classes in the sample set are imbalanced. To improve classifier performance, this study proposes SMOTECU, an algorithm that combines SMOTE over-sampling with ClusterCentroids under-sampling. It draws on the advantages of both algorithms and avoids generating or discarding excessive samples, which reduces the harmful effects of overfitting and noise. We run experiments on 16 standard imbalanced datasets with three classifiers: RF, RBFNN, and RBFSVM. Comparing three evaluation metrics (F1-score, AUC, and running time), the results demonstrate that the SMOTECU-based random forest model performs better and that, compared with SMOTE and ClusterCentroids alone, SMOTECU can effectively avoid overfitting and save running time.
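The abstract does not spell out how SMOTECU balances the two classes, so the following is only a minimal sketch of the general idea of sampling in both directions, not the authors' implementation. It assumes, purely for illustration, that both classes are moved to the mean of the two class sizes: the minority class is grown by SMOTE-style interpolation between nearest neighbours, and the majority class is shrunk to k-means centroids, which is the idea behind ClusterCentroids. The function names (`smote_oversample`, `centroid_undersample`, `bidirectional_resample`), the meeting point, and the toy data are all hypothetical.

```python
import random
from math import dist


def smote_oversample(minority, n_new, k=3, rng=None):
    """Create n_new synthetic points, each interpolated between a minority
    sample and one of its k nearest minority neighbours (SMOTE-style)."""
    rng = rng or random.Random(0)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: dist(base, p),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append(tuple(b + gap * (o - b) for b, o in zip(base, other)))
    return synthetic


def centroid_undersample(majority, n_keep, n_iter=10, rng=None):
    """Replace the majority class by n_keep k-means centroids
    (the idea behind ClusterCentroids under-sampling)."""
    rng = rng or random.Random(0)
    centroids = rng.sample(majority, n_keep)
    for _ in range(n_iter):
        clusters = [[] for _ in centroids]
        for p in majority:
            idx = min(range(n_keep), key=lambda i: dist(p, centroids[i]))
            clusters[idx].append(p)
        # Recompute each centroid; keep the old one if its cluster is empty.
        centroids = [
            tuple(sum(c) / len(pts) for c in zip(*pts)) if pts else centroids[i]
            for i, pts in enumerate(clusters)
        ]
    return centroids


def bidirectional_resample(minority, majority, seed=0):
    """Assumed meeting point (not from the paper): the mean class size."""
    rng = random.Random(seed)
    target = (len(minority) + len(majority)) // 2
    new_min = minority + smote_oversample(minority, target - len(minority), rng=rng)
    new_maj = centroid_undersample(majority, target, rng=rng)
    return new_min, new_maj


# Toy 2-D data: 10 minority points and 90 majority points.
data_rng = random.Random(42)
minority = [(data_rng.gauss(2, 0.5), data_rng.gauss(2, 0.5)) for _ in range(10)]
majority = [(data_rng.gauss(0, 1.0), data_rng.gauss(0, 1.0)) for _ in range(90)]

new_min, new_maj = bidirectional_resample(minority, majority)
print(len(new_min), len(new_maj))  # 50 50
```

Meeting in the middle means SMOTE only has to synthesise 40 points instead of 80, and the majority class keeps 50 representatives instead of 10, which is one way to read the abstract's claim about avoiding excessive generation or rejection of samples. In practice a library such as imbalanced-learn would typically replace the hand-written helpers.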
© (2023) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Junjie Shi, Deyu Song, Shengyao Zheng, Yueming Hu, Shuangshuang Chen, and Fengque Pei "Bidirectional sampling method for imbalanced data", Proc. SPIE 12779, Seventh International Conference on Mechatronics and Intelligent Robotics (ICMIR 2023), 127792H (11 September 2023); https://doi.org/10.1117/12.2688900
KEYWORDS
Detection and tracking algorithms
Data modeling
Random forests
Overfitting
Evolutionary algorithms
Neural networks
Performance modeling