To address the complex backgrounds, the small proportion of target pixels, and the large scale variation in remote sensing images, this paper designs an Attention Fusion and Context Enhancement Network (AFCE-Net) to improve the detection accuracy of YOLOv5 for both large and small targets in remote sensing images. First, in the large-target detection branch, the C3 module is replaced with a Transformer Encoder module to capture the target and its surrounding context through enhanced global attention; a SimAM module is added to reassign channel and spatial weights, and a Context Enhancement module is added to suppress background noise and enlarge the receptive field. Second, another Context Enhancement module is designed in the feature fusion stage of small-target detection to extract feature information focused on small targets, after which the attention fusion module filters the channel weights of the network. Finally, the localization loss is replaced with SIoU to improve target localization accuracy. Experimental results on the DOTA and DIOR datasets are 5.81% and 3.03% higher than those of the baseline algorithm, respectively, and the detection speed reaches 135.14 frames/s, providing a reference for research on remote sensing image object detection.
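For reference, the SimAM module mentioned above is the standard parameter-free attention of Yang et al. (2021), which reweights each activation with a closed-form energy term computed per channel. The following is a minimal PyTorch sketch of that generic formulation, not the exact AFCE-Net implementation; the module name, the regularizer value, and the dummy feature-map shape are illustrative assumptions.

```python
import torch
import torch.nn as nn

class SimAM(nn.Module):
    """Parameter-free SimAM attention (generic sketch, not the paper's exact code)."""

    def __init__(self, e_lambda: float = 1e-4):
        super().__init__()
        self.e_lambda = e_lambda  # regularizer in the energy denominator (assumed default)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, C, H, W)
        _, _, h, w = x.shape
        n = h * w - 1
        # squared deviation of every activation from its channel-wise spatial mean
        d = (x - x.mean(dim=(2, 3), keepdim=True)).pow(2)
        # channel-wise variance estimate
        v = d.sum(dim=(2, 3), keepdim=True) / n
        # inverse energy: more distinctive neurons receive larger weights
        e_inv = d / (4 * (v + self.e_lambda)) + 0.5
        return x * torch.sigmoid(e_inv)

if __name__ == "__main__":
    feat = torch.randn(1, 256, 40, 40)   # hypothetical YOLOv5 neck feature map
    print(SimAM()(feat).shape)           # torch.Size([1, 256, 40, 40])
```

Because the module introduces no learnable parameters, it can be inserted after a backbone or neck stage without changing the detector's parameter count, which is consistent with its use here for reassigning channel and spatial weights.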