Anchor-free methods have recently simplified object detection by eliminating the need for anchor boxes. CenterNet is a representative anchor-free method. However, it still recovers high-resolution representations from low-resolution ones by upsampling, so the predicted heatmap is not spatially accurate enough, and it does not make full use of the network's shallow low-level features. We introduce CenterNet-HRA to address these problems. An attention module is proposed that calibrates the high-level semantic features of the network output using shallow low-level features from different receptive fields. HRNet is used as the backbone to maintain a high-resolution feature representation throughout the whole process, rather than generating it by upsampling as HourglassNet does. Because feature representations at different resolutions contribute differently to the network, yet HRNet fuses them without distinction, a novel weighted feature fusion HRNet is designed to achieve higher detection precision. Our method achieves an average precision (AP) of 42.3% at 13.5 frames per second (FPS) on the MS-COCO benchmark, compared with 40.3% AP at 13.3 FPS for CenterNet-HG.
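The idea of weighting feature representations by their contribution, rather than fusing them without distinction, can be illustrated with a minimal sketch. The abstract does not specify how the weights are computed, so everything below is an assumption for illustration: the function name `fuse_weighted`, the use of one scalar weight per branch, and the ReLU-then-normalize scheme are hypothetical and are not taken from the paper. The feature maps are assumed to have already been resampled to a common shape.

```python
def fuse_weighted(feature_maps, raw_weights, eps=1e-4):
    """Fuse same-shaped 2D feature maps using normalized, non-negative weights.

    Hypothetical illustration only: the weighting scheme is an assumption,
    not the method described in the paper.
    """
    if len(feature_maps) != len(raw_weights):
        raise ValueError("need exactly one weight per feature map")

    # Clamp weights at zero (ReLU) so no branch contributes negatively.
    positive = [max(w, 0.0) for w in raw_weights]

    # Normalize so the weights sum to (almost) one; eps guards against
    # division by zero when every weight has been clamped to zero.
    total = sum(positive) + eps
    norm = [w / total for w in positive]

    rows = len(feature_maps[0])
    cols = len(feature_maps[0][0])

    # Weighted sum across branches at every spatial position.
    fused = [
        [sum(n * fm[r][c] for n, fm in zip(norm, feature_maps))
         for c in range(cols)]
        for r in range(rows)
    ]
    return fused, norm


# Two toy 2x2 "feature maps" standing in for two resolution branches.
high_res = [[1.0, 1.0], [1.0, 1.0]]
low_res = [[3.0, 3.0], [3.0, 3.0]]
fused, weights = fuse_weighted([high_res, low_res], [1.0, 1.0], eps=0.0)
```

In this sketch, equal weights reduce to a plain average of the branches, while unequal weights let one resolution dominate. In a trained network the raw weights would be learnable parameters rather than fixed constants.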
Sensors
Convolution
Calibration
Deconvolution
Data modeling
Detection and tracking algorithms
Feature extraction