HCT-Det: a hybrid CNN-transformer architecture for 3D object detection from point clouds
10 October 2023
Chao Wang
Proceedings Volume 12799, Third International Conference on Advanced Algorithms and Signal Image Processing (AASIP 2023); 127993I (2023) https://doi.org/10.1117/12.3005832
Event: 3rd International Conference on Advanced Algorithms and Signal Image Processing (AASIP 2023), 2023, Kuala Lumpur, Malaysia
Abstract
Detecting 3D objects from LiDAR point clouds is essential for environmental perception in robotic systems. Some pillar-based 3D object detectors use only 2D convolutions as feature encoders, which requires fewer computational resources but sacrifices accuracy. To unlock the potential of pillar-based feature representations, we propose HCT-Det, a novel hybrid CNN-Transformer architecture for 3D object detection from point clouds. Motivated by the structural re-parameterization technique and the vision transformer (ViT) framework, we redesign the 2D backbone and introduce the Rep-VGG block and the multi-head self-attention (MHSA) mechanism to enrich the scale diversity of the feature representation. We perform ablation experiments on the KITTI vision benchmark to highlight the advantages of HCT-Det. The evaluation results show that our model outperforms the PointPillars baseline, achieving 79.08 moderate AP3D on the car category at 57.46 FPS on an NVIDIA Tesla P40. Without bells and whistles, HCT-Det achieves a reasonable trade-off between accuracy and speed.
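The abstract names structural re-parameterization as one motivation for the redesigned 2D backbone. As a rough illustration of that general idea, and not of the HCT-Det implementation itself, the sketch below shows how a training-time block with a 3x3 branch, a 1x1 branch, and an identity branch can be collapsed into a single 3x3 kernel for inference. It assumes a single channel, stride 1, zero padding, and no batch normalization (real Rep-VGG style blocks also fold batch-norm statistics into the kernel and bias). All function names here are hypothetical.

```python
# Illustrative sketch only: single channel, no batch norm, not the paper's code.

def conv3x3(image, kernel):
    """3x3 cross-correlation with zero padding and stride 1."""
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for di in range(-1, 2):
                for dj in range(-1, 2):
                    y, x = i + di, j + dj
                    if 0 <= y < h and 0 <= x < w:
                        acc += image[y][x] * kernel[di + 1][dj + 1]
            out[i][j] = acc
    return out


def multi_branch(image, k3, k1):
    """Training-time form: 3x3 branch + 1x1 branch (scalar k1) + identity."""
    y3 = conv3x3(image, k3)
    h, w = len(image), len(image[0])
    return [[y3[i][j] + k1 * image[i][j] + image[i][j] for j in range(w)]
            for i in range(h)]


def fuse_kernel(k3, k1):
    """Fold the 1x1 and identity branches into the centre tap of a 3x3 kernel."""
    fused = [row[:] for row in k3]  # copy so the original kernel is untouched
    fused[1][1] += k1 + 1.0         # 1x1 weight plus identity (weight 1)
    return fused


if __name__ == "__main__":
    image = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
    k3 = [[0.5, -1.0, 0.25], [0.0, 2.0, -0.5], [1.0, 0.5, -0.25]]
    slow = multi_branch(image, k3, 0.5)
    fast = conv3x3(image, fuse_kernel(k3, 0.5))
    max_diff = max(abs(a - b) for ra, rb in zip(slow, fast) for a, b in zip(ra, rb))
    print("branches match fused kernel:", max_diff < 1e-9)
```

The point of the fusion is that the multi-branch form is only needed during training; at inference a single convolution gives the same output, which is consistent with the speed-oriented goal stated in the abstract.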
(2023) Published by SPIE. Downloading of the abstract is permitted for personal use only.
Chao Wang "HCT-Det: a hybrid CNN-transformer architecture for 3D object detection from point clouds", Proc. SPIE 12799, Third International Conference on Advanced Algorithms and Signal Image Processing (AASIP 2023), 127993I (10 October 2023); https://doi.org/10.1117/12.3005832
KEYWORDS
Object detection
Point clouds
Convolution
3D modeling
LIDAR
Feature extraction
Voxels
