MVSBF: 3D object detection algorithm based on multiscale voxel sampling and bird’s eye view fusion
9 October 2024
Maowei Yang, Xin Shi, Xinqian Hu, Hailong Yu
Proceedings Volume 13288, Fourth International Conference on Computer Graphics, Image, and Virtualization (ICCGIV 2024); 132880H (2024) https://doi.org/10.1117/12.3045448
Event: Fourth International Conference on Computer Graphics, Image, and Virtualization (ICCGIV 2024), 2024, Chengdu, China
Abstract
Multi-modal fusion of LiDAR and camera data has become a mainstream approach to 3D object detection in Bird's Eye View (BEV) space. To address the poor accuracy and loss of height information in current LiDAR point cloud processing, a 3D object detection algorithm based on Multi-scale Voxel Sampling and Bird’s-Eye View Fusion (MVSBF) is proposed. First, the raw LiDAR point clouds are voxelized, and voxels at different heights are randomly sampled using multi-scale sampling. Second, LiDAR-BEV features are generated by incorporating a random voxel sampling layer into the Sparsely Embedded Convolutional Detection (SECOND) network. Third, the features extracted from the camera images are processed using depth estimation to generate the corresponding camera-BEV features. Finally, the BEV features from the two branches are fused by a module designed to integrate BEV features from two frames. Experiments show that MVSBF achieves a mean Average Precision (mAP) of 70.1% and a nuScenes Detection Score (NDS) of 73.5% on the nuScenes test set, outperforming the baseline models by at least 0.9% mAP and 1.7% NDS.
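The first step (voxelization followed by random sampling of voxels at different heights) can be illustrated with a short sketch. The abstract does not define the multi-scale sampling, so the code assumes one possible reading: points are grouped into voxels on a regular grid, voxels are split into height bands by their z index, and a different random keep-ratio is applied to each band. The function names, `height_bins`, and `keep_ratios` are hypothetical and not taken from the paper.

```python
import random
from collections import defaultdict


def voxelize(points, voxel_size, range_min):
    """Group (x, y, z) points into voxels keyed by their integer grid index (ix, iy, iz)."""
    voxels = defaultdict(list)
    for p in points:
        idx = tuple(int((p[i] - range_min[i]) // voxel_size[i]) for i in range(3))
        voxels[idx].append(p)
    return dict(voxels)


def height_band_sampling(voxels, height_bins, keep_ratios, seed=0):
    """Hypothetical multi-scale sampling: keep a band-specific random fraction of voxels.

    height_bins: list of half-open (z_lo, z_hi) ranges over the voxel z index.
    keep_ratios: fraction of voxels to keep in each band (0 < ratio <= 1).
    """
    rng = random.Random(seed)
    kept = {}
    for (z_lo, z_hi), ratio in zip(height_bins, keep_ratios):
        band = sorted(k for k in voxels if z_lo <= k[2] < z_hi)
        if not band:
            continue
        n_keep = max(1, round(ratio * len(band)))  # keep at least one voxel per non-empty band
        for key in rng.sample(band, n_keep):
            kept[key] = voxels[key]
    return kept


points = [(0.1, 0.1, 0.1), (0.2, 0.2, 0.2), (1.5, 0.1, 0.1), (0.1, 0.1, 2.5), (0.1, 1.5, 2.5)]
voxels = voxelize(points, voxel_size=(1.0, 1.0, 1.0), range_min=(0.0, 0.0, 0.0))
kept = height_band_sampling(voxels, height_bins=[(0, 1), (1, 3)], keep_ratios=[0.5, 1.0])
print(len(voxels), len(kept))  # 4 3
```

Thinning dense bands while keeping sparse ones (for example, voxels well above the ground) is one way such a scheme could trade computation for retained height structure; whether MVSBF weights its bands this way is not stated in the abstract.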
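For the second step, the abstract only states that SECOND produces the LiDAR-BEV features. A common way SECOND-style backbones turn a 3D sparse-convolution output into a BEV map is to densify it and fold the height axis into the channel axis rather than pooling height away. The sketch below contrasts the two options on a tiny dense volume stored as nested lists indexed `[C][D][H][W]`; it illustrates the general idea under that assumption and is not MVSBF's code.

```python
def height_to_channels(volume):
    """[C][D][H][W] -> [C*D][H][W]: every (channel, height slice) pair becomes one BEV channel."""
    return [height_slice for channel in volume for height_slice in channel]


def max_pool_height(volume):
    """[C][D][H][W] -> [C][H][W]: collapse height with a max, discarding which slice the value came from."""
    bev = []
    for channel in volume:
        rows, cols = len(channel[0]), len(channel[0][0])
        bev.append([[max(s[h][w] for s in channel) for w in range(cols)] for h in range(rows)])
    return bev


# One channel, two height slices, a 1 x 2 BEV grid.
volume = [[[[1.0, 5.0]], [[3.0, 2.0]]]]
print(height_to_channels(volume))  # [[[1.0, 5.0]], [[3.0, 2.0]]]
print(max_pool_height(volume))     # [[[3.0, 5.0]]]
```

Folding height into channels keeps per-slice information at the cost of C·D input channels for the 2D BEV backbone, which bears directly on the height-information loss the abstract targets.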
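The third step, turning image features into camera-BEV features through depth estimation, is commonly done in the style of Lift-Splat-Shoot. Each pixel predicts a categorical distribution over discrete depth bins, its feature is scaled by each bin's probability (lift), and the results are sum-pooled into the BEV cell the pixel ray reaches at that depth (splat). The abstract does not specify MVSBF's view transform, so treat this as an assumed stand-in. The `bev_cells` lookup, which would normally be computed from camera intrinsics and extrinsics, is a hypothetical precomputed input here.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def lift_splat(features, depth_logits, bev_cells, bev_shape):
    """Scatter depth-weighted pixel features into a BEV grid stored as bev[row][col][channel].

    features[i]     : C-dim feature vector of pixel i
    depth_logits[i] : D logits over pixel i's discrete depth bins
    bev_cells[i][d] : (row, col) BEV cell that pixel i's ray reaches at depth bin d
                      (assumed precomputed from camera calibration)
    """
    rows, cols = bev_shape
    channels = len(features[0])
    bev = [[[0.0] * channels for _ in range(cols)] for _ in range(rows)]
    for feat, logits, cells in zip(features, depth_logits, bev_cells):
        probs = softmax(logits)  # estimated depth distribution for this pixel
        for p, (r, c) in zip(probs, cells):
            for k in range(channels):
                bev[r][c][k] += p * feat[k]  # lift (scale by depth prob), then splat (sum-pool)
    return bev


# One pixel with a 2-channel feature, equally likely to lie in either of two depth bins.
bev = lift_splat(
    features=[[2.0, 4.0]],
    depth_logits=[[0.0, 0.0]],
    bev_cells=[[(0, 0), (0, 1)]],
    bev_shape=(1, 2),
)
print(bev)  # [[[1.0, 2.0], [1.0, 2.0]]]
```

Because the depth probabilities sum to one, each pixel's feature mass is conserved across the cells it is splatted into; a confident depth estimate concentrates it in a single cell.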
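For the final step, the abstract describes a module that integrates the two BEV feature maps but gives no architecture. A widely used baseline, shown here purely as an assumption, is to concatenate the LiDAR-BEV and camera-BEV channels in each cell and mix them with a 1x1 convolution followed by ReLU; a 1x1 convolution is simply the same linear map applied independently at every cell. The names `fuse_bev`, `weight`, and `bias` are hypothetical, and grids use a `[row][col][channel]` layout.

```python
def fuse_bev(lidar_bev, camera_bev, weight, bias):
    """Concatenate channels per BEV cell, then apply a shared 1x1 convolution and ReLU.

    lidar_bev, camera_bev : grids indexed [row][col][channel] with the same rows and cols
    weight                : C_out rows, each of length C_lidar + C_camera
    bias                  : C_out values
    """
    fused = []
    for lidar_row, camera_row in zip(lidar_bev, camera_bev):
        out_row = []
        for lidar_cell, camera_cell in zip(lidar_row, camera_row):
            x = lidar_cell + camera_cell  # list concatenation acts as channel concatenation
            y = [max(0.0, sum(w * v for w, v in zip(w_row, x)) + b) for w_row, b in zip(weight, bias)]
            out_row.append(y)
        fused.append(out_row)
    return fused


fused = fuse_bev(
    lidar_bev=[[[1.0]]],
    camera_bev=[[[2.0]]],
    weight=[[1.0, 1.0], [1.0, -1.0]],
    bias=[0.0, 0.0],
)
print(fused)  # [[[3.0, 0.0]]]
```

In a trained network, `weight` and `bias` would be learned parameters rather than the fixed values used here.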
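The NDS figure reported above is the nuScenes benchmark's composite metric. It combines mAP with five true-positive error metrics (mATE, mASE, mAOE, mAVE, mAAE, i.e. translation, scale, orientation, velocity, and attribute error), each clipped at 1: NDS = (1/10) [5 · mAP + Σ (1 − min(1, mTP))]. The sketch below computes it; the input values are illustrative only and are not results from the paper.

```python
def nuscenes_detection_score(m_ap, tp_errors):
    """NDS = (5 * mAP + sum over the five TP metrics of (1 - min(1, mTP))) / 10."""
    expected = {"mATE", "mASE", "mAOE", "mAVE", "mAAE"}
    if set(tp_errors) != expected:
        raise ValueError(f"tp_errors must have exactly the keys {sorted(expected)}")
    tp_score = sum(1.0 - min(1.0, err) for err in tp_errors.values())
    return (5.0 * m_ap + tp_score) / 10.0


# Illustrative values only, not results from the paper.
errors = {"mATE": 0.3, "mASE": 0.25, "mAOE": 0.35, "mAVE": 0.3, "mAAE": 0.1}
print(round(nuscenes_detection_score(0.70, errors), 4))  # 0.72
```

Since half of the NDS weight sits on the TP terms, a detector with low localization, orientation, and velocity errors can post an NDS above its mAP, as with MVSBF's 73.5% NDS versus 70.1% mAP.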
© (2024) Published by SPIE. Downloading of the abstract is permitted for personal use only.
Maowei Yang, Xin Shi, Xinqian Hu, and Hailong Yu "MVSBF: 3D object detection algorithm based on multiscale voxel sampling and bird’s eye view fusion", Proc. SPIE 13288, Fourth International Conference on Computer Graphics, Image, and Virtualization (ICCGIV 2024), 132880H (9 October 2024); https://doi.org/10.1117/12.3045448
KEYWORDS
Voxels
LIDAR
Point clouds
Object detection
Cameras
Feature extraction
Feature fusion
