Pyramid frequency network with spatial attention residual refinement module for monocular depth estimation
Zhengyang Lu, Ying Chen
Abstract

Deep-learning-based approaches to depth estimation are advancing rapidly, offering performance superior to existing methods. To estimate depth in real-world scenarios, depth estimation models must be robust to a variety of noise environments. We propose a pyramid frequency network (PFN) with a spatial attention residual refinement module (SARRM) to address the weak robustness of existing deep-learning methods. To reconstruct depth maps with accurate details, the SARRM uses a residual fusion scheme with an attention mechanism to refine blurred depth. A frequency-division strategy is designed, and a frequency pyramid network is developed to extract features from multiple frequency bands. With this frequency strategy, the PFN achieves better visual accuracy than state-of-the-art methods in both indoor and outdoor scenes on the Make3D, KITTI depth, and NYUv2 datasets. Additional experiments on a noisy NYUv2 dataset demonstrate that the PFN is more reliable than existing deep-learning methods in high-noise scenes.
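The abstract does not give implementation details of the SARRM. As a rough illustration of the general idea of attention-gated residual refinement of a coarse depth map, a minimal PyTorch-style sketch follows; the class name SpatialAttentionResidualRefinement, the layer choices, and the channel sizes are all hypothetical and are not taken from the paper.

import torch
import torch.nn as nn

class SpatialAttentionResidualRefinement(nn.Module):
    """Illustrative sketch only: refine a coarse (blurred) depth map by adding
    an attention-weighted residual predicted from the depth and image features.
    Architecture details are assumptions, not the paper's SARRM."""

    def __init__(self, feat_channels: int = 64):
        super().__init__()
        # Spatial attention branch: map features + coarse depth to a 1-channel gate in [0, 1].
        self.attention = nn.Sequential(
            nn.Conv2d(feat_channels + 1, feat_channels, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(feat_channels, 1, kernel_size=1),
            nn.Sigmoid(),
        )
        # Residual branch: predict a per-pixel depth correction from the same input.
        self.residual = nn.Sequential(
            nn.Conv2d(feat_channels + 1, feat_channels, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(feat_channels, 1, kernel_size=3, padding=1),
        )

    def forward(self, coarse_depth: torch.Tensor, features: torch.Tensor) -> torch.Tensor:
        x = torch.cat([coarse_depth, features], dim=1)
        attn = self.attention(x)           # (B, 1, H, W) spatial attention map
        res = self.residual(x)             # (B, 1, H, W) depth correction
        return coarse_depth + attn * res   # attention-gated residual refinement

# Example usage: refine a blurred coarse depth map with decoder features.
if __name__ == "__main__":
    refine = SpatialAttentionResidualRefinement(feat_channels=64)
    coarse = torch.rand(1, 1, 240, 320)
    feats = torch.rand(1, 64, 240, 320)
    refined = refine(coarse, feats)
    print(refined.shape)  # torch.Size([1, 1, 240, 320])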

© 2022 SPIE and IS&T. 1017-9909/2022/$28.00
Zhengyang Lu and Ying Chen "Pyramid frequency network with spatial attention residual refinement module for monocular depth estimation," Journal of Electronic Imaging 31(2), 023005 (10 March 2022). https://doi.org/10.1117/1.JEI.31.2.023005
Received: 1 December 2021; Accepted: 22 February 2022; Published: 10 March 2022
KEYWORDS: Data modeling, RGB color model, Model-based design, Feature extraction, Visualization, Network architectures, Performance modeling
