Multi-View Stereo (MVS) is a technique that reconstructs the three-dimensional structure of objects or scenes from 2D images and camera parameters. PatchMatch-based MVS methods are widely used nowadays. However, these algorithms have two weaknesses: cross-correlation-based similarity measures such as NCC become ineffective in regions that are texture-less (e.g., white walls) or have stochastic textures (e.g., grass); and the reconstructed result may converge to the global optimum of the matching cost, which can still drift heavily from the ground truth because of noise, occlusion, and other factors. To tackle these issues, we present an MVS pipeline called Adaptive Pixelwise Inference Multi-View Stereo (API-MVS). First, a strategy is proposed to adaptively infer the matching window size of each pixel, giving the reconstructed results in texture-less or stochastically textured regions a better trade-off between accuracy and completeness. Second, a new cost function is proposed to integrate the matching cost values computed from different neighboring images, and experiments confirm that this cost function brings the locally optimized results closer to the ground truth. We have evaluated our algorithm on the ETH3D benchmark. The results show the effectiveness of our method, which is comparable to state-of-the-art methods.
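For context, the sketch below shows the plain NCC similarity measure referred to above, not the paper's adaptive-window variant; the function name and the epsilon guard are illustrative choices. It also makes the first weakness concrete: when a patch has near-zero intensity variance (texture-less regions), the denominator vanishes and the score becomes uninformative.

```python
import numpy as np

def ncc(ref_patch: np.ndarray, src_patch: np.ndarray, eps: float = 1e-8) -> float:
    """Normalized cross-correlation between two equally sized image patches.

    Returns a value in [-1, 1]; values near 1 indicate a good match.
    In texture-less or stochastically textured regions both patches have
    little structured variance, so the score degenerates.
    """
    r = ref_patch.astype(np.float64).ravel()
    s = src_patch.astype(np.float64).ravel()
    r -= r.mean()
    s -= s.mean()
    # eps avoids division by zero on constant (texture-less) patches.
    denom = np.sqrt((r * r).sum() * (s * s).sum()) + eps
    return float((r * s).sum() / denom)
```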