A new moving-target detection algorithm is proposed to address the poor detection completeness and susceptibility to outliers of current moving-target detection methods for dynamic backgrounds. The algorithm combines the RAFT optical flow method with the YOLOv4 detector. First, the offsets between matching pixels are obtained with the RAFT optical flow method, and the epipolar constraint relating the matching pixels is solved from these offsets. This constraint is then used to derive candidate boxes for the foreground-target edges. Next, the degree of overlap between each foreground-edge candidate box and the moving-target candidate boxes detected by the YOLOv4 network is computed to preliminarily extract the moving-target region, which improves the accuracy of moving-target detection. Finally, the final moving-target region is obtained by post-processing the initially extracted interior region of the moving target. Experimental results show that, compared with the same algorithm without the YOLOv4 network, the proposed algorithm achieves higher precision and F-measure on the public KITTI dataset and yields more accurate segmentation results.
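The overlap test between the flow-derived edge candidate boxes and the YOLOv4 detections can be sketched as a standard intersection-over-union (IoU) match. This is a minimal illustration only; the function names and the 0.5 threshold are assumptions, not details from the paper.

```python
def iou(box_a, box_b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def match_moving_targets(edge_boxes, yolo_boxes, threshold=0.5):
    """Keep YOLOv4 boxes whose overlap with any flow-derived
    foreground-edge box reaches the (assumed) threshold."""
    return [yb for yb in yolo_boxes
            if any(iou(yb, eb) >= threshold for eb in edge_boxes)]
```

A detection supported by an optical-flow edge box survives; an isolated detection with no flow support is discarded, which is one plausible way the combination suppresses outliers.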
Predicting depth from a single image has recently become an important research topic in computer vision. The self-supervised strategy for learning depth is particularly attractive because it requires no ground-truth labels. Under the self-supervised learning framework, we propose a CA-depth network to improve the accuracy of single-image depth estimation. We add an attention mechanism to the monocular depth-estimation network to address the visible artifacts and inaccurate predicted geometry in monocular depth maps. Spatial position information in the high-dimensional feature map is used to attend to the essential features and to suppress artifacts in the depth prediction map. We use ResNet as the encoder to extract the input image's feature map, a coordinate attention mechanism to optimally allocate the convolutional feature-map weights, and a decoder network to predict the depth. Experimental results on public datasets show that the depth-prediction accuracy of the CA-depth network is higher than that of state-of-the-art methods.
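The key idea of coordinate attention is to pool the feature map along each spatial axis separately, so the resulting weights retain positional information rather than collapsing to a single global descriptor. The NumPy sketch below shows only this direction-aware reweighting; it is a simplified assumption-laden illustration (the shared 1x1 convolutions and normalization of the full module are omitted, and the function name is ours).

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def coordinate_attention(feat):
    """Simplified coordinate-attention reweighting of a (C, H, W) feature map.

    Pools along width and height separately so each attention factor
    keeps the position along the other axis, then reweights the input.
    """
    pool_h = feat.mean(axis=2, keepdims=True)   # (C, H, 1): pooled along width
    pool_w = feat.mean(axis=1, keepdims=True)   # (C, 1, W): pooled along height
    attn_h = sigmoid(pool_h)                    # height-direction weights
    attn_w = sigmoid(pool_w)                    # width-direction weights
    return feat * attn_h * attn_w               # broadcast to (C, H, W)
```

Because the two pooled descriptors are kept separate, a row containing a strong response boosts every position in that row while a distinct column factor does the same per column, which is what lets the mechanism emphasize spatially localized features.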