In practical applications, to improve the real-time performance of end-to-end stereo matching networks, existing methods build cost volumes at low resolution. However, because low-resolution features lack detailed information, it is difficult to obtain accurate disparity estimates in weakly textured regions. In addition, smooth L1 loss supervision causes a loss of accuracy around disparity discontinuities. To address these problems, we propose an efficient stereo-matching network based on multiple attention mechanisms and edge optimization, which achieves high accuracy in a short time. A multi-scale attention module enhances feature expression in detail regions. For weakly textured areas, we construct a concatenation cost volume and a multi-level patch-matching volume, which are combined to increase the network's attention to those regions. For edge optimization, we model the disparity distribution of sampled edge points as a bimodal Laplace distribution and optimize the edge regions of the initial disparity map with a likelihood loss to obtain sharp edges. Experimental results show that, on the SceneFlow and KITTI datasets, the proposed network improves accuracy by 32% and 27%, respectively, compared with BGNet+.
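For reference, the smooth L1 supervision the abstract criticizes is the standard Huber-style disparity regression loss: quadratic for small errors, linear for large ones. A minimal sketch (the `beta` threshold is a common default, not a value from the paper):

```python
import numpy as np

def smooth_l1(pred, target, beta=1.0):
    """Smooth L1 (Huber-style) loss commonly used for disparity regression:
    quadratic below `beta`, linear above, so large edge errors are
    under-penalized, which is the behavior the paper argues blurs edges."""
    diff = np.abs(pred - target)
    return np.where(diff < beta, 0.5 * diff ** 2 / beta, diff - 0.5 * beta)
```

The linear tail is what averages the two disparity modes straddling an object boundary, motivating the paper's bimodal Laplace likelihood instead.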
KEYWORDS: Transparency, Cameras, Sensors, 3D modeling, Reconstruction algorithms, Optical engineering, 3D acquisition, 3D image processing, 3D image reconstruction, Image segmentation
Consumer-grade range cameras are widely used in three-dimensional reconstruction. However, their limited resolution and stability constrain reconstruction quality, especially for transparent objects. We propose a method that reconstructs transparent objects while improving the reconstruction quality of an indoor scene with a single RGB-D sensor. Transparent regions are localized from zero-depth and erroneous-depth measurements. The lost surface of the transparent object is recovered by modeling the statistics of zero depth, variance, and the residual error of the signed distance function (SDF) during depth data fusion. The camera pose is first initialized by minimizing the error of the depth map on the SDF under a k-color-frame constraint, and is then optimized with a penalized coefficient function that lowers the weight of voxels with higher SDF error. The method is shown to be effective in localizing transparent objects and achieves a more robust camera pose under complex backgrounds.
Transparency reconstruction has been a challenging problem in active 3D reconstruction because structured-light sensors capture the abnormal appearance of transparency as invalid or erroneous depth. This paper proposes a novel method to localize and reconstruct transparency in domestic environments with real-time camera tracking. Based on the signed distance function (SDF), we estimate the camera pose by minimizing the residual error of multiple depth images in a voxel grid. We adopt asymmetric voting over invalid depth to carve the transparent object in the 3D domain. For the erroneous depth caused by transparency, we build a local model that tracks the depth oscillation of each voxel across frames. By fusing the depth data, we obtain the point cloud of the transparent object and simultaneously achieve a higher-quality reconstruction of the indoor scene. Experiments with a hand-held sensor validate that our approach accurately localizes transparent objects and improves their 3D models, and that it is more robust to camera dithering and other noise.
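The depth-data fusion both of these abstracts build on is typically a weighted running average of truncated SDF values per voxel. A minimal sketch of that standard update (the truncation distance and weights are illustrative assumptions, not the papers' values):

```python
import numpy as np

def fuse_tsdf(tsdf, weight, observed_sdf, new_weight=1.0, trunc=0.1):
    """Standard per-voxel truncated-SDF fusion step: clamp the newly observed
    signed distance and blend it into the running weighted average. The papers'
    penalized coefficient / oscillation models modulate `new_weight` per voxel."""
    d = np.clip(observed_sdf, -trunc, trunc)
    fused = (tsdf * weight + d * new_weight) / (weight + new_weight)
    return fused, weight + new_weight
```

Transparent voxels show large SDF residuals and oscillating observations across frames, which is exactly the statistic the papers exploit to down-weight or re-classify them.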
Stereo vision ranging systems commonly assume a parallel binocular pinhole camera model; in practice, however, a binocular stereo camera is not strictly parallel. To improve the accuracy of binocular stereo ranging, a new non-parallel binocular ranging system combining linear and nonlinear methods is proposed. The system sets up a linear stereo pinhole camera model and determines the intrinsic and extrinsic parameters of the two cameras with a classic calibration method. After rectifying the images, the system adopts nonlinear scale-space theory to extract image features, using an improved KAZE algorithm for feature extraction and stereo matching. Compared with the classic SIFT and SURF algorithms, experimental measurements show that the ranging system has high feasibility and precision. The ranging accuracy within a certain range meets application requirements, and the system can be adapted to vehicle ranging and robot distance measurement.
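After rectification, the rectified (parallel) model reduces ranging to one triangulation formula, Z = f·B/d. A minimal sketch (function name and units are illustrative, not from the paper):

```python
def depth_from_disparity(focal_px, baseline_m, disparity_px):
    """Triangulation for a rectified stereo pair: depth Z = f * B / d,
    where f is the focal length in pixels, B the baseline in meters,
    and d the disparity in pixels. Valid only after rectification."""
    return focal_px * baseline_m / disparity_px
```

This is why rectification accuracy matters in a non-parallel rig: any residual misalignment corrupts d directly, and the error in Z grows quadratically with distance.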
Binary descriptors have been widely used in many real-time applications due to their efficiency. These descriptors are commonly designed for perspective images but perform poorly on omnidirectional images, which are severely distorted. To address this issue, this paper proposes tangent plane BRIEF (TPBRIEF) and adapted log polar grid-based motion statistics (ALPGMS). TPBRIEF projects keypoints to a unit sphere and applies the fixed test set of the BRIEF descriptor on the tangent plane of the unit sphere. The fixed test set is then backprojected onto the original distorted images to construct a distortion-invariant descriptor. TPBRIEF directly enables keypoint detection and feature description on the original distorted images, whereas other approaches correct the distortion through image resampling, which introduces artifacts and adds time cost. With ALPGMS, omnidirectional images are divided into circular arches named adapted log polar grids. Whether a match is true or false is then determined by simply thresholding the match count in the grid pair where the two matched points are located. Experiments show that TPBRIEF greatly improves feature matching accuracy and ALPGMS robustly removes wrong matches. Our proposed method outperforms the state-of-the-art methods.
Abnormal event detection in crowded scenes is a challenging problem due to the high density of the crowds and the occlusions between individuals. We propose a method using two sparse dictionaries with saliency to detect abnormal events in crowded scenes. By combining a multiscale histogram of optical flow (MHOF) and a multiscale histogram of oriented gradient (MHOG) into a multiscale histogram of optical flow and gradient, we are able to represent the feature of a spatial-temporal cuboid without separating the individuals in the crowd. While MHOF captures the temporal information, MHOG encodes both spatial and temporal information. The combination of these two features can represent the cuboid's appearance and motion characteristics even when the crowd density becomes high. An abnormal dictionary is added to the traditional sparse model, which includes only a normal dictionary. In addition, the saliency of the testing sample is combined with the two sparse reconstruction costs on the normal and abnormal dictionaries to measure the normalness of the testing sample. The experimental results show the effectiveness of our method.
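The two-dictionary decision can be sketched as comparing reconstruction costs. In this sketch, least squares stands in for the paper's sparse code, and the saliency weighting and threshold are illustrative assumptions:

```python
import numpy as np

def reconstruction_cost(x, D):
    """Residual of representing feature x over dictionary D. A least-squares
    code stands in here for the sparse code used in the paper."""
    alpha, *_ = np.linalg.lstsq(D, x, rcond=None)
    return np.linalg.norm(x - D @ alpha)

def is_abnormal(x, D_normal, D_abnormal, saliency=1.0, tau=0.5):
    """A sample is flagged abnormal when the normal dictionary reconstructs it
    much worse than the abnormal one, scaled by saliency (threshold assumed)."""
    score = saliency * (reconstruction_cost(x, D_normal)
                        - reconstruction_cost(x, D_abnormal))
    return score > tau
```

The saliency factor lets a conspicuous but borderline sample cross the threshold, which is the role it plays in the paper's normalness measure.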
Per-pixel hand detection plays an important role in many human–computer interaction applications while accurate and robust hand detection remains a challenging task due to the large appearance variance of hands in images. We introduce a per-pixel hand detection system using one single depth image. We propose a circle sampling depth-context feature for hand regions representation, and a multilayered hand detection model is built for hand regions detection. Finally, a postprocessing step based on spatial constraints is applied to refine the detection results and further improve the detection accuracy. We evaluate the accuracy of our method on a public dataset and investigate the effect of key parameters in our system. The results of the qualitative and quantitative evaluation reveal that the proposed method performs well on per-pixel hand detection tasks. Furthermore, an additional experiment on hand parts segmentation proves that the depth-context feature has a generalization power for more complex multiclass classification tasks.
The objective of large-scale object retrieval systems is to search an image database for images that contain the target object. Whereas state-of-the-art approaches rely on global image representations, we consider many boxes per image as candidates for local search within a picture. In this paper, a feature quantization algorithm called binary quantization is proposed. In binary quantization, a scale-invariant feature transform (SIFT) feature is quantized into a descriptive and discriminative bit-vector, which adapts naturally to the classic inverted-file structure for box indexing. The inverted file, which stores the bit-vector and the ID of the box containing the SIFT feature, is compact and can be loaded into main memory for efficient box indexing. We evaluate our approach on available object retrieval datasets. Experimental results demonstrate that the proposed approach is fast and achieves excellent search quality, improving over state-of-the-art approaches for object retrieval.
Abnormal event detection in crowded scenes has been a challenge due to the volatility of the definitions of normality and abnormality, the small number of pixels on each target, appearance ambiguity resulting from dense packing, and severe inter-object occlusions. A novel framework is proposed for detecting unusual events in crowded scenes from the trajectories of moving pedestrians, based on the intuition that the motion patterns of usual behaviors resemble those of group activity, whereas unusual behaviors do not. First, spectral clustering groups trajectories with similar spatial patterns; different trajectory clusters represent different activities. Unusual trajectories can then be detected against these patterns. Furthermore, the behavior of a moving pedestrian can be characterized by comparing its direction with these patterns, such as moving against the group or traversing it. Experimental results indicate that the proposed algorithm reliably locates abnormal events in crowded scenes.
A light field camera array can be regarded as a distributed source, and the image sequence it captures contains both inter- and intra-correlation. To exploit this correlation, a joint sparsity model is established that combines the light field with distributed compressive sensing, and a recovery algorithm, the simultaneous regularized orthogonal matching pursuit (SROMP) algorithm, is proposed for the model, using the correlation to reconstruct the light field image sequences. Several light field images can be approximated at once using different linear combinations of elementary signals. Experimental results show that the SROMP algorithm achieves accuracy comparable to that reported in the literature.
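The joint-sparsity recovery idea can be illustrated with plain simultaneous OMP, a close relative of the paper's SROMP without the regularization: all signals share one support, and each atom is chosen by its aggregate correlation with the joint residual.

```python
import numpy as np

def somp(D, X, n_atoms):
    """Simultaneous OMP sketch: recover codes for the columns of X over
    dictionary D under a shared support, which is the joint-sparsity
    assumption linking the correlated light field views."""
    support, R, coef = [], X.copy(), None
    for _ in range(n_atoms):
        corr = np.abs(D.T @ R).sum(axis=1)   # correlation summed over signals
        corr[support] = -np.inf              # never reselect a chosen atom
        support.append(int(np.argmax(corr)))
        A = D[:, support]
        coef, *_ = np.linalg.lstsq(A, X, rcond=None)
        R = X - A @ coef                     # joint residual for all signals
    return support, coef
```

Because the support is shared, each extra view refines the same set of elementary signals instead of starting a fresh decomposition, which is where the inter-correlation pays off.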
Coded structured light can rapidly acquire the shape of unknown surfaces by projecting suitable patterns onto a measured surface and capturing the distorted patterns with a camera. By analyzing the deformation of the patterns in the images, depth information of the surface can be calculated. This paper presents a new concise and efficient mathematical model of a coded structured-light measurement system for obtaining depth information. The interrelations among the model parameters and the errors of the depth information are investigated. From the system's geometric structure, the effects of the system parameters on object imaging are derived, and dynamic deformation patterns are captured under different measurement conditions. By analyzing the system parameters and depth errors, the system's constraint conditions are determined; model simulation and error analysis are discussed in the experiments, and the model with optimal parameters is used to reconstruct two objects.
Based on an analysis of the maximum stripe deformation caused by depth changes on surfaces and of measurement-resolution limits, a principle of spatial periodicity for coding is proposed. With spatially periodic coding, the resolution is greatly improved, or the number of patterns is greatly reduced, for real-time structured-light systems. A novel spatially periodic coded pattern is presented that allows range scanning of moving objects with simple decoding and high measurement resolution. Using alternate time-space coding in a structured-light system, we achieve a measurement speed of 20 frames per second with two stripe patterns.
To achieve non-contact interactive operation under special conditions such as high temperature, high voltage, and space capsules, a real-time indicated-object recognition method is proposed in this paper. It combines eye and fingertip motion information to estimate the object's position. Multiple cameras capture images containing the fingertips and eyes, and the binocular vision principle is used to estimate their 3D positions. Physiologically, when a person indicates an object, the line linking the midpoint of the two eyes and the fingertip passes through the object point. After the eyes and fingertips are captured in the video stream with a feature-point extraction algorithm, a model from 2D image coordinates to scene coordinates, expressed as a projective transformation with multi-view constraints, is presented. With this model, the 3D positions of the eyes and fingertips are estimated from their 2D image positions, and the line linking the midpoint of the eyes and the fingertip is obtained. Intersecting this line with the plane on which the object stands yields the object point indicated by the person's finger. The method estimates the absolute position of the object, so users need not provide any initial benchmark information. Finally, the method is tested in a practical indicated-object recognition system with error analysis of camera calibration and image processing.
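The core geometry, intersecting the eye-midpoint-to-fingertip line with the object plane, can be sketched directly (the names and the plane form n·p = d are illustrative, not the paper's notation):

```python
import numpy as np

def indicated_point(eye_mid, fingertip, plane_n, plane_d):
    """Intersect the line through the eye midpoint and the fingertip with the
    object plane n . p = d, returning the indicated 3-D point. Assumes the
    line is not parallel to the plane."""
    direction = fingertip - eye_mid
    t = (plane_d - plane_n @ eye_mid) / (plane_n @ direction)
    return eye_mid + t * direction
```

With both 3-D points triangulated from the calibrated multi-camera setup, this intersection gives an absolute position, which is why no initial benchmark from the user is needed.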
Surface defect inspection methods based on machine vision have many advantages over other automatic inspection methods, such as higher flexibility and lower overall cost. However, their robustness is still unsatisfactory. Inspecting magnetic rings, which are rich in texture and exhibit various defects, is a typical and difficult machine-vision inspection task, so conclusions drawn from this problem are representative.

In this paper, the factors that cause variation in inspection results are classified, and a quantitative analysis of inspection systems is proposed, introducing a new concept of a robustness index. As an approach to enhancing robustness, the effect of the algorithm's rule design is examined. The author extracts defect features at three levels in designing the rule and, after theoretical analysis and experiments, concludes that complete extraction at a higher level enhances the robustness of the system.
KEYWORDS: Magnetic resonance imaging, Image restoration, Error analysis, Signal to noise ratio, Fourier transforms, Data acquisition, Reconstruction algorithms, Direct methods, Image quality, Brain imaging
A Non-Uniform Fast Fourier Transform (NUFFT) based method for reconstructing non-Cartesian k-space data is presented. For Cartesian k-space data, images can be reconstructed directly with a 2D FFT; Cartesian sampling, however, has inherent disadvantages, so non-Cartesian sampling schemes are attractive, and we focus on them here. The most straightforward approach to reconstructing non-Cartesian data is a direct Fourier summation, but its computational complexity is much greater than that of an approach using the efficient FFT. Because the FFT requires data sampled on a uniform Cartesian grid in k-space, a NUFFT-based method is of great importance for non-Cartesian reconstruction. Finally, experimental results compared with an existing method are given.
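The direct Fourier summation whose cost motivates the NUFFT can be written in a few lines. This sketch uses normalized k-space coordinates (cycles per pixel) and makes no claim about the paper's implementation:

```python
import numpy as np

def direct_recon(kx, ky, data, nx, ny):
    """Direct (conjugate-phase) Fourier summation: reconstruct an nx-by-ny
    image from M non-Cartesian k-space samples. Cost is O(M * nx * ny),
    the complexity a NUFFT reduces by gridding plus FFT."""
    x = np.arange(nx) - nx // 2
    y = np.arange(ny) - ny // 2
    X, Y = np.meshgrid(x, y, indexing="ij")
    img = np.zeros((nx, ny), dtype=complex)
    for kxi, kyi, d in zip(kx, ky, data):
        img += d * np.exp(2j * np.pi * (kxi * X + kyi * Y))
    return img
```

Note this adjoint ignores sampling-density compensation, which a practical non-Cartesian reconstruction (NUFFT-based or direct) must add.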
According to the sampling theorem and image-processing considerations, structured-light systems have limits: the measurement resolution is restricted, small gaps cannot be measured, and errors or lost data occur at surface borders. A novel rotatable interlaced coding scheme for a real-time structured-light 3D information acquisition system is proposed. It consists of three-frame space-time light patterns in two free directions, which acquire denser 3D data from a single viewpoint. This decreases the error of range-image registration and improves system accuracy. The paper builds a real-time structured-light 3D profile measurement system that allows a hand-held object to rotate freely in the space-time coded light field projected by the projector.
KEYWORDS: Distortion, Projection systems, Imaging systems, Data modeling, 3D acquisition, 3D metrology, Charge-coupled devices, 3D modeling, Calibration, Structured light
Three-dimensional shape measurement is widely used in applications such as traffic, entertainment, architectural design, manufacturing, and archeology. This paper simplifies the principle of structured-light triangulation using light-plane constraints and accounts for radial lens distortion in CCD imaging, which improves system accuracy. To relax the spatial and temporal limits of stereo in the structured-light system and improve processing rate and accuracy, the system uses space-time stripe-boundary coded patterns. ICP (Iterative Closest Point) is widely used for geometric alignment of three-dimensional models when an initial estimate of the relative pose is given or the relative motion is small. Exploiting the characteristics of data from the structured-light acquisition system, the paper adopts an improved projection-based matching algorithm, which is easier to implement and more accurate than conventional ICP.
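The SVD (Kabsch) step at the heart of each ICP iteration can be sketched as follows, with correspondences assumed known for brevity; the paper's projection-based matcher replaces that correspondence search:

```python
import numpy as np

def rigid_align(P, Q):
    """Best-fit rotation R and translation t mapping point set P (N x 3) onto
    its correspondences Q (N x 3), via the Kabsch/SVD solution. This is the
    closed-form alignment step repeated inside every ICP iteration."""
    cP, cQ = P.mean(axis=0), Q.mean(axis=0)
    H = (P - cP).T @ (Q - cQ)                 # cross-covariance of the sets
    U, _, Vt = np.linalg.svd(H)
    S = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])  # no mirror
    R = Vt.T @ S @ U.T
    return R, cQ - R @ cP
```

ICP alternates this solve with re-matching points; replacing nearest-neighbor matching with projection-based matching, as the paper does, is what removes most of the per-iteration cost.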
KEYWORDS: Local area networks, Video, Video compression, Analog electronics, Computer programming, Control systems, Embedded systems, Video coding, Microcontrollers, Video processing
In this work an embedded system is designed that implements MPEG-2 LAN transmission of a CVBS or S-video signal. The hardware consists of three parts. The first digitizes the analog CVBS or S-video (Y/C) input from TV or VTR sources. The second performs MPEG-2 compression coding, primarily with a single-chip MPEG-2 audio/video encoder whose output is an MPEG-2 system PS/TS. The third part handles data-stream packing, LAN access, and system control based on an ARM microcontroller: it packs the encoded stream into Ethernet frames for the LAN, and it accepts Ethernet packets bearing control information from the network, decoding the corresponding commands to control digitization, coding, and other operations. To raise the network transmission rate to match the MPEG-2 data stream, an efficient TCP/IP protocol stack is built directly on the network hardware of the embedded system, instead of using an ordinary embedded operating system. Within this stack, a dedicated transmission channel is opened for the MPEG-2 stream to obtain a high LAN transmission rate on a low-end ARM. The designed system has been tested on an experimental LAN; the experiments show a maximum LAN transmission rate of up to 12.7 Mbps with good sound and image quality and satisfactory system reliability.
Automatic 3D reconstruction of an object from an image sequence is described. The reconstruction is based on multiple views from a freely moving camera, with the object placed on a novel calibration pattern consisting of two concentric circles connected by radial line segments. Compared with other 3D-reconstruction methods, the approach reduces restrictions on the measurement environment and increases user flexibility. In the first step, the image of each view is calibrated individually to obtain the camera information: the calibration pattern is separated from the input image with an erosion-dilation algorithm, the calibration points are extracted accurately after estimating the two ellipses and the lines, and Tsai's two-stage technique is used in the calibration process. In the second step, the 3D reconstruction of the real object is subdivided into shape reconstruction and texture mapping. Following the principle of shape from silhouettes (SFS), a bounding cone is constructed from each image using the calibration information and silhouette, and the intersection of all bounding cones defines an approximate geometric representation. Experiments with a real object yield a reconstruction error below 1%, validating the method's efficiency and feasibility.
The pellet's position is key in ICF. The paper introduces an automatic orientation method using two cameras, which resolves 3-D information from 2-D image coordinates. Two steps are involved. First, the 3-D orientation is estimated by an auto-focus algorithm that locates the pellet edge to sub-pixel precision in 2-D digital imagery. Second, the 3-D position is estimated by a centroid algorithm combined with the auto-focus algorithm; in this way, the method is insensitive to the form of the pellet edge. Analytical formulations of the problem are given, and the accuracies of the centroid algorithm and an edge-fitting algorithm are compared. Defocus effects are compensated to obtain accurate parameter estimates from the imaged edges. Experiments show that the pellet can be located to within 0.2 um and oriented to within 3'.
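The centroid algorithm used for position estimation is the standard intensity-weighted mean, which yields sub-pixel coordinates independent of the edge shape. A minimal sketch:

```python
import numpy as np

def centroid(img):
    """Intensity-weighted centroid of an image patch, returning sub-pixel
    (row, col) coordinates. Insensitive to the exact edge form, which is
    the property the abstract relies on."""
    ys, xs = np.indices(img.shape)
    total = img.sum()
    return (ys * img).sum() / total, (xs * img).sum() / total
```

In practice the patch should be background-subtracted first, since a constant offset biases the weighted mean toward the patch center.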
A new method of vision coordinate measurement using stereo-probe imaging is presented. The system consists of a CCD camera, a stereo-probe, and a personal computer. It measures the three-dimensional coordinates of contacted points by touching the measurement surface with the stereo-probe and analyzing the imaging changes of the known characteristic points on the probe. It can perform not only most existing vision coordinate measurements but also the measurement of hidden surfaces; moreover, it does not depend on the optical characteristics of the measurement surface and has a larger measurement range. The paper first sets up the nonlinear measurement equations of the vision coordinate measurement system through a special coordinate transformation, then describes a Newton-type initialization method for solving the measurement equations, and finally analyzes the factors affecting system accuracy. In addition, an auto-calibration approach for an important parameter, the effective focal length, is presented. The correctness of the measurement model and the feasibility of solving the nonlinear measurement equations are validated.
KEYWORDS: Charge-coupled devices, 3D metrology, Collimation, Imaging systems, Autocollimation, Systems modeling, Reflectors, Mathematical modeling, 3D image processing, 3D modeling
The paper presents a new method of measuring 3D small angles. A collimated parallel ray is projected onto a double-face reflector, and a CCD detects the position changes of the returned light points. Compared with a traditional autocollimation system, the presented system can distinguish small changes of angle around the z axis, and it therefore realizes the measurement of 3D small angles. Using matrix transforms, the paper establishes the mathematical model relating the position changes of the imaging points on the CCD to the changes of the spatial small angles, describes the characteristics of the method, and gives its solution procedure. Simulations verify the correctness of the system model and of the solution procedure. Furthermore, the paper analyzes the effect of the system parameters on the achievable resolution; the results can serve as a reference and basis when designing such a system. With the parameter values assumed in the paper, the resolutions of alpha and beta reach 0.025'; the resolution of gamma, lower than that of alpha and beta, is about 2'.
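For the in-plane tilts, the classical autocollimation relation behind such systems is that a mirror tilt theta deflects the return beam by 2*theta, so the focal-plane spot moves d = f*tan(2*theta). A sketch of the inversion (single-axis only; the paper's full 3D model, which also recovers the roll angle gamma, is more involved):

```python
import math

def tilt_from_spot(displacement_mm, focal_mm):
    """Classical autocollimation: invert d = f * tan(2 * theta) to recover
    the mirror tilt theta (radians) from the spot displacement on the CCD.
    Single-axis sketch only; does not model the double-face reflector."""
    return 0.5 * math.atan(displacement_mm / focal_mm)
```

The doubling by reflection is what gives autocollimators their arcsecond-level sensitivity for a given detector resolution.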
The paper presents a new method of measuring 3D small angles. A collimated parallel ray is projected onto a double-face reflector, and a CCD detects the position changes of the returned light points, from which the measurement model of 3D small angles is established. The paper describes the characteristics of the method, gives its solution procedure, and verifies both with simulation tests. Finally, the effect of several system parameters on the measurement is analyzed.
The study describes a new vision coordinate measuring system: a probe-imaging vision coordinate measuring system using a single camera. It proposes a new idea in vision coordinate measurement, using a known objective-probe to contact the surface to be measured, and derives a linear model for resolving the six degrees of freedom of the objective-probe and obtaining the coordinates of the contacted point on the surface. The study analyzes the factors that affect the resolution of the system. Simulations have shown that the system model is valid.