KEYWORDS: Pose estimation, Feature extraction, Semantics, Education and training, 3D modeling, 3D image processing, RGB color model, Cameras, Ablation, Sensors
Estimating the 6-degree-of-freedom (6DoF) pose of objects is a fundamental task in vision-based measurement. It provides a target's 3D position and orientation with respect to the camera, which is valuable in various applications such as robotics, autonomous driving, and augmented reality. Among different approaches, monocular vision methods have the advantage of being flexible and economical. They extract features from a single RGB image and match them with the corresponding parts of the target's known 3D model. Recently, regression methods that directly predict objects' 6DoF pose have dominated this field by leveraging Convolutional Neural Networks (CNNs) and learning from large amounts of data to extract semantic features. A previous method that leverages objects' surface normal vectors to disentangle rotation estimation from translation achieves superior performance. However, it adopts a single backbone network to extract orientation and position features from the input image simultaneously, so the backbone network restricts the method's overall performance. In this paper, we illustrate this problem and adopt an advanced backbone network as well as a Feature Pyramid Network (FPN) to enhance the feature-extracting capability of our method. We conduct various experiments and ablation studies to demonstrate the effectiveness and superior performance of our newly proposed network, named Efficient-NVR. Notably, it surpasses state-of-the-art methods on the Linemod benchmark, obtaining 1.3% higher accuracy than the baseline.
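As a minimal sketch of the kind of multi-scale feature fusion the abstract describes, the snippet below attaches a torchvision Feature Pyramid Network to a generic ResNet backbone. The backbone choice, channel counts, and layer taps are illustrative assumptions, not the actual Efficient-NVR design.

```python
# Minimal sketch: attaching a Feature Pyramid Network to a ResNet backbone
# to fuse multi-scale features. Channel sizes and layer choices are
# illustrative, not the actual Efficient-NVR configuration.
from collections import OrderedDict

import torch
from torchvision.models import resnet50
from torchvision.ops import FeaturePyramidNetwork

backbone = resnet50(weights=None)
fpn = FeaturePyramidNetwork(in_channels_list=[256, 512, 1024, 2048],
                            out_channels=256)

def extract_pyramid(x):
    """Run the backbone stages and fuse them with the FPN."""
    x = backbone.maxpool(backbone.relu(backbone.bn1(backbone.conv1(x))))
    c2 = backbone.layer1(x)          # stride 4
    c3 = backbone.layer2(c2)         # stride 8
    c4 = backbone.layer3(c3)         # stride 16
    c5 = backbone.layer4(c4)         # stride 32
    feats = OrderedDict([("c2", c2), ("c3", c3), ("c4", c4), ("c5", c5)])
    return fpn(feats)                # dict of 256-channel maps per level

pyramid = extract_pyramid(torch.randn(1, 3, 256, 256))
print({k: tuple(v.shape) for k, v in pyramid.items()})
```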
This study introduces a novel method for analyzing how the choice of a reference camera influences the super-resolution reconstruction in camera array imaging systems. Through a combination of simulations and experimental validations, it becomes evident that traditional, uniformly arranged camera systems often fall short of achieving optimal super-resolution effects, irrespective of the reference camera chosen. The introduction of a central camera into this conventional arrangement, either by adding it or substituting an existing camera, not only yields improved super-resolution outcomes but also significantly enhances the system's robustness to variations in object distance. This advancement notably elevates the functional utility of sparse camera array systems in practical applications.
Feature-based image matching is often contaminated by mismatches due to the limited representational power of descriptors. Existing two-stage mismatch removal filters usually select seed points first and then remove outliers according to consistency in their neighborhoods. However, a filter's performance is directly influenced by the selection of effective seed points. In this paper, we design an elegant Spectral-spatial Outlier Filter (SOF) to harvest high-accuracy image matching. Specifically, we first compute eigenvectors of the Laplacian matrix of the joint image graph as feature descriptors in the spectral domain to select more reasonable seed points; these points are then fed into a fine local verification in the spatial domain in the second stage to effectively remove outliers. Experimental results on challenging datasets demonstrate that the proposed filter further improves the precision of image matching and steadily outperforms other state-of-the-art methods.
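The spectral step can be illustrated with a small sketch: build a graph over putative matches, form its Laplacian, and use the leading non-trivial eigenvectors as spectral descriptors for seed selection. The Gaussian affinity below is a generic assumption, not necessarily the paper's construction.

```python
# Sketch of the spectral stage: eigenvectors of the graph Laplacian serve as
# spectral descriptors of putative matches. The Gaussian affinity is a
# generic assumption, not necessarily the paper's construction.
import numpy as np

def spectral_descriptors(points, k_eig=8, sigma=1.0):
    """points: (N, d) coordinates of putative matches in the joint image graph."""
    d2 = ((points[:, None, :] - points[None, :, :]) ** 2).sum(-1)
    W = np.exp(-d2 / (2 * sigma ** 2))          # dense affinity matrix
    np.fill_diagonal(W, 0.0)
    D = np.diag(W.sum(axis=1))
    L = D - W                                    # unnormalized graph Laplacian
    vals, vecs = np.linalg.eigh(L)               # ascending eigenvalues
    return vecs[:, 1:k_eig + 1]                  # skip the trivial constant mode

rng = np.random.default_rng(0)
pts = rng.normal(size=(50, 4))                   # e.g. stacked (x1, y1, x2, y2) pairs
desc = spectral_descriptors(pts)
# Seed points could then be those whose spectral descriptors best agree with
# their neighbors' descriptors.
print(desc.shape)
```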
In the imaging process, atmospheric turbulence leads to image degradation such as noise, blur, and geometric distortion, thus reducing the quality of feature point extraction. To address this problem, we analyze images degraded by atmospheric turbulence and find that image blur and geometric distortion have a great influence on feature extraction. Image blur reflects the loss of high-frequency information, so detectors based on gray gradients extract fewer points. Geometric distortion, on the other hand, is reflected in the movement of pixels within an image patch, which also causes feature points to move, especially when they are extracted according to their neighborhoods. In this paper, we propose a Wiener Filter and Linear Minimum Variance Unbiased Estimation (WF-LMVUE) strategy to deal with image blur and geometric distortion, respectively. A simplified filter based on Wiener's method is used to remove noise and blur. Then a base frame and auxiliary frames are used to estimate the positions of feature points by linear minimum variance unbiased estimation. Experimental results show that WF-LMVUE has great advantages in increasing the number of feature points and improving their location accuracy.
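A sketch of the two steps on synthetic data may help: Wiener filtering to suppress blur and noise, then fusing a feature point's position across frames with an inverse-variance weighted average, which is the linear minimum variance unbiased estimator for independent observations. The variances and data are illustrative assumptions.

```python
# Sketch of the two WF-LMVUE steps on synthetic data.
import numpy as np
from scipy.signal import wiener

# Step 1: Wiener-filter a degraded frame (synthetic example).
rng = np.random.default_rng(1)
frame = rng.normal(size=(64, 64))
restored = wiener(frame, mysize=5)

# Step 2: LMVUE fusion of the same point observed in several frames; for
# independent observations this is the inverse-variance weighted average.
def lmvue(positions, variances):
    """positions: (F, 2) point locations; variances: (F,) per-frame variances."""
    w = 1.0 / np.asarray(variances)
    w /= w.sum()
    return (w[:, None] * np.asarray(positions)).sum(axis=0)

obs = np.array([[10.2, 20.1], [10.5, 19.8], [9.9, 20.3]])
var = np.array([0.2, 0.5, 0.4])   # base frame usually has the smallest variance
print(lmvue(obs, var))
```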
The Deep SORT algorithm is a multi-object tracking algorithm with high tracking accuracy and speed. However, due to the lack of a detection filter and its reliance on single-frame association, its multi-object tracking accuracy still has room for improvement. In this paper, we propose a DO-Adaptive NMS algorithm to filter the detections, and combine the K-nearest-neighbor algorithm with the intersection-over-union (IoU) criterion to sharpen the features of the trajectories. In addition, we put forward a weighted combination of motion information and appearance information that takes the disappearance time of trajectories into consideration. Experiments show that all of the above methods perform better than the original algorithm.
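One plausible form of such a time-aware weighting is sketched below: the longer a trajectory has been unseen, the more the appearance term dominates over the stale motion prediction. The exponential decay and its parameters are assumptions, not the paper's formula.

```python
# Illustrative time-weighted association cost: the motion weight decays with
# the trajectory's disappearance time. The decay form and lam0/tau values are
# assumptions, not the paper's exact weighting.
import numpy as np

def association_cost(d_motion, d_appearance, frames_missing, lam0=0.7, tau=10.0):
    lam = lam0 * np.exp(-frames_missing / tau)   # motion weight decays over time
    return lam * d_motion + (1.0 - lam) * d_appearance

print(association_cost(0.3, 0.6, frames_missing=0))   # motion-dominated
print(association_cost(0.3, 0.6, frames_missing=20))  # appearance-dominated
```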
Aircraft images captured by a third-party camera during take-off and landing can be used for monitoring and aircraft pose measurement. Hazy weather severely degrades aircraft image quality and visual perception, so haze removal from aircraft images has become an important task in practical industrial applications. Existing deep learning algorithms need the hazy image and the corresponding haze-free ground-truth image of the same scene at the same time in order to learn the dehazing process. However, ground-truth aircraft images are difficult to obtain, which hinders those approaches from addressing the actual aircraft image dehazing problem. In this paper, we present an end-to-end, ground-truth-agnostic deep dehazing network for the single C919 aircraft image dehazing problem. Instead of requiring a ground-truth image, we train the network using only pairs of hazy and pre-dehazed images. The pre-dehazed image can be easily obtained by a conventional dehazing method without deep learning, and the Natural Image Quality Evaluator (NIQE) is introduced to select the best dehazing model. Compared with existing dehazing algorithms, the proposed algorithm is capable of addressing real-world hazy C919 aircraft images effectively and achieves the best dehazing performance on our collected aircraft dataset.
KEYWORDS: Aircraft structures, 3D modeling, Stereo vision systems, 3D image processing, Cameras, 3D metrology, 3D acquisition, Imaging systems, 3D image reconstruction
Aircraft pose is of great importance for monitoring during flight tests. Traditional pose measurement is commonly based on various onboard sensors, which entails inconvenient repair and replacement when sensors are damaged. Two-dimensional (2D) single-image processing methods alleviate this inconvenience, but they suffer from the ambiguity of single-image three-dimensional (3D) reconstruction. To address these problems, we accomplish 3D reconstruction of the aircraft's structures via 2D multi-view images. The structures are obtained from 2D multi-view images of the aircraft by a convolutional neural network (CNN) and then used for reconstruction. Structures typically represent the topological relationship between components of the aircraft, reducing the self-occlusion of point features. For a more precise evaluation of the experimental results, we propose a new Mean Per Frame Position Error (MPFPE) for the structure positions. Compared with the Mean Per Joint Position Error (MPJPE), the MPFPE takes the length of structures into account and combines the multi-view images. Experiments show that the mean error of our method is 1.47%, which shows great potential for aircraft pose estimation.
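For orientation, the sketch below shows the standard MPJPE alongside one *plausible* length-normalized variant in the spirit of the MPFPE. The exact MPFPE definition is the paper's own and is not reproduced here; the variant is purely an illustrative assumption.

```python
# MPJPE baseline plus a plausible length-normalized variant; the paper's
# actual MPFPE formula is not given here, so mpfpe_like() is an assumption.
import numpy as np

def mpjpe(pred, gt):
    """pred, gt: (J, 3) joint positions; mean Euclidean error per joint."""
    return np.linalg.norm(pred - gt, axis=-1).mean()

def mpfpe_like(pred, gt, edges):
    """Normalize each endpoint error by the ground-truth structure length."""
    errs = []
    for i, j in edges:
        length = np.linalg.norm(gt[i] - gt[j])
        errs.append(np.linalg.norm(pred[i] - gt[i]) / length)
        errs.append(np.linalg.norm(pred[j] - gt[j]) / length)
    return float(np.mean(errs))

gt = np.array([[0.0, 0, 0], [1, 0, 0], [1, 1, 0]])
pred = gt + 0.01
print(mpjpe(pred, gt), mpfpe_like(pred, gt, edges=[(0, 1), (1, 2)]))
```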
KEYWORDS: Photography, 3D imaging standards, 3D metrology, Dynamical systems, 3D acquisition, Standards development, Visualization, Commercial off the shelf technology, Image quality, Opto mechatronics
The aircraft flight attitude can be obtained by a dynamic visual measurement system for aircraft (hereinafter MSA). Evaluating the MSA's flight attitude measurement accuracy is crucial. Several indoor evaluation methods exist, but they are not suitable outdoors. Therefore, we present a method for evaluating flight attitude measurement accuracy at the outdoor working site. A three-dimensional standard verification field can be established by a reasonable distribution of mark targets on the surfaces of an outdoor building group, and on this basis we construct a verification system for flight attitude measurement accuracy. A building group whose three-dimensional scale is similar to that of the aircraft is selected to construct the standard verification field. Mark points are pasted on the surfaces of the building group, and their 3D coordinates are measured by a three-dimensional coordinate measuring station consisting of two electronic theodolites. The mark points with known coordinates constitute the standard verification field. Still photographs of the standard verification field are taken by the MSA, and the attitude solved from these still photographs is used as the reference attitude. The MSA is then operated to shoot and record dynamically to simulate real working conditions, and the photographs taken are used to solve the dynamic measurement attitude. Accuracy analysis and evaluation can be performed using the dynamic measurement attitude and the reference attitude, providing a scientific basis for debugging, checking outdoor parameters, and equipment acceptance.
Object tracking is a core subject in computer vision and has significant meaning in both theory and practice. We propose a tracking method in which a robust discriminative classifier is built based on both object and context information. In this method, we consider multiple frames of local invariant features on and around the object and construct the object template and context template. To overcome the limitation of the invariant representations, we also design a nonparametric learning algorithm using transitive matching perspective transformation. This learning algorithm can keep adding object appearance and can avoid improper updating when occlusions appear. We also analyze the asymptotic stability of our method and prove its drift-free capability in long-term tracking. Extensive experiments using challenging publicly available video sequences that cover most of the critical conditions in tracking demonstrate the enhanced strength and robustness of our method.
This paper reports an efficient method for line matching that utilizes local intensity gradient information and neighboring geometric attributes. Lines are detected in a multi-scale way to make the method robust to scale changes. A descriptor based on local appearance is built to generate candidate matching pairs. The key idea is to accumulate intensity gradient information into histograms based on intensity orders, to overcome the fragmentation problem of lines. In addition, a local coordinate system is built for each line to achieve rotation invariance. For each line segment in the candidate matching pairs, a histogram is built by aggregating geometric attributes of neighboring line segments. The final matching measure derives from the distance between normalized geometric attribute histograms. Experiments show that the proposed method is robust to large illumination changes and is rotation invariant.
Constructing robust binary local feature descriptors has received increasing interest because their binary nature enables fast processing while requiring significantly less memory than their floating-point competitors. To bridge the performance gap between binary and floating-point descriptors without increasing the computational cost of computation and matching, optimal binary weights are learned and assigned to the binary descriptor, since each bit may contribute differently to distinctiveness and robustness. Technically, a large-scale regularized optimization method is applied to learn float weights for each bit of the binary descriptor. Furthermore, a binary approximation of the float weights is computed with an efficient alternating greedy strategy, which significantly improves discriminative power while preserving the fast matching advantage. Extensive experimental results on two challenging datasets (the Brown dataset and the Oxford dataset) demonstrate the effectiveness and efficiency of the proposed method.
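The practical payoff of binarizing the weights can be shown in a few lines: with a binary per-bit weight mask, the weighted Hamming distance reduces to a popcount over the masked XOR, so matching stays as fast as plain Hamming. The mask below is illustrative; in the paper it is learned from data.

```python
# Weighted Hamming distance with binary weights = popcount of masked XOR.
# Requires Python 3.10+ for int.bit_count(). The mask is illustrative.
def weighted_hamming(a: int, b: int, mask: int) -> int:
    """a, b: binary descriptors packed into ints; mask: learned binary weights."""
    return ((a ^ b) & mask).bit_count()

desc_a = 0b1011_0110
desc_b = 0b1001_0011
mask   = 0b1111_0101          # bits judged distinctive get weight 1
print(weighted_hamming(desc_a, desc_b, mask))
```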
Object tracking is a core subject in computer vision and has significant meaning in both theory and practice. In this paper, we propose a novel tracking method in which a robust discriminative classifier is built based on both object and context information. In this method, we consider multiple frames of local invariant features on and around the object, and construct the object template and context template. To overcome the limitation of the invariant representations, we also design a non-parametric learning algorithm using transitive matching perspective transformation, called LUPT (Learning Using Perspective Transformation). This learning algorithm can keep adding new object appearances to the object template and avoids improper updating when occlusions appear. We also analyze the asymptotic stability of our method and prove its drift-free capability in long-term tracking. Extensive experiments using challenging publicly available video sequences that cover most of the critical conditions in tracking demonstrate the enhanced strength and robustness of our method. Moreover, in comparison with several state-of-the-art tracking systems, our method shows superior performance in most cases, especially on long sequences.
This article [J. Electron. Imaging 25(6), 061602 (2016), doi: 10.1117/1.JEI.25.6.061602] was retracted on 18 December 2018 due to double publication in this and another peer-reviewed journal. The authors regret this mistake.
We propose a simple yet effective method for long-term object tracking. Unlike traditional visual tracking methods, which mainly depend on frame-to-frame correspondence, we combine high-level semantic information with low-level correspondences. Our approach is formulated in a confidence selection framework, which allows the system to recover from drift and partly deal with occlusion. Our algorithm can be roughly decomposed into an initialization stage and a tracking stage. In the initialization stage, an offline detector is trained to capture the object appearance at the category level; it is used to detect the potential target and initialize the tracking stage. The tracking stage consists of three modules: the online tracking module, the detection module, and the decision module. The pretrained detector is used to correct the drift of the online tracker, while the online tracker filters out false positive detections. A confidence selection mechanism is proposed to optimize the object location based on the online tracker and the detector. If the target is lost, the pretrained detector is utilized to reinitialize the whole algorithm once the target is relocated. In experiments, we evaluate our method on several challenging video sequences, and it demonstrates substantial improvement compared with detection-only and online-tracking-only baselines.
A new flexible method to calibrate the external parameters of two cameras with non-overlapping fields of view (FOV) is proposed. A flexible target with four spheres and a 1D bar is designed. All spheres can move freely along the bar to ensure that each camera can capture the image of two spheres clearly. As the radius of each sphere is known exactly, the center of each sphere in its corresponding camera coordinate system can be determined from its projection. The centers of the four spheres are collinear throughout the calibration, so the relationship of the four centers can be expressed using only the external parameters of the two cameras. When such expressions are obtained at different positions, the external parameters of the two cameras can be determined. In the proposed method, the center of each sphere can be determined accurately because a sphere's projection is independent of its orientation; meanwhile, the free movement of the spheres ensures that the spheres are imaged clearly. Experimental results show that the proposed calibration method achieves acceptable accuracy: the calibrated vision system reaches 0.105 mm when measuring a distance section of 1040 mm. Moreover, the calibration method is efficient, convenient, and easy to operate.
Boosted decision trees are fast at test time, but their training is too slow to meet the requirements of applications with real-time learning. To overcome this drawback, we propose a fast decision tree training method that prunes non-effective features in advance, and based on it we design a fast Boosting decision tree training algorithm. First, we analyze the structure of each decision tree node and derive a bound on the classification error of each node. Then, by using this error bound to prune non-effective features at an early stage, we greatly accelerate decision tree training without affecting the training results at all. Finally, the accelerated decision tree training method is integrated into the general Boosting process, forming a fast Boosting decision tree training algorithm. This algorithm is not a new variant of Boosting; on the contrary, it should be used in conjunction with existing Boosting algorithms to achieve further training acceleration. To test the algorithm's speedup and its performance when combined with other accelerated algorithms, the original AdaBoost and two typical acceleration algorithms, LazyBoost and StochasticBoost, were each combined with this algorithm into three fast versions, and their classification performance was tested on the Lsis face database containing 12788 images. Experimental results reveal that this fast algorithm can achieve more than double the training speed without affecting the results of the trained classifier, and that it can be combined with other acceleration algorithms.
KEYWORDS: Boosting algorithm, decision trees, classifier training, preliminary classification error, face detection
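A generic illustration of the prune-by-bound idea for one boosting round is sketched below: features whose cheap lower bound on achievable weighted error already exceeds the best error found so far are skipped without a full threshold scan. The bound passed in here is a placeholder, not the paper's derived bound.

```python
# Generic prune-by-bound decision stump search for one boosting round.
# error_lower_bound is a placeholder hook; the paper derives a real bound.
import numpy as np

def best_stump(X, y, w, error_lower_bound):
    """X: (N, F) features; y: (N,) labels in {-1, +1}; w: (N,) weights."""
    best = (np.inf, None, None)                  # (error, feature, threshold)
    for f in range(X.shape[1]):
        if error_lower_bound(X[:, f], y, w) >= best[0]:
            continue                             # prune: a full scan cannot win
        order = np.argsort(X[:, f])
        ys, ws = y[order], w[order]
        # error of the threshold after position i, predicting +1 above it:
        pos_below = np.cumsum(ws * (ys == 1))
        neg_above = (ws * (ys == -1)).sum() - np.cumsum(ws * (ys == -1))
        errs = pos_below + neg_above
        i = int(np.argmin(errs))
        if errs[i] < best[0]:
            best = (errs[i], f, X[order][i, f])
    return best

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 30))
y = np.where(X[:, 3] > 0.2, 1, -1)               # feature 3 carries the signal
w = np.full(200, 1.0 / 200)
print(best_stump(X, y, w, lambda col, yy, ww: 0.0))  # trivial bound: no pruning
```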
KEYWORDS: 3D image reconstruction, Reconstruction algorithms, 3D modeling, Cameras, Detection and tracking algorithms, 3D image processing, Calibration, Internet, Image processing, Tin
In this paper, we aim to reconstruct 3D points of a scene from related images. The Scale-Invariant Feature Transform (SIFT), a feature extraction and matching algorithm, has been proposed and improved over the years and is widely used in image alignment and stitching, image recognition, and 3D reconstruction. Because of the robustness and reliability of SIFT's feature extraction and matching, we use it to find correspondences between images, and we describe a SIFT-based method to reconstruct sparse 3D points from ordered images. In the matching process, we modify the procedure for finding correct correspondences and obtain a satisfying matching result: rejecting the "questioned" points before initial matching makes the final matching more reliable. Given that SIFT is invariant to image scale, rotation, and environmental changes, we propose a way to remove the duplicate reconstructed points that occur in the sequential reconstruction procedure, which improves the accuracy of the reconstruction. By removing the duplicated points, we avoid the possible collapse caused by inexact initialization or error accumulation; small per-image errors can otherwise accumulate into large changes as the number of images increases. Our method also does not require that all reprojected points be visible at all times, a limitation of some other approaches. The paper contrasts the modified algorithm with the original. Moreover, we present an approach to evaluate the reconstruction by comparing the reconstructed angles and length ratios with actual values using a calibration target in the scene. The proposed evaluation method is easy to carry out and widely applicable; even without Internet image datasets, we can evaluate our own results. The whole algorithm has been tested on several image sequences, both from the Internet and from our own captures.
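One common way to reject unreliable correspondences before reconstruction is Lowe's ratio test, sketched below with OpenCV. The file names and the 0.7 threshold are illustrative placeholders, not necessarily the paper's settings.

```python
# Sketch of SIFT matching with Lowe's ratio test to reject unreliable
# correspondences. File names and the 0.7 threshold are placeholders.
import cv2

img1 = cv2.imread("view1.jpg", cv2.IMREAD_GRAYSCALE)
img2 = cv2.imread("view2.jpg", cv2.IMREAD_GRAYSCALE)

sift = cv2.SIFT_create()
kp1, des1 = sift.detectAndCompute(img1, None)
kp2, des2 = sift.detectAndCompute(img2, None)

matcher = cv2.BFMatcher(cv2.NORM_L2)
good = [m for m, n in matcher.knnMatch(des1, des2, k=2)
        if m.distance < 0.7 * n.distance]       # Lowe's ratio test
print(len(good), "reliable correspondences")
```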
In this paper we propose a simple yet effective and efficient method for long-term object tracking. Unlike traditional visual tracking methods, which mainly depend on frame-to-frame correspondence, we combine high-level semantic information with low-level correspondences. Our approach is formulated in a confidence selection framework, which allows the system to recover from drift and partly deal with occlusion. Our algorithm can be roughly decomposed into an initialization stage and a tracking stage. In the initialization stage, an offline classifier is trained to capture object appearance information at the category level. When the video stream arrives, the pre-trained offline classifier is used to detect the potential target and initialize the tracking stage. The tracking stage consists of three parts: an online tracking part, an offline detection part, and a confidence judgment part. The online tracking part captures the specific target's appearance, while the detection part localizes the object based on the pre-trained offline classifier. Since there is no data dependence between online tracking and offline detection, the two parts run in parallel, significantly improving the processing speed. A confidence selection mechanism is proposed to optimize the object location. Besides, we propose a simple mechanism to judge the absence of the object: if the target is lost, the pre-trained offline classifier is utilized to re-initialize the whole algorithm once the target is re-located. In experiments, we evaluate our method on several challenging video sequences and demonstrate competitive results.
The homography matrix is a matrix representation of the projective relation between a space plane and its corresponding image plane in computer vision. It is widely used in visual metrology, camera calibration, 3D reconstruction, etc. Therefore, accurate estimation of the homography matrix is significant. Here, the quantum-behaved particle swarm optimization (QPSO) method, which is globally convergent, is first introduced into homography estimation. When a suitable cost function is chosen, enough point correspondences can be used to search for the optimal homography matrix, making the estimation accurate. To evaluate the proposed method, simulations and experiments are conducted to confirm its feasibility and robustness: points obtained from the estimated homography matrix are re-projected onto the image plane to evaluate the accuracy. For comparison, the Levenberg-Marquardt method, a typical iterative minimization method, is also used to obtain the homography matrix. Simulation and experimental results show that the proposed method is accurate and robust. With 10 correspondences and 20 particles, the root mean square error of the re-projected points reaches about 0.019 mm. Furthermore, our method does not depend on initialization and is less sensitive to the chosen cost function, both of which are deficiencies of common estimation methods.
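A minimal sketch of this setup follows: a reprojection cost over the eight free homography parameters (h33 fixed to 1), minimized with the standard mean-best QPSO update. The swarm size, search bounds, and contraction-expansion schedule are illustrative choices, not the paper's configuration.

```python
# Sketch: QPSO minimizing a reprojection cost over the 8 free homography
# parameters. Bounds, swarm size, and the alpha schedule are illustrative.
import numpy as np

def reproj_cost(h, src, dst):
    H = np.append(h, 1.0).reshape(3, 3)
    p = (H @ np.c_[src, np.ones(len(src))].T).T
    p = p[:, :2] / p[:, 2:3]
    return np.sqrt(((p - dst) ** 2).sum(1)).mean()

def qpso(cost, dim=8, n=20, iters=300, lo=-2.0, hi=2.0, seed=0):
    rng = np.random.default_rng(seed)
    x = rng.uniform(lo, hi, (n, dim))
    pbest, pcost = x.copy(), np.array([cost(xi) for xi in x])
    g = pbest[np.argmin(pcost)]
    for t in range(iters):
        alpha = 1.0 - 0.5 * t / iters             # contraction-expansion coeff
        mbest = pbest.mean(axis=0)                # mean of personal bests
        phi = rng.random((n, dim))
        p = phi * pbest + (1 - phi) * g           # local attractors
        u = np.clip(rng.random((n, dim)), 1e-12, None)
        sign = np.where(rng.random((n, dim)) < 0.5, -1.0, 1.0)
        x = p + sign * alpha * np.abs(mbest - x) * np.log(1.0 / u)
        c = np.array([cost(xi) for xi in x])
        improved = c < pcost
        pbest[improved], pcost[improved] = x[improved], c[improved]
        g = pbest[np.argmin(pcost)]
    return g, pcost.min()

rng = np.random.default_rng(1)
src = rng.uniform(0.0, 10.0, (10, 2))
H_true = np.array([[1.0, 0.05, 1.5], [-0.03, 1.0, -0.8], [0.001, 0.002, 1.0]])
pts = (H_true @ np.c_[src, np.ones(10)].T).T
dst = pts[:, :2] / pts[:, 2:3]
h_best, err = qpso(lambda h: reproj_cost(h, src, dst))
print(err)   # toy run; convergence quality depends on bounds and scaling
```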
The stereo light microscope (SLM) plays an important role in the measurement of three-dimensional geometry at the microscopic scale. We propose a fast and precise affine calibration algorithm for the SLM based on its invariable extrinsic parameters. The calibration algorithm, which uses a free planar reference, consists of three steps: first, derive the extrinsic parameters based on their invariable definition in the pinhole and affine models; second, calculate the intrinsic parameters through the homography matrix; finally, refine all model parameters by global optimization, using the previous closed-form solutions as initial values. The effectiveness of treating a noncoaxial optical system as an affine camera is also verified, so that all types of SLMs can be affinely modeled. Calibration experiments show that the affine calibration is preferable under multiple criteria, including running time, relative positioning precision, and absolute positioning precision. With a PlanApo S 1.5× objective and a total magnification of 3.024×, the proposed affine calibration algorithm achieves a distance error of 0.423 μm and a positioning error of 0.195 mm within 10.6 s.
Determining the relative pose between the camera and the structured-light plane projector is a classical problem in measurement with line-structured-light vision sensors. A geometrical calibration method based on the theory of vanishing points and vanishing lines is proposed. In this method, a planar target with several parallel lines is used. By moving the target to at least two different positions at random, we can obtain the normal vector of the structured light plane in the camera coordinate system; since the distance between each two adjacent parallel lines is known exactly, the parameter D of the structured light plane is determined, and therefore the equation of the structured light plane can be confirmed. Experimental results show that the accuracy of the proposed calibration method reaches 0.09 mm within a field of view of about 200×200 mm. Moreover, the target used in our calibration method can be produced precisely with ease, and the method is efficient and convenient owing to its simple calculation and easy operation, especially for on-site calibration.
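The underlying projective geometry can be sketched in a few lines: images of parallel target lines intersect at a vanishing point v, the corresponding 3D direction is K⁻¹v, and a plane's normal follows from its vanishing line l as n = Kᵀl. These are standard results; the intrinsics K and point coordinates below are made-up examples, not the paper's data.

```python
# Vanishing-point geometry sketch: directions from vanishing points and a
# plane normal from the vanishing line. K and the points are illustrative.
import numpy as np

K = np.array([[1200.0, 0.0, 640.0],
              [0.0, 1200.0, 480.0],
              [0.0,    0.0,   1.0]])

def line_through(p, q):
    """Homogeneous line through two image points given as (x, y)."""
    return np.cross([*p, 1.0], [*q, 1.0])

# Two images of parallel target lines intersect at the vanishing point.
l1 = line_through((100, 200), (900, 260))
l2 = line_through((120, 400), (880, 430))
v = np.cross(l1, l2)                      # vanishing point (homogeneous)

d = np.linalg.inv(K) @ v                  # 3-D direction of the parallel lines
d /= np.linalg.norm(d)

# A second, non-parallel pencil gives vanishing point v2; the vanishing line
# of the target plane is v x v2, and the plane normal is n = K^T (v x v2).
l3 = line_through((100, 200), (120, 400))
l4 = line_through((900, 260), (880, 430))
v2 = np.cross(l3, l4)
n = K.T @ np.cross(v, v2)
n /= np.linalg.norm(n)
print(d, n)
```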
We focus on two key problems in the calibration of multi-sensor visual measurement systems based on structured light: the calibration of the structured light vision sensor, and the global calibration of multiple vision sensors. In the calibration of the vision sensor, the light-plane equation is computed by combining the Plücker matrices of light stripes obtained at different target positions. Since the light-plane equation is optimized by using all the light-stripe center points, the robustness and accuracy of calibration are considerably improved. For the global calibration of multiple vision sensors, the relative positions of the two vision sensors with non-overlapping fields of view are calibrated by means of two planar targets (fixed together), using the constraint that the relative positions of the two targets are invariable. The mutual transformations between the two targets need not be known. Using one of the vision sensors as the base vision sensor, the global calibration of multiple vision sensors is achieved by calibrating each of the other vision sensors with the base vision sensor. The proposed method has already been successfully applied in practice.
A global calibration method for multi-sensor vision systems based on a flexible 3D target is proposed to solve the calibration problem of multi-sensor vision systems with a large inspection range. The flexible 3D target consists of several planar targets, called sub-targets, which are placed flexibly according to the sensors' positions. The coordinate frame of one of the vision sensors is selected as the global coordinate frame. Using the invariance of the relative positions between sub-targets in the flexible 3D target, the closed-form solution of the transformation from the local coordinate frame of each sensor to the global coordinate frame can be computed. The result is refined by nonlinear optimization, and the maximum likelihood estimate of the transformation matrices can be achieved. Experimental results demonstrate the high accuracy of the proposed calibration method. The proposed method greatly simplifies the process and reduces the cost of global calibration: it does not need high-accuracy 3D measuring equipment or a special 3D target, as most traditional global calibration methods do, and only requires combining several planar targets. It is applicable to the global calibration of multi-sensor vision systems at the working location.
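A standard closed-form building block for this kind of frame-to-frame calibration is the SVD-based (Kabsch/Umeyama) rigid transform between matched 3D point sets, typically followed by nonlinear refinement. The sketch below is that generic step, not the paper's exact pipeline.

```python
# Closed-form rigid transform between two coordinate frames from matched
# 3-D points (Kabsch/Umeyama). Generic sketch, not the paper's pipeline.
import numpy as np

def rigid_transform(A, B):
    """Find R, t with B ~= A @ R.T + t for matched point sets of shape (N, 3)."""
    ca, cb = A.mean(axis=0), B.mean(axis=0)
    H = (A - ca).T @ (B - cb)
    U, _, Vt = np.linalg.svd(H)
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
    R = Vt.T @ D @ U.T                 # guard against reflections
    t = cb - R @ ca
    return R, t

rng = np.random.default_rng(2)
A = rng.normal(size=(20, 3))
Q = np.linalg.qr(rng.normal(size=(3, 3)))[0]
R_true = Q if np.linalg.det(Q) > 0 else -Q       # ensure a proper rotation
t_true = np.array([1.0, -2.0, 0.5])
B = A @ R_true.T + t_true
R, t = rigid_transform(A, B)
print(np.allclose(R, R_true), np.allclose(t, t_true))
```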
A novel 3-D terrain matching algorithm is presented for a passive aircraft navigation system. Stereo matching of a pair of overlapping images can yield a recovered digital elevation model (DEM), which can be matched to the airborne reference DEM of the 3-D terrain. The two DEMs can be represented by the compact representation of contour maps, so the terrain matching is converted to contour-map matching. A contour-map matching algorithm using a combination of Fourier transform and polar transform is then proposed to estimate the associated translation and rotation parameters, which provide the desired position and orientation of the aircraft. Experimental results with real terrain data demonstrate that the proposed algorithm is insensitive to large noise and distortion compared to the previous state of the art, and also has the merits of high reliability and accuracy.
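The translation part of such Fourier-based matching is classical phase correlation: the normalized cross-power spectrum of two shifted maps has a sharp inverse-FFT peak at the shift. The sketch below shows only this translation step on synthetic data; rotation is handled analogously on polar-resampled maps, as the abstract describes.

```python
# Phase correlation for the translation component of contour-map matching.
import numpy as np

def phase_correlate(a, b):
    """Return the (dy, dx) shift such that b ~= np.roll(a, (dy, dx), (0, 1))."""
    cross = np.conj(np.fft.fft2(a)) * np.fft.fft2(b)
    cross /= np.maximum(np.abs(cross), 1e-12)    # normalized cross-power spectrum
    corr = np.fft.ifft2(cross).real
    dy, dx = np.unravel_index(np.argmax(corr), corr.shape)
    dy -= corr.shape[0] if dy > corr.shape[0] // 2 else 0   # unwrap to signed
    dx -= corr.shape[1] if dx > corr.shape[1] // 2 else 0
    return dy, dx

rng = np.random.default_rng(3)
a = rng.normal(size=(128, 128))
b = np.roll(a, shift=(7, -12), axis=(0, 1))
print(phase_correlate(a, b))                     # -> (7, -12)
```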
To address the impact of the slanted installation of a position-sensitive detector (PSD), used as a photoelectric detector, on spot position, a mathematical model of the resulting distortion error of the spot position is established and simulated. The results show that the distortion error of the spot position increases with the slant angle of the PSD surface, the beam waist radius, and the distance between the PSD and the beam waist position. Within a small range, the effect of the first two factors on spot positioning precision can be ignored, while the last one has a great effect. The distortion error model of the spot position and the simulation results provide a useful theoretical reference for practical engineering applications of PSDs.
OpenGL is the international standard for 3D graphics, and image generation in OpenGL is similar to shooting with a camera. This paper focuses on the application of OpenGL to computer vision, regarding the OpenGL 3D image as a virtual camera image. First, the imaging mechanism of OpenGL is analyzed from the viewpoint of the perspective projection transformation of a computer vision camera. Then, the relationship between the intrinsic and extrinsic parameters of the camera and the function parameters in OpenGL is analyzed, and the transformation formulas are deduced. On this basis, computer vision simulation is realized. The effectiveness of the method for determining the intrinsic and extrinsic parameters of the OpenGL-based virtual camera is verified by comparing actual CCD camera images with virtual camera images (with the actual camera's parameters matching the virtual camera's) and by the experimental results of a stereo vision 3D reconstruction simulation.
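One commonly used mapping from pinhole intrinsics to an OpenGL-style projection matrix is sketched below. Sign conventions vary with the image origin (top-left assumed here) and are a frequent source of confusion, so treat this as an illustration of the kind of correspondence the paper derives, not a drop-in for its exact formulas.

```python
# One common pinhole-intrinsics -> OpenGL projection mapping (top-left image
# origin assumed); conventions vary, so this is an illustrative sketch.
import numpy as np

def intrinsics_to_opengl(fx, fy, cx, cy, w, h, near=0.1, far=100.0):
    P = np.zeros((4, 4))
    P[0, 0] = 2 * fx / w
    P[0, 2] = (w - 2 * cx) / w
    P[1, 1] = 2 * fy / h
    P[1, 2] = (2 * cy - h) / h
    P[2, 2] = -(far + near) / (far - near)       # standard OpenGL depth terms
    P[2, 3] = -2 * far * near / (far - near)
    P[3, 2] = -1.0                               # perspective divide by -z
    return P

print(intrinsics_to_opengl(fx=1200, fy=1200, cx=640, cy=480, w=1280, h=960))
```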
KEYWORDS: Inspection, 3D vision, Distributed interactive simulations, Data centers, 3D applications, Sensors, CCD cameras, Projection systems, 3D modeling, Structured light
Structured-light-based 3D vision has wide applications in inspecting form and position errors, such as the straightness and coaxiality of cylindrical workpieces. In these applications, however, the light stripe on the workpiece's surface is very short and contains inadequate data, often with noise. Under such circumstances, ellipse fitting to the scattered data of the light stripe is not efficient enough, and its accuracy is usually poor. To address this problem, a new least-squares fitting method based on the constraint of the ellipse's minor axis (the CEMA method) is proposed in this paper. Simulations are presented for the proposed method and for five other popular methods described in the literature. The results show that the proposed method can efficiently improve the accuracy and robustness of ellipse fitting to the scattered data of a short light stripe.
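For context, the baseline the CEMA method improves on is the plain algebraic least-squares conic fit, which is ill-conditioned on short arcs. The generic sketch below shows that unconstrained fit on a short noisy arc; it does not implement the paper's minor-axis constraint.

```python
# Baseline algebraic least-squares conic fit (smallest singular vector of the
# design matrix). The paper's CEMA method adds a minor-axis constraint that
# this generic sketch does not implement.
import numpy as np

def fit_conic(x, y):
    """Fit a*x^2 + b*x*y + c*y^2 + d*x + e*y + f = 0 to scattered points."""
    D = np.column_stack([x * x, x * y, y * y, x, y, np.ones_like(x)])
    _, _, Vt = np.linalg.svd(D)
    return Vt[-1]                      # conic coefficients, up to scale

t = np.linspace(0.0, 0.6, 40)          # a short arc, as from a short stripe
x = 5.0 * np.cos(t) + 0.01 * np.random.default_rng(4).normal(size=t.size)
y = 2.0 * np.sin(t)
print(fit_conic(x, y))
```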