This paper presents an imaging approach and sample data for brown-out landing, “zero-zero” fog, smoke, and into-water degraded visual environment (DVE) operations using 3D Flash LIDAR Vision Systems.
Autonomous aerial refueling (AAR) is an important capability for an unmanned aerial vehicle (UAV) to increase its
flying range and endurance without increasing its size. This paper presents a novel tracking method that utilizes both 2D
intensity and 3D point-cloud data acquired with a 3D Flash LIDAR sensor to establish relative position and orientation
between the receiver vehicle and drogue during an aerial refueling process. Unlike classic vision-based sensors, a 3D Flash LIDAR sensor can provide 3D point-cloud data in real time without motion blur, day or night, and is capable
of imaging through fog and clouds. The proposed method segments out the drogue through 2D analysis and estimates
the center of the drogue from 3D point-cloud data for flight trajectory determination. A level-set front propagation
routine is first employed to identify the target of interest and establish its silhouette information. Sufficient domain
knowledge, such as the size of the drogue and the expected operable distance, is integrated into our approach to quickly
eliminate unlikely target candidates. A statistical analysis along with a random sample consensus (RANSAC) is
performed on the target to reduce noise and estimate the center of the drogue after all 3D points on the drogue are
identified. The estimated center and drogue silhouette serve as the seed points to efficiently locate the target in the next
frame.
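As a rough illustration of the RANSAC-style center estimation described above (not the authors' exact statistical analysis), candidate centers can be hypothesized from small point subsets and scored against an assumed drogue rim radius; the radius and tolerance values in this Python sketch are placeholders.

import numpy as np

def ransac_drogue_center(points, expected_radius=0.3, tol=0.05,
                         n_iters=200, seed=None):
    """Robustly estimate the drogue center from 3D Flash LIDAR points.

    Sketch only: sample small subsets, take their centroid as a candidate
    center, and score it by how many points lie near the assumed rim
    radius; the best candidate is then refined over its inliers.
    """
    rng = np.random.default_rng(seed)
    best_center, best_score = None, -1
    for _ in range(n_iters):
        sample = points[rng.choice(len(points), size=5, replace=False)]
        center = sample.mean(axis=0)                      # candidate center
        dist = np.linalg.norm(points - center, axis=1)
        score = int(np.sum(np.abs(dist - expected_radius) < tol))
        if score > best_score:
            best_score, best_center = score, center
    # refine over all inliers of the best candidate
    dist = np.linalg.norm(points - best_center, axis=1)
    inliers = points[np.abs(dist - expected_radius) < tol]
    return inliers.mean(axis=0) if len(inliers) else best_center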
The paper reports a fully automated, cross-modality sensor data registration scheme between video and magnetic
tracker data. This registration scheme is intended for use in computerized imaging systems to model the appearance,
structure, and dimension of human anatomy in three dimensions (3D) from endoscopic videos, particularly
colonoscopic videos, for cancer research and clinical practices. The proposed cross-modality calibration procedure
operates this way: Before a colonoscopic procedure, the surgeon inserts a magnetic tracker into the working
channel of the endoscope or otherwise fixes the tracker's position on the scope. The surgeon then maneuvers
the scope-tracker assembly to view a checkerboard calibration pattern from a few different viewpoints for a few
seconds. The calibration procedure is then completed, and the relative pose (translation and rotation) between
the reference frames of the magnetic tracker and the scope is determined. During the colonoscopic procedure, the
readings from the magnetic tracker are used to automatically deduce the pose (both position and orientation)
of the scope's reference frame over time, without complicated image analysis. Knowing the scope movement
over time then allows us to infer the 3D appearance and structure of the organs and tissues in the scene. While
there are other well-established mechanisms for inferring the movement of the camera (scope) from images, they
are often sensitive to mistakes in image analysis, error accumulation, and structure deformation. The proposed
method using a magnetic tracker to establish the camera motion parameters thus provides a robust and efficient
alternative for 3D model construction. Furthermore, the calibration procedure requires neither special training nor expensive calibration equipment, apart from a camera calibration pattern (a checkerboard) that can be printed on any laser or inkjet printer.
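Viewed abstractly, this tracker-to-scope calibration is a hand-eye problem (solve AX = XB for the fixed tracker-to-camera transform). The Python sketch below shows one plausible realization with standard OpenCV routines; the function name, board geometry, and the availability of a per-view tracker pose are illustrative assumptions, not the paper's implementation.

import cv2
import numpy as np

def scope_tracker_calibration(images, tracker_poses, K, dist,
                              board_size=(8, 6), square=0.01):
    """Estimate the fixed scope-to-tracker pose from checkerboard views.

    tracker_poses: list of (R, t) of the tracker in the field-generator
    frame, one per image; K, dist: scope intrinsics from a prior camera
    calibration.  All of these are assumed inputs for the sketch.
    """
    objp = np.zeros((board_size[0] * board_size[1], 3), np.float32)
    objp[:, :2] = np.mgrid[0:board_size[0], 0:board_size[1]].T.reshape(-1, 2) * square

    R_t2b, t_t2b, R_b2c, t_b2c = [], [], [], []
    for img, (R_tr, t_tr) in zip(images, tracker_poses):
        ok, corners = cv2.findChessboardCorners(img, board_size)
        if not ok:
            continue
        # pose of the checkerboard target in the camera (scope) frame
        _, rvec, tvec = cv2.solvePnP(objp, corners, K, dist)
        R_b2c.append(cv2.Rodrigues(rvec)[0])
        t_b2c.append(tvec)
        R_t2b.append(R_tr)
        t_t2b.append(t_tr)

    # hand-eye solve: fixed transform from the scope (camera) to the tracker
    R_x, t_x = cv2.calibrateHandEye(R_t2b, t_t2b, R_b2c, t_b2c)
    return R_x, t_x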
We describe a software system for building three-dimensional (3D) models from colonoscopic videos. The system
is end-to-end in the sense that it takes as input raw image frames, shot during a colon exam, and produces the
3D structure of objects of interest (OOI), such as tumors, polyps, and lesions. We use the structure-from-motion
(SfM) approach from computer vision, which analyzes an image sequence in which the camera's position and aim vary
relative to the OOI. The varying pose of the camera relative to the OOI induces the motion-parallax effect which
allows 3D depth of the OOI to be inferred. Unlike the traditional SfM system pipeline, our software system
contains many check-and-balance mechanisms to ensure robustness, and the analysis from earlier stages of the
pipeline is used to guide the later processing stages to better handle challenging medical data. The constructed
3D models allow the pathology (growth and change in both structure and appearance) to be monitored over
time.
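For readers unfamiliar with the motion-parallax step, the minimal two-view sketch below (standard OpenCV routines, not the paper's check-and-balance pipeline) shows how relative camera pose and sparse 3D structure can be recovered from matched features; K, pts1, and pts2 are assumed inputs.

import cv2
import numpy as np

def two_view_structure(pts1, pts2, K):
    """Recover relative pose and sparse 3D points from two frames.

    pts1, pts2: Nx2 float arrays of matched feature locations;
    K: 3x3 camera intrinsic matrix (assumed known).
    """
    E, mask = cv2.findEssentialMat(pts1, pts2, K,
                                   method=cv2.RANSAC, threshold=1.0)
    _, R, t, mask = cv2.recoverPose(E, pts1, pts2, K, mask=mask)

    # projection matrices with the first camera at the origin
    P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
    P2 = K @ np.hstack([R, t])
    X = cv2.triangulatePoints(P1, P2, pts1.T, pts2.T)  # 4xN homogeneous
    return R, t, (X[:3] / X[3]).T                      # Nx3 structure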
The ability to detect and match features across multiple views of a scene is a crucial first step in many computer vision
algorithms for dynamic scene analysis. State-of-the-art methods such as SIFT and SURF perform successfully when
applied to typical images taken by a digital camera or camcorder. However, these methods often fail to generate an
acceptable number of features when applied to medical images, because such images usually contain large homogeneous
regions with little color and intensity variation. As a result, tasks like image registration and 3D structure recovery
become difficult or impossible in the medical domain.
This paper presents a scale-, rotation-, and color/illumination-invariant feature detector and descriptor for medical
applications. The method incorporates elements of SIFT and SURF while optimizing their performance on medical data.
Based on experiments with various types of medical images, we combined, adjusted, and built on methods and
parameter settings employed in both algorithms. An approximate Hessian-based detector is used to locate scale-invariant
keypoints and a dominant orientation is assigned to each keypoint using a gradient orientation histogram, providing
rotation invariance. Finally, keypoints are described with an orientation-normalized distribution of gradient responses at
the assigned scale, and the feature vector is normalized for contrast invariance. Experiments show that the algorithm
detects and matches far more features than SIFT and SURF on medical images, with similar error levels.
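The detector and descriptor reported here are custom, but the flavor of the tuning can be sketched with an off-the-shelf SIFT pipeline whose contrast threshold is loosened and whose input is contrast-normalized first; every parameter value below is an illustrative guess, not the paper's setting.

import cv2

def match_medical_frames(img1, img2):
    """Detect and match features between two low-contrast medical frames.

    Sketch: boost local contrast with CLAHE, run SIFT with a relaxed
    contrast threshold, then keep matches passing Lowe's ratio test.
    Inputs are assumed to be BGR images.
    """
    clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
    g1 = clahe.apply(cv2.cvtColor(img1, cv2.COLOR_BGR2GRAY))
    g2 = clahe.apply(cv2.cvtColor(img2, cv2.COLOR_BGR2GRAY))

    sift = cv2.SIFT_create(contrastThreshold=0.01, edgeThreshold=20)
    k1, d1 = sift.detectAndCompute(g1, None)
    k2, d2 = sift.detectAndCompute(g2, None)

    matches = cv2.BFMatcher(cv2.NORM_L2).knnMatch(d1, d2, k=2)
    good = [m for m, n in matches if m.distance < 0.75 * n.distance]
    return k1, k2, good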
KEYWORDS: 3D modeling, Cameras, Video, 3D image processing, Computer simulations, Visual process modeling, Process modeling, Image registration, Data modeling, Motion models
3D computer models of body anatomy can have many uses in medical research and clinical practices. This paper
describes a robust method that uses videos of body anatomy to construct multiple, partial 3D structures and
then fuse them to form a larger, more complete computer model using the structure-from-motion framework.
We employ the Double Dog-Leg (DDL) method, a trust-region-based nonlinear optimization technique, to jointly
optimize the camera motion parameters (rotation and translation) and determine a global scale that all partial
3D structures should agree upon. These optimized motion parameters are used for constructing local structures,
and the global scale is essential for multi-view registration after all these partial structures are built. In order
to provide a good initial guess of the camera movement parameters and outlier free 2D point correspondences
for DDL, we also propose a two-stage scheme where multi-RANSAC with a normalized eight-point algorithm
is first performed and then a few iterations of an over-determined five-point algorithm are used to polish the
results. Our experimental results using colonoscopy video show that the proposed scheme always produces more
accurate outputs than the standard RANSAC scheme. Furthermore, since we have obtained many reliable point
correspondences, time-consuming and error-prone registration methods such as iterative closest point (ICP) based algorithms can be replaced by a simple rigid-body transformation solver when merging partial structures
into a larger model.
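A closed-form similarity-transform solver of the kind alluded to above can be sketched in a few lines (an Umeyama-style solution over matched 3D points); whether the paper estimates scale in this step or takes it from the DDL stage is not stated, so the scale term below is an assumption.

import numpy as np

def rigid_transform_with_scale(src, dst):
    """Closed-form alignment of matched 3D point sets (dst ≈ s*R@src + t).

    src, dst: Nx3 arrays of corresponding points from two partial
    structures.  Umeyama-style sketch, not the paper's exact solver.
    """
    mu_s, mu_d = src.mean(axis=0), dst.mean(axis=0)
    S, D = src - mu_s, dst - mu_d
    U, sig, Vt = np.linalg.svd(D.T @ S / len(src))   # cross-covariance
    sgn = np.sign(np.linalg.det(U @ Vt))             # avoid reflections
    R = U @ np.diag([1.0, 1.0, sgn]) @ Vt
    scale = (sig * [1.0, 1.0, sgn]).sum() / S.var(axis=0).sum()
    t = mu_d - scale * (R @ mu_s)
    return scale, R, t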
KEYWORDS: Distortion, Endoscopes, 3D modeling, Endoscopy, Visual process modeling, Image segmentation, Calibration, Visualization, Computer aided diagnosis and therapy, Human subjects
Endoscopic images suffer from a fundamental spatial distortion due to the wide angle design of the endoscope lens. This
barrel-type distortion is an obstacle for subsequent Computer Aided Diagnosis (CAD) algorithms and should be
corrected. Various methods and research models for the barrel-type distortion correction have been proposed and
studied. For industrial applications, a stable, robust method with high accuracy is required to calibrate the different types
of endoscopes in an easy-to-use way. The correction area should be large enough to cover all the regions that the
physicians need to see. In this paper, we present our endoscope distortion correction procedure which includes data
acquisition, distortion center estimation, distortion coefficients calculation, and look-up table (LUT) generation. We
investigate different polynomial models used for modeling the distortion and propose a new one which provides
correction results with better visual quality. The method has been verified with four types of colonoscopes. The
correction procedure is currently being applied to human subject data, and the coefficients are being used in a subsequent 3D colon reconstruction project.
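To make the look-up table idea concrete, the sketch below builds remap tables from a simple two-coefficient radial polynomial around an estimated distortion center; this particular polynomial form and the coefficient names k1 and k2 are illustrative, not the model the paper proposes.

import cv2
import numpy as np

def build_undistortion_lut(shape, center, k1, k2):
    """Build remap tables for radial (barrel) distortion correction.

    For every pixel of the corrected image, the LUT gives the coordinate
    in the distorted source image to sample, using the radial model
    r_d = r_u * (1 + k1*r_u**2 + k2*r_u**4) about the distortion center.
    """
    h, w = shape
    cx, cy = center
    xs, ys = np.meshgrid(np.arange(w, dtype=np.float32),
                         np.arange(h, dtype=np.float32))
    ru2 = (xs - cx) ** 2 + (ys - cy) ** 2        # squared undistorted radius
    factor = 1 + k1 * ru2 + k2 * ru2 ** 2        # radial polynomial
    map_x = (cx + (xs - cx) * factor).astype(np.float32)
    map_y = (cy + (ys - cy) * factor).astype(np.float32)
    return map_x, map_y

# usage: map_x, map_y = build_undistortion_lut(img.shape[:2], (cx, cy), k1, k2)
#        corrected = cv2.remap(img, map_x, map_y, cv2.INTER_LINEAR)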
KEYWORDS: 3D modeling, Cameras, 3D image processing, Video, Colon, Solid modeling, Visual process modeling, Motion models, Data modeling, Computing systems
A 3D colon model is an essential component of a computer-aided diagnosis (CAD) system in colonoscopy to
assist surgeons in visualization, surgical planning, and training. This research is thus aimed at developing
the ability to construct a 3D colon model from endoscopic videos (or images). This paper summarizes our ongoing
research in automated model building in colonoscopy. We have developed the mathematical formulations
and algorithms for modeling static, localized 3D anatomic structures within a colon that can be rendered from
multiple novel viewpoints for close scrutiny and precise dimensioning. This ability is useful when a surgeon notices some abnormal tissue growth and wants a close inspection and precise measurement. Our
modeling system uses only video images and follows a well-established computer-vision paradigm for image-based
modeling. We extract prominent features from images and establish their correspondences across multiple images
by continuous tracking and discrete matching. We then use these feature correspondences to infer the camera's
movement. The camera motion parameters allow us to rectify images into a standard stereo configuration and
calculate pixel movements (disparity) in these images. The inferred disparity is then used to recover 3D surface
depth. The inferred 3D depth, together with the texture information recorded in the images, allows us to construct a 3D model with both structure and appearance information that can be rendered from multiple novel viewpoints.
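The rectify-then-disparity step can be sketched with standard OpenCV calls, assuming the relative pose (R, t) between two frames has already been inferred from the feature correspondences and that the intrinsics K and distortion coefficients dist are known; the matcher settings are placeholders.

import cv2
import numpy as np

def depth_from_frame_pair(img1, img2, K, dist, R, t):
    """Rectify two frames into a standard stereo pair and recover depth.

    K, dist: camera intrinsics and distortion; R, t: relative pose of the
    second frame with respect to the first (assumed already estimated).
    """
    size = (img1.shape[1], img1.shape[0])
    R1, R2, P1, P2, Q, _, _ = cv2.stereoRectify(K, dist, K, dist, size, R, t)
    m1 = cv2.initUndistortRectifyMap(K, dist, R1, P1, size, cv2.CV_32FC1)
    m2 = cv2.initUndistortRectifyMap(K, dist, R2, P2, size, cv2.CV_32FC1)
    r1 = cv2.remap(img1, m1[0], m1[1], cv2.INTER_LINEAR)
    r2 = cv2.remap(img2, m2[0], m2[1], cv2.INTER_LINEAR)

    # semi-global block matching; settings are illustrative only
    sgbm = cv2.StereoSGBM_create(minDisparity=0, numDisparities=64, blockSize=5)
    disp = sgbm.compute(cv2.cvtColor(r1, cv2.COLOR_BGR2GRAY),
                        cv2.cvtColor(r2, cv2.COLOR_BGR2GRAY))
    depth = cv2.reprojectImageTo3D(disp.astype(np.float32) / 16.0, Q)
    return disp, depth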