The paper reports a fully-automated, cross-modality sensor data registration scheme between video and magnetic
tracker data. This registration scheme is intended for use in computerized imaging systems to model the appearance,
structure, and dimension of human anatomy in three dimensions (3D) from endoscopic videos, particularly
colonoscopic videos, for cancer research and clinical practices. The proposed cross-modality calibration procedure
operates as follows. Before a colonoscopic procedure, the surgeon inserts a magnetic tracker into the working
channel of the endoscope or otherwise fixes the tracker's position on the scope. The surgeon then maneuvers
the scope-tracker assembly to view a checkerboard calibration pattern from a few different viewpoints for a few
seconds. The calibration procedure is then completed, and the relative pose (translation and rotation) between
the reference frames of the magnetic tracker and the scope is determined. During the colonoscopic procedure, the
readings from the magnetic tracker are used to automatically deduce the pose (both position and orientation)
of the scope's reference frame over time, without complicated image analysis. Knowing the scope movement
over time then allows us to infer the 3D appearance and structure of the organs and tissues in the scene. While
there are other well-established mechanisms for inferring the movement of the camera (scope) from images, they
are often sensitive to mistakes in image analysis, error accumulation, and structure deformation. The proposed
method using a magnetic tracker to establish the camera motion parameters thus provides a robust and efficient
alternative for 3D model construction. Furthermore, the calibration procedure requires neither special training
nor expensive calibration equipment, apart from a camera calibration pattern (a checkerboard) that
can be printed on any laser or inkjet printer.
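The paper does not spell out the estimation procedure, but the relative tracker-to-camera pose can be recovered from such a calibration sweep with a standard hand-eye formulation. The sketch below illustrates one possible implementation using OpenCV; the pattern dimensions, variable names, and the choice of the Tsai hand-eye solver are assumptions for illustration, not details taken from the paper.

```python
import numpy as np
import cv2

# Illustrative constants (assumptions, not from the paper).
PATTERN = (9, 6)      # inner corners of the printed checkerboard
SQUARE = 5.0          # edge length of one square, in mm

# Checkerboard corner coordinates expressed in the board's own frame.
BOARD_PTS = np.zeros((PATTERN[0] * PATTERN[1], 3), np.float32)
BOARD_PTS[:, :2] = np.mgrid[0:PATTERN[0], 0:PATTERN[1]].T.reshape(-1, 2) * SQUARE

def tracker_to_camera_offset(frames, tracker_R, tracker_t, K, dist):
    """Estimate the fixed camera-to-tracker transform from a short calibration sweep.

    frames      : grayscale video frames showing the checkerboard
    tracker_R/t : magnetic-tracker rotations/translations for those frames
    K, dist     : scope camera intrinsics and distortion (from a prior calibration)
    """
    cam_R, cam_t, trk_R, trk_t = [], [], [], []
    for img, R, t in zip(frames, tracker_R, tracker_t):
        found, corners = cv2.findChessboardCorners(img, PATTERN)
        if not found:
            continue                              # skip views without a full board
        ok, rvec, tvec = cv2.solvePnP(BOARD_PTS, corners, K, dist)
        if not ok:
            continue
        cam_R.append(cv2.Rodrigues(rvec)[0])      # camera-to-board pose for this view
        cam_t.append(tvec)
        trk_R.append(R)                           # tracker pose in field-generator frame
        trk_t.append(t)

    # Classic hand-eye formulation: solve A X = X B for the unknown rigid offset X.
    R_off, t_off = cv2.calibrateHandEye(trk_R, trk_t, cam_R, cam_t,
                                        method=cv2.CALIB_HAND_EYE_TSAI)
    return R_off, t_off
```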
We describe a software system for building three-dimensional (3D) models from colonoscopic videos. The system
is end-to-end in the sense that it takes as input raw image frames, shot during a colon exam, and produces the
3D structure of objects of interest (OOI), such as tumors, polyps, and lesions. We use the structure-from-motion
(SfM) approach from computer vision, which analyzes an image sequence in which the camera's position and aim vary
relative to the OOI. The varying pose of the camera relative to the OOI induces the motion-parallax effect, which
allows the 3D depth of the OOI to be inferred. Unlike the traditional SfM system pipeline, our software system
contains many check-and-balance mechanisms to ensure robustness, and the analysis from earlier stages of the
pipeline is used to guide the later processing stages to better handle challenging medical data. The constructed
3D models allow the pathology (growth and change in both structure and appearance) to be monitored over
time.
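As background for the structure-from-motion step, the minimal sketch below shows how motion parallax between two frames yields 3D structure: recover the relative camera pose from matched points, then triangulate. It is a generic two-view skeleton built from OpenCV primitives, not the paper's full pipeline with its check-and-balance stages.

```python
import numpy as np
import cv2

def two_view_depth(pts1, pts2, K):
    """Generic two-view SfM step (illustrative, not the paper's pipeline).

    pts1, pts2 : Nx2 float arrays of corresponding pixel coordinates in two frames
    K          : 3x3 camera intrinsic matrix
    """
    # Essential matrix from the matches, with RANSAC to reject outliers.
    E, inliers = cv2.findEssentialMat(pts1, pts2, K, method=cv2.RANSAC, threshold=1.0)
    # Relative rotation and translation (translation is recovered only up to scale).
    _, R, t, inliers = cv2.recoverPose(E, pts1, pts2, K, mask=inliers)

    # Projection matrices for the two views, followed by linear triangulation.
    P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
    P2 = K @ np.hstack([R, t])
    X_h = cv2.triangulatePoints(P1, P2, pts1.T, pts2.T)
    X = (X_h[:3] / X_h[3]).T          # homogeneous -> Euclidean 3D points
    return R, t, X
```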
The ability to detect and match features across multiple views of a scene is a crucial first step in many computer vision
algorithms for dynamic scene analysis. State-of-the-art methods such as SIFT and SURF perform successfully when
applied to typical images taken by a digital camera or camcorder. However, these methods often fail to generate an
acceptable number of features when applied to medical images, because such images usually contain large homogeneous
regions with little color and intensity variation. As a result, tasks like image registration and 3D structure recovery
become difficult or impossible in the medical domain.
This paper presents a scale-, rotation-, and color/illumination-invariant feature detector and descriptor for medical
applications. The method incorporates elements of SIFT and SURF while optimizing their performance on medical data.
Based on experiments with various types of medical images, we combined, adjusted, and built on methods and
parameter settings employed in both algorithms. An approximate Hessian-based detector is used to locate scale-invariant
keypoints and a dominant orientation is assigned to each keypoint using a gradient orientation histogram, providing
rotation invariance. Finally, keypoints are described with an orientation-normalized distribution of gradient responses at
the assigned scale, and the feature vector is normalized for contrast invariance. Experiments show that the algorithm
detects and matches far more features than SIFT and SURF on medical images, with similar error levels.
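The paper's detector and descriptor are not reproduced here; as a rough stand-in, the sketch below shows how a conventional detector can be tuned to yield more keypoints on low-texture medical images and how candidate matches are filtered with a ratio test. The parameter values are illustrative assumptions, not the paper's settings.

```python
import cv2

def match_medical_features(img1, img2):
    """Illustrative stand-in for the paper's detector: relax SIFT's contrast
    threshold to keep weak blob-like features in large homogeneous regions,
    then filter candidate matches with Lowe's ratio test.
    """
    # contrastThreshold/edgeThreshold values are assumptions for illustration.
    sift = cv2.SIFT_create(contrastThreshold=0.01, edgeThreshold=15)
    kp1, des1 = sift.detectAndCompute(img1, None)
    kp2, des2 = sift.detectAndCompute(img2, None)

    matcher = cv2.BFMatcher(cv2.NORM_L2)
    knn = matcher.knnMatch(des1, des2, k=2)
    good = [m for m, n in knn if m.distance < 0.8 * n.distance]   # ratio test
    return kp1, kp2, good
```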
KEYWORDS: 3D modeling, Cameras, Video, 3D image processing, Computer simulations, Visual process modeling, Process modeling, Image registration, Data modeling, Motion models
3D computer models of body anatomy can have many uses in medical research and clinical practices. This paper
describes a robust method that uses videos of body anatomy to construct multiple, partial 3D structures and
then fuse them to form a larger, more complete computer model using the structure-from-motion framework.
We employ the Double Dog-Leg (DDL) method, a trust-region-based nonlinear optimizer, to jointly
optimize the camera motion parameters (rotation and translation) and determine a global scale that all partial
3D structures should agree upon. These optimized motion parameters are used for constructing local structures,
and the global scale is essential for multi-view registration after all these partial structures are built. In order
to provide a good initial guess of the camera movement parameters and outlier free 2D point correspondences
for DDL, we also propose a two-stage scheme where multi-RANSAC with a normalized eight-point algorithm
is performed first and then a few iterations of an over-determined five-point algorithm are used to polish the
results. Our experimental results using colonoscopy video show that the proposed scheme always produces more
accurate outputs than the standard RANSAC scheme. Furthermore, since we have obtained many reliable point
correspondences, time-consuming and error-prone registration methods like the iterative closest points (ICP)
based algorithms can be replaced by a simple rigid-body transformation solver when merging partial structures
into a larger model.
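For the final merging step, the abstract notes that reliable 3D point correspondences let a simple rigid-body transformation solver replace ICP. A minimal sketch of the standard SVD-based (Kabsch) closed-form solver, one common choice for this step, is given below; the paper does not state which specific solver it uses.

```python
import numpy as np

def rigid_transform(src, dst):
    """Closed-form rigid-body alignment between two partial 3D structures,
    given known 3D point correspondences src[i] <-> dst[i].
    With correspondences available, no iterative ICP loop is needed.
    """
    src = np.asarray(src, float)
    dst = np.asarray(dst, float)
    c_src, c_dst = src.mean(axis=0), dst.mean(axis=0)

    # Cross-covariance of the centered point sets.
    H = (src - c_src).T @ (dst - c_dst)
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:           # guard against a reflection solution
        Vt[-1] *= -1
        R = Vt.T @ U.T
    t = c_dst - R @ c_src
    return R, t                        # dst ~= R @ src + t
```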
KEYWORDS: 3D modeling, Cameras, 3D image processing, Video, Colon, Solid modeling, Visual process modeling, Motion models, Data modeling, Computing systems
A 3D colon model is an essential component of a computer-aided diagnosis (CAD) system in colonoscopy to
assist surgeons in visualization, surgical planning, and training. This research is thus aimed at developing
the ability to construct a 3D colon model from endoscopic videos (or images). This paper summarizes our ongoing
research in automated model building in colonoscopy. We have developed the mathematical formulations
and algorithms for modeling static, localized 3D anatomic structures within a colon that can be rendered from
multiple novel viewpoints for close scrutiny and precise dimensioning. This ability is useful in the scenario
when a surgeon notices some abnormal tissue growth and wants a close inspection and precise dimensioning. Our
modeling system uses only video images and follows a well-established computer-vision paradigm for image-based
modeling. We extract prominent features from images and establish their correspondences across multiple images
by continuous tracking and discrete matching. We then use these feature correspondences to infer the camera's
movement. The camera motion parameters allow us to rectify images into a standard stereo configuration and
calculate pixel movements (disparity) in these images. The inferred disparity is then used to recover 3D surface
depth. The inferred 3D depth, together with the texture information recorded in the images, allows us to construct a 3D
model with both structure and appearance information that can be rendered from multiple novel viewpoints.
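To illustrate the rectification-and-disparity stage described above, the sketch below rectifies a frame pair using the recovered relative motion, computes disparity with a semi-global matcher, and reprojects it to 3D depth. The matcher and its parameter values are assumptions for illustration, not the paper's choices.

```python
import numpy as np
import cv2

def depth_from_frame_pair(img1, img2, K, dist, R, T):
    """Rectify two frames into a standard stereo configuration using the
    recovered relative camera motion (R, T), compute disparity, and
    reproject it to 3D points. Parameter choices are illustrative.
    """
    size = (img1.shape[1], img1.shape[0])            # (width, height)
    R1, R2, P1, P2, Q, _, _ = cv2.stereoRectify(K, dist, K, dist, size, R, T)

    map1 = cv2.initUndistortRectifyMap(K, dist, R1, P1, size, cv2.CV_32FC1)
    map2 = cv2.initUndistortRectifyMap(K, dist, R2, P2, size, cv2.CV_32FC1)
    rect1 = cv2.remap(img1, map1[0], map1[1], cv2.INTER_LINEAR)
    rect2 = cv2.remap(img2, map2[0], map2[1], cv2.INTER_LINEAR)

    # Semi-global block matching; numDisparities and blockSize are illustrative.
    sgbm = cv2.StereoSGBM_create(minDisparity=0, numDisparities=64, blockSize=7)
    disparity = sgbm.compute(rect1, rect2).astype(np.float32) / 16.0

    # Reproject disparity to (X, Y, Z) using the rectification Q matrix.
    return cv2.reprojectImageTo3D(disparity, Q)
```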
This paper presents an event sensing paradigm for intelligent event analysis in a wireless, ad hoc, multi-camera video surveillance system. In particular, we present statistical methods that we have developed to support three aspects of event sensing: 1) energy-efficient, resource-conserving, and robust sensor data fusion and analysis, 2) intelligent event modeling and recognition, and 3) rapid deployment, dynamic configuration, and continuous operation of the camera networks. We outline our preliminary results and discuss future directions that research might take.
In this paper, we propose a new scheme for sensor data fusion in machine vision. The proposed scheme uses a Kalman filter as the sensor data integration tool and a hierarchical B-spline surface as the recording data structure. The Kalman filter is used to obtain statistically optimal estimates of the imaged surface structure based on external sensor measurements. The hierarchical B-spline surface maintains high-order surface derivative continuity, may be adaptively refined, possesses a desirable local control property, and is storage efficient. Hence, it is used to record the reconstructed surface structure.
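A minimal sketch of the fusion idea, reduced to a single scalar surface point, is shown below: each new measurement is blended with the current estimate by a Kalman gain derived from their variances. The hierarchical B-spline machinery of the paper is not modeled here; the class and its names are illustrative.

```python
class DepthCell:
    """Scalar Kalman update for fusing repeated depth measurements of one
    surface point; a simplified stand-in for the paper's formulation."""

    def __init__(self, depth0, var0):
        self.depth = depth0     # current depth estimate
        self.var = var0         # variance of the current estimate

    def fuse(self, z, r):
        """Fuse a new measurement z with measurement variance r."""
        k = self.var / (self.var + r)          # Kalman gain
        self.depth += k * (z - self.depth)     # statistically optimal blend
        self.var *= (1.0 - k)                  # uncertainty shrinks with each fusion
        return self.depth

# Example: the estimate moves toward the more certain of the two values.
cell = DepthCell(depth0=10.0, var0=4.0)
cell.fuse(z=9.0, r=1.0)
```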
In this paper, we propose a unification framework for three-dimensional shape reconstruction using physically-based models. Most shape-from-X techniques use an “observable” (e.g., disparity, intensity, and texture gradient) and a model based on specific domain knowledge (e.g., the triangulation principle, reflectance function, and texture distortion equation) to predict the observable in 3-D shape reconstruction. We show that all these “observable-prediction-model” types of techniques can be incorporated into our framework of energy constraint on a flexible, deformable image frame. In our algorithm, if the observable does not conform to that predicted by the corresponding model, a large “error” potential results. The error potential gradient forces the flexible image frame to deform in space. The deformation brings the flexible image frame to “wrap” onto the surface of the imaged 3-D object. Surface reconstruction is thus achieved through a “package wrapping” process that minimizes the discrepancy between the observable and the model prediction. The dynamics of such a wrapping process are governed by the least-action principle, which is physically correct. A physically-based model is essential in this general shape reconstruction framework because of its capability to recover the desired 3-D shape, to provide an animation sequence of the reconstruction, and to incorporate the regularization principle into the theory.
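The following sketch conveys the energy-minimization idea in its simplest form: a deformable depth frame is pulled toward a per-pixel observable while a smoothness term regularizes it, using plain gradient descent. The paper's framework handles arbitrary observable-prediction models and least-action dynamics, which this simplified sketch does not attempt to reproduce.

```python
import numpy as np

def wrap_surface(observed, z0, steps=500, lr=0.2, lam=1.0):
    """Schematic 'package wrapping' by energy minimization: gradient descent on
    E(z) = 0.5*||z - observed||^2 + 0.5*lam*||grad z||^2.
    Here the observable is taken to be a per-pixel depth observation; boundary
    handling via np.roll is a further simplification.
    """
    z = z0.astype(float).copy()
    for _ in range(steps):
        data_force = z - observed                        # error-potential gradient
        lap = (np.roll(z, 1, 0) + np.roll(z, -1, 0) +
               np.roll(z, 1, 1) + np.roll(z, -1, 1) - 4 * z)
        z -= lr * (data_force - lam * lap)               # deform the flexible frame
    return z
```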
In this paper, we develop a technique that applies Green's theorem to locate edges in images. Our method implements the traditional Laplacian-of-Gaussian operator over different resolution scales for edge detection; however, only the first derivatives of the image function, not the second-derivative Laplacian operator, are computed. Gaussian kernels of different sizes are convolved with the raw image to generate smoothed images at different resolution scales, and the first derivatives are calculated from these smoothed images. Equi-first-derivative pixel pairs in both the x and y directions are located and then grouped into closed contours in the derivative maps. Green's theorem states that if the Laplacian operator produces a smooth, continuous function, the zero-crossings (edge points) will be enclosed by these equi-first-derivative contours. Implementation results show that our technique is capable of locating edges at different scales.
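The sketch below illustrates the first-derivative-only idea: by Green's (divergence) theorem, the Laplacian integrated over a small region equals the outward flux of the gradient through its boundary, so edge evidence can be gathered from Gaussian-smoothed first derivatives alone. The contour grouping of equi-first-derivative pixel pairs used in the paper is not reproduced; the per-pixel neighborhood flux here is a simplification.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def green_edges(image, sigma):
    """Edge evidence from first derivatives only: accumulate the outward flux of
    the smoothed gradient around each pixel and mark its sign changes, which
    coincide with zero-crossings of the smoothed Laplacian.
    """
    img = image.astype(float)
    gx = gaussian_filter(img, sigma, order=(0, 1))   # smoothed d/dx
    gy = gaussian_filter(img, sigma, order=(1, 0))   # smoothed d/dy

    # Net outward gradient flux through the boundary of a small neighborhood.
    flux = (np.roll(gx, -1, axis=1) - np.roll(gx, 1, axis=1) +
            np.roll(gy, -1, axis=0) - np.roll(gy, 1, axis=0))

    # Sign changes of the flux mark zero-crossings (edge points).
    edges = np.zeros_like(flux, dtype=bool)
    edges[:, :-1] |= np.signbit(flux[:, :-1]) != np.signbit(flux[:, 1:])
    edges[:-1, :] |= np.signbit(flux[:-1, :]) != np.signbit(flux[1:, :])
    return edges
```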