KEYWORDS: Computer programming, Distortion, Gesture recognition, Data compression, Cameras, Pattern recognition, Detection and tracking algorithms, Solids, Video compression, Video
Shape coding in MPEG-4 is intended both to improve coding efficiency and to facilitate object-oriented applications, such as shape-based object recognition and retrieval. Such applications require both efficient shape compression and effective shape description. Although these two issues have been intensively investigated in the data compression and pattern recognition fields separately, the problem remains open when both objectives must be considered together. To achieve high coding gain, the operational rate-distortion optimization framework can be applied, but the direction restriction of the traditional eight-direction edge encoding structure reduces its compression efficiency and description effectiveness. We present two arbitrary-direction edge encoding structures that relax this restriction. Each consists of a sector number, a short component, and a long component, which together represent both the direction and the magnitude of an encoded edge. Experiments on both shape coding and hand gesture recognition validate that our structures substantially reduce the number of encoding vertices and save up to 48.9% of the bits. Moreover, the object contours are described effectively and are suitable for object-oriented applications.
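As an illustration of the (sector, short, long) idea, the hedged Python sketch below encodes an arbitrary-direction edge between two vertices by its octant plus its longer and shorter axis-aligned components. The paper's exact sector definition and the entropy coding of the components are not reproduced here, so treat this as an assumption-laden toy, not the published scheme.

```python
from dataclasses import dataclass

@dataclass
class EdgeCode:
    sector: int  # which of 8 angular sectors (octants) the edge falls in
    short: int   # smaller axis-aligned component (magnitude)
    long: int    # larger axis-aligned component (magnitude)

def encode_edge(dx: int, dy: int) -> EdgeCode:
    """Encode an arbitrary-direction edge (dx, dy) between two vertices."""
    ax, ay = abs(dx), abs(dy)
    long_c, short_c = max(ax, ay), min(ax, ay)
    # Sector index from the signs of dx, dy and which axis dominates.
    sector = (dx < 0) << 2 | (dy < 0) << 1 | (ay > ax)
    return EdgeCode(sector, short_c, long_c)

def decode_edge(c: EdgeCode) -> tuple[int, int]:
    """Recover (dx, dy) from the (sector, short, long) representation."""
    ax, ay = (c.short, c.long) if c.sector & 1 else (c.long, c.short)
    dx = -ax if c.sector & 4 else ax
    dy = -ay if c.sector & 2 else ay
    return dx, dy
```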
KEYWORDS: Radiography, Medical imaging, Image retrieval, Digital imaging, Picture Archiving and Communication System, Image storage, Image processing, Feature extraction, Principal component analysis, Image quality
Before a radiographic image is sent to a picture archiving and communications system (PACS), its projection
information needs to be correctly identified at capture modalities to facilitate image archive and retrieval. Currently,
annotating radiographic images is performed manually by technologists, which is labor-intensive and costly. Moreover,
manual annotation errors occur frequently during image acquisition. To address this issue, an automatic
image recognition method is developed. It first extracts a set of visual features from the most indicative region in a
radiograph for image recognition, and then uses a family of classifiers, each of which is trained for a specific projection
to determine the most appropriate projection for the image. The method has been tested on a large number of clinical
images and has shown excellent robustness and efficiency.
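The selection step lends itself to a simple one-vs-rest scheme. The sketch below is a hedged illustration only: feature extraction and classifier training are not shown, and the projection names and the scikit-learn-style decision_function interface are assumptions, not details from the paper. Each projection-specific classifier scores the image, and the highest-scoring projection wins.

```python
import numpy as np

def classify_projection(features: np.ndarray, classifiers: dict) -> str:
    """Pick the most appropriate projection for a radiograph.
    classifiers: hypothetical map of projection name (e.g., "chest_PA",
    "skull_LAT") -> a model exposing a scikit-learn-style decision_function.
    """
    scores = {name: float(clf.decision_function(features.reshape(1, -1))[0])
              for name, clf in classifiers.items()}
    return max(scores, key=scores.get)
```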
Extracting key frames (KF) from video is of great interest in many applications, such as video summary, video
organization, video compression, and prints from video. KF extraction is not a new problem. However, current literature
has focused mainly on sports or news video. The biggest challenges for key frame selection in the consumer video
space are the unconstrained content and the lack of any pre-imposed structure. In this study, we
conduct ground truth collection of key frames from video clips taken by digital cameras (as opposed to camcorders)
using both first- and third-party judges. The goals of this study are: (1) to create a reference database of video clips
reasonably representative of the consumer video space; (2) to identify associated key frames by which automated
algorithms can be compared and judged for effectiveness; and (3) to uncover the criteria used by both first- and
third-party human judges so these criteria can influence algorithm design. The findings from these ground truths will be
discussed.
We present a key frame extraction method dedicated to summarizing unstructured consumer video
clips acquired from digital cameras. Analysis of spatio-temporal changes over time provides
meaningful information about the scene and the cameraman's general intent. First, camera and
object motion are estimated and used to derive motion descriptors. A video is segmented into
homogeneous segments based on major types of camera motion (e.g., pan, zoom, pause, steady).
Dedicated rules are used to extract candidate key frames from each segment. Confidence
measures are computed for the candidates to enable ranking by semantic relevance. This method
is scalable so that we can produce any desired number of key frames from the candidates. We
demonstrated the effectiveness of our method by comparing results with the ground truth agreed upon
by multiple judges.
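Because every candidate key frame carries a confidence measure, producing any desired number of key frames reduces to a ranking step. A minimal sketch follows, assuming the candidates have already been scored (the confidence computation itself is not shown):

```python
def select_key_frames(candidates, k):
    """candidates: list of (frame_index, confidence) pooled from all video
    segments. Keep the k most semantically relevant candidates, returned
    in temporal order for presentation."""
    top = sorted(candidates, key=lambda c: c[1], reverse=True)[:k]
    return sorted(top, key=lambda c: c[0])
```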
Have you ever lamented, "I wish I had taken a different picture of that 'Kodak Moment' when everyone was smiling and no one blinked"? With image recomposition, we strive to deliver a "one-click fix" experience to customers so they can easily create the perfect pictures that they never actually took. To accomplish this, a graphical user interface was created that integrates existing and new algorithms, including face detection, facial feature location, face recognition, expression recognition, face appearance and pose matching, and seamless blending. Advanced modes include face relighting. This system is capable of performing image recomposition from a mixture of videos and still photos, with ease of use and a high degree of automation.
Automatic target segmentation is critical to computerized dental imaging systems, which are designed to reduce human effort and error. We have developed an automatic algorithm that is capable of outlining an intra-oral reference bar and the tooth of interest. In particular, the algorithm first locates the reference bar using unique color and shape cues. The located reference bar provides an estimate for the tooth of interest in terms of both its scale and location. Next, the estimate is used to initialize a trained active shape model (ASM) consisting of the bar and the tooth. Finally, a search process is performed to find a match between the ASM and the local image structures. Experimental results have shown that our fully automatic algorithm provides accurate segmentation of both the reference bar and the tooth of interest, and it is insensitive to lighting, tooth color, and tooth-shape variations.
In picture archiving and communications systems (PACS), images need to be displayed in standardized ways for
radiologists' interpretations. However, for most radiographs acquired by computed radiography (CR), digital
radiography (DR), or digitized films, the image orientation is undetermined because of variations in examination
conditions and patient positioning. To address this problem, an automatic orientation correction method is presented. It
first detects the most indicative region for orientation in a radiograph, and then extracts a set of low-level visual features
sensitive to rotation from the region. Based on these features, a trained classifier based on a support vector machine is
employed to recognize the correct orientation of the radiograph and reorient it to a desired position. A large-scale
experiment has been conducted on more than 12,000 radiographs covering a large variety of body parts and projections
to validate the method. The overall performance is quite promising, with the success rate of orientation correction
reaching 95.2%.
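A hedged sketch of the final classification-and-reorientation step follows. Feature extraction and training are omitted, and the use of scikit-learn's SVC together with a quarter-turn label encoding are my assumptions, not details from the paper.

```python
import numpy as np
from sklearn.svm import SVC

def correct_orientation(image: np.ndarray, features: np.ndarray, svm: SVC):
    """Predict how many 90-degree counterclockwise turns the radiograph is
    rotated from its desired position, then undo that rotation."""
    turns = int(svm.predict(features.reshape(1, -1))[0])  # 0, 1, 2, or 3
    return np.rot90(image, k=-turns)
```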
KEYWORDS: Scene classification, Multimedia, Image classification, Compact discs, Video, Computer programming, Digital video discs, Image segmentation, Visualization, Classification systems
With the increasing use of digital imaging in general consumer applications, there is a great deal of interest in developing
new products that increase the value and enjoyment level of viewing digital images in consumers' living rooms. One way
to enrich image viewing and sharing is to combine images with voice annotation and music. A picture VCD system
(PVCD) was developed for multimedia authoring, centered around and driven by still photos, with an emphasis on
composing still images with sound, including music and spoken annotations. We describe the overall system, as well as
major enabling technology components, including multimedia authoring, semantic image classification, and cross-media
indexing. The finished multimedia bit stream is primarily recorded on DVD or VCD to facilitate enriched enjoyment
through a TV set, but can also be shared on a desktop/laptop, via email, or online.
We present a method to incorporate nonlinear shape prior constraints into segmenting different anatomical structures in medical images. Kernel space density estimation (KSDE) is used to derive the nonlinear shape statistics and enables building a single model for a class of objects with nonlinearly varying shapes. The object contour is coerced by image-based energy into the correct shape sub-distribution (e.g., left or right lung), without the need for model selection. In contrast to an earlier algorithm that uses a local gradient-descent search (susceptible to local minima), we propose an algorithm that iterates between dynamic programming (DP) and shape regularization. DP is capable of finding an optimal contour in the search space that maximizes a cost function related to the difference between the interior and exterior of the object. To enforce the nonlinear shape prior, we propose two shape regularization methods: global and local regularization. Global regularization is applied after each DP search to move the entire shape vector through the shape space, in gradient-descent fashion, toward the positions of probable shapes learned from training. The regularized shape is used as the starting shape for the next iteration. Local regularization is accomplished by modifying the search space of the DP so that it allows only a limited amount of deformation of the local shape from the starting shape. Both regularization methods ensure consistency between the resulting shape and the training shapes, while preserving DP's ability to search over a large range and avoid local minima. Our algorithm was applied to two different segmentation tasks for radiographic images: lung field and clavicle segmentation. Both applications have shown that our method is effective and versatile in segmenting various anatomical structures under prior shape constraints, and that it is robust to noise and to local minima caused by clutter (e.g., blood vessels) and other similar structures (e.g., ribs). We believe that the proposed algorithm represents a major step in the paradigm shift to object segmentation under nonlinear shape constraints.
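To make the global regularization step concrete, here is a minimal sketch under simplifying assumptions: one gradient-ascent step on a Gaussian kernel density estimate over the training shape vectors (a mean-shift step). The paper works in a kernel feature space via KSDE; this input-space analogue only illustrates the "move the whole shape toward probable shapes" idea.

```python
import numpy as np

def regularize_shape(shape, train_shapes, bandwidth=1.0, step=0.5):
    """One mean-shift step of a shape vector toward high-density regions of
    the training-shape distribution (Gaussian KDE). shape: (d,) vector of
    concatenated contour coordinates; train_shapes: (N, d) aligned shapes."""
    diffs = train_shapes - shape                               # (N, d)
    w = np.exp(-np.sum(diffs**2, axis=1) / (2.0 * bandwidth**2))
    mean_shift = (w[:, None] * diffs).sum(axis=0) / (w.sum() + 1e-12)
    return shape + step * mean_shift
```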
We have developed an active shape model (ASM)-based segmentation scheme that uses the original Cootes et al. formulation for the underlying mechanics of the ASM but improves the model by fixing selected nodes at specific structural boundaries called transitional landmarks. Transitional landmarks identify the change from one boundary type (such as lung-field/heart) to another (lung-field/diaphragm). This results in a multi-segmented lung-field boundary where each segment corresponds to a specific boundary type (lung-field/heart, lung-field/aorta, lung-field/rib-cage, etc.). The node-specified ASM is built using a fixed set of equally spaced feature nodes for each boundary segment. This allows the nodes to learn local appearance models for a specific boundary type, rather than generalizing over multiple boundary types, which results in a marked improvement in boundary accuracy. In contrast, existing lung-field segmentation algorithms based only on ASM simply space the nodes equally along the entire boundary without specification. We have performed extensive experiments using multiple datasets (public and private) and compared the performance of the proposed scheme with other contour-based methods. Overall, the improvement in accuracy is 3-5% over the standard ASM and, more importantly, it corresponds to increased alignment with salient anatomical structures. Furthermore, the automatically generated lung-field masks lead to the same FROC for lung-nodule detection as hand-drawn lung-field masks. The accurate landmarks can be easily used for detecting other structures in the lung field. Based on the related landmarks (mediastinum-heart transition, heart-diaphragm transition), we have extended the work to heart segmentation.
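The per-segment node placement can be sketched as follows. This is a hedged toy: real boundary segments are curves, not straight lines, so linear interpolation between landmarks only illustrates the spacing idea, and all names and counts are assumptions.

```python
import numpy as np

def place_segment_nodes(landmarks, nodes_per_segment):
    """Space feature nodes equally within each boundary segment delimited by
    transitional landmarks (e.g., lung-field/heart to lung-field/diaphragm),
    rather than equally along the entire boundary, so each node models one
    boundary type. landmarks: list of (x, y) points in boundary order;
    append the first landmark at the end to close the contour."""
    nodes = []
    for (x0, y0), (x1, y1) in zip(landmarks, landmarks[1:]):
        t = np.linspace(0.0, 1.0, nodes_per_segment, endpoint=False)
        nodes.extend(zip(x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
    return np.array(nodes)
```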
The performance of an exemplar-based scene classification system depends largely on the size and quality of its set of training exemplars, which can be limited in practice. In addition, in non-trivial data sets, variations in scene content as well as distracting regions may exist in many testing images, preventing good matches with
the exemplars. We introduce the concept of image-transform bootstrapping, which uses image transforms to address such issues. In particular, three major schemes are described for exploiting this concept to augment training, testing, or both. We have successfully applied it to three applications of increasing difficulty: sunset
detection, outdoor scene classification, and automatic image orientation detection. It is shown that appropriate transforms and meta-classification methods can be selected to boost performance according to the domain of the problem and the features/classifier used.
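A hedged sketch of the test-time scheme (my own minimal rendering, not the paper's exact procedure): classify several transformed versions of the image and aggregate the outputs. The paper additionally studies meta-classification methods, whereas this toy simply averages class probabilities.

```python
import numpy as np

def bootstrap_predict(image, classifier, transforms, extract_features):
    """Aggregate a classifier's outputs over transformed copies of an image
    (e.g., mirror, small rotations, crops). Assumes a scikit-learn-style
    predict_proba; extract_features maps an image to a feature vector."""
    probs = [classifier.predict_proba(
                 extract_features(t(image)).reshape(1, -1))[0]
             for t in transforms]
    return np.mean(probs, axis=0)  # averaged class probabilities
```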
In classic pattern recognition problems, classes are mutually exclusive by definition. Classification errors occur when the classes overlap in the feature space. We examine a different situation, occurring when the classes are, by definition, not mutually exclusive. Such problems arise in scene and document classification and in medical diagnosis. We present a framework to handle such problems and apply it to the problem of semantic scene classification, where a natural scene may contain multiple objects such that the scene can be described by multiple class labels (e.g., a field scene with a mountain in the background). Such a problem poses challenges
to the classic pattern recognition paradigm and demands a different treatment. We discuss approaches for training and testing in this scenario and introduce new metrics for evaluating individual examples, class recall and precision, and overall accuracy. Experiments show that our methods are suitable for scene classification; furthermore, our work appears to generalize to other classification problems of the same nature.
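For instance, the per-example and per-class measures might look like the following hedged sketch. Jaccard overlap between the true and predicted label sets is one natural per-example score for this setting; the paper's exact definitions are not reproduced here.

```python
def example_score(true: set, pred: set) -> float:
    """Per-example score for multi-label output: Jaccard overlap of the
    true and predicted label sets (1.0 only when they match exactly)."""
    if not true and not pred:
        return 1.0
    return len(true & pred) / len(true | pred)

def class_recall_precision(y_true, y_pred, label):
    """Recall and precision for one class. y_true, y_pred: parallel lists
    of label sets, one pair per example."""
    tp = sum(label in t and label in p for t, p in zip(y_true, y_pred))
    fn = sum(label in t and label not in p for t, p in zip(y_true, y_pred))
    fp = sum(label not in t and label in p for t, p in zip(y_true, y_pred))
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    return recall, precision
```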
This paper presents a psychophysical study on the perception of image orientation. Some natural images are extremely difficult even for humans to orient correctly or may not even have a "correct" orientation; the study provides an upper bound for the performance of an automatic system. Discrepant detection rates based on only low-level cues have been reported, ranging from exceptionally high in earlier work to more reasonable in recent work. This study allows us to put the reported results in the correct perspective. In addition, the use of a large, carefully chosen image set that spans the "photo space" (in terms of occasions and subject matter) and extensive interaction with the human observers should reveal cues used by humans at various image resolutions. These can be used to design a robust automatic algorithm for orientation detection.
A collection of 1000 images (mix of professional photos and consumer snapshots) is used in this study. Each image is examined by at least five observers and shown at varying resolutions. Object recognition is expected to be more difficult (impossible for some images) at the lowest resolution and easier as the resolution increases. At each resolution, observers are asked to indicate the image orientation, the level of confidence, and the cues they used to make the decision. This study suggests that for typical images, the upper bound on accuracy is close to 98% when using all available semantic cues from high-resolution images and 84% if only low-level vision features and coarse semantics from thumbnails are used. The study also shows that sky and people are the most useful and reliable among a number of important semantic cues.
Sky is among the semantic object classes frequently seen in photographs and is useful for image understanding, processing, and retrieval. We propose a novel hybrid approach to sky detection based on color and texture classification, region extraction, and physics-motivated sky signature validation. Sky can be of many different types: clear blue sky, cloudy/overcast sky, mixed sky, twilight sky, and so on. A single model cannot correctly characterize all the various types of sky because of the large differences in physics and appearance associated with different sky types. We have developed a set of physics-motivated sky models to identify clear blue-sky regions and cloudy/overcast sky regions. An exemplar-based approach is used to generate the initial set of candidate sky regions. Another data-derived model is subsequently used to combine the results for different sky types to form a more complete sky map. Extensive testing using more than 3000 (randomly oriented) natural images shows that our comprehensive sky detector is able to accurately recall approximately 96% of all sky regions in the image set, with a precision of about 92%. Assuming correct image orientation, the precision on the same set of images increases to about 96%.
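The final combination step could be rendered roughly as follows. This is a hedged stand-in: the paper uses a data-derived model to merge the per-type results, whereas this toy simply takes the pixelwise maximum belief and thresholds it.

```python
import numpy as np

def combine_sky_maps(belief_maps: dict, threshold: float = 0.5):
    """Merge per-type sky belief maps (e.g., {"clear": ..., "overcast": ...},
    each an HxW array in [0, 1]) into one complete binary sky map by taking
    the maximum belief at each pixel and thresholding."""
    combined = np.maximum.reduce(list(belief_maps.values()))
    return combined >= threshold
```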
Color palettization is the process that converts an input color image having a large set of possible colors to an output color image having a reduced set of palette colors. For example, a typical 24-bit input image has possibly millions of colors, whereas a typical color palette has only 256 colors. It is desirable to determine the set of palette colors based on the distribution of colors in the input image. Furthermore, it is also desirable to preserve important colors, such as human skin tones, in the palettized image. We propose a novel scheme to accomplish these goals by supplementing the distribution of input colors with a distribution of selected important colors. In particular, skin color supplementation is achieved by appending to the input image skin-tone patches generated by statistical sampling of the skin color probability density function. A major advantage of this scheme is that explicit skin detection, which can be error-prone and time-consuming, is avoided. In addition, this scheme can be used with any color palettization algorithm. Subjective evaluation has shown the efficacy of this scheme.
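A hedged sketch of the supplementation idea: draw skin-tone samples from a skin-color PDF, append them to the image's own pixels, and derive the palette from the combined distribution. The sampler interface is hypothetical, and k-means stands in for whatever palettization back end is used; the scheme itself is back-end-agnostic.

```python
import numpy as np
from sklearn.cluster import KMeans

def palettize_with_skin(pixels, skin_pdf_sampler, n_skin, n_colors=256, seed=0):
    """pixels: (N, 3) RGB array; skin_pdf_sampler(n) -> (n, 3) RGB samples
    drawn from a skin-color probability density (hypothetical interface).
    Returns the palette and the palette index of each original pixel."""
    skin = skin_pdf_sampler(n_skin)                   # supplemental colors
    data = np.vstack([pixels, skin]).astype(float)    # combined distribution
    km = KMeans(n_clusters=n_colors, n_init=4, random_state=seed).fit(data)
    palette = km.cluster_centers_
    labels = km.predict(pixels.astype(float))         # map original pixels only
    return palette, labels
```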
We present a computational approach to main subject detection, which provides a measure of saliency or importance for the different regions associated with different subjects in an image with unconstrained scene content. It is built primarily upon selected image semantics, with low-level vision features also contributing to the decision. The algorithm consists of region segmentation, perceptual grouping, feature extraction, and probabilistic reasoning. To accommodate the inherent ambiguity in the problem, as reflected by the ground truth, we have developed a novel training mechanism for Bayes nets based on fractional frequency counting. Using a set of images spanning the 'photo space', experimental results have shown the promise of our approach in that most of the regions that independent observers ranked as the main subject are also labeled as such by our system. In addition, our approach lends itself to performance-scalable configurations within the Bayes net-based framework. Different applications have different degrees of tolerance to performance degradation and to increased running time; computing a full set of features may not be practical for time-critical applications. We have designed the algorithm to run under three configurations, without reorganization or retraining of the network.
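The fractional-frequency-counting idea can be sketched as follows, under stated assumptions: each training region contributes its fractional ground-truth belief (reflecting observer disagreement) to the conditional probability table, rather than a hard 0/1 count. The two-state feature/label layout here is a toy, not the paper's network.

```python
from collections import defaultdict

def train_cpt(samples):
    """Fractional frequency counting for one Bayes-net CPT entry.
    samples: iterable of (feature_value, belief_main_subject), where belief
    is in [0, 1] (e.g., 0.7 if 70% of observers marked the region as main
    subject). Returns P(main subject | feature_value)."""
    counts = defaultdict(lambda: [0.0, 0.0])  # feature -> [bg mass, ms mass]
    for feature, belief in samples:
        counts[feature][1] += belief          # fractional "main subject" count
        counts[feature][0] += 1.0 - belief    # fractional "background" count
    return {f: c[1] / (c[0] + c[1]) for f, c in counts.items()}
```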
A consumer photograph, or snapshot, is a medium for conveying to a viewer one's interest in one or more main subjects. A methodology is presented for collecting ground truth data useful for training and evaluating algorithms designed to automatically detect the main subject of a consumer photograph. For a database of 100 images, 16 observers provided polygonal approximations to the image areas that comprise the main subject. Results from all observers are combined to form a truth image that is considered the ideal result of a main subject detector and is analyzed to determine features for main subject detection (MSD). The collected ground truth shows substantial agreement among third-party observers. It also supports conventional wisdom regarding the likely locations of main subjects and the value of 'people' detection as a cue for main subject detection. Training data are created from the truth images for an MSD framework involving image segmentation, feature detection, and probabilistic reasoning. A proposed method for generating region-based training data can be used to retrain a reasoning engine as segmentation algorithms improve, without further observer involvement. Although the subject matter of consumer photographs ranges from sweeping landscapes to close portraits, identification of the main subject is a meaningful task.
An algorithm is developed to detect collimation regions in computed radiography images. Based on a priori knowledge of the collimation process, the algorithm consists of four major stages of operations: (1) pixel-level detection and classification of collimation boundary transition pixels; (2) line-level delineation of candidate collimation blades; (3) estimation of the most likely partitioning; and (4) region-level determination of the collimation configuration. This algorithm has been tested on a set of 8,436 images, which includes 7,703 single-exposure images and 733 multiple-exposure images. An overall success rate in excess of 99% has been achieved.
In computed radiography (CR) imaging, collimation with x-ray-opaque material is frequently employed to shield body parts from unnecessary radiation exposure and to minimize radiation scattering. The radiation field is therefore the diagnostic region of interest, which has been exposed directly to x-rays. We present an image analysis system for the recognition of the collimation or, equivalently, detection of the radiation field. The purpose is to (1) facilitate optimal tone scale enhancement, which can be driven only by the diagnostically useful part of the image data, and (2) minimize the viewing flare caused by the unexposed area. This system consists of three stages of operations: (1) pixel-level detection and classification of collimation boundary transition pixels; (2) line-level delineation of candidate collimation blades; and (3) region-level determination of the collimation configuration. This system has been reduced to practice and tested on 807 images of 11 exam types, and a success rate in excess of 99% has been achieved for tone scale enhancement and masking. In general, the remaining failure cases (false negatives) have no significant impact on either tone scale or flare minimization because of the intrinsic nature of the algorithm. Owing to the novel design of the system, its computational efficiency lends itself to on-line operation.
In this paper, a novel wavelet-based approach to recovering continuous-tone images from halftone images is presented. Wavelet decomposition of the halftone image facilitates a series of spatially and frequency-selective processing steps that preserve most of the original image content while eliminating the halftone noise. Furthermore, optional nonlinear filtering can be applied as a post-processing stage to create the final aesthetic continuous-tone image. This approach lends itself to practical applications since it is independent of parameter estimation and hence universal to all types of halftoned images, including those obtained by scanning printed halftones.
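A hedged sketch of the core idea: halftone noise concentrates in the finest detail subbands, so attenuate them scale by scale and reconstruct. The paper's spatially selective processing is reduced here to simple per-scale gains (PyWavelets is assumed; the gain values are illustrative, not from the paper).

```python
import numpy as np
import pywt

def descreen(halftone: np.ndarray, wavelet="db4", levels=3,
             gains=(0.0, 0.3, 0.7)):
    """Wavelet descreening toy: decompose the halftone, scale the detail
    subbands (gains run finest -> coarsest), and reconstruct a smooth
    continuous-tone estimate."""
    coeffs = pywt.wavedec2(halftone.astype(float), wavelet, level=levels)
    approx, details = coeffs[0], coeffs[1:]       # details: coarsest first
    for i, g in zip(range(len(details) - 1, -1, -1), gains):
        details[i] = tuple(g * band for band in details[i])
    return pywt.waverec2([approx] + details, wavelet)
```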
In wireless image communication, image compression is necessary because of the limited channel bandwidth. The associated channel fading, multipath distortion, and various channel noises demand that the applicable image compression technique be amenable to the noise-combating and error-correction techniques designed for wireless communication environments. In this study, we adopt a wavelet-based compression scheme for wireless image communication applications. The scheme includes a novel scene-adaptive and signal-adaptive quantization that results in a coherent scene representation. Such a representation can be integrated with the inherent layered structure of the wavelet-based approach to enable robust protection of the bit stream against the impulsive and bursty error conditions frequently encountered in wireless communications. To simulate wireless image communication, we suggest an error-source modeling scheme based on an analysis of the general characteristics of wireless channels. This error-source model is based on a Markov chain process and is used to generate binary bit-error patterns that simulate the bursty nature of wireless channel errors. Once the compressed image bit stream is passed through the simulated channel, errors occur according to this bit-error pattern. A preliminary comparison between JPEG-based and wavelet-based wireless image communication has been made, without applying error control or error resilience in either case. The assessment of performance based on image quality evaluation shows that the wavelet-based approach is promising for wireless communication with bursty channel characteristics.
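A common concrete realization of such a Markov-chain error source is the two-state Gilbert-Elliott-style model sketched below (the specific state and error probabilities are illustrative assumptions, not the paper's values): a "good" state with a low bit-error rate and a "bad" burst state with a high one.

```python
import numpy as np

def bursty_error_pattern(n_bits, p_gb=0.001, p_bg=0.1,
                         ber_good=1e-5, ber_bad=0.1, seed=0):
    """Generate a binary bit-error pattern with bursty statistics.
    p_gb: P(good -> bad) transition; p_bg: P(bad -> good) transition;
    ber_good/ber_bad: bit-error rates within each state."""
    rng = np.random.default_rng(seed)
    errors = np.zeros(n_bits, dtype=np.uint8)
    bad = False
    for i in range(n_bits):
        bad = rng.random() < (1 - p_bg if bad else p_gb)   # state transition
        errors[i] = rng.random() < (ber_bad if bad else ber_good)
    return errors  # XOR with the compressed bit stream to inject errors
```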
Compression of 3D or 4D medical image data has now become imperative for clinical picture archiving and communication systems (PACS) and for telemedicine and telepresence networks. While lossless compression is often desired, lossy compression techniques are gaining acceptance for medical applications, provided that clinically important information can be preserved in the coding process. We present a comprehensive study of volumetric image compression with three-dimensional wavelet transform, adaptive quantization with 3D spatial constraints, and octave zerotree coding. The volumetric image data are first decomposed using 3D separable wavelet filterbanks. In this study, we adopt a 3-level decomposition to form a 22-band octree-structured multiresolution pyramid. An adaptive quantization with 3D spatial constraints is then applied to reduce the statistical and psychovisual redundancies in the subbands. Finally, to exploit the dependencies among the quantized subband coefficients resulting from 3D wavelet decomposition, an octave zerotree coding scheme is developed. The proposed volumetric image compression scheme is applied to a set of real medical CT data. Significant coding gains have been achieved, demonstrating the effectiveness of the proposed volumetric image compression scheme for medical as well as other applications.
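The 22-band count follows directly from the decomposition: each level of a separable 3D transform splits the current approximation into 8 subbands (7 detail + 1 approximation), so 3 levels yield 3 x 7 + 1 = 22 bands. A hedged sketch using PyWavelets (the wavelet choice and volume are placeholders):

```python
import numpy as np
import pywt

volume = np.random.rand(64, 64, 64)          # stand-in for a CT volume
coeffs = pywt.wavedecn(volume, "db2", level=3)
# coeffs[0] is the coarsest approximation; each remaining level is a dict
# of 7 detail subbands ('aad', 'ada', ..., 'ddd').
n_bands = 1 + sum(len(level) for level in coeffs[1:])
print(n_bands)                               # -> 22
```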
We present in this paper a study of medical image compression based on an adaptive quantization scheme capable of preserving the clinically useful structures that appear in the given images. We believe that how accurately a compression algorithm preserves these structures is a good measure of image quality after compression, since many image-based diagnoses rely on the position and appearance of certain structures. With wavelet decomposition, we are able to investigate image features at different scale levels that correspond to certain characteristics of the biomedical structures contained in the medical images. An adaptive quantization algorithm based on clustering with spatial constraints is then applied to the high-frequency subbands. The adaptive quantization enables us to selectively preserve image features at various scales so that desired details of clinically useful structures are preserved during compression, even at a low bit rate. Preliminary results based on real medical images suggest that this clustering-based adaptive quantization, combined with wavelet decomposition, is very promising for medical image compression with structure-preserving capability.
The Gibbs random field (GRF) has proved to be a simple
and practical way of parameterizing the Markov random field, which has been widely used to model an image or image-related process in many image processing applications. In particular, the GRF can be employed to construct an efficient Bayesian estimation that often yields optimal results. We describe how the GRF can be efficiently incorporated into optimization processes in several representative applications, ranging from image segmentation to image enhancement. One example is the segmentation of computed tomography (CT) volumetric image sequences, in which the GRF has been incorporated into K-means clustering to enforce the neighborhood constraints. Another example is artifact removal in discrete cosine
transform-based low bit rate image compression, where the GRF has been used to design an enhancement algorithm that reduces the "blocking effect" and the "ringing effect" while still preserving the image details. The third example is the integration of the GRF into a wavelet-based subband video coding scheme in which the high-frequency subbands are segmented and quantized with spatial constraints specified by a GRF, and the subsequent enhancement of the decompressed images is accomplished by smoothing with another type of GRF. With these diverse examples, we are able to demonstrate that various features of images can all be properly characterized by a GRF. The specific form of the GRF can be selected according to the characteristics of an individual application. We believe that the GRF is a powerful tool to exploit the spatial dependency in various images, and is applicable to many image processing tasks.
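The first example (GRF-constrained K-means) can be sketched as follows, under simplifying assumptions: a 2D grayscale image, fixed cluster centers, and a Potts-style pairwise penalty minimized by iterated conditional modes (ICM). This is a toy rendering of "K-means clustering with neighborhood constraints", not the paper's exact formulation.

```python
import numpy as np

def grf_kmeans_labels(image, centers, beta=1.0, n_iter=5):
    """Label each pixel with the cluster minimizing a data cost (squared
    distance to its center) plus a Potts GRF penalty of beta for each
    4-neighbor carrying a different label, via ICM sweeps.
    image: (H, W) grayscale; centers: (K,) cluster intensities."""
    data_cost = (image[..., None] - centers[None, None, :]) ** 2  # (H, W, K)
    labels = data_cost.argmin(axis=2)                # plain K-means init
    H, W, K = data_cost.shape
    for _ in range(n_iter):
        for y in range(H):
            for x in range(W):
                cost = data_cost[y, x].copy()
                for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < H and 0 <= nx < W:
                        # Potts smoothness: penalize disagreeing neighbors.
                        cost += beta * (np.arange(K) != labels[ny, nx])
                labels[y, x] = cost.argmin()
    return labels
```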
The Gibbs random field (GRF) has proved to be a simple and practical way of parameterizing the Markov random field, which has been widely used to model an image or image-related process in many image processing applications. In particular, the Gibbs random field can be employed to construct an efficient Bayesian estimation that often yields optimal results. In this paper, we describe how the Gibbs random field can be efficiently incorporated into optimization processes in several representative applications, ranging from image segmentation to image enhancement. One example is the segmentation of CT volumetric image sequences, in which the GRF has been incorporated into K-means clustering to enforce the neighborhood constraints. Another example is artifact removal in DCT-based low bit rate image compression, in which the GRF has been used to design an enhancement algorithm that smooths the artificial block boundaries as well as ringing patterns while still preserving the image details. The third example is an elegant integration of the GRF into wavelet subband coding of video signals, in which the high-frequency bands are segmented with spatial constraints specified by a GRF, while the subsequent enhancement of the decompressed images is accomplished with a smoothing function specified by another corresponding GRF. With these diverse examples, we are able to demonstrate that various features of images can all be properly characterized by a Gibbs random field. The specific form of the Gibbs random field can be selected according to the characteristics of an individual application. We believe that the Gibbs random field is a powerful tool for exploiting the spatial dependency in various images, and is applicable to many other image processing tasks.