Image quality models usually include a mechanism whereby artifacts are masked by the image acting as a background. Scientific study of visual masking has followed two traditions: contrast masking and noise masking, depending primarily on whether the mask is deterministic or random. In the former tradition, masking is explained by a decrease in the effective gain of the early visual system. In the latter tradition, masking is explained by an increased variance in some internal decision variable. The masking process in image quality models is usually of the gain-control variety, derived from the contrast masking tradition. In this paper we describe a third type of masking, which we call entropy masking, that arises when the mask is deterministic but unfamiliar. Some properties and implications of entropy masking are discussed. We argue that image quality models should incorporate entropy masking as well as contrast masking.
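As a point of reference for the gain-control account, a generic divisive-normalization transducer (a textbook form, not the specific model of any paper in this volume) can be written as

    R = \frac{c_t^{\,p}}{\sigma^{q} + \sum_{k} w_k\, c_{m,k}^{\,q}}

where c_t is the target contrast, c_{m,k} are the contrasts of mask components in neighboring channels with weights w_k, and \sigma sets the unmasked threshold; as the mask terms grow, the effective gain for the target falls and its detection threshold rises.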
Seven types of masking are discussed: multi-component contrast gain control, one-component transducer saturation, two-component phase inhibition, multiplicative noise, high spatial frequency phase-locked interference, stimulus uncertainty, and noise intrusion. In the present vision research community, multi-component contrast gain control is gaining in popularity while the one- and two-component masking models are losing adherents. In this paper we take the presently unpopular stance and argue against multi-component gain control models. We have a two-pronged approach. First, we discuss examples where high contrast maskers that overlap the test stimulus in both position and spatial frequency nevertheless produce little masking. Second, we show that alternatives to gain control are still viable, as long as uncertainty and noise intrusion effects are included. Finally, a classification is offered for the different types of uncertainty effects that can produce large masking effects.
Image discrimination models are used to predict the visibility of the difference between two images. Using a four category rating scale method, Rohaly et al. (SPIE Vol. 2411) and Ahumada & Beard (SPIE Vol. 2657) found that image discrimination models can predict target detectability when the background is kept constant, or 'fixed.' In experiment I, we use this same rating scale method and find no difference between 'fixed' and 'random' noise (where the white noise changes from trial to trial). In experiment II, we compare fixed noise and two random noise conditions. Using a two-interval forced-choice procedure, the 'random' noise was either the same or different in the two intervals. Contrary to image discrimination model predictions, the same random noise condition produced greater masking than the 'fixed' noise. This suggests that observers use less efficient target templates than image discrimination models implicitly assume. Also, performance appeared limited by internal process variability rather than external noise variability, since similar masking was obtained for both random noise types.
The ability of a human observer to locate a lesion in natural medical image backgrounds (extracted from patients' x-ray coronary angiograms) is degraded by two major factors: (1) the noisy variations in the background, and (2) the presence of a high contrast complex background (through pattern masking effects). The purpose of this paper is to isolate and model the effect of a deterministic complex background on visual signal detection in natural medical image backgrounds. We perform image discrimination experiments where the observers have to discriminate an image containing the background plus signal from an image containing the background only. Five different samples of medical image backgrounds were extracted from patients' digital x-ray coronary angiograms. On each trial, two images were shown sequentially, one image with the simulated contrast target and the other without. The observer's task was to select the image with the target. An adaptive staircase method was used to determine the sequence of signal contrasts presented, and the signal energy thresholds were determined by maximum likelihood estimation. We tested the ability of single channel and multiple channel image discrimination models with a variety of contrast gain control mechanisms to predict the variation of the signal energy threshold in the different background samples. Human signal energy thresholds were best predicted by a multiple channel model with wide band masking.
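The adaptive staircase procedure mentioned above can be sketched in a few lines; the step rule, starting contrast, and simulated-observer function below are illustrative choices, not the authors' exact procedure.

    import random

    def run_staircase(p_correct, start_contrast=0.2, step=0.05, n_trials=60):
        """Toy 2-down/1-up staircase for a two-interval detection task.
        p_correct(c) gives the probability of a correct response at
        contrast c and stands in for the human observer.  The contrasts
        visited near the end of the run cluster around threshold."""
        c, streak, history = start_contrast, 0, []
        for _ in range(n_trials):
            history.append(c)
            if random.random() < p_correct(c):
                streak += 1
                if streak == 2:                  # two correct in a row -> harder
                    c, streak = max(c - step, 0.001), 0
            else:                                # one error -> easier
                c, streak = c + step, 0
        return history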
Reliable image quality assessments are necessary for evaluating digital imaging methods (halftoning techniques) and products (printers, displays). Typically the quality of the imaging method or product is evaluated by comparing the fidelity of an image before and after processing by the imaging method or product. It is well established that simple approaches like mean squared error do not provide meaningful measures of image fidelity. A number of image fidelity metrics have been developed whose goal was to predict the amount of differences that would be visible to a human observer. In this paper we outline a new model of the human visual system (HVS) and show how this model can be used in image quality assessment. Our model departs from previous approaches in three ways: (1) We use a physiologically and psychophysically plausible Gabor pyramid to model a receptive field decomposition; (2) We use psychophysical experiments that directly assess the percept we wish to model; and (3) We model discrimination performance by using discrimination thresholds instead of detection thresholds. The first psychophysical experiment tested the visual system's sensitivity as a function of spatial frequency, orientation, and average luminance. The second experiment tested the relation between contrast detection and contrast discrimination.
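As an illustration of the receptive-field decomposition mentioned above, a single Gabor kernel can be generated as below; repeating it over several scales and orientations gives a pyramid. The parameter names and values are illustrative and not taken from the paper.

    import numpy as np

    def gabor_kernel(size=64, wavelength=8.0, orientation=0.0, sigma=6.0, phase=0.0):
        """One Gabor receptive field: a cosine grating windowed by a Gaussian."""
        half = size // 2
        y, x = np.mgrid[-half:half, -half:half].astype(float)
        xr = x * np.cos(orientation) + y * np.sin(orientation)
        envelope = np.exp(-(x**2 + y**2) / (2.0 * sigma**2))
        carrier = np.cos(2.0 * np.pi * xr / wavelength + phase)
        return envelope * carrier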
This paper considers the use of image quality metrics for comparing still image compression systems. The peak signal-to-noise ratio (PSNR) is a commonly used quality metric in image compression. However, it is also generally acknowledged that PSNR is a poor predictor of the perceived quality of many different types of images, as established by statistical evaluation with panels of observers. Consequently, for still image evaluation, many authors have proposed distortion measures that incorporate characteristics of human vision. One significant problem with these metrics, however, is that there is little information on how they perform in comparison to each other. The purpose of this paper is to provide a rigorous evaluation of two metrics for assessing the quality of compressed images. The compression system used in this evaluation is the classical JPEG coder. Both objective and subjective tests were performed on a database of 250 natural images with a panel of expert and non-expert observers. The results are highly correlated with the complexity of the images under study. Nevertheless, a statistical evaluation of these metrics allows us to establish an objective ranking.
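For reference, the PSNR discussed above is computed from the mean squared error between the original and compressed images; a minimal sketch, assuming 8-bit grayscale arrays:

    import numpy as np

    def psnr(original, degraded, peak=255.0):
        """Peak signal-to-noise ratio in dB between two equal-sized images."""
        err = np.asarray(original, float) - np.asarray(degraded, float)
        mse = np.mean(err ** 2)
        if mse == 0:
            return float('inf')        # identical images
        return 10.0 * np.log10(peak ** 2 / mse)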
Retinal processing is known to condense space, time, and color information into three basic channels known as the rod, magnocellular, and parvocellular channels. The magnocellular channel executes a spatial band-pass filter in the lower end of the spatial frequency spectrum, and the parvocellular channel executes a spatial band-pass filter in the higher end of the spectrum. In an analogous fashion, conventional wavelet analysis requires separate high-pass and low-pass filtering operations on data. Previous retinal designs have provided these filtering operations seen in natural processors. The rationale for such filters is presented along with concepts for implementing high-speed analog wavelet analyzers. These concepts are built on existing understanding of vision processing and previously demonstrated analog retinal design chips.
A watermark embeds an imperceptible signal into data such as audio, video and images, for a variety of purposes, including captioning and copyright control. In this paper, we first outline the desirable characteristics of digital watermarks. Previous work in digital watermarking is then reviewed. Early work identified redundant properties of an image (or its encoding) that can be modified to encode watermarking information. The early emphasis was on hiding data, since the envisioned applications were not concerned with signal distortions or intentional tampering that might remove a watermark. However, as watermarks are increasingly used for purposes of copyright control, robustness to common signal transformations and resistance to tampering have become important considerations. Researchers have recently recognized the importance of perceptual modeling and the need to embed a signal in perceptually significant regions of an image, especially if the watermark is to survive lossy compression. However, this requirement conflicts with the need for the watermark to be imperceptible. Several recent approaches that address these issues are discussed.
The huge success of the Internet permits the effortless transmission, wide distribution, and access of electronic data. Content providers are faced with the challenge of how to protect their electronic data. This problem has generated a flurry of recent research activity in the area of digital watermarking of electronic content for copyright protection. Unlike the traditional visible watermark found on paper, the challenge here is to introduce a digital watermark that does not alter the perceived quality of the electronic content while being extremely robust to attack. For instance, in the case of image data, editing the picture or illegal tampering should not destroy or alter the watermark. Equally important, the watermark should not alter the perceived visual quality of the image. From a signal processing viewpoint, the two basic requirements for an effective watermarking scheme, robustness and transparency, conflict with each other. We propose a watermarking technique for digital images that is based on utilizing visual models which have been developed in the context of image compression. Specifically, we propose a watermarking scheme where visual models are used to determine image-dependent modulation masks for watermark insertion. In other words, for each image we can determine the maximum amount of watermark signal that each portion of the image can tolerate without affecting its visual quality. This allows us to provide the maximum-strength watermark, which in turn is extremely robust to common image processing and editing such as JPEG compression, rescaling, and cropping. We present watermarking results in a DCT framework as well as a wavelet framework. The DCT framework allows the direct insertion of watermarks into JPEG-compressed data, whereas the wavelet-based scheme provides a framework where we can take advantage of both a local and a global approach. Our scheme is shown to provide dramatic improvement over the current state of the art both in terms of transparency and robustness.
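A minimal sketch of the kind of perceptually weighted insertion described above, for a single 8x8 block of DCT coefficients; the JND mask, the one-bit-per-block convention, and the pseudo-random carrier are illustrative assumptions, not the authors' exact scheme.

    import numpy as np

    def embed_block(dct_block, jnd_block, bit, alpha=1.0, rng=None):
        """Add a pseudo-random watermark pattern to one block of DCT
        coefficients, scaled coefficient-by-coefficient by a
        just-noticeable-difference (JND) estimate from a visual model."""
        rng = rng or np.random.default_rng(0)
        carrier = rng.standard_normal(dct_block.shape)   # spread-spectrum pattern
        sign = 1.0 if bit else -1.0                      # one watermark bit per block
        return dct_block + alpha * sign * jnd_block * carrier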
The degradation in image quality, produced by compression, can be regarded as multidimensional in nature, involving tone, color, sharpness and noise. By measuring each of these categories separately, and then combining to form a single figure of merit or quality metric, a better understanding of the effects of compression on quality can be achieved. This approach is superior to the use of single overall measures of 'quality,' such as the mean squared error (MSE) measurement of distortion. Tone reproduction is assessed using cascaded transfer functions. Color reproduction is assessed using both the CIE (1976) L*a*b* color difference measure and a color metric, the color reproduction index, based on a proven model of color vision. Image sharpness and noise are evaluated using the modulation transfer function (MTF) and the noise power spectrum (NPS) respectively. These results are combined, together with characteristics of the display system and a 'typical' observer, in an image quality metric based on the square root integral (SQRI) metric. Comparisons are made between the various methods of assessment, and objectively measured quality is correlated with results from subjective scaling experiments. Problems with the application of methods based on linear systems analysis are described and possible solutions suggested.
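The CIE 1976 L*a*b* color difference referred to above is the Euclidean distance between two L*a*b* triples; a minimal sketch:

    import math

    def delta_e_ab(lab1, lab2):
        """CIE 1976 color difference between two (L*, a*, b*) triples."""
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(lab1, lab2)))

    # Example: delta_e_ab((50, 10, 10), (50, 12, 9)) ~= 2.24;
    # a difference of about 1 is often taken as just perceptible.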
The main cost of owning a facsimile machine consists of the telephone charges for the communications; thus short transmission times are a key feature for facsimile machines. Similarly, on a packet-routed service such as the Internet, a low number of packets is essential to avoid operator wait times. Concomitantly, user expectations have increased considerably. In facsimile, the switch from binary to full color increases the data size by a factor of 24. On the Internet, the switch from plain text American Standard Code for Information Interchange (ASCII) encoded files to files marked up in the Hypertext Markup Language (HTML) with ample embedded graphics has increased the size of transactions by several orders of magnitude. A common compression method for raster files in these applications is the Joint Photographic Experts Group (JPEG) method, because efficient implementations are readily available. In this method the implementors design the discrete quantization tables (DQT) and the Huffman tables (HT) to maximize the compression factor while maintaining the introduced artifacts at the threshold of perceptual detectability. Unfortunately the achieved compression rates are unsatisfactory for applications such as color facsimile and World Wide Web (W3) browsing. We present a design methodology for image-independent DQTs that, while producing perceptually lossy data, does not impair the reading performance of users. Combined with a text sharpening algorithm that compensates for scanning device limitations, the methodology presented in this paper allows us to achieve compression ratios near 1:100.
Psychometric testing, in the form of paired sample comparisons, is used to determine tolerances on optical density and color registration settings on the IBM InfoColor 70 printer. In the process, several print quality metrics are evaluated and the data is analyzed in several ways to explore possibilities of conducting psychometric testing with small numbers of participants in conference room environments. Both are shown to be possible. Several conclusions are also drawn about metrics for optical density and color misregistration effects.
The human visual system is finely tuned to be able to detect moving objects or patterned stationary objects. For a printed page, this translates into an ability to discern both the intended information content and any other spatial variations. Therefore, human sensitivity to spatial variations is an important consideration in determining the image quality of a document. The open literature describes the visual response to neutral lightness variations as a function of spatial frequency, to color variations on a neutral base color and to color differences between solid patches. A complete representation of human sensitivity to spatial color variation is very complex, yet must reduce to these special cases. This paper explores the more general case of human sensitivity to variation about a non-neutral base color, both on intended uniform areas and on real customer images. There is a peak in our sensitivity to lightness variation at about 2 - 4 cycles/degree (about 0.4 - 0.8 cycles/mm at a normal reading distance of 30 cm) for any base color, but the dependence on spatial frequency varies between neutral and non-neutral base colors. The more structure there is in the image, the less sensitive people are to color non-uniformity within the page. Large areas of halftoned low chroma colors are especially stressful because they require uniform printing of small dots in each of several colors and also because people are most sensitive to color shifts in that regime. Several of these effects are illustrated.
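The conversion quoted above follows from simple geometry: at a viewing distance D, one degree of visual angle subtends D\tan(1^\circ), so

    f_{\text{cyc/mm}} = \frac{f_{\text{cyc/deg}}}{D\,\tan(1^\circ)}
                      \approx \frac{f_{\text{cyc/deg}}}{300\ \text{mm}\times 0.0175}
                      \approx \frac{f_{\text{cyc/deg}}}{5.2\ \text{mm}}

which maps 2 - 4 cycles/degree to roughly 0.4 - 0.8 cycles/mm at 30 cm, as stated.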
This paper summarizes the results of a visual psychophysical investigation of the relationship between two important printer parameters: addressability (expressed in terms of dots per inch or DPI) and grayscale capability (expressed in terms of the number of graylevels per pixel). The photographic image quality of print output increases with both the printer DPI and the number of graylevels per pixel. The experiments described in this paper address the following questions: At what point is there no longer a perceptual advantage of DPI or graylevels, and how do these two parameters trade off?
A tool written in MatLab is described that can be used to simulate display devices. Images as they would appear on the simulated display can be rendered on a workstation screen for direct visualization of the simulated data. The tool also produces a two-dimensional matrix of CIE XYZ vectors that corresponds to the photometric measure of the simulated display with an image rendered on it. This tool is part of a larger effort to build a CAD tool for flat panel display design and optimization. Use of the tool is illustrated with examples of multi-level error diffusion. An empirical experiment is described comparing the predictions of the CAD tool set to actual human performance. The system is found to be consistent with human psychophysics and useful for device design and optimization.
In this paper, we describe a visual experiment to measure the contrast detection threshold for both halftone and continuous tone images. A continuous tone sinusoidal grating was halftoned with a classical 45 degree dot screen. A calibrated CRT monitor was used to display the images. The observers were asked to make a forced-choice judgment of whether the displayed image contained a grating pattern. The contrast detection threshold was determined using Probit analysis. The threshold elevation, the ratio of the contrast threshold for the halftone grating to that for the continuous tone grating, was calculated from the measured contrast detection thresholds. It was found that the threshold elevation strongly depends on halftone dot frequency. At a high halftone frequency, there is little difference in the measured contrast detection threshold between the continuous tone grating and the halftone grating, but at a lower halftone frequency the detection threshold is significantly higher for the halftone grating than for the continuous tone grating. The threshold elevation is much higher for gratings oriented at 45 degrees, where the peaks of the halftone frequency lie. A multiple channel vision model was implemented to predict the visual difference for both continuous tone and halftone images. The model correctly predicted the detection threshold of the continuous tone grating, but it failed to predict the threshold elevation due to halftoning.
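A probit-style threshold estimate of the kind used above can be obtained by fitting a cumulative Gaussian to the proportion of 'grating seen' responses; the data values and starting guesses below are made up for illustration only.

    import numpy as np
    from scipy.optimize import curve_fit
    from scipy.stats import norm

    def psychometric(log_c, mu, sigma):
        """Probability of reporting the grating as a function of log contrast."""
        return norm.cdf(log_c, loc=mu, scale=sigma)

    log_c = np.log10([0.002, 0.004, 0.008, 0.016, 0.032])
    p_seen = np.array([0.05, 0.20, 0.55, 0.90, 1.00])        # illustrative data
    (mu, sigma), _ = curve_fit(psychometric, log_c, p_seen, p0=(-2.0, 0.3))
    threshold = 10 ** mu        # contrast at the 50% point
    # threshold elevation = threshold(halftone) / threshold(continuous tone)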
In this paper, we propose an evaluation method for digital halftone images produced by a multi-level error diffusion technique. Tone reproduction, sharpness, graininess, and color reproduction characteristics of digital halftone images at resolutions of 200, 400, 800, and 1600 DPI, printed on transparency by a digital film recorder, were analyzed both physically and subjectively. We also analyzed the relationship between the subjective evaluation and a parameter calculated as a linear combination of the four physical parameters to evaluate image quality. As a result of the experiment, it was found that the rms granularity and the parameter calculated as a linear combination of the quality criteria were well correlated with the observer rating values.
The point of regard is usually considered to be the place where visual attention is directed although there are some exceptions. The identification of the point of regard is easy. A variety of commercial devices are available. The movement of the eye from one place to another stems from two quite different mechanisms. One is driven by events in the visual scene; the other by events in the observer. In common parlance the one involves the attracting of attention; the other the paying of attention. The former has been more thoroughly studied since the driving events can be easily identified and quantified. The latter is largely inferred, although on strong evidence. In both cases quantitative models have been constructed. One can predict very well the statistics of successive fixations on dynamic displays. The treatment of static displays is more difficult because the internal mechanisms are difficult to demonstrate. Nonetheless a single model covering both can be constructed. I shall show how the models and experimental data can be converted into rules for controlling and predicting visual attention on a display.
This paper describes an attempt to broaden the scope of conventional block-based DCT image coding, by allowing some variation in quantization of DCT values for different blocks in the image. Blocks belonging to each individual spatial region in the image are afforded a particular level of quantization selected for that region. This approach allows desirable levels of compression efficiency and visual quality to be achieved for different parts of the reconstructed image. The scheme offers improved overall balancing of fidelity performance, as well as benefits for applications which undertake further processing using the image regions, such as video coding.
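The region-dependent quantization can be sketched as scaling a baseline quantization table by a factor chosen for each region; the table values, region labels, and scale factors below are purely illustrative, not those of the paper.

    import numpy as np

    BASE_Q = np.full((8, 8), 16.0)            # placeholder baseline table

    def quantize_block(dct_block, region_label, region_scale):
        """Quantize one 8x8 block of DCT coefficients with a step size set
        per spatial region: important regions get finer quantization."""
        q = BASE_Q * region_scale[region_label]
        return np.round(dct_block / q) * q    # quantize, then dequantize

    region_scale = {'foreground': 0.5, 'background': 2.0}   # hypothetical labels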
A network of primary visual cortex simple cells has been modeled to respond to varying degrees of segment orientation contained within an input image. Although these cells receive a significant receptive field input from a grid of model lateral geniculate nucleus (LGN) cells, they also receive inputs from both long-range excitatory and short-range inhibitory lateral connections made with adjacent simple cells. These cortical interactions have the effect of enhancing stronger signals, filling-in missing or incomplete image information, and reducing locally connected noise. This filtering process is facilitated by the modification of the LGN cell responses, each receiving both positive and negative feedback from retinotopically located cortical simple cells. In addition to the aforementioned filtering, adjusting the levels of cortical feedback, as well as the influence of top- down signals, can also produce attentional effects. Simulations with noisy and incomplete images are used to demonstrate the performance of the network model.
This paper presents an experiment in using visual techniques, called 'masks,' to direct a user's attention within a computer graphical user interface. Bleaching, darkening, and a solid-color pattern overlay (screening) are used to de-emphasize background material, causing the target to visually 'pop out' at the user. The tradeoff between effectively directing the user's attention and ensuring the readability of the background material is explored. Experimental results indicate that there is a wide range of darkening and screening levels that can create a pop-out effect without degrading the readability of the masked area.
The human visual system engages a wide field of view peripheral vision in conjunction with selectively scanned high resolution foveal vision. The effective scene resolution of the human eye is equivalent to a camera with 10^8 pixels. This performance is difficult if not impossible to match with available camera technologies. Canpolar East has recently developed a machine vision system that utilizes a low resolution wide field camera plus a high resolution narrow field camera that is able to fixate at 30/60 frames per second. The system was specifically designed to match human visual performance in industrial inspection tasks. The system includes software that selects objects of interest from the low resolution images for high resolution imaging. The system is capable of selection and fixation at about 25 'saccades' per second.
This paper introduces an automatic tool able to analyze a picture according to the semantic interest an observer attributes to its content. Its aim is to give a 'level of interest' to the distinct areas of the picture extracted by any segmentation tool. For the purpose of dealing with semantic interpretation of images, a single criterion is clearly insufficient, because the human brain, due to its a priori knowledge and its huge memory of real-world concrete scenes, combines different subjective criteria in order to reach its final decision. The developed method permits such a combination through a model using assumptions to express some general subjective criteria. Fuzzy logic enables the user to encode knowledge in a form that is very close to the way experts think about the decision process. This fuzzy modeling is also well suited to represent multiple collaborating or even conflicting experts' opinions. The assumptions are verified through a non-hierarchical strategy that considers them in a random order, each partial result contributing to the final one. The presented results show that the tool is effective for a wide range of natural pictures. It is versatile and flexible in that it can be used stand-alone or can take into account any a priori knowledge about the scene.
For the optimization of digital imaging systems it is crucial to know how parameter settings affect the perceptual quality of displayed images. This calls for valid techniques for assessing image quality. Here, we studied continuous assessment of the instantaneous quality impression of long image sequences. Initially, we concentrated on the measuring method. Subjects were instructed to indicate quality by moving a slider along a graphical scale. With a sequence consisting of time-variably blurred stills, the temporal characteristics of continuous scaling could be separated from the relation between blur and quality impression. The temporal behavior can be explained by a causal linear time-filter. Subsequently, we extended the method to real video. In order to check the validity of continuous scaling, perceived quality of the video at any moment in time was measured by partitioning the video into short fragments and evaluating the quality of each fragment separately. The image material was MPEG-2 coded at 2 Mbit/s. The relation between the time-quality curves from the continuous assessment and the instantaneous ratings of the fragments is described by the same time-filter as found previously. This filter indicates a delay of 1 second, and suggests that subjects can monitor image quality variations almost instantaneously. With these experiments, we have shown that it is possible to measure quality of video sequences continuously in a consistent way. As confirmed in a third experiment, the results of continuous assessment make it possible to select relevant material for further analysis, for instance by standard ITU-R methods.
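The causal linear time-filter found in these experiments can be illustrated with a first-order low-pass filter applied to an instantaneous quality trace; the sampling interval and time constant below are illustrative, chosen only to mimic a lag on the order of one second.

    import numpy as np

    def smooth_quality(instantaneous, dt=0.1, tau=1.0):
        """Causal first-order low-pass filtering of an instantaneous
        quality trace sampled every dt seconds, with time constant tau."""
        a = dt / (tau + dt)
        out = np.empty(len(instantaneous))
        acc = float(instantaneous[0])
        for i, q in enumerate(instantaneous):
            acc += a * (q - acc)     # move part of the way toward the new value
            out[i] = acc
        return out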
We propose a new model for the prediction of distortion visibility in digital image sequences, which is aimed at use in digital video compression algorithms. The model is an extension of our spatial vision model with a spatio-temporal contrast sensitivity function and an eye movement estimation algorithm. Due to the importance of smooth pursuit eye movements when viewing image sequences, eye movements cannot be neglected in a spatio-temporal vision model. Although eye movements can be incorporated by motion compensation of the contrast sensitivity function, the requirements for this motion compensation are different than those for motion compensated prediction in video coding. We propose an algorithm for the estimation of smooth pursuit eye movements, under the worst-case assumption that the observer is capable of tracking all objects in the image.
Lossy video compression systems such as MPEG2 introduce picture impairments such as image blocking, color distortion and persistent color fragments, 'mosquito noise,' and blurring in their outputs. While there are video test clips which exhibit one or more of these distortions upon coding, there is a need for a set of well-characterized test patterns and video quality metrics. Digital test patterns can deliver calibrated stresses to specific features of the encoder, much as the test patterns for analog video stress critical characteristics of that system. Metrics quantify the error effects of compression by a computation. NIST is developing such test patterns and metrics for compression rates that typically introduce perceptually negligible artifacts, i.e. for high quality video. The test patterns are designed for subjective and objective evaluation. The test patterns include a family of computer-generated spinning wheels to stress luminance-based macro-block motion estimation algorithms and images with strongly directional high-frequency content to stress quantization algorithms. In this paper we discuss the spinning wheel test pattern. It has been encoded at a variety of bit rates near the threshold for the perception of impairments. We have observed that impairment perceptibility depends on the local contrast. For the spinning wheel we report the contrast at the threshold for perception of impairments as a function of the bit rate. To quantify perceptual image blocking we have developed a metric which detects 'flats': image blocks of constant (or nearly constant) luminance. The effectiveness of this metric is appraised.
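A minimal version of the 'flats' detector described above simply counts blocks whose luminance range falls below a tolerance; the block size and tolerance below are illustrative, not the calibrated values used at NIST.

    import numpy as np

    def count_flats(image, block=8, tol=1.0):
        """Count block x block regions of (nearly) constant luminance,
        a simple proxy for visible blocking impairments."""
        img = np.asarray(image, float)
        h, w = img.shape
        flats = 0
        for y in range(0, h - block + 1, block):
            for x in range(0, w - block + 1, block):
                tile = img[y:y + block, x:x + block]
                if tile.max() - tile.min() <= tol:
                    flats += 1
        return flats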
This is an update from our ongoing studies on a biologically inspired, digital color image representation presented here before. It has evolved into a technology named Chromaplex™, with practical applications for both static (still pictures) and dynamic (digital video and motion pictures) images in three areas: (1) Simplification of processing, storage, compression, and transmission of digital color images. (2) Economical full-color upgrading of black and white (gray scale) image capturing systems. (3) Increase of up to 4X in spatial resolution in high-quality digital image capturing systems currently designed for triplane color capture (three separate CCDs or three scans). Sample images of these applications are available on a World Wide Web site. In this paper we present data showing that spatial blur artifacts are worse when produced by conventional techniques of color interpolation than those produced by Chromaplex decoding. We also show that Chromaplex color decoding of CCD outputs, first demonstrated for CCDs with relatively narrow-band RGB filters, is equally applicable to digital imaging systems having CCDs with broad band subtractive color filters (like cyan, yellow, and magenta), but the often necessary color transformation from subtractive color to RGB brings in different tradeoffs.
The term error diffusion has been used in the halftoning literature to describe processes in which pixels' quantization errors are spread in space to their unquantized neighbors, causing neighboring errors to be negatively correlated and relatively invisible. The general principle may be extended to the time dimension as well, which we will refer to as temporal error diffusion. In this paper we consider the use of temporal error diffusion to ameliorate the errors introduced by JPEG image compression of a stream of images.
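One simple reading of temporal error diffusion is to carry each frame's compression error forward and pre-compensate the next frame before coding it, so that successive errors tend to cancel; the sketch below assumes a generic lossy encode/decode round trip codec(frame) and is not a description of any particular implementation.

    import numpy as np

    def temporally_diffused(frames, codec):
        """Diffuse each frame's coding error into the next frame of the stream."""
        carried = np.zeros_like(np.asarray(frames[0], float))
        out = []
        for frame in frames:
            adjusted = np.asarray(frame, float) + carried   # add last frame's error
            decoded = codec(adjusted)                       # lossy round trip
            carried = adjusted - decoded                    # error to carry forward
            out.append(decoded)
        return out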
Image quality of stereoscopic video sequences was assessed by non-expert viewers, using a subjective assessment method as described in ITU-R Recommendation 500. Three 10-second stereo sequences, in the ITU-R 601 format, were assessed. Using a standard MPEG-2 codec, images of the left-eye views of each sequence were compressed independently of the right-eye views, at bit rates of 1, 2, 3, and 6 Mb/s. Twenty-six viewers rated the overall image quality for various combinations of the compressed left- and right-eye views. Viewers also rated non-stereo sequences, which consisted of images of the right-eye views for both eyes. Ratings of overall image quality were between 60 and 70 units, corresponding to a label of 'good,' except when the severity of artifacts presented to one eye was large, i.e., when the bit rate was below 3 Mb/s. When there was a mismatch in quality of inputs to the two eyes, ratings of overall perceived quality fell halfway between ratings of quality for the left-eye input and the right-eye input. Interestingly, ratings of image quality for stereo sequences were equal to those for non-stereo sequences, except at the lowest bit rate tested (1 Mb/s).
Given the great diversity of pathways into which the visual system signal splits after arriving at area V1, many researchers have proposed solutions to the 'binding problem' of reunifying the information after specialized processing. Most solutions require pathways to maintain synchrony and share information, which in turn requires some similarity of mechanisms and/or the spaces in which they operate. We examine the extent to which such similarity can occur between motion and color processing pathways, by using a multiple stage motion detection algorithm for processing color change. We first review the motion algorithm chosen, then we present a model for certain changes in hue, discuss the possible uses for such processes in the visual system, and present results of applying this model to both motion and color in this manner.
Children's drawings typically bear little resemblance to conventional perspective projections. It is suggested that their deviation from 'photo-realism' is due in part to the emerging relationship between perception and action. Thus a young child's drawing of the world is unlike a passive mechanical projection, because the process of drawing is intimately related to the needs of developing action within the world. Here this relationship is explored through a proposed computer program called Eor (emergence of representation) that endeavors, in a simplified way, to learn to draw like a young child. Eor is motivated by the goal of transferring qualities into its drawing. Each drawing Eor generates is perceived by the program and categorized according to Eor's history. This categorization embraces the perception of the drawing and the action that generated it. This lets Eor generate the appropriate drawing action to represent a particular quality.
Computer users often have difficulty operating their machines. It is therefore desirable for the computer to detect situations in which the operator is perplexed and to respond helpfully. We propose a method for recognizing such perplexed states during word processor work. First, perplexed behaviors were identified by observing subjects doing word processor work. The observations made clear that perplexity shows itself in head motion and keyboard operation rather than in facial expression. Second, head motions in the x, y, z, and theta directions were captured in real time by tracking both pupils using image processing; in the system, the processing speed is enhanced by reducing the image data. The behaviors are converted into sequences of vectors that encode the speed of head movement. Third, the distances between unknown input motion patterns and template patterns of vector sequences were calculated by DP matching, and the results showed that each head motion could be identified. The proposed method can be applied to the development of software that responds automatically when the operator becomes perplexed.
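The DP matching step can be sketched as a standard dynamic-programming alignment between an input sequence of head-motion vectors and a template; the cost function and recursion below are the usual textbook form, not necessarily the authors' exact variant.

    import numpy as np

    def dp_distance(seq_a, seq_b):
        """Dynamic-programming (DTW-style) distance between two sequences
        of motion vectors; the template with the smallest distance wins."""
        a, b = np.asarray(seq_a, float), np.asarray(seq_b, float)
        n, m = len(a), len(b)
        d = np.full((n + 1, m + 1), np.inf)
        d[0, 0] = 0.0
        for i in range(1, n + 1):
            for j in range(1, m + 1):
                cost = np.linalg.norm(a[i - 1] - b[j - 1])
                d[i, j] = cost + min(d[i - 1, j], d[i, j - 1], d[i - 1, j - 1])
        return d[n, m]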
We are developing a method for adjusting real-time computer graphics so that they give the audience the various impressions intended by the producer, making technological use of human sensibility. In general, the production of real-time computer graphics requires much adjustment of parameters such as 3D object models, their motions, attributes, view angle, and parallax, so that the graphics give the audience strong effects such as the reality of materials or a sense of experience, and it is known that adjusting such parameters by trial and error is costly. A graphics producer often evaluates his graphics in order to improve them; for example, they may lack a 'sense of speed' or need to be given more of a 'sense of settledness.' On the other hand, we can learn how the parameters of computer graphics affect such impressions by statistically analyzing several samples of computer graphics that produce different impressions. Drawing on these two facts, we designed a method for adjusting the parameters by inputting the intended impressions into the computer. Using this method, real-time computer graphics can be adjusted more effectively than by the conventional trial-and-error approach.
Computer Vision Approaches to Characterizing Images
Radial sinusoids (blurry spoke patterns) appear dramatically saturated toward the brighter regions. The saturation is not perceptually logarithmic but exhibits a hyperbolic (Naka- Rushton) compression behavior at normal indoor luminance levels. The object interpretation of the spoke patterns was not consistent with the default assumption of any unidirectional light source, but implied a diffuse illumination (as if the object were looming out of a fog). The depth interpretation was consistent with the hypothesis that the compressed brightness profile provided the neural signal for perceived shape, as an approximation to computing the diffuse Lambertian illumination function for this surface. The surface material of the images was perceived as non-Lambertian to varying degrees, ranging from a chalky matte to a lustrous metallic.
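The hyperbolic (Naka-Rushton) compression referred to above has the standard form

    R(L) = R_{\max}\,\frac{L^{n}}{L^{n} + L_{50}^{\,n}}

where L_{50} is the semi-saturation luminance and the exponent n is typically near 1; at low luminances the response grows almost linearly, while toward higher luminances it compresses, consistent with the perceived saturation toward the brighter regions of the spoke patterns.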
Accurately conveying three-dimensional shape is a requirement for many computer graphic applications. The general belief that we call the photorealism assumption asserts that understanding of depicted object shapes improves with increasing realism of the display. We contend that parameters determining a graphic rendering vary widely in importance for accurately portraying shape. By identifying the relative contributions of rendering parameters, we provide a framework for developing cost-effective display systems. Our current work concerns the role of light direction in conveying the three-dimensional shape of depicted objects. In our experiments, we present sequences of displays that each show an elongated superquadric object. An observer attempts to reproduce the shape of the cross section orthogonal to the elongated axis by adjusting a sample contour. The object rotates continuously so that observers solve the task from an overall understanding of the shape, rather than from static two-dimensional features. We vary the shapes of the superquadric objects, as well as the illumination directions. Our results indicate that the accuracy of observers' estimates is surprisingly robust to these variations.
Observers can separate the color properties of a transparent filter from the color properties of underlying surfaces. The chromatic changes which elicit an impression of transparency include translations and convergences in color space. Neither rotations nor shears in color space lead to perceived transparency. Results of matching experiments show that isoluminant translations, which cannot be generated by episcotister or filter models, give rise to the perception of transparency. This implies that systematic luminance variation is not needed for transparency to be perceived. We describe here an algorithm for detecting transparency using graphs. A circuit through an image's X-junctions which is both spatially and chromatically coherent is identified as a contour of a transparent region.
Photometric stereo (PMS) recovers orientation vectors from a set of graylevel images. Under orthography, when the lights are unknown, and for a single uniform Lambertian surface, one can recover surface normals up to an unknown overall orthogonal transformation. The same situation obtains if, instead of three graylevel images, one uses a single RGB image taken with at least three point or extended colored lights impinging on the surface at once. Then using a robust technique and the constraints among the resulting three effective lighting vectors one can recover effective lights as well as normals, with no unknown rotation. However, in the case of a non-Lambertian object, PMS reduces to the idea of using a lookup table (LUT) based on a calibration sphere. Here, we show that a LUT can also be used in the many-colored- lights paradigm, eliminating the need for three separate images as in standard PMS. As well, we show how to transform a calibration sphere made of a particular material into a theoretical sphere for a cognate material similar in its specular properties but of a different color. In particular, we postulate that a LUT developed from one human's skin can be used for any other person; problems arising from shadows, hair, eyes, etc. are automatically eliminated using robust statistics. Results are shown using both synthetic and real images.
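For context, the textbook Lambertian case of photometric stereo with known lights (the starting point that the lookup-table approach generalizes, not the paper's own method) solves, at each pixel, I = L.(rho n) in the least-squares sense:

    import numpy as np

    def lambertian_normals(intensities, lights):
        """intensities: (k, n_pixels) graylevels under k lights;
        lights: (k, 3) known light directions.
        Returns unit normals (3, n_pixels) and per-pixel albedo."""
        L = np.asarray(lights, float)
        I = np.asarray(intensities, float)
        g, *_ = np.linalg.lstsq(L, I, rcond=None)   # g = albedo * normal
        albedo = np.linalg.norm(g, axis=0)
        normals = g / np.maximum(albedo, 1e-9)
        return normals, albedo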
During rectilinear locomotion, the focus of expansion identifies the observer's heading. Eye rotation or curvilinear locomotion annihilates this singularity. Computer simulations of eye rotation during circular translation present possible solutions to heading judgement. Specifically, when gaze direction coincides with circular heading, every velocity vector in the image plane becomes linearized. These velocity vectors are tangent vectors of the corresponding flow lines of optical flow. Moreover, the vectors corresponding to the observer's path are all aligned perpendicularly in the image plane, which in turn can be used to determine the observer's path of locomotion.
This article proposes a general theory and methodology, called the minimax entropy principle, for building statistical models for images (or signals) in a variety of applications. This principle consists of two parts. (1) Maximum entropy principle for feature binding (or fusion): for a given set of observed feature statistics, a distribution can be built to bind these feature statistics together by maximizing the entropy over all distributions that reproduce these feature statistics. (2) Minimum entropy principle for feature selection: among all plausible sets of feature statistics, we choose the set whose maximum entropy distribution has the minimum entropy. Computational and inferential issues in both parts are addressed. The minimax entropy principle is then corrected by considering the sample variation in the observed feature statistics, and a novel information criterion is derived for feature selection. The minimax entropy principle is applied to texture modeling. The relationship between our theory and the mechanisms of neural computation is also discussed.
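The maximum entropy step yields distributions of the familiar exponential-family form

    p(I;\Lambda) = \frac{1}{Z(\Lambda)}\,\exp\Big(-\sum_{i}\lambda_{i}\,\phi_{i}(I)\Big),
    \qquad E_{p}\big[\phi_{i}(I)\big] = \phi_{i}^{\mathrm{obs}}

where the Lagrange multipliers \lambda_i are chosen so that the model reproduces the observed feature statistics \phi_i^{obs}; the minimum entropy step then selects the feature set whose fitted model has the lowest entropy.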
The classic Gestalt laws of grouping by proximity and grouping by similarity have never been quantified, nor have their interactions been specified with any degree of precision. I will present a number of experimental and theoretical results from my lab, based on the responses of human observers to repeated brief (approximately 300 ms) presentations of perceptually multi-stable periodic dot patterns. I will show that the distribution of the probability of seeing any one of the possible perceptual interpretations (1) is scale invariant, (2) changes little when the duration is reduced to 100 ms, and (3) can be predicted exactly on the assumption that dots are attracted to group with each other by a force that decays exponentially with the distance between them. I will also show that factors such as the non-uniformity of the lightness of dots in the lattice interact additively with the strength of grouping by proximity. Finally, I will show how the fact that grouping is a hierarchical process expresses itself within the framework of the proposed model.
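A hedged sketch of the proximity rule described above: each candidate grouping direction of a dot lattice is assigned an attraction that decays exponentially with inter-dot distance, and the probability of perceiving that organization is taken as its share of the total attraction. The Luce-choice combination rule and the decay constant are illustrative assumptions on our part, not the exact model of the paper.

```python
import numpy as np

def grouping_probabilities(distances, k=2.0):
    """distances: inter-dot distances along each candidate organization,
    expressed in units of the shortest one; returns predicted percept probabilities."""
    attraction = np.exp(-k * np.asarray(distances, float))   # exponential decay
    return attraction / attraction.sum()                      # share of total attraction

# Example: a lattice with organizations at relative distances 1.0, 1.2 and 1.5.
# print(grouping_probabilities([1.0, 1.2, 1.5]))
```

Because distances are expressed relative to the shortest organization, rescaling the whole lattice leaves the predictions unchanged, consistent with the scale invariance noted above.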
We present three experiments in which observers were asked to interpolate sampled parabolic contours. In the first two experiments, observers saw only eight isolated sample points, all of which lay on or near an otherwise invisible contour. The observer adjusted the position of a ninth point until s/he judged it to be on the contour as well. We measured the effect of small perturbations in the location of each visible point on the observer's setting and derived a locally linear 'receptive field' that characterizes how each of the visible points contributed to the interpolation judgement. In the third experiment, we develop a measure of segmentation performance based on the same methods and analyses.
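The perturbation analysis lends itself to a simple regression sketch: if the observer's setting is approximately linear in the small perturbations applied to the eight visible points, the weights of that linear map (the 'receptive field') follow from least squares across trials. The variable names and the simulated observer below are illustrative assumptions, not the authors' analysis code.

```python
import numpy as np

def estimate_receptive_field(perturbations, settings):
    """perturbations: trials x 8 matrix of per-point perturbations (e.g., offsets
    orthogonal to the contour); settings: length-trials vector of the observer's
    adjusted position on each trial."""
    X = np.column_stack([perturbations, np.ones(len(settings))])  # add an intercept
    weights, *_ = np.linalg.lstsq(X, settings, rcond=None)
    return weights[:-1], weights[-1]          # per-point weights, baseline setting

# Example with a simulated observer that weights the two nearest points most:
# rng = np.random.default_rng(0)
# true_w = np.array([0.0, 0.05, 0.1, 0.3, 0.3, 0.1, 0.05, 0.0])
# P = rng.normal(0, 1, size=(500, 8))
# s = P @ true_w + rng.normal(0, 0.2, size=500)
# w_hat, baseline = estimate_receptive_field(P, s)
```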
This study reports on experiments conducted with human observers to investigate the properties of linear and non-linear perceptual grouping mechanisms by using reverse-polarity sparse random-dot patterns. The stimuli were generated by spatially superimposing a sparse set of randomly distributed square elements onto a copy of the original set that was expanded or rotated about the center of the screen. In the control experiment, both the original and the transformed sets contained elements of identical luminance contrast with the background. The main experiments involved a reverse-contrast random-dot pattern, in which the transformed set consisted of elements of equal contrast magnitude but opposite polarity to that of the original set. At least two competing global percepts are possible: 'forward grouping,' in which the perceived grouping agrees with the physical transformation, or 'reverse grouping,' in a direction orthogonal to that of the forward grouping. The two-alternative forced-choice (2AFC) task was to report the direction of the global grouping. For the control experiment, observers reported forward grouping both at the fovea and at eccentricities of up to 4 degrees; as expected, no reverse grouping was observed. With the reverse-polarity stimulus, reverse grouping was observed at high eccentricities and low contrasts, but forward grouping dominated under foveal viewing. In another experiment, the influence of chromatic mechanisms was studied by using opposite-contrast red elements on a yellow background. In this experiment reverse grouping was not observed, which indicates that color mechanisms veto reverse grouping. Reverse grouping can be hypothesized to be the result of processing by linear oriented spatial mechanisms, in analogy with reverse-phi motion. Forward grouping, on the other hand, can be explained by non-linear preprocessing (such as squaring or full-wave rectification).
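The closing hypothesis can be illustrated with a toy computation (our simplification, not the authors' stimuli): a linear mechanism matches the signed contrasts of an element and its partner in the transformed copy, so reversing polarity flips the sign of the match and favors reverse grouping, whereas full-wave rectification before matching keeps the evidence positive and favors forward grouping. The shift used here as a stand-in for the expansion or rotation, and the dot density, are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 128
original = np.zeros((n, n))
original[rng.random((n, n)) < 0.02] = 1.0        # sparse bright elements

shift = 3
copy_same = np.roll(original, shift, axis=1)      # same-polarity transformed copy
copy_reversed = -copy_same                        # opposite-polarity copy

def pairing_evidence(base, copy, rectify=False):
    """Correlate the base set with the copy brought back into register."""
    aligned = np.roll(copy, -shift, axis=1)
    if rectify:
        base, aligned = np.abs(base), np.abs(aligned)   # full-wave rectification
    return float((base * aligned).sum())

# Linear evidence: positive for the same-polarity copy, negative for the
# reversed copy (predicting reverse grouping by a linear mechanism).
# Rectified evidence: positive in both cases (predicting forward grouping).
# print(pairing_evidence(original, copy_same), pairing_evidence(original, copy_reversed))
# print(pairing_evidence(original, copy_same, True), pairing_evidence(original, copy_reversed, True))
```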
Spatial point pattern recognition is a frequent step, sometimes the last one, in a general pattern recognition process. Several techniques, generally based on graphs, have been devised for this purpose. From statistical-geometry considerations we demonstrate that the optimal graph representation is the minimal spanning tree (MST). The MST is a graph that provides several ways to analyze the topography (spatial relationships) of sets of objects: global degree of order (the so-called m–σ diagram), hierarchical classification (single-linkage cluster analysis), and non-hierarchical pattern recognition (by graph theory or anisotropy diagrams). The statistical-geometry derivation, based on the maximum entropy principle, also yields an estimate of the information compression rate this graph allows. In any case, the most direct test of compression quality is to compare the original pattern with the retrieved one. We have therefore investigated various ways to reconstruct such patterns from information derived from the MST at various compression levels. Among them, one of the most promising is a simulated annealing technique with parameters related to the statistical geometry of the graph. Starting from the hypothesis that analyzing the spatial patterns of objects can reveal the interactions and control processes that induced those patterns, the MST is well suited to analyzing these interactions at both the local and global levels. The method has been applied to the analysis of physical as well as biological systems.
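A minimal sketch of the m–σ description via the MST: m is the mean MST edge length and σ its standard deviation, normalized here by a characteristic spacing for N points in a unit-area window. The normalization convention, function name, and example patterns are our own assumptions and may differ from the definitions used in the paper.

```python
import numpy as np
from scipy.spatial import distance_matrix
from scipy.sparse.csgraph import minimum_spanning_tree

def m_sigma(points):
    """points: N x 2 array of coordinates in a unit-area window."""
    D = distance_matrix(points, points)          # full pairwise distance matrix
    mst = minimum_spanning_tree(D)               # sparse matrix holding MST edges
    edges = mst.data                             # the N-1 edge lengths
    spacing = 1.0 / np.sqrt(len(points))         # rough characteristic spacing
    return edges.mean() / spacing, edges.std() / spacing

# Example: a random (Poisson-like) pattern versus a slightly jittered grid;
# the grid gives a larger m and a much smaller sigma (higher degree of order).
# rng = np.random.default_rng(0)
# print(m_sigma(rng.random((400, 2))))
# g = np.stack(np.meshgrid(np.linspace(0, 1, 20), np.linspace(0, 1, 20)), -1).reshape(-1, 2)
# print(m_sigma(g + rng.normal(0, 0.005, g.shape)))
```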
Halftoning or quantizing by means of a threshold array is simple, fast, and easily parallelized: a matrix of threshold values is tiled across the image, and each output pixel is colored white if the image value exceeds the threshold value and black otherwise. The computational efficiency and locality of the compare operation make this technique suitable for applications in printing and motion-video quantization. In the past, threshold arrays have generally been used with regular-appearing patterns such as clustered-dot 'classical' halftoning or Bayer's dispersed-dot patterns. Ulichney has presented a heuristic method for generating blue-noise threshold arrays that do not appear regular and offer the visual advantages of error diffusion without its computational costs. Such heuristic methods are capable of generating high-quality threshold arrays, but they are not flexible or controllable enough to enable tuning for particular applications or output-device characteristics. We present instead a genetic method for generating a blue-noise threshold array that optimizes a set of criteria encoded in a fitness function, which can be specified to reward any desired attributes. Although the genetic method is computationally intensive, the cost is incurred only once, and the resulting array can be used for millions of images. We compare images halftoned using our arrays with those produced by other blue-noise arrays and by error-diffusion methods, and we examine the spectral characteristics of the resulting patterns.
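The core tile-and-compare operation is easy to state in code. The sketch below uses a standard 4x4 Bayer index matrix purely as a stand-in for the blue-noise arrays produced by the genetic optimization; the function name and the ramp example are illustrative.

```python
import numpy as np

def halftone(image, thresholds):
    """image: HxW grayscale in [0, 1); thresholds: hxw threshold array in [0, 1)."""
    H, W = image.shape
    h, w = thresholds.shape
    tiled = np.tile(thresholds, (H // h + 1, W // w + 1))[:H, :W]   # tile across image
    return (image > tiled).astype(np.uint8)                         # 1 = white, 0 = black

# Standard 4x4 Bayer dispersed-dot matrix (stand-in for a blue-noise array).
bayer4 = np.array([[ 0,  8,  2, 10],
                   [12,  4, 14,  6],
                   [ 3, 11,  1,  9],
                   [15,  7, 13,  5]]) / 16.0

# Example: a horizontal gray ramp halftoned with the stand-in matrix.
# ramp = np.tile(np.linspace(0, 1, 256, endpoint=False), (64, 1))
# dots = halftone(ramp, bayer4)
```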
Many business data applications involve several hierarchies reflecting the inherent structure of the underlying domains. Often, these hierarchies are presented in a spreadsheet-like format, showing a single hierarchy in tabular form with drill-down and roll-up capabilities. Although this approach works for simple hierarchies and small data sets, it tends to break down as the hierarchies become more complex or the data set grows. An alternative approach, which has the potential to work equally well with complex hierarchies and large data sets, is to devise a visualization front end for dynamically altering and displaying views of the data under user control. Toward this end, we have developed a prototype visualization tool for viewing large financial data sets, which typically involve multiple hierarchies. The tool not only acts as a data presentation medium, but also serves as a graphical means to form complex queries and interact with the data. Its presentation structure uses hierarchical labels, which may be drilled down or rolled up, and displayable cell objects, which may have arbitrary complexity. Its data interface allows users to select data of interest, map data parameters to visual attributes, set threshold values to focus on interesting data, and read data values associated with cell objects.
Object Similarity, Characterization, and Retrieval
Outline shape carries a substantial part of the information present in an object view, and is more economical than classical representations such as geon-structural-descriptions or multiple-views. We demonstrate the utility of silhouette representations for a variety of visual tasks, ranging from basic-level categorization to finding the best view of an object. All these tasks necessitate the computation of silhouette similarity. We present an algorithm for estimating silhouette similarity and apply it to a number of simple but realistic vision problems.
In this paper we present an image similarity metric for content-based image database search. The similarity metric is based on a multiscale model of the human visual system. This multiscale model includes channels which account for perceptual phenomena such as color, contrast, color-contrast and orientation selectivity. From these channels, we extract features and then form an aggregate measure of similarity using a weighted linear combination of the feature differences. The choice of features and weights is made to maximize the consistency with similarity ratings made by human subjects. In particular, we use a visual test to collect experimental image matching data. We then define a cost function relating the distances computed by the metric to the choices made by the human subject. The results indicate that features corresponding to contrast, color-contrast and orientation can significantly improve search performance. Furthermore, the systematic optimization and evaluation strategy using the visual test is a general tool for designing and evaluating image similarity metrics.
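The aggregate measure can be sketched as follows, assuming per-channel feature vectors have already been extracted; the channel names and weights are illustrative placeholders, and in the paper the weights would be chosen to maximize consistency with the human matching data.

```python
import numpy as np

def image_distance(features_a, features_b, weights):
    """features_*: dict mapping channel name -> feature vector;
    weights: dict mapping channel name -> non-negative weight."""
    total = 0.0
    for channel, w in weights.items():
        diff = np.asarray(features_a[channel]) - np.asarray(features_b[channel])
        total += w * np.linalg.norm(diff)        # weighted per-channel difference
    return total

# Example with hypothetical channel names and weights:
# weights = {"color": 1.0, "contrast": 0.6, "color_contrast": 0.8, "orientation": 0.4}
# d = image_distance(feats_query, feats_db_image, weights)
```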
The goal of image retrieval is to retrieve images 'similar' to a given query image by comparing the query and database using visual attributes like color, texture, and appearance. In this paper, we discuss how to characterize appearance and use it for image retrieval. Visual appearance is represented by the outputs of a set of Gaussian derivative filters applied to an image. These outputs are computed off-line and stored in a database. A query is created by outlining the portions of the query image the user deems useful for retrieval (this may be changed interactively depending on the results). The query is also filtered with Gaussian derivatives, and these outputs are compared with those from the database. The images in the database are ranked on the basis of this comparison. The technique has been experimentally tested on a diverse database of 1600 images. The system does not require prior segmentation of the database, and objects can be embedded in arbitrary backgrounds. The system handles a range of size variations and viewpoint variations of up to 20 or 25 degrees.
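A hedged sketch of the appearance representation: responses of Gaussian derivative filters at several scales, stacked into a per-pixel feature vector. The particular scales and derivative orders below are illustrative choices, not necessarily those used in the system.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def gaussian_derivative_features(image, sigmas=(1.0, 2.0, 4.0)):
    """image: HxW grayscale array; returns an HxWxK stack of filter responses."""
    channels = []
    for s in sigmas:
        channels.append(gaussian_filter(image, s))                 # smoothed value
        channels.append(gaussian_filter(image, s, order=(0, 1)))   # first derivative in x
        channels.append(gaussian_filter(image, s, order=(1, 0)))   # first derivative in y
        channels.append(gaussian_filter(image, s, order=(0, 2)))   # second derivative in x
        channels.append(gaussian_filter(image, s, order=(2, 0)))   # second derivative in y
    return np.stack(channels, axis=-1)

# The outlined query region would be reduced to such vectors and compared
# (e.g., by Euclidean distance) with precomputed vectors from the database.
# feats = gaussian_derivative_features(np.random.rand(128, 128))
```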
Currently, quite a few image retrieval systems use color and texture as features to search images. However, because these methods rely on global features, they often retrieve results that make little perceptual sense. It is necessary to constrain the feature extraction to homogeneous regions, so that the relevant information within these regions can be well represented. This paper describes our recent work on developing an image segmentation algorithm that is useful for processing large and diverse collections of image data. A compact color feature representation that is more appropriate for these segmented regions is also proposed. By using the color and texture features and a region-based search, we achieve very good retrieval performance compared to search based on the entire image.
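A minimal sketch of a region-based search, assuming the segmentation is already available as a label map: each region is summarized by a compact color feature (here simply its mean color, a stand-in for the representation proposed in the paper), and a query region is matched against the best region of each database image rather than against a global feature.

```python
import numpy as np

def region_features(image, labels):
    """image: HxWx3 color array; labels: HxW integer segmentation map."""
    feats = {}
    for r in np.unique(labels):
        feats[int(r)] = image[labels == r].mean(axis=0)   # mean color per region
    return feats

def best_region_distance(query_feat, db_feats):
    """Distance from one query-region feature to the closest region of an image."""
    return min(np.linalg.norm(query_feat - f) for f in db_feats.values())

# Ranking a database then amounts to sorting images by best_region_distance
# between the selected query region and each image's region features.
```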
Before computers can be active collaborators in design work, they must be equipped with some human-like visual and design skills. Toward this end, we report some advances in integrating computer vision and automated design in a computational model of 'artistic vision' -- the ability to see something striking in a subject and express it in a creative design. The artificial artist studies images of animals, then designs sculpture that conveys something of the strength, tension, and expression in the animals' bodies. It performs an anatomical analysis using conventional computer vision techniques constrained by high-level causal inference to find significant areas of the body, e.g., joints under stress. The sculptural form -- kinetic mobiles -- presents a number of mechanical and aesthetic design challenges, which the system solves in imagery using field-based computing methods. Coupled potential fields simultaneously enforce soft and hard constraints -- e.g., the mobile should resemble the original animal, and every subassembly of the mobile must be precisely balanced. The system uses iconic representations in all stages, obviating the need to translate between spatial and predicate representations and allowing a rich flow of information between vision and design.
Next-generation content-based retrieval systems for image and multimedia databases will benefit from higher-level models of human visual processing. This includes incorporating models of early vision as well as of more specialized areas such as the IT cortex, which is thought to be important in object recognition. Artistic representation is typically based on abstraction of the visual content in images, and analogies to various modes of artistic representation can be seen in scientific investigations of the visual system. These two observations suggest that an examination of traditional artistic representation may aid in constructing robust feature spaces for content abstraction in image retrieval. In addition, artistic renderings can be used to test the performance of models of image similarity in existing content-based retrieval systems.
The properties of spatial vision mechanisms are often explored psychophysically with simultaneous masking paradigms. A variety of hypotheses have been proposed to explain how the mask pattern used in these paradigms increases threshold. Numerous studies have investigated the properties of a particular origin-of-masking hypothesis, but few have attempted to compare the properties of masking at several points in the process. Our study isolates masking due to lateral divisive inhibition at a point where mechanism responses are combined, and compares it with masking of the same target due to a nonlinearity either intrinsic to a mechanism or operating directly on the response of a single mechanism. We also measure the slopes of psychometric functions to examine the relationship between uncertainty and mask contrast. Studies of simultaneous masking using a pedestal mask (an identical test and mask pattern) have measured facilitation for low-contrast masks. This decrease in threshold relative to the unmasked target threshold is commonly referred to as the 'dipper' effect and has been explained as an increase in signal-to-noise ratio, relative to the high-uncertainty unmasked condition, that occurs as the visual system becomes more certain of the target's location. The level of uncertainty is indicated by the slope of sensitivity to the target as a function of target contrast in the threshold region. In these studies, high-contrast masks have produced an increase in target threshold, and there have been many theories explaining this threshold increase. Some suggest that masking is the result of an intrinsic nonlinearity within a mechanism or of a contrast nonlinearity that operates directly on the output of a single mechanism. Others put the source of masking at a gain-control operation, in which a surrounding set of mechanisms divides the response of a single mechanism by their summed response. Still others attribute the masking to noise that is multiplicative relative to the neural response signal, or to noise that intrudes on the detecting mechanism from neighboring mechanisms. A detailed review of this debate is provided by Klein et al. (3016-02 in these Proceedings). Threshold elevation functions, which show the relationship between mask spatial frequency and the magnitude of masking, cannot settle this debate, as we demonstrated at ARVO (1994). For that study, we generated threshold elevation functions (the ratio of masked to unmasked target threshold) for multi-channel systems using computational models that invoked divisive inhibition, a set of transducer nonlinearities, or multiplicative noise. The threshold elevation functions were indistinguishable when each masking process was assumed to have similar strength. These results led us to design the experiment presented here, which attempts to compare the effects of two of these masking processes: lateral divisive inhibition and nonlinear transducer compression.
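A hedged sketch of the divisive-inhibition (gain-control) transducer at issue, and of the threshold-versus-mask-contrast curve it produces, is given below; the exponents, semi-saturation constant, and response criterion are illustrative values, not parameters fitted to the data of this study.

```python
import numpy as np
from scipy.optimize import brentq

def response(c, inhibition=0.0, p=2.4, q=2.0, z=0.01):
    """Mechanism response to contrast c, with an optional divisive-inhibition
    term contributed by neighboring mechanisms (the 'lateral' component)."""
    return c**p / (z**q + c**q + inhibition)

def threshold(mask, inhibition=0.0, criterion=0.05):
    """Target contrast increment needed to raise the response by a fixed
    criterion amount in the presence of a pedestal mask of contrast `mask`."""
    base = response(mask, inhibition)
    f = lambda dc: response(mask + dc, inhibition) - base - criterion
    return brentq(f, 1e-6, 10.0)               # solve for the increment threshold

# Sweeping the mask contrast traces out the familiar dipper shape: facilitation
# at low mask contrasts, threshold elevation at high ones.
# masks = np.logspace(-3, 0, 20)
# tvc = [threshold(m) for m in masks]
```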