Assorted technologies such as EEG, MEG, fMRI, BEM, MRI, TMS, and BCI are being integrated to understand how
human visual cortical areas interact during controlled laboratory and natural viewing conditions. Our focus is on the
problem of separating signals from the spatially close early visual areas. The solution involves taking advantage of
known functional anatomy to guide stimulus selection and employing principles of spatial and temporal response
properties that simplify analysis. The method also unifies MEG and EEG recordings and provides a means for improving
existing boundary element head models. Going beyond carefully controlled stimuli to natural viewing with scanning
eye movements makes assessing brain states with BCI especially challenging. Frequent eye movements contribute artifacts
to the recordings. A linear regression method is introduced that effectively characterizes these artifacts and could be
used to remove them. In free viewing, saccadic landings initiate visual processing epochs and
could be used to trigger strictly time-based analysis methods. However, temporal instabilities indicate that frequency-based
analysis would be an important adjunct. A class of Cauchy filter functions is introduced whose narrow time and
frequency properties are well matched to the EEG/MEG spectrum and avoid channel leakage.
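As an illustration of the kind of eye-movement artifact removal described above, the following is a minimal sketch (not the authors' implementation) that regresses simultaneously recorded EOG reference channels out of an EEG/MEG recording; the function and variable names are ours.

```python
import numpy as np

def regress_out_eye_artifacts(eeg, eog):
    """Subtract the linearly predicted eye-movement contribution from each
    recording channel, given simultaneously recorded EOG reference channels.

    eeg : array, shape (n_samples, n_eeg_channels)
    eog : array, shape (n_samples, n_eog_channels)
    """
    eeg = eeg - eeg.mean(axis=0)              # remove channel offsets
    eog = eog - eog.mean(axis=0)
    # least-squares propagation weights B in  eeg ~ eog @ B
    B, *_ = np.linalg.lstsq(eog, eeg, rcond=None)
    return eeg - eog @ B                      # cleaned recording

# Toy usage: a synthetic brain signal contaminated by two EOG channels.
rng = np.random.default_rng(0)
eog = rng.standard_normal((5000, 2))
eeg = rng.standard_normal((5000, 32)) + eog @ rng.standard_normal((2, 32))
cleaned = regress_out_eye_artifacts(eeg, eog)
```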
The pupil dilation reflex is mediated by inhibition of the parasympathetic Edinger-Westphal oculomotor complex and
sympathetic activity. It has long been documented that emotional and sensory events elicit a pupillary reflex dilation. Is
the pupil response a reliable marker of a visual detection event? In two experiments where viewers were asked to report
the presence of a visual target during rapid serial visual presentation (RSVP), pupil dilation was significantly associated
with target detection. The amplitude of the dilation depended on the frequency of targets and the time of the detection.
Larger dilations were associated with trials having fewer targets and with targets viewed earlier during the trial. We also
found that dilation was strongly influenced by the visual task.
KEYWORDS: Visualization, Electroencephalography, Magnetoencephalography, Functional magnetic resonance imaging, Electrodes, Visual process modeling, Process control, Magnetic resonance imaging, Signal to noise ratio, Spatial frequencies
The human brain has well over 30 cortical areas devoted to visual processing. Classical neuro-anatomical as well as
fMRI studies have demonstrated that early visual areas have a retinotopic organization whereby adjacent locations in
visual space are represented at adjacent cortical locations within a visual area. At the 2006 Electronic Imaging meeting we
presented a method using sprite graphics to obtain high resolution retinotopic visual evoked potential responses using
multi-focal m-sequence technology (mfVEP). We have used this method to record mfVEPs from up to 192 non-overlapping
checkerboard stimulus patches scaled such that each patch activates about 12 mm2 of cortex in area V1 and
even less in V2. This dense coverage enables us to incorporate cortical folding constraints, given by anatomical MRI
and fMRI results from the same subject, to isolate the V1 and V2 temporal responses. Moreover, the method offers a
simple means of validating the accuracy of the extracted V1 and V2 time functions by comparing the results between
left and right hemispheres that have unique folding patterns and are processed independently. Previous VEP studies
have been contradictory as to which area responds first to visual stimuli. This new method accurately separates the
signals from the two areas and demonstrates that both respond with essentially the same latency. A new method is
introduced that better isolates cortical areas using an empirically determined forward model. The
method includes a novel steady state mfVEP and complex SVD techniques. In addition, this evolving technology is put
to use examining how stimulus attributes differentially impact the response in different cortical areas, in particular how
fast nonlinear contrast processing occurs. This question is examined using both state triggered kernel estimation (STKE)
and m-sequence "conditioned kernels". The analysis indicates different contrast gain control processes in areas V1 and
V2. Finally we show that our m-sequence multi-focal stimuli have advantages for integrating EEG and MEG for
improved dipole localization.
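The abstract does not spell out the complex SVD step; the following is a minimal numpy sketch, under our own assumptions about the data layout, of how an SVD of a channels-by-patches matrix of complex steady-state Fourier coefficients can separate a shared sensor topography from per-patch complex amplitudes.

```python
import numpy as np

# Hypothetical steady-state mfVEP data: one complex Fourier coefficient at the
# stimulation frequency for every (sensor channel, stimulus patch) pair.
rng = np.random.default_rng(1)
n_channels, n_patches = 64, 192
topography = rng.standard_normal(n_channels)                  # fixed source pattern
patch_amps = rng.standard_normal(n_patches) * np.exp(
    1j * rng.uniform(0, 2 * np.pi, n_patches))                # per-patch amplitude and phase
data = np.outer(topography, patch_amps)                       # rank-1 "single area" component
data += 0.1 * (rng.standard_normal((n_channels, n_patches))
               + 1j * rng.standard_normal((n_channels, n_patches)))

# Complex SVD: the leading left singular vector estimates the common sensor
# topography of the dominant source; the leading right singular vector gives
# the complex response (magnitude and phase) of each retinotopic patch.
U, S, Vh = np.linalg.svd(data, full_matrices=False)
est_topography = U[:, 0]
est_patch_response = S[0] * Vh[0, :]
```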
The typical multifocal stimulus used in visual evoked potential (VEP) studies consists of about 60 checkerboard stimulus patches, each independently contrast reversed according to an m-sequence. Cross correlation of the response (EEG, MEG, ERG, or fMRI) with the m-sequence results in a series of response kernels for each response channel and each stimulus patch. In the past, the number and complexity of stimulus patches have been constrained by graphics hardware, namely the use of look-up-table (LUT) animation methods. To avoid such limitations we replaced the LUTs with true color graphic sprites to present arbitrary spatial patterns. To demonstrate the utility of the method we have recorded simultaneously from 192 cortically scaled stimulus patches, each of which activates about 12 mm2 of cortex in area V1. Because of the sparseness of cortical folding, very small stimulus patches, and robust estimation of dipole source orientation, the method opens a new window on precise spatio-temporal mapping of early visual areas. The use of sprites also enables multiplexing of stimuli such that multiple stimuli can be presented at each patch location. We have presented patterns with different orientations (or spatial frequencies) at the same patch locations but independently temporally modulated, effectively doubling the number of stimulus patches, to explore cell population interactions at the same cortical locus. We have also measured nonlinear responses to adjacent pairs of patches, thereby obtaining an edge response that doubles the spatial sampling density to about 1.8 mm on cortex.
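As a concrete illustration of the cross-correlation step described above (a sketch, not the authors' code; the LFSR register length and feedback taps are chosen for illustration), the following generates a binary m-sequence and recovers a first-order response kernel from one response channel and one patch's modulation sequence.

```python
import numpy as np

def m_sequence(n_bits=10, taps=(10, 3)):
    """Binary maximal-length sequence (values -1/+1) from a Fibonacci LFSR.
    Register length and feedback taps here are illustrative."""
    state = np.ones(n_bits, dtype=int)
    seq = np.empty(2 ** n_bits - 1, dtype=int)
    for i in range(seq.size):
        seq[i] = state[-1]
        feedback = state[taps[0] - 1] ^ state[taps[1] - 1]
        state = np.roll(state, 1)
        state[0] = feedback
    return 2 * seq - 1                      # contrast states: -1 = reversed, +1 = normal

def first_order_kernel(response, stim, n_lags=60):
    """Cross-correlate one frame-sampled response channel with one patch's
    m-sequence to recover the first-order kernel at lags 0..n_lags-1 frames."""
    n = len(stim) - n_lags
    stim = stim - stim.mean()
    return np.array([np.dot(response[k:k + n], stim[:n]) for k in range(n_lags)]) / n

# Toy usage: a response that follows the patch modulation with a 5-frame delay.
stim = m_sequence()
response = np.roll(stim, 5).astype(float) + np.random.default_rng(2).standard_normal(stim.size)
kernel = first_order_kernel(response, stim)    # peaks near lag 5
```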
Most corneal topographers are slope-based instruments, measuring corneal slope from light reflected by the cornea acting as a mirror. This mirror method limits corneal coverage to about 9 mm diameter. Both refractive surgery and contact lens fitting actually require a larger coverage than is obtainable using slope-based instruments. Height-based instruments should be able to measure a cornea/sclera area that is twice the size (four times the area) of that covered by slope-based topographers, with an accuracy of a few microns. We have been testing a prototype of a new height-based topographer manufactured by Euclid Systems. We find that single shots can produce corneal coverage of up to 16 mm vertically and 20 mm horizontally. The heights and slopes in the corneal region have good replicability. Although the scleral region is noisier, this is the only available topographer able to measure scleral topography, which is critically important to contact lens fitting. There are a number of improvements to the Euclid software and hardware that would enable it to fill an important niche in eye care and eye research.
Models that predict human performance on narrow classes of visual stimuli abound in the vision science literature. However, the vision and the applied imaging communities need robust general-purpose, rather than narrow, computational human visual system (HVS) models to evaluate image fidelity and quality and ultimately improve imaging algorithms. Of the general-purpose early HVS models that currently exist, direct model comparisons on the same data sets are rarely made. The Modelfest group was formed several years ago to solve these and other vision modeling issues. The group has developed a database of static spatial test images with threshold data that is posted on the WEB for modellers to use in HVS model design and testing. The first phase of data collection was limited to detection thresholds for static gray scale 2D images. The current effort will extend the database to include thresholds for selected grayscale 2D spatio-temporal image sequences. In future years, the database will be extended to include discrimination (masking) for dynamic, color and gray scale image sequences. The purpose of this presentation is to invite the Electronic Imaging community to participate in this effort and to inform them of the developing data set, which is available to all interested researchers. This paper presents the display specifications, psychophysical methods and stimulus definitions for the second phase of the project, spatio-temporal detection. The threshold data will be collected by each of the authors over the next year and presented on the WEB along with the stimuli.
Refractive surgery is evolving rapidly. A recent development uses wavefront aberration information to improve the surgical outcome. Before the wavefront information can be used effectively, a number of problems must first be resolved. One of the main concerns is the presence of halos under night driving conditions. It seems clear that this problem occurs when the pupil is large enough to overlap the ablation transition zone. Several questions associated with the transition zone are examined. Shape descriptors to characterize the transition zone are discussed. Better ways of quantitatively characterizing the transition zone and predicting its properties are needed to help specify the ablation. Many of the issues associated with improving refractive surgery can be addressed by establishing a standards committee that includes basic researchers and clinicians. This committee can become a forum for developing techniques to assess visual outcomes, and it can make recommendations for developing a database that would allow researchers to compare the intended outcome of an ablation with the actual outcome. For this enterprise to be successful, increased openness about surgery parameters and surgery outcomes would be helpful.
We quantitatively evaluated a technique for combining multiple videokeratograph views of different areas of the cornea. To achieve this we first simulated target reflection from analytic descriptions of various shapes believed to mimic common corneal topographies. The splicing algorithm used the simulated reflections to achieve a good quality estimate of the shapes. Actual imagery was then acquired of manufactured models of the same shapes, and the splicing algorithm was found to achieve a less accurate estimate. The cause was thought mainly to be image blur due to defocus. To investigate this, blur was introduced into the reflection simulation, and the results of the splicing algorithm were compared to those found from the actual imagery.
KEYWORDS: Data modeling, Visual process modeling, Spatial frequencies, Human vision and color perception, Databases, Video compression, Composites, Visualization, Performance modeling, Image compression
A robust model of the human visual system (HVS) would have a major practical impact on the difficult technological problems of transmitting and storing digital images. Although most HVS models exhibit similarities, they may have significant differences in predicting performance. Different HVS models are rarely compared using the same set of psychophysical measurements, so their relative efficacy is unclear. The Modelfest organization was formed to solve this problem and accelerate the development of robust new models of human vision. Members of Modelfest have gathered psychophysical threshold data on the year one stimuli described at last year's SPIE meeting. Modelfest is an exciting new approach to modeling involving the sharing of resources, learning from each other's modeling successes and providing a method to cross-validate proposed HVS models. The purpose of this presentation is to invite the Electronic Imaging community to participate in this effort and inform them of the developing database, which is available to all researchers interested in modeling human vision. In future years, the database will be extended to other domains such as visual masking, and temporal processing. This Modelfest progress report summarizes the stimulus definitions and data collection methods used, but focuses on the results of the phase one data collection effort. Each of the authors has provided at least one dataset from their respective laboratories. These data and data collected subsequent to the submission of this paper are posted on the WWW for further analysis and future modeling efforts.
Corneal topographers have made it possible to accurately map corneal shape. We applied this technology to model the post-refractive surgery cornea using Taylor series polynomials. Topography data were taken from 58 patient eyes that had undergone photorefractive keratectomy (PRK) or astigmatic photorefractive keratectomy (PARK). We looked at the changes the cornea underwent surgically, as well as during the healing process. We compared the post-ablation cornea to the pre-ablation cornea and to the intended correction using novel topography maps. From the refractive map, we quantified the spherical aberration as areas of defocus on the cornea.
From the pre-op exam to the first post-op exam, we measured a 0.19±0.10 mm radius decrease in PRK and a 0.13±0.08 mm radius decrease in PARK in the areas where rays come to within two diopters of defocus. As this change occurs within the optical zone, it corresponds to an increase in spherical aberration for both the PRK and the PARK patients. As the patients healed, we found an additional decrease in the radius of the zones of best vision in PRK patients, whereas we found no significant decrease in PARK patients. This corresponds to increased spherical aberration in PRK patients.
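As an illustration of fitting corneal topography with Taylor series polynomials (a sketch under our own assumptions about the data format, not the study's analysis code):

```python
import numpy as np

def fit_taylor_surface(x, y, z, order=4):
    """Least-squares fit of corneal elevation z(x, y) with a bivariate
    Taylor-series polynomial of the given total order.  Returns the
    coefficient vector and the list of (i, j) exponent pairs."""
    exponents = [(i, j) for i in range(order + 1)
                        for j in range(order + 1 - i)]
    A = np.column_stack([x**i * y**j for i, j in exponents])
    coeffs, *_ = np.linalg.lstsq(A, z, rcond=None)
    return coeffs, exponents

def eval_taylor_surface(coeffs, exponents, x, y):
    return sum(c * x**i * y**j for c, (i, j) in zip(coeffs, exponents))

# Toy usage: a spherical cap of radius 7.8 mm sampled over a 6 mm zone.
rng = np.random.default_rng(2)
x, y = rng.uniform(-3, 3, (2, 2000))
R = 7.8
z = R - np.sqrt(R**2 - x**2 - y**2)
coeffs, exps = fit_taylor_surface(x, y, z)
residual_rms = np.sqrt(np.mean((eval_taylor_surface(coeffs, exps, x, y) - z) ** 2))
```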
Many current corneal topography instruments (called videokeratographs) provide an 'acuity index' based on corneal smoothness to analyze expected visual acuity. However, post-refractive surgery patients often exhibit better acuity than is predicted by such indices. One reason for this is that visual acuity may not necessarily be determined by overall corneal smoothness but rather by having some part of the cornea able to focus light coherently onto the fovea.
We present a new method of representing visual acuity by measuring the wavefront aberration, using principles from both ray and wave optics. For each point P on the cornea, we measure the size of the associated coherence area whose optical path length (OPL), from a reference plane to P's focus, is within a certain tolerance of the OPL for P.
We measured the topographies and vision of 62 eyes of patients who had undergone the corneal refractive surgery procedures of photorefractive keratectomy (PRK) and photorefractive astigmatic keratectomy (PARK). In addition to high contrast visual acuity, our vision tests included low contrast and low luminance conditions to test the contribution of the PRK transition zone. We found our metric to be better than all other metrics at predicting low contrast and low luminance acuity. However, high contrast visual acuity was poorly predicted by all of the indices we studied, including our own.
The indices provided by current videokeratographs sometimes fail for corneas whose shape differs from simple ellipsoidal models. This is the case with post-PRK and post-PARK refractive surgery patients. Our alternative representation that displays the coherence area of the wavefront has considerable advantages, and promises to be a better predictor of low contrast and low luminance visual acuity than current shape measures.
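The following is a deliberately simplified sketch of the coherence-area idea described above; it assumes the per-sample optical path lengths have already been computed by ray tracing and omits the per-point focus detail, so it should be read as an illustration rather than as the published metric.

```python
import numpy as np

def coherence_area_map(opl, pupil_mask, sample_area, tol):
    """For every corneal sample P inside the pupil, return the area of samples
    whose optical path length lies within `tol` of OPL(P).

    opl         : 2-D array of optical path lengths (same units as tol)
    pupil_mask  : boolean array, True where the sample lies inside the pupil
    sample_area : area represented by one sample, e.g. in mm^2
    tol         : coherence tolerance, e.g. a quarter of a wavelength
    """
    vals = opl[pupil_mask]
    area = np.full(opl.shape, np.nan)
    for idx in zip(*np.nonzero(pupil_mask)):
        area[idx] = sample_area * np.count_nonzero(np.abs(vals - opl[idx]) <= tol)
    return area
```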
KEYWORDS: Visual process modeling, Data modeling, Spatial frequencies, Databases, Visualization, Image quality, Image compression, Human vision and color perception, Performance modeling, Linear filtering
Models that predict human performance on narrow classes of visual stimuli abound in the vision science literature. However, the vision and the applied imaging communities need robust general-purpose, rather than narrow, computational human visual system models to evaluate image fidelity and quality and ultimately improve imaging algorithms. Psychophysical measures of image quality are too costly and time consuming to gather for evaluating the impact each algorithm modification might have on image quality.
Packet transmissions over the Internet incur delay jitter that requires data buffering for resynchronization, which is unfavorable for interactive applications. Last year we reported results of formal subjective quality evaluation experiments on delay cognizant video coding (DCVC), which introduces temporal jitter into the video stream. Measures such as MSE and MPQM indicate that the introduction of jitter should degrade video quality. However, most observers actually preferred compressed video sequences with delay to sequences without. One reason for this puzzling observation is that the delay introduced by DCVC suppresses the dynamic noise artifacts introduced by compression, thereby improving quality. This observation demonstrates the possibility of reducing bit rate and improving perceived quality at the same time. We have been characterizing conditions in which dynamic quantization noise suppression might improve video quality. A new battery of video test sequences using simple stimuli was developed to avoid the complexity of natural scenes. These sequences are cases where quantization noise produces bothersome temporal flickering artifacts. We found that the significance of the artifacts depends strongly on the local image content. Pseudo code is provided for generating these test stimuli in the hope that it leads to the development of future video compression algorithms that take advantage of this technique of improving quality by dampening temporal artifacts.
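The paper's pseudo code is not reproduced here; in the same spirit, the following Python sketch (our own construction) generates a simple sequence in which a static pattern is perturbed by frame-to-frame blocky noise of the kind that produces temporal flickering artifacts.

```python
import numpy as np

def flicker_test_sequence(n_frames=60, size=64, bar_period=8,
                          flicker_amp=8, seed=0):
    """Generate a simple test sequence: a static square-wave grating whose
    frames are perturbed by independent blocky noise, mimicking the frame-to-
    frame quantization error that produces temporal flickering artifacts.
    Returns an array of shape (n_frames, size, size) of 8-bit gray levels."""
    rng = np.random.default_rng(seed)
    x = np.arange(size)
    grating = 128 + 64 * np.sign(np.sin(2 * np.pi * x / bar_period))
    base = np.tile(grating, (size, 1)).astype(float)
    frames = np.empty((n_frames, size, size))
    for t in range(n_frames):
        # independent 8x8-block noise per frame stands in for the changing
        # quantization error of a block-based coder
        block_noise = rng.integers(-flicker_amp, flicker_amp + 1,
                                   (size // 8, size // 8))
        noise = np.kron(block_noise, np.ones((8, 8)))
        frames[t] = np.clip(base + noise, 0, 255)
    return frames.astype(np.uint8)

sequence = flicker_test_sequence()
```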
KEYWORDS: Video, Video compression, Video coding, Computer programming, Image segmentation, Image quality, Visualization, Quantization, Human vision and color perception, Computational modeling
The conventional synchronous model of digital video, in which video is reconstructed synchronously at the decoder on a frame-by-frame basis, assumes its transport is delay-jitter-free. This assumption is inappropriate for modern integrated service packet networks such as the Internet, where network delay jitter varies widely. Furthermore, multiframe buffering is not a viable solution in interactive applications such as video conferencing. We have proposed a 'delay cognizant' model of video coding (DCVC) that segments an incoming video into two video flows with different delay attributes. The DCVC decoder operates in an asynchronous reconstruction mode that attempts to maintain image quality in the presence of network delay jitter. Our goal is to maximize the allowable delay of one flow relative to that of the other with minimal effect on image quality, since an increase in the delay offset reflects more tolerance to transmission delay jitter. Subjective quality evaluations indicate that, for highly compressed sequences, differences in video quality between reconstructed sequences with large delay offsets and those with zero delay offset are small. Moreover, in some cases asynchronously reconstructed video sequences look better than the zero delay case. DCVC is a promising solution to transport delay jitter in low-bandwidth video conferencing with minimal impact on video quality.
CWhatUC (pronounced 'see what you see') is a computer software system that predicts a patient's visual acuity using several techniques based on fundamentals of geometric optics. The scientific visualizations we propose can be clustered into two classes: retinal representations and corneal representations; however, in this paper, we focus our discussion on corneal representations. It is important to note that, for each method listed below, we can illustrate the visual acuity with or without spectacle correction. Corneal representations are meant to reveal how well the cornea focuses parallel light onto the fovea of the eye by providing a pseudo-colored display of various error metrics. These error metrics can be: (1) standard curvature representations, such as instantaneous or axial curvature, converted to refractive power maps by taking Snell's law into account; (2) the focusing distance from each refracted ray's average focus to the computed fovea; (3) the distance on the retinal plane from each refracted ray to the chief ray (lateral spherical aberration). For each error metric, we show both real and simulated data, and illustrate how each representation contributes to the simulation of visual acuity.
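A minimal sketch of error metric (1), converting a radius-of-curvature map to a refractive power map with the conventional keratometric index (the standard single-surface power formula, not CWhatUC's actual code):

```python
import numpy as np

KERATOMETRIC_INDEX = 1.3375   # conventional value used by corneal topographers

def power_map_from_radius(r_mm, n=KERATOMETRIC_INDEX):
    """Convert a radius-of-curvature map (in mm) to a refractive power map in
    diopters using the paraxial surface-power formula P = (n - 1) / r, i.e.
    Snell's law applied at a single refracting surface."""
    r_m = np.asarray(r_mm, dtype=float) / 1000.0   # mm -> m
    return (n - 1.0) / r_m

# Example: a 7.8 mm central radius corresponds to about 43.3 D.
print(power_map_from_radius(7.8))
```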
Seven types of masking are discussed: multi-component contrast gain control, one-component transducer saturation, two- component phase inhibition, multiplicative noise, high spatial frequency phase locked interference, stimulus uncertainty, and noise intrusion. In the present vision research community, multi-component contrast gain is gaining in popularity while the one- and two-component masking models are losing adherents. In this paper we take the presently unpopular stance and argue against multi-component gain control models. We have a two-pronged approach. First, we discuss examples where high contrast maskers that overlap the test stimulus in both position and spatial frequency nevertheless produce little masking. Second, we show that alternatives to gain control are still viable, as long as uncertainty and noise intrusion effects are included. Finally, a classification is offered for different types of uncertainty effects that can produce large masking behavior.
KEYWORDS: Transducers, Interference (communication), Spatial frequencies, Systems modeling, Visualization, Psychophysics, Signal to noise ratio, Visual system, Target detection, Signal detection
The properties of spatial vision mechanisms are often explored psychophysically with simultaneous masking paradigms. A variety of hypotheses have been proposed to explain how the mask pattern utilized in these paradigms increases threshold. Numerous studies have investigated the properties of a particular origin-of-masking hypothesis, but few have attempted to compare the properties of masking at several points in the process. Our study isolates masking due to lateral divisive inhibition at a point where mechanism responses are combined, and compares it with masking of the same target due to a nonlinearity either intrinsic to a mechanism or directly operating on the response of a single mechanism. We also measure the slopes of psychometric functions to examine the relationship between uncertainty and mask contrast. Studies of simultaneous masking utilizing a pedestal mask (an identical test and mask pattern) have measured facilitation for low contrast masks. This decrease in threshold from the solo target threshold is commonly referred to as the 'dipper' effect and has been explained as an increase in signal-to-noise ratio from the high unmasked level occurring as the visual system becomes more certain of target location. The level of uncertainty is indicated by the slope of sensitivity to the target as a function of target contrast in the threshold region. In these studies, high contrast masks have evoked an increase in target threshold. There have been many theories explaining this threshold increase. Some suggest that masking is the result of an intrinsic nonlinearity within a mechanism or of a contrast nonlinearity that operates directly on the output of a single mechanism. Others put the source of masking at a gain control operation which occurs when a surrounding set of mechanisms divide the response of a single mechanism by their summed response. Still others attribute the masking to noise that is multiplicative relative to the neural response signal, or noise that intrudes on the detecting mechanism from neighboring mechanisms. A detailed review of this debate is provided by the paper by Klein et al., 3016-02 in this Proceedings. Threshold elevation functions that show the relationship between mask spatial frequency and masking magnitude cannot illuminate this debate, as we demonstrated at ARVO (1994). For that study, we generated threshold elevation functions (the ratio of masked to unmasked target threshold) for multi-channel systems using computational models that invoked either divisive inhibition, a set of transducer nonlinearities, or multiplicative noise. Threshold elevation functions were indistinguishable when each masking process was assumed to have similar strength. These results led us to design the experiment presented here, which attempts to compare the effects of two of these masking processes, lateral divisive inhibition and nonlinear transducer compression.
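To make the two candidate masking processes concrete, here is a toy pair of transducer functions of the general form discussed above; the exponents, constants, and weights are illustrative, not values fitted in this study.

```python
import numpy as np

def divisive_gain_control(test_c, mask_c, p=2.4, q=2.0, sigma=0.01, weights=None):
    """Toy contrast-gain-control transducer: the excitatory response to the
    test is divided by a response pooled over neighboring mechanisms driven
    by the mask contrast(s):  R = test_c**p / (sigma**q + sum_i w_i * mask_c_i**q)."""
    mask_c = np.atleast_1d(np.asarray(mask_c, dtype=float))
    w = np.ones_like(mask_c) if weights is None else np.asarray(weights, dtype=float)
    pool = sigma**q + np.sum(w * mask_c**q)
    return test_c**p / pool

def intrinsic_nonlinearity(c, p=2.4, q=2.0, sigma=0.01):
    """Single-mechanism compressive transducer for comparison: the same form,
    but driven only by the mechanism's own input contrast."""
    return c**p / (sigma**q + c**q)

# Example: response to a 2% test in the presence of a 20% overlapping mask.
r_gain = divisive_gain_control(0.02, 0.20)
r_self = intrinsic_nonlinearity(0.02 + 0.20)
```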
The luminance of a given display pixel depends not only on the present input voltage but also on the input voltages for the preceding pixel or pixels along the display raster. This effect, which we refer to as the adjacent pixel nonlinearity, is never compensated for when 2D stimulus patterns are presented on standard display monitors. To compensate for the adjacent pixel nonlinearity, we summarize in this paper the methods for generating a 2D lookup table which corrects for the nonlinearity over most of the display's luminance range. This table works even if the current pixel luminance depends on more than one preceding pixel. The creation of a 2D lookup table involves making a series of calibration measurements and a least squares data fitting procedure to determine the parameters for a model of the adjacent pixel nonlinearity proposed by Mulligan and Stone. Once the parameters are determined for a particular display, the 2D lookup table is created. To increase the available mean luminance, we have evaluated the utility of the 2D lookup table when multiple color guns are in use.
KEYWORDS: Image compression, Visual process modeling, Visualization, Video, Video compression, Process control, Human vision and color perception, Visual system, Data modeling, Visual compression
One area of applied research in which vision scientists can have a significant impact is in improving image compression technologies by developing a model of human vision that can be used as an image fidelity metric. Scene cuts and other transient events in a video sequence have a significant impact on digital video transmission bandwidth. We have therefore been studying masking at transient edge boundaries where bit rate savings might be achieved. Using Crawford temporal and Westheimer spatial masking techniques, we find unexpected stimulus-polarity-dependent effects. At normal video luminance levels there is a greater than fourfold increase in narrow line detection thresholds near the temporal onset of luminance pedestals. The largest elevations occur for pedestal widths in the range of 2 - 10 min. When the luminance polarity of the test line matches the pedestal polarity, the masking is much greater than when the test and pedestal have opposite polarities. We believe at least two masking processes are involved: (1) a rapid response saturation in on- or off-center visual mechanisms and (2) a process based on a stimulus ambiguity when the test and pedestal are about the same size. The fact that masking is greatest for local spatial configurations gives one hope for its practical implementation in compression algorithms.
Standard 1D gamma-correcting lookup tables do not compensate for adjacent pixel spatial nonlinearities along the direction of the display raster. These nonlinearities can alter the local mean luminance and contrast of the displayed image. Five steps are described for generating a 2D lookup table (LUT) that compensates for the nonlinearity. By adjusting the 2D LUT so it takes into account the inherent blur at light to dark transitions of the display system, the usable luminance range of the LUT can be extended while reducing the ringing artifact associated with luminance compensation. Use of the blur-compensated 2D LUT incurs no additional computational effort over an uncompensated 2D LUT. Matlab programs are included that can be used to generate a 2D LUT for a user's particular display system.
Post-processing can alleviate or remove artifacts introduced by compression. However, without a priori information, image enhancement schemes may fail. What is noise in one image may be important data in another. Fortunately, in image compression, we have an advantage. Before an image is stored or transmitted, we have access to the original and the distorted versions. The enhanced decoded image is compared to the original block by block to determine which blocks have been improved by the enhancement. These blocks are then flagged for post-processing in a way that is compliant with the JPEG standard and adds nothing to the compressed image's bandwidth. A single JPEG coefficient is adjusted so that the sum of the coefficients contains the flag for post-processing as the parity of the block. Half of the blocks already have the correct parity. In the other blocks, a coefficient that is close to halfway between two quantization values is chosen and rounded in the other direction. This distorts the image by a very tiny amount. The end result is a compressed image that can be decompressed on any standard JPEG decompressor, but that can be enhanced by a sophisticated decompressor.
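A minimal sketch of the parity-flag idea, operating on one block of quantized DCT coefficients; to keep it short, the coefficient to nudge is simply the smallest-magnitude nonzero AC coefficient rather than the one closest to a rounding boundary, so this is an illustration rather than the paper's exact rule.

```python
import numpy as np

def embed_parity_flag(q_coeffs, flag):
    """Make the parity of the summed quantized DCT coefficients of one block
    equal `flag` (0 or 1) by nudging a single coefficient by one step."""
    coeffs = np.array(q_coeffs, dtype=int)          # work on a copy
    if coeffs.sum() % 2 == int(flag):
        return coeffs                               # half the blocks already agree
    flat = coeffs.ravel()
    # nudge the smallest-magnitude nonzero AC coefficient toward zero,
    # which flips the parity with very little added distortion
    ac_idx = np.nonzero(flat[1:])[0] + 1
    if ac_idx.size:
        i = ac_idx[np.argmin(np.abs(flat[ac_idx]))]
        flat[i] -= int(np.sign(flat[i]))
    else:
        flat[-1] += 1                               # degenerate all-zero-AC block
    return coeffs

def read_parity_flag(q_coeffs):
    """Decoder side: recover the post-processing flag from the block parity."""
    return int(np.sum(q_coeffs) % 2)
```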
In raster-scan CRT display systems, the luminous flux of a given pixel is affected by the preceding pixel along the raster direction. This spatial or adjacent pixel nonlinearity can adversely affect image quality. High contrast, high spatial frequency regions of an image will have incorrect luminances. A simple lookup table (standard gamma correction) cannot correct this nonlinearity. We measured the spatial nonlinearity under a variety of luminance conditions in two CRT displays. A model proposed by Mulligan and Stone was used in a 5-parameter nonlinear regression to fit the data. Results show that the model fit our data very well. We employed a 2-D lookup table to compensate for the spatial nonlinearity. The new lookup table has two entries: the intended luminance of the current pixel and the actual voltage of the previous pixel. The output of the new lookup table is the adjusted voltage which compensates for the pixel interaction and gives the correct average luminance for that pixel. Psychophysical experiments show that at small pixel sizes (less than 0.8 min), the compensation results in a sharp, accurate image.
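As an illustration of the table-inversion step behind such a 2-D lookup table (our own sketch, assuming the luminance for every pair of previous and current digital levels has already been measured or predicted by the fitted model; blur compensation is omitted):

```python
import numpy as np

def build_2d_lut(luminance, n_targets=256):
    """Build a 2-D LUT from a table luminance[v_prev, v_curr] giving the pixel
    luminance for every pair of previous and current gun levels (0..255).

    Returns (targets, lut) where lut[t, v_prev] is the current-pixel level that
    best produces target luminance targets[t] given the previous pixel's level.
    Targets span only the luminance range achievable for every previous level."""
    lum = np.asarray(luminance, dtype=float)
    lo = lum.min(axis=1).max()        # brightest "floor" over all previous levels
    hi = lum.max(axis=1).min()        # dimmest "ceiling" over all previous levels
    targets = np.linspace(lo, hi, n_targets)
    n_prev = lum.shape[0]
    lut = np.empty((n_targets, n_prev), dtype=np.uint8)
    for v_prev in range(n_prev):
        # for each target luminance, pick the current-pixel level whose
        # measured luminance (given this previous level) is closest
        diffs = np.abs(lum[v_prev][None, :] - targets[:, None])
        lut[:, v_prev] = np.argmin(diffs, axis=1)
    return targets, lut
```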
This paper has three parts. Part 1 contains musings on the title of this conference, 'Computational Vision Based on Neurobiology.' Progress has been slow in computational vision because very difficult problems are being tackled before the simpler problems have been solved. Part 2 is about one of these simpler problems in computational vision that is largely neglected by computational vision researchers: the development of a fidelity metric. This is an enterprise perfectly suited for computational vision with the side benefit of having spectacular practical implications. Part 3 discusses the research my colleagues and I have been pursuing for the past several years on the Test-Pedestal approach to spatial vision. This approach can be helpful as a guide for the development of a fidelity metric. A number of experiments using this approach are discussed. These examples demonstrate both the power and the pitfalls of the Test-Pedestal approach.
The discrete cosine transform (DCT) can be used to transform two images into a space where it is easy to obtain an estimate of their perceptual distance. We used this method to find the closest fit of the ASCII symbols (which include the English alphabet, numbers, punctuation, and common symbols) to rectangular segments of a gray-scale image. Each segment was converted into a DCT coefficient matrix which was compared to the coefficient matrix of each ASCII symbol. The image segment was replaced with the symbol that had the least weighted Euclidean distance. Thus, a page of text was generated that resembled the original image. The text image format has the advantage that it can be displayed on a non-graphic terminal or printer. It can also be sent via electronic mail without requiring further processing by the receiver. The processing scheme can also be used to preview stored images when transmission bandwidth is limited or a graphic output device is unavailable.
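A minimal sketch of the matching step described above; glyph bitmaps are assumed to be supplied by the caller (rendering them is omitted), and the cell size and weighting are illustrative.

```python
import numpy as np
from scipy.fft import dctn

def dct_features(patch):
    """2-D DCT coefficients of a grayscale patch, flattened to a feature vector."""
    return dctn(patch.astype(float), norm='ortho').ravel()

def image_to_ascii(image, glyphs, cell=(12, 8), weights=None):
    """Replace each cell-sized segment of `image` with the ASCII symbol whose
    glyph bitmap has the closest (optionally weighted) DCT coefficients.

    glyphs : dict mapping a character to a cell-sized grayscale bitmap
    """
    h, w = cell
    chars = list(glyphs)
    G = np.stack([dct_features(glyphs[c]) for c in chars])   # one row per symbol
    wgt = np.ones(G.shape[1]) if weights is None else np.asarray(weights)
    rows = []
    for r in range(0, image.shape[0] - h + 1, h):
        line = []
        for c in range(0, image.shape[1] - w + 1, w):
            f = dct_features(image[r:r + h, c:c + w])
            d = np.sum(wgt * (G - f) ** 2, axis=1)           # weighted Euclidean distance
            line.append(chars[int(np.argmin(d))])
        rows.append(''.join(line))
    return '\n'.join(rows)
```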
This paper asks how the vision community can contribute to the goal of achieving perceptually lossless image fidelity with maximum compression. In order to maintain a sharp focus the discussion is restricted to the JPEG-DCT image compression standard. The numerous problems that confront vision researchers entering the field of image compression are discussed. Special attention is paid to the connection between the contrast sensitivity function and the JPEG quantization matrix.
Several topics connecting basic vision research to image compression and image quality are discussed: (1) A battery of about 7 specially chosen simple stimuli should be used to tease apart the multiplicity of factors affecting image quality. (2) A 'perfect' static display must be capable of presenting about 135 bits/min2. This value is based on the need for 3 pixels/min and 15 bits/pixel. (3) Image compression allows the reduction from 135 to about 20 bits/min2 for perfect image quality. 20 bits/min2 is the information capacity of human vision. (4) A presumed weakness of the JPEG standard is that it does not allow for Weber's Law nonuniform quantization. We argue that this is an advantage rather than a weakness. (5) It is suggested that all compression studies should report two numbers separately: the amount of compression achieved from quantization and the amount from redundancy coding. (6) The DCT, wavelet and viewprint representations are compared. (7) Problems with extending perceptual losslessness to moving stimuli are discussed. Our approach of working with a 'perfect' image on a 'perfect' display with 'perfect' compression is not directly relevant to the present situation with severely limited channel capacity. Rather than studying perceptually lossless compression we must carry out research to determine what types of lossy transformations are least disturbing to the human observer. Transmission of 'perfect', lossless images will not be practical for many years.
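The arithmetic behind points (2) and (3) above, reading 3 pixels/min as a linear (per-dimension) sampling density:

```latex
\[
(3\ \text{pixels/min})^{2} \times 15\ \text{bits/pixel}
  \;=\; 9\ \tfrac{\text{pixels}}{\text{min}^{2}} \times 15\ \tfrac{\text{bits}}{\text{pixel}}
  \;=\; 135\ \tfrac{\text{bits}}{\text{min}^{2}},
\qquad
135\ \tfrac{\text{bits}}{\text{min}^{2}}
  \;\xrightarrow{\ \text{compression}\ }\;
  \approx 20\ \tfrac{\text{bits}}{\text{min}^{2}}.
\]
```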
The information gathering capacity of the visual system can be specified in units of bits/mm2. The fall-off in
sensitivity of the human visual system at high spatial frequencies allows a reduction in the bits/mm2 needed to specify an
image. A variety of compression schemes attempt to achieve a further reduction in the number of bits/mm2 while
maintaining perceptual losslessness. This paper makes the point that whenever one reports the results of an image
compression study, two numbers should be provided. The first is the number of bits/mm2 that can be achieved using
properties of the human visual system, but ignoring the redundancy of the image (entropy coding). The second number is
the bits/mm2 including the effects of entropy coding. The first number depends mainly on the properties of the visual
system, the second number includes, in addition, the properties of the image. The Discrete Cosine Transform (DCT)
compression method is used to determine the first number. It is shown that the DCT requires between 16 and 24
bits/mm2 for perceptually lossless encoding of images, depending on the size of the blocks into which the image is
subdivided. In addition, the efficiency of DCT compression is found to be limited by its susceptibility to interference from
adjacent maskers. The present analysis suggests that the visual system requires many more bits/mm2 than reported by
other researchers, who find that 0.5 bits/mm2 is sufficient to represent an image without perceptible loss.