The great success of the three-dimensional (3D) digital cinema industry has opened up a new era of 3D content services.
While we have witnessed a rapid surge of stereoscopic 3D services, the issue of viewing safety remains a possible
obstacle to the widespread deployment of such services. In this paper, we propose a novel disparity remapping method to
reduce the visual discomfort induced by fast change in disparity. The proposed remapping approach selectively adjusts
the disparities of the discomfort regions where the fast change in disparity occurs. To this end, the proposed
approach detects visual importance regions in a stereoscopic 3D video, which may have a dominant influence on visual
comfort in video frames, and then locally adjusts the disparity by taking into account the disparity changes in the visual
importance regions. The experimental results demonstrate that the proposed approach to adjust local problematic regions
can improve visual comfort while preserving naturalness of the scene.
Two experiments were conducted to examine the visual comfort of stereoscopic images. The test video sequences
consisted of moving meteorite-like objects against a blue sky background. In the first experiment, a panel of viewers
rated stereoscopic sequences in which the objects moved back and forth in depth. The velocity of movement, disparity
(depth) range, and disparity type (i.e., depth position with respect to the screen plane: front, behind, or front/behind) of
the objects varied across sequences. In the second experiment, the same viewers rated stereoscopic test sequences in
which the target objects moved horizontally across the screen. Also in this case, the velocity, disparity magnitude, and
disparity type of the objects varied across sequences. For motion in the depth direction, the results indicate that visual
comfort is significantly influenced by the velocity, disparity range, and disparity type of the moving objects. We also
found significant interactions between velocity and disparity type and between disparity type and disparity range. For
motion across the screen in the horizontal plane, ratings of visual comfort depended on velocity and disparity
magnitude. The results also indicate a significant interaction between velocity and disparity. In general, the overall
results confirm that changes in disparity of stereoscopic images over time are a significant contributor to visual
discomfort. Interestingly, the detrimental effect of object velocity on visual comfort is manifested even when the
changes are confined within the generally accepted visual comfort zone of less than 60 arc minutes of horizontal
disparity.
In the stereoscopic frame-compatible format, the separate high-definition left and high-definition right views are reduced
in resolution and packed to fit within the same video frame as a conventional two-dimensional high-definition signal.
This format has been suggested for 3DTV since it does not require additional transmission bandwidth and entails only
small changes to the existing broadcasting infrastructure. In some instances, the frame-compatible format might be used
to deliver both 2D and 3D services, e.g., for over-the-air television services. In those cases, the video quality of the 2D
service is bound to decrease since the 2D signal will have to be generated by up-converting one of the two views. In this
study, we investigated such loss by measuring the perceptual image quality of 1080i and 720p up-converted video as
compared to that of full-resolution original 2D video. The video was encoded with either an MPEG-2 or an H.264/AVC
codec at different bit rates and presented for viewing with either no polarized glasses (2D viewing mode) or with
polarized glasses (3D viewing mode). The results confirmed a loss of video quality of the 2D video up-converted
material. The loss due to the sampling processes inherent to the frame-compatible format was rather small for both 1080i
and 720p video formats; the loss became more substantial with encoding, particularly for MPEG-2 encoding. The 3D
viewing mode provided higher quality ratings, possibly because the visibility of the degradations was reduced.
Depth maps are important for generating images with new camera viewpoints from a single source image for
stereoscopic applications. In this study we examined the usefulness of smoothing depth maps for reducing the
cardboard effect that is sometimes observed in stereoscopic images with objects appearing flat like cardboard
pieces. Six stereoscopic image pairs, manifesting different degrees of the cardboard effect, were tested. Depth
maps for each scene were synthesized from the original left-eye images and then smoothed (low-pass filtered).
The smoothed depth maps and the original left-eye images were then used to render new views to create new
"processed" stereoscopic image pairs. Subjects were asked to assess the cardboard effect of the original
stereoscopic images and the processed stereoscopic images on a continuous quality scale, using the double-stimulus
method. In separate sessions, depth quality and visual comfort were also assessed. The results from
16 viewers indicated that the processed stereoscopic image pairs tended to exhibit a reduced cardboard effect,
compared to the original stereoscopic image pairs. Although visual comfort was not compromised with the
smoothing of the depth maps, depth quality was significantly reduced when compared to the original.
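The low-pass filtering of depth maps described above can be sketched as a simple Gaussian smoothing; the kernel size and sigma below are illustrative assumptions, since the abstract does not specify the filter parameters:

```python
import numpy as np

def smooth_depth_map(depth, kernel_size=9, sigma=2.0):
    """Low-pass filter a gray-level depth map with a separable Gaussian.

    kernel_size and sigma are illustrative choices, not the study's values.
    """
    ax = np.arange(kernel_size) - kernel_size // 2
    kernel = np.exp(-(ax ** 2) / (2.0 * sigma ** 2))
    kernel /= kernel.sum()  # normalize so flat regions are preserved
    pad = kernel_size // 2
    padded = np.pad(depth.astype(float), pad, mode="edge")
    # Separable convolution: filter rows first, then columns.
    rows = np.apply_along_axis(
        lambda r: np.convolve(r, kernel, mode="valid"), 1, padded)
    return np.apply_along_axis(
        lambda c: np.convolve(c, kernel, mode="valid"), 0, rows)
```

Smoothing a sharp depth step produces the intermediate values that soften abrupt depth transitions, which is the intended effect behind reducing the cardboard look.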
In depth image based rendering, video sequences and their associated depth maps are used to render new camera
viewpoints for stereoscopic applications. In this study, we examined the effect of temporal downsampling of the
depth maps on stereoscopic depth quality and visual comfort. The depth maps of four eight-second video sequences
were temporally downsampled by dropping all frames, except the first, for every 2, 4, or 8 consecutive frames. The
dropped frames were then replaced by the retained frame. Test stereoscopic sequences were generated by using the
original image sequences for the left-eye view and the rendered image sequences for the right-eye view. The
downsampled versions were compared to a reference version with full depth maps that were not downsampled.
Based on the data from 21 viewers, ratings of depth quality for the downsampled versions were lower. Importantly,
ratings depended on the content characteristics of the stereoscopic video sequences. Results were similar for visual
comfort, except that the differences in ratings between sequences were larger. The present results suggest that more
processing, such as interpolation of depth maps, might be required to counter the negative effects of temporal
downsampling, especially beyond a downsampling factor of two.
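The frame-dropping scheme described above (retain the first depth map of every group of 2, 4, or 8 consecutive frames and substitute it for the dropped ones) can be sketched as:

```python
def downsample_depth_frames(frames, factor):
    """Temporally downsample a depth-map sequence: keep the first frame of
    every `factor` consecutive frames and repeat it in place of the
    dropped frames, as in the study above."""
    out = []
    held = None
    for i, frame in enumerate(frames):
        if i % factor == 0:
            held = frame  # retained frame for this group
        out.append(held)
    return out
```

With `factor=1` the sequence is unchanged; larger factors hold each retained depth map for longer, which is what degrades depth quality and comfort in content with fast depth changes.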
The ability to convert 2D video material to 3D would be extremely valuable for the 3D-TV industry. Such
conversion might be achieved using depth maps extracted from the original 2D content. We previously
demonstrated that surrogate depth maps with limited or imprecise depth information could be used to produce
effective stereoscopic images. In the current study, we investigated whether gray intensity images associated
with the Cr colour component of standard 2D-colour video sequences could be used effectively as surrogate
depth maps. Colour component-based depth maps were extracted from ten video sequences and used to render
images for the right-eye view. These were then combined with the original images for the left-eye view to form
ten stereoscopic test sequences. A panel of viewers assessed the depth quality and the visual comfort of the
synthesized test sequences and, for comparison, of monoscopic and camera-captured stereoscopic versions of
the same sequences. The data showed that the ratings of depth quality for the synthesized test sequences were
higher than those of the monoscopic versions, but lower than those of the camera-captured stereoscopic
versions. For visual comfort, ratings were lower for the synthesized than for the monoscopic sequences but
either equal to or higher than those of the camera-captured versions.
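Using the Cr component as a surrogate depth map amounts to extracting the Cr plane from each frame. The BT.601 full-range RGB-to-Cr conversion below is a plausible reconstruction of that step, not necessarily the authors' exact pipeline:

```python
import numpy as np

def cr_depth_map(rgb):
    """Use the Cr chroma plane of an RGB image as a surrogate depth map.

    Assumes BT.601 full-range conversion; output is clipped to 0-255,
    matching an 8-bit gray-level depth map.
    """
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    cr = 128.0 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return np.clip(cr, 0.0, 255.0)
```

Neutral grays map to the mid value 128, while reddish regions receive high values and greenish/bluish regions low values, so the plane carries scene structure that can masquerade as depth.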
The Video Quality Experts Group (VQEG) is a group of experts from industry, academia, government and standards
organizations working in the field of video quality assessment. Over the last 10 years, VQEG has focused its efforts on
the evaluation of objective video quality metrics for digital video. Objective video metrics are mathematical models that
predict the picture quality as perceived by an average observer. VQEG has completed validation tests for full reference
objective metrics for the Standard Definition Television (SDTV) format. From this testing, two ITU Recommendations
were produced. This standardization effort is of great relevance to the video industries because objective metrics can be
used for quality control of the video at various stages of the delivery chain.
Currently, VQEG is undertaking several projects in parallel. The most mature project is concerned with objective
measurement of multimedia content. This project is probably the largest coordinated set of video quality testing ever
embarked upon. The project will involve the collection of a very large database of subjective quality data. About 40
subjective assessment experiments and more than 160,000 opinion scores will be collected. These will be used to
validate the proposed objective metrics. This paper describes the test plan for the project, its current status, and one of
the multimedia subjective tests.
Previously we demonstrated that surrogate depth maps, consisting of "depth" values mainly at object boundaries in the
image of a scene, are effective for converting 2D images to stereoscopic 3D images using depth image based rendering.
In this study we examined the use of surrogate depth maps whose depth edges were derived from cast shadows located
in multiple images (Multiflash method). This method has the capability to delineate actual depth edges, in contrast to
methods based on (Sobel) edge identification and (Standard Deviation) local luminance distribution. A group of 21 non-expert
viewers assessed the depth quality and visual comfort of stereoscopic images generated using these three methods
on two sets of source images. Stereoscopic images based on the Multiflash method provided an enhanced depth quality
that is better than the depth provided by a reference monoscopic image. Furthermore, the enhanced depth was
comparable to that observed with the other two methods. However, all three methods generated images that were rated
"mildly uncomfortable" or "uncomfortable" to view. It is concluded that there is no advantage in the use of the
Multiflash method for creating surrogate depth maps. As well, even though the depth quality produced with surrogate
depth maps is sufficiently good, the visual comfort of the stereoscopic images needs to be improved before this approach
of using surrogate depth maps can be deemed suitable for general use.
Depth image based rendering (DIBR) is a method for converting 2D material to stereoscopic 3D. With DIBR, information contained in a gray-level (luminance intensity) depth map is used to shift pixels in the 2D image to generate a new image as if it were captured from a new viewpoint. The larger the shift (binocular parallax), the larger is the perceived depth of the generated stereoscopic pair. However, a major problem with DIBR is that the shifted pixels now occupy new positions and leave areas that they originally occupied "empty." These disoccluded regions have to be filled properly, otherwise they can degrade image quality. In this study we investigated different methods for filling these disoccluded regions: (a) filling regions with a constant color, (b) filling regions with horizontal linear interpolation of values on the hole border, (c) solving the Laplace equation on the hole boundary and propagating the values inside the region, (d) horizontal extrapolation with depth information taken into account, (e) variational inpainting with depth information taken into account, and (f) preprocessing of the depth map to prevent disoccluded regions from appearing. The methods differed in the time required for computing and filling, and the appearance of the filled-in regions. We assessed the subjective image quality outcome for several stereoscopic test images in which the left-eye view was the source and the right-eye view was a rendered view, in line with suggestions in the literature for the asymmetrical coding of stereoscopic images.
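A minimal sketch of the DIBR pixel shift followed by hole filling with method (b), horizontal linear interpolation across the disoccluded region. The depth-to-shift mapping and `max_shift` are illustrative assumptions; real systems derive parallax from camera geometry:

```python
import numpy as np

def dibr_fill_horizontal(image, depth, max_shift=4):
    """Shift pixels of a grayscale image left in proportion to depth
    (a toy depth-to-parallax mapping), then fill the disoccluded holes
    by horizontal linear interpolation between hole-border values."""
    h, w = image.shape
    out = np.full((h, w), np.nan)  # NaN marks disoccluded pixels
    for y in range(h):
        for x in range(w):
            shift = int(round(depth[y, x] / 255.0 * max_shift))
            nx = x - shift
            if 0 <= nx < w:
                out[y, nx] = image[y, x]
    # Fill holes row by row with linear interpolation (method (b)).
    for y in range(h):
        row = out[y]
        holes = np.isnan(row)
        if holes.any() and not holes.all():
            known = np.flatnonzero(~holes)
            row[holes] = np.interp(np.flatnonzero(holes), known, row[known])
    return out
```

The holes appear at the trailing edge of foreground objects; linear interpolation is cheap but smears background texture across the gap, which is why the depth-aware methods (d)-(f) were also tested.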
A method of producing depth maps for depth-image-based rendering (DIBR) of stereoscopic views is proposed and tested. The method is based on depth-from-defocus techniques, utilizing two original images, one with the camera focused at a near point and the other with it focused at a far point in the captured scene, to produce depth maps from blur at edge locations. It is assumed that the level of blur at an edge reflects how far it is from the focused distance. For each image, estimates of the level of blur at edges in local regions are obtained by determining the optimal scale for edge
detection based on a luminance gradient. An Edge-Depth map is then obtained by evaluating differences in blur for corresponding regions in the two images. This is followed by an additional process in which regions in the Edge-Depth map that have no depth values are filled to produce a Filled-Depth map. A group of viewers assessed the depth quality of a representative set of stereoscopic images that were produced by DIBR using the two types of depth maps. It was
found that the stereoscopic images generated with the Filled-Depth and the Edge-Depth maps produced depth quality ratings that were higher than those produced by their monoscopic, two-dimensional counterparts. Images rendered using the Filled-Depth maps, but not the Edge-Depth maps, produced ratings of depth quality that were equal to those produced with factual, full depth maps. A hypothesis as to how the proposed method might be improved is discussed.
It is well known that some viewers experience visual discomfort when looking at stereoscopic displays. One of the factors that can give rise to visual discomfort is the presence of large horizontal disparities. The relationship between excessive horizontal disparity and visual comfort has been well documented for the case in which disparity magnitude does not change across space and time, e.g. for objects in still images. Much less is known about the case in which
disparity magnitude varies over time, e.g., objects moving in depth at some velocity. In this study, we investigated the relationship between binocular disparity, object motion and visual comfort using computer-generated stereoscopic video sequences. Specifically, viewers were asked to rate the visual comfort of stereoscopic sequences that had objects moving periodically back and forth in depth. These sequences varied with respect to the number, size, position in depth, and velocity of movement of the objects in the scene. The results indicate that change in disparity magnitude over time might be more important in determining visual comfort than the absolute magnitude of the disparity per se. The results also suggest that rapid switches between crossed and uncrossed disparities might negatively affect visual comfort.
Depth image based rendering (DIBR) is useful for multiview autostereoscopic systems because it can produce a set of new images with different camera viewpoints, based on a single two-dimensional (2D) image and its corresponding depth map. In this study we investigated the role of object boundaries in depth maps for DIBR. Using a standard subjective assessment method, we asked viewers to evaluate the depth and the image quality of stereoscopic images in which the view for the right eye was rendered using (a) full depth maps, (b) partial depth maps containing full depth information but that was only located at object boundaries and edges, and (c) partial depth maps containing binary depth information at object boundaries and edges. Results indicate that depth quality was enhanced and image quality was slightly reduced for all test conditions, compared to a reference condition consisting of 2D images. The present results confirm previous observations indicating that depth information at object boundaries is sufficient in DIBR to create new views such as to produce a stereoscopic effect. However, depth ratings for the partial depth maps tended to be slightly lower than those generated with the full depth maps. The present study also indicates that more research is needed to increase the depth and image quality of the rendered stereoscopic images based on DIBR before the technique can be of wide and practical use.
Depth image based rendering (DIBR) is suited for 3D-TV and for autostereoscopic multiview displays. With DIBR, each 2D image captured with a camera at a given position has an associated depth map. This map is used to process the original 2D image so as to generate new images as if they were taken from different camera viewpoints. In the present study we examined the depth and image quality of stereoscopic 3D images that were generated using surrogate depth maps, that is, maps that were created using blur and edge information from the original 2D images. Depth maps were created with three different methods. Formal subjective assessments indicated that the stereoscopic images thus created have enhanced depth quality, with a marginal loss in image quality, when compared to the original non-stereoscopic images. This finding of enhanced depth is surprising because the surrogate depth maps contained limited depth information and mainly at object boundaries. We speculate that the visual system combines the information from pictorial depth cues and from depth interpolation between object boundaries and edges to arrive at an overall perception of depth. The methods for creating the depth maps for stereoscopic imaging that were investigated in this study might be used in applications where depth accuracy is not critical.
In this study, we conducted three experiments to investigate the perceived smoothness of multiview images. Different viewpoints of a stereoscopic scene were generated in real-time. The left-eye and right-eye views of each viewpoint were viewed stereoscopically, from a distance of 120 cm, with shutter glasses synchronized to the display. In Experiment 1, new and different vantage points of the scene were displayed as the viewer moved his/her head left and right in front of the display. Viewers rated the perceived smoothness of the scene for different viewpoint densities, i.e., number of viewpoints displayed per unit of amplitude of lateral movement, and extent of look-around, i.e., angular separation between the leftmost and rightmost rendered viewpoints. The second and third experiments were similar with the exception that the change in displayed viewpoint was either controlled by the viewer’s hand (Experiment 2) or occurred without any intervention on the part of the viewer (Experiment 3). Perceived smoothness improved with increasing viewpoint density up to about 4-6 views per cm in all three experiments. Smoothness ratings were somewhat lower in Experiments 1 and 2 than in Experiment 3. The perceived smoothness of viewpoint transition was affected by the extent of look-around in Experiments 1 and 2 only.
The perceived quality of video sequences is generally measured using standard subjective methods. It has been argued that these methods, which typically consist of scaling judgment tasks, are affected by context effects. Context effects are observed when the perceived quality of a video sequence is influenced by the perceived quality of the other video sequences included in the test. Several studies have confirmed the presence of context effects. However, the same studies are ambiguous with respect to the issue of which methods are affected the most. In addition, context effects have been investigated mainly with non-expert viewers. In this study, we investigated context effects in both expert and non-expert viewers. Two experiments were conducted to investigate the relationships between context effects, level of expertise, and type of subjective method. In Experiment 1, we measured range and frequency context effects for two different subjective assessment methods, a double stimulus method (i.e., DSCQS) and a comparison scaling method, using non-expert viewers. We found no frequency context effect with either method, and a marginal range context effect with the DSCQS method. In Experiment 2, we obtained the same measurements with expert viewers. We found no frequency context effect for either subjective method, and a very small range context effect for the comparison method only.
Current binocular stereoscopic displays can cause visual discomfort when objects with large disparities are present in the scene. Blurring the far background and foreground of the scene has been reported to improve visual comfort, but this technique has the drawback of degrading overall image quality. To lessen the visual discomfort caused by large disparities while maintaining high perceived image quality, we use a novel disparity-based asymmetrical filtering technique. Asymmetrical filtering, which refers to filtering applied to the image of one eye only, has been shown to maintain the sharpness of a stereoscopic image, provided that the amount of filtering is low. Disparity-based asymmetrical filtering uses the disparity information in a stereoscopic image to control the severity of blurring. We investigated the effects of this technique on stereoscopic video by measuring visual comfort and apparent sharpness. Our results indicate that disparity-based asymmetrical filtering does not always improve visual comfort, but it does maintain image quality.
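The disparity-based asymmetrical filtering idea can be sketched as blurring only one eye's view, and only where the local disparity magnitude is large. The threshold, kernel size, and box blur below are illustrative assumptions, not the paper's parameters:

```python
import numpy as np

def asymmetric_blur(right_view, disparity, threshold=20.0, max_kernel=7):
    """Blur only the right-eye view, restricted to regions whose absolute
    disparity exceeds `threshold` (a sketch of disparity-based
    asymmetrical filtering; a simple box blur stands in for the
    unspecified filter)."""
    out = right_view.astype(float).copy()
    h, w = out.shape
    half = max_kernel // 2
    for y in range(h):
        for x in range(w):
            if abs(disparity[y, x]) > threshold:
                y0, y1 = max(0, y - half), min(h, y + half + 1)
                x0, x1 = max(0, x - half), min(w, x + half + 1)
                out[y, x] = right_view[y0:y1, x0:x1].mean()
    return out
```

Because the left-eye view is left untouched, binocular fusion can preserve apparent sharpness as long as the blur applied to the single eye stays moderate, which is the premise of asymmetrical filtering.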
Multi-dimensional rate control schemes, which jointly adjust two or three coding parameters, have been recently proposed to achieve a target bit rate while maximizing some objective measures of video quality. The objective measures used in these schemes are the peak signal-to-noise ratio (PSNR) or the sum of absolute errors (SAE) of the decoded video. These objective measures of quality may differ substantially from subjective quality, especially when changes of spatial resolution and frame rate are involved. The proposed schemes are, therefore, not optimal in terms of human visual perception. We have investigated the impact on subjective video quality of the three coding parameters: spatial resolution, frame rate, and quantization parameter (QP). To this end, we have conducted two experiments using the H.263+ codec and five video sequences. In Experiment 1, we evaluated the impact of jointly adjusting QP and frame rate on subjective quality and bit rate. In Experiment 2, we evaluated the impact of jointly adjusting QP and spatial resolution. From these experiments, we suggest several general rules and guidelines that can be useful in the design of an optimal multi-dimensional rate control scheme. The experiments also show that PSNR and SAE do not adequately reflect perceived video quality when changes in spatial resolution and frame rate are involved, and are therefore not adequate for assessing quality in a multi-dimensional rate control scheme. This paper describes the method and results of the investigation.
We compared the visual comfort and apparent depth of stereoscopic images for three camera configurations: parallel (without image shift), image-shifted and converged. In the parallel and image-shifted configurations, the stereo cameras were pointed straight ahead. In the converged configuration the cameras were toed-in. In the image-shifted configuration the image frame was shifted perpendicularly with respect to the line of sight of the camera.
The parallel configuration produces images with uncomfortably large disparities for objects near the camera. By converging the cameras or by shifting the image, these large disparities can be reduced and visual comfort can be improved. However, the converged configuration introduces keystone distortions into the image, which can produce visual discomfort. The image-shifted configuration does not introduce keystone distortions, but affects the width of the image frame. It also requires unusual camera hardware or computer post-processing to shift the images.
We found that converged and image-shifted configurations improved the visual comfort of stereoscopic images by an equivalent amount, without affecting the apparent depth. Keystone distortions in the converged configuration had no appreciable negative effect on visual comfort.
We investigated the effect of convergence of stereoscopic cameras on visual comfort and apparent depth. In Experiment 1, viewers rated comfort and depth of three stereoscopic sequences acquired with convergence distance set at 60, 120, 180, 240 cm, or infinity (i.e., parallel). Moderately converged conditions were rated either as comfortable (i.e., 240 cm) or more comfortable (i.e., 120 and 180 cm) than the parallel condition. The 60 cm condition was rated the least comfortable. Camera convergence had no effects on ratings of apparent depth. In Experiment 2, we used computer-generated stereoscopic still images to investigate the effects of convergence in the absence of lens distortions. Results matched those obtained in Experiment 1. In Experiment 3, we artificially introduced keystone distortions in stereoscopic still images. We found that increasing the amount of keystone distortion caused only a minimal decrease in visual comfort and apparent depth.
Stereoscopic images, while providing enhanced depth and image quality, can cause moderate discomfort. In this paper, we present the results of two experiments aimed at investigating one possible source of discomfort: whole-field vertical disparities. In both experiments, we asked viewers to rate their comfort level while viewing a 3D feature film in which the left and right images were vertically misaligned. The feature film was presented on a large theater type screen. In Experiment 1, the vertical offset was changed randomly on a scene-by-scene basis resulting in an average vertical disparity of 31 minutes of arc at the closest viewing distance. The results showed that whole-field vertical disparities produced a marginal increase in discomfort that became only slightly more pronounced with time. In Experiment 2, we alternated periods of low, medium and high levels of whole-field vertical disparity. At the closest distance, the mean vertical disparity was 15, 30, or 62 minutes of arc for the low, medium and high disparity conditions, respectively. In this experiment, discomfort increased with vertical disparity, but again only marginally even after prolonged exposure. We conclude that whole-field vertical disparities cannot be a major contributor to the discomfort experienced by observers when viewing stereoscopic images.
Asymmetrical coding is a technique that can be used to reduce the bandwidth required for transmission and storage of stereoscopic video images. This technique is based on observations that a high level of perceived stereoscopic image quality can be maintained when the quality of the video stream to one eye is reduced. To address issues surrounding eye dominance and viewing comfort, we proposed to balance the inputs to the two eyes by cross-switching the image quality in the two streams over time. Here, we report two experiments on the visibility of cross-switches, for video sequences and random-dot stereograms. In both experiments, we manipulated a) the degree of asymmetry in quality of the video streams by varying image blur, and b) the timing of the cross-switch (either at a scene-cut or during a continuous scene). The viewers' task was to indicate whether the first or the second of a pair of stereoscopic presentations contained a cross-switch. We found that the cross-switch was masked by a scene cut, and that ease of detection depended on the degree of asymmetrical blur. We conclude that asymmetrical coding combined with cross-switching at scene cuts is a practical bandwidth-reduction technique for stereoscopic video.
In low bit rate coding applications, high quantization levels might be needed to achieve a target bit rate. However, such high levels of quantization are likely to decrease picture quality. A possible solution is to reduce temporal resolution by dropping, for instance, selected frames, thereby lessening the requirement for high quantization levels and thus improving video quality. Similarly, the spatial resolution of the encoded video could also be manipulated to achieve the target bit rate. Therefore, it might be possible to maximize picture quality by adjusting dynamically these three parameters while still meeting bit rate constraints. To do so effectively, the relationship between these parameters, alone or in combination, and subjective picture quality must be known. In this paper, we investigated the effect on subjective quality of: quantization alone (Experiment 1); a reduction in spatial resolution either alone or combined with moderate levels of quantization (Experiment 2); and a reduction of temporal resolution either alone or combined with moderate levels of quantization (Experiment 3). The results suggest that at very low bit rates reductions in spatial or temporal resolution combined with moderate levels of quantization might be an effective means of reducing bit rate without further loss in video quality.