Hand tracking algorithms that rely on a single camera as the sensing device can provide only relative depth information, which limits their practicality. This limitation underscores the need for effective and accurate estimation of the absolute distances between the hand joints and the camera in the real world. We respond to this need by introducing a methodology that exploits the autofocus functionality of a camera for hand tracking. It taps an otherwise unutilized capability of the camera and removes the need for additional power-hungry and costly depth sensors to accurately estimate the absolute distances of hand joints. Our methodology undergoes rigorous experimental validation and consistently outperforms traditional methods across different lens positions.
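As a rough illustration of how a lens position can be turned into an absolute distance (not the calibration procedure used in the paper), the sketch below applies the thin-lens equation, assuming the lens-to-sensor distance v can be recovered from the autofocus actuator position; the focal length and example values are hypothetical.

    # Minimal sketch: thin-lens relation between lens position and object distance.
    # The mapping from the raw autofocus actuator code to the lens-to-sensor
    # distance v_mm is device specific and assumed to be known here.

    def object_distance_mm(focal_length_mm: float, v_mm: float) -> float:
        """In-focus object distance from 1/f = 1/u + 1/v, with v the lens-to-sensor distance."""
        if v_mm <= focal_length_mm:
            return float("inf")  # focused at or beyond infinity
        return 1.0 / (1.0 / focal_length_mm - 1.0 / v_mm)

    # Example: a 4.2 mm lens moved so that v = 4.26 mm focuses near 30 cm.
    print(object_distance_mm(4.2, 4.26))  # ~298 mm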
Most near-eye displays with a single fixed focal plane suffer from the vergence-accommodation conflict (VAC) and cause visual discomfort to users. In contrast, a light field display with continuous focal planes offers the most natural and comfortable AR/VR visual experience without VAC and holds the promise of being the ultimate near-eye 3-D display. It projects light rays onto the human retina as if the light rays were emanating from a real object. This paper considers a near-eye light field display comprising three main components: a light field generator, a collimator, and a geometric waveguide. It takes 4-D light field data in the form of an array of 2-D subview images as input and generates a light field as output. The light field generator is the device responsible for converting the light emitted from the display panel into the light representing the light field of a virtual scene. The geometric waveguide, along with the collimator, ensures that the light rays propagating in the waveguide are collimated. The partially reflective mirrors of the waveguide replicate the optical path to achieve exit pupil expansion (EPE) and a large eyebox. However, existing waveguide eyepieces for near-eye AR/VR displays are not designed for, and hence may not suit, light field displays. In this work, we examine a geometric waveguide for light field display and find that the light fields replicated by the partially reflective mirrors cannot perfectly overlap on the user’s retina, resulting in the appearance of multiple repetitive images, a phenomenon we call the “ghost artifact”. This paper delves into the cause of this artifact and develops a solution for applications that require short-range interaction with virtual objects, such as surgical procedures. We define a working range devoid of noticeable ghost artifact based on the angular resolution characteristics of the human eye and optimize the orientation of the array of partially reflective mirrors of the waveguide to meet the image quality requirement for short-range interaction. With the optimized waveguide, the ghost artifact is significantly reduced. More results of the optimized waveguide will be shown at the conference.
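To give a sense of how a working range can follow from the eye's angular resolution (this geometry is only an illustration, not the paper's derivation), the sketch below assumes that adjacent partially reflective mirrors replicate the exit pupil with a lateral pitch p, so a virtual point rendered at distance d appears in neighboring replicas with an angular offset of roughly p/d; the pitch value is hypothetical.

    import math

    # Illustrative geometry only: ghost offset ~ pitch / distance, compared with
    # an assumed eye angular resolution of about 1 arcminute.
    EYE_RESOLUTION_RAD = math.radians(1.0 / 60.0)

    def ghost_offset_rad(pitch_mm: float, d_mm: float) -> float:
        return pitch_mm / d_mm

    def min_distance_without_visible_ghost_mm(pitch_mm: float) -> float:
        # Distance beyond which the ghost offset falls below the eye's resolution.
        return pitch_mm / EYE_RESOLUTION_RAD

    print(min_distance_without_visible_ghost_mm(0.5))  # ~1719 mm for a 0.5 mm pitch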
A natural and comfortable visual experience is critical to the success of the metaverse, which has recently drawn worldwide attention. However, due to the vergence-accommodation conflict (VAC), most augmented reality (AR) or virtual reality (VR) displays on the market today can easily cause visual discomfort or eyestrain to users. Being able to resolve the VAC, the light field display is commonly believed to be the ultimate display for the metaverse. Similar to conventional near-eye AR displays, a near-eye light field AR display consists of three basic components: a light source, a projection unit, and an eyepiece. Although the same light source can be used in both kinds of displays, the projection unit and the eyepiece of a near-eye light field AR display call for a new design to preserve the structure of the light field when it reaches the user’s retina. The primary focus of this paper is the eyepiece design for a near-eye light field AR display. In consideration of the compact form factor and wide field of view, the birdbath architecture, which consists of a beam splitter and a combiner, is selected as the basis of our eyepiece design. We optimize the birdbath eyepiece for the light field projection module produced by PetaRay Inc. The birdbath eyepiece receives the light field emitted from the light field projection module and projects it fully into the user’s eye. Our design preserves the structure of the light field and hence allows virtual objects at different depths to be properly perceived. In addition, the eyepiece design leads to a compact form factor for the near-eye light field AR display. Specifically, our eyepiece is designed by optimizing the tradeoff between the eyebox and the depth of focus (DOF) of the near-eye light field AR display. The resulting DOF allows the user to have a clear and sharp perception of any virtual object in the working range, which extends from 30 cm to infinity. In addition, we optimize the entrance pupil position and the F-number of the eyepiece according to the exit pupil position and the divergence angle of the light field projection module. This way, the eyepiece is able to preserve the structure of the light field, meaning that the angular relation between light rays coming from the same object point in space is preserved. To demonstrate the performance of our birdbath eyepiece, we use the Liou & Brennan human eye model (JOSA A 08/97) to simulate the image formation process.
Recent mobile imaging seeks to expedite the autofocus process by embedding a phase detector in the image sensor to provide information for controlling both the magnitude and direction of lens movement. Compared to conventional contrast-detection autofocus, phase-detection autofocus (PDAF) is able to quickly bring the lens toward the in-focus position. However, the presence of sensor noise, the lack of image contrast, and the spatial offset between the left and right phase detectors can easily affect the performance of phase detection. We present a statistical approach to address this issue by characterizing the distribution of phase shift for a given distance of the lens to the in-focus position. We model the phase shift as a skew-normal distribution and verify it empirically. The results show that the skew-normal distribution is indeed a proper model for the phase shift data. We also propose a method based on Bayes’ theorem to determine the lens movement. Experimental results show that the proposed method is able to improve the reliability of PDAF.
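As a hedged sketch of the Bayes-rule decision (the skew-normal parameters per candidate lens displacement below are placeholders; in practice they would be characterized offline from phase-shift data, as the paper does empirically):

    import numpy as np
    from scipy.stats import skewnorm

    # Candidate lens displacements (in actuator steps) and their assumed
    # skew-normal phase-shift distributions (a, loc, scale) -- hypothetical values.
    candidate_moves = np.array([-40, -20, -10, 0, 10, 20, 40])
    params = {m: (2.0, 0.05 * m, 1.0 + 0.02 * abs(m)) for m in candidate_moves}
    prior = np.full(len(candidate_moves), 1.0 / len(candidate_moves))

    def map_lens_move(observed_phase_shift: float) -> int:
        """Return the MAP candidate lens displacement for one phase-shift sample."""
        likelihood = np.array([
            skewnorm.pdf(observed_phase_shift, *params[m]) for m in candidate_moves
        ])
        posterior = likelihood * prior
        posterior /= posterior.sum()
        return int(candidate_moves[np.argmax(posterior)])

    print(map_lens_move(1.2))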
In the presence of light bloom or glow, multiple peaks may appear in the focus profile and mislead the autofocus system of a digital camera into an incorrect in-focus decision. We present a novel method to overcome the blooming effect. The key idea behind the method is based on the observation that multiple peaks are generated due to the presence of false features in the captured image, which, in turn, are due to the presence of a fringe (or feather) of light extending from the border of the bright image area. By detecting the fringe area and excluding it from the focus measurement, the blooming effect can be reduced. Experimental results show that the proposed anti-blooming method can indeed improve the performance of an autofocus system.
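A minimal sketch of the idea of excluding the fringe from the focus measure is given below; the saturation threshold, dilation radius, and the gradient-energy focus measure are illustrative stand-ins rather than the fringe-detection procedure of the paper.

    import numpy as np
    from scipy import ndimage

    def focus_measure_without_fringe(gray, sat_thresh=0.95, fringe_radius=15):
        """Gradient-energy focus measure computed only outside the fringe mask."""
        saturated = gray >= sat_thresh
        # Grow the saturated region to cover the surrounding fringe (glow) area.
        fringe = ndimage.binary_dilation(saturated, iterations=fringe_radius)
        gx, gy = np.gradient(gray.astype(np.float64))
        energy = gx ** 2 + gy ** 2
        valid = ~fringe
        return float(energy[valid].sum() / max(valid.sum(), 1))

    # Example on a synthetic frame with a bright blob in one corner.
    img = np.random.rand(240, 320) * 0.3
    img[:40, :40] = 1.0
    print(focus_measure_without_fringe(img))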
This paper concerns the compensation of specular highlights for handheld image projectors. By employing a projector-camera configuration, where the camera is aligned with the viewer, the distortion caused by nonideal (e.g., colored, reflective) projection surfaces can be estimated from the captured image and compensated for accordingly to improve the projection quality. This works fine when the viewing direction relative to the system is fixed. However, the compensation becomes inaccurate when this condition changes, because the position of the specular highlight changes as well. We propose a novel method that, without moving the camera, can estimate the specular highlight seen from any position and integrate it with Grossberg’s radiometric compensation framework to demonstrate how view-dependent compensation can be achieved. Extensive results, both objective and subjective, are provided to demonstrate the performance of the proposed algorithm.
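For illustration only, the per-pixel sketch below follows the spirit of a Grossberg-style radiometric model (captured = V · projected + F) and subtracts an additional view-dependent specular estimate from the desired appearance; the color mixing matrix V, offset F, and specular value are hypothetical calibration results, not outputs of the proposed method.

    import numpy as np

    def compensate_pixel(desired_rgb, V, F, specular_rgb):
        """Projector RGB that should reproduce desired_rgb after surface
        modulation (V, F) and the estimated specular highlight."""
        target = desired_rgb - F - specular_rgb
        p = np.linalg.solve(V, target)      # invert the color mixing matrix
        return np.clip(p, 0.0, 1.0)         # stay within the projector's range

    V = np.array([[0.80, 0.05, 0.02],
                  [0.04, 0.70, 0.03],
                  [0.02, 0.06, 0.60]])
    F = np.array([0.05, 0.05, 0.05])
    spec = np.array([0.10, 0.10, 0.10])
    print(compensate_pixel(np.array([0.6, 0.5, 0.4]), V, F, spec))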
We consider the quality assessment of images displayed on a liquid crystal display (LCD) with dim backlight, a situation where the power consumption of the LCD is set to a low level. This energy-saving mode of the LCD decreases the perceived image quality. In particular, some image regions may appear so dark that they become non-perceptible to the human eye. The problem becomes more severe when the image is illuminated with very dim backlight. Ignoring the effect of dim backlight on image quality assessment and directly applying an image quality assessment metric to the entire image may produce results inconsistent with human evaluation. We propose a method to fix this problem. The proposed method works as a precursor to image quality assessment. Specifically, given an image and the backlight intensity level of the LCD on which the image is to be displayed, the method automatically classifies the pixels of the image into perceptible and non-perceptible pixels according to the backlight intensity level and excludes the non-perceptible pixels from quality assessment. Experimental results are shown to demonstrate the performance of the proposed method.
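A minimal sketch of the precursor step is shown below; the linear backlight scaling and the visibility threshold are illustrative assumptions rather than the paper's perceptibility model, and the masked MSE merely stands in for whatever quality metric follows.

    import numpy as np

    def perceptible_mask(gray, backlight_level, threshold_cd_m2=1.0, full_backlight_cd_m2=250.0):
        """Boolean mask of pixels judged visible on a dim LCD (assumed linear scaling)."""
        displayed_luminance = gray * backlight_level * full_backlight_cd_m2
        return displayed_luminance >= threshold_cd_m2

    def masked_mse(ref, test, mask):
        """Plain MSE restricted to perceptible pixels (stand-in for any metric)."""
        diff = (ref[mask] - test[mask]) ** 2
        return float(diff.mean()) if diff.size else 0.0

    ref = np.random.rand(64, 64)
    test = np.clip(ref + 0.02 * np.random.randn(64, 64), 0, 1)
    mask = perceptible_mask(ref, backlight_level=0.1)
    print(masked_mse(ref, test, mask))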
Switching the backlight of a handheld device to low-power mode saves energy but affects the color appearance of an image. In this paper, we consider the resulting chroma degradation problem and propose an enhancement algorithm that incorporates the CIECAM02 appearance model to quantitatively characterize the problem. In the proposed algorithm, we enhance the color appearance of the image in low-power mode by a weighted linear superposition of the chroma of the image and that of the estimated dim-backlight image. Subjective tests are carried out to determine the perceptually optimal weighting and to verify the effectiveness of our framework.
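The core enhancement rule reduces to a short expression; in the sketch below the chroma values are plain arrays (in the paper they come from CIECAM02), and the weight w is a placeholder for the perceptually optimal value found through the subjective tests.

    import numpy as np

    def enhanced_chroma(chroma_original, chroma_dim_estimate, w=0.7):
        # Weighted linear superposition of the original chroma and the chroma
        # of the estimated dim-backlight image; w is a hypothetical weight.
        return w * chroma_original + (1.0 - w) * chroma_dim_estimate

    print(enhanced_chroma(np.array([40.0, 55.0]), np.array([25.0, 30.0])))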
The saliency map is useful for many applications such as image compression, display, and visualization. However, the bottom-up model used in most saliency map construction methods is computationally expensive. The purpose of this paper is to improve the efficiency of the model for automatic construction of the saliency map of an image while preserving its accuracy. In particular, we remove the contrast sensitivity function and the visual masking component of the bottom-up visual attention model and retain the components related to perceptual decomposition and center-surround interaction, which are critical properties of the human visual system. The simplified model is verified by performance comparison with the ground truth. In addition, a salient region enhancement technique is adopted to enhance the connectivity of the saliency map, and the saliency maps of three color channels are fused to improve the prediction accuracy. Experimental results show that the average correlation between our algorithm and the ground truth is close to that between the original model and the ground truth, while the computational complexity is reduced by 98%.
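As a rough illustration of per-channel center-surround processing followed by channel fusion (the difference-of-Gaussians filters, sigmas, and mean-fusion rule are illustrative choices, not the components or parameters of the simplified model):

    import numpy as np
    from scipy import ndimage

    def channel_saliency(channel, sigma_center=2.0, sigma_surround=8.0):
        # Center-surround response approximated by a difference of Gaussians.
        center = ndimage.gaussian_filter(channel, sigma_center)
        surround = ndimage.gaussian_filter(channel, sigma_surround)
        return np.abs(center - surround)

    def fused_saliency(image):
        # Fuse the saliency maps of the three color channels.
        maps = [channel_saliency(image[..., c]) for c in range(3)]
        fused = np.mean(maps, axis=0)
        return fused / (fused.max() + 1e-8)   # normalize to [0, 1]

    img = np.random.rand(128, 128, 3)
    print(fused_saliency(img).shape)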
Automatic white balancing is an important function for digital cameras. It adjusts the color of an image to make the image look as if it were taken under canonical light. White balance is usually achieved by estimating the chromaticity of the illuminant and then using the resulting estimate to compensate the image. The grey world method is the basis of most automatic white balance algorithms. It generally works well but fails when the image contains a large object or background with a uniform color. The algorithm proposed in this paper solves the problem by considering only pixels along edges and by imposing an illuminant constraint that confines the possible colors of the light source to a small range during the estimation of the illuminant. By considering only edge points, we reduce the impact of the dominant color on the illuminant estimation and obtain a better estimate. By imposing the illuminant constraint, we further minimize the estimation error. The effectiveness of the proposed algorithm is tested thoroughly. Both objective and subjective evaluations show that the algorithm is superior to other methods.
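A minimal sketch of grey-world estimation restricted to edge pixels with a crude illuminant constraint is shown below; the gradient threshold and the allowed chromaticity ranges are hypothetical, not the constraint used in the paper.

    import numpy as np

    def edge_grey_world_gains(rgb, grad_thresh=0.05, r_range=(0.25, 0.45), g_range=(0.30, 0.40)):
        """Per-channel gains that map the estimated illuminant to grey."""
        gray = rgb.mean(axis=2)
        gx, gy = np.gradient(gray)
        edges = np.hypot(gx, gy) > grad_thresh

        # Grey-world average over edge pixels only.
        illum = rgb[edges].mean(axis=0)

        # Constrain the estimate to a plausible illuminant chromaticity range.
        chrom = illum / illum.sum()
        chrom[0] = np.clip(chrom[0], *r_range)
        chrom[1] = np.clip(chrom[1], *g_range)
        chrom[2] = 1.0 - chrom[0] - chrom[1]
        illum = chrom * illum.sum()

        return illum.mean() / illum   # gains that map the illuminant to grey

    img = np.random.rand(100, 100, 3)
    print(edge_grey_world_gains(img))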
As opposed to the global shutter, which starts and stops the light integration of each pixel at the same time by incorporating a sample-and-hold switch with analog storage in each pixel, the electronic rolling shutter found in most low-end CMOS image sensors today collects the image data row by row, analogous to an open slit that scans over the image sequentially. Each row integrates light while the slit passes over it. Therefore, the scanlines of the image are not exposed at the same time. This sensor architecture creates an objectionable geometric distortion, known as the rolling shutter effect, for moving objects. In this paper, we address this problem using digital image processing techniques. A mathematical model of the rolling shutter is developed. The relative image motion between the moving objects and the camera is determined by block-based motion estimation. Bézier curve fitting is applied to smooth the resulting motion data, which are then used for the alignment of scanlines. The basic ideas behind the algorithm presented here can be generalized to deal with other, more complicated cases.
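To illustrate the scanline alignment step in its simplest form (a single global horizontal velocity stands in for the block-based, Bézier-smoothed motion field described above, and the values are hypothetical):

    import numpy as np

    def correct_rolling_shutter(frame, velocity_px_per_row):
        """Shift each scanline so all rows appear as if captured at the same
        time as the middle row (horizontal motion only)."""
        corrected = np.zeros_like(frame)
        mid = frame.shape[0] // 2
        for row in range(frame.shape[0]):
            shift = int(round((row - mid) * velocity_px_per_row))
            corrected[row] = np.roll(frame[row], -shift, axis=0)
        return corrected

    frame = np.tile(np.arange(320, dtype=np.float64), (240, 1))
    print(correct_rolling_shutter(frame, velocity_px_per_row=0.2).shape)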
Multimedia applications running over wireless or other error-prone transmission media require compression algorithms that are resilient to channel degradation. This paper presents a data packetization approach that makes the emerging ISO JPEG-2000 image compression standard resilient to transmission errors. The proposed technique can be easily extended to other wavelet-based image coding schemes. Extensive simulation results show that, with the proposed approach, a decoder is able to recover up to 8.5 dB in PSNR with minimal overhead and without affecting coding efficiency or spatial/quality scalability. Finally, the proposed approach supports unequal error protection of the wavelet subbands.
We derive a visual image quality metric from a model of human visual processing that takes as its input an original image and a compressed or otherwise altered version of that image. The model has multiple channels tuned to spatial frequency, orientation, and color. Channel sensitivities are scaled to match a bandpass achromatic spatial frequency contrast sensitivity function (CSF) and lowpass chromatic CSFs. The model has a contrast gain control with parameters based on the results of human psychophysical experiments on pattern masking and contrast induction. These experiments have shown that contrast gain control within the visual system is selective for spatial frequency, orientation, and color. The model accommodates this result by placing a contrast gain control within each channel and by letting each channel's gain control be influenced selectively by contrasts within all channels. A simple extension to this model provides predictions of color image quality.
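A minimal sketch of a divisive contrast gain control stage of this kind is given below; the exponents, pooling weights, and saturation constant are placeholders rather than the psychophysically fitted parameters referred to above.

    import numpy as np

    def gain_controlled_responses(contrasts, pool_weights, p=2.4, q=2.0, sigma=0.01):
        """Each channel's contrast raised to an excitatory exponent, divided by a
        weighted pool of all channels' contrasts plus a saturation constant."""
        excitation = np.abs(contrasts) ** p
        inhibition = sigma ** q + pool_weights @ (np.abs(contrasts) ** q)
        return np.sign(contrasts) * excitation / inhibition

    c = np.array([0.2, -0.05, 0.1])                 # hypothetical channel contrasts
    W = np.full((3, 3), 0.3) + 0.4 * np.eye(3)      # hypothetical cross-channel weights
    print(gain_controlled_responses(c, W))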
We utilize speech information to improve the quality of audio-visual communications such as video telephony and videoconferencing. We show that the marriage of speech analysis and image processing can solve problems related to lip synchronization. We present a technique called speech-assisted frame-rate conversion, and demonstrate speech-assisted coding of talking head video. Demonstration sequences are presented. Extensions and other applications are outlined.
In this paper, we discuss issues related to the analysis and synthesis of facial images using speech information. An approach to speaker-independent acoustic-assisted image coding and animation is studied. A perceptually based sliding-window encoder is proposed. It utilizes the high-rate (or oversampled) acoustic viseme sequence from the audio domain for image-domain viseme interpolation and smoothing. The image-domain visemes in our approach are dynamically constructed from a set of basic visemes. The look-ahead and look-back moving interpolations in the proposed approach provide an effective way to compensate for the mismatch between auditory and visual perception.
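As a hedged sketch of sliding-window smoothing of a high-rate viseme sequence (the basic-viseme count, window length, and uniform window weights are assumptions; the paper's perceptually based window may differ):

    import numpy as np

    def smoothed_viseme_weights(acoustic_weights, half_window=2):
        """acoustic_weights: (T, K) array with one row of K basic-viseme weights
        per acoustic frame. Returns the same shape, averaged over a look-back /
        look-ahead window of 2*half_window+1 frames."""
        T = acoustic_weights.shape[0]
        out = np.empty_like(acoustic_weights)
        for t in range(T):
            lo, hi = max(0, t - half_window), min(T, t + half_window + 1)
            out[t] = acoustic_weights[lo:hi].mean(axis=0)
        # Each image-domain viseme is then a blend of the basic visemes using out[t].
        return out

    w = np.random.rand(10, 5)
    w /= w.sum(axis=1, keepdims=True)
    print(smoothed_viseme_weights(w).shape)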
Motion estimation is a key issue in video coding. In very low bit-rate applications, the amount of side information for the motion field represents a significant portion of the total bit rate. This paper presents a joint motion estimation, segmentation, and coding technique that aims to reduce the segmentation and motion side information while providing a prediction error similar to or smaller than that of more classical motion estimation techniques. The main application in mind is a region-based coding approach in which consecutive frames of the video are divided into regions that have similar motion vectors and simple shapes that are easy to encode.
In this paper, we describe an approach to detecting and tracking certain feature points in the mouth region of a talking-head sequence. These feature points are interconnected in a polygonal mesh so that their detection and tracking are based not only on the information at these points but also on that in the surrounding elements. The detection of the nodes in an initial frame is accomplished by a feature detection algorithm. The tracking of these nodes in successive frames is obtained by deforming the mesh so that, when one mesh is warped to the other, the image patterns over corresponding elements in the two meshes match each other. This is accomplished by a modified Newton algorithm that iteratively minimizes the error between the two images after mesh-based warping. The numerical calculation involved in the optimization is simplified by using the concept of master elements and shape functions from the finite element method. The algorithm has been applied to a SIF resolution sequence that contains fairly rapid mouth movement. Our simulation results show that the algorithm can locate and track the feature points in the mouth region quite accurately.