We propose an integral three-dimensional (3D) display system with a wide viewing zone and depth range using a time-division display and eye-tracking technology. In the proposed system, the optical viewing zone (OVZ) is narrowed to a size that only covers an eye to increase the light ray density using a lens array with a long focal length. In addition, a system with low crosstalk with respect to the viewer’s movement is constructed by forming a combined OVZ (COVZ) that covers both eyes through a time-division display. Further, an eye-tracking directional backlight is used to dynamically control the COVZ and realize a wide system viewing zone (SVZ). The luminance unevenness is reduced by partially overlapping two OVZs. The combination of OVZs formed a COVZ with an angle that is ∼1.6 times larger than that of the OVZ, and an SVZ of 81.4 deg and 47.6 deg for the horizontal and vertical directions, respectively, was achieved using the eye-tracking technology. The comparison results of the three types of display systems (i.e., the conventional system, our previously developed system, and our currently proposed system) confirmed that the depth range of the 3D images in the proposed system is wider than that of the other systems.
We have developed a Pixel-level Visible Light Communication (PVLC) projector based on the Digital Light Processing (DLP) system. The projector can embed invisible data pixel by pixel into a visible image to realize augmented reality applications. However, it cannot update either the invisible or the visible content in real time. To solve this problem, we improve the projector so that a PC can dynamically control the system, and we achieve a high frame rate through resolution conversion. This paper proposes the system framework and the design method for the dynamically reconfigurable PVLC projector.
Fluorescence microscopy is an essential tool in biomedical research because of its better signal-to-noise ratio compared with other microscopy techniques. Among its variants, wide-field fluorescence microscopy (WFFM) and confocal fluorescence microscopy are the most widely used. Although confocal images are clearer than WFFM images, confocal microscopy is not suitable for live cells because of major drawbacks such as photobleaching and low image acquisition speed. Many studies have therefore aimed at obtaining clearer live-cell images by restoring degraded WFFM images, but most of them are not based on the regularized maximum likelihood estimator (MLE), which restores the image by maximizing the Poisson likelihood. The MLE method itself is not robust to noise because the problem is ill-posed; moreover, Gaussian as well as Poisson noise exists in WFFM images. Some approaches improve noise robustness, but they cannot guarantee the convergence of the likelihood. The purpose of this paper is to obtain clearer live-cell images by restoring degraded WFFM images with a robust deconvolution method based on the generalized expectation maximization (GEM) algorithm, which guarantees the convergence of a regularized likelihood. Moreover, we realize blind deconvolution that restores the images and estimates the point spread function (PSF) simultaneously, whereas most other studies assume that the PSF is known in advance. We applied the proposed algorithm to fluorescent bead and cell images. The results show that the proposed method restores images more accurately than existing methods.
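As a rough illustration of the kind of iterative Poisson-likelihood deconvolution discussed above, the sketch below applies the classical Richardson-Lucy update with a total-variation damping factor. It is a hedged stand-in for intuition only, not the authors' GEM-based algorithm, and `psf`, `n_iter`, and `reg` are assumed inputs.

```python
import numpy as np
from scipy.signal import fftconvolve

def richardson_lucy_tv(image, psf, n_iter=50, reg=0.002):
    """Richardson-Lucy deconvolution with a simple total-variation damping term.

    Illustrative sketch only: the paper's GEM-based method differs, but this
    multiplicative Poisson-likelihood update is the common starting point.
    """
    estimate = np.full_like(image, image.mean(), dtype=float)
    psf_flipped = psf[::-1, ::-1]
    eps = 1e-12
    for _ in range(n_iter):
        blurred = fftconvolve(estimate, psf, mode="same")
        ratio = image / (blurred + eps)
        correction = fftconvolve(ratio, psf_flipped, mode="same")
        # Total-variation factor damps noise amplification (regularized likelihood).
        gy, gx = np.gradient(estimate)
        grad_mag = np.sqrt(gx**2 + gy**2) + eps
        div = np.gradient(gx / grad_mag, axis=1) + np.gradient(gy / grad_mag, axis=0)
        estimate = estimate * correction / (1.0 - reg * div + eps)
        estimate = np.clip(estimate, 0, None)
    return estimate
```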
In this paper, we describe a method to combine two integral photography (IP) displays to represent a larger amount of depth while maintaining image quality. We adopt integral videography (IV), which can emit 4D light fields into real space to present full-parallax 3D videos. The two IVs are located at different depths on the same optical axis by using a beam splitter. We take several steps to enhance the quality of our 3D display. First, we remove positional displacements by adjusting the alignment between the LCD and the lens array of each IV; these displacements include parallel, rotational, and depth components. Next, we precisely align the two IVs with each other, based on optical rotations and shifts. Finally, we apply hidden surface removal so that surfaces of 3D objects on the background display that should be occluded are not shown, preventing viewers from seeing distracting surfaces. In conclusion, our optically multilayered light field display is effective for enhancing the depth of field.
We present a new concept of a scene-adaptive imaging scheme for integral photography (IP), which we call "adaptive IP (AIP) imaging." Our proposal is to compose the lens array for IP imaging from variable-focus lenses. This scheme greatly enhances the potential of free-viewpoint image synthesis from IP images, because the sampling pattern of the light field can be optimized for the scene structure. We first introduce a theoretical model describing how to optimize the light-field sampling for the target scene, using our virtual camera model in the Hough transform space. We then describe our prototype implementation with 64 liquid lenses compactly arranged in an 8-by-8 matrix, together with preliminary results obtained with it. Our imaging scheme can be regarded as an example of Programmable Imaging and will contribute to this new trend of imaging methods.
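To illustrate why variable-focus lenslets change the light-field sampling pattern, here is a hedged paraxial sketch; the function name, parameters, and the simple position-angle parameterization are illustrative and are not the paper's Hough-space model.

```python
import numpy as np

def lenslet_ray_samples(lens_x, focal_length, pixel_pitch, pixels_per_lens):
    """Rays sampled by one lenslet under the paraxial approximation.

    Each sensor pixel behind a lenslet at position `lens_x` records a ray whose
    angle is roughly (pixel offset) / (focal length), so shortening a liquid
    lens's focal length widens the angular coverage (coarser angular sampling),
    while lengthening it concentrates the samples in angle. Illustrative only.
    """
    offsets = (np.arange(pixels_per_lens) - (pixels_per_lens - 1) / 2) * pixel_pitch
    angles = offsets / focal_length           # paraxial ray angle [rad]
    positions = np.full_like(angles, lens_x)  # ray position at the lens plane
    return positions, angles

# Example: the same 8-pixel lenslet with two focal settings.
_, wide = lenslet_ray_samples(0.0, focal_length=2.0, pixel_pitch=0.05, pixels_per_lens=8)
_, tele = lenslet_ray_samples(0.0, focal_length=8.0, pixel_pitch=0.05, pixels_per_lens=8)
print(np.ptp(wide), np.ptp(tele))  # angular coverage shrinks as focal length grows
```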
We present a novel 3D display that can show 3D content in free space using laser-plasma scanning in the air. Laser-plasma technology can generate a point of illumination at an arbitrary position in free space; by scanning the position of this illumination, we can display a set of illuminated points in space, realizing a 3D display in the air. This display was presented at the Emerging Technologies venue of SIGGRAPH 2006 and is the basic platform of our 3D display project. In this presentation, we introduce the history of the development of the laser-plasma scanning 3D display and then describe recent developments in 3D content analysis and processing technology for realizing an innovative media presentation in free 3D space. One of these recent developments allows preferred 3D content data to be supplied to the display in a very flexible manner, which means we now have a platform for developing interactive 3D content presentation systems using the display, such as interactive art presentations. We also present the future plan of this 3D display research project.
KEYWORDS: Image compression, Image segmentation, Cameras, Image quality, Scalable video coding, JPEG2000, Image transmission, 3D video streaming, 3D acquisition, Imaging systems
This paper proposes a scalable coding scheme for interactive streaming of dynamic light fields, in which a region-of-interest (ROI) approach is applied to multi-view image sets. In our method, the image segments that are essential for synthesizing the view requested by a remote user are included in an ROI, which is compressed and transmitted with high priority. Since the data for the desired view are transmitted together with the data for its neighboring views as the ROI, the user can render high-quality novel views around the desired viewpoint before the arrival of the next frame data. Thus our method can compensate for the movement of the remote user even if the network has high latency. Since the user can arbitrarily choose the movable range of the viewpoint by changing the size and weight ratio of the ROI, we call this functionality view-dependent scalability. Using a modified JPEG2000 codec, we evaluated the view-dependent scalability of our scheme by measuring the quality of synthesized views against the distance from the originally desired viewpoint.
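A minimal sketch of the ROI idea follows, assuming a regular grid of views and hypothetical priority values; the actual weighting of JPEG2000 code-blocks in the paper is more involved.

```python
import numpy as np

def roi_priorities(view_grid, requested_view, roi_radius=1.5, high=7, low=1):
    """Assign a transmission priority to each view in a light-field grid.

    Views within `roi_radius` of the requested viewpoint form the ROI and get
    the high priority; the rest are background. The radius and weights are
    illustrative knobs for the view-dependent scalability idea, not the
    paper's exact values.
    """
    dists = np.linalg.norm(np.asarray(view_grid, float) - np.asarray(requested_view, float), axis=1)
    return np.where(dists <= roi_radius, high, low)

# 5x5 grid of view positions, user requests a view near (2, 3).
grid = [(i, j) for i in range(5) for j in range(5)]
print(roi_priorities(grid, requested_view=(2, 3)).reshape(5, 5))
```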
We have proposed a 3D live video system named LIFLET, which stands for Light Field Live with Thousands of Lenslets. It is a computer graphics system based on the optical system of integral photography. It captures a dynamic 3D scene with a camera through an array of lenslets and synthesizes arbitrary views of the scene in real time. Although the synthetic views are highly photo-realistic, their quality is limited by the configuration of the optical system and the number of pixels of the camera. This limitation has not been well discussed in our prior works. The contributions of this paper are as follows. First, we introduce a theoretical analysis based on geometrical optics for formulating the upper limit of the spatial frequency captured by the system. Second, we propose a system which uses a combination of an array of lenslets and multiple cameras based on that theoretical analysis. We call it McLiflet since it is a multiple-camera version of LIFLET. The proposed system significantly improves the quality of synthetic views compared with the prior version, which uses only one camera. This result confirms our theoretical analysis.
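As a back-of-the-envelope illustration of such a limit (not the paper's geometrical-optics derivation), a lenslet array of pitch p contributes roughly one spatial sample of the scene per lenslet and per view, giving a Nyquist-style bound of 1/(2p):

```python
def lenslet_nyquist_limit(lens_pitch_mm, magnification=1.0):
    """Crude Nyquist-style bound on the spatial frequency captured at the object plane.

    One lenslet contributes roughly one spatial sample per view, so the highest
    recoverable spatial frequency is about 1 / (2 * pitch) at the lenslet plane,
    scaled by the optical magnification. Illustrative only; the paper derives the
    limit from a fuller geometrical-optics model.
    """
    return magnification / (2.0 * lens_pitch_mm)  # cycles per mm

print(lenslet_nyquist_limit(lens_pitch_mm=1.0))   # 0.5 cycles/mm for a 1 mm pitch array
```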
KEYWORDS: Fresnel lenses, Cameras, 3D image processing, Image processing, Photography, Imaging systems, GRIN lenses, Digital cameras, 3D displays, Digital imaging
This paper proposes a system which can capture a dynamic 3D scene and synthesize its arbitrary views in real time. Our system consists of four components: a Fresnel lens, a micro-lens array, an IEEE1394 digital camera, and a PC for rendering. The micro-lens array forms an image which consists of a set of elemental images, in other words, multiple viewpoint images of the scene. The Fresnel lens controls the depth of field by demagnifying the 3D scene. The problem is that the scene demagnified by the Fresnel lens is compressed along its optical axis. Therefore, we propose a method for recovering the original scene from the compressed scene. The IEEE1394 digital camera captures multiple viewpoint images at 15 frames per second and transfers these images to the PC. The PC synthesizes any perspective of the captured scene from the multiple viewpoint images using image-based rendering techniques. The proposed system synthesizes one perspective of the captured scene within 1/15 second. This means that a user can interactively move his/her viewpoint and observe even a moving object from various directions.
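A hedged sketch of the depth-recovery step under the thin-lens approximation follows; the function name and example distances are illustrative, not the paper's calibration.

```python
def recover_object_distance(image_distance_mm, focal_length_mm):
    """Map a point of the demagnified (compressed) scene back to its original depth.

    Uses the thin-lens equation 1/a + 1/b = 1/f: the Fresnel lens forms a scene
    compressed along the optical axis, and a point imaged at distance b from the
    lens corresponds to an original object distance a = f*b / (b - f). This is a
    paraxial sketch of the recovery idea, with distances in millimetres.
    """
    b, f = image_distance_mm, focal_length_mm
    if b <= f:
        raise ValueError("image distance must exceed the focal length for a real image")
    return f * b / (b - f)

# Example: points imaged 220 mm and 240 mm behind a 200 mm lens originally sat
# 2200 mm and 1200 mm away, so the compressed axial spacing is expanded back.
print(recover_object_distance(220, 200), recover_object_distance(240, 200))
```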
In the field of 3-D imaging technology, Integral Photography (IP) is one of the most promising approaches, and a combination of an HDTV camera and an optical fiber array has been investigated to display 3-D live video sequences. The authors have applied this system to a computer graphics method for synthesizing arbitrary views from IP images: a method of interactively displaying free-viewpoint images without a physical lens array. This paper proposes a real-time method of estimating the depth data corresponding to each elemental image on an IP image. Experimental results show that the proposed method is very useful for improving the quality of free-viewpoint image synthesis.
The Multimedia Ambiance Communication project has researched and developed a 3D image space that is shared by people in different locations. In this communication, the 3D space is constructed as a layered structure defined for long-range, middle-range, and short-range views. This research focuses especially on buildings in the middle-range view. We acquire 3D data of buildings using a range scanner together with texture data, and represent them in VR space. In the case of real buildings, practical restrictions mean that only partial data can be acquired. To compensate for this, other information such as drawings, photographs, and general knowledge was used. The authors detail the construction of a photo-realistic representation of buildings and 3D space.
This paper proposes a technique to generate virtual views of a natural panoramic scene. The scene is captured by our original 3-camera system, the images are stitched into a stereo panorama, and the depth is estimated. The texture panorama is segmented into regions, each of which can be approximated as a plane. The planar parameter set of each region, used for the setting representation, is calculated from the depth data. According to this representation, virtual views are generated using the center panorama texture, while the left and right panoramas are used for occlusion compensation.
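The plane approximation of a segmented region can be illustrated with a simple least-squares fit; this is a generic sketch, not the paper's exact parameterization of the setting representation.

```python
import numpy as np

def fit_plane(points_xyz):
    """Least-squares plane z = a*x + b*y + c for one segmented panorama region.

    `points_xyz` is an (N, 3) array of 3-D points reconstructed from the depth
    panorama inside the region. Illustrative of the plane-approximation step.
    """
    pts = np.asarray(points_xyz, float)
    A = np.column_stack([pts[:, 0], pts[:, 1], np.ones(len(pts))])
    coeffs, *_ = np.linalg.lstsq(A, pts[:, 2], rcond=None)
    return coeffs  # (a, b, c)

# Example: noisy samples of the plane z = 0.5*x - 0.2*y + 3.
rng = np.random.default_rng(0)
xy = rng.uniform(-1, 1, size=(200, 2))
z = 0.5 * xy[:, 0] - 0.2 * xy[:, 1] + 3 + rng.normal(0, 0.01, 200)
print(fit_plane(np.column_stack([xy, z])))  # close to [0.5, -0.2, 3.0]
```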
Multimedia Ambiance Communication is a means of achieving shared-space communication in an immersive environment consisting of an arch-type stereoscopic projection display. Our goal is to enable shared-space communication by creating a photo-realistic three-dimensional (3D) image space that users can feel a part of. The concept of a layered structure defined for painting, such as long-range, mid-range, and short-range views, can be applied to a 3D image space. New techniques for a photo-realistic 3D image space have been developed, such as two-plane expression, high-quality panorama image generation, setting representation for image processing, and 3D image representation and generation. We also propose a life-like avatar within the 3D image space. To obtain the characteristics of a user's body, a human subject is scanned using a Cyberware™ whole-body scanner. The output from the scanner, a range image, is a good starting point for modeling the avatar's geometric shape. A generic human surface model is fitted to the range image, and the obtained model remains topologically equivalent even when our method is applied to another subject. Because a generic model with motion definitions is employed, common motion rules can be applied to all models made from the generic model.
For a high sense of reality in next-generation communications, it is very important to realize three-dimensional (3D) spatial media instead of the existing 2D image media. In order to deal comprehensively with a variety of 3D visual data formats, the authors first introduce the concept of "Integrated 3D Visual Communication," which reflects the necessity of developing a neutral representation method independent of input/output systems. The following discussion then concentrates on the ray-based approach to this concept, in which any visual sensation is considered to be derived from a set of light rays. This approach is a simple and straightforward solution to the problem of how to represent 3D space, an issue shared by various fields including 3D image communications, computer graphics, and virtual reality. This paper mainly presents several developments in this approach, including efficient methods of representing ray data, a real-time video-based rendering system, an interactive rendering system based on integral photography, the concept of a virtual object surface for compressing the tremendous amount of data, and a light-ray capturing system using a telecentric lens. Experimental results demonstrate the effectiveness of the proposed techniques.
In this paper we propose an approach to generate a panorama depth map that is dense enough to be used as a template for cutting out a texture image by depth, i.e., structuring the scene into layers to construct a shared 3-D image space. For depth estimation, we make use of the census transform for robust determination of the correspondence of a stereo pair. To interpolate unknown disparities, we introduce a process influenced by the K-means algorithm. For densification of the depth data, we make use of Region Competition. During stitching, the confidence of the data is improved for the overlapped areas by multiple evaluations of the disparity data.
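For reference, a minimal census transform looks like the following; the window size and bit packing are illustrative, and the project's stereo matcher is certainly more elaborate.

```python
import numpy as np

def census_transform(img, window=3):
    """Census transform: encode each pixel by comparing it with its neighbours.

    Each pixel becomes a bit pattern (packed into an integer) in which a bit is 1
    if the corresponding neighbour in the window is brighter than the centre.
    Matching costs between views are then Hamming distances, which is what makes
    the stereo correspondence robust to radiometric differences between cameras.
    """
    r = window // 2
    h, w = img.shape
    out = np.zeros((h, w), dtype=np.uint64)
    padded = np.pad(img, r, mode="edge")
    for dy in range(-r, r + 1):
        for dx in range(-r, r + 1):
            if dy == 0 and dx == 0:
                continue
            neighbour = padded[r + dy:r + dy + h, r + dx:r + dx + w]
            out = out * 2 + (neighbour > img)  # append one comparison bit
    return out

def hamming(a, b):
    """Matching cost between two census codes: the Hamming distance of their bits."""
    return bin(int(a) ^ int(b)).count("1")
```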
A method for high-quality stereo panorama mosaicing is presented. The surrounding scene is captured by our original 3-camera system as stereo moving picture sequences, and the images are stitched after improvement of the texture quality. The multi-purpose 3-camera system features accurate frame synchronization between the three channels and can be used outdoors through battery operation. For registration during panorama stitching, affine parameters are estimated over the overlapped areas by the steepest-descent algorithm under a newly investigated distortion-free condition. The texture improvement has two steps: vertical resolution recovery by field integration, and image enhancement by a 2D quadratic Volterra filter that satisfies the Weber-Fechner law. The presented method enables high-quality stereo mosaicing with accurate mutual disparities between the channels and without visible distortion of textures.
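A quadratic Volterra filter of the kind mentioned above can be sketched with the 2-D Teager operator, whose response scales with local luminance in the spirit of the Weber-Fechner law; the kernel and gain here are illustrative and may differ from the paper's filter.

```python
import numpy as np

def teager_quadratic(x):
    """2-D Teager operator, a simple instance of a quadratic Volterra filter.

    Its response grows with local mean luminance, so adding it back to the image
    enhances detail more strongly in bright areas, loosely following the
    Weber-Fechner law. Illustrative stand-in for the paper's Volterra kernel.
    """
    p = np.pad(x.astype(float), 1, mode="edge")
    c = p[1:-1, 1:-1]
    return (3.0 * c**2
            - 0.5 * p[2:, 2:] * p[:-2, :-2]
            - 0.5 * p[2:, :-2] * p[:-2, 2:]
            - p[2:, 1:-1] * p[:-2, 1:-1]
            - p[1:-1, 2:] * p[1:-1, :-2])

def enhance(x, gain=0.0005):
    """Unsharp-masking-style enhancement using the quadratic term (gain is illustrative)."""
    return np.clip(x + gain * teager_quadratic(x), 0, 255)
```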
The Multimedia Ambiance Project of TAO has been researching and developing an image space that can be shared by people in different locations and can lend a real sense of presence. The image space is mainly based on photo-realistic texture, and some deformations that depend on human vision characteristics or pictorial expressions are applied. We aim to accomplish shared-space communication through an immersive environment consisting of the image space stereoscopically projected on an arched screen; we refer to this scheme as 'ambiance communication.' The first half of this paper presents a global description of the basic concepts of the project, the display system, and the 3-camera image capturing system. The latter half focuses on two methods to create a photo-realistic image space using the captured images of a natural environment. One is the divided expression of the long-range view and the ground, which not only gives a more realistic rendering of the ground but also yields a more natural view when synthesized with other objects and allows deformations for various purposes. The other is high-quality panorama generation based on even-odd field integration and image enhancement by a two-dimensional quadratic Volterra filter.
In projection-based virtual reality systems such as the CAVE, users can observe immersive stereoscopic images. To date, most of the images projected onto the screens have been synthesized from polygonal models that represent the virtual world, because the resolution and viewing angle of real-time video are not sufficient for such large-screen systems. In this paper, the authors propose a novel approach that avoids this problem by exploiting the human visual system: in the proposed system, the resolution at the center of view is very high, while that of the periphery is lower. The authors constructed a four-camera system in which a pair of NTSC cameras is prepared for each of the left and right eyes. The four video streams are combined into one stream and captured by a graphics computer, and wide-angle multi-resolution images are synthesized in real time from the combined stream. Thus we can observe wide-angle stereoscopic video while the resolution at the center of view remains high enough. Moreover, this paper proposes another configuration of the four-camera system. Experimental results show that we can observe three levels of viewing angle and resolution through the stereoscopic effect, while the images for each eye have just two levels. The discontinuities in the multi-resolution images are effectively suppressed by this new lens configuration.
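The compositing idea can be sketched as follows, assuming the wide-angle image has already been upsampled to display resolution; the real system additionally blends across the seam and relies on the special lens configuration to hide the boundary.

```python
import numpy as np

def foveated_composite(wide_low, center_high, center_box):
    """Combine a wide low-resolution view with a high-resolution central view.

    `wide_low` is the wide-angle image already upsampled to display resolution,
    `center_high` is the narrow high-resolution image, and `center_box`
    (top, left) gives where it lands in the display frame. Sizes and placement
    are illustrative assumptions, not the system's actual camera geometry.
    """
    out = wide_low.copy()
    top, left = center_box
    h, w = center_high.shape[:2]
    out[top:top + h, left:left + w] = center_high
    return out
```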
In this paper, a person-specific facial expression recognition method based on a Personal Facial Expression Space (PFES) is presented. Multidimensional scaling maps facial images as points in the lower-dimensional PFES. The PFES reflects the personality of facial expressions because it is based on the peak-instant facial expression images of a specific person. In constructing the PFES for a person, his/her whole normalized facial image is treated as a single pattern without block segmentation, and the differences of 2-D DCT coefficients from the neutral facial image of the same person are used as features. Therefore, in the early part of the paper, the separation characteristics of facial expressions in the frequency domain are analyzed using a still facial image database consisting of neutral, smile, anger, surprise, and sadness images for each of 60 Japanese males (300 facial images in total). The results show that facial expression categories are well separated in the low-frequency domain. The PFES is then constructed using multidimensional scaling, taking these low-frequency differences of 2-D DCT coefficients as features. On the PFES, the trajectory of a facial image sequence of a person can be calculated in real time, and facial expressions can be recognized from this trajectory. Experimental results show the effectiveness of this method.
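A hedged sketch of the feature extraction and embedding: low-frequency 2-D DCT coefficient differences from the neutral face, embedded with classical multidimensional scaling. The block size `keep` and the output dimensionality are illustrative choices, not the paper's settings.

```python
import numpy as np
from scipy.fft import dctn

def dct_difference_features(face, neutral_face, keep=8):
    """Low-frequency 2-D DCT coefficient differences from the neutral face.

    Both images are assumed to be normalized whole-face images; only the
    top-left `keep` x `keep` block of coefficients (the low spatial frequencies,
    where the expression categories separate well) is retained.
    """
    diff = dctn(face.astype(float), norm="ortho") - dctn(neutral_face.astype(float), norm="ortho")
    return diff[:keep, :keep].ravel()

def classical_mds(features, dims=2):
    """Classical multidimensional scaling of feature vectors into `dims` dimensions."""
    d2 = np.square(np.linalg.norm(features[:, None] - features[None, :], axis=-1))
    n = len(features)
    j = np.eye(n) - np.ones((n, n)) / n
    b = -0.5 * j @ d2 @ j                      # double-centred squared distances
    vals, vecs = np.linalg.eigh(b)
    order = np.argsort(vals)[::-1][:dims]
    return vecs[:, order] * np.sqrt(np.maximum(vals[order], 0))
```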
KEYWORDS: 3D image processing, 3D acquisition, Laser scanners, Digital cameras, 3D scanning, Multimedia, Digital imaging, 3D metrology, Image acquisition, Data acquisition
This paper addresses a new scheme for acquiring a 3D image representation from range data and texture data. The concept of a layered structure defined for painting, such as long-range, mid-range, and short-range views, can be applied to a 3D image. Long- and mid-range views are located at a reasonable distance and therefore do not require a perfect 3D structure. Instead of describing the perfect 3D structure, an image can be represented more simply with range information through a plane-model approximation. The setting representation can be used like stage scenery to approximate objects and describe a 3D structure by plane models. It is effective as a simplified means of describing a 3D image space, as in long- and mid-range views that do not require a detailed 3D structure. We obtain the parameters of the plane shapes from the integration of range data measured by a 3D laser scanner and a number of still images captured by a digital camera.
KEYWORDS: Cameras, Video, 3D image processing, 3D modeling, Video processing, 3D vision, Imaging systems, Image processing, 3D displays, Virtual reality
In the field of 3-D image communication and virtual reality, it is very important to establish a method of displaying arbitrary views of a 3-D scene. 3-D geometric models of scene objects are certainly useful for this purpose, since computer graphics techniques can synthesize arbitrary views of such models; it is, however, not so easy to obtain models of objects in the physical world. To avoid this problem, a new technique called image-based rendering has been proposed for interpolating between views by warping input images, using depth information or correspondences between multiple images. To date, most of the work on this technique has concentrated on static scenes or objects. In order to cope with 3-D scenes in motion, we must establish ways of processing multiple video sequences in real time and of constructing an accurate camera array system. In this paper, the authors propose a real-time method of rendering arbitrary views of 3-D scenes in motion. The proposed method comprises a sixteen-camera array system with software adjustment support and a video-based rendering system. According to the observer's viewpoint, appropriate views of 3-D scenes are synthesized in real time. Experimental results show the potential applicability of the proposed method to augmented spatial communication systems.
KEYWORDS: Cameras, 3D image processing, Data conversion, Visual communications, Image processing, Visualization, Imaging systems, Chromium, Data communications, 3D visualizations
In the field of 3-D imaging, several kinds of input/output methods have been developed and are still making rapid progress. Considering this situation, it is desirable that the format of 3-D data be independent of input/output methods. For this purpose, ray-based representation has been proposed, in which 3-D physical space is represented by the rays that propagate in the space. If all light rays are completely described, the 3-D space can be reproduced correctly from the light-ray data. However, we can only obtain sampled data of light rays, e.g., multiview images. Moreover, the parameters that represent the position and direction of light rays are also sampled. If the sampling of ray parameters is not proper, the original images may not be reproduced correctly from the light rays. In this paper, we discuss the effects of sampling on the mutual conversion between multiview images and light rays. Furthermore, we present sampling methods to reproduce original images correctly for several camera arrangements.
KEYWORDS: 3D image processing, Visualization, 3D displays, Telecommunications, Visual communications, Data compression, Data conversion, 3D visualizations, Image compression, Holograms
As 2D image communication systems come into widespread use, 3D imaging technology, which enhances the reality of visual communication, is coming to be considered a promising next-generation medium that can revolutionize information systems. To date, 3D image communication has not been discussed at a comprehensive level because several kinds of promising 3D display technologies are still making rapid progress. Considering such a situation, this paper introduces the concept of the 'integrated 3D visual communication system'. The key feature of this new concept is a display-independent, neutral representation of visual data; the flexibility of this concept will promote the progress of 3D image communication systems before 3D display technology reaches maturity. For this purpose, a ray-based approach is examined. In the present representation method, the whole set of ray data is treated as a set of orthogonal views of the scene objects. The advantage of this approach is that it allows the synthesis of any perspective view by gathering appropriate ray data from the set of orthogonal views, independently of any geometric representation. A real-time progressive transmission method has also been examined. The experimental results show how the present representation method could be applied to the next-generation 3D image communication system.
This paper presents new methods for partitioning a set of multi-view images into 3-D regions corresponding to objects in the scene, in order to parse raw multi-view data into a 3-D region-based structured representation. For this purpose, the color, position, and disparity information at each pixel are incorporated as an attribute vector into the segmentation process. We propose three methods, all based on the K-means clustering algorithm. The first method is sensitive to the estimation error of the disparity at each pixel, as it is formulated assuming that the estimated disparity is accurate. We solve this problem in the second method by prohibiting the estimated disparity from being used for calculating the distance between attribute vectors. Finally, a third method is proposed to reduce the calculation cost of the segmentation process. As each 3-D region has a one-to-one correspondence with an object or surface in the scene, the 3-D region-based structured representation of multi-view images is useful and powerful for data compression, view interpolation, structure recovery, and so on. The experimental results show the potential applicability of the method to the next-generation 3-D image communication system.
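A minimal sketch of the first method, clustering per-pixel attribute vectors of color, position, and disparity with K-means; the attribute weights and initialization are illustrative assumptions, not the paper's settings.

```python
import numpy as np

def kmeans_segmentation(color, position, disparity, k=8, n_iter=20, weights=(1.0, 0.5, 2.0)):
    """K-means over per-pixel attribute vectors (color, position, disparity).

    Each pixel of the multi-view set is an attribute vector, and the resulting
    clusters approximate 3-D regions. `color` is (N, 3), `position` is (N, 2),
    and `disparity` is (N,); the relative weights of the three attribute groups
    are illustrative.
    """
    wc, wp, wd = weights
    feats = np.hstack([wc * color, wp * position, wd * disparity[:, None]]).astype(float)
    rng = np.random.default_rng(0)
    centers = feats[rng.choice(len(feats), size=k, replace=False)]
    for _ in range(n_iter):
        labels = np.argmin(np.linalg.norm(feats[:, None] - centers[None, :], axis=-1), axis=1)
        for c in range(k):
            if np.any(labels == c):
                centers[c] = feats[labels == c].mean(axis=0)
    return labels
```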