In this paper, an efficient object detection method, YOLO-Ti, is proposed to detect tiny facial markers. Our study is driven by the practical requirements of 3D face modeling, which calls for incorporating as many facial features as possible as references. This research can also provide information for facial expression recognition and joint deformation. To achieve this, we first present a feature fusion module called Cross-BiFPN, which adds cross-connecting branches between different network layers to exploit low-level features more effectively. Second, we add a high-resolution detection head and an attention module to the YOLOv8 model to improve its ability to detect tiny objects, while keeping the detection model lightweight by removing redundant network layers. Third, we collect a dataset of facial markers whose average size is much smaller than that of publicly available small-object datasets. Ablation studies and comparison experiments are conducted to evaluate the performance of our approach. Compared with the baseline YOLOv8 model, YOLO-Ti improves mAP50 by 30.4% while reducing model parameters by 65.1%. The automatic feature extraction provided by our model facilitates the construction of digital humans, saving modelers significant manpower and time.
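The abstract does not spell out the Cross-BiFPN wiring, but the core operation in BiFPN-style modules is a learnable weighted fusion of resampled feature maps. Below is a minimal PyTorch sketch of that idea only, with hypothetical channel sizes and pyramid levels; it is not the paper's actual architecture.

```python
# Sketch of BiFPN-style weighted feature fusion (PyTorch). The exact
# Cross-BiFPN topology is not given in the abstract; this only illustrates
# fusing two resampled pyramid features with learnable non-negative weights.
import torch
import torch.nn as nn
import torch.nn.functional as F

class WeightedFusion(nn.Module):
    """Fuse N same-shape feature maps with normalized learnable weights."""
    def __init__(self, num_inputs: int, channels: int):
        super().__init__()
        self.w = nn.Parameter(torch.ones(num_inputs))
        self.conv = nn.Conv2d(channels, channels, 3, padding=1)

    def forward(self, feats):
        w = F.relu(self.w)
        w = w / (w.sum() + 1e-4)           # normalized fusion weights
        fused = sum(wi * f for wi, f in zip(w, feats))
        return self.conv(fused)

# Example: fuse a low-level, high-resolution map with an upsampled
# higher-level map, as a cross-connection between pyramid levels would.
low = torch.randn(1, 64, 80, 80)           # hypothetical P3-level feature
high = torch.randn(1, 64, 40, 40)          # hypothetical P4-level feature
high_up = F.interpolate(high, scale_factor=2, mode="nearest")
out = WeightedFusion(2, 64)([low, high_up])
print(out.shape)                           # torch.Size([1, 64, 80, 80])
```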
Vision-tangible mixed reality (VTMR) is a further development of traditional mixed reality that provides the experience of directly manipulating virtual objects at the visual-perceptual level. In this paper, we propose a mixed reality system called “VTouch”. VTouch is composed of an optical see-through head-mounted display (OST-HMD) and a depth camera, and it supports direct six-degree-of-freedom transformation and detailed manipulation of all six faces of a Rubik’s cube. All operations are performed based on spatial physical detection between virtual and real objects. We have not only carried out a qualitative analysis of the system’s effectiveness through a functional test, but also performed quantitative experiments on the effects of depth occlusion. On this basis, we put forward basic design principles and give suggestions for the future development of similar systems. This kind of mixed reality system is significant for advancing intelligent environments with state-of-the-art interaction techniques.
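A standard building block for depth occlusion of the kind VTouch tests is a per-pixel depth comparison between the real scene (from the depth camera) and the virtual content (from the renderer's z-buffer). Here is a minimal NumPy sketch under the simplifying assumption that both depth maps are already aligned in the same camera space and resolution; the paper's actual pipeline may differ.

```python
# Sketch of depth-based occlusion between real and virtual content.
# Assumes real depth map and virtual z-buffer share camera space/resolution.
import numpy as np

def composite_with_occlusion(real_rgb, real_depth, virt_rgb, virt_depth):
    """Show a virtual pixel only where it is closer than the real surface."""
    virt_visible = virt_depth < real_depth      # per-pixel depth test
    out = real_rgb.copy()
    out[virt_visible] = virt_rgb[virt_visible]
    return out

# Hypothetical 480x640 frames: depth in meters, inf where no virtual pixel.
real_rgb = np.zeros((480, 640, 3), np.uint8)
real_depth = np.full((480, 640), 1.2, np.float32)   # real cube ~1.2 m away
virt_rgb = np.full((480, 640, 3), 255, np.uint8)
virt_depth = np.full((480, 640), np.inf, np.float32)
virt_depth[200:280, 300:380] = 1.0                  # virtual patch at 1.0 m
frame = composite_with_occlusion(real_rgb, real_depth, virt_rgb, virt_depth)
```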
The single point active alignment method (SPAAM) has been a widely used calibration method for optical see-through head-mounted displays (OST-HMDs) since its introduction. It requires high-accuracy alignments for data acquisition, and the collected data largely determine the calibration accuracy. However, several kinds of alignment errors occur during calibration, including random errors from manual alignment and systematic errors from the fixed eye-HMD model. To tackle these problems, we first leverage a random sample consensus (RANSAC) approach to iteratively reduce the random error in the collected data sequence, and then use a region-induced data enhancement method to reduce the systematic error. We design a framework that enhances data acquisition for calibration by sequentially reducing the random error and the systematic error. Experimental results show that the proposed method makes the calibration significantly more robust by eliminating sampling points with large errors. At the same time, calibration accuracy is increased by the proposed dynamic eye-HMD model, which takes eye movement into consideration. This improvement in calibration should help promote applications based on OST-HMDs.
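SPAAM estimates a 3x4 projection from 3D-2D alignment pairs, so RANSAC over those pairs can be sketched with a standard DLT solve. The following is a minimal illustration only; the paper's iteration scheme, thresholds, and the region-induced enhancement step are not shown.

```python
# Sketch of RANSAC filtering over SPAAM-style 3D-2D alignments, assuming a
# standard DLT solve for the 3x4 projection. Thresholds are hypothetical.
import numpy as np

def dlt(X, x):
    """Solve P (3x4) from n>=6 world points X (n,3) and screen points x (n,2)."""
    A = []
    for (Xw, Yw, Zw), (u, v) in zip(X, x):
        A.append([Xw, Yw, Zw, 1, 0, 0, 0, 0, -u*Xw, -u*Yw, -u*Zw, -u])
        A.append([0, 0, 0, 0, Xw, Yw, Zw, 1, -v*Xw, -v*Yw, -v*Zw, -v])
    _, _, Vt = np.linalg.svd(np.asarray(A, float))
    return Vt[-1].reshape(3, 4)

def reproj_error(P, X, x):
    Xh = np.hstack([X, np.ones((len(X), 1))])
    p = (P @ Xh.T).T
    return np.linalg.norm(p[:, :2] / p[:, 2:3] - x, axis=1)

def ransac_spaam(X, x, iters=500, thresh=3.0):
    rng = np.random.default_rng(0)
    best_inliers = None
    for _ in range(iters):
        idx = rng.choice(len(X), 6, replace=False)   # minimal sample
        P = dlt(X[idx], x[idx])
        inliers = reproj_error(P, X, x) < thresh     # pixel threshold
        if best_inliers is None or inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    return dlt(X[best_inliers], x[best_inliers])     # refit on inliers only
```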
This paper discusses the imaging principles and technical difficulties of spatial-augmented-reality-based human face projection. A novel geometry correction method is proposed to realize fast, high-accuracy face model projection. A depth camera is used to reconstruct the projected object, from which the relative position between the rendered model and the projector is obtained and the initial projection image is generated. The projected image is then warped using Bezier interpolation to guarantee that the projected texture matches the object surface. The proposed method follows a simple processing pipeline and achieves high perceptual registration between virtual and real objects. In addition, the method performs well even when the reconstructed model does not exactly match the rendered virtual model, which extends its applicability in spatial-augmented-reality-based human face projection.
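To illustrate the Bezier-interpolation warping step, here is a minimal sketch that deforms a projection image with a single bicubic Bezier patch over a 4x4 control grid. The control-point values are hypothetical; the paper's estimation of control points from the reconstructed surface is not shown.

```python
# Sketch of warping a projection image with a bicubic Bezier patch.
import numpy as np
import cv2

def bernstein(t):
    """Cubic Bernstein basis values for parameters t in [0, 1]."""
    return np.stack([(1-t)**3, 3*t*(1-t)**2, 3*t**2*(1-t), t**3], axis=-1)

def bezier_warp(img, ctrl):  # ctrl: (4, 4, 2) control points in pixels
    h, w = img.shape[:2]
    u = bernstein(np.linspace(0, 1, w))      # (w, 4)
    v = bernstein(np.linspace(0, 1, h))      # (h, 4)
    # Tensor-product surface: map each output pixel to a source location.
    grid = np.einsum('hi,ijc,wj->hwc', v, ctrl, u).astype(np.float32)
    return cv2.remap(img, grid[..., 0], grid[..., 1], cv2.INTER_LINEAR)

img = np.zeros((480, 640, 3), np.uint8)
# Identity-like control grid, slightly perturbed to simulate a correction.
gx, gy = np.meshgrid(np.linspace(0, 639, 4), np.linspace(0, 479, 4))
ctrl = np.stack([gx, gy], axis=-1)
ctrl[1:3, 1:3] += 15.0                       # hypothetical distortion offset
warped = bezier_warp(img, ctrl)
```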
The combination of health and entertainment has become possible thanks to the development of wearable augmented reality equipment and corresponding application software. In this paper, we implement a fast calibration extended from SPAAM for an optical see-through head-mounted display (OST-HMD) built in our lab. During calibration, tracking and recognition techniques based on natural targets are used, and the corresponding spatial points are placed at dispersed, well-distributed positions. We evaluate the precision of this calibration over view angles ranging from 0 to 70 degrees. Based on these results, we calculate the position of the eyes relative to the world coordinate system and render 3D objects of arbitrary complexity on the OST-HMD in real time, accurately matched to the real world. Finally, based on user feedback, we report the degree of satisfaction with our device in combining entertainment with the prevention of cervical vertebra diseases.
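One way to recover the eye position from a SPAAM-style calibration is to read it off the 3x4 projection matrix P = [M | p4], whose center of projection is C = -M⁻¹p4. A minimal sketch with a hypothetical matrix:

```python
# Sketch of recovering the eye (camera) center from a SPAAM-style 3x4
# projection matrix P = [M | p4]: the center is C = -M^{-1} p4.
import numpy as np

def eye_center(P):
    M, p4 = P[:, :3], P[:, 3]
    return -np.linalg.solve(M, p4)    # eye position in world coordinates

# Hypothetical projection matrix, for illustration only.
P = np.array([[800.0,   0.0, 320.0, 100.0],
              [  0.0, 800.0, 240.0,  50.0],
              [  0.0,   0.0,   1.0,   0.5]])
print(eye_center(P))                  # [0.075, 0.0875, -0.5] in world units
```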
As one of the most popular immersive virtual reality (VR) systems, a stereoscopic cave automatic virtual environment (CAVE) typically consists of four to six 3 m by 3 m rear-projected screens forming the sides of a room. While many endeavors have been made to reduce the size of the projection-based CAVE system, the asthenopia caused by lengthy exposure to stereoscopic images at such a close viewing distance has seldom been addressed. In this paper, we propose a lightweight approach that uses a convex eyepiece to reduce the visual discomfort induced by stereoscopic vision. An empirical experiment was conducted to examine the feasibility of the convex eyepiece over a large depth of field (DOF) at close viewing distance, both objectively and subjectively. The results show the positive effect of the convex eyepiece on relieving eyestrain.
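The optical intuition can be checked with the thin-lens equation: a screen placed inside the eyepiece's focal length forms a magnified virtual image farther away, relaxing accommodation. The numbers below are hypothetical and only illustrate the effect, not the paper's actual eyepiece.

```python
# Thin-lens sketch: 1/f = 1/d_object + 1/d_image (real-is-positive
# convention); a negative image distance means a virtual image.
def virtual_image_distance(d_screen_m, focal_m):
    inv_v = 1.0 / focal_m - 1.0 / d_screen_m
    return 1.0 / inv_v

# Screen at 1.5 m, hypothetical 2.0 m focal-length convex eyepiece:
v = virtual_image_distance(1.5, 2.0)
print(v)   # -6.0: the virtual image appears 6 m away, beyond the screen
```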
Data-driven bidirectional reflectance distribution function (BRDF) models have been widely used in computer graphics in recent years to achieve highly realistic illumination. A data-driven BRDF model needs many samples under varying lighting and viewing directions, and it is infeasible to deal with such massive datasets directly. This paper proposes a Gaussian process regression framework to describe the BRDF model of a desired material. The Gaussian process (GP), which originates in machine learning, builds a nonlinear regression as a linear combination of data mapped to a high-dimensional space. Theoretical analysis and experimental results show that the proposed GP method provides high prediction accuracy and can be used to describe the surface reflectance model of a material.
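As a minimal sketch of the regression setup, the snippet below fits a Gaussian process to toy BRDF samples with scikit-learn. The two-angle input parameterization and the synthetic glossy lobe are assumptions for illustration, not the paper's actual data or kernel.

```python
# Sketch of GP regression over BRDF samples: (theta_h, theta_d) -> reflectance.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(0)
X = rng.uniform(0.0, np.pi / 2, size=(200, 2))             # toy angle pairs
y = np.exp(-4.0 * X[:, 0]) + 0.01 * rng.normal(size=200)   # toy glossy lobe

gp = GaussianProcessRegressor(
    kernel=RBF(length_scale=0.3) + WhiteKernel(noise_level=1e-4),
    normalize_y=True,
)
gp.fit(X, y)

# Predict reflectance (with uncertainty) for unseen angle pairs.
X_test = np.array([[0.1, 0.5], [1.0, 0.2]])
mean, std = gp.predict(X_test, return_std=True)
```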
3D surface reconstruction is one of the most important topics in spatial augmented reality (SAR). Structured light is a simple and rapid way to reconstruct objects. To improve the precision of 3D reconstruction, we present a high-accuracy multi-view 3D measurement system based on Gray code and phase shifting. We use a camera and a light projector that casts structured-light patterns onto the objects; in this system, the single camera photographs the left and right sides of the object in turn. In addition, we use VisualSFM to recover the relationships between viewpoints, so camera calibration can be omitted and camera placement is no longer restricted. We also set an appropriate exposure time to make the scenes covered by Gray-code patterns more recognizable. Together, these measures make the reconstruction more precise. We conducted experiments on different kinds of objects, and a large number of experimental results verify the feasibility and high accuracy of the system.
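The core decoding math behind Gray-code plus phase-shift systems is standard: Gray codes give a coarse absolute stripe index, and a four-step phase shift refines it to subpixel precision. A minimal sketch follows; pattern capture, binarization thresholds, and the stripe-to-period alignment are simplified assumptions.

```python
# Sketch of Gray-code + 4-step phase-shift decoding.
import numpy as np

def decode_gray(bits):                # bits: (n_patterns, H, W) in {0, 1}
    """Convert per-pixel Gray-code bits to integer stripe indices."""
    binary = np.zeros_like(bits)
    binary[0] = bits[0]
    for i in range(1, len(bits)):
        binary[i] = binary[i - 1] ^ bits[i]        # Gray -> binary
    weights = 2 ** np.arange(len(bits) - 1, -1, -1)
    return np.tensordot(weights, binary, axes=1)   # (H, W) stripe index

def wrapped_phase(I):                 # I: (4, H, W), shifted by pi/2 each
    """Phase from four sinusoidal captures I_k = A + B*cos(phi + k*pi/2)."""
    return np.arctan2(I[3] - I[1], I[0] - I[2])    # in (-pi, pi]

def absolute_phase(stripe, phi):
    """Unwrap, assuming each Gray-code stripe spans one 2*pi phase period."""
    return 2 * np.pi * stripe + (phi + np.pi)
```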
This paper presents a new type of application that links the non-digital world with the digital world through photo sensors. Tangible user interfaces (TUIs) have emerged as a novel interface pattern that combines users' knowledge of physical interaction with digital information. Meanwhile, interactive tabletops with optical sensors show great potential in entertainment, display, and education. Our system fuses the tangible user interface and tabletop interaction into an organic whole; with low power consumption and high brightness, it identifies different physical entities.
The head-mounted display (HMD) is an important virtual reality device that plays a vital role in VR application systems. Traditional HMDs are difficult to apply in daily life because of their disadvantages in price and performance; in contrast, our new universal, smart helmet-mounted display with a large field of view (FOV) takes excellent performance and widespread adoption as its starting point. By adopting a simplified visual system and a transflective system that combines transmission-type and reflection-type display through transflective glass based on the Huygens-Fresnel principle, we have designed an HMD with a wide field of view that is easy to promote and popularize. Its resolution is 800×600, its field of view is 36.87° (vertical) × 47.92° (horizontal), and it weighs only 1080 g, comparable to the most advanced designs worldwide.
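For context, the angular resolution implied by the listed specifications can be checked directly; this is purely illustrative arithmetic, not a figure from the paper.

```python
# Angular resolution implied by 800x600 pixels over 47.92 x 36.87 degrees.
h_ppd = 800 / 47.92    # ~16.7 pixels per degree horizontally
v_ppd = 600 / 36.87    # ~16.3 pixels per degree vertically
print(round(h_ppd, 1), round(v_ppd, 1))
```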
Free-form-surface prism (FFSP) based HMDs have broad application value and prospects in the fields of virtual reality and augmented reality. A key problem of FFSP-based HMDs is the correction of optical distortions. Distortion correction can be performed with an additional optical system, but that adds expense and complexity to the HMD. This paper presents a software-based method to correct optical distortions in FFSP-based HMDs, in which a distortion map with predistortion information is constructed to correct the distortion and a pixel-fusion process is performed to improve the quality of the predistorted image. Both the correction and the fusion are carried out on the GPU in real time. The performance of the proposed method is analyzed and validated with an inspection system in which a high-performance CCD camera evaluates the results of the correction and fusion.
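To make the predistortion-map idea concrete, the sketch below builds a per-pixel lookup map and applies it on the CPU with OpenCV; the paper performs the equivalent lookup (plus pixel fusion) in a GPU shader. The simple radial model and its coefficient are hypothetical stand-ins for the FFSP's measured distortion.

```python
# Sketch of applying a precomputed predistortion map with cv2.remap.
import numpy as np
import cv2

h, w = 600, 800
ys, xs = np.mgrid[0:h, 0:w].astype(np.float32)
# Normalized coordinates centered on the optical axis.
nx, ny = (xs - w / 2) / (w / 2), (ys - h / 2) / (h / 2)
r2 = nx**2 + ny**2
k = -0.18                          # hypothetical radial coefficient
# Distortion map: where each output pixel samples the undistorted image.
map_x = ((nx * (1 + k * r2)) * (w / 2) + w / 2).astype(np.float32)
map_y = ((ny * (1 + k * r2)) * (h / 2) + h / 2).astype(np.float32)

img = np.zeros((h, w, 3), np.uint8)
# Bilinear resampling plays the role of a simple pixel-fusion step here.
predistorted = cv2.remap(img, map_x, map_y, cv2.INTER_LINEAR)
```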
A surgery navigation system based on augmented reality is presented. The system builds on 3D visualization and 3D registration techniques with an infrared tracking device and a 3D scanner. After reconstructing the 3D model of the patient's organs and scanning the surface of the patient's face, the system uses the Iterative Closest Point (ICP) algorithm to calculate the transformation between the patient's 3D model and the 3D scanner. During surgery navigation, the 3D model can be overlaid onto the image of the real patient. The proposed system does not require attached markers because of the adoption of the 3D scanner. Experimental results show that the tracking accuracy of the system meets the requirements of actual surgery and can reduce the risk of endoscopic surgery.
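A minimal sketch of the ICP registration step using Open3D is shown below. The file names, correspondence distance, and identity initialization are assumptions for illustration; the paper's own ICP implementation and pre-alignment may differ.

```python
# Sketch of ICP registration between a reconstructed patient model and a
# scanned face surface using Open3D.
import numpy as np
import open3d as o3d

model = o3d.io.read_point_cloud("patient_model.ply")   # hypothetical file
scan = o3d.io.read_point_cloud("face_scan.ply")        # hypothetical file

result = o3d.pipelines.registration.registration_icp(
    model,                            # source: patient 3D model
    scan,                             # target: scanner point cloud
    5.0,                              # max correspondence distance (mm), assumed
    np.eye(4),                        # coarse pre-alignment assumed
    o3d.pipelines.registration.TransformationEstimationPointToPoint(),
)
print(result.transformation)          # 4x4 model -> scanner-space transform
print(result.inlier_rmse)             # registration residual
```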
Mixed reality technologies have been studied for many years and can now be applied in many aspects of daily life. Generally, appropriate display devices and registration methods are the key factors for the successful application of a mixed reality system. Over the past decade, various types of display systems have been developed at Beijing Institute of Technology, and many of them have been successfully employed in different mixed reality applications. In this paper, we give a brief introduction to the display systems and their corresponding tracking approaches developed and realized at Beijing Institute of Technology for mixed reality applications. These technologies include an interactive projection system based on motion detection, a fixed-position viewing system, an ultra-light wide-angle head-mounted display system, and a volumetric 3D display system.
A dynamic augmented reality (AR) system, AR-PAINTER, is presented in this paper: an indoor, real-time, multi-user system for painting virtual textures and images onto the surface of a real object. The system uses a vision-based algorithm that tracks markers with special features on the real object for registration. The registration algorithm and the generation of the virtual image are discussed. Experimental results are also presented and validate the feasibility of the proposed AR system for other applications.
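Marker-based registration of this kind is commonly realized with a perspective-n-point solve: given 2D detections of known 3D marker points, recover the object pose for rendering the overlay. Below is a minimal OpenCV sketch; all coordinates and the camera matrix are hypothetical, and the paper's own tracking algorithm may differ.

```python
# Sketch of marker-based registration via cv2.solvePnP.
import numpy as np
import cv2

object_pts = np.array([[0, 0, 0], [10, 0, 0], [10, 10, 0], [0, 10, 0]],
                      dtype=np.float32)           # marker corners (cm), assumed
image_pts = np.array([[320, 240], [420, 238], [424, 338], [318, 342]],
                     dtype=np.float32)            # detected pixels, assumed
K = np.array([[800, 0, 320], [0, 800, 240], [0, 0, 1]],
             dtype=np.float32)                    # hypothetical camera matrix

ok, rvec, tvec = cv2.solvePnP(object_pts, image_pts, K, None)
R, _ = cv2.Rodrigues(rvec)        # rotation matrix for rendering the overlay
```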