Shadows in images degrade image quality and hinder the performance of downstream visual tasks. Despite significant progress, existing shadow removal methods are limited in synthesizing high-frequency details, resulting in insufficient synthesis quality. In particular, most methods based on Generative Adversarial Networks (GANs), while capable of removing shadows, often introduce artifacts and blurring, and underperform especially on high-frequency signals, which impairs the clarity of the de-shadowed image. To address these issues, we propose a high-low frequency guided GAN for image shadow removal. Concretely, our method decomposes the extracted features into multiple frequency components (i.e., low and high frequency) and preserves the image's contours and structural information through low-frequency attention skip connections. Meanwhile, high-frequency attention skip connections supply the generator with rich frequency information, easing the difficulty of synthesizing details. Additionally, we introduce a wavelet-based frequency loss that reduces the discrepancy between the generated image and the real image in the frequency domain, effectively mitigating artifacts and blurring. Finally, extensive experiments on the ISTD, AISTD, and SRD datasets demonstrate the effectiveness of our proposed method, which not only removes shadows efficiently but also markedly improves the high-frequency detail quality of the de-shadowed images.
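The idea behind a wavelet-based frequency loss can be sketched with a one-level Haar decomposition: split both the generated and real images into low- and high-frequency sub-bands and penalize their L1 difference. This is a minimal illustration assuming a single decomposition level and equal sub-band weights; the paper's exact wavelet and weighting are not specified here, and the function names are ours:

```python
import numpy as np

def haar_dwt2(img):
    """One-level 2-D Haar wavelet transform of an (H, W) image with even
    dimensions; returns the LL, LH, HL, HH sub-bands."""
    a = (img[0::2] + img[1::2]) / 2.0   # row averages (low-pass vertically)
    d = (img[0::2] - img[1::2]) / 2.0   # row differences (high-pass vertically)
    ll = (a[:, 0::2] + a[:, 1::2]) / 2.0
    lh = (a[:, 0::2] - a[:, 1::2]) / 2.0
    hl = (d[:, 0::2] + d[:, 1::2]) / 2.0
    hh = (d[:, 0::2] - d[:, 1::2]) / 2.0
    return ll, lh, hl, hh

def wavelet_frequency_loss(pred, target):
    """Mean L1 distance between wavelet sub-bands of the generated and
    real images, penalizing frequency-domain discrepancies."""
    return sum(np.abs(p - t).mean()
               for p, t in zip(haar_dwt2(pred), haar_dwt2(target)))
```

Because the high-frequency sub-bands (LH, HL, HH) enter the sum explicitly, blurring that an image-space L1 loss barely notices produces a large penalty here.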
Dynamic gesture recognition is an important interaction method in human-computer interaction. Current research often trains on multi-modal data with three-dimensional convolutional neural networks; although recognition accuracy is high and robustness is good, the parameter count and computational cost are large. To address this, a dynamic gesture recognition method based on the Temporal Shift Module (TSM) is proposed, built on a two-dimensional convolutional neural network. The method uses PyConvResNet-50 as the backbone network, adds the TSM for information exchange along the time dimension, embeds the Motion Excitation (ME) module into the TSM to enhance short-term temporal modeling, and finally uses a 2D-FCN for spatiotemporal feature fusion and classification. Experimental results show that the model reaches a recognition accuracy of 96.49% on the large-scale gesture dataset Jester, comparable to that of three-dimensional convolutional neural networks, while reducing the computation by 63%. The method is well suited to gesture recognition applications that demand high real-time performance.
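The core TSM operation is parameter-free: before each 2-D convolution, a fraction of the channels is shifted forward or backward along the time axis so neighboring frames exchange information. A minimal NumPy sketch, assuming the common TSM setting of shifting 1/8 of the channels in each direction (the function name and layout are illustrative, not from the paper):

```python
import numpy as np

def temporal_shift(x, shift_div=8):
    """Temporal Shift Module: shift a fraction of channels along the time
    axis so a 2-D CNN can mix information across frames.
    x has shape (T, C, H, W); no parameters or extra FLOPs are added."""
    t, c, h, w = x.shape
    fold = c // shift_div
    out = np.zeros_like(x)
    out[:-1, :fold] = x[1:, :fold]              # first chunk: shift back in time
    out[1:, fold:2 * fold] = x[:-1, fold:2 * fold]  # second chunk: shift forward
    out[:, 2 * fold:] = x[:, 2 * fold:]         # remaining channels unshifted
    return out

# usage: a clip of 4 frames, 8 channels, 2x2 spatial resolution
x = np.arange(4 * 8 * 2 * 2, dtype=np.float32).reshape(4, 8, 2, 2)
y = temporal_shift(x)
```

Because the shift is a pure memory move, the per-layer cost stays that of a 2-D convolution, which is where the reported 63% computation saving over 3-D convolution comes from.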
Anchor-based deep face detectors have demonstrated good performance with recent advances, but tiny faces and partially occluded faces remain difficult to detect under varying pose, lighting, occlusion, and other factors. We present ESSFD, an enhanced single-stage face detector, to increase the robustness of face detection in complex environments. The key points are: (1) enhancing feature extraction by using an optimized ResNet-D network as the backbone and adding atrous convolution layers to supplement the feature pyramid; (2) using a combination of Gaussian blur and color jitter for data augmentation, lessening the impact of environmental factors in the image and increasing the model's robustness; (3) fine-tuning the training parameters of the model. Under unusual poses, complicated lighting, and partial occlusion, ESSFD improves the detection accuracy of tiny faces compared with RetinaFace. Experiments show that on the easy and medium subsets of the WIDER FACE dataset, the Average Precision (AP) of the ESSFD detector is around 1% higher than that of the state-of-the-art RetinaFace detector (AP = 95.373%), and on the hard subset, dominated by tiny faces, it is around 2% higher.
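The Gaussian-blur-plus-color-jitter augmentation in point (2) can be sketched as below; the kernel size, jitter ranges, and function name are illustrative assumptions rather than the paper's exact settings:

```python
import numpy as np

def gaussian_blur_color_jitter(img, rng, sigma_max=1.5, jitter=0.2):
    """Augmentation to harden a detector against environmental factors:
    random separable Gaussian blur, then random brightness/contrast jitter.
    img: (H, W, 3) float array in [0, 1]."""
    sigma = rng.uniform(0.0, sigma_max)
    if sigma > 1e-3:
        r = int(3 * sigma)                       # truncate kernel at 3 sigma
        xs = np.arange(-r, r + 1)
        k = np.exp(-xs**2 / (2 * sigma**2))
        k /= k.sum()
        for axis in (0, 1):                      # separable blur: rows, then columns
            img = np.apply_along_axis(
                lambda v: np.convolve(v, k, mode='same'), axis, img)
    brightness = rng.uniform(-jitter, jitter)
    contrast = rng.uniform(1 - jitter, 1 + jitter)
    return np.clip((img - 0.5) * contrast + 0.5 + brightness, 0.0, 1.0)
```

Drawing a fresh sigma and jitter per sample means the detector never sees the exact same degradation twice, which is what makes the augmentation act as a regularizer.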
In VR/AR human-machine interaction, natural and simple dynamic gesture recognition has attracted much attention. To improve the accuracy of dynamic gesture recognition in human-machine interaction, this paper proposes a new dynamic gesture recognition method, FPN-3DResNeXt, which combines a two-stream three-dimensional convolutional neural network (3DResNeXt) with a feature pyramid network (FPN). The method improves the structure of the 3DResNeXt network by adding a feature pyramid and channel attention and optimizing the model parameters, which improves recognition accuracy; to improve the convergence speed and stability of the model, batch normalization (BN) is added to further optimize the network and reduce training time. Experimental results on the EgoGesture dataset show that the proposed method achieves a dynamic gesture recognition rate of 95.30%, which is 2.1% higher than the gesture recognition method based on 3DResNeXt alone, and that it is more stable than the various 3D convolution methods compared against.
Kinect is a motion-sensing input device widely used in computer vision and related fields. However, Kinect depth images contain many inaccurate depth values, even with Kinect v2. In this paper, an algorithm is proposed to enhance Kinect v2 depth images. Based on the principle of its depth measurement, the foreground and the background are treated separately. For the background, holes are filled according to the depth data in the neighborhood. For the foreground, a filling algorithm guided by the color image, using both spatial and color information, is proposed. An adaptive joint bilateral filter is then applied to reduce noise. Experimental results show that the processed depth images have a clean background and clear edges, outperforming traditional strategies. The method can pre-process depth images in real time for 3D reconstruction and yields accurate results.
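The joint bilateral filtering step, in which the color image guides the smoothing of the depth map, can be sketched as a plain (non-adaptive) reference implementation; the parameter values and function name are illustrative:

```python
import numpy as np

def joint_bilateral_filter(depth, color, radius=2, sigma_s=2.0, sigma_r=10.0):
    """Joint (cross) bilateral filter: smooth the depth map with weights
    combining spatial distance and intensity similarity in the color guide.
    depth, color: (H, W) float arrays; an O(H*W*r^2) reference version."""
    h, w = depth.shape
    out = np.zeros_like(depth)
    ys, xs = np.mgrid[-radius:radius + 1, -radius:radius + 1]
    spatial = np.exp(-(ys**2 + xs**2) / (2 * sigma_s**2))   # fixed spatial kernel
    pad_d = np.pad(depth, radius, mode='edge')
    pad_c = np.pad(color, radius, mode='edge')
    for i in range(h):
        for j in range(w):
            dwin = pad_d[i:i + 2 * radius + 1, j:j + 2 * radius + 1]
            cwin = pad_c[i:i + 2 * radius + 1, j:j + 2 * radius + 1]
            # range weight from the COLOR image, so depth edges follow color edges
            range_w = np.exp(-(cwin - color[i, j])**2 / (2 * sigma_r**2))
            wgt = spatial * range_w
            out[i, j] = (wgt * dwin).sum() / wgt.sum()
    return out
```

Taking the range weight from the color image rather than from the noisy depth itself is what keeps depth edges aligned with object boundaries; an adaptive variant, as in the paper, would additionally vary the sigmas per pixel.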
This study investigated how the density and dot size of static random-dot stereograms (RDS) influence stereoscopic cognition. Each subject's responses were recorded at four RDS densities (10%, 20%, 30%, 40%) and three dot sizes (2×2 px, 3×3 px, 4×4 px). The results showed that reaction times decreased with increasing RDS density and dot size: reaction time was shortest at a density of 30%, and mean response time was shortest at a dot size of 4×4 px. Therefore, an RDS density of 30% combined with a dot size of 4×4 px was the most favorable for stereoscopic cognition.
Visualization of water surfaces is a hot topic in computer graphics. In this paper, we present a fast method to generate a wide expanse of water surface with good image quality both near and far from the viewpoint. The method models the water surface with a uniform mesh and fractal Perlin noise. Mipmapping is applied to the surface textures, which adjusts texture resolution with distance from the viewpoint and reduces computing cost. Lighting is computed using shadow mapping, Snell's law, and the Fresnel term. The render pipeline uses a CPU-GPU shared memory structure, which improves rendering efficiency. Experimental results show that our approach visualizes water surfaces with good image quality at real-time frame rates.
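Fractal noise modeling of the height field can be illustrated by summing octaves of a lattice noise at increasing frequency and decreasing amplitude (fractional Brownian motion). For brevity the sketch uses value noise in place of true Perlin gradient noise; the fBm summation is the same, and all names and constants are illustrative:

```python
import numpy as np

def value_noise(x, z, seed=0):
    """Smooth pseudo-random lattice noise in [0, 1] (stand-in for Perlin)."""
    def hash2(ix, iz):
        h = (ix * 374761393 + iz * 668265263 + seed * 1442695041) & 0xFFFFFFFF
        h = ((h ^ (h >> 13)) * 1274126177) & 0xFFFFFFFF
        return (h & 0xFFFF) / 65535.0
    ix, iz = np.floor(x).astype(int), np.floor(z).astype(int)
    fx, fz = x - ix, z - iz
    fx, fz = fx * fx * (3 - 2 * fx), fz * fz * (3 - 2 * fz)  # smoothstep blend
    n00, n10 = hash2(ix, iz), hash2(ix + 1, iz)
    n01, n11 = hash2(ix, iz + 1), hash2(ix + 1, iz + 1)
    return (n00 * (1 - fx) + n10 * fx) * (1 - fz) + (n01 * (1 - fx) + n11 * fx) * fz

def fractal_height(x, z, octaves=4, lacunarity=2.0, gain=0.5):
    """Fractal (fBm) water height: sum octaves of noise, doubling frequency
    and halving amplitude each octave."""
    h, amp, freq = 0.0, 1.0, 1.0
    for _ in range(octaves):
        h += amp * value_noise(x * freq, z * freq)
        amp *= gain
        freq *= lacunarity
    return h
```

Evaluating `fractal_height` at each uniform-mesh vertex yields the displaced water surface; low octaves give broad swells, high octaves add fine ripples.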
Shadow mapping is commonly used in real-time rendering. In this paper, we present an accurate and efficient method for generating soft shadows from planar area lights. The method first generates a depth map from the light's view and identifies the depth-discontinuous areas and shadow boundaries. These areas are then encoded as binary values in a texture called the binary light-visibility map, and a GPU-based parallel convolution filtering algorithm smooths the boundaries with a box filter. Experiments show that our algorithm is an effective shadow-map-based method that produces perceptually accurate soft shadows in real time, with more detail at shadow boundaries than previous works.
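The smoothing step can be illustrated with a reference CPU version of the box filter applied to the binary light-visibility map; a GPU implementation would run the same convolution in parallel per texel, typically as two separable passes. The function name and radius are illustrative:

```python
import numpy as np

def box_filter_visibility(vis_map, radius=2):
    """Smooth a binary light-visibility map with a (2r+1)x(2r+1) box filter
    so hard 0/1 shadow boundaries become fractional penumbra values."""
    k = 2 * radius + 1
    pad = np.pad(vis_map.astype(float), radius, mode='edge')
    h, w = vis_map.shape
    out = np.zeros((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = pad[i:i + k, j:j + k].mean()  # local average of visibility
    return out
```

Texels deep inside the lit or shadowed region stay at 1 or 0; only boundary texels take intermediate values, which is exactly the soft-edge behavior the binary map is filtered to produce.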
We introduce an algorithm for real-time, sub-pixel-accurate hard shadow rendering. The method addresses the shadow aliasing caused by the limited resolution of shadow maps. We store a partial, approximate geometric representation of the scene surfaces visible to the light source. Since aliasing occurs in the shadow silhouette regions, we present an edge detection algorithm that uses second-order Newton's divided differences to partition the shadow map into depth-discontinuous and depth-continuous regions. A tangent estimation method based on the geometry shadow map is then used to remove the aliasing artifacts in those silhouette regions. Experiments show that our algorithm eliminates the resolution issues and generates high-quality hard shadows.
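For uniformly spaced shadow-map samples, the second-order Newton divided difference reduces, up to a constant factor, to the central second difference of depth, so depth-discontinuous texels can be flagged wherever it is large. A minimal sketch with an illustrative threshold and function name:

```python
import numpy as np

def depth_discontinuity_mask(depth, threshold=1.0):
    """Flag shadow-map texels whose second-order divided difference of depth
    is large, i.e. where depth is not locally planar; these form the
    depth-discontinuous (silhouette) region, the rest is depth-continuous."""
    # central second differences along x and y (interior texels only)
    d2x = np.abs(depth[:, 2:] - 2 * depth[:, 1:-1] + depth[:, :-2])
    d2y = np.abs(depth[2:, :] - 2 * depth[1:-1, :] + depth[:-2, :])
    mask = np.zeros(depth.shape, dtype=bool)
    mask[:, 1:-1] |= d2x > threshold
    mask[1:-1, :] |= d2y > threshold
    return mask
```

A planar surface seen by the light has a linear depth profile, so its second difference vanishes and it is correctly left in the depth-continuous region; only genuine silhouettes survive the threshold and receive the more expensive tangent-based reconstruction.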