Face anti-spoofing (FAS) is crucial for safe and reliable biometric systems. In recent years, deep neural networks have been proven to be very effective for FAS as compared with classical approaches. However, deep learning-based FAS methods are data-driven and use learning-based features only. It is a legitimate question to ask whether hand-crafted features can provide any complementary information to a deep learning-based FAS method. To answer this question, we propose a two-stream network that consists of a convolutional network and a local difference network. To be specific, we first build a texture extraction convolutional block to calculate the gradient magnitude at each pixel of an input image. Our experiments demonstrate that additional liveness cues can be captured by the proposed method. Second, we design an attention fusion module to combine the features obtained from the RGB domain and gradient magnitude domain, aiming for discriminative information mining and information redundancy elimination. Finally, we advocate a simple binary facial mask supervision strategy for further performance boost. The proposed network has only 2.79M parameters and the inference speed is up to 118 frames per second, which makes it very convenient for real-time FAS systems. The experimental results obtained on several well-known benchmarking datasets demonstrate the merits and superiority of the proposed method over the state-of-the-art approaches.
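As an illustration of the gradient-magnitude texture cue described above, the following is a minimal NumPy/SciPy sketch; the Sobel kernels, function names, and input handling are our own assumptions rather than the paper's exact convolutional block.

import numpy as np
from scipy.signal import convolve2d

# Assumed Sobel kernels for the texture-extraction step (illustrative only).
SOBEL_X = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]], dtype=np.float32)
SOBEL_Y = SOBEL_X.T

def gradient_magnitude(gray):
    # Per-pixel gradient magnitude of a single-channel face crop.
    gx = convolve2d(gray, SOBEL_X, mode="same", boundary="symm")
    gy = convolve2d(gray, SOBEL_Y, mode="same", boundary="symm")
    return np.sqrt(gx ** 2 + gy ** 2)

# Example: a random 256x256 "face crop"; the result would feed the
# local-difference stream alongside the RGB stream.
img = np.random.rand(256, 256).astype(np.float32)
grad = gradient_magnitude(img)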
We propose a photorealistic style transfer network to emphasize the natural effect of photorealistic image stylization. In general, distortion of the image content and a lack of details are two typical issues in the style transfer field. To this end, we design a framework employing the U-Net structure to maintain rich spatial clues, with a multi-layer feature aggregation (MFA) method to simultaneously provide the details obtained by the shallow layers during stylization. In particular, an encoder based on dense blocks and a decoder, forming the symmetrical structure of a U-Net, are jointly stacked to realize effective feature extraction and image reconstruction. In addition, a transfer module based on MFA and “adaptive instance normalization” is inserted at the skip connection positions to achieve the stylization. Accordingly, the stylized image possesses the texture of a real photo and preserves rich content details without introducing any mask or postprocessing steps. The experimental results on public datasets demonstrate that our method achieves a more faithful structural similarity with a lower style loss, reflecting the effectiveness and merit of our approach.
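Since the transfer module relies on adaptive instance normalization (AdaIN), a minimal sketch of the standard AdaIN operation is given below; the (C, H, W) feature layout and epsilon value are assumptions, and the paper's module additionally aggregates multi-layer features.

import numpy as np

def adain(content, style, eps=1e-5):
    # Align the per-channel mean/std of content features to those of style features.
    c_mean = content.mean(axis=(1, 2), keepdims=True)
    c_std = content.std(axis=(1, 2), keepdims=True) + eps
    s_mean = style.mean(axis=(1, 2), keepdims=True)
    s_std = style.std(axis=(1, 2), keepdims=True) + eps
    return s_std * (content - c_mean) / c_std + s_mean

c = np.random.rand(64, 32, 32)   # content features at one encoder layer
s = np.random.rand(64, 32, 32)   # style features at the same layer
stylized = adain(c, s)           # passed through the skip connection to the decoder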
Recently, deep learning has become a rapidly developing tool in the field of image fusion. An innovative image fusion method for fusing infrared images and visible-light images is proposed. The backbone network is an autoencoder. Different from previous autoencoders, the information extraction capability of the encoder is enhanced, and the ability to select the most effective channels in the decoder is optimized. First, the features of the source image are extracted during the encoding process. Then, a new effective fusion strategy is designed to fuse these features. Finally, the fused image is reconstructed by the decoder. Compared with the existing fusion methods, the proposed algorithm achieves state-of-the-art performance in both objective evaluation and visual quality.
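The abstract does not spell out the fusion strategy, so the sketch below uses a common activity-level weighting (per-pixel L1 activity with soft weights) purely as a placeholder for how encoder features from the two modalities could be combined before decoding.

import numpy as np

def fuse_features(feat_ir, feat_vis):
    # Weight each source by its per-pixel L1 activity across channels (illustrative).
    act_ir = np.abs(feat_ir).sum(axis=0, keepdims=True)
    act_vis = np.abs(feat_vis).sum(axis=0, keepdims=True)
    w_ir = act_ir / (act_ir + act_vis + 1e-8)
    return w_ir * feat_ir + (1.0 - w_ir) * feat_vis

f_ir = np.random.rand(64, 64, 64)    # encoder features of the infrared image
f_vis = np.random.rand(64, 64, 64)   # encoder features of the visible image
fused = fuse_features(f_ir, f_vis)   # reconstructed into the fused image by the decoder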
Three-dimensional (3-D) face reconstruction is an important task in the field of computer vision. Although 3-D face reconstruction has been developing rapidly in recent years, large-pose face reconstruction is still a challenge, because much of the information about a face in a large pose is unobservable. To address this issue, we propose a 3-D face reconstruction algorithm (PIFR) based on a 3-D morphable model. A model alignment formulation is developed in which the original image and a normalized frontal image are combined to define a weighted loss in a landmark fitting process, with the intuition that the original image provides more expression and pose information, whereas the normalized image provides more identity information. Our method overcomes the difficulty that traditional single-image reconstruction methods have with large poses, works on arbitrary poses and expressions, and greatly improves the accuracy of reconstruction. Experiments on the challenging AFW, LFPW, and AFLW databases show that our algorithm significantly improves the accuracy of 3-D face reconstruction even under extreme poses (±90° yaw angles).
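A minimal sketch of the weighted landmark-fitting objective described above is given below; the projection inputs, landmark arrays, and the equal weights are placeholders, not the paper's exact formulation.

import numpy as np

def pifr_landmark_loss(proj_orig, proj_front, lm_orig, lm_front,
                       w_orig=0.5, w_front=0.5):
    # proj_*: (N, 2) projected 3DMM landmarks in each view.
    # lm_*:   (N, 2) detected 2-D landmarks in each view.
    e_orig = np.sum((proj_orig - lm_orig) ** 2)     # pose/expression cues
    e_front = np.sum((proj_front - lm_front) ** 2)  # identity cues
    return w_orig * e_orig + w_front * e_front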
We propose a deep learning network, L1-2D2PCANet, for face recognition, which is based on L1-norm-based two-dimensional principal component analysis (L1-2DPCA). In our network, the role of L1-2DPCA is to learn the filters of multiple convolution layers. After the convolution layers, we deploy binary hashing and blockwise histograms for pooling. We test our network on several benchmark facial datasets, including Yale, the AR face database, extended Yale B, labeled faces in the wild-aligned, and the Face Recognition Technology database, with a convolutional neural network, PCANet, 2DPCANet, and L1-PCANet as baselines. The results show that the recognition performance of L1-2D2PCANet is better than that of the baseline networks in all tests, especially when there are outliers in the test data. Owing to the L1-norm, L1-2D2PCANet is robust to outliers and changes in the training images.
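For concreteness, the following is a sketch of PCANet-style binary hashing and blockwise histogram pooling, the pooling scheme named above; the block size, the number of filter maps, and the use of non-overlapping blocks are assumptions.

import numpy as np

def hash_and_histogram(maps, block=8):
    # maps: (L, H, W) responses of L second-stage filters for one image.
    L, H, W = maps.shape
    # Binarize and combine the L maps into a single decimal-coded map.
    coded = np.zeros((H, W), dtype=np.int64)
    for i in range(L):
        coded += (maps[i] > 0).astype(np.int64) << i
    # Blockwise histograms of the coded values form the final feature vector.
    feats = []
    for y in range(0, H - block + 1, block):
        for x in range(0, W - block + 1, block):
            hist, _ = np.histogram(coded[y:y + block, x:x + block],
                                   bins=2 ** L, range=(0, 2 ** L))
            feats.append(hist)
    return np.concatenate(feats)

responses = np.random.randn(8, 64, 64)    # e.g. L = 8 filter responses
feature = hash_and_histogram(responses)   # fed to the final classifier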
Uncontrolled illumination poses severe problems for face recognition in practical application scenarios. Many
techniques to deal with this problem rely on illumination modeling and face relighting. In this paper we propose a
robust approach to face albedo estimation in the framework of illumination modeling with Spherical Harmonics.
This technique requires only a single face image under arbitrary illumination and assumes the face shape is known.
The recovered face albedo facilitates face rendering under new illumination conditions which is useful for both
illumination invariant face recognition and computer animation. In the proposed approach, the consequences of violating the assumptions underlying the spherical harmonics model are mitigated by minimising a cost function involving robust forms of both the spherical harmonics modelling error and the smoothness constraint.
The robust estimation provides significantly better results than the traditional Least Squares Estimation in the
experiments on a 3D face database.
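A minimal sketch of the kind of robust estimation meant is given below, using iteratively reweighted least squares with a Huber weight on the spherical-harmonics residual; the damping toward the mean albedo is only a crude stand-in for the paper's smoothness constraint, and all shapes and constants are assumptions.

import numpy as np

def huber_weight(r, delta=0.1):
    # IRLS weights corresponding to a Huber penalty on the residual r.
    a = np.abs(r)
    return np.where(a <= delta, 1.0, delta / (a + 1e-12))

def estimate_albedo(I, Y, n_iters=10):
    # I: (P,) observed intensities; Y: (P, 9) SH basis from the known face shape.
    rho = np.ones_like(I)                    # initial albedo
    w = np.ones_like(I)                      # IRLS weights
    for _ in range(n_iters):
        A = Y * rho[:, None]
        sw = np.sqrt(w)
        # Weighted least-squares fit of the 9 lighting coefficients.
        light, *_ = np.linalg.lstsq(A * sw[:, None], I * sw, rcond=None)
        w = huber_weight(I - A @ light)      # re-weight with the robust penalty
        shading = Y @ light
        # Per-pixel albedo update, damped toward its mean as a crude
        # stand-in for the smoothness constraint.
        rho = 0.8 * (I / np.clip(shading, 1e-3, None)) + 0.2 * rho.mean()
    return rho

P = 1000
Y = np.random.randn(P, 9)                    # placeholder SH basis values
I = np.abs(np.random.randn(P))               # placeholder intensities
albedo = estimate_albedo(I, Y)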
KEYWORDS: Biometrics, Quality measurement, Databases, Data fusion, Algorithm development, Information fusion, Signal to noise ratio, Data acquisition, Image fusion
We address the problem of score level fusion of intramodal and multimodal experts in the context of biometric
identity verification. We investigate the merits of confidence-based weighting of component experts. In contrast
to the conventional approach where confidence values are derived from scores, we use instead raw measures of
biometric data quality to control the influence of each expert on the final fused score. We show that quality-based fusion gives better performance than quality-free fusion. The use of quality-weighted scores as features in the
definition of the fusion functions leads to further improvements. We demonstrate that the achievable performance
gain is also affected by the choice of fusion architecture. The evaluation of the proposed methodology involves
six face experts and one speech verification expert. It is carried out on the XM2VTS database.
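As a concrete illustration of using raw quality measures and quality-weighted scores in the fusion function, the sketch below uses a logistic fusion with the quality-weighted scores appended as extra features; the normalization, the logistic form, and the weight values are assumptions, not the evaluated architecture.

import numpy as np

def quality_weighted_fusion(scores, qualities, beta):
    # scores, qualities: (N_experts,); beta: trained fusion weights (placeholder).
    q = qualities / (qualities.sum() + 1e-8)            # normalized quality weights
    features = np.concatenate([scores, q * scores])     # quality-weighted scores as extra features
    z = beta[0] + beta[1:] @ features
    return 1.0 / (1.0 + np.exp(-z))                     # fused acceptance score

scores = np.array([0.7, 0.4, 0.9, 0.6, 0.8, 0.5, 0.65])     # 6 face experts + 1 speech expert
qualities = np.array([0.9, 0.3, 0.8, 0.7, 0.95, 0.5, 0.6])  # raw quality measures
beta = np.full(1 + 2 * len(scores), 0.5)                    # placeholder trained weights
decision_score = quality_weighted_fusion(scores, qualities, beta)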
KEYWORDS: Error analysis, Neural networks, Iris, Tolerancing, Prototyping, Signal processing, Pattern recognition, Monte Carlo methods
We investigate bagging of k-NN classifiers under varying set sizes. For certain set sizes, bagging often under-performs due to population bias. We propose a modification to the standard bagging method designed to avoid population bias. The modification leads to substantial performance gains, especially under very small sample size conditions. The choice of modification method depends on whether prior knowledge exists. If no prior knowledge exists, then ensuring that all classes are represented in the bootstrap set yields the best results.
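A minimal sketch of the "all classes present" modification (the variant recommended when no prior knowledge exists) is given below; the rejection-sampling loop and the tiny synthetic data are illustrative assumptions.

import numpy as np

def stratified_bootstrap(X, y, rng):
    # Resample with replacement, redrawing until every class is represented.
    classes = np.unique(y)
    while True:
        idx = rng.integers(0, len(y), size=len(y))
        if np.all(np.isin(classes, y[idx])):
            return X[idx], y[idx]

rng = np.random.default_rng(0)
X = np.random.rand(20, 4)                 # very small sample size setting
y = np.array([0] * 10 + [1] * 8 + [2] * 2)
Xb, yb = stratified_bootstrap(X, y, rng)  # training set for one k-NN ensemble member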
The current trend in content-based retrieval is the development of object-based systems. Such systems enable users to make higher level queries which are more intuitive to them than queries based on visual primitives. In this paper, we present OVID, our Object-based VIDeo retrieval system. It currently consists of a video parsing module, an annotation module, a user interface and a search mechanism. A combined multiple expert approach is at the heart of the video parsing routine for an improved performance. The annotation module extracts color and texture-based region information which will be used by the neural-network-based search routine at query time. The iconic query paradigm on which the system is based provides users with a flexible means to define object-based queries.
Video database research is commonly concerned with the storage and retrieval of visual information involving sequence segmentation, shot representation and video clip retrieval. In multimedia applications, video sequences are usually accompanied by a sound track. The sound track contains potential cues to aid shot segmentation such as different speakers, background music, singing and distinctive sounds. These different acoustic categories can be modeled to allow for an effective database retrieval. In this paper, we address the problem of automatic segmentation of the audio track of multimedia material. This audio based segmentation can be combined with video scene shot detection in order to achieve partitioning of the multimedia material into semantically significant segments.
Illumination invariance is of paramount importance for annotating video sequences stored in large video databases. However, popular texture analysis methods, such as multichannel filtering techniques, do not yield illumination-invariant texture representations. In this paper, we assess the effectiveness of three illumination normalization schemes for texture representations derived from Gabor filter outputs. The schemes aim at overcoming intensity scaling effects due to changes in illumination conditions. A theoretical analysis and experimental results enable us to select one scheme as the most promising. In this scheme, a normalizing factor is derived at each pixel by combining the energy responses of different filters at that pixel. The scheme overcomes illumination variations well, while still preserving discriminatory textural information. Further statistical analysis may shed light on other interesting properties or limitations of the scheme.
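The selected scheme can be sketched as below: each Gabor energy response at a pixel is divided by a normalizing factor obtained by combining the responses of all filters at that pixel (here their sum), which cancels a multiplicative intensity scaling; the exact combination rule in the paper may differ.

import numpy as np

def normalize_gabor_energies(energies, eps=1e-8):
    # energies: (K, H, W) energy outputs of a bank of K Gabor filters.
    factor = energies.sum(axis=0, keepdims=True) + eps   # per-pixel normalizing factor
    return energies / factor

E = np.abs(np.random.randn(16, 128, 128))     # placeholder filter-bank energies
E_orig = normalize_gabor_energies(E)
E_scaled = normalize_gabor_energies(2.5 * E)  # a global intensity scaling...
assert np.allclose(E_orig, E_scaled)          # ...is removed by the normalization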
In this paper, we propose a framework for a dynamically reconfigurable video codec. We introduce the concept of a virtual codec and a virtual tool to facilitate the use of multiple codec structures and multiple coding tools in a single video codec. Existing coding standards as well as new codec structures and coding tools can be integrated seamlessly into the proposed codec. The experimental results show that, when implementing a standard codec such as H.263, the proposed codec can achieve performance comparable to dedicated standard codecs. The results also demonstrate the codec's ability to use multiple coding tools and codec structures, which is useful for implementing MSDL for MPEG-4 video.
Probabilistic relaxation has been used previously as the basis for the development of an algorithm to label features extracted from an image with corresponding features from a model. The algorithm can be executed in a deterministic manner, making it particularly appropriate for real-time methods. In this paper, we show how the method may be adapted to image sequences, taken from a moving camera, in order to provide navigation information. We show how knowledge of the camera motion can be incorporated into the labelling algorithm in order to provide better real-time performance and improved robustness.
In this paper we are concerned with the efficient coding of image sequences for video-conference applications. In such sequences, large image regions usually undergo a uniform translational motion. Consequently, to maximize coding efficiency and quality, the codec should be able to segment and estimate multiple translational motions accurately and reliably. Following this premise, we propose an algorithm which combines several known and new techniques. Firstly, traditional variable block size motion compensation is used, but with a novel robust motion estimation algorithm. The algorithm can estimate multiple motions to sub-pixel accuracy and also provides a reliable motion segmentation. Whenever multiple motions exist within a block, the motion boundary is recovered and approximated by a straight line. In addition, inter-block motion prediction is used to achieve a further improvement of the compression ratio. A comparison with the H.261 scheme shows that the proposed algorithm produces better results both in terms of PSNR and bit rate. To judge the contribution of the motion segmentation to the overall performance, experiments have been carried out with a variant of the algorithm in which only a single motion within any block is allowed. This incapacitated variant emulates a commonly used approach to variable block size coding. The comparison of the proposed and incapacitated variants shows that the use of motion segmentation can lower the bit rate and deliver better visual quality of the reconstructed image sequence.
We present a method designed to solve the problem of automatic color grading for industrial inspection of textured ceramic tiles. We discuss problems we were confronted with, such as the temporal and spatial variation of the illumination, and the ways we dealt with them. Then, we present results of correctly grading a series of textured ceramic tiles, the differences of which were at the threshold of human perception.
In this paper we present a novel method for mixed pixel classification where the classification of groups of pixels is achieved taking into consideration the higher order moments of the distributions of the pure and the mixed classes. The method is demonstrated using simulated data and is also applied to real Landsat TM data for which ground data are available.
In previous work we presented an algorithm for matching features extracted from an image with those extracted from a model, using a probabilistic relaxation method. Because the algorithm compares each possible match with all other possible matches, the main obstacle to its use on large data sets is that both the computation time and the memory usage are proportional to the square of the number of possible matches. This paper describes some improvements to the algorithm to alleviate these problems. The key sections of the algorithm are the generation, storage, and use of the compatibility coefficients. We describe three different schemes that reduce the number of these coefficients. The execution time is improved in each case, even when the number of iterations required for convergence is greater than in the unmodified algorithm. We show that the new methods also perform well, generating good matches in all cases.
Probabilistic relaxation has been used previously as the basis for the development of an algorithm to match features extracted from an image with corresponding features from a model. The technique has proved very successful, especially in applications that require real-time performance. On the other hand, its use has been limited to small problems, because the complexity of the algorithm varies with the fourth power of the problem size. In this paper, we show how the computational complexity can be much reduced. The matching is performed in two stages. In the first stage, only small subsets of the most salient features are used to provide an initial match. The results are used to calculate projective parameters that relate the image to the model. In the second stage, these parameters are used to simplify the matching of the entire feature sets, in a second pass of the matching algorithm.
Machine vision and automatic surface inspection has been an active field of research during the last few years. However, very little research has been contributed to the area of defect detection in textured images, especially for the case of random textures. In this paper, we propose a novel algorithm that uses color and texture information to solve the problem. A new color clustering scheme based on human color perception is developed. No a priori knowledge regarding the actual number of colors associated with the color image is required. With this algorithm, very promising results are obtained on defect detection in random textured images and in particular, granite images.
In this paper an implementation of a high level symbolic scene interpreter for an active vision system is considered. The scene interpretation module uses low level image processing and feature extraction results to achieve object recognition and to build up a 3D environment map. The module is structured to exploit spatio-temporal context provided by existing partial world interpretations and has spatial reasoning to direct gaze control and thereby achieve efficient and robust processing using spatial focus of attention. The system builds and maintains an awareness of an environment which is far larger than a single camera view. Experiments on image sequences have shown that the system can: establish its position and orientation in a partially known environment, track simple moving objects such as cups and boxes, temporally integrate recognition results to establish or forget object presence, and utilize spatial focus of attention to achieve efficient and robust object recognition. The system has been extensively tested using images from a single steerable camera viewing a simple table top scene containing box and cylinder-like objects. Work is currently progressing to further develop its competences and interface it with the Surrey active stereo vision head, GETAFIX.
We have developed a method of matching and recognizing aerial road network images based on road network models. The input is a list of line segments of an image obtained from a preprocessing stage, which is usually fragmentary and contains extraneous noisy segments. The output is the correspondences between the image line segments and model line segments. We use attributed relational graphs (ARG) to describe images and models. An ARG consists of a set of nodes, each node representing a line segment, and attributed relations between nodes. The task of matching is to find the best correspondences between the image ARG and the model ARG. The correspondences are
found using a relaxation labeling algorithm, which optimizes a criterion of similarity. The algorithm is capable of subgraph matching of an image road structure to a map road model covering an area 10 times larger than the area imaged by the sensor, provided that the image distortion due to perspective imaging geometry has been corrected during preprocessing stages. We present matching experiments and demonstrate the stability of the matching method to extraneous line segments, missing line segments, and errors in scaling.
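For reference, the standard support-and-normalize form of a relaxation-labeling update is sketched below; the compatibility tensor, segment counts, and iteration count are placeholders, and the paper's criterion of similarity may differ in detail.

import numpy as np

def relaxation_step(p, r):
    # p: (NI, NM) probability that image segment i matches model segment m.
    # r: (NI, NM, NI, NM) compatibility coefficients between candidate matches.
    support = np.einsum('imjn,jn->im', r, p)     # evidence from the other segments
    p_new = p * support
    return p_new / (p_new.sum(axis=1, keepdims=True) + 1e-12)

NI, NM = 5, 7                                    # image / model segment counts
p = np.full((NI, NM), 1.0 / NM)                  # uniform initial labelling
r = np.random.rand(NI, NM, NI, NM)               # placeholder compatibilities
for _ in range(10):
    p = relaxation_step(p, r)
matches = p.argmax(axis=1)                       # best model segment for each image segment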
We present here the theory of developing robust test statistics for edge shape matching in one dimensional signals. We show that an unbiased test can be developed under the assumption of uncorrelated noise and this test can be made optimal and robust to perturbations of the assumed noise distribution under the extra assumption of symmetric noise. This approach to edge detection is believed to overcome the shortcomings of the uncertainty principle in image processing and is appropriate for use when edges of a certain type have to be identified with great accuracy in their location.
Machine vision and automatic inspection has been an active field of research during the past few years. In this paper, we review the texture defect detection methods used at present. We classify them in two major categories, global and local, and we discuss briefly the major approaches that have been proposed.
An algorithm is proposed to estimate the general motion of multiple moving objects in an image sequence. The general motion is described by a general motion model, which is specified by a number of parameters called motion parameters. By estimating the individual motion of the objects, segmentation according to motion is achieved at the same time. The algorithm works directly on an image sequence without using any higher-level information such as corners and edges. Furthermore, it is not necessary to estimate optical flow as an intermediate step, avoiding the error caused by optical flow estimation.
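As an example of what a parametric general motion model looks like, the sketch below uses a six-parameter affine model for the displacement field; the paper's actual model and its parameter count are not specified in the abstract, so this is illustrative only.

import numpy as np

def affine_flow(params, x, y):
    # Displacement (u, v) at pixel (x, y) under motion parameters a1..a6.
    a1, a2, a3, a4, a5, a6 = params
    u = a1 + a2 * x + a3 * y
    v = a4 + a5 * x + a6 * y
    return u, v

xs, ys = np.meshgrid(np.arange(64), np.arange(64))
u, v = affine_flow([1.0, 0.01, 0.0, -0.5, 0.0, 0.02], xs, ys)  # one object's motion field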