Industrial production often involves complex working conditions that cause a variety of surface defects, including Mura, on industrial products. We propose a reconstruction network called RecTransformer, which is built on a transformer for anomaly inpainting. RecTransformer is designed to detect various types of surface defects effectively while using only a small number of defect samples. It simplifies defect detection to a patch-level image completion problem: without using convolution, the transformer processes the given block image to generate a defect-free reconstructed image. Global semantic information is established through an attention mechanism over the patch sequence, and position encoding supplies the spatial information of the patches, completing the global image reconstruction process. Trained on a limited number of defect samples, the RecTransformer algorithm reconstructs defects accurately, achieving an area under the receiver operating characteristic curve score of 97.6% for pixel-level segmentation on the testing dataset. Experiments on a universal surface defect dataset demonstrate the effectiveness of the RecTransformer algorithm, which can be adapted to detect various types of surface defects, including Mura in display devices, with only a small number of defect samples.
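The patch-level completion formulation described above can be illustrated with a toy preprocessing step: splitting an image into a sequence of non-overlapping patch vectors together with their grid positions, i.e., the inputs that position encoding and attention would consume. This is a minimal illustrative sketch, not the authors' implementation; `patchify` and its arguments are hypothetical names.

```python
def patchify(image, patch):
    """Split a 2-D image (list of rows) into non-overlapping patch vectors
    plus their (row, col) grid positions for position encoding."""
    h, w = len(image), len(image[0])
    patches, positions = [], []
    for r in range(0, h, patch):
        for c in range(0, w, patch):
            vec = [image[r + i][c + j] for i in range(patch) for j in range(patch)]
            patches.append(vec)
            positions.append((r // patch, c // patch))
    return patches, positions

# Toy 4x4 image -> four 2x2 patches.
img = [[r * 4 + c for c in range(4)] for r in range(4)]
patches, positions = patchify(img, 2)
```

A real RecTransformer would then embed each patch vector, add positional encodings derived from `positions`, and use self-attention over the sequence to reconstruct defective patches.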
KEYWORDS: Education and training, Convolution, Feature extraction, Visualization, 3D modeling, Ablation, Matrices, 3D image reconstruction, Image retrieval, Design and modelling
Jointly learned detectors and descriptors are becoming increasingly popular because they simplify the matching process and obtain more correspondences than traditional tools. However, most methods yield low keypoint detection accuracy due to the large receptive field of the detection score map. In addition, existing methods lack efficient detector loss functions because the coordinates of keypoints are discrete and non-differentiable. To mitigate these two problems, we propose a method called dynamic attention-based detector and descriptor with effective and derivable loss (DA-Net). For the first problem, a dynamic attention convolution-based feature extraction module is proposed to select the most suitable parameters for different samples. In addition, a multilayer feature self-difference detection (MFSD) module is proposed to detect keypoints with high accuracy. In the MFSD module, multilayer feature maps are used to calculate their feature self-difference maps, which are fused to obtain a detection score map. For the second problem, an approximate keypoint distance loss function is proposed that approximately regresses the coordinates of local maxima as keypoint coordinates, allowing calculations involving keypoint coordinates to backpropagate. Moreover, two descriptor loss functions are proposed to learn reliable descriptors. A series of experiments on widely used datasets shows that DA-Net outperforms other learned detection and description methods.
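The feature self-difference idea can be sketched with a toy single-layer version: score each location by how much its feature value deviates from the mean of its 3×3 neighbourhood, so isolated distinctive responses (keypoint candidates) score highest. This is an illustrative sketch under that reading of the abstract; the function name is hypothetical, and the real MFSD module computes and fuses such maps from multiple feature layers.

```python
def self_difference_map(fmap):
    """Per-pixel absolute difference from the mean of the 3x3 neighbourhood
    (border pixels use whatever neighbours exist)."""
    h, w = len(fmap), len(fmap[0])
    out = [[0.0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            nbrs = [fmap[r + dr][c + dc]
                    for dr in (-1, 0, 1) for dc in (-1, 0, 1)
                    if (dr or dc) and 0 <= r + dr < h and 0 <= c + dc < w]
            out[r][c] = abs(fmap[r][c] - sum(nbrs) / len(nbrs))
    return out

# A single distinctive response in a flat feature map scores highest.
fmap = [[0.0] * 5 for _ in range(5)]
fmap[2][2] = 9.0
score = self_difference_map(fmap)
```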
Object tracking remains a challenging problem in computer vision, as it entails learning an effective model to account for appearance changes caused by occlusion, out-of-view motion, plane rotation, scale change, and background clutter. This paper proposes a robust visual tracking algorithm called DCNNCT to address these challenges simultaneously. The proposed DCNNCT algorithm utilizes a deep convolutional neural network (DCNN) to extract image features of the tracked target, and the full range of information in each convolutional layer is used to express the image feature. Kernelized correlation filters (CFs) are then adaptively learned in each convolutional layer, and their correlation response maps are combined to estimate the location of the tracked target. To avoid tracking failure, an online random ferns classifier is employed to redetect the tracked target, and a dual-threshold scheme obtains the final target location by comparing the tracking result with the detection result. Finally, the change in target scale is determined by building scale pyramids and training a CF. Extensive experiments demonstrate that the proposed algorithm is effective at tracking, especially when evaluated using the overlap rate. The DCNNCT algorithm is also highly competitive in terms of robustness with state-of-the-art trackers in various challenging scenarios.
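The per-layer fusion step can be sketched as a weighted sum of correlation response maps followed by an argmax for the target location. This is a minimal toy sketch of that fusion idea, not the paper's method: the layer weights here are hypothetical, and the actual algorithm learns a kernelized correlation filter per layer and adds redetection and scale estimation.

```python
def fuse_responses(response_maps, weights):
    """Weighted sum of per-layer correlation response maps; return the fused
    map and the (row, col) of its maximum as the estimated target location."""
    h, w = len(response_maps[0]), len(response_maps[0][0])
    fused = [[sum(wt * m[r][c] for wt, m in zip(weights, response_maps))
              for c in range(w)] for r in range(h)]
    _, loc = max((fused[r][c], (r, c)) for r in range(h) for c in range(w))
    return fused, loc

# Two toy layer responses disagree; the higher-weighted layer wins.
m1 = [[1.0, 0.0], [0.0, 0.0]]
m2 = [[0.0, 0.0], [0.0, 1.0]]
fused, loc = fuse_responses([m1, m2], [0.3, 0.7])
```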
A good image feature representation is crucial for image classification tasks. Many traditional applications have attempted to design single-modal features for image classification; however, these may have difficulty extracting sufficient information, resulting in misjudgments for various categories. Recently, researchers have focused on designing multimodal features, which have been successfully employed in many situations. However, several problems remain in this research area, including selecting efficient features for each modality, transforming them to the subspace feature domain, and removing the heterogeneities among different modalities. We propose an end-to-end multimodal deep neural network (MDNN) framework to automate the feature selection and transformation procedures for image classification. Furthermore, inspired by Fisher’s theory of linear discriminant analysis, we improve the proposed MDNN by further proposing a multimodal multitask deep neural network (M2DNN) model. The motivation behind M2DNN is to improve classification performance by incorporating an auxiliary discriminative constraint into the subspace representation. Experimental results on five representative datasets (NUS-WIDE, Scene-15, Texture-25, Indoor-67, and Caltech-101) demonstrate the effectiveness of the proposed MDNN and M2DNN models. In addition, experimental comparisons based on the Fisher score criterion show that M2DNN is more robust and has better discriminative power than other approaches.
An adaptive edge detection and mapping (AEDM) algorithm is presented to address the challenging one-dimensional barcode recognition task in the presence of both image degradation and barcode shape deformation. AEDM is an edge detection-based method with three consecutive phases. The first phase extracts scan lines from a cropped image. The second phase detects the edge points in a scan line; the edge positions are taken to be the intersection points between a scan line and a corresponding well-designed reference line. The third phase adjusts the preliminary edge positions to more reasonable positions by employing prior information from the coding rules. A universal edge mapping model is thus established to obtain the coding position of each edge in this phase, followed by a decoding procedure. The Levenberg–Marquardt method is utilized to solve this nonlinear model. The computational complexity and convergence analysis of AEDM are also provided. Several experiments were implemented to evaluate the performance of the AEDM algorithm. The results indicate that the efficient AEDM algorithm outperforms state-of-the-art methods and adequately addresses multiple issues, such as out-of-focus blur, nonlinear distortion, noise, nonlinear optical illumination, and combinations of these issues.
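The second phase, locating edges where a scan-line intensity profile crosses a reference line, can be sketched as follows. This is a simplified illustration, not the paper's method: the paper designs its reference line carefully, whereas this sketch simply uses the mid-level between the profile's minimum and maximum, with linear sub-pixel interpolation at each crossing.

```python
def detect_edges(scan_line):
    """Sub-pixel edge positions where the intensity profile crosses a flat
    mid-level reference line (a stand-in for AEDM's designed reference line)."""
    ref = (min(scan_line) + max(scan_line)) / 2.0
    edges = []
    for i in range(len(scan_line) - 1):
        a, b = scan_line[i], scan_line[i + 1]
        if (a - ref) * (b - ref) < 0:  # profile crosses the reference level
            edges.append(i + (ref - a) / (b - a))  # linear interpolation
    return edges

# A single bright bar on a dark background yields a rising and a falling edge.
profile = [0, 0, 10, 10, 0, 0]
edges = detect_edges(profile)
```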
Stability analysis of neural networks has been successfully applied in many fields, such as parallel computing and pattern recognition. This paper is concerned with a class of stochastic Markovian jump neural networks. The general mean-square stability of the backward Euler–Maruyama method for stochastic Markovian jump neural networks is discussed, and sufficient conditions guaranteeing the general mean-square stability of the backward Euler–Maruyama method are given.
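For concreteness, the backward (drift-implicit) Euler–Maruyama scheme discussed here has the standard form below, written for a generic stochastic differential equation with Markovian switching; the notation is generic rather than taken from this paper.

```latex
% SDE with Markovian switching, state x(t) and Markov chain r(t):
%   dx(t) = f(x(t), r(t))\,dt + g(x(t), r(t))\,dW(t)
% Backward Euler--Maruyama with step size h (drift evaluated implicitly):
x_{n+1} = x_n + f\bigl(x_{n+1}, r_n\bigr)\,h + g\bigl(x_n, r_n\bigr)\,\Delta W_n,
\qquad \Delta W_n \sim \mathcal{N}(0, h).
% Mean-square stability of the scheme means
\lim_{n \to \infty} \mathbb{E}\,|x_n|^2 = 0.
```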
This paper presents an improved density-based clustering algorithm based on clustering by fast search and find of density peaks. A distance threshold is introduced to economize memory. To reduce the probability that two points share the same density value, similarity is utilized to define the proximity measure. We tested the modified algorithm on a large dataset, several small datasets, and shape datasets. The results show that the proposed algorithm obtains acceptable results and can be applied more widely.
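The underlying density-peaks idea computes, for each point, a local density rho (neighbours within a cutoff distance) and a delta (distance to the nearest point of higher density); cluster centers are points with high values of both. The sketch below shows that baseline computation only, in a hedged toy form; it does not include the paper's memory-saving distance threshold or similarity-based proximity measure.

```python
import math

def density_peaks(points, dc):
    """Baseline density-peaks quantities: rho[i] counts points within cutoff
    dc of point i; delta[i] is the distance to the nearest higher-density
    point (or the farthest point, for the densest point)."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    rho = [sum(1 for j in range(n) if j != i and dist[i][j] < dc)
           for i in range(n)]
    delta = []
    for i in range(n):
        higher = [dist[i][j] for j in range(n) if rho[j] > rho[i]]
        delta.append(min(higher) if higher else max(dist[i]))
    return rho, delta

# A tight 4-point cluster plus one outlier: the outlier has rho 0 but large delta.
pts = [(0, 0), (0, 1), (1, 0), (1, 1), (5, 5)]
rho, delta = density_peaks(pts, dc=1.5)
```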
Particle image velocimetry (PIV), an important velocity measurement method that follows the principle of dividing the displacement of tracer particles by the corresponding time delay, is being applied increasingly widely across many subjects, and its accuracy is influenced to some extent by the choice of time delay. Existing PIV systems usually use a fixed time delay, which cannot meet the need for measuring the vectors of a time-varying flow field with relatively high accuracy. To address this weakness, we introduce a new adjustable frame-straddling image formation system for PIV to improve measurement accuracy. The system consists of two main parts: a dual-CCD camera system, carefully designed to capture frame-straddling image pairs of the flow field with a time delay controlled by external trigger signals, and an effective subpixel image registration algorithm that calculates the vectors of the time-varying flow field on a hardware platform, which in turn generates the two channels of trigger signals with an adjustable time delay according to the instantaneously calculated flow field vectors. Experiments were performed on several time-varying flows to verify the effectiveness of the image formation system, and the results show that the accuracy of the calculated flow field vectors is improved to some extent.
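The stated PIV principle (velocity = particle displacement / time delay) and the registration step can be sketched in one dimension: estimate the shift between two exposures by maximizing cross-correlation, then divide by the frame-straddling delay. This is an illustrative toy, not the paper's subpixel algorithm; the function names and parameters are hypothetical, and real PIV registration works on 2-D interrogation windows with subpixel refinement.

```python
def displacement_1d(f1, f2):
    """Integer-pixel shift between two 1-D signals that maximizes their
    cross-correlation (a crude stand-in for image registration)."""
    n = len(f1)
    return max(range(-n + 1, n),
               key=lambda s: sum(f1[i] * f2[i + s]
                                 for i in range(n) if 0 <= i + s < n))

def velocity(disp_pixels, pixel_size_m, dt_s):
    """PIV principle: velocity = displacement / time delay."""
    return disp_pixels * pixel_size_m / dt_s

# A tracer "particle" moves 2 pixels between the two frames of a pair.
frame1 = [0, 0, 1, 0, 0, 0]
frame2 = [0, 0, 0, 0, 1, 0]
d = displacement_1d(frame1, frame2)
v = velocity(d, pixel_size_m=1e-5, dt_s=1e-3)
```

Shortening the time delay `dt_s` for fast flows (the point of the adjustable frame-straddling design) keeps the displacement within the measurable range.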
Currently, high-speed vision platforms are widely used in many applications, such as robotics and industrial automation. However, traditional high-speed vision platforms rely on a personal computer (PC) for human-computer interaction, and its large size makes it unsuitable for compact systems. This paper therefore develops an embedded real-time high-speed vision platform, ER-HVP Vision, which works entirely without a PC. In this new platform, an embedded CPU-based board is designed as a substitute for the PC, and a DSP-and-FPGA board is developed to implement image-parallel algorithms in the FPGA and image-sequential algorithms in the DSP. Hence, ER-HVP Vision, measuring 320 mm × 250 mm × 87 mm, offers a more compact form factor. Experimental results indicate that real-time detection and counting of a moving target at a frame rate of 200 fps and 512 × 512 pixels are feasible on this newly developed vision platform.
In this paper, we put forward a novel approach based on a hierarchical teaching-and-learning-based optimization (HTLBO) algorithm for nonlinear camera calibration. The algorithm simulates the teaching and learning interactions between teachers and learners in a classroom. Unlike traditional calibration approaches, the proposed technique can find a near-optimal solution without accurate initial parameter estimates (only very loose parameter bounds are required). With the introduction of a cascade of teaching, the convergence speed is rapid and the global search ability is improved. Results from our study demonstrate the excellent performance of the proposed technique in terms of convergence, accuracy, and robustness. HTLBO can also be used to solve many other complex nonlinear calibration optimization problems owing to its good portability.
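The classroom metaphor can be made concrete with the teacher phase of standard TLBO (on which HTLBO builds): each learner moves toward the best solution (the teacher) relative to the class mean, and a move is kept only if it improves the objective. This is a sketch of plain TLBO under common conventions, not the paper's hierarchical variant; the names and the sphere objective are illustrative.

```python
import random

def teacher_phase(population, fitness):
    """One TLBO teacher phase (minimization): move each learner toward the
    teacher relative to the class mean; keep the move only if it improves."""
    dim = len(population[0])
    teacher = min(population, key=fitness)
    mean = [sum(x[d] for x in population) / len(population) for d in range(dim)]
    new_pop = []
    for x in population:
        tf = random.choice((1, 2))  # teaching factor, 1 or 2
        cand = [x[d] + random.random() * (teacher[d] - tf * mean[d])
                for d in range(dim)]
        new_pop.append(cand if fitness(cand) < fitness(x) else x)
    return new_pop

# Toy objective standing in for camera reprojection error.
sphere = lambda v: sum(t * t for t in v)
random.seed(0)
pop = [[4.0, 4.0], [-3.0, 1.0], [2.0, -2.0]]
new_pop = teacher_phase(pop, sphere)
```

Because each learner is replaced only on improvement, the best objective value in the class never worsens from one phase to the next.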
IC marking provides information about integrated circuit chips, such as product function and classification, so IC marking inspection is one of the essential processes in semiconductor fabrication. A real-time IC chip marking defect inspection method is presented in this paper. The method comprises the following steps: chip position detection, character segmentation, feature extraction, and classification. The extracted features are used in a back-propagation neural network to classify the types of marking errors, such as illegible characters, missing characters, and misprinted characters. Character segmentation is an essential part of the inspection method; segmenting touching and broken characters correctly is a considerable challenge due to uneven illumination, motion blur, and problems in the printing process. To segment the characters rapidly and accurately, a novel approach for character segmentation based on vertical projection and character features is proposed. Experiments using a TSSOP20-packaged chip demonstrate that our method can inspect an IC marking with 17 different characters in just 130 ms, and the system achieves a maximum recognition rate of 98.5%. As a result, it is an ideal solution for a real-time IC marking recognition and defect inspection system.
Face alignment is critical for face recognition, and deep learning-based methods show promise for solving such issues, achieving competitive results on benchmarks with additional benefits, such as dispensing with handcrafted features and initial shapes. However, most existing deep learning-based approaches are complicated and quite time-consuming to train. We propose a compact face alignment method that trains quickly without decreased accuracy. Rectified linear units are employed, allowing all networks to converge approximately five times faster than with tanh neurons. An eight-learnable-layer deep convolutional neural network (DCNN) based on local response normalization and a padding convolutional layer (PCL) is designed to provide reliable initial values during prediction. A model combination scheme is presented to further reduce errors, and only two network architectures and hyperparameter selection procedures are required in our approach. A three-level cascaded system is ultimately built from the DCNNs and the model combination scheme. Extensive experiments validate the effectiveness of our method and demonstrate accuracy comparable with state-of-the-art methods on the BioID, Labeled Face Parts in the Wild, and Helen datasets.