Image classification is an essential component of modern computer vision, in which dictionary learning–based classification has garnered significant attention due to its robustness. Generally, most dictionary learning algorithms can be optimized through data augmentation and regularization techniques. Data augmentation research often focuses on enhancing the features of samples within a specific class, whereas the impact of information shared among images of different categories is overlooked. High inter-class shared information can make categories difficult to differentiate. To address this issue, this paper proposes a novel data augmentation approach that reduces excessive similarity among samples of different classes by randomly replacing pixel values, thereby improving classification performance. Building on this, we design a joint dictionary learning algorithm that embeds label consistency and local consistency. The basic steps of the proposed algorithm are as follows: (1) generate specific auxiliary samples as training samples, (2) initialize the dictionary and representation coefficients, (3) introduce label constraints and locality constraints and update the dictionary, and (4) build a classifier and classify the test samples. Extensive experiments demonstrate the effectiveness of the proposed approach.
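The abstract does not specify how pixel values are replaced. A minimal sketch of one plausible reading, in which a random fraction of pixels is overwritten with uniformly random values (the function name and the `replace_ratio` parameter are hypothetical, not taken from the paper), follows:

```python
import numpy as np

def random_pixel_replacement(image, replace_ratio=0.1, rng=None):
    """Overwrite a random fraction of pixels with uniformly random values.

    Illustrative only: the paper does not state the replacement ratio or
    the distribution of the new values. Assumes a uint8 image of shape
    (H, W) or (H, W, C).
    """
    rng = np.random.default_rng() if rng is None else rng
    augmented = image.copy()
    h, w = image.shape[:2]
    n = int(replace_ratio * h * w)
    # pick n random pixel locations and overwrite them
    ys = rng.integers(0, h, size=n)
    xs = rng.integers(0, w, size=n)
    augmented[ys, xs] = rng.integers(
        0, 256, size=(n,) + image.shape[2:], dtype=image.dtype
    )
    return augmented
```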
Most existing studies on person re-identification (Re-ID) rely on deep feature representation learning. However, images often contain occlusions and nondiscriminative personal information. To extract more representative features, some researchers extract implicit deep semantic information by designing complex modules, such as mask maps and human pose landmarks; however, this can introduce substantial human annotation and computational work. To overcome these issues, we propose a Re-ID model called the multifeature fusion network (MFFNet). Our network requires no additional auxiliary information and incorporates two new designs: the feature refinement pooling block (FRPB) and the feed-forward conduction structure (FCS). Following the “split-learn-merge” principle, the FRPB decomposes person features into representations ranging from coarse-grained to fine-grained, learns the corresponding local detail information, and merges the multigranular features into a partial person representation. To address the issue that most current methods rely heavily on accurate bounding boxes, the FCS enables person matching at different resolution scales by learning representations at multiple semantic levels. Through a series of ablation experiments, we demonstrate that the proposed strategy is effective for person Re-ID tasks. The results indicate that MFFNet achieves more competitive results than existing state-of-the-art methods.
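The abstract does not detail the FRPB's internals. The following PyTorch sketch illustrates only the generic “split-learn-merge” idea; the class name, granularities, and channel sizes are assumptions, not the paper's design:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SplitLearnMerge(nn.Module):
    """Illustrative split-learn-merge pooling over a person feature map.

    Hypothetical stand-in for an FRPB-like block: granularities and
    embedding width are assumed, not taken from the paper.
    """
    def __init__(self, in_ch=2048, emb_ch=256, granularities=(1, 2, 4)):
        super().__init__()
        self.granularities = granularities
        # one 1x1 conv embedding per granularity (the "learn" step)
        self.embeds = nn.ModuleList(
            [nn.Conv2d(in_ch, emb_ch, kernel_size=1) for _ in granularities]
        )

    def forward(self, x):  # x: (B, C, H, W) backbone feature map
        parts = []
        for g, emb in zip(self.granularities, self.embeds):
            stripes = F.adaptive_avg_pool2d(x, (g, 1))  # "split": g horizontal stripes
            parts.append(emb(stripes).flatten(1))       # "learn": per-stripe embedding
        return torch.cat(parts, dim=1)                  # "merge": multigranular descriptor
```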
Pedestrian detection is a popular yet challenging topic in computer vision. The Histogram of Oriented Gradients (HOG) feature is widely used in pedestrian detection because of its high accuracy; nonetheless, its capacity to describe information needs further improvement. This paper therefore proposes I-HOG (Improved HOG), which makes two major improvements. First, I-HOG enhances the description of edge features: by building a set of correlation maps from block histograms computed at different scales, it establishes correlations among the feature information. Second, I-HOG uses multi-scale feature extraction to include broader edge-description information, compensating for the limitation that HOG features are extracted only at a fixed block size. Experimental results on the INRIA database show that, compared with HOG, I-HOG increases the detection rate by 5.4% and 4.3%, respectively, and by 2.8% and 4.0%, respectively, when combined with the CSS feature.
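To make the multi-scale idea concrete, a minimal sketch using scikit-image is given below. The specific cell sizes and block settings are assumptions for illustration, not the parameters used in the I-HOG paper:

```python
import numpy as np
from skimage.feature import hog

def multiscale_hog(gray_image, cell_sizes=((6, 6), (8, 8), (12, 12))):
    """Concatenate HOG descriptors computed at several cell sizes.

    Illustrative only: extracting HOG at multiple scales instead of a
    single fixed block size; the scales here are hypothetical.
    """
    descriptors = [
        hog(gray_image,
            orientations=9,
            pixels_per_cell=cell,
            cells_per_block=(2, 2),
            block_norm='L2-Hys')
        for cell in cell_sizes
    ]
    return np.concatenate(descriptors)
```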
A key to dictionary learning is to attain a robust dictionary, which allows the difference between test samples and training samples of the same class to be alleviated. Owing to this, the dictionary can yield proper representations of test samples and produce better classification results for them. For face recognition, where facial appearance varies with illumination, pose, and facial expression, a robust dictionary is definitely preferred. In this paper, we propose a robust dictionary learning method for face recognition. Robustness is attained in two ways. First, auxiliary faces are produced from the original face images. Second, a scheme is designed to attain the dictionary under the condition that label coefficients may deviate from sample coefficients. Auxiliary faces express possible variations of faces. Moreover, the difference between auxiliary faces and original training samples of the same class appears to reflect, to some extent, the difference between test samples and training samples, so the use of auxiliary faces helps improve the robustness of the method. The scheme for attaining the dictionary further enhances robustness.
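The abstract does not state the learning objective. One generic formulation consistent with the description (purely illustrative; all symbols are assumed here, with $Y$ the training plus auxiliary samples, $D$ the dictionary, $X$ the sample coefficients, $A$ the label coefficients, $H$ the label matrix, and $W$ a linear classifier) is

$$\min_{D,\,W,\,X,\,A}\; \|Y - DX\|_F^2 \;+\; \alpha\,\|H - WA\|_F^2 \;+\; \beta\,\|X - A\|_F^2,$$

where the third term allows the label coefficients $A$ to deviate from, while staying close to, the sample coefficients $X$.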