Open Access Paper
Reference set based metric learning method for person re-identification against overfitting problem
28 December 2022
Proceedings Volume 12506, Third International Conference on Computer Science and Communication Technology (ICCSCT 2022); 125066G (2022) https://doi.org/10.1117/12.2662666
Event: International Conference on Computer Science and Communication Technology (ICCSCT 2022), 2022, Beijing, China
Abstract
Similarity distance measurement is an important method for data classification, data recognition and related tasks, and is widely used in machine learning, computer vision and other fields. However, existing metric learning models suffer from overfitting in complex classification and identification tasks, which degrades the accuracy and stability of the learned metrics. Focusing on the person re-identification (person re-ID) task, we design a robust similarity distance metric learning model based on a novel approach to overcoming overfitting. The proposed method builds a reference set from the training samples. By pairing test images with samples from the reference set to form similar sample pairs, we optimize the distribution information and the projection features. Experiments on the benchmark VIPeR dataset validate the effectiveness of the proposed method, which achieves the best identification rates among the compared methods.

1.

INTRODUCTION

Similarity distance measurement is a major research area in machine learning and computer vision, and in recent years computer vision has become one of the most active areas of technological development. Traditional similarity distance measurement studies the distance metric between image feature vectors, aiming to find a discriminative and robust metric model for measuring the similarity between two samples. Such methods usually design the metric model on the feature vector of a single image, using the label information of the data to learn a metric subspace in which the distance between samples with the same label is as small as possible and the distance between samples with different labels is as large as possible. Compared with image matching and recognition tasks in traditional computer vision, person re-identification involves more complex application scenarios, and the targets in the images are often severely misaligned. This makes the image features highly unstable and poses a great challenge to the similarity distance metric.

2.

RELATED WORK

Person re-identification is an active research topic that has attracted a large number of researchers [1-5]. It is a very complex problem, because the target's appearance is affected by many factors, including camera angle, gait, background and lighting. Existing models achieve poor identification accuracy on test data [4], which makes person re-identification very challenging. Researchers have therefore proposed various methods to improve identification accuracy.

For metric learning based methods, researchers aim to learn a robust and discriminative metric model. Weinberger et al. [6] designed a metric learning algorithm based on k-nearest neighbors to learn a Mahalanobis distance: the learned metric pulls the k nearest neighbors with the same label closer while pushing apart sample pairs with different labels. Davis et al. [7] built an optimization model that minimizes the differential relative entropy between two Gaussian distributions to learn a distance model that can be used for pedestrian image similarity measurement. Guillaumin et al. [8] proposed a metric learning method based on logistic discrimination, such that under the learned metric the distances of positive pairs are smaller than those of negative pairs. Zheng et al. [9] formulated projection subspace learning as a probabilistic relative distance comparison, learning a metric subspace by maximizing the probability that positive and negative pairs are correctly ranked, i.e., that a positive pair has a smaller distance than a negative pair. Koestinger et al. [10] used the idea of hypothesis testing to build a likelihood ratio model from the distributions of positive and negative pairs and learn a Mahalanobis distance (KISSME). Liao et al. [11] learned a projection subspace by quadratic discriminant analysis on the space of sample-pair difference vectors, and then used the KISSME algorithm to learn a distance function for the projected samples. Zhao et al. [12] introduced kernelization and proposed a kernelized random KISS algorithm, which strengthens the Gaussian character of the training data by kernelizing the original features of the training samples. However, these metric learning algorithms generalize poorly and achieve low accuracy on test data.
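KISSME [10], which XQDA [11] also reuses, has a closed form that is easy to illustrate. The NumPy sketch below assumes zero-mean Gaussian distributions for the difference vectors of positive and negative pairs, as KISSME does: it estimates their covariances, forms $M = \Sigma_S^{-1} - \Sigma_D^{-1}$, clips negative eigenvalues so that $M$ is positive semi-definite, and then evaluates the Mahalanobis distance $(x_i - z_j)^\top M (x_i - z_j)$ for every probe/gallery pair. The function names `kissme` and `pairwise_mahalanobis`, the ridge `eps`, and the synthetic two-camera data are illustrative assumptions, not code from the cited works.

```python
import numpy as np

def kissme(diff_pos, diff_neg, eps=1e-6):
    """KISSME-style metric from pair difference vectors.

    diff_pos: (n_pos, d) differences x_i - z_j of same-identity (positive) pairs.
    diff_neg: (n_neg, d) differences of different-identity (negative) pairs.
    Returns a d x d positive semi-definite matrix M.
    """
    d = diff_pos.shape[1]
    # Covariances of the zero-mean difference distributions (eps is a small ridge).
    cov_pos = diff_pos.T @ diff_pos / len(diff_pos) + eps * np.eye(d)
    cov_neg = diff_neg.T @ diff_neg / len(diff_neg) + eps * np.eye(d)
    # The log-likelihood ratio of the two Gaussians gives M = Σ_S^{-1} - Σ_D^{-1}.
    m = np.linalg.inv(cov_pos) - np.linalg.inv(cov_neg)
    # Clip negative eigenvalues so M is a valid (PSD) Mahalanobis matrix.
    vals, vecs = np.linalg.eigh((m + m.T) / 2)
    return (vecs * np.clip(vals, 0.0, None)) @ vecs.T

def pairwise_mahalanobis(a, b, m):
    """d_M(x_i, z_j) = (x_i - z_j)^T M (x_i - z_j) for every row pair of a and b."""
    diff = a[:, None, :] - b[None, :, :]          # shape (n_a, n_b, d)
    return np.einsum("ijd,de,ije->ij", diff, m, diff)

# Synthetic two-camera data (illustrative only): row i of x and z is the same person.
rng = np.random.default_rng(0)
identities = 3.0 * rng.normal(size=(50, 5))
x = identities + 0.3 * rng.normal(size=identities.shape)   # camera A
z = identities + 0.3 * rng.normal(size=identities.shape)   # camera B
# Negative pairs: each x_i paired with the next person's camera-B sample.
M = kissme(x - z, x - np.roll(z, 1, axis=0))
dist = pairwise_mahalanobis(x, z, M)
rank1 = np.mean(dist.argmin(axis=1) == np.arange(len(x)))  # fraction matched at rank 1
```

Ranking each row of `dist` in ascending order gives the gallery ordering for that probe; with `M = np.eye(d)` the distance reduces to the squared Euclidean distance.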

In recent years, deep learning methods have been widely applied to computer vision tasks with very good results. Li et al. [13] built a person re-identification network (FPNN) on a Siamese network structure; it is an early and highly representative re-identification network, but its recognition accuracy is low. Ahmed et al. [14] improved the FPNN model for discriminative feature extraction. Ustinova et al. [15] built a local feature-based network model that exploits the structure of the human body. Liu et al. [16] introduced a generative adversarial network (GAN) and an attention mechanism to build a network model that resists small local perturbations. However, deep networks need large amounts of training data and generalize poorly across different surveillance networks, so a large amount of labeled data is required to optimize the model for each surveillance network. This is a major drawback of deep network-based person re-identification methods in practical detection, recognition and prediction applications.

3.

METRIC LEARNING METHOD FOR PERSON RE-IDENTIFICATION

In person re-identification, the similarity between pedestrian images is measured by learning a Mahalanobis distance model. Given two sample sets, $A = \{x_1, x_2, \ldots, x_i, \ldots, x_N\}$, where $x_i$ denotes the appearance feature of the $i$-th pedestrian under camera A, and $B = \{z_1, z_2, \ldots, z_j, \ldots, z_N\}$, where $z_j$ denotes the appearance feature of the $j$-th pedestrian under camera B, the model is expressed as

$$d_M(x_i, z_j) = (x_i - z_j)^\top M (x_i - z_j), \tag{1}$$

where $M \succeq 0$ is a positive semi-definite matrix, the metric matrix of the distance model, and $x_i$ and $z_j$ are the feature vectors of two samples from different camera views. To learn a discriminative metric subspace, Liao et al. [11] first learn a set of projection features and then learn a Mahalanobis distance with the KISSME [10] method:

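$$d_W(x_i, z_j) = (x_i - z_j)^\top W \left(\Sigma_I'^{-1} - \Sigma_E'^{-1}\right) W^\top (x_i - z_j), \tag{2}$$

where $x_i - z_j$ is the difference feature vector, $\Sigma_I$ and $\Sigma_E$ are the covariance matrices of the difference vectors of positive (intrapersonal) and negative (extrapersonal) pairs, and $\Sigma_I' = W^\top \Sigma_I W$, $\Sigma_E' = W^\top \Sigma_E W$. The XQDA [11] model for learning the projection is established as follows:

$$J(w) = \frac{w^\top \Sigma_E w}{w^\top \Sigma_I w}. \tag{3}$$

By the properties of the generalized Rayleigh quotient, the maximum value of $J(w)$ in equation (3) is the largest eigenvalue of $\Sigma_I^{-1} \Sigma_E$, attained at the corresponding eigenvector $w_1$. Taking the eigenvectors of the $r$ largest eigenvalues gives the projection $W = (w_1, w_2, \ldots, w_r)$.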
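The two-stage XQDA procedure described above, a generalized Rayleigh quotient for $W$ followed by KISSME in the projected space, can be sketched in NumPy as follows. This is an illustrative reimplementation under simple assumptions (covariances of zero-mean difference vectors, a small ridge on $\Sigma_I$, synthetic data), not the reference implementation of [11]; the names `xqda`, `xqda_distance` and `ridge` are hypothetical. Whitening $\Sigma_I$ with its Cholesky factor turns the generalized problem into an ordinary symmetric one, so only `numpy.linalg.eigh` is needed.

```python
import numpy as np

def xqda(diff_pos, diff_neg, r, ridge=1e-3):
    """Sketch of XQDA-style learning from pair difference vectors.

    diff_pos: (n_pos, d) differences x_i - z_j of positive (same-identity) pairs.
    diff_neg: (n_neg, d) differences of negative pairs.
    Returns W (d x r), the r x r subspace metric M_sub, and the r eigenvalues.
    """
    d = diff_pos.shape[1]
    # Covariances of zero-mean difference vectors; the ridge keeps Σ_I invertible.
    sigma_i = diff_pos.T @ diff_pos / len(diff_pos) + ridge * np.eye(d)
    sigma_e = diff_neg.T @ diff_neg / len(diff_neg)
    # Maximize J(w) = w^T Σ_E w / w^T Σ_I w: with Σ_I = L L^T, the eigenvectors v of
    # L^{-1} Σ_E L^{-T} give w = L^{-T} v, the eigenvectors of Σ_I^{-1} Σ_E.
    l_inv = np.linalg.inv(np.linalg.cholesky(sigma_i))
    vals, vecs = np.linalg.eigh(l_inv @ sigma_e @ l_inv.T)
    top = np.argsort(vals)[::-1][:r]              # indices of the r largest eigenvalues
    w = l_inv.T @ vecs[:, top]
    # KISSME in the projected space: M_sub = Σ'_I^{-1} - Σ'_E^{-1}.
    m_sub = np.linalg.inv(w.T @ sigma_i @ w) - np.linalg.inv(w.T @ sigma_e @ w)
    return w, m_sub, vals[top]

def xqda_distance(diff, w, m_sub):
    """d_W for each row of diff: (x - z)^T W M_sub W^T (x - z)."""
    proj = diff @ w
    return np.einsum("nr,rs,ns->n", proj, m_sub, proj)

# Synthetic data (illustrative): only the first three dimensions carry identity.
rng = np.random.default_rng(0)
ids = rng.normal(size=(200, 6)) * np.array([3.0, 3.0, 3.0, 0.1, 0.1, 0.1])
x = ids + 0.3 * rng.normal(size=ids.shape)     # camera A
z = ids + 0.3 * rng.normal(size=ids.shape)     # camera B
diff_pos, diff_neg = x - z, x - np.roll(z, 1, axis=0)
W, M_sub, lam = xqda(diff_pos, diff_neg, r=3)
```

Because $W^\top \Sigma_I W = I$ in this construction, $M_{\text{sub}}$ is positive semi-definite exactly when the selected eigenvalues are at least 1, which is why keeping only the leading eigenvectors is a natural choice for $r$.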

4.

IMPROVEMENT OF METRIC LEARNING ALGORITHM BASED ON SIMILAR SAMPLE CONSTRAINTS

Metric learning for person re-identification suffers from severe overfitting. The projection subspace separates positive and negative pairs by learning a series of feature transformations; because these transformations are fitted to the training pairs only, they may not transfer well to the unseen identities in the test data. To overcome this problem, we use the information in a reference set to build a pairwise similarity distance metric learning model that resists overfitting in complex scenes.

Let $A'$ and $B'$ denote the sample sets of the test data under the two camera views, respectively. Using the metric model introduced in Section 3, we then find, for each test individual, the k most similar samples in the reference set. The weighted mean of the difference vectors between the test individual and these k samples is used to estimate the center of the positive-pair difference vectors:

00218_PSISDG12506_125066G_page_3_3.jpg

where $q_{ij} \in \{0, 1\}$: $q_{ij} = 1$ denotes that 00218_PSISDG12506_125066G_page_3_4.jpg is a similar sample among the k nearest neighbors, and $q_{ij} = 0$ denotes that 00218_PSISDG12506_125066G_page_3_5.jpg is a dissimilar sample. $\alpha_{ij}$ denotes the weight of this sample with respect to 00218_PSISDG12506_125066G_page_3_6.jpg and is defined as follows:

00218_PSISDG12506_125066G_page_3_7.jpg

We then calculate the center of the positive-pair difference vectors of the training data: 00218_PSISDG12506_125066G_page_3_8.jpg.

The distribution constraint between the positive-pair difference feature vectors of the test data and those of the training data is defined as follows:

00218_PSISDG12506_125066G_page_3_9.jpg

where $S$ denotes the distance between the distribution of positive pairs in the test data and that in the training data. This distance limits how far the pairs identified in the test data may lie from the positive pairs.
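The neighbor search in the reference set and the center comparison behind $S$ can be sketched as below. This is only an illustration under loudly stated assumptions: the weights $\alpha_{ij}$ and the exact forms of the center estimate and of $S$ are not recoverable from this copy, so the sketch uses uniform weights $1/k$ and a squared Euclidean distance between centers. The helpers `reference_center` and `center_gap`, and the random data, are hypothetical.

```python
import numpy as np

def reference_center(x_s, ref, m, k):
    """Center of the differences between one test probe and its k most similar
    reference samples under a metric matrix m (hypothetical helper).

    x_s: (d,) test feature; ref: (n_ref, d) reference samples;
    m: (d, d) PSD metric from Section 3; k: number of neighbors kept.
    Returns (center, indices of the k neighbors).
    """
    diffs = x_s[None, :] - ref                           # x'_s - z_t for every reference z_t
    dist = np.einsum("nd,de,ne->n", diffs, m, diffs)     # d_M(x'_s, z_t)
    nearest = np.argsort(dist)[:k]                       # the k most similar samples
    # Assumption: uniform weights 1/k stand in for the paper's weights alpha.
    return diffs[nearest].mean(axis=0), nearest

def center_gap(c1, c2):
    """Assumed form of S: squared Euclidean distance between two centers."""
    return float(np.sum((c1 - c2) ** 2))

# Illustrative random data; M = identity reduces d_M to squared Euclidean distance.
rng = np.random.default_rng(1)
ref = rng.normal(size=(30, 4))                               # reference samples
x_s = rng.normal(size=4)                                     # one test probe
mu_train = (0.1 * rng.normal(size=(30, 4))).mean(axis=0)     # training positive-pair center
mu_s, nn = reference_center(x_s, ref, np.eye(4), k=5)
gap = center_gap(mu_s, mu_train)
```

The same neighbor search, with $k'$ in place of $k$, applies to the similar pairs formed with the reference set $C$ below.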

Define $C = B$ as the reference set, where $B$ is the set of training samples captured by camera B. The similarity distance between each test individual 00218_PSISDG12506_125066G_page_3_10.jpg and each sample in the reference set $C$ is then calculated. Because the identities in the training and test data do not overlap, samples from the reference set $C$ and test instances from $A'$ form negative sample pairs. We define the distance between the similar sample pairs in the reference set and the positive sample pairs as follows:

00218_PSISDG12506_125066G_page_3_11.jpg

where 00218_PSISDG12506_125066G_page_3_12.jpg is the distribution center of the difference vectors of the similar sample pairs that the test individuals form with the reference set, and $p_{st}$ denotes the matching relationship between $x_s$ and $z_t$, with 00218_PSISDG12506_125066G_page_3_13.jpg and $z_t \in C$. $p_{st} = 1$ denotes that 00218_PSISDG12506_125066G_page_3_14.jpg is a less similar sample (a negative pair), and $p_{st} = 0$ denotes that $(x_s, z_t)$ is a similar sample pair among the $k'$ nearest neighbors. We use the metric model of Section 3 to calculate the similarity between samples from different camera views; the $k'$ samples with the highest similarity are treated as similar samples.

00218_PSISDG12506_125066G_page_3_15.jpg

where $D'$ denotes the distance from the center of the similar sample pairs in the reference set to the center of the theoretical distribution of positive samples. The distance in equation (6) is introduced as a constraint on the learning of the projection subspace, so as to improve the generalization ability of the metric model on the test data. The improved metric learning model is expressed in equation (9):

00218_PSISDG12506_125066G_page_4_1.jpg

The improved metric learning model is then transformed into the following problem:

00218_PSISDG12506_125066G_page_4_2.jpg

The improved metric model is obtained by solving the eigenvalue problem in equation (10). Note that a separate metric model is trained for each test individual: the nearest neighbors found in the reference set, and hence the constraints derived from them, are specific to each test individual and vary from one individual to another.
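Equation (10) is not recoverable from this copy, so the following is not the paper's model. As a hypothetical illustration of how a center-gap constraint can enter the Rayleigh quotient of equation (3), and of why the model is retrained per test individual, the sketch adds a penalty $\lambda \delta\delta^\top$ to $\Sigma_I$, where $\delta$ is the gap between a probe's estimated positive-pair center and the training center; projection directions along which the probe's center drifts then score lower. The penalty form, the weight `lam`, and the function names are all assumptions.

```python
import numpy as np

def top_projection(sigma_i, sigma_e, r):
    """Top-r maximizers of w^T Σ_E w / w^T Σ_I w (Σ_I must be positive definite)."""
    l_inv = np.linalg.inv(np.linalg.cholesky(sigma_i))
    vals, vecs = np.linalg.eigh(l_inv @ sigma_e @ l_inv.T)
    return l_inv.T @ vecs[:, np.argsort(vals)[::-1][:r]]

def per_probe_projections(sigma_i, sigma_e, mu_train, probe_centers, r, lam=1.0):
    """Hypothetical per-individual variant: one projection per test probe, with
    the center gap delta = mu_s - mu_train added to Σ_I as lam * delta delta^T."""
    out = []
    for mu_s in probe_centers:                      # a separate model per test individual
        delta = (mu_s - mu_train)[:, None]          # column vector, shape (d, 1)
        out.append(top_projection(sigma_i + lam * delta @ delta.T, sigma_e, r))
    return out

# Toy check: a center gap along axis 0 steers the projection away from axis 0.
sigma_i = np.eye(3)
sigma_e = np.diag([10.0, 5.0, 1.0])
centers = [np.zeros(3), np.array([1.0, 0.0, 0.0])]
ws = per_probe_projections(sigma_i, sigma_e, np.zeros(3), centers, r=1, lam=100.0)
```

For the first probe (no gap) the best direction is axis 0, where $\Sigma_E$ is largest; for the second probe the penalty makes axis 1 the best direction instead.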

5.

EXPERIMENT

To validate the performance of our model, we report its identification rates on the VIPeR [3] dataset. Performance is evaluated with the widely used cumulative match characteristic (CMC) curve [5].
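For reference, rank-k CMC accuracy can be computed from a probe-gallery distance matrix as in the NumPy sketch below. It assumes the single-shot setting in which probe i's true match is gallery item i, and it ranks ties optimistically (only strictly closer gallery items count); both are illustrative assumptions, not details of this paper's evaluation code, and the function name `cmc` is hypothetical.

```python
import numpy as np

def cmc(dist, max_rank=20):
    """Cumulative match characteristic from an (n_probe, n_gallery) distance
    matrix, assuming probe i's true match is gallery item i.

    Returns an array whose entry k-1 is the rank-k identification rate (0..1).
    """
    n = dist.shape[0]
    true_d = dist[np.arange(n), np.arange(n)]
    # 0-based rank of the true match = number of gallery items strictly closer.
    ranks = (dist < true_d[:, None]).sum(axis=1)
    return np.array([(ranks < k).mean() for k in range(1, max_rank + 1)])

# Toy example: probe 0 is matched at rank 1, probes 1 and 2 only at rank 3,
# so the rank-1 and rank-2 rates are 1/3 and the rank-3 rate is 1.0.
toy = np.array([[0.1, 0.5, 0.9],
                [0.2, 0.4, 0.3],
                [0.6, 0.7, 0.8]])
rates = cmc(toy, max_rank=3)
```

With a full VIPeR test split, `cmc(dist)[[0, 4, 9, 19]]` would give the rank-1, 5, 10 and 20 rates in the form reported in Table 1 (as fractions rather than percentages).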

The identification results are compared with several state-of-the-art methods to further demonstrate the improvement achieved by our model. The results are shown in Figure 1 and Table 1, which lists the rank-1, rank-5, rank-10 and rank-20 identification rates. As the table shows, the proposed algorithm achieves the best recognition accuracy at every reported rank. The rank-1 accuracy in particular is 3.4 percentage points higher than that of the strongest competing model, OSNet (71.4% vs. 68.0%). The rank-1, rank-5, rank-10 and rank-20 recognition accuracies reach 71.4%, 93.6%, 98.2% and 99.8%, respectively. Compared with the baseline XQDA, the rank-1 accuracy of our model is raised by 30.8 percentage points (from 40.6% to 71.4%), a substantial improvement.

Figure 1.

Comparison of identification accuracy of different methods on the VIPeR dataset.


Table 1.

Comparison of rank-r identification rates (%) on the VIPeR dataset ("-" indicates a result that is not reported).

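| Method | r=1 | r=5 | r=10 | r=20 |
|---|---|---|---|---|
| KISSME [10] | 18.7 | 47.2 | 61.7 | 75.6 |
| SCML [17] | 40.6 | - | - | - |
| XQDA-LOMO [11] | 40.6 | 68.3 | 80.5 | 91.8 |
| NFST-LOMO [18] | 42.3 | 71.5 | 82.9 | 92.1 |
| LSSCDL-LOMO [19] | 42.7 | - | 84.3 | 91.9 |
| MLAPG-LOMO [20] | 40.7 | 69.9 | 82.3 | 92.4 |
| IPMLLSL [21] | 46.5 | 69.3 | 80.7 | 86.5 |
| Semi-supervised | 63.4 | 87.5 | 94.6 | 98.7 |
| Improved deep [14] | 34.8 | 63.5 | 75 | 80 |
| SCIR [22] | 35.8 | - | - | - |
| TCP [23] | 47.8 | 74.7 | 84.8 | 91.1 |
| MCK-CCA [24] | 47.2 | - | 87.3 | 94.7 |
| KEPLER [25] | 42.4 | - | 82.4 | 90.7 |
| HRNet [26] | 48.7 | 73.4 | 81.7 | - |
| HydraPlus-Net [27] | 56.6 | - | - | - |
| OSNet [28] | 68.0 | - | - | - |
| Ours-LOMO | 71.4 | 93.6 | 98.2 | 99.8 |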

6.

CONCLUSION

Overfitting is a common phenomenon that degrades the accuracy of machine learning algorithms. Taking person re-identification as the target problem, we study a similarity measurement model that counters overfitting. Person re-identification relies on measuring the similarity of pedestrian image pairs, which is more complicated than traditional data point classification and cannot be constrained directly by the distance between sample population distributions. In this paper, the most similar samples of each test individual are used to form pseudo positive pairs, and the optimization of the metric model is guided by constraining the difference between the test individual's positive pairs and the theoretical center of the probability distribution. Finally, extensive comparative experiments and parameter analysis verify the effectiveness of our model.

REFERENCES

[1] Liu, X., Tao, D., Song, M., Zhang, L., Bu, J. and Chen, C., “Learning to track multiple targets,” IEEE Transactions on Neural Networks & Learning Systems, 26(5), 1060 (2015). https://doi.org/10.1109/TNNLS.2014.2333751

[2] Gray, D. and Tao, H., “Viewpoint invariant pedestrian recognition with an ensemble of localized features,” in European Conference on Computer Vision, 262-275 (2008).

[3] Du, Y., Ai, H. and Lao, S., “Evaluation of color spaces for person re-identification,” in International Conference on Pattern Recognition, Tsukuba, 1371-1374 (2012).

[4] Prosser, B., Zheng, W., Gong, S. and Xiang, T., “Person re-identification by support vector ranking,” in British Machine Vision Conference, 231-242 (2010).

[5] Hirzer, M., Roth, P. M., Köstinger, M. and Bischof, H., “Relaxed pairwise learned metric for person re-identification,” in European Conference on Computer Vision, 780-793 (2012).

[6] Weinberger, K. Q. and Saul, L. K., “Distance metric learning for large margin nearest neighbor classification,” The Journal of Machine Learning Research, 10(1), 207-244 (2009).

[7] Davis, J. V., Kulis, B., Jain, P., Sra, S. and Dhillon, I. S., “Information-theoretic metric learning,” in International Conference on Machine Learning, 209-216 (2007).

[8] Guillaumin, M., Verbeek, J. and Schmid, C., “Is that you? Metric learning approaches for face identification,” in International Conference on Computer Vision, 498-505 (2009).

[9] Zheng, W. S., Gong, S. and Xiang, T., “Person re-identification by probabilistic relative distance comparison,” in IEEE Conference on Computer Vision and Pattern Recognition, 649-656 (2011).

[10] Köstinger, M., Hirzer, M., Wohlhart, P., Roth, P. and Bischof, H., “Large scale metric learning from equivalence constraints,” in IEEE Conference on Computer Vision and Pattern Recognition, 2288-2295 (2012).

[11] Liao, S., Hu, Y., Zhu, X. and Li, S. Z., “Person re-identification by local maximal occurrence representation and metric learning,” in IEEE Conference on Computer Vision and Pattern Recognition, 2197-2206 (2015).

[12] Zhao, C., Chen, Y., Wang, X., Wong, W. K., Miao, D. and Lei, J., “Kernelized random KISS metric learning for person re-identification,” Neurocomputing, 275, 403-417 (2018). https://doi.org/10.1016/j.neucom.2017.08.064

[13] Li, W., Zhao, R., Xiao, T. and Wang, X., “DeepReID: Deep filter pairing neural network for person re-identification,” in IEEE Conference on Computer Vision and Pattern Recognition, 152-159 (2014).

[14] Ahmed, E., Jones, M. and Marks, T. K., “An improved deep learning architecture for person re-identification,” in IEEE Conference on Computer Vision and Pattern Recognition, 3908-3916 (2015).

[15] Ustinova, E., Ganin, Y. and Lempitsky, V., “Multi-region bilinear convolutional neural networks for person re-identification,” in International Conference on Advanced Video and Signal Based Surveillance, 1-6 (2017).

[16] Liu, A., Liu, X., Fan, J., et al., “Perceptual-sensitive GAN for generating adversarial patches,” in Proceedings of the AAAI Conference on Artificial Intelligence, 1028-1035 (2019).

[17] Feng, Y., Yuan, Y. and Lu, X., “Person reidentification via unsupervised cross-view metric learning,” IEEE Transactions on Cybernetics, (2019).

[18] Zhang, L., Xiang, T. and Gong, S., “Learning a discriminative null space for person re-identification,” in IEEE Conference on Computer Vision and Pattern Recognition, 1239-1248 (2016).

[19] Zhang, Y., Li, B., Lu, H., Irie, A. and Ruan, X., “Sample-specific SVM learning for person re-identification,” in IEEE Conference on Computer Vision and Pattern Recognition, 1278-1287 (2016).

[20] Liao, S. and Li, S. Z., “Efficient PSD constrained asymmetric metric learning for person re-identification,” in IEEE Conference on Computer Vision and Pattern Recognition, 3685-3693 (2015).

[21] Zhao, Z., Zhao, B. and Su, F., “Person re-identification via integrating patch-based metric learning and local salience learning,” Pattern Recognition, 75, 90-98 (2017). https://doi.org/10.1016/j.patcog.2017.03.023

[22] Lin, T. Y., Roychowdhury, A. and Maji, S., “Bilinear convolutional neural networks for fine-grained visual recognition,” IEEE Transactions on Pattern Analysis & Machine Intelligence, 40(6), 1309-1322 (2017). https://doi.org/10.1109/TPAMI.2017.2723400

[23] Cheng, D., Gong, Y., Zhou, S., Wang, J. and Zheng, N., “Person re-identification by multi-channel parts-based CNN with improved triplet loss function,” in IEEE Conference on Computer Vision and Pattern Recognition, 1335-1344 (2016).

[24] Lisanti, G., Karaman, S. and Masi, I., “Multichannel-kernel canonical correlation analysis for cross-view person reidentification,” ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM), 13(2), 13 (2017).

[25] Zhao, Z., Zhao, B. and Su, F., “Person re-identification via integrating patch-based metric learning and local salience learning,” Pattern Recognition, 75, 90-98 (2017). https://doi.org/10.1016/j.patcog.2017.03.023

[26] Zhang, G., Ge, Y., Dong, Z., Wang, H., Zheng, Y. and Chen, S., “Deep high-resolution representation learning for cross-resolution person re-identification,” arXiv preprint arXiv:2105.11722 (2021).

[27] Liu, X., Zhao, H., Tian, M., Sheng, L., Shao, J., Yi, S., Yan, J. and Wang, X., “HydraPlus-Net: Attentive deep features for pedestrian analysis,” in IEEE International Conference on Computer Vision, 350-359 (2017).

[28] Zhou, K., Yang, Y., Cavallaro, A. and Xiang, T., “Learning generalisable omni-scale representations for person re-identification,” IEEE Transactions on Pattern Analysis and Machine Intelligence, 44(9), 5056-5069 (2021).
© (2022) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Chenhao Si "Reference set based metric learning method for person re-identification against overfitting problem", Proc. SPIE 12506, Third International Conference on Computer Science and Communication Technology (ICCSCT 2022), 125066G (28 December 2022); https://doi.org/10.1117/12.2662666
KEYWORDS
Machine learning

Computer vision technology

Data modeling

Pattern recognition

Overfitting

Education and training

Statistical modeling
