Learning discriminative common alignments for cross-modal retrieval
Hui Liu, Xiao-Ping Chen, Rui Hong, Yan Zhou, Tian-Cai Wan, Tai-Li Bai
Abstract

Cross-modal retrieval aims to find alignment relationships between different modalities and then compute semantic similarities for ranking. Because of the difference in data distribution and the inherent heterogeneity gap between modalities, a classic solution is to learn common representations in a common space, which preserves the discrimination among samples from different categories and alleviates the cross-modal discrepancy. To achieve this, we propose a method, termed LDCA, that learns discriminative common alignments based on the modal representations. LDCA utilizes a modality-invariance loss that pushes away the hardest negative sample to further reduce the cross-modal discrepancy at the feature level. In addition, LDCA seeks alignments in the label space to improve intra-modal discrimination through an effective cross-modal label loss. Extensive experiments on five widely used cross-modal datasets demonstrate the superiority of LDCA, and comprehensive analyses verify the effectiveness of the method.
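To make the two objectives above concrete, the following is a minimal PyTorch sketch, not the authors' implementation: the function names, the margin value, and the shared linear classifier are illustrative assumptions.

import torch
import torch.nn.functional as F

def modality_invariance_loss(img_emb, txt_emb, margin=0.2):
    # Triplet-style loss in the common space: matched image-text pairs lie
    # on the diagonal of the similarity matrix; the hardest (most similar)
    # mismatched sample is pushed at least `margin` below its positive.
    sim = F.normalize(img_emb, dim=1) @ F.normalize(txt_emb, dim=1).T
    pos = sim.diag()                                  # matched-pair similarities
    mask = torch.eye(sim.size(0), dtype=torch.bool)
    neg = sim.masked_fill(mask, float("-inf"))        # exclude positives
    hardest_i2t = neg.max(dim=1).values               # hardest text per image
    hardest_t2i = neg.max(dim=0).values               # hardest image per text
    return (F.relu(margin + hardest_i2t - pos) +
            F.relu(margin + hardest_t2i - pos)).mean()

def cross_modal_label_loss(img_logits, txt_logits, labels):
    # Alignment in the label space: a shared classifier should predict
    # the same category from either modality's representation.
    return F.cross_entropy(img_logits, labels) + F.cross_entropy(txt_logits, labels)

# Usage on random data: a batch of 8 paired 128-d embeddings, 10 classes.
img, txt = torch.randn(8, 128), torch.randn(8, 128)
labels = torch.randint(0, 10, (8,))
classifier = torch.nn.Linear(128, 10)  # shared across modalities (assumption)
loss = (modality_invariance_loss(img, txt)
        + cross_modal_label_loss(classifier(img), classifier(txt), labels))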

© 2024 SPIE and IS&T
Hui Liu, Xiao-Ping Chen, Rui Hong, Yan Zhou, Tian-Cai Wan, and Tai-Li Bai "Learning discriminative common alignments for cross-modal retrieval," Journal of Electronic Imaging 33(2), 023022 (14 March 2024). https://doi.org/10.1117/1.JEI.33.2.023022
Received: 14 November 2023; Accepted: 11 January 2024; Published: 14 March 2024
KEYWORDS
Education and training, Semantics, Feature extraction, Visualization, Ablation, Design, Multimedia