15 August 2023
Pixel-level feature enhancement and weighted fusion for visual relationship detection
Jiang-tao Li
Proceedings Volume 12719, Second International Conference on Electronic Information Technology (EIT 2023); 127191F (2023) https://doi.org/10.1117/12.2685674
Event: Second International Conference on Electronic Information Technology (EIT 2023), 2023, Wuhan, China
Abstract
In traditional visual relationship detection models, the visual feature representation of entities tends to focus on coarse-grained features and ignore fine-grained ones. Moreover, the spatial feature representation of entities does not fully reflect the prominent role of relative position, and in some specific cases it cannot produce a unique spatial feature vector. Finally, feature fusion does not take into account that visual features are primary while semantic and spatial features are secondary. To address these issues, we propose a visual relationship detection model with pixel-level feature enhancement and weighted fusion. Specifically, we embed a fine-grained information block to capture pixel-level context information in the feature map, providing richer visual features for relationship prediction. We adopt a coordinate encoding method that encodes both the individual and the relative positions of the bounding boxes of each entity pair, yielding more accurate spatial feature representations. We construct a fusion method based on feature weighting to obtain a better fused vector. We conducted extensive experiments on the mainstream visual relationship detection dataset. The results show that the proposed model performs better.
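The abstract does not give the exact coordinate encoding or the fusion weights, so the following is only a minimal Python sketch of the two ideas under stated assumptions. The function names `encode_pair` and `weighted_fusion`, the 12-dimensional layout, the log size ratios, and the weights (0.6, 0.2, 0.2) are all hypothetical choices for illustration, not the paper's method. The sketch shows how encoding relative position makes the spatial vector depend on the subject/object order, and how scaling before concatenation lets visual features dominate.

```python
import math

def encode_pair(subj, obj, img_w, img_h):
    """Encode a (subject, object) box pair as a spatial feature vector.

    Boxes are (x1, y1, x2, y2) in pixels. The layout is an assumption:
    4 normalized subject coords + 4 normalized object coords
    + 4 relative terms (center offset and log size ratio).
    """
    def norm(box):
        x1, y1, x2, y2 = box
        return [x1 / img_w, y1 / img_h, x2 / img_w, y2 / img_h]

    sx1, sy1, sx2, sy2 = subj
    ox1, oy1, ox2, oy2 = obj
    sw, sh = sx2 - sx1, sy2 - sy1
    ow, oh = ox2 - ox1, oy2 - oy1
    scx, scy = (sx1 + sx2) / 2, (sy1 + sy2) / 2
    ocx, ocy = (ox1 + ox2) / 2, (oy1 + oy2) / 2

    # Relative position: object center offset measured in subject sizes,
    # plus log width/height ratios (hypothetical choice).
    rel = [(ocx - scx) / sw, (ocy - scy) / sh,
           math.log(ow / sw), math.log(oh / sh)]
    return norm(subj) + norm(obj) + rel

def weighted_fusion(visual, semantic, spatial, weights=(0.6, 0.2, 0.2)):
    """Scale each feature group by its weight, then concatenate.

    The default weights are illustrative; they only express the idea that
    visual features are primary and the other two are secondary.
    """
    wv, ws, wp = weights
    return ([wv * v for v in visual]
            + [ws * s for s in semantic]
            + [wp * p for p in spatial])

# Usage: the object sits one subject-width to the right of the subject.
spatial = encode_pair((0, 0, 10, 10), (10, 0, 20, 10), img_w=100, img_h=100)
fused = weighted_fusion([1.0], [1.0], spatial)
```

Swapping the subject and object flips the sign of the center-offset terms, so the two orderings of the same pair of boxes receive different spatial vectors.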
© (2023) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Jiang-tao Li "Pixel-level feature enhancement and weighted fusion for visual relationship detection", Proc. SPIE 12719, Second International Conference on Electronic Information Technology (EIT 2023), 127191F (15 August 2023); https://doi.org/10.1117/12.2685674
KEYWORDS
Visualization
Feature fusion
Object detection
Feature extraction