A multi-modal model based on transformers for medical visual question answering
28 April 2023
Mingchun Huang, Ming Xu, Fuhuang Liu, and Liyan Chen
Proceedings Volume 12610, Third International Conference on Artificial Intelligence and Computer Engineering (ICAICE 2022); 126101C (2023) https://doi.org/10.1117/12.2671434
Event: Third International Conference on Artificial Intelligence and Computer Engineering (ICAICE 2022), 2022, Wuhan, China
Abstract
Compared with general Visual Question Answering (VQA), medical VQA is more challenging: medical images contain more complex information than general images. To address this, we propose the IIF module, which improves the model's ability to extract visual features. In addition, we design the QAM to help the model analyze the question more effectively. On the VQA-RAD dataset, our model reaches an accuracy of 66.4% on open-ended questions and 80.1% on closed-ended questions, outperforming other relevant models. Results on the VQA-MED 2019 dataset also verify the effectiveness of our model.
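The abstract does not detail the IIF and QAM modules, so the following is only a minimal, hypothetical sketch of a transformer-based multi-modal VQA model of the kind described: a question encoder, a projection of precomputed image features, cross-attention fusion, and a classifier over a fixed answer set. All module names, dimensions, and the answer-set size are illustrative assumptions, not the authors' implementation.

# Hypothetical sketch of a transformer-based multi-modal VQA model.
# The question branch and image branch below are generic stand-ins for
# the paper's QAM and IIF modules, whose details are not given here.
import torch
import torch.nn as nn

class MultiModalVQA(nn.Module):
    def __init__(self, vocab_size=5000, num_answers=500, d_model=256, num_heads=8, num_layers=2):
        super().__init__()
        # Question branch: token embedding + transformer encoder
        # (stand-in for the question-analysis module, QAM).
        self.token_embed = nn.Embedding(vocab_size, d_model)
        self.question_encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, num_heads, batch_first=True),
            num_layers,
        )
        # Image branch: project precomputed grid/patch features to d_model
        # (stand-in for the visual-feature module, IIF).
        self.visual_proj = nn.Linear(2048, d_model)
        # Cross-modal fusion: question tokens attend to image features.
        self.cross_attn = nn.MultiheadAttention(d_model, num_heads, batch_first=True)
        # Classifier over a fixed answer vocabulary (size is illustrative).
        self.classifier = nn.Linear(d_model, num_answers)

    def forward(self, question_ids, image_feats):
        q = self.question_encoder(self.token_embed(question_ids))
        v = self.visual_proj(image_feats)
        fused, _ = self.cross_attn(q, v, v)   # question attends to image
        pooled = fused.mean(dim=1)            # simple mean pooling over tokens
        return self.classifier(pooled)        # answer logits

# Usage with random tensors: a batch of 4 questions (20 tokens each) and
# 4 images represented by 49 grid features of dimension 2048.
model = MultiModalVQA()
logits = model(torch.randint(0, 5000, (4, 20)), torch.randn(4, 49, 2048))
print(logits.shape)  # torch.Size([4, 500])

In practice, cross-attention is one common way to fuse the two modalities; the paper's actual fusion strategy may differ.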
Mingchun Huang, Ming Xu, Fuhuang Liu, and Liyan Chen "A multi-modal model based on transformers for medical visual question answering", Proc. SPIE 12610, Third International Conference on Artificial Intelligence and Computer Engineering (ICAICE 2022), 126101C (28 April 2023); https://doi.org/10.1117/12.2671434
KEYWORDS: Visualization, Feature extraction, Transformers, Feature fusion, Visual process modeling, Image fusion