A multi-modal model based on transformers for medical visual question answering
28 April 2023
Mingchun Huang, Ming Xu, Fuhuang Liu, and Liyan Chen
Proceedings Volume 12610, Third International Conference on Artificial Intelligence and Computer Engineering (ICAICE 2022); 126101C (2023) https://doi.org/10.1117/12.2671434
Event: Third International Conference on Artificial Intelligence and Computer Engineering (ICAICE 2022), 2022, Wuhan, China
Abstract
Compared with general Visual Question Answering (VQA), medical VQA is more challenging: medical images contain more complex information than general images. To address this, we propose the IIF module, which improves the model's ability to extract visual features. In addition, we design the QAM to help the model analyze the question more effectively. On the VQA-RAD dataset, our model reaches an accuracy of 66.4% on open-ended questions and 80.1% on closed-ended questions, outperforming other relevant models. Results on the VQA-MED 2019 dataset also verify the effectiveness of our model.
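The abstract does not detail the IIF and QAM modules, so the following is only a minimal, hypothetical sketch of a transformer-based multi-modal VQA model of the kind described: a question encoder, a projection of precomputed image features, cross-attention fusion, and a classifier over a fixed answer set. All module names, dimensions, and the answer-set size are illustrative assumptions, not the authors' implementation.

# Hypothetical sketch of a transformer-based multi-modal VQA model.
# The question branch and image branch below are generic stand-ins for
# the paper's QAM and IIF modules, whose details are not given here.
import torch
import torch.nn as nn

class MultiModalVQA(nn.Module):
    def __init__(self, vocab_size=5000, num_answers=500, d_model=256, num_heads=8, num_layers=2):
        super().__init__()
        # Question branch: token embedding + transformer encoder
        # (stand-in for the question-analysis module, QAM).
        self.token_embed = nn.Embedding(vocab_size, d_model)
        self.question_encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, num_heads, batch_first=True),
            num_layers,
        )
        # Image branch: project precomputed grid/patch features to d_model
        # (stand-in for the visual-feature module, IIF).
        self.visual_proj = nn.Linear(2048, d_model)
        # Cross-modal fusion: question tokens attend to image features.
        self.cross_attn = nn.MultiheadAttention(d_model, num_heads, batch_first=True)
        # Classifier over a fixed answer vocabulary (size is illustrative).
        self.classifier = nn.Linear(d_model, num_answers)

    def forward(self, question_ids, image_feats):
        q = self.question_encoder(self.token_embed(question_ids))
        v = self.visual_proj(image_feats)
        fused, _ = self.cross_attn(q, v, v)   # question attends to image
        pooled = fused.mean(dim=1)            # simple mean pooling over tokens
        return self.classifier(pooled)        # answer logits

# Usage with random tensors: a batch of 4 questions (20 tokens each) and
# 4 images represented by 49 grid features of dimension 2048.
model = MultiModalVQA()
logits = model(torch.randint(0, 5000, (4, 20)), torch.randn(4, 49, 2048))
print(logits.shape)  # torch.Size([4, 500])

In practice, cross-attention is one common way to fuse the two modalities; the paper's actual fusion strategy may differ.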
Mingchun Huang, Ming Xu, Fuhuang Liu, and Liyan Chen "A multi-modal model based on transformers for medical visual question answering", Proc. SPIE 12610, Third International Conference on Artificial Intelligence and Computer Engineering (ICAICE 2022), 126101C (28 April 2023); https://doi.org/10.1117/12.2671434
KEYWORDS: Visualization, Feature extraction, Transformers, Feature fusion, Visual process modeling, Image fusion