Please use this identifier to cite or link to this item:
http://hdl.handle.net/10397/110737
Title: Slit lamp report generation and question answering: development and validation of a multimodal transformer model with large language model integration
Authors: Zhao, Z; Zhang, W; Chen, X; Song, F; Gunasegaram, J; Huang, W; Shi, D; He, M; Liu, N
Issue Date: 2024
Source: Journal of medical Internet research, 2024, v. 26, e54047
Abstract:
Background: Large language models have shown remarkable efficacy in various medical research and clinical applications. However, their capabilities in medical image recognition and subsequent report generation or question answering (QA) remain limited.
Objective: We aim to fine-tune a multimodal, transformer-based model for generating medical reports from slit lamp images and to develop a QA system using Llama2. We term this entire process slit lamp–GPT.
Methods: Our research used a dataset of 25,051 slit lamp images from 3409 participants, paired with their corresponding physician-created medical reports. We split these data into training, validation, and test sets and used them to fine-tune the Bootstrapping Language-Image Pre-training (BLIP) framework for report generation. The generated text reports and human-posed questions were then fed into Llama2 for QA. We evaluated performance using quantitative metrics (BLEU [bilingual evaluation understudy], CIDEr [consensus-based image description evaluation], ROUGE-L [the Longest Common Subsequence variant of Recall-Oriented Understudy for Gisting Evaluation], SPICE [Semantic Propositional Image Caption Evaluation], accuracy, sensitivity, specificity, precision, and F1-score) and the subjective assessments of two experienced ophthalmologists on a 1-3 scale (1 indicating high quality).
Results: Keyword matching in the initial reports identified 50 conditions related to diseases or postoperative complications. The refined slit lamp–GPT model achieved BLEU-1 to BLEU-4 scores of 0.67, 0.66, 0.65, and 0.65, respectively, a CIDEr score of 3.24, a ROUGE-L score of 0.61, and a SPICE score of 0.37. The most frequently identified conditions were cataract (22.95%), age-related cataract (22.03%), and conjunctival concretion (13.13%). For disease classification, the model achieved an overall accuracy of 0.82 and an F1-score of 0.64, with high accuracy (≥0.9) for intraocular lens, conjunctivitis, and chronic conjunctivitis, and high F1-scores (≥0.9) for cataract and age-related cataract. For both the report generation and QA components, the two evaluating ophthalmologists reached substantial agreement (κ between 0.71 and 0.84). For 100 generated reports, they gave average scores of 1.36 for both completeness and correctness; 64% (64/100) were rated “entirely good” and 93% (93/100) “acceptable.” For 300 generated answers, the average scores were 1.33 for completeness, 1.14 for correctness, and 1.15 for possible harm, with 66.3% (199/300) rated “entirely good” and 91.3% (274/300) “acceptable.”
Conclusions: This study introduces the slit lamp–GPT model for report generation and subsequent QA, highlighting the potential of large language models to assist ophthalmologists and patients.
©Ziwei Zhao, Weiyi Zhang, Xiaolan Chen, Fan Song, James Gunasegaram, Wenyong Huang, Danli Shi, Mingguang He, Na Liu.
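The abstract reports κ values of 0.71 to 0.84 between the two ophthalmologists but does not say which κ variant was used. As an illustration only, the sketch below computes unweighted Cohen's κ for two raters, κ = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and p_e is the agreement expected by chance from each rater's marginal score frequencies. For an ordinal 1-3 scale a weighted κ is also common, so the unweighted form here is an assumption; the function name and the example ratings are hypothetical.

```python
from collections import Counter
from collections.abc import Sequence


def cohen_kappa(rater_a: Sequence[int], rater_b: Sequence[int]) -> float:
    """Unweighted Cohen's kappa for two raters scoring the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("raters must score the same, non-empty set of items")
    n = len(rater_a)
    # Observed agreement: fraction of items given the same score by both raters.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of the raters' marginal counts, summed over categories.
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(count_a[c] * count_b[c] for c in count_a.keys() | count_b.keys()) / (n * n)
    if p_e == 1.0:
        # Both raters used one identical category throughout; kappa is undefined.
        return float("nan")
    return (p_o - p_e) / (1 - p_e)


if __name__ == "__main__":
    # Hypothetical 1-3 quality scores from two raters on ten items.
    a = [1, 1, 2, 2, 3, 3, 1, 2, 1, 1]
    b = [1, 1, 2, 3, 3, 3, 1, 2, 2, 1]
    print(round(cohen_kappa(a, b), 2))
```

For these toy ratings, p_o = 0.8 and p_e = 0.35, so κ is about 0.69. Correcting for chance matters here because raters on a 1-3 scale that favors score 1 would agree often even without sharing any judgment.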
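ROUGE-L, one of the report-generation metrics listed in the abstract, scores a generated report by the longest common subsequence (LCS) of tokens it shares with the reference report. The sketch below follows the LCS-based F-measure F = (1 + β²)PR / (R + β²P), with P = LCS / |candidate| and R = LCS / |reference|. Whitespace tokenization and β = 1 are simplifying assumptions; evaluation toolkits differ in tokenizer and β, and the abstract does not specify which implementation produced the reported 0.61. The function names are hypothetical.

```python
def lcs_length(a: list[str], b: list[str]) -> int:
    """Length of the longest common subsequence of two token lists (O(len(b)) memory)."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)  # extend the diagonal match
            else:
                curr.append(max(prev[j], curr[j - 1]))  # best of skipping either token
        prev = curr
    return prev[-1]


def rouge_l(candidate: str, reference: str, beta: float = 1.0) -> float:
    """LCS-based ROUGE-L F-measure with lowercase whitespace tokenization."""
    cand = candidate.lower().split()
    ref = reference.lower().split()
    if not cand or not ref:
        return 0.0
    lcs = lcs_length(cand, ref)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)
    recall = lcs / len(ref)
    return (1 + beta**2) * precision * recall / (recall + beta**2 * precision)
```

For example, the hypothetical candidate "cortical cataract in the right eye" against the reference "mild cortical cataract in right eye" shares the 5-token subsequence "cortical cataract in right eye", giving P = R = 5/6. Because a subsequence need not be contiguous, ROUGE-L tolerates inserted or dropped words such as "the" and "mild" while still rewarding correct word order.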
Keywords: Large language model; Medical report generation; Question answering; Slit lamp
Publisher: JMIR Publications, Inc.
Journal: Journal of medical Internet research
ISSN: 1439-4456
EISSN: 1438-8871
DOI: 10.2196/54047
Rights: ©Ziwei Zhao, Weiyi Zhang, Xiaolan Chen, Fan Song, James Gunasegaram, Wenyong Huang, Danli Shi, Mingguang He, Na Liu. Originally published in the Journal of Medical Internet Research (https://www.jmir.org), 30.12.2024. This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in the Journal of Medical Internet Research (ISSN 1438-8871), is properly cited. The complete bibliographic information, a link to the original publication on https://www.jmir.org/, as well as this copyright and license information must be included. The following publication Zhao, Z., Zhang, W., Chen, X., Song, F., Gunasegaram, J., Huang, W., Shi, D., He, M., & Liu, N. (2024). Slit Lamp Report Generation and Question Answering: Development and Validation of a Multimodal Transformer Model with Large Language Model Integration. J Med Internet Res, 26, e54047 is available at https://dx.doi.org/10.2196/54047.
Appears in Collections: Journal/Magazine Article
Files in This Item:
File | Size | Format
---|---|---
jmir-2024-1-e54047.pdf | 964.96 kB | Adobe PDF