Please use this identifier to cite or link to this item:
http://hdl.handle.net/10397/110738
| Field | Value |
|---|---|
| Title | EyeGPT for patient inquiries and medical education : development and validation of an ophthalmology large language model |
| Authors | Chen, X; Zhao, Z; Zhang, W; Xu, P; Wu, Y; Xu, M; Gao, L; Li, Y; Shang, X; Shi, D; He, M |
| Issue Date | 2024 |
| Source | Journal of medical Internet research, 2024, v. 26, e60063 |
| Abstract | Background: Large language models (LLMs) have the potential to enhance clinical flow and improve medical education, but they encounter challenges related to specialized knowledge in ophthalmology. Objective: This study aims to enhance ophthalmic knowledge by refining a general LLM into an ophthalmology-specialized assistant for patient inquiries and medical education. Methods: We transformed Llama2 into an ophthalmology-specialized LLM, termed EyeGPT, through the following 3 strategies: prompt engineering for role-playing, fine-tuning with publicly available data sets filtered for eye-specific terminology (83,919 samples), and retrieval-augmented generation leveraging a medical database and 14 ophthalmology textbooks. The efficacy of various EyeGPT variants was evaluated by 4 board-certified ophthalmologists through comprehensive use of 120 diverse category questions in both simple and complex question-answering scenarios. The performance of the best EyeGPT model was then compared with that of the unassisted human physician group and the EyeGPT+human group. We proposed 4 metrics for assessment: accuracy, understandability, trustworthiness, and empathy. The proportion of hallucinations was also reported. Results: The best fine-tuned model significantly outperformed the original Llama2 model at providing informed advice (mean 9.30, SD 4.42 vs mean 13.79, SD 5.70; P<.001) and mitigating hallucinations (97/120, 80.8% vs 53/120, 44.2%, P<.001). Incorporating information retrieval from reliable sources, particularly ophthalmology textbooks, further improved the model's response compared with solely the best fine-tuned model (mean 13.08, SD 5.43 vs mean 15.14, SD 4.64; P=.001) and reduced hallucinations (71/120, 59.2% vs 57/120, 47.4%, P=.02). Subgroup analysis revealed that EyeGPT showed robustness across common diseases, with consistent performance across different users and domains. Among the variants, the model integrating fine-tuning and book retrieval ranked highest, closely followed by the combination of fine-tuning and the manual database, standalone fine-tuning, and pure role-playing methods. EyeGPT demonstrated competitive capabilities in understandability and empathy when compared with human ophthalmologists. With the assistance of EyeGPT, the performance of the ophthalmologist was notably enhanced. Conclusions: We pioneered and introduced EyeGPT by refining a general domain LLM and conducted a comprehensive comparison and evaluation of different strategies to develop an ophthalmology-specific assistant. Our results highlight EyeGPT’s potential to assist ophthalmologists and patients in medical settings. |
| Keywords | EyeGPT; Generative AI; Generative artificial intelligence; Generative pretrained transformer; Large language model; Medical assistant; Ophthalmology; Retrieval-Augmented generation |
| Publisher | JMIR Publications, Inc. |
| Journal | Journal of medical Internet research |
| ISSN | 1439-4456 |
| EISSN | 1438-8871 |
| DOI | 10.2196/60063 |
| Rights | ©Xiaolan Chen, Ziwei Zhao, Weiyi Zhang, Pusheng Xu, Yue Wu, Mingpu Xu, Le Gao, Yinwen Li, Xianwen Shang, Danli Shi, Mingguang He. Originally published in the Journal of Medical Internet Research (https://www.jmir.org), 11.12.2024. This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in the Journal of Medical Internet Research (ISSN 1438-8871), is properly cited. The complete bibliographic information, a link to the original publication on https://www.jmir.org/, as well as this copyright and license information must be included. The following publication Chen, X., Zhao, Z., Zhang, W., Xu, P., Wu, Y., Xu, M., Gao, L., Li, Y., Shang, X., Shi, D., & He, M. (2024). EyeGPT for Patient Inquiries and Medical Education: Development and Validation of an Ophthalmology Large Language Model. J Med Internet Res, 26, e60063 is available at https://dx.doi.org/10.2196/60063. |
| Appears in Collections | Journal/Magazine Article |
Files in This Item:
| File | Description | Size | Format | |
|---|---|---|---|---|
| jmir-2024-1-e60063.pdf | | 648.38 kB | Adobe PDF | View/Open |
Page views: 14 (as of Apr 14, 2025)
Downloads: 8 (as of Apr 14, 2025)
SCOPUS™ citations: 18 (as of Dec 19, 2025)
WEB OF SCIENCE™ citations: 18 (as of Dec 18, 2025)
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.



