Unveiling the clinical incapabilities: a benchmarking study of GPT-4V(ision) for ophthalmic multimodal image analysis

Xu, P; Chen, X; Zhao, Z; Shi, D

doi:10.1136/bjo-2023-325054

Please use this identifier to cite or link to this item: http://hdl.handle.net/10397/107544

DC Field	Value	Language
dc.contributor	School of Optometry	en_US
dc.contributor	Research Centre for SHARP Vision	en_US
dc.creator	Xu, P	en_US
dc.creator	Chen, X	en_US
dc.creator	Zhao, Z	en_US
dc.creator	Shi, D	en_US
dc.date.accessioned	2024-07-03T04:31:38Z	-
dc.date.available	2024-07-03T04:31:38Z	-
dc.identifier.issn	0007-1161	en_US
dc.identifier.uri	http://hdl.handle.net/10397/107544	-
dc.language.iso	en	en_US
dc.publisher	BMJ Group	en_US
dc.rights	© Author(s) (or their employer(s)) 2024. No commercial re-use. See rights and permissions. Published by BMJ.	en_US
dc.rights	This article has been accepted for publication in British Journal of Ophthalmology, 2024 following peer review, and the Version of Record can be accessed online at https://doi.org/10.1136/bjo-2023-325054.	en_US
dc.title	Unveiling the clinical incapabilities: a benchmarking study of GPT-4V(ision) for ophthalmic multimodal image analysis	en_US
dc.type	Journal/Magazine Article	en_US
dc.identifier.volume	108	en_US
dc.identifier.issue	10	en_US
dc.identifier.doi	10.1136/bjo-2023-325054	en_US
dcterms.abstract	Purpose: To evaluate the capabilities and incapabilities of a GPT-4V(ision)-based chatbot in interpreting ocular multimodal images.	en_US
dcterms.abstract	Methods: We developed a digital ophthalmologist app using GPT-4V and evaluated its performance with a dataset (60 images, 60 ophthalmic conditions, 6 modalities) that included slit-lamp, scanning laser ophthalmoscopy, fundus photography of the posterior pole (FPP), optical coherence tomography, fundus fluorescein angiography and ocular ultrasound images. The chatbot was tested with ten open-ended questions per image, covering examination identification, lesion detection, diagnosis and decision support. The responses were manually assessed for accuracy, usability, safety and diagnosis repeatability. Auto-evaluation was performed using sentence similarity and GPT-4-based auto-evaluation.	en_US
dcterms.abstract	Results: Out of 600 responses, 30.6% were accurate, 21.5% were highly usable and 55.6% were rated as no harm. GPT-4V performed best with slit-lamp images, with 42.0%, 38.5% and 68.5% of the responses being accurate, highly usable and no harm, respectively. However, its performance was weaker on FPP images, with only 13.7%, 3.7% and 38.5% in the same categories. GPT-4V correctly identified 95.6% of the imaging modalities and showed varying accuracies in lesion identification (25.6%), diagnosis (16.1%) and decision support (24.0%). The overall repeatability of GPT-4V in diagnosing ocular images was 63.3% (38/60). The overall sentence similarity between responses generated by GPT-4V and human answers was 55.5%, with Spearman correlations of 0.569 for accuracy and 0.576 for usability.	en_US
dcterms.abstract	Conclusion: GPT-4V is not yet suitable for clinical decision-making in ophthalmology. Our study serves as a benchmark for enhancing ophthalmic multimodal models.	en_US
dcterms.accessRights	open access	en_US
dcterms.bibliographicCitation	British journal of ophthalmology, Oct. 2024, v. 108, no. 10, 1384	en_US
dcterms.isPartOf	British journal of ophthalmology	en_US
dcterms.issued	2024-10	-
dc.identifier.pmid	38789133	-
dc.identifier.eissn	1468-2079	en_US
dc.identifier.artn	1384	en_US
dc.description.validate	202407 bcch	en_US
dc.description.oa	Accepted Manuscript	en_US
dc.identifier.FolderNumber	a2925	-
dc.identifier.SubFormID	48779	-
dc.description.fundingSource	Others	en_US
dc.description.fundingText	Start-up Fund for RAPs under the Strategic Hiring Scheme	en_US
dc.description.pubStatus	Published	en_US
dc.description.oaCategory	Green (AAM)	en_US
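The abstract summarises agreement between automatic and manual scoring as Spearman correlations (0.569 for accuracy, 0.576 for usability) and reports diagnosis repeatability as 38/60. As a rough illustration only, the Python sketch below computes both kinds of figure with the standard library; it is not the authors' code. The function names, the tie handling (average ranks) and the reading of repeatability as the share of images whose diagnosis matched across two runs are assumptions, and the sample data are invented.

```python
import math
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j over the run of values equal to values[order[i]].
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def repeatability(first_run, second_run):
    """Hypothetical definition: share of images whose diagnosis is identical in both runs."""
    if len(first_run) != len(second_run) or not first_run:
        raise ValueError("runs must be non-empty and the same length")
    same = sum(a == b for a, b in zip(first_run, second_run))
    return same / len(first_run)


if __name__ == "__main__":
    # Invented toy data, NOT from the study: automatic similarity scores
    # and manual accuracy grades for five responses.
    similarity = [0.31, 0.48, 0.52, 0.77, 0.60]
    manual_grade = [1, 2, 2, 3, 3]
    print(f"Spearman rho = {spearman(similarity, manual_grade):.3f}")

    # 38 of 60 images with a consistent diagnosis gives the reported 63.3%.
    run_a = list(range(60))
    run_b = run_a[:38] + [-1] * 22
    print(f"repeatability = {repeatability(run_a, run_b):.1%}")
```

In practice `scipy.stats.spearmanr` would normally be used instead of the hand-rolled `spearman`; the manual version only makes the rank-then-correlate step explicit.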
Appears in Collections:	Journal/Magazine Article

Files in This Item:

File	Description	Size	Format
Xu_Unveiling_Clinical_Incapabilities.pdf	Pre-Published version	2.03 MB	Adobe PDF

Open Access Information

Status	open access
File Version	Final Accepted Manuscript


Page views: 39 (as of Apr 14, 2025)
Downloads: 57 (as of Apr 14, 2025)
Scopus™ citations: 21 (as of Sep 12, 2025)
Web of Science™ citations: 29 (as of Dec 18, 2025)
