Please use this identifier to cite or link to this item: http://hdl.handle.net/10397/115813
DC Field | Value | Language
dc.contributor | School of Optometry | -
dc.contributor | Research Centre for SHARP Vision | -
dc.creator | Xu, P | -
dc.creator | Wu, Y | -
dc.creator | Jin, K | -
dc.creator | Chen, X | -
dc.creator | He, M | -
dc.creator | Shi, D | -
dc.date.accessioned | 2025-11-04T03:15:50Z | -
dc.date.available | 2025-11-04T03:15:50Z | -
dc.identifier.uri | http://hdl.handle.net/10397/115813 | -
dc.language.iso | en | en_US
dc.publisher | Elsevier Inc. | en_US
dc.rights | © 2025 The Author(s). Published by Elsevier Inc. on behalf of Zhejiang University Press. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/). | en_US
dc.rights | The following publication Xu, P., Wu, Y., Jin, K., Chen, X., He, M., & Shi, D. (2025). DeepSeek-R1 outperforms Gemini 2.0 Pro, OpenAI o1, and o3-mini in bilingual complex ophthalmology reasoning. Advances in Ophthalmology Practice and Research, 5(3), 189–195 is available at https://doi.org/10.1016/j.aopr.2025.05.001. | en_US
dc.subject | Clinical decision support | en_US
dc.subject | DeepSeek | en_US
dc.subject | Gemini | en_US
dc.subject | Large language models | en_US
dc.subject | OpenAI | en_US
dc.subject | Ophthalmology professional examination | en_US
dc.subject | Reasoning ability | en_US
dc.title | DeepSeek-R1 outperforms Gemini 2.0 Pro, OpenAI o1, and o3-mini in bilingual complex ophthalmology reasoning | en_US
dc.type | Journal/Magazine Article | en_US
dc.identifier.spage | 189 | -
dc.identifier.epage | 195 | -
dc.identifier.volume | 5 | -
dc.identifier.issue | 3 | -
dc.identifier.doi | 10.1016/j.aopr.2025.05.001 | -
dcterms.abstract | Purpose: To evaluate the accuracy and reasoning ability of DeepSeek-R1 and three recently released large language models (LLMs) in bilingual complex ophthalmology cases. | -
dcterms.abstract | Methods: A total of 130 multiple-choice questions (MCQs) related to diagnosis (n = 39) and management (n = 91) were collected from the Chinese ophthalmology senior professional title examination and categorized into six topics. These MCQs were translated into English. Responses from DeepSeek-R1, Gemini 2.0 Pro, OpenAI o1, and o3-mini were generated under default configurations between February 15 and February 20, 2025. Accuracy was calculated as the proportion of correctly answered questions, with omissions and extra answers scored as incorrect. Reasoning ability was evaluated by analyzing reasoning logic and the causes of reasoning errors. | -
dcterms.abstract | Results: DeepSeek-R1 demonstrated the highest overall accuracy, achieving 0.862 on Chinese MCQs and 0.808 on English MCQs. Gemini 2.0 Pro, OpenAI o1, and OpenAI o3-mini attained accuracies of 0.715, 0.685, and 0.692 on Chinese MCQs (all P < 0.001 compared with DeepSeek-R1), and 0.746 (P = 0.115), 0.723 (P = 0.027), and 0.577 (P < 0.001) on English MCQs, respectively. DeepSeek-R1 achieved the highest accuracy across five topics in both Chinese and English MCQs, and it also outperformed the other models on management questions in Chinese (all P < 0.05). Reasoning ability analysis showed that the four LLMs shared similar reasoning logic. Ignoring key positive history, ignoring key positive signs, misinterpreting medical data, and overusing non-first-line interventions were the most common causes of reasoning errors. | -
dcterms.abstract | Conclusions: DeepSeek-R1 demonstrated superior performance on bilingual complex ophthalmology reasoning tasks compared with three state-of-the-art LLMs. These findings highlight the potential of advanced LLMs to assist in clinical decision-making and suggest a framework for evaluating their reasoning capabilities. | -
dcterms.accessRights | open access | en_US
dcterms.bibliographicCitation | Advances in ophthalmology practice and research, Aug.-Sept. 2025, v. 5, no. 3, p. 189-195 | -
dcterms.isPartOf | Advances in ophthalmology practice and research | -
dcterms.issued | 2025-08 | -
dc.identifier.scopus | 2-s2.0-105009348145 | -
dc.identifier.eissn | 2667-3762 | -
dc.description.validate | 202511 bcch | -
dc.description.oa | Version of Record | en_US
dc.identifier.FolderNumber | OA_Scopus/WOS | en_US
dc.description.fundingSource | Others | en_US
dc.description.fundingText | This study was supported by the Global STEM Professorship Scheme (P0046113) and the Start-up Fund for RAPs under the Strategic Hiring Scheme (P0048623) from HKSAR. | en_US
dc.description.pubStatus | Published | en_US
dc.description.oaCategory | CC | en_US
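
The dcterms.abstract rows above pin down a concrete scoring rule (accuracy as the proportion of correctly answered MCQs, with omissions and extra answers counted as incorrect) and report pairwise P values without naming the statistical test. The Python sketch below illustrates both; it is not the authors' code: the helper names are invented, and the two-proportion chi-square at the end is an assumption substituted purely for illustration, with correct/incorrect counts approximated from the reported accuracies over the 130 questions.

```python
# Minimal sketch, not the authors' code. Assumes each model response has been
# parsed into a set of selected option letters; all names here are hypothetical.
from scipy.stats import chi2_contingency


def is_correct(selected: set[str], answer_key: set[str]) -> bool:
    """Score one MCQ under the abstract's rule: the selection must match the
    key exactly, so omissions (no answer) and extra answers are incorrect."""
    return bool(selected) and selected == answer_key


def accuracy(responses: list[set[str]], keys: list[set[str]]) -> float:
    """Accuracy = proportion of correctly answered questions."""
    return sum(is_correct(r, k) for r, k in zip(responses, keys)) / len(keys)


# Toy data: one exact match, one omission, one extra answer -> accuracy 1/3.
keys = [{"A"}, {"B", "C"}, {"D"}]
answers = [{"A"}, set(), {"D", "E"}]
print(f"accuracy = {accuracy(answers, keys):.3f}")

# The abstract does not name the significance test; a two-proportion chi-square
# is assumed here purely for illustration. Counts are reconstructed from the
# reported Chinese-MCQ accuracies (DeepSeek-R1 0.862, Gemini 2.0 Pro 0.715)
# over 130 questions, so the resulting P value need not match the published one.
table = [[112, 130 - 112],   # ~0.862 * 130 correct vs. incorrect
         [93, 130 - 93]]     # ~0.715 * 130 correct vs. incorrect
chi2, p, dof, _ = chi2_contingency(table)
print(f"chi2 = {chi2:.2f}, P = {p:.4f}")
```

Under this exact-match rule a partially correct multi-answer selection scores zero, consistent with the abstract's treatment of omissions and extra answers as incorrect.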
Appears in Collections: Journal/Magazine Article
Files in This Item:
File | Description | Size | Format
1-s2.0-S2667376225000290-main.pdf | - | 1.55 MB | Adobe PDF
Open Access Information
Status: open access
File Version: Version of Record
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.