Please use this identifier to cite or link to this item: http://hdl.handle.net/10397/115813
DC Field | Value | Language
dc.contributor | School of Optometry | -
dc.contributor | Research Centre for SHARP Vision | -
dc.creator | Xu, P | -
dc.creator | Wu, Y | -
dc.creator | Jin, K | -
dc.creator | Chen, X | -
dc.creator | He, M | -
dc.creator | Shi, D | -
dc.date.accessioned | 2025-11-04T03:15:50Z | -
dc.date.available | 2025-11-04T03:15:50Z | -
dc.identifier.uri | http://hdl.handle.net/10397/115813 | -
dc.language.iso | en | en_US
dc.publisher | Elsevier Inc. | en_US
dc.rights | © 2025 The Author(s). Published by Elsevier Inc. on behalf of Zhejiang University Press. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/). | en_US
dc.rights | The following publication Xu, P., Wu, Y., Jin, K., Chen, X., He, M., & Shi, D. (2025). DeepSeek-R1 outperforms Gemini 2.0 Pro, OpenAI o1, and o3-mini in bilingual complex ophthalmology reasoning. Advances in Ophthalmology Practice and Research, 5(3), 189–195 is available at https://doi.org/10.1016/j.aopr.2025.05.001. | en_US
dc.subject | Clinical decision support | en_US
dc.subject | DeepSeek | en_US
dc.subject | Gemini | en_US
dc.subject | Large language models | en_US
dc.subject | OpenAI | en_US
dc.subject | Ophthalmology professional examination | en_US
dc.subject | Reasoning ability | en_US
dc.title | DeepSeek-R1 outperforms Gemini 2.0 Pro, OpenAI o1, and o3-mini in bilingual complex ophthalmology reasoning | en_US
dc.type | Journal/Magazine Article | en_US
dc.identifier.spage | 189 | -
dc.identifier.epage | 195 | -
dc.identifier.volume | 5 | -
dc.identifier.issue | 3 | -
dc.identifier.doi | 10.1016/j.aopr.2025.05.001 | -
dcterms.abstract | Purpose: To evaluate the accuracy and reasoning ability of DeepSeek-R1 and three recently released large language models (LLMs) in bilingual complex ophthalmology cases. | -
dcterms.abstract | Methods: A total of 130 multiple-choice questions (MCQs) related to diagnosis (n = 39) and management (n = 91) were collected from the Chinese ophthalmology senior professional title examination and categorized into six topics. These MCQs were translated into English. Responses from DeepSeek-R1, Gemini 2.0 Pro, OpenAI o1, and o3-mini were generated under default configurations between February 15 and February 20, 2025. Accuracy was calculated as the proportion of correctly answered questions, with omissions and extra answers scored as incorrect. Reasoning ability was evaluated by analyzing reasoning logic and the causes of reasoning errors. | -
dcterms.abstract | Results: DeepSeek-R1 demonstrated the highest overall accuracy, achieving 0.862 on Chinese MCQs and 0.808 on English MCQs. Gemini 2.0 Pro, OpenAI o1, and OpenAI o3-mini attained accuracies of 0.715, 0.685, and 0.692 on Chinese MCQs (all P < 0.001 compared with DeepSeek-R1), and 0.746 (P = 0.115), 0.723 (P = 0.027), and 0.577 (P < 0.001) on English MCQs, respectively. DeepSeek-R1 achieved the highest accuracy across five topics in both Chinese and English MCQs, and it also outperformed the other models on management questions in Chinese (all P < 0.05). Reasoning ability analysis showed that the four LLMs shared similar reasoning logic. Ignoring key positive history, ignoring key positive signs, misinterpreting medical data, and overusing non-first-line interventions were the most common causes of reasoning errors. | -
dcterms.abstract | Conclusions: DeepSeek-R1 demonstrated superior performance on bilingual complex ophthalmology reasoning tasks compared with three state-of-the-art LLMs. These findings highlight the potential of advanced LLMs to assist in clinical decision-making and suggest a framework for evaluating their reasoning capabilities. | -
dcterms.accessRights | open access | en_US
dcterms.bibliographicCitation | Advances in ophthalmology practice and research, Aug.-Sept. 2025, v. 5, no. 3, p. 189-195 | -
dcterms.isPartOf | Advances in ophthalmology practice and research | -
dcterms.issued | 2025-08 | -
dc.identifier.scopus | 2-s2.0-105009348145 | -
dc.identifier.eissn | 2667-3762 | -
dc.description.validate | 202511 bcch | -
dc.description.oa | Version of Record | en_US
dc.identifier.FolderNumber | OA_Scopus/WOS | en_US
dc.description.fundingSource | Others | en_US
dc.description.fundingText | This study was supported by the Global STEM Professorship Scheme (P0046113) and the Start-up Fund for RAPs under the Strategic Hiring Scheme (P0048623) from HKSAR. | en_US
dc.description.pubStatus | Published | en_US
dc.description.oaCategory | CC | en_US
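
The dcterms.abstract rows above pin down a concrete scoring rule (accuracy as the proportion of correctly answered MCQs, with omissions and extra answers counted as incorrect) and report pairwise P values without naming the statistical test. The Python sketch below illustrates both; it is not the authors' code: the helper names are invented, and the two-proportion chi-square at the end is an assumption substituted purely for illustration, with correct/incorrect counts approximated from the reported accuracies over the 130 questions.

```python
# Minimal sketch, not the authors' code. Assumes each model response has been
# parsed into a set of selected option letters; all names here are hypothetical.
from scipy.stats import chi2_contingency


def is_correct(selected: set[str], answer_key: set[str]) -> bool:
    """Score one MCQ under the abstract's rule: the selection must match the
    key exactly, so omissions (no answer) and extra answers are incorrect."""
    return bool(selected) and selected == answer_key


def accuracy(responses: list[set[str]], keys: list[set[str]]) -> float:
    """Accuracy = proportion of correctly answered questions."""
    return sum(is_correct(r, k) for r, k in zip(responses, keys)) / len(keys)


# Toy data: one exact match, one omission, one extra answer -> accuracy 1/3.
keys = [{"A"}, {"B", "C"}, {"D"}]
answers = [{"A"}, set(), {"D", "E"}]
print(f"accuracy = {accuracy(answers, keys):.3f}")

# The abstract does not name the significance test; a two-proportion chi-square
# is assumed here purely for illustration. Counts are reconstructed from the
# reported Chinese-MCQ accuracies (DeepSeek-R1 0.862, Gemini 2.0 Pro 0.715)
# over 130 questions, so the resulting P value need not match the published one.
table = [[112, 130 - 112],   # ~0.862 * 130 correct vs. incorrect
         [93, 130 - 93]]     # ~0.715 * 130 correct vs. incorrect
chi2, p, dof, _ = chi2_contingency(table)
print(f"chi2 = {chi2:.2f}, P = {p:.4f}")
```

Under this exact-match rule a partially correct multi-answer selection scores zero, consistent with the abstract's treatment of omissions and extra answers as incorrect.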
Appears in Collections: Journal/Magazine Article
Files in This Item:
File | Description | Size | Format
1-s2.0-S2667376225000290-main.pdf | - | 1.55 MB | Adobe PDF
Open Access Information
Status: open access
File Version: Version of Record
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.