Please use this identifier to cite or link to this item: http://hdl.handle.net/10397/112832
DC Field | Value | Language
dc.contributor | School of Optometry | en_US
dc.contributor | Research Centre for SHARP Vision | en_US
dc.creator | Chen, X | en_US
dc.creator | Xiang, J | en_US
dc.creator | Lu, S | en_US
dc.creator | Liu, Y | en_US
dc.creator | He, M | en_US
dc.creator | Shi, D | en_US
dc.date.accessioned | 2025-05-09T02:58:45Z | -
dc.date.available | 2025-05-09T02:58:45Z | -
dc.identifier.uri | http://hdl.handle.net/10397/112832 | -
dc.language.iso | en | en_US
dc.publisher | Elsevier BV | en_US
dc.rights | © 2025 The Author(s). Published by Elsevier B.V. on behalf of Chinese Medical Association. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/) | en_US
dc.rights | The following publication Chen, X., Xiang, J., Lu, S., Liu, Y., He, M., & Shi, D. (2025). Evaluating large language models and agents in healthcare: key challenges in clinical applications. Intelligent Medicine, 5(2), 151-163 is available at https://doi.org/10.1016/j.imed.2025.03.002. | en_US
dc.subject | Evaluation | en_US
dc.subject | Generative pre-trained transformer | en_US
dc.subject | Hallucination | en_US
dc.subject | Large language model | en_US
dc.subject | Medical agent | en_US
dc.subject | Reasoning | en_US
dc.title | Evaluating large language models and agents in healthcare : key challenges in clinical applications | en_US
dc.type | Journal/Magazine Article | en_US
dc.identifier.spage | 151 | en_US
dc.identifier.epage | 163 | en_US
dc.identifier.volume | 5 | en_US
dc.identifier.issue | 2 | en_US
dc.identifier.doi | 10.1016/j.imed.2025.03.002 | en_US
dcterms.abstract | Large language models (LLMs) have emerged as transformative tools with significant potential across healthcare and medicine. In clinical settings, they hold promise for tasks ranging from clinical decision support to patient education. Advances in LLM agents further broaden their utility by enabling multimodal processing and multitask handling in complex clinical workflows. However, evaluating the performance of LLMs in medical contexts presents unique challenges due to the high-risk nature of healthcare and the complexity of medical data. This paper provides a comprehensive overview of current evaluation practices for LLMs and LLM agents in medicine. We make three main contributions. First, we summarize data sources used in evaluations, including existing medical resources and manually designed clinical questions, offering a basis for LLM evaluation in medical settings. Second, we analyze key medical task scenarios: closed-ended tasks, open-ended tasks, image processing tasks, and real-world multitask scenarios involving LLM agents, thereby offering guidance for further research across different medical applications. Third, we compare evaluation methods and dimensions, covering both automated metrics and human expert assessments, while addressing traditional accuracy measures alongside agent-specific dimensions, such as tool usage and reasoning capabilities. Finally, we identify key challenges and opportunities in this evolving field, emphasizing the need for continued research and interdisciplinary collaboration between healthcare professionals and computer scientists to ensure safe, ethical, and effective deployment of LLMs in clinical practice. | en_US
dcterms.accessRights | open access | en_US
dcterms.bibliographicCitation | Intelligent medicine, May 2025, v. 5, no. 2, p. 151-163 | en_US
dcterms.isPartOf | Intelligent medicine | en_US
dcterms.issued | 2025-05 | -
dc.identifier.eissn | 2667-1026 | en_US
dc.description.validate | 202505 bcch | en_US
dc.description.oa | Version of Record | en_US
dc.identifier.FolderNumber | a3583b | -
dc.identifier.SubFormID | 50403 | -
dc.description.fundingSource | Self-funded | en_US
dc.description.pubStatus | Published | en_US
dc.description.oaCategory | CC | en_US
Appears in Collections: Journal/Magazine Article

Files in This Item:
File | Description | Size | Format
1-s2.0-S2667102625000294-main.pdf | - | 3.15 MB | Adobe PDF
Open Access Information
Status: open access
File Version: Version of Record

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.