Evaluating large language models and agents in healthcare : key challenges in clinical applications

Chen, X; Xiang, J; Lu, S; Liu, Y; He, M; Shi, D

doi:10.1016/j.imed.2025.03.002

Please use this identifier to cite or link to this item: http://hdl.handle.net/10397/112832

DC Field	Value	Language
dc.contributor	School of Optometry	en_US
dc.contributor	Research Centre for SHARP Vision	en_US
dc.creator	Chen, X	en_US
dc.creator	Xiang, J	en_US
dc.creator	Lu, S	en_US
dc.creator	Liu, Y	en_US
dc.creator	He, M	en_US
dc.creator	Shi, D	en_US
dc.date.accessioned	2025-05-09T02:58:45Z	-
dc.date.available	2025-05-09T02:58:45Z	-
dc.identifier.uri	http://hdl.handle.net/10397/112832	-
dc.language.iso	en	en_US
dc.publisher	Elsevier BV	en_US
dc.rights	© 2025 The Author(s). Published by Elsevier B.V. on behalf of Chinese Medical Association. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/)	en_US
dc.rights	The following publication Chen, X., Xiang, J., Lu, S., Liu, Y., He, M., & Shi, D. (2025). Evaluating large language models and agents in healthcare: key challenges in clinical applications. Intelligent Medicine, 5(2), 151-163 is available at https://doi.org/10.1016/j.imed.2025.03.002.	en_US
dc.subject	Evaluation	en_US
dc.subject	Generative pre-trained transformer	en_US
dc.subject	Hallucination	en_US
dc.subject	Large language model	en_US
dc.subject	Medical agent	en_US
dc.subject	Reasoning	en_US
dc.title	Evaluating large language models and agents in healthcare : key challenges in clinical applications	en_US
dc.type	Journal/Magazine Article	en_US
dc.identifier.spage	151	en_US
dc.identifier.epage	163	en_US
dc.identifier.volume	5	en_US
dc.identifier.issue	2	en_US
dc.identifier.doi	10.1016/j.imed.2025.03.002	en_US
dcterms.abstract	Large language models (LLMs) have emerged as transformative tools with significant potential across healthcare and medicine. In clinical settings, they hold promise for tasks ranging from clinical decision support to patient education. Advances in LLM agents further broaden their utility by enabling multimodal processing and multitask handling in complex clinical workflows. However, evaluating the performance of LLMs in medical contexts presents unique challenges due to the high-risk nature of healthcare and the complexity of medical data. This paper provides a comprehensive overview of current evaluation practices for LLMs and LLM agents in medicine. We made three main contributions. First, we summarized data sources used in evaluations, including existing medical resources and manually designed clinical questions, offering a basis for LLM evaluation in medical settings. Second, we analyzed key medical task scenarios: closed-ended tasks, open-ended tasks, image processing tasks, and real-world multitask scenarios involving LLM agents, thereby offering guidance for further research across different medical applications. Third, we compared evaluation methods and dimensions, covering both automated metrics and human expert assessments, while addressing traditional accuracy measures alongside agent-specific dimensions, such as tool usage and reasoning capabilities. Finally, we identified key challenges and opportunities in this evolving field, emphasizing the need for continued research and interdisciplinary collaboration between healthcare professionals and computer scientists to ensure safe, ethical, and effective deployment of LLMs in clinical practice.	en_US
dcterms.accessRights	open access	en_US
dcterms.bibliographicCitation	Intelligent medicine, May 2025, v. 5, no. 2, p. 151-163	en_US
dcterms.isPartOf	Intelligent medicine	en_US
dcterms.issued	2025-05	-
dc.identifier.eissn	2667-1026	en_US
dc.description.validate	202505 bcch	en_US
dc.description.oa	Version of Record	en_US
dc.identifier.FolderNumber	a3583b	-
dc.identifier.SubFormID	50403	-
dc.description.fundingSource	Self-funded	en_US
dc.description.pubStatus	Published	en_US
dc.description.oaCategory	CC	en_US
Appears in Collections:	Journal/Magazine Article

Files in This Item:

File	Description	Size	Format
1-s2.0-S2667102625000294-main.pdf		3.15 MB	Adobe PDF

Open Access Information

Status	open access
File Version	Version of Record
