Enhancement of the performance of large language models in diabetes education through retrieval-augmented generation : comparative study

Wang, D; Liang, J; Ye, J; Li, J; Li, J; Zhang, Q; Hu, Q; Pan, C; Wang, D; Liu, Z; Shi, W; Shi, D; Li, F; Qu, B; Zheng, Y

doi:10.2196/58041

Please use this identifier to cite or link to this item: http://hdl.handle.net/10397/110739

DC Field	Value	Language
dc.contributor	Research Centre for SHARP Vision	-
dc.creator	Wang, D	-
dc.creator	Liang, J	-
dc.creator	Ye, J	-
dc.creator	Li, J	-
dc.creator	Li, J	-
dc.creator	Zhang, Q	-
dc.creator	Hu, Q	-
dc.creator	Pan, C	-
dc.creator	Wang, D	-
dc.creator	Liu, Z	-
dc.creator	Shi, W	-
dc.creator	Shi, D	-
dc.creator	Li, F	-
dc.creator	Qu, B	-
dc.creator	Zheng, Y	-
dc.date.accessioned	2025-01-21T06:23:01Z	-
dc.date.available	2025-01-21T06:23:01Z	-
dc.identifier.issn	1439-4456	-
dc.identifier.uri	http://hdl.handle.net/10397/110739	-
dc.language.iso	en	en_US
dc.publisher	JMIR Publications, Inc.	en_US
dc.rights	©Dingqiao Wang, Jiangbo Liang, Jinguo Ye, Jingni Li, Jingpeng Li, Qikai Zhang, Qiuling Hu, Caineng Pan, Dongliang Wang, Zhong Liu, Wen Shi, Danli Shi, Fei Li, Bo Qu, Yingfeng Zheng. Originally published in the Journal of Medical Internet Research (https://www.jmir.org), 08.11.2024.	en_US
dc.rights	This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in the Journal of Medical Internet Research (ISSN 1438-8871), is properly cited. The complete bibliographic information, a link to the original publication on https://www.jmir.org/, as well as this copyright and license information must be included.	en_US
dc.rights	The following publication Wang, D., Liang, J., Ye, J., Li, J., Li, J., Zhang, Q., Hu, Q., Pan, C., Wang, D., Liu, Z., Shi, W., Shi, D., Li, F., Qu, B., & Zheng, Y. (2024). Enhancement of the Performance of Large Language Models in Diabetes Education through Retrieval-Augmented Generation: Comparative Study. J Med Internet Res, 26, e58041 is available at https://dx.doi.org/10.2196/58041.	en_US
dc.subject	Large language models	en_US
dc.subject	LLMs	en_US
dc.subject	Retrieval-augmented generation	en_US
dc.subject	RAG	en_US
dc.subject	GPT-4.0	en_US
dc.subject	Claude-2	en_US
dc.subject	Google Bard	en_US
dc.subject	Diabetes education	en_US
dc.title	Enhancement of the performance of large language models in diabetes education through retrieval-augmented generation : comparative study	en_US
dc.type	Journal/Magazine Article	en_US
dc.identifier.volume	26	-
dc.identifier.doi	10.2196/58041	-
dcterms.abstract	Background: Large language models (LLMs) have demonstrated advanced performance in processing clinical information. However, commercially available LLMs lack specialized medical knowledge and remain susceptible to generating inaccurate information. Given the need for self-management in diabetes, patients commonly seek information online. We introduce the Retrieval-augmented Information System for Enhancement (RISE) framework and evaluate its performance in enhancing LLMs to provide accurate responses to diabetes-related inquiries.	-
dcterms.abstract	Objective: This study aimed to evaluate the potential of the RISE framework, an information retrieval and augmentation tool, to improve the ability of LLMs to respond accurately and safely to diabetes-related inquiries.	-
dcterms.abstract	Methods: RISE, an innovative retrieval augmentation framework, comprises 4 steps: rewriting query, information retrieval, summarization, and execution. Using a set of 43 common diabetes-related questions, we evaluated 3 base LLMs (GPT-4, Anthropic Claude 2, and Google Bard) and their RISE-enhanced versions. Clinicians assessed the responses for accuracy and comprehensiveness, and patients assessed them for understandability.	-
dcterms.abstract	Results: The integration of RISE significantly improved the accuracy and comprehensiveness of responses from all 3 base LLMs. On average, the percentage of accurate responses increased by 12% (15/129) with RISE. Specifically, the rates of accurate responses increased by 7% (3/43) for GPT-4, 19% (8/43) for Claude 2, and 9% (4/43) for Google Bard. The framework also enhanced response comprehensiveness, with mean scores improving by 0.44 (SD 0.10). Understandability also improved, by 0.19 (SD 0.13) on average. Data collection was conducted from September 30, 2023, to February 5, 2024.	-
dcterms.abstract	Conclusions: RISE significantly improves LLMs’ performance in responding to diabetes-related inquiries, enhancing accuracy, comprehensiveness, and understandability. These improvements have crucial implications for RISE’s future role in patient education and chronic illness self-management, which contributes to relieving pressure on medical resources and raising public awareness of medical knowledge.	-
dcterms.accessRights	open access	en_US
dcterms.bibliographicCitation	Journal of medical Internet research, 2024, v. 26, e58041	-
dcterms.isPartOf	Journal of medical Internet research	-
dcterms.issued	2024	-
dc.identifier.scopus	2-s2.0-85208772500	-
dc.identifier.pmid	39046096	-
dc.identifier.eissn	1438-8871	-
dc.identifier.artn	e58041	-
dc.description.validate	202501 bcrc	-
dc.description.oa	Version of Record	en_US
dc.identifier.FolderNumber	a3361	en_US
dc.identifier.SubFormID	49990	en_US
dc.description.fundingSource	Self-funded	en_US
dc.description.pubStatus	Published	en_US
dc.description.TA	CC	en_US
Appears in Collections:	Journal/Magazine Article
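
The percentages quoted in the Results abstract can be cross-checked against the counts given alongside them. The following is a minimal sketch, not part of the original record or the authors' analysis: the variable and function names are illustrative assumptions, and only the counts (3, 8, and 4 out of 43 questions per model) come from the abstract.

```python
# Additional accurate responses per model out of 43 questions,
# as reported in the Results abstract. Names here are illustrative.
QUESTIONS = 43
gains = {"GPT-4": 3, "Claude 2": 8, "Google Bard": 4}


def pct(numerator, denominator):
    """Return a percentage rounded to the nearest whole number."""
    return round(100 * numerator / denominator)


# Per-model increase in the rate of accurate responses.
per_model = {name: pct(count, QUESTIONS) for name, count in gains.items()}

# Pooled increase across all 3 models (3 x 43 = 129 responses).
total_gain = sum(gains.values())
total_responses = QUESTIONS * len(gains)
overall = pct(total_gain, total_responses)

print(per_model)
print(f"{total_gain}/{total_responses} = {overall}%")
```

Under these assumptions the rounded values agree with the reported 7%, 19%, and 9% per model and with the pooled 12% (15/129).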

Files in This Item:

File	Description	Size	Format
jmir-2024-1-e58041.pdf		1.07 MB	Adobe PDF	View/Open

Open Access Information

Status	open access
File Version	Version of Record
