Development and validation of a large language model-powered chatbot for neurosurgery: mixed methods study on enhancing perioperative patient education

Ho, CM; Guan, S; Mok, PKL; Lam, CHW; Ho, WY; Mak, CHK; Qin, H; Wong, AKC; Hui, V

doi:10.2196/74299

Please use this identifier to cite or link to this item: http://hdl.handle.net/10397/115986

DC Field	Value	Language
dc.contributor	Department of Electrical and Electronic Engineering	-
dc.contributor	School of Nursing	-
dc.creator	Ho, CM	-
dc.creator	Guan, S	-
dc.creator	Mok, PKL	-
dc.creator	Lam, CHW	-
dc.creator	Ho, WY	-
dc.creator	Mak, CHK	-
dc.creator	Qin, H	-
dc.creator	Wong, AKC	-
dc.creator	Hui, V	-
dc.date.accessioned	2025-11-18T06:48:45Z	-
dc.date.available	2025-11-18T06:48:45Z	-
dc.identifier.issn	1439-4456	-
dc.identifier.uri	http://hdl.handle.net/10397/115986	-
dc.language.iso	en	en_US
dc.publisher	JMIR Publications, Inc.	en_US
dc.rights	©Chung Man Ho, Shaowei Guan, Prudence Kwan-Lam Mok, Candice HW Lam, Wai Ying Ho, Calvin Hoi-Kwan Mak, Harry Qin, Arkers Kwan Ching Wong, Vivian Hui. Originally published in the Journal of Medical Internet Research (https://www.jmir.org), 15.07.2025. This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in the Journal of Medical Internet Research (ISSN 1438-8871), is properly cited. The complete bibliographic information, a link to the original publication on https://www.jmir.org/, as well as this copyright and license information must be included.	en_US
dc.rights	The following publication Ho CM, Guan S, Mok PKL, Lam CH, Ho WY, Mak CHK, Qin H, Wong AKC, Hui V, Development and Validation of a Large Language Model–Powered Chatbot for Neurosurgery: Mixed Methods Study on Enhancing Perioperative Patient Education. J Med Internet Res 2025;27:e74299 is available at https://doi.org/10.2196/74299.	en_US
dc.subject	Artificial intelligence	en_US
dc.subject	Chatbot	en_US
dc.subject	Digital health	en_US
dc.subject	Large language model	en_US
dc.subject	Neurosurgery	en_US
dc.subject	Patient education	en_US
dc.subject	Patient-centered care	en_US
dc.subject	Perioperative care	en_US
dc.subject	Retrieval-augmented generation	en_US
dc.title	Development and validation of a large language model-powered chatbot for neurosurgery: mixed methods study on enhancing perioperative patient education	en_US
dc.type	Journal/Magazine Article	en_US
dc.identifier.volume	27	-
dc.identifier.doi	10.2196/74299	-
dcterms.abstract	Background: Perioperative education is crucial for optimizing outcomes in neuroendovascular procedures, where inadequate understanding can heighten patient anxiety and hinder care plan adherence. Current education models, reliant on traditional consultations and printed materials, often lack scalability and personalization. Artificial intelligence (AI)–powered chatbots have demonstrated efficacy in various health care contexts; however, their role in neuroendovascular perioperative support remains underexplored. Given the complexity of neuroendovascular procedures and the need for continuous, individualized patient education, AI chatbots have the potential to offer tailored perioperative guidance in this specialty.	-
dcterms.abstract	Objective: We aimed to develop, validate, and assess NeuroBot, an AI-driven system that uses large language models (LLMs) with retrieval-augmented generation to deliver timely, accurate, and evidence-based responses to patient inquiries in neurosurgery, ultimately improving the effectiveness of patient education.	-
dcterms.abstract	Methods: A mixed methods approach was used, consisting of 3 phases. In the first phase, internal validation, we compared the performance of Assistants API, ChatGPT, and Qwen by evaluating their responses to 306 bilingual neuroendovascular-related questions. The accuracy, relevance, and completeness of the responses were evaluated using a Likert scale; statistical analyses included ANOVA and paired t tests. In the second phase, external validation, 10 neurosurgical experts rated the responses generated by NeuroBot using the same evaluation metrics applied in the internal validation phase. The consistency of their ratings was measured using the intraclass correlation coefficient. Finally, in the third phase, a qualitative study was conducted through interviews with 18 health care providers, which helped identify key themes related to NeuroBot’s usability and perceived benefits. Thematic analysis was performed using NVivo, and interrater reliability was confirmed through Cohen κ.	-
dcterms.abstract	Results: The Assistants API outperformed both ChatGPT and Qwen, achieving a mean accuracy score of 5.28 out of 6 (95% CI 5.21-5.35), a statistically significant difference (P<.001). External expert ratings for NeuroBot demonstrated significant improvements, with scores of 5.70 out of 6 (95% CI 5.46-5.94) for accuracy, 5.58 out of 6 (95% CI 5.45-5.94) for relevance, and 2.70 out of 3 (95% CI 2.73-2.97) for completeness. Qualitative insights highlighted NeuroBot’s potential to reduce staff workload, enhance patient education, and deliver evidence-based responses.	-
dcterms.abstract	Conclusions: NeuroBot, leveraging LLMs with the retrieval-augmented generation technique, demonstrates the potential of LLM-based chatbots in perioperative neuroendovascular care, offering scalable and continuous support. By integrating domain-specific knowledge, NeuroBot simplifies communication between professionals and patients while ensuring patients have 24-7 access to reliable, evidence-based information. Further refinement and research will enhance NeuroBot’s ability to foster patient-centered communication, optimize clinical outcomes, and advance AI-driven innovations in health care delivery.	-
dcterms.accessRights	open access	en_US
dcterms.bibliographicCitation	Journal of Medical Internet Research, 2025, v. 27, e74299	-
dcterms.isPartOf	Journal of Medical Internet Research	-
dcterms.issued	2025	-
dc.identifier.scopus	2-s2.0-105010624419	-
dc.identifier.pmid	40663377	-
dc.identifier.eissn	1438-8871	-
dc.identifier.artn	e74299	-
dc.description.validate	202511 bcch	-
dc.description.oa	Version of Record	en_US
dc.identifier.FolderNumber	OA_Scopus/WOS	en_US
dc.description.fundingSource	Others	en_US
dc.description.fundingText	The research was granted ethics approval in the Hospital Authority after review by the Central Institutional Review Board (CIRB-2024-486-3) and is also applying for ethics approval at the Hong Kong Polytechnic University.	en_US
dc.description.pubStatus	Published	en_US
dc.description.oaCategory	CC	en_US
Appears in Collections:	Journal/Magazine Article

Files in This Item:

File	Description	Size	Format
jmir-2025-1-e74299.pdf		675.13 kB	Adobe PDF

Open Access Information

Status	open access
File Version	Version of Record


Scopus™ citations: 4 (as of May 8, 2026)

Web of Science™ citations: 4 (as of Apr 23, 2026)
