Please use this identifier to cite or link to this item: http://hdl.handle.net/10397/118223
Title: ChatGPT-5–based large language model analysis versus an FDA-approved AI-CAD system for thyroid nodule ultrasound evaluation
Authors: Chen, Z 
Ye, M
Liang, J
Chen, F
Ying, MTC 
Issue Date: Feb-2026
Source: European journal of radiology, Feb. 2026, v. 195, 112639
Abstract: Purpose: Recent advances in multimodal large language models (LLMs) have demonstrated promising potential for medical image analysis, yet their diagnostic capability in thyroid ultrasound remains unverified. This study explored the feasibility of ChatGPT-5, the latest multimodal LLM, for thyroid nodule classification and contextualized its diagnostic performance against S-Detect, an FDA-approved commercial computer-aided diagnosis system.
Methods: In this prospective study, 141 patients with 186 nodules who underwent preoperative ultrasound and subsequent surgery were enrolled. For S-Detect, the largest transverse grayscale ultrasound image of each nodule was analyzed with automated contouring for binary classification. For ChatGPT-5, cropped transverse and longitudinal nodule ultrasound images were analyzed using a standardized diagnostic prompt for binary classification. Agreement with histopathology was assessed using Kappa statistics; sensitivity, specificity, accuracy, and area under the receiver operating characteristic curve (AUC) were calculated.
Results: Both systems showed statistically significant ability to distinguish benign from malignant nodules (P < 0.05). Agreement with histopathology was fair for ChatGPT-5 (Kappa = 0.224) and moderate for S-Detect (Kappa = 0.579). ChatGPT-5 demonstrated sensitivity 50.8%, specificity 75.8%, and accuracy 59.1%, whereas S-Detect achieved higher sensitivity (91.9%) and accuracy (82.3%) but lower specificity (62.9%). The AUC for S-Detect (77.4%) was significantly greater than that for ChatGPT-5 (63.3%, P < 0.001).
Conclusions: ChatGPT-5 demonstrated feasibility for thyroid nodule classification but showed lower diagnostic performance than the licensed, pre-trained S-Detect system and is not yet adequate for medical imaging applications.
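Illustrative note (not part of the published abstract): the reported metrics can all be derived from a 2x2 table of binary predictions against histopathology. The sketch below shows that arithmetic in Python. The abstract does not report the underlying counts; the counts used here are an assumption, back-calculated to be consistent with the reported percentages for 186 nodules (assuming 124 malignant and 62 benign). The names `Confusion` and `diagnostic_metrics` are hypothetical. The single-threshold AUC is taken as the mean of sensitivity and specificity, which holds for a binary classifier with one operating point; whether the authors computed AUC this way is not stated.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    """2x2 table of binary predictions versus histopathology."""
    tp: int  # malignant nodules called malignant
    fn: int  # malignant nodules called benign
    tn: int  # benign nodules called benign
    fp: int  # benign nodules called malignant


def diagnostic_metrics(c: Confusion) -> dict:
    n = c.tp + c.fn + c.tn + c.fp
    sensitivity = c.tp / (c.tp + c.fn)
    specificity = c.tn / (c.tn + c.fp)
    accuracy = (c.tp + c.tn) / n
    # Cohen's kappa: observed agreement corrected for chance agreement.
    p_chance = (
        (c.tp + c.fp) * (c.tp + c.fn) + (c.tn + c.fn) * (c.tn + c.fp)
    ) / n**2
    kappa = (accuracy - p_chance) / (1 - p_chance)
    # ROC area for a single operating point (trapezoid through one point).
    auc = (sensitivity + specificity) / 2
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "kappa": kappa,
        "auc": auc,
    }


# ASSUMED counts, reconstructed from the reported percentages; they are
# not given in the abstract.
S_DETECT = Confusion(tp=114, fn=10, tn=39, fp=23)
CHATGPT5 = Confusion(tp=63, fn=61, tn=47, fp=15)

if __name__ == "__main__":
    for name, table in (("S-Detect", S_DETECT), ("ChatGPT-5", CHATGPT5)):
        m = diagnostic_metrics(table)
        print(name, {k: round(v, 3) for k, v in m.items()})
```

Under these assumed counts, the computed values agree with the figures in the Results to the reported precision, which suggests the counts are a plausible reconstruction rather than confirming them.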
Keywords: ChatGPT
Large language model
S-Detect
Thyroid nodule
Ultrasound
Publisher: Elsevier Ireland Ltd.
Journal: European journal of radiology 
ISSN: 0720-048X
EISSN: 1872-7727
DOI: 10.1016/j.ejrad.2025.112639
Appears in Collections: Journal/Magazine Article

Open Access Information
Status: Embargoed access
Embargo End Date: 2027-02-28
