Please use this identifier to cite or link to this item:
http://hdl.handle.net/10397/118223
| Title: | ChatGPT-5–based large language model analysis versus an FDA-approved AI-CAD system for thyroid nodule ultrasound evaluation |
| Authors: | Chen, Z; Ye, M; Liang, J; Chen, F; Ying, MTC |
| Issue Date: | Feb-2026 |
| Source: | European journal of radiology, Feb. 2026, v. 195, 112639 |
| Abstract: | Purpose: Recent advances in multimodal large language models (LLMs) have demonstrated promising potential for medical image analysis, yet their diagnostic capability in thyroid ultrasound remains unverified. This study explored the feasibility of ChatGPT-5, the latest multimodal LLM, for thyroid nodule classification and contextualized its diagnostic performance against S-Detect, an FDA-approved commercial computer-aided diagnosis system. Methods: In this prospective study, 141 patients with 186 nodules who underwent preoperative ultrasound and subsequent surgery were enrolled. For S-Detect, the largest transverse grayscale ultrasound image of each nodule was analyzed with automated contouring for binary classification. For ChatGPT-5, cropped transverse and longitudinal nodule ultrasound images were analyzed using a standardized diagnostic prompt for binary classification. Agreement with histopathology was assessed using Kappa statistics; sensitivity, specificity, accuracy, and area under the receiver operating characteristic curve (AUC) were calculated. Results: Both systems showed statistically significant ability to distinguish benign from malignant nodules (P < 0.05). Agreement with histopathology was fair for ChatGPT-5 (Kappa = 0.224) and moderate for S-Detect (Kappa = 0.579). ChatGPT-5 demonstrated sensitivity 50.8 %, specificity 75.8 %, and accuracy 59.1 %, whereas S-Detect achieved higher sensitivity (91.9 %) and accuracy (82.3 %) but lower specificity (62.9 %). The AUC for S-Detect (77.4 %) was significantly greater than that for ChatGPT-5 (63.3 %, P < 0.001). Conclusions: ChatGPT-5 demonstrated feasibility for thyroid nodule classification but showed lower diagnostic performance than the licensed, pre-trained S-Detect system and is not yet adequate for medical imaging applications. |
| Keywords: | ChatGPT; Large language model; S-Detect; Thyroid nodule; Ultrasound |
| Publisher: | Elsevier Ireland Ltd. |
| Journal: | European journal of radiology |
| ISSN: | 0720-048X |
| EISSN: | 1872-7727 |
| DOI: | 10.1016/j.ejrad.2025.112639 |
| Appears in Collections: | Journal/Magazine Article |
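
The agreement and accuracy measures named in the abstract (Kappa, sensitivity, specificity, accuracy, AUC) can all be derived from a 2×2 table of classifier output against histopathology. The following is a minimal sketch, not the authors' analysis code. The abstract does not report the cell counts: the counts below are an assumption, back-calculated so that they sum to 186 nodules and reproduce the reported percentages. The AUC of a binary classifier is taken here as (sensitivity + specificity) / 2, which is the area under a single-threshold ROC curve; the abstract does not state which AUC method the study used. The names `Confusion` and `binary_metrics` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    """2x2 table against histopathology (malignant = positive)."""
    tp: int  # malignant, called malignant
    fn: int  # malignant, called benign
    fp: int  # benign, called malignant
    tn: int  # benign, called benign


def binary_metrics(c: Confusion) -> dict[str, float]:
    n = c.tp + c.fn + c.fp + c.tn
    sensitivity = c.tp / (c.tp + c.fn)
    specificity = c.tn / (c.tn + c.fp)
    accuracy = (c.tp + c.tn) / n  # observed agreement p_o

    # Agreement expected by chance, p_e, from the row and column totals.
    p_e = ((c.tp + c.fn) * (c.tp + c.fp) + (c.tn + c.fp) * (c.tn + c.fn)) / n**2
    kappa = (accuracy - p_e) / (1 - p_e)  # Cohen's kappa

    # Area under a single-threshold ROC curve (assumed method, see lead-in).
    auc = (sensitivity + specificity) / 2
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "kappa": kappa,
        "auc": auc,
    }


# ASSUMED counts, back-calculated from the reported percentages;
# they are not stated in the abstract.
chatgpt5 = Confusion(tp=63, fn=61, fp=15, tn=47)
s_detect = Confusion(tp=114, fn=10, fp=23, tn=39)

for name, table in (("ChatGPT-5", chatgpt5), ("S-Detect", s_detect)):
    m = binary_metrics(table)
    print(name, {k: round(v, 3) for k, v in m.items()})
```

Under these assumed counts the sketch gives values that round to the figures in the abstract, for example Kappa of about 0.224 for ChatGPT-5 and about 0.579 for S-Detect.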