Please use this identifier to cite or link to this item:
http://hdl.handle.net/10397/115683
| DC Field | Value | Language |
|---|---|---|
| dc.contributor | Department of Health Technology and Informatics | en_US |
| dc.creator | Chen, Z | en_US |
| dc.creator | Chambara, N | en_US |
| dc.creator | Liu, SYW | en_US |
| dc.creator | Chow, TCM | en_US |
| dc.creator | Lai, CMS | en_US |
| dc.creator | Ying, MTC | en_US |
| dc.date.accessioned | 2025-10-20T01:27:52Z | - |
| dc.date.available | 2025-10-20T01:27:52Z | - |
| dc.identifier.uri | http://hdl.handle.net/10397/115683 | - |
| dc.language.iso | en | en_US |
| dc.publisher | MDPI AG | en_US |
| dc.rights | Copyright: © 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/). | en_US |
| dc.rights | The following publication Chen, Z., Chambara, N., Liu, S. Y. W., Chow, T. C. M., Lai, C. M. S., & Ying, M. T. C. (2025). Intra- and Inter-Observer Reliability of ChatGPT-4o in Thyroid Nodule Ultrasound Feature Analysis Based on ACR TI-RADS: An Image-Based Study. Diagnostics, 15(20), 2617 is available at https://doi.org/10.3390/diagnostics15202617. | en_US |
| dc.subject | ChatGPT | en_US |
| dc.subject | Large language model | en_US |
| dc.subject | Observer agreement | en_US |
| dc.subject | Thyroid nodule | en_US |
| dc.subject | Ultrasound features | en_US |
| dc.title | Intra- and inter-observer reliability of ChatGPT-4o in thyroid nodule ultrasound feature analysis based on ACR TI-RADS : an image-based study | en_US |
| dc.type | Journal/Magazine Article | en_US |
| dc.identifier.volume | 15 | en_US |
| dc.identifier.issue | 20 | en_US |
| dc.identifier.doi | 10.3390/diagnostics15202617 | en_US |
| dcterms.abstract | Background/Objectives: Advances in large language models like ChatGPT-4o have extended their use to medical image analysis. Accurate assessment of thyroid nodule ultrasound features using ACR TI-RADS is crucial for clinical practice. This study aims to evaluate ChatGPT-4o’s intra-observer consistency and its agreement with an expert when analyzing these features on ultrasound images according to ACR TI-RADS. | en_US |
| dcterms.abstract | Methods: This cross-sectional study used ultrasound images from 100 thyroid nodules collected prospectively between May 2019 and August 2021. Ultrasound images were analyzed by ChatGPT-4o, following ACR TI-RADS guidelines, to assess thyroid nodule features, including composition, echogenicity, shape, margin, and echogenic foci. The analysis was repeated after one week to evaluate intra-observer reliability. The ultrasound images were also analyzed by an ultrasound expert to evaluate inter-observer reliability. Agreement was measured using Cohen’s Kappa coefficient, and concordance rates were calculated based on alignment with the expert’s reference classifications. | en_US |
| dcterms.abstract | Results: Intra-observer agreement for ChatGPT-4o was moderate for composition (Kappa = 0.449) and echogenic foci (Kappa = 0.404), with substantial agreement for echogenicity (Kappa = 0.795). Agreement was notably low for shape (Kappa = −0.051) and margin (Kappa = 0.154). Inter-observer agreement between ChatGPT-4o and the expert was generally low, with Kappa values ranging from −0.006 to 0.238, the highest being for echogenic foci. Overall concordance rates between ChatGPT-4o and expert evaluations ranged from 46.6% to 48.2%, with the highest for shape (65%) and the lowest for echogenicity (29%). | en_US |
| dcterms.abstract | Conclusions: ChatGPT-4o showed favorable consistency in assessing some thyroid nodule features in intra-observer analysis, but notable variability in others. Inter-observer comparisons with expert evaluations revealed generally low agreement across all features, despite acceptable concordance for certain imaging characteristics. While promising for specific ultrasound features, ChatGPT-4o’s consistency and accuracy still vary significantly compared to expert assessments. | en_US |
| dcterms.accessRights | open access | en_US |
| dcterms.bibliographicCitation | Diagnostics, Oct. 2025, v. 15, no. 20, 2617 | en_US |
| dcterms.isPartOf | Diagnostics | en_US |
| dcterms.issued | 2025-10 | - |
| dc.identifier.artn | 2617 | en_US |
| dc.description.validate | 202510 bcch | en_US |
| dc.description.oa | Version of Record | en_US |
| dc.identifier.FolderNumber | a4125 | - |
| dc.identifier.SubFormID | 52113 | - |
| dc.description.fundingSource | RGC | en_US |
| dc.description.fundingSource | Others | en_US |
| dc.description.fundingText | This study was supported by the General Research Fund of Research Grants Council (Ref no. 15102524), and the research grants from the Hong Kong Polytechnic University (Ref nos. P0048845 and P0056738), with support to Z.C. from P0056738, and to M.T.C.Y. from both 15102524 and P0048845. | en_US |
| dc.description.pubStatus | Published | en_US |
| dc.description.oaCategory | CC | en_US |
| Appears in Collections: | Journal/Magazine Article | |
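
The Methods describe agreement measured with Cohen's Kappa and concordance rates against the expert's classifications. The sketch below shows how these two quantities relate for a single feature. It assumes unweighted Kappa on nominal categories; the record does not state which software or weighting the authors used, and the labels in the usage example are hypothetical toy data, not values from the study.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Unweighted Cohen's Kappa for two raters labelling the same items.

    Assumption: categories are nominal (no weighting), as a minimal sketch.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty set of items")
    n = len(rater_a)
    # Observed agreement p_o: fraction of items given the same label by both raters
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement p_e: computed from each rater's marginal label frequencies
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    categories = set(freq_a) | set(freq_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in categories) / (n * n)
    if p_e == 1.0:
        # Both raters used one identical category for every item: Kappa is undefined
        return float("nan")
    return (p_o - p_e) / (1 - p_e)


def concordance_rate(rater_a, rater_b):
    """Percentage of items on which both raters assign the same category."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty set of items")
    return 100.0 * sum(a == b for a, b in zip(rater_a, rater_b)) / len(rater_a)


# Hypothetical composition labels for six nodules (illustrative only)
model = ["solid", "solid", "mixed", "cystic", "solid", "mixed"]
expert = ["solid", "mixed", "mixed", "cystic", "solid", "solid"]
print(round(concordance_rate(model, expert), 1))  # 66.7
print(round(cohens_kappa(model, expert), 3))      # 0.455
```

Because Kappa subtracts the agreement expected by chance, a feature can show a reasonable concordance rate yet a low Kappa, and a negative Kappa (as reported for shape) indicates agreement below what chance alone would predict.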
Files in This Item:
| File | Description | Size | Format |
|---|---|---|---|
| diagnostics-15-02617.pdf | | 646.02 kB | Adobe PDF |