Please use this identifier to cite or link to this item: http://hdl.handle.net/10397/115683
DC Field | Value | Language
dc.contributor | Department of Health Technology and Informatics | en_US
dc.creator | Chen, Z | en_US
dc.creator | Chambara, N | en_US
dc.creator | Liu, SYW | en_US
dc.creator | Chow, TCM | en_US
dc.creator | Lai, CMS | en_US
dc.creator | Ying, MTC | en_US
dc.date.accessioned | 2025-10-20T01:27:52Z | -
dc.date.available | 2025-10-20T01:27:52Z | -
dc.identifier.uri | http://hdl.handle.net/10397/115683 | -
dc.language.iso | en | en_US
dc.publisher | MDPI AG | en_US
dc.rights | Copyright: © 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/). | en_US
dc.rights | The following publication Chen, Z., Chambara, N., Liu, S. Y. W., Chow, T. C. M., Lai, C. M. S., & Ying, M. T. C. (2025). Intra- and Inter-Observer Reliability of ChatGPT-4o in Thyroid Nodule Ultrasound Feature Analysis Based on ACR TI-RADS: An Image-Based Study. Diagnostics, 15(20), 2617 is available at https://doi.org/10.3390/diagnostics15202617. | en_US
dc.subject | ChatGPT | en_US
dc.subject | Large language model | en_US
dc.subject | Observer agreement | en_US
dc.subject | Thyroid nodule | en_US
dc.subject | Ultrasound features | en_US
dc.title | Intra- and inter-observer reliability of ChatGPT-4o in thyroid nodule ultrasound feature analysis based on ACR TI-RADS : an image-based study | en_US
dc.type | Journal/Magazine Article | en_US
dc.identifier.volume | 15 | en_US
dc.identifier.issue | 20 | en_US
dc.identifier.doi | 10.3390/diagnostics15202617 | en_US
dcterms.abstract | Background/Objectives: Advances in large language models such as ChatGPT-4o have extended their use to medical image analysis. Accurate assessment of thyroid nodule ultrasound features using ACR TI-RADS is crucial for clinical practice. This study aims to evaluate ChatGPT-4o’s intra-observer consistency and its agreement with an expert in assessing these features from ultrasound images according to ACR TI-RADS. | en_US
dcterms.abstract | Methods: This cross-sectional study used ultrasound images from 100 thyroid nodules collected prospectively between May 2019 and August 2021. ChatGPT-4o analyzed the images following ACR TI-RADS guidelines to assess thyroid nodule features, including composition, echogenicity, shape, margin, and echogenic foci. The analysis was repeated after one week to evaluate intra-observer reliability. An ultrasound expert also analyzed the images to evaluate inter-observer reliability. Agreement was measured using Cohen’s Kappa coefficient, and concordance rates were calculated against the expert’s reference classifications. | en_US
dcterms.abstract | Results: Intra-observer agreement for ChatGPT-4o was moderate for composition (Kappa = 0.449) and echogenic foci (Kappa = 0.404), and substantial for echogenicity (Kappa = 0.795). Agreement was notably low for shape (Kappa = −0.051) and margin (Kappa = 0.154). Inter-observer agreement between ChatGPT-4o and the expert was generally low, with Kappa values ranging from −0.006 to 0.238, the highest being for echogenic foci. Overall concordance rates between ChatGPT-4o and expert evaluations ranged from 46.6% to 48.2%; among individual features, concordance was highest for shape (65%) and lowest for echogenicity (29%). | en_US
dcterms.abstract | Conclusions: ChatGPT-4o showed favorable consistency in assessing some thyroid nodule features in intra-observer analysis, but notable variability in others. Inter-observer comparisons with expert evaluations revealed generally low agreement across all features, despite acceptable concordance for certain imaging characteristics. While promising for specific ultrasound features, ChatGPT-4o’s consistency and accuracy still vary significantly compared to expert assessments. | en_US
dcterms.accessRights | open access | en_US
dcterms.bibliographicCitation | Diagnostics, Oct. 2025, v. 15, no. 20, 2617 | en_US
dcterms.isPartOf | Diagnostics | en_US
dcterms.issued | 2025-10 | -
dc.identifier.artn | 2617 | en_US
dc.description.validate | 202510 bcch | en_US
dc.description.oa | Version of Record | en_US
dc.identifier.FolderNumber | a4125 | -
dc.identifier.SubFormID | 52113 | -
dc.description.fundingSource | RGC | en_US
dc.description.fundingSource | Others | en_US
dc.description.fundingText | This study was supported by the General Research Fund of Research Grants Council (Ref no. 15102524), and the research grants from the Hong Kong Polytechnic University (Ref nos. P0048845 and P0056738), with support to Z.C. from P0056738, and to M.T.C.Y. from both 15102524 and P0048845. | en_US
dc.description.pubStatus | Published | en_US
dc.description.oaCategory | CC | en_US
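
Note on the agreement statistics named in the abstract: the record does not say how the authors computed Cohen's Kappa or the concordance rates, so the following is only a minimal sketch of the standard definitions, not the study's actual code. The function names (`cohen_kappa`, `concordance_rate`) and the example ratings are hypothetical and are not taken from the study data.

```python
from collections import Counter


def cohen_kappa(ratings_a, ratings_b):
    """Cohen's Kappa for two equal-length lists of categorical ratings."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(ratings_a)
    # Observed agreement: fraction of items given the same category.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Chance agreement: sum over categories of the product of the
    # two raters' marginal proportions.
    count_a = Counter(ratings_a)
    count_b = Counter(ratings_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        # Both raters used one identical category for every item.
        return 1.0
    return (observed - expected) / (1.0 - expected)


def concordance_rate(model_ratings, reference_ratings):
    """Fraction of model ratings that match the reference ratings."""
    if len(model_ratings) != len(reference_ratings) or not reference_ratings:
        raise ValueError("ratings must be non-empty and of equal length")
    matches = sum(m == r for m, r in zip(model_ratings, reference_ratings))
    return matches / len(reference_ratings)


# Hypothetical composition ratings for six nodules (illustration only).
round_1 = ["solid", "solid", "cystic", "solid", "mixed", "cystic"]
round_2 = ["solid", "mixed", "cystic", "solid", "mixed", "solid"]

kappa = cohen_kappa(round_1, round_2)
rate = concordance_rate(round_2, round_1)
print(f"kappa = {kappa:.3f}, concordance = {rate:.1%}")
```

Kappa corrects raw agreement for the agreement expected by chance, which is why a feature can show a fairly high concordance rate alongside a Kappa near zero when one category dominates the ratings.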
Appears in Collections: Journal/Magazine Article
Files in This Item:
File | Description | Size | Format
diagnostics-15-02617.pdf | | 646.02 kB | Adobe PDF
Open Access Information
Status: open access
File Version: Version of Record

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.