Please use this identifier to cite or link to this item: http://hdl.handle.net/10397/116725
DC Field | Value | Language
dc.contributor | School of Optometry | en_US
dc.contributor | Research Centre for SHARP Vision | en_US
dc.creator | Jin, K | en_US
dc.creator | Sun, Q | en_US
dc.creator | Kang, D | en_US
dc.creator | Luo, Z | en_US
dc.creator | Yu, T | en_US
dc.creator | Han, W | en_US
dc.creator | Zhang, Y | en_US
dc.creator | Wang, M | en_US
dc.creator | Shi, D | en_US
dc.creator | Grzybowski, A | en_US
dc.date.accessioned | 2026-01-15T08:03:49Z | -
dc.date.available | 2026-01-15T08:03:49Z | -
dc.identifier.uri | http://hdl.handle.net/10397/116725 | -
dc.language.iso | en | en_US
dc.publisher | Nature Publishing Group | en_US
dc.rights | © The Author(s) 2025. Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/. | en_US
dc.rights | The following publication Jin, K., Sun, Q., Kang, D. et al. Grounded report generation for enhancing ophthalmic ultrasound interpretation using Vision-Language Segmentation models. npj Digit. Med. 9, 99 (2026) is available at https://doi.org/10.1038/s41746-025-02300-y. | en_US
dc.title | Grounded report generation for enhancing ophthalmic ultrasound interpretation using Vision-Language Segmentation models | en_US
dc.type | Journal/Magazine Article | en_US
dc.identifier.volume | 9 | en_US
dc.identifier.doi | 10.1038/s41746-025-02300-y | en_US
dcterms.abstract | Accurate interpretation of ophthalmic ultrasound is crucial for diagnosing eye conditions but remains time-consuming and requires significant expertise. With the increasing volume of ultrasound data, there is a need for Artificial Intelligence (AI) systems capable of efficiently analyzing images and generating reports. Traditional AI models for report generation cannot identify lesions simultaneously and lack interpretability. This study proposes the Vision-Language Segmentation (VLS) model, which combines a Vision-Language Model (VLM) with the Segment Anything Model (SAM) to improve interpretability in ophthalmic ultrasound imaging. Using data from three hospitals, totaling 64,098 images and 21,355 reports, the VLS model achieved a BLEU4 score of 66.37 on the internal test set, and scores of 85.36 and 73.77 on the two external test sets. The model achieved a mean Dice coefficient of 59.6% on the internal test set, and Dice coefficients of 50.2% and 51.5% with specificity values of 97.8% and 97.7% on the external test sets, respectively. Overall diagnostic accuracy was 90.59% on the internal test set and 71.87% on the external test sets. A cost-effectiveness analysis demonstrated a 30-fold reduction in report costs, from $39 per report by senior ophthalmologists to $1.3 for the VLS model. This approach enhances diagnostic accuracy, reduces manual effort, and accelerates workflows, offering a promising solution for ophthalmic ultrasound interpretation. | en_US
dcterms.accessRights | open access | en_US
dcterms.bibliographicCitation | npj digital medicine, 2026, v. 9, 99 | en_US
dcterms.isPartOf | npj digital medicine | en_US
dcterms.issued | 2026 | -
dc.identifier.eissn | 2398-6352 | en_US
dc.identifier.artn | 99 | en_US
dc.description.validate | 202601 bcc | en_US
dc.description.oa | Version of Record | en_US
dc.identifier.FolderNumber | a4266a | -
dc.identifier.SubFormID | 52486 | -
dc.description.fundingSource | Others | en_US
dc.description.fundingText | This study was supported by the National Natural Science Foundation of China (82201195). | en_US
dc.description.pubStatus | Published | en_US
dc.description.oaCategory | CC | en_US
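The abstract reports segmentation quality as a Dice coefficient (2·|P∩T| / (|P|+|T|) over predicted and ground-truth lesion masks). The record itself contains no code; as a minimal illustrative sketch (NumPy assumed, toy masks, not the authors' implementation), the metric can be computed as:

```python
import numpy as np

def dice_coefficient(pred: np.ndarray, target: np.ndarray, eps: float = 1e-7) -> float:
    """Dice = 2|P ∩ T| / (|P| + |T|) for binary segmentation masks."""
    pred = pred.astype(bool)
    target = target.astype(bool)
    intersection = np.logical_and(pred, target).sum()
    # eps guards against division by zero when both masks are empty
    return float(2.0 * intersection / (pred.sum() + target.sum() + eps))

# Toy 4x4 lesion masks: prediction overlaps ground truth in 3 of 4 pixels
pred = np.array([[0, 1, 1, 0],
                 [0, 1, 1, 0],
                 [0, 0, 0, 0],
                 [0, 0, 0, 0]])
target = np.array([[0, 1, 1, 0],
                   [0, 1, 0, 0],
                   [0, 0, 0, 0],
                   [0, 0, 0, 0]])
print(round(dice_coefficient(pred, target), 3))  # 2*3/(4+3) ≈ 0.857
```

On this toy pair, a Dice of ~0.857 would exceed the paper's reported mean of 59.6%; the reported values are averages over real clinical masks, where partial overlap is common.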
Appears in Collections:Journal/Magazine Article
Files in This Item:
File | Description | Size | Format
s41746-025-02300-y.pdf | | 3.49 MB | Adobe PDF
Open Access Information
Status: open access
File Version: Version of Record

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.