Please use this identifier to cite or link to this item: http://hdl.handle.net/10397/118725
DC Field | Value | Language
dc.contributor | School of Optometry | en_US
dc.contributor | Research Centre for SHARP Vision | en_US
dc.creator | Chen, R | en_US
dc.creator | Zhang, W | en_US
dc.creator | Liu, B | en_US
dc.creator | Wu, X | en_US
dc.creator | Chen, X | en_US
dc.creator | Xu, P | en_US
dc.creator | Liu, S | en_US
dc.creator | He, M | en_US
dc.creator | Shi, D | en_US
dc.date.accessioned | 2026-05-14T05:44:01Z | -
dc.date.available | 2026-05-14T05:44:01Z | -
dc.identifier.uri | http://hdl.handle.net/10397/118725 | -
dc.language.iso | en | en_US
dc.publisher | Nature Publishing Group | en_US
dc.rights | Open Access. This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/. | en_US
dc.rights | © The Author(s) 2026 | en_US
dc.rights | The following publication Chen, R., Zhang, W., Liu, B. et al. Boosting foundation models for rare eye disease diagnosis via a multimodal text-to-image generative framework. npj Digit. Med. 9, 371 (2026) is available at https://doi.org/10.1038/s41746-026-02560-2. | en_US
dc.title | Boosting foundation models for rare eye disease diagnosis via a multimodal text-to-image generative framework | en_US
dc.type | Journal/Magazine Article | en_US
dc.identifier.volume | 9 | en_US
dc.identifier.doi | 10.1038/s41746-026-02560-2 | en_US
dcterms.abstract | The rising prevalence of vision-threatening retinal diseases poses a significant burden on global healthcare systems. Although deep learning (DL) techniques offer promising avenues for improving diagnostic efficiency, data scarcity and imbalance persist in training robust diagnostic models, particularly for rare eye diseases. Here, we introduce EyeDiff, a generative foundation model capable of synthesizing lesion-preserving ophthalmic images from textual descriptions. Both objective metrics and expert human evaluations confirmed EyeDiff's ability to generate high-fidelity images across multiple imaging modalities, accurately reflecting textual descriptions of diverse retinal diseases and lesion types. By augmenting minority classes across 11 globally sourced datasets, EyeDiff consistently boosted diagnostic accuracy for both common and rare eye diseases across different foundation model types, including modality-specific, multimodal and vision-language foundation models trained solely on real data. These results underscore EyeDiff's potential as a general-purpose text-to-image foundation model, offering a scalable and flexible approach to generating balanced, disease-relevant data for advancing retinal disease diagnosis. | en_US
dcterms.accessRights | open access | en_US
dcterms.bibliographicCitation | npj digital medicine, 2026, v. 9, 371 | en_US
dcterms.isPartOf | npj digital medicine | en_US
dcterms.issued | 2026 | -
dc.identifier.eissn | 2398-6352 | en_US
dc.identifier.artn | 371 | en_US
dc.description.validate | 202605 bcch | en_US
dc.description.oa | Version of Record | en_US
dc.identifier.FolderNumber | a4418 | -
dc.identifier.SubFormID | 52749 | -
dc.description.fundingSource | Others | en_US
dc.description.fundingText | We thank the American Society of Retina Specialists for providing the valuable Retina Image Bank and the InnoHK HKSAR Government for providing valuable support. The study was supported by the Start-up Fund for RAPs under the Strategic Hiring Scheme (P0048623) from HKSAR, the Global STEM Professorship Scheme (P0046113), and the Henry G. Leong Endowed Professorship in Elderly Vision Health. The sponsors or funding organizations had no role in the design or conduct of this research. | en_US
dc.description.pubStatus | Published | en_US
dc.description.oaCategory | CC | en_US
dc.relation.rdata | The data for model training in the current study are available as open data through the following links: Retinal Image Bank (https://imagebank.asrs.org/), EyePACS (https://www.kaggle.com/c/diabetic-retinopathy-detection/data), OCTDL (https://ieee-dataport.org/documents/octdl-optical-coherence-tomography-dataset-image-based-deep-learning-methods), REFUGE (https://bitbucket.org/woalsdnd/refuge/src/master/), ORIGA (https://figshare.com/articles/dataset/Retinal_Fundus_Glaucoma_Image_dataset/24549217) | en_US
Appears in Collections: Journal/Magazine Article
Files in This Item:
File | Description | Size | Format
s41746-026-02560-2.pdf | | 4.33 MB | Adobe PDF
Open Access Information
Status: open access
File Version: Version of Record

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.