Please use this identifier to cite or link to this item:
http://hdl.handle.net/10397/118725
| Title: | Boosting foundation models for rare eye disease diagnosis via a multimodal text-to-image generative framework |
| Authors: | Chen, R; Zhang, W; Liu, B; Wu, X; Chen, X; Xu, P; Liu, S; He, M; Shi, D |
| Issue Date: | 2026 |
| Source: | npj Digital Medicine, 2026, v. 9, 371 |
| Abstract: | The rising prevalence of vision-threatening retinal diseases poses a significant burden on global healthcare systems. Although deep learning (DL) techniques offer promising avenues for improving diagnostic efficiency, data scarcity and class imbalance persist as obstacles to training robust diagnostic models, particularly for rare eye diseases. Here, we introduce EyeDiff, a generative foundation model capable of synthesizing lesion-preserving ophthalmic images from textual descriptions. Both objective metrics and expert human evaluations confirmed EyeDiff's ability to generate high-fidelity images across multiple imaging modalities, accurately reflecting textual descriptions of diverse retinal diseases and lesion types. By augmenting minority classes across 11 globally sourced datasets, EyeDiff consistently boosted diagnostic accuracy for both common and rare eye diseases across different foundation model types, including modality-specific, multimodal, and vision-language foundation models trained solely on real data. These results underscore EyeDiff's potential as a general-purpose text-to-image foundation model, offering a scalable and flexible approach to generating balanced, disease-relevant data for advancing retinal disease diagnosis. |
| Publisher: | Nature Publishing Group |
| Journal: | npj Digital Medicine |
| EISSN: | 2398-6352 |
| DOI: | 10.1038/s41746-026-02560-2 |
| Research Data: | The data for model training in the current study are available as open data through the following links: Retinal Image Bank (https://imagebank.asrs.org/), EyePACS (https://www.kaggle.com/c/diabetic-retinopathy-detection/data), OCTDL (https://ieee-dataport.org/documents/octdl-optical-coherence-tomography-dataset-image-based-deep-learning-methods), REFUGE (https://bitbucket.org/woalsdnd/refuge/src/master/), ORIGA (https://figshare.com/articles/dataset/Retinal_Fundus_Glaucoma_Image_dataset/24549217) |
| Rights: | Open Access. This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution, and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third-party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/. © The Author(s) 2026. The following publication Chen, R., Zhang, W., Liu, B. et al. Boosting foundation models for rare eye disease diagnosis via a multimodal text-to-image generative framework. npj Digit. Med. 9, 371 (2026) is available at https://doi.org/10.1038/s41746-026-02560-2. |
| Appears in Collections: | Journal/Magazine Article |
Files in This Item:
| File | Description | Size | Format |
|---|---|---|---|
| s41746-026-02560-2.pdf | | 4.33 MB | Adobe PDF |