Please use this identifier to cite or link to this item:
http://hdl.handle.net/10397/109379
DC Field | Value | Language |
---|---|---|
dc.contributor | Department of Biomedical Engineering | en_US |
dc.contributor | Research Institute for Smart Ageing | en_US |
dc.creator | Li, Q | en_US |
dc.creator | Yan, X | en_US |
dc.creator | Xu, J | en_US |
dc.creator | Yuan, R | en_US |
dc.creator | Zhang, Y | en_US |
dc.creator | Feng, R | en_US |
dc.creator | Shen, Q | en_US |
dc.creator | Zhang, X | en_US |
dc.creator | Wang, S | en_US |
dc.date.accessioned | 2024-10-07T08:32:30Z | - |
dc.date.available | 2024-10-07T08:32:30Z | - |
dc.identifier.uri | http://hdl.handle.net/10397/109379 | - |
dc.language.iso | en | en_US |
dc.publisher | Springer | en_US |
dc.subject | Anatomical structure | en_US |
dc.subject | Contrastive learning | en_US |
dc.subject | Medical vision-language | en_US |
dc.subject | Pre-training | en_US |
dc.subject | Representation learning | en_US |
dc.title | Anatomical structure-guided medical vision-language pre-training | en_US |
dc.type | Conference Paper | en_US |
dc.identifier.spage | 80 | en_US |
dc.identifier.epage | 90 | en_US |
dc.identifier.doi | 10.1007/978-3-031-72120-5_8 | en_US |
dcterms.abstract | Learning medical visual representations through vision-language pre-training has made remarkable progress. Despite the promising performance, it still faces challenges: local alignment lacks interpretability and clinical relevance, and the internal and external representation learning of image-report pairs is insufficient. To address these issues, we propose an Anatomical Structure-Guided (ASG) framework. Specifically, we parse raw reports into triplets <anatomical region, finding, existence> and fully utilize each element as supervision to enhance representation learning. For anatomical regions, we design an automatic anatomical region-sentence alignment paradigm in collaboration with radiologists, treating the regions as the minimum semantic units for exploring fine-grained local alignment. For findings and existence, we regard them as image tags, applying an image-tag recognition decoder to associate image features with their respective tags within each sample, and constructing soft labels for contrastive learning to improve the semantic association of different image-report pairs. We evaluate the proposed ASG framework on two downstream tasks covering five public benchmarks. Experimental results demonstrate that our method outperforms state-of-the-art methods. | en_US |
dcterms.accessRights | embargoed access | en_US |
dcterms.bibliographicCitation | In MG Linguraru, Q Dou, A Feragen, S Giannarou, B Glocker, K Lekadir, & JA Schnabel (Eds.), Medical Image Computing and Computer Assisted Intervention – MICCAI 2024: 27th International Conference, Marrakesh, Morocco, October 6–10, 2024, Proceedings, Part XI, p. 80-90. Cham, Switzerland: Springer, 2024 | en_US |
dcterms.issued | 2024 | - |
dc.relation.ispartofbook | Medical Image Computing and Computer Assisted Intervention – MICCAI 2024: 27th International Conference, Marrakesh, Morocco, October 6–10, 2024, Proceedings, Part XI | en_US |
dc.relation.conference | Medical Image Computing and Computer Assisted Intervention [MICCAI] | en_US |
dc.description.validate | 202410 bcch | en_US |
dc.description.oa | Not applicable | en_US |
dc.identifier.FolderNumber | a3073a | - |
dc.identifier.SubFormID | 49382 | - |
dc.description.fundingSource | Others | en_US |
dc.description.fundingText | Start-up Fund of The Hong Kong Polytechnic University (No. P0045999); the Seed Fund of the Research Institute for Smart Ageing (No. P0050946) | en_US |
dc.description.pubStatus | Published | en_US |
dc.date.embargo | 2025-10-03 | en_US |
dc.description.oaCategory | Green (AAM) | en_US |
Appears in Collections: | Conference Paper |
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.