Please use this identifier to cite or link to this item: http://hdl.handle.net/10397/110271
View/Download Full Text
Title: Category-aware saliency enhance learning based on CLIP for weakly supervised salient object detection
Authors: Zhang, Y
Zhang, Z
Liu, T 
Kong, J
Issue Date: Apr-2024
Source: Neural processing letters, Apr. 2024, v. 56, no. 2, 49
Abstract: Weakly supervised salient object detection (SOD) using image-level category labels has been proposed to reduce the annotation cost of pixel-level labels. However, existing methods mostly train a classification network to generate a class activation map, which suffers from coarse localization and difficult pseudo-label updating. To address these issues, we propose a novel Category-aware Saliency Enhance Learning (CSEL) method based on contrastive vision-language pre-training (CLIP), which can perform image-text classification and pseudo-label updating simultaneously. Our proposed method transforms image-text classification into pixel-text matching and generates a category-aware saliency map, which is evaluated by the classification accuracy. Moreover, CSEL assesses the quality of the category-aware saliency map and the pseudo saliency map, and uses the quality confidence scores as weights to update the pseudo labels. The two maps mutually enhance each other to guide the pseudo saliency map in the correct direction. Our SOD network can be trained jointly under the supervision of the updated pseudo saliency maps. We test our model on various well-known RGB-D and RGB SOD datasets. Our model achieves an S-measure of 87.6% on the RGB-D NLPR dataset and 84.3% on the RGB ECSSD dataset. Additionally, our weakly supervised model obtains satisfactory E-measure, F-measure, and mean absolute error results on the other datasets. These results demonstrate the effectiveness of our model.
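As a rough illustration of the two mechanisms described in the abstract, the sketch below shows (1) pixel-text matching, where dense per-pixel embeddings are compared against a CLIP-style text embedding to produce a category-aware saliency map, and (2) a confidence-weighted update that blends the category-aware map with the current pseudo saliency map. This is a minimal NumPy sketch under assumed shapes and fixed confidence scores; the function names, array sizes, and random stand-ins for CLIP features are hypothetical and do not reproduce the authors' implementation.

```python
# Hypothetical sketch (not the authors' code): pixel-text matching and
# confidence-weighted pseudo-label updating, as described in the abstract.
import numpy as np

def pixel_text_matching(pixel_feats: np.ndarray, text_feat: np.ndarray) -> np.ndarray:
    """Cosine similarity between each pixel embedding (H, W, D) and one
    category text embedding (D,), rescaled to [0, 1] as a saliency map."""
    pixel_norm = pixel_feats / (np.linalg.norm(pixel_feats, axis=-1, keepdims=True) + 1e-8)
    text_norm = text_feat / (np.linalg.norm(text_feat) + 1e-8)
    sim = pixel_norm @ text_norm              # (H, W), values in [-1, 1]
    return (sim + 1.0) / 2.0                  # map to [0, 1]

def update_pseudo_label(category_map: np.ndarray,
                        pseudo_map: np.ndarray,
                        conf_category: float,
                        conf_pseudo: float) -> np.ndarray:
    """Blend the two maps using their quality-confidence scores as weights,
    so the more reliable map dominates the updated pseudo label."""
    w_sum = conf_category + conf_pseudo + 1e-8
    return (conf_category * category_map + conf_pseudo * pseudo_map) / w_sum

# Toy usage with random stand-ins for real CLIP features (all values assumed).
rng = np.random.default_rng(0)
pixel_feats = rng.normal(size=(32, 32, 512))   # dense image embeddings
text_feat = rng.normal(size=(512,))            # text embedding for one category
pseudo_map = rng.uniform(size=(32, 32))        # initial pseudo saliency map

category_map = pixel_text_matching(pixel_feats, text_feat)
updated = update_pseudo_label(category_map, pseudo_map, conf_category=0.7, conf_pseudo=0.5)
```

In the paper the blend weights correspond to quality confidence scores estimated by CSEL for each map; here they are fixed numbers purely for the toy example.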
Keywords: Category-aware Saliency Enhance Learning
CLIP
Salient object detection
Weakly supervised
Publisher: Springer New York LLC
Journal: Neural processing letters 
ISSN: 1370-4621
EISSN: 1573-773X
DOI: 10.1007/s11063-024-11530-2
Rights: © The Author(s) 2024
This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
The following publication Zhang, Y., Zhang, Z., Liu, T. et al. Category-Aware Saliency Enhance Learning Based on CLIP for Weakly Supervised Salient Object Detection. Neural Process Lett 56, 49 (2024) is available at https://doi.org/10.1007/s11063-024-11530-2.
Appears in Collections:Journal/Magazine Article

Files in This Item:
File: s11063-024-11530-2.pdf
Size: 1.87 MB
Format: Adobe PDF
Open Access Information
Status: open access
File Version: Version of Record

Page views: 15 (as of Apr 14, 2025)
Downloads: 8 (as of Apr 14, 2025)
SCOPUS™ citations: 2 (as of Jun 19, 2025)
