Please use this identifier to cite or link to this item:
http://hdl.handle.net/10397/116815
| DC Field | Value | Language |
|---|---|---|
| dc.contributor | Department of Computing | - |
| dc.creator | Li, R | - |
| dc.creator | Guo, J | - |
| dc.creator | Zhou, Q | - |
| dc.creator | Guo, S | - |
| dc.date.accessioned | 2026-01-21T03:52:53Z | - |
| dc.date.available | 2026-01-21T03:52:53Z | - |
| dc.identifier.isbn | 979-8-4007-0686-8 | - |
| dc.identifier.uri | http://hdl.handle.net/10397/116815 | - |
| dc.description | 32nd ACM International Conference on Multimedia, Melbourne VIC, Australia, 28 October 2024 - 1 November 2024 | en_US |
| dc.language.iso | en | en_US |
| dc.publisher | The Association for Computing Machinery | en_US |
| dc.rights | This work is licensed under a Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/). | en_US |
| dc.rights | ©2024 Copyright held by the owner/author(s). | en_US |
| dc.rights | The following publication Li, R., Guo, J., Zhou, Q., & Guo, S. (2024). FreePIH: Training-Free Painterly Image Harmonization with Diffusion Model. In Proceedings of the 32nd ACM International Conference on Multimedia, Melbourne VIC, Australia is available at https://doi.org/10.1145/3664647.3680780. | en_US |
| dc.subject | Diffusion model | en_US |
| dc.subject | Image editing | en_US |
| dc.subject | Image harmonization | en_US |
| dc.title | FreePIH: training-free painterly image harmonization with diffusion model | en_US |
| dc.type | Conference Paper | en_US |
| dc.identifier.spage | 7464 | - |
| dc.identifier.epage | 7473 | - |
| dc.identifier.doi | 10.1145/3664647.3680780 | - |
| dcterms.abstract | This paper presents an efficient, training-free painterly image harmonization (PIH) method, dubbed FreePIH, that leverages only a pre-trained diffusion model to achieve state-of-the-art harmonization results. Unlike existing methods that require training auxiliary networks, fine-tuning a large pre-trained backbone, or both, to harmonize a foreground object with a painterly-style background image, our FreePIH tames the denoising process as a plug-in module for foreground image style transfer. Specifically, we find that the very last few steps of the denoising (i.e., generation) process strongly correspond to the stylistic information of images; based on this, we propose to augment the latent features of both the foreground and background images with Gaussian noise for a direct denoising-based harmonization. To guarantee the fidelity of the harmonized image, we use latent features to enforce the consistency of the content and the stability of the foreground objects in the latent space, while aligning both the foreground and background with the same style. Moreover, to enrich the generation with more structural and textural details, we further integrate text prompts to attend to the latent features, thereby improving generation quality. Quantitative and qualitative evaluations on the COCO and LAION-5B datasets demonstrate that our method surpasses representative baselines by large margins. | - |
| dcterms.accessRights | open access | en_US |
| dcterms.bibliographicCitation | In MM ’24: Proceedings of the 32nd ACM International Conference on Multimedia, p. 7464-7473. New York, NY: The Association for Computing Machinery, 2024 | - |
| dcterms.issued | 2024 | - |
| dc.identifier.scopus | 2-s2.0-85209810229 | - |
| dc.relation.ispartofbook | MM ’24: Proceedings of the 32nd ACM International Conference on Multimedia | - |
| dc.relation.conference | ACM International Conference on Multimedia [MM] | - |
| dc.publisher.place | New York, NY | en_US |
| dc.description.validate | 202601 bcch | - |
| dc.description.oa | Version of Record | en_US |
| dc.identifier.FolderNumber | OA_Scopus/WOS | en_US |
| dc.description.fundingSource | RGC | en_US |
| dc.description.fundingSource | Others | en_US |
| dc.description.fundingText | This research was supported by fundings from the Key-Area Research and Development Program of Guangdong Province (No. 2021B0101400003), Hong Kong RGC Research Impact Fund (No. R5060-19, No. R5034-18), Areas of Excellence Scheme (AoE/E-601/22-R), General Research Fund (No. 152203/20E, 152244/21E, 152169/22E, 152228/23E), Hong Kong RGC General Research Fund (No. 152211/23E and 15216424/24E), National Natural Science Foundation of China (No. 62102327), and PolyU Internal Fund (No. P0043932). This research was also supported by NVIDIA AI Technology Center (NVAITC). | en_US |
| dc.description.pubStatus | Published | en_US |
| dc.description.oaCategory | CC | en_US |
| Appears in Collections: | Conference Paper | |
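
The abstract above outlines an algorithmic recipe: re-noise the composite latent with Gaussian noise, then run only the last few denoising steps, which carry the stylistic information, while keeping the foreground content and the background consistent in latent space. As a reading aid, here is a minimal sketch of such a loop in PyTorch. It is an illustration under stated assumptions, not the authors' released implementation: the `denoiser` stand-in (a real system would use a pre-trained diffusion UNet), the deterministic DDIM-style update, the mask-based background re-injection, and all names here are assumptions.

```python
import torch

# Hypothetical stand-in for a pre-trained diffusion denoiser that
# predicts the noise eps from a latent x at timestep t.
def denoiser(x: torch.Tensor, t: int) -> torch.Tensor:
    return torch.zeros_like(x)  # placeholder only

def harmonize_last_steps(bg_latent, fg_latent, fg_mask,
                         alphas_cumprod, last_k=5):
    """Sketch of denoising-based harmonization over the last k steps.

    bg_latent:      latent of the painterly background
    fg_latent:      latent of the composite holding the pasted foreground
    fg_mask:        1 inside the foreground region, 0 elsewhere
    alphas_cumprod: cumulative noise schedule, indexed by timestep
    """
    t0 = last_k - 1
    noise = torch.randn_like(bg_latent)
    # Augment the composite latent with Gaussian noise, i.e. re-noise it
    # to step t0 so that only the style-bearing last steps are re-run.
    composite = fg_mask * fg_latent + (1 - fg_mask) * bg_latent
    a = alphas_cumprod[t0]
    x = a.sqrt() * composite + (1 - a).sqrt() * noise

    for t in range(t0, -1, -1):
        eps = denoiser(x, t)
        a_t = alphas_cumprod[t]
        a_prev = alphas_cumprod[t - 1] if t > 0 else torch.tensor(1.0)
        # Deterministic DDIM-style step: predict x0, then jump to t-1.
        x0 = (x - (1 - a_t).sqrt() * eps) / a_t.sqrt()
        x = a_prev.sqrt() * x0 + (1 - a_prev).sqrt() * eps
        # Keep the background faithful by re-injecting its re-noised
        # latent outside the foreground mask at every step.
        bg_t = a_prev.sqrt() * bg_latent + (1 - a_prev).sqrt() * noise
        x = fg_mask * x + (1 - fg_mask) * bg_t
    return x
```

With a real denoiser plugged in, `x` would be decoded back to pixels by the diffusion model's VAE; the small `last_k` reflects the abstract's observation that only the very last denoising steps need to be repeated for style transfer.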
Files in This Item:
| File | Description | Size | Format | |
|---|---|---|---|---|
| 3664647.3680780.pdf | | 5.77 MB | Adobe PDF | View/Open |
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.