Please use this identifier to cite or link to this item: http://hdl.handle.net/10397/112799
Title: Photo-realistic talking face generation under latent space manipulation
Authors: Salahudeen, R
Siu, WC 
Chan, HA
Issue Date: Feb-2025
Source: IEEE transactions on consumer electronics, Feb. 2025, v. 71, no. 1, p. 379-387
Abstract: This paper focuses on generating photo-realistic talking face videos by leveraging semantic facial attributes in a latent space and capturing the talking style from an old video of a speaker. We formulate a process to manipulate facial attributes in the latent space by identifying semantic facial directions. We develop a deep learning pipeline to learn the correlation between the audio and the corresponding video frames from a reference video of a speaker in an aligned latent space. This correlation is used to navigate a static face image into frames of a talking face video, a process moderated by three carefully constructed loss functions for accurate lip synchronization and photo-realistic video reconstruction. By combining these techniques, we aim to generate high-quality talking face videos that are visually realistic and synchronized with the provided audio input. Our results were evaluated against several state-of-the-art talking face generation techniques, and we recorded significant improvements in the image quality of the generated talking face video.
Keywords: Deep Learning
Latent Space
Multimedia Applications
Talking Face Generation
Publisher: Institute of Electrical and Electronics Engineers
Journal: IEEE transactions on consumer electronics 
ISSN: 0098-3063
EISSN: 1558-4127
DOI: 10.1109/TCE.2024.3516387
Rights: © 2024 The Authors. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
The following publication R. Salahudeen, W. -C. Siu and H. Anthony Chan, "Photo-Realistic Talking Face Generation Under Latent Space Manipulation," in IEEE Transactions on Consumer Electronics, vol. 71, no. 1, pp. 379-387, Feb. 2025 is available at https://doi.org/10.1109/TCE.2024.3516387.
Appears in Collections: Journal/Magazine Article

Files in This Item:
File: Salahudeen_Photo_Realistic_Talking.pdf
Size: 2.93 MB
Format: Adobe PDF
Open Access Information
Status: open access
File Version: Version of Record