Please use this identifier to cite or link to this item: http://hdl.handle.net/10397/112799
DC Field | Value | Language
dc.contributor | Department of Electrical and Electronic Engineering | en_US
dc.creator | Salahudeen, R | en_US
dc.creator | Siu, WC | en_US
dc.creator | Chan, HA | en_US
dc.date.accessioned | 2025-05-09T00:55:02Z |
dc.date.available | 2025-05-09T00:55:02Z |
dc.identifier.issn | 0098-3063 | en_US
dc.identifier.uri | http://hdl.handle.net/10397/112799 |
dc.language.iso | en | en_US
dc.publisher | Institute of Electrical and Electronics Engineers | en_US
dc.rights | © 2024 The Authors. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/ | en_US
dc.rights | The following publication R. Salahudeen, W.-C. Siu and H. Anthony Chan, "Photo-Realistic Talking Face Generation Under Latent Space Manipulation," in IEEE Transactions on Consumer Electronics, vol. 71, no. 1, pp. 379-387, Feb. 2025 is available at https://doi.org/10.1109/TCE.2024.3516387. | en_US
dc.subject | Deep Learning | en_US
dc.subject | Latent Space | en_US
dc.subject | Multimedia Applications | en_US
dc.subject | Talking Face Generation | en_US
dc.title | Photo-realistic talking face generation under latent space manipulation | en_US
dc.type | Journal/Magazine Article | en_US
dc.identifier.spage | 379 | en_US
dc.identifier.epage | 387 | en_US
dc.identifier.volume | 71 | en_US
dc.identifier.issue | 1 | en_US
dc.identifier.doi | 10.1109/TCE.2024.3516387 | en_US
dcterms.abstract | This paper focuses on generating photo-realistic talking face videos by leveraging semantic facial attributes in a latent space and capturing the talking style from an earlier video of the speaker. We formulate a process to manipulate facial attributes in the latent space by identifying semantic facial directions. We develop a deep learning pipeline that learns the correlation between the audio and the corresponding video frames of a reference video of a speaker in an aligned latent space. This correlation is used to drive a static face image into the frames of a talking face video, moderated by three carefully constructed loss functions for accurate lip synchronization and photo-realistic video reconstruction. By combining these techniques, we aim to generate high-quality talking face videos that are visually realistic and synchronized with the provided audio input. We evaluated our results against state-of-the-art talking face generation techniques and recorded significant improvements in the image quality of the generated talking face videos. | en_US
dcterms.accessRights | open access | en_US
dcterms.bibliographicCitation | IEEE transactions on consumer electronics, Feb. 2025, v. 71, no. 1, p. 379-387 | en_US
dcterms.isPartOf | IEEE transactions on consumer electronics | en_US
dcterms.issued | 2025-02 |
dc.identifier.scopus | 2-s2.0-85212111923 |
dc.identifier.eissn | 1558-4127 | en_US
dc.description.validate | 202505 bcch | en_US
dc.description.oaVersion | Version of Record | en_US
dc.identifier.FolderNumber | OA_Scopus/WOS |
dc.description.fundingSource | RGC | en_US
dc.description.fundingSource | Others | en_US
dc.description.fundingText | Saint Francis University, Hong Kong (Grant Number: ISG200206) | en_US
dc.description.pubStatus | Published | en_US
dc.description.oaCategory | CC | en_US
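The abstract above describes editing a face image by moving its latent code along semantic facial directions, with per-frame edits driven by audio. Below is a minimal, hypothetical sketch of that idea only, assuming a StyleGAN-like 512-dimensional latent space; `edit_latent`, `mouth_dir`, and the audio-derived strengths are illustrative stand-ins, not the authors' implementation or the paper's loss functions.

```python
# Hypothetical sketch of latent-space attribute manipulation for talking-face
# animation. All names, shapes, and signals here are assumptions for
# illustration; the paper's actual pipeline is not reproduced.
import torch

LATENT_DIM = 512  # assumed latent dimensionality (StyleGAN-like space)

def edit_latent(w: torch.Tensor, direction: torch.Tensor,
                strength: float) -> torch.Tensor:
    """Move a latent code along one semantic facial direction.

    w:         latent code of the static face, shape (LATENT_DIM,)
    direction: vector for one semantic attribute (e.g., mouth opening)
    strength:  scalar step size; sign controls the edit direction
    """
    direction = direction / direction.norm()  # unit step keeps strength interpretable
    return w + strength * direction

if __name__ == "__main__":
    torch.manual_seed(0)
    w0 = torch.randn(LATENT_DIM)         # latent code of the source face image
    mouth_dir = torch.randn(LATENT_DIM)  # stand-in for a learned semantic direction
    # Stand-in for per-frame edit strengths that an audio encoder might predict.
    strengths = torch.sin(torch.linspace(0.0, 6.28, steps=25))
    frames = [edit_latent(w0, mouth_dir, s.item()) for s in strengths]
    print(len(frames), frames[0].shape)  # 25 edited latent codes, one per frame
```

Normalizing the direction before stepping is a common convention in latent editing, since it lets a single scalar per frame (here, an audio-derived strength) control the magnitude of the facial change.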
Appears in Collections: Journal/Magazine Article
Files in This Item:
File | Description | Size | Format
Salahudeen_Photo_Realistic_Talking.pdf | | 2.93 MB | Adobe PDF
Open Access Information
Status: open access
File Version: Version of Record