Please use this identifier to cite or link to this item:
http://hdl.handle.net/10397/112799
| DC Field | Value | Language |
|---|---|---|
| dc.contributor | Department of Electrical and Electronic Engineering | en_US |
| dc.creator | Salahudeen, R | en_US |
| dc.creator | Siu, WC | en_US |
| dc.creator | Chan, HA | en_US |
| dc.date.accessioned | 2025-05-09T00:55:02Z | - |
| dc.date.available | 2025-05-09T00:55:02Z | - |
| dc.identifier.issn | 0098-3063 | en_US |
| dc.identifier.uri | http://hdl.handle.net/10397/112799 | - |
| dc.language.iso | en | en_US |
| dc.publisher | Institute of Electrical and Electronics Engineers | en_US |
| dc.rights | © 2024 The Authors. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/ | en_US |
| dc.rights | The following publication R. Salahudeen, W.-C. Siu and H. Anthony Chan, "Photo-Realistic Talking Face Generation Under Latent Space Manipulation," in IEEE Transactions on Consumer Electronics, vol. 71, no. 1, pp. 379-387, Feb. 2025 is available at https://doi.org/10.1109/TCE.2024.3516387. | en_US |
| dc.subject | Deep Learning | en_US |
| dc.subject | Latent Space | en_US |
| dc.subject | Multimedia Applications | en_US |
| dc.subject | Talking Face Generation | en_US |
| dc.title | Photo-realistic talking face generation under latent space manipulation | en_US |
| dc.type | Journal/Magazine Article | en_US |
| dc.identifier.spage | 379 | en_US |
| dc.identifier.epage | 387 | en_US |
| dc.identifier.volume | 71 | en_US |
| dc.identifier.issue | 1 | en_US |
| dc.identifier.doi | 10.1109/TCE.2024.3516387 | en_US |
| dcterms.abstract | This paper focuses on generating photo-realistic talking face videos by leveraging semantic facial attributes in a latent space and capturing the talking style from an old video of a speaker. We formulate a process to manipulate facial attributes in the latent space by identifying semantic facial directions. We develop a deep learning pipeline to learn the correlation between the audio and the corresponding video frames from a reference video of a speaker in an aligned latent space. This correlation is used to navigate a static face image into frames of a talking face video, which is moderated by three carefully constructed loss functions, for accurate lip synchronization and photo-realistic video reconstruction. By combining these techniques, we aim to generate high-quality talking face videos that are visually realistic and synchronized with the provided audio input. Our results were evaluated against some state-of-the-art techniques on talking face generation, and we have recorded significant improvements in the image quality of the generated talking face video. | en_US |
| dcterms.accessRights | open access | en_US |
| dcterms.bibliographicCitation | IEEE transactions on consumer electronics, Feb. 2025, v. 71, no. 1, p. 379-387 | en_US |
| dcterms.isPartOf | IEEE transactions on consumer electronics | en_US |
| dcterms.issued | 2025-02 | - |
| dc.identifier.scopus | 2-s2.0-85212111923 | - |
| dc.identifier.eissn | 1558-4127 | en_US |
| dc.description.validate | 202505 bcch | en_US |
| dc.description.oa | Version of Record | en_US |
| dc.identifier.FolderNumber | OA_Scopus/WOS | - |
| dc.description.fundingSource | RGC | en_US |
| dc.description.fundingSource | Others | en_US |
| dc.description.fundingText | Saint Francis University, Hong Kong (Grant Number: ISG200206) | en_US |
| dc.description.pubStatus | Published | en_US |
| dc.description.oaCategory | CC | en_US |
| Appears in Collections: | Journal/Magazine Article | |
Files in This Item:
| File | Description | Size | Format |
|---|---|---|---|
| Salahudeen_Photo_Realistic_Talking.pdf | | 2.93 MB | Adobe PDF |