Please use this identifier to cite or link to this item: http://hdl.handle.net/10397/109858
dc.contributor: Department of Computing
dc.creator: Yang, Xi
dc.identifier.uri: https://theses.lib.polyu.edu.hk/handle/200/13239
dc.language.iso: English
dc.title: Towards effective and efficient real-world video super-resolution
dc.type: Thesis
dcterms.abstract: With the rapid development of consumer electronics and the Internet, we are entering an era of high-definition visual media, in which video resolution plays a pivotal role in the quality of the viewer's experience. With the insatiable demand for higher-resolution content, video super-resolution (VSR) has emerged as a critical area of research within the field of computer vision. VSR refers to the process of reconstructing a high-resolution (HR) video from its low-resolution (LR) counterpart. This not only enhances the visual experience for end-users but also has practical applications in surveillance, medical imaging, and digital restoration of archival footage.
dcterms.abstract: The challenge of video super-resolution lies in accurately inferring high-frequency details that are not present in the low-resolution source. Early techniques in VSR were largely based on interpolation methods, which often resulted in artifacts such as blurring and aliasing. The advent of machine learning, particularly deep learning, has revolutionized this field by enabling more sophisticated approaches that can learn complex mappings from LR to HR content, utilizing temporal coherence and contextual information across video frames.
dcterms.abstract: In this thesis, we embrace the recent advances of deep neural networks (DNNs) to address the challenges in VSR research, aiming to achieve effective and efficient real-world video super-resolution.
dcterms.abstract: In Chapter 1, we review related works and discuss the contributions and organization of this thesis.
dcterms.abstract: In Chapter 2, we develop an efficient VSR algorithm with a flow-guided deformable attention propagation module, targeting the real-time online setting required by applications such as streaming media and video surveillance. The flow-guided deformable attention propagation module leverages the correspondence priors provided by a fast optical flow network in the deformable attention computation, and consequently helps propagate recurrent state information effectively and efficiently. The proposed algorithm achieves state-of-the-art results on widely used VSR benchmark datasets in terms of both effectiveness and efficiency.
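At its core, flow-guided propagation aligns the previous recurrent state with the current frame by warping it along the estimated optical flow before fusing it. The following is a minimal sketch of such backward warping; the function name, the (H, W, C) layout, and the nearest-neighbor sampling are illustrative assumptions, not the thesis implementation (which uses deformable attention and, typically, bilinear sampling):

```python
import numpy as np

def warp_by_flow(prev_state: np.ndarray, flow: np.ndarray) -> np.ndarray:
    """Warp a (H, W, C) feature map with a (H, W, 2) optical flow field.

    Each output pixel (y, x) samples prev_state at
    (y + flow[y, x, 1], x + flow[y, x, 0]), rounded to the nearest
    neighbor and clipped to the image bounds.
    """
    h, w, _ = prev_state.shape
    ys, xs = np.mgrid[0:h, 0:w]
    src_x = np.clip(np.rint(xs + flow[..., 0]), 0, w - 1).astype(int)
    src_y = np.clip(np.rint(ys + flow[..., 1]), 0, h - 1).astype(int)
    return prev_state[src_y, src_x]
```

For example, a constant flow of (1, 0) shifts every feature one pixel leftward in the sampled output, so the warped state lines up with content that moved between frames.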
dcterms.abstract: In Chapter 3, we build the first real-world VSR dataset, aiming to bridge the synthetic-to-real gap in previous VSR research and pave the way towards real-world VSR. To train VSR models more effectively on the proposed dataset, we propose a decomposition-based loss that accounts for the characteristics of the constructed dataset. Experiments validate that VSR models trained on our RealVSR dataset demonstrate better visual quality than those trained on synthetic datasets under real-world settings, and they also exhibit good generalization capability in cross-camera tests.
dcterms.abstract: In Chapter 4, we propose a motion-guided latent diffusion (MGLD) based VSR algorithm, which achieves highly competitive real-world VSR results, exhibiting perceptually much more realistic details with fewer flickering artifacts than existing state-of-the-art methods. To tackle the ill-posedness of the real-world VSR problem, we leverage the powerful generation capability of a large pre-trained text-to-image diffusion model. To improve temporal consistency, we propose a motion-guided sampling strategy and fine-tune the variational decoder with an innovative sequence-oriented loss.
dcterms.abstract: In Chapter 5, we develop a VSR algorithm by harnessing the capabilities of a robust video diffusion generative prior, achieving temporally consistent and high-quality VSR outcomes. To effectively utilize the video diffusion prior for VSR, we implement a ControlNet-style mechanism to steer the sequence VSR process and fine-tune the model on a large-scale video dataset. The powerful video diffusion prior, coupled with our control design, enables the model to achieve commendable VSR results at the segment level. To ensure seamless continuity between segments and maintain long-term consistency, we further craft a segment-based recurrent inference pipeline.
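The segment-based recurrent inference described above can be sketched as follows. The function names, the context-passing interface, and the overlap bookkeeping are hypothetical illustrations under the stated assumptions, not the thesis pipeline: each segment is processed conditioned on the tail of the previous segment's output, and overlapping frames are emitted only once so segment boundaries stay consistent.

```python
def segment_recurrent_inference(frames, process_segment, seg_len=8, overlap=2):
    """Run a per-segment model over a long frame sequence.

    `process_segment(segment, context)` is a hypothetical interface:
    it processes one segment of frames, optionally conditioned on
    `context`, the last `overlap` output frames of the previous
    segment. Assumes len(frames) > 0.
    """
    step = seg_len - overlap
    outputs = []
    for start in range(0, max(1, len(frames) - overlap), step):
        segment = frames[start:start + seg_len]
        context = outputs[-overlap:]  # empty for the first segment
        result = process_segment(segment, context)
        # skip frames already emitted by the previous segment
        outputs.extend(result if start == 0 else result[overlap:])
    return outputs
```

With an identity `process_segment`, the pipeline reproduces the input sequence exactly, which is a quick way to check that the overlap accounting emits every frame once.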
dcterms.abstract: In summary, our work contributes to the development of VSR research by designing a more efficient network architecture to boost the efficiency of real-world VSR algorithms, addressing the lack of real-world VSR benchmark datasets, and developing more effective real-world VSR algorithms by exploiting image and video diffusion priors.
dcterms.accessRights: open access
dcterms.educationLevel: Ph.D.
dcterms.extent: xviii, 127 pages : color illustrations
dcterms.issued: 2024
dcterms.LCSH: High resolution imaging
dcterms.LCSH: Image processing -- Digital techniques
dcterms.LCSH: Neural networks (Computer science)
dcterms.LCSH: Hong Kong Polytechnic University -- Dissertations
Appears in Collections: Thesis
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.