Please use this identifier to cite or link to this item:
http://hdl.handle.net/10397/118317
Title: Screen content video quality enhancement (SCVQE) based on machine learning
Authors: Huang, Ziyin
Degree: Ph.D.
Issue Date: 2025
Abstract: The increasing popularity of intelligent terminals has led to a higher demand for screen content videos (SCVs). Applications such as cloud gaming, video conferencing, and online education rely heavily on Screen Content Coding (SCC). The COVID-19 pandemic in 2020 further accelerated the shift to online education and virtual conferences, making SCC indispensable for effective screen sharing. This paradigm shift has elevated SCVs from a niche format to mainstream media. Consequently, enhancing the quality of screen content videos has become a critical challenge. In this thesis, we conduct an in-depth study on deep-learning-based video quality enhancement (VQE) for SCC and propose effective learning frameworks based on the characteristics of SCVs. Firstly, we study the dedicated tools in the SCC standard, the Intra Block Copy (IBC) and palette (PLT) modes, which induce corresponding compression losses in the decoded video. We therefore propose a novel post-processing network that enhances decoded screen content videos using the coding-mode information embedded in the coded bitstream. By fusing three binary mode masks derived from the dedicated coding tools with the corresponding decoded frame, we aim to elevate the quality of SCVs. Secondly, unlike natural videos, screen content videos often feature abrupt scene switches and frame-freezing instances, leading to visible distortions in compressed videos. Existing alignment-based models struggle to enhance scene-switch frames effectively and lack efficiency when dealing with frame freezing. We therefore propose a novel alignment-free method that effectively handles both scene switches and frame freezing.
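The bitstream-guided mode-mask fusion described for the first framework might be sketched as below: the decoded frame is concatenated with binary masks marking where the dedicated coding tools were used, and the network predicts a residual correction. This is a minimal illustration under assumed layer sizes and an assumed residual design, not the thesis architecture.

```python
import torch
import torch.nn as nn

class ModeMaskFusionNet(nn.Module):
    """Hypothetical sketch: fuse a decoded frame with binary coding-mode
    masks (e.g., IBC, PLT, other) as extra input channels. All layer
    widths and depths here are illustrative assumptions."""

    def __init__(self, channels=64):
        super().__init__()
        # 3 color channels of the decoded frame + 3 binary mode masks
        self.head = nn.Conv2d(3 + 3, channels, 3, padding=1)
        self.body = nn.Sequential(
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.ReLU(inplace=True),
        )
        self.tail = nn.Conv2d(channels, 3, 3, padding=1)

    def forward(self, frame, masks):
        # frame: (N, 3, H, W) decoded frame; masks: (N, 3, H, W) binary maps
        x = torch.cat([frame, masks], dim=1)
        residual = self.tail(self.body(self.head(x)))
        # Residual enhancement: output stays close to the decoded frame
        return frame + residual
```

Feeding the masks as extra channels lets early convolutions condition the restoration on which coding tool produced each region, which is one plausible reading of "fusing three binary mode masks with the corresponding decoded frame".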
In our approach, we develop a spatial and temporal feature extraction module to compress and extract spatio-temporal information from three groups of frame inputs, enabling efficient handling of scene switches. In addition, an edge-aware block is proposed to extract edge information, guiding the model to focus on restoring high-frequency components in frame-freezing situations. A fusion module is then designed to adaptively fuse the features from the three groups, taking the different positions of the video frames into account, to enhance frames in both scene-switch and frame-freezing scenarios. Thirdly, existing multiple-frame models that use a fixed range of neighbor frames struggle to enhance frames during scene switches and lack efficiency in reconstructing high-frequency information. To address these limitations, we present a novel method proficient in managing scene switches and reconstructing high-frequency information. In the feature extraction part, we develop long-term and short-term feature extraction streams: the long-term stream learns contextual information, while the short-term stream extracts more closely related information from a shorter input to help the long-term stream handle fast motion and scene switches. To further improve frame quality during scene switches, we place a similarity-based neighbor frame selector before the short-term stream; this selector identifies relevant neighbor frames, aiding the efficient handling of scene switches. To dynamically fuse the short-term and long-term features, a multi-scale feature distillation module adaptively recalibrates channel-wise feature responses. In the reconstruction part, a high-frequency reconstruction block is proposed to guide the model in restoring high-frequency components.
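The similarity-based neighbor frame selection described above can be illustrated with a small sketch: rank candidate neighbor frames by similarity to the target frame and keep the top-k, so that frames from across a scene switch are filtered out. The similarity metric (negative mean squared error) and the selection rule are assumptions for illustration; the abstract does not specify them.

```python
import numpy as np

def select_neighbors(target, candidates, k=2):
    """Hypothetical neighbor frame selector.

    target: (H, W) array for the frame being enhanced.
    candidates: list of (H, W) arrays (neighbor frames).
    Returns the indices of the k candidates most similar to the target,
    in temporal (index) order. Frames on the far side of a scene switch
    score poorly and are dropped.
    """
    # Negative MSE as a simple similarity score (higher = more similar)
    scores = [-np.mean((target - c) ** 2) for c in candidates]
    # Indices of the k best-scoring candidates
    order = np.argsort(scores)[::-1][:k]
    return sorted(order.tolist())
```

A selector like this keeps the short-term stream's input restricted to frames that actually share content with the target, which matches the stated goal of handling scene switches efficiently.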
The frameworks proposed in this thesis are evaluated through comparisons with other state-of-the-art methods on both posed and in-the-wild databases. Ablation studies and robustness tests confirm the promising performance of our frameworks, highlighting the efficacy of the novel designs in enhancing screen content quality.
Subjects: Video compression; Machine learning; Image processing; Digital video; Hong Kong Polytechnic University -- Dissertations
Pages: 126 pages : color illustrations
Appears in Collections: Thesis
Access: View full-text via https://theses.lib.polyu.edu.hk/handle/200/14236


