HEVC based screen content coding and transcoding using machine learning techniques

Kuang, Wei

Please use this identifier to cite or link to this item: http://hdl.handle.net/10397/86450

DC Field	Value	Language
dc.contributor	Department of Electronic and Information Engineering	-
dc.creator	Kuang, Wei	-
dc.identifier.uri	https://theses.lib.polyu.edu.hk/handle/200/10214	-
dc.language.iso	English	-
dc.title	HEVC based screen content coding and transcoding using machine learning techniques	-
dc.type	Thesis	-
dcterms.abstract	Screen content video is an emerging video type that usually shows mixed content, with both natural image blocks (NIBs) and computer-generated screen content blocks (SCBs). Since High Efficiency Video Coding (HEVC) is optimized only for NIBs, while SCBs exhibit different characteristics, new techniques are necessary for SCBs. The Screen Content Coding (SCC) extension was developed on top of HEVC to explore new coding tools for screen content videos. SCC employs two additional coding modes for intra-prediction: intra block copy (IBC) mode and palette (PLT) mode. However, the exhaustive mode search dramatically increases the computational complexity of SCC. Therefore, this thesis proposes several novel machine learning based techniques to simplify both encoding and transcoding for SCC. First, a fast intra-prediction algorithm for SCC based on content analysis and dynamic thresholding is proposed. A scene change detection method is adopted to obtain a learning frame in each scene, and the learning frame is encoded by the original SCC encoder to collect learning statistics. The prediction models are tailor-made for the following frames in the same scene according to the video content and quantization parameter (QP) of the learning frame. Simulation results show that the proposed scheme achieves remarkable complexity reduction while preserving the coded video quality. Next, we propose a decision tree based framework for fast intra mode decision by investigating various features in training sets. To avoid the exhaustive mode search, the framework arranges decision trees sequentially and inserts a classifier before each mode, so that each mode is checked separately. Compared with previous approaches, in which both IBC and PLT modes are checked for SCBs, the proposed coding framework is more flexible: it allows either IBC or PLT mode to be checked for SCBs, which further reduces computational complexity. 
Simulation results show that the proposed scheme provides significant complexity savings with negligible loss of coded video quality. To avoid the need for hand-crafted features, a deep learning based fast prediction network, DeepSCC, is then proposed using a convolutional neural network (CNN). It contains two parts, DeepSCC-I and DeepSCC-II. Before being fed to DeepSCC, incoming coding units (CUs) are divided into two categories: dynamic coding tree units (CTUs) and stationary CTUs. For dynamic CTUs, whose content differs from that of their collocated CTUs, DeepSCC-I takes raw sample values as input to make fast predictions. For stationary CTUs, whose content is the same as that of their collocated CTUs, DeepSCC-II additionally utilizes the optimal mode maps of the stationary CTU to further reduce the computational complexity. Simulation results show that the proposed scheme further improves the complexity reduction. Finally, we propose a fast HEVC to SCC transcoder. To migrate legacy screen content videos from HEVC to SCC and thereby improve coding efficiency, a fast transcoding framework is proposed by analyzing features from four categories: features from the HEVC decoder, static features, dynamic features, and spatial features. First, the CU depth level collected from the HEVC decoder is used to terminate the CU partitioning in SCC early. Second, a flexible encoding structure is proposed to make early mode decisions with the help of these features. Simulation results show that the proposed scheme dramatically shortens the transcoding time.	-
dcterms.accessRights	open access	-
dcterms.educationLevel	Ph.D.	-
dcterms.extent	xvi, 145 pages : color illustrations	-
dcterms.issued	2019	-
dcterms.LCSH	Hong Kong Polytechnic University -- Dissertations	-
dcterms.LCSH	Digital video	-
dcterms.LCSH	Coding theory	-
dcterms.LCSH	Video compression	-
dcterms.LCSH	Machine learning	-
Appears in Collections:	Thesis
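
The abstract's split of CTUs into stationary and dynamic ones can be illustrated with a minimal sketch. This is not the thesis's implementation: it assumes that "the same content" means exact sample equality with the collocated CTU of the previous frame on a single sample plane, and the names `classify_ctus`, `Frame`, and `CTU_SIZE` are hypothetical.

```python
from typing import List

# A frame here is one sample plane as a row-major 2-D list (an assumption
# made for illustration; a real encoder works on its own picture buffers).
Frame = List[List[int]]

CTU_SIZE = 64  # 64x64 is a common HEVC CTU size; assumed for this sketch


def classify_ctus(current: Frame, previous: Frame,
                  ctu_size: int = CTU_SIZE) -> List[List[str]]:
    """Label each CTU 'stationary' if it equals its collocated CTU in the
    previous frame, otherwise 'dynamic'. Both frames must share one size."""
    height, width = len(current), len(current[0])
    labels: List[List[str]] = []
    for top in range(0, height, ctu_size):
        row_labels: List[str] = []
        bottom = min(top + ctu_size, height)  # clip CTUs at the frame edge
        for left in range(0, width, ctu_size):
            right = min(left + ctu_size, width)
            # Compare the CTU row by row against the collocated region.
            same = all(
                current[y][left:right] == previous[y][left:right]
                for y in range(top, bottom)
            )
            row_labels.append("stationary" if same else "dynamic")
        labels.append(row_labels)
    return labels


if __name__ == "__main__":
    prev_frame = [[0] * 128 for _ in range(128)]
    cur_frame = [row[:] for row in prev_frame]
    cur_frame[70][10] = 255  # change one sample inside the bottom-left CTU
    for row in classify_ctus(cur_frame, prev_frame):
        print(row)
```

Under these assumptions, dynamic CTUs would be routed to a predictor such as DeepSCC-I, while stationary CTUs could additionally reuse information from the collocated CTU, as the abstract describes for DeepSCC-II.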

Access

View full-text via https://theses.lib.polyu.edu.hk/handle/200/10214

