Scalable video and audio techniques for video conferencing

Fung, Kai-tat

Please use this identifier to cite or link to this item: http://hdl.handle.net/10397/85136

Title:	Scalable video and audio techniques for video conferencing
Authors:	Fung, Kai-tat
Degree:	M.Phil.
Issue Date:	2001
Abstract:	With advances in video and audio compression and in networking technologies, networked multimedia services such as multipoint video conferencing, video on demand and digital TV are emerging. We envision a central server, the multipoint control unit (MCU), that may have to support quality of service for heterogeneous clients or transmission channels; in this scenario the server must be able to perform video transcoding and audio mixing. In video transcoding, the conventional approach decodes the incoming video bitstream into the pixel domain and then re-encodes the decoded frames at the desired output bitrate, according to the capability of the clients' devices and the available network bandwidth. This incurs high processing complexity, memory usage and delay, and it degrades video quality. In audio mixing, the audio signal is usually distorted by background noise from other channels, which degrades speech quality. The aim of this study is to find ways to reduce the computational complexity and to provide good video and audio quality in video conferencing. In this thesis, we focus on four major aspects of a video conferencing system: video transcoding in multipoint video conferencing, a wavelet-based video coder, speech recovery and audio coding. The first half of the thesis is concerned with video processing and the second half with audio processing. In the first half, a new frame-skipping transcoder is proposed to greatly reduce both computational complexity and quality degradation. The proposed architecture operates mainly in the discrete cosine transform (DCT) domain to achieve a low-complexity transcoder. It is observed that the re-encoding error of the frame-skipping transcoder is significantly reduced when a direct summation of DCT coefficients is employed. 
By using the proposed frame-skipping transcoder, the video quality of the active sub-sequences can be improved significantly. Most video conferencing systems use DCT-based encoders; however, at low bit rates a DCT-based encoder exhibits visually annoying blocking artifacts. Recently, wavelets have been used in internet applications. The major advantage of a wavelet-based coder over the conventional video encoder is its high quality and the absence of blocking artifacts. Although a wavelet-based coder can achieve good quality, its computational speed is an area of concern. Motivated by this, a new region-based video coder architecture is proposed to achieve good video quality with low complexity. The proposed video coder is based on an adaptive region-based updating technique in which the video is updated according to the motion activity. A simple and fast object tracking technique is proposed to locate the region of interest. Features of the proposal include (i) user-specified region-of-interest selection, in which the region can be changed by the user at any time, and (ii) adaptive bit allocation that lets the user specify the relative quality of the foreground and the background, which increases interactivity. This architecture guarantees high video quality in the region of interest while reducing the overall bit rate and the computation time, even at low bit rates. In the second half of the thesis, we address the problem of speech enhancement, that is, recovering a speech source from a mixture of its delayed versions and additive noise. Using a constrained optimisation technique, an algorithm based on second-order statistics is developed. The proposed algorithm places no strong restrictions on the speech signal or the noise. Simulation results show that our algorithm achieves better performance than other algorithms. 
Finally, although MPEG Audio provides perceptually lossless audio compression, its required bitrate and computational complexity are higher than those of conventional speech coding approaches. Motivated by this, a fast bit allocation algorithm for the MPEG audio encoder is proposed that generates an MPEG bitstream identical to the one produced by the standard bit allocation algorithm described in the MPEG audio standard. The proposed algorithm uses the bit allocation information of the previous frame as a reference for allocating the limited number of bits to each of the 32 subbands in the current frame, so that the number of iterations is significantly reduced. Results of the study show that the proposed bit allocation algorithm performs well at different encoded bitrates. We report in this thesis that significant gains in computation and scalability can be achieved by employing our adaptive approaches. These adaptive techniques can make video conferencing more scalable and provide good-quality video and audio in practical situations.
Subjects:	Videoconferencing; Hong Kong Polytechnic University -- Dissertations
Pages:	xiv, 110 leaves : ill. ; 30 cm
Appears in Collections:	Thesis
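
Note on the abstract's "direct summation of DCT coefficients": the record does not describe the transcoder's internals, but the property that makes such a summation possible is the linearity of the DCT. The sketch below is only an illustration of that property, not the thesis's architecture. It assumes two co-located 8x8 residual blocks with no motion between them (an assumption of this example, not stated in the abstract); the block values and all function names are hypothetical.

```python
import math

def dct_1d(x):
    """Orthonormal 1-D DCT-II of a sequence of numbers."""
    n = len(x)
    out = []
    for k in range(n):
        scale = math.sqrt((1.0 if k == 0 else 2.0) / n)
        out.append(scale * sum(
            x[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
            for i in range(n)))
    return out

def dct_2d(block):
    """Separable 2-D DCT: transform every row, then every column."""
    rows = [dct_1d(row) for row in block]
    cols = [dct_1d(col) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]

def add_blocks(a, b):
    """Element-wise sum of two equally sized blocks."""
    return [[p + q for p, q in zip(ra, rb)] for ra, rb in zip(a, b)]

N = 8
# Hypothetical prediction residuals of one block position in two consecutive
# frames (made-up deterministic values, zero motion assumed).
residual_a = [[((3 * r + 5 * c) % 7) - 3 for c in range(N)] for r in range(N)]
residual_b = [[((2 * r * c + r) % 5) - 2 for c in range(N)] for r in range(N)]

# Route 1: add the residuals in the pixel domain, then transform once.
via_pixels = dct_2d(add_blocks(residual_a, residual_b))
# Route 2: transform each residual, then add the coefficients directly.
via_dct = add_blocks(dct_2d(residual_a), dct_2d(residual_b))

max_diff = max(abs(p - q)
               for rp, rq in zip(via_pixels, via_dct)
               for p, q in zip(rp, rq))
print(max_diff < 1e-9)
```

Both routes give the same coefficients up to floating-point rounding, so in this zero-motion case a skipped frame's residual could be merged without returning to the pixel domain. Quantization and motion-compensated blocks are not covered by this sketch.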

Access

View full-text via https://theses.lib.polyu.edu.hk/handle/200/4062
