Please use this identifier to cite or link to this item: http://hdl.handle.net/10397/16628
Title: Improved SIMD architecture for high performance video processors
Authors: Lo, WY
Lun, DPK 
Siu, WC 
Wang, W
Song, J
Keywords: Configurable SIMD
parallel memory structure
SIMD bottlenecks
video codec processor
Issue Date: 2011
Publisher: Institute of Electrical and Electronics Engineers
Source: IEEE transactions on circuits and systems for video technology, 2011, v. 21, no. 12, 5734815, p. 1769-1783
Journal: IEEE transactions on circuits and systems for video technology 
Abstract: Single instruction multiple data (SIMD) execution is without doubt an efficient way to exploit the data-level parallelism in image and video applications. However, SIMD execution bottlenecks must be tackled to achieve high execution efficiency. In this paper, we first analyze the implementation of two major kernel functions of H.264/AVC, namely SATD and subpel interpolation, on conventional SIMD architectures to identify the bottlenecks in traditional approaches. Based on the analysis results, we propose a new SIMD architecture with two novel features: 1) a parallel memory structure with variable block size and word length support, and 2) a configurable SIMD structure. The proposed parallel memory structure gives programmers great flexibility to access data with different block sizes and different word lengths. The configurable SIMD structure allows almost random register file access and slightly different operations in the ALUs inside the SIMD structure. The new features greatly benefit the realization of H.264/AVC kernel functions. For instance, fractional motion estimation, particularly the half- to quarter-pixel interpolation, can now be executed with minimal or no additional memory access. Compared with conventional SIMD systems, the proposed SIMD architecture achieves a further speedup of 2.1X to 4.6X when implementing H.264/AVC kernel functions. Based on Amdahl's law, the overall speedup of an H.264/AVC encoding application is projected to be 2.46X. We expect that significant improvement can also be achieved when applying the proposed architecture to other image and video processing applications.
URI: http://hdl.handle.net/10397/16628
ISSN: 1051-8215 (print)
1558-2205 (online)
DOI: 10.1109/TCSVT.2011.2130250
Appears in Collections: Journal/Magazine Article
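
The Amdahl's law projection mentioned in the abstract can be illustrated with a minimal sketch. This record does not state what fraction of the encoding time the accelerated kernels occupy, so the `kernel_fraction` values below are purely hypothetical, as are the function name and the choice of Python. Only the 2.1X and 4.6X kernel speedups are taken from the abstract, and the sketch is not meant to reproduce the 2.46X figure.

```python
def amdahl_speedup(kernel_fraction: float, kernel_speedup: float) -> float:
    """Overall speedup when only part of the runtime is accelerated.

    kernel_fraction: share of the original runtime spent in the
        accelerated kernels (hypothetical here; not given in the record).
    kernel_speedup: speedup factor achieved on those kernels.
    """
    if not 0.0 <= kernel_fraction <= 1.0:
        raise ValueError("kernel_fraction must be in [0, 1]")
    if kernel_speedup <= 0.0:
        raise ValueError("kernel_speedup must be positive")
    # Amdahl's law: the unaccelerated share keeps its original cost,
    # while the accelerated share shrinks by kernel_speedup.
    return 1.0 / ((1.0 - kernel_fraction) + kernel_fraction / kernel_speedup)


if __name__ == "__main__":
    # Hypothetical kernel shares, combined with the two kernel speedups
    # quoted in the abstract (2.1X and 4.6X).
    for fraction in (0.5, 0.8, 0.95):
        for speedup in (2.1, 4.6):
            overall = amdahl_speedup(fraction, speedup)
            print(f"fraction={fraction:.2f} kernel={speedup:.1f}X "
                  f"overall={overall:.2f}X")
```

The overall speedup can never exceed the kernel speedup itself, and it approaches that limit only as the kernels come to dominate the runtime.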

SCOPUS™ citations: 7 (last week: 0; last month: 0), as of Apr 25, 2017

WEB OF SCIENCE™ citations: 4 (last week: 0; last month: 0), as of Apr 26, 2017

Page view(s): 19 (last week: 0), checked on Apr 23, 2017

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.