Please use this identifier to cite or link to this item: http://hdl.handle.net/10397/15539
Title: Loop transforming for reducing data alignment on multi-core SIMD processors
Authors: Wang, Y
Pan, L
Shao, Z 
Guan, Y
Guo, M
Keywords: Data alignment
Loop transformation
Multi-processors
SIMD architecture
Issue Date: 2014
Publisher: Springer
Source: Journal of signal processing systems, 2014, v. 74, no. 2, p. 137-150
Journal: Journal of Signal Processing Systems 
Abstract: Multimedia SIMD extensions are commonly employed today to speed up media processing. When vectorizing for SIMD architectures, one of the major issues is handling memory alignment. Prior studies focused either on vectorizing loops in which all memory references are properly aligned, or on introducing extra operations to deal with misaligned memory references. Multi-core SIMD architectures, on the other hand, require coarse-grain parallelism. It is therefore important to study how to parallelize and vectorize loop nests with awareness of data misalignment. This paper presents a loop transformation scheme that maximizes the parallelism of outermost loops while reducing the misaligned memory references in innermost loops. The basic idea of our technique is to align each level of loops in the nest, subject to the constraints of the dependence relations. To reduce data misalignment, we establish a mathematical model based on the concept of offset-collection and propose an effective heuristic algorithm. For coarser-grain parallelism, we propose rules to analyze the outermost loop. When transformations are applied, the inner loops are involved to maximize the parallelism. To avoid introducing more data misalignments, the involved innermost loop is handled from other levels of loops. Experimental results show that misaligned memory references can be reduced by 7% to 37% (18.4% on average). Simulations on CELL show that a 1.1x speedup can be reached by reducing the misaligned data, while a 6.14x speedup can be achieved by enhancing the parallelism for multi-core.
URI: http://hdl.handle.net/10397/15539
ISSN: 1939-8018
DOI: 10.1007/s11265-013-0754-2
Appears in Collections: Journal/Magazine Article
