Please use this identifier to cite or link to this item: http://hdl.handle.net/10397/2765
Title: Compiler-assisted high performance and low power optimizations for embedded systems
Authors: Wang, Meng
Keywords: Hong Kong Polytechnic University -- Dissertations
Embedded computer systems -- Programming
Compiling (Electronic computers)
Issue Date: 2009
Publisher: The Hong Kong Polytechnic University
Abstract: Embedded systems are used in a wide spectrum of applications, ranging from mobile consumer electronics to vehicle controllers. These systems are application specific, and have strict timing and power constraints. Designing high performance and low power embedded systems with various constraints and limited resources has become an important research problem. In this thesis, we investigate the challenging issues in designing compiler-assisted techniques for solving high performance and low power optimization problems in embedded systems.
In high performance optimization, we focus on reducing the number of memory accesses for embedded systems. Memory access significantly limits the performance of embedded systems due to the widening processor-memory gap. Besides limiting performance, memory accesses account for a large fraction of overall power consumption. With the emergence of memory-intensive embedded applications, effective optimization techniques are required to reduce the number of memory accesses. In this thesis, we make the following original contributions in this field.
- First, we develop a general compiler optimization technique called "REALM" to reduce the number of memory accesses for Digital Signal Processing (DSP) applications with loops. In the loop kernels of DSP applications, one important characteristic is that the same memory location is repeatedly accessed by different memory operations over multiple loop iterations. For DSP applications, therefore, an important problem is how to identify redundant memory accesses and eliminate them by reusing the needed values across iterations. We solve this problem by replacing redundant memory operations with register operations. The results show that our technique can effectively reduce the number of memory accesses and improve performance compared with previous approaches.
- Second, as embedded systems have a limited number of registers, we propose a register allocation and instruction scheduling technique to improve the "REALM" technique under register constraints. For the register operations generated by the "REALM" technique, we analyze their data dependencies for instruction scheduling, and build a register-matching graph model to find available physical registers that can be allocated to the operands of the register operations. The register allocation problem is solved by finding a simple path of fixed length between two specified vertices in the register-matching graph. We perform instruction scheduling based on the results of the allocation.
In low power optimization, we address two challenging issues, leakage and temperature, for embedded systems. Leakage power has become an issue comparable in importance to dynamic power as semiconductor technologies move down to the nanometer scale. Besides leakage power, temperature issues are also important because both on-chip power density and temperature are rising exponentially with decreasing feature sizes. The increase in on-chip temperature can lead to severe problems with reliability, performance, and cooling costs for embedded systems. To address these issues, we make the following contributions.
- The first contribution is to reduce the leakage power consumption of VLIW (Very Long Instruction Word) processors. We propose a novel leakage-aware modulo scheduling technique that helps hardware-based leakage control schemes to achieve leakage power savings for embedded VLIW processors. We also consider transition time and power overhead in our technique, and discuss the trade-off between leakage savings and performance penalties.
- The second contribution is to reduce the peak temperature of the on-chip memory subsystem. Most embedded systems adopt a hybrid memory architecture, which contains both hardware-managed cache and software-managed scratchpad memory (SPM). However, both cache and SPM have become hot spots, as they are the most frequently accessed on-chip components. We propose a temperature-aware data allocation technique that exploits such a hybrid architecture to jointly optimize performance and peak temperature. Our technique can greatly alleviate the temperature hot spots of the memory subsystem by adaptively distributing the workload between cache and SPM.
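The abstract's first contribution, replacing redundant memory operations with register operations across loop iterations, can be illustrated with a small hand-written sketch. This is not the thesis's "REALM" algorithm, which is not reproduced in this record. The function names and the two-tap kernel below are hypothetical, and the transformation is applied by hand on the assumption that the input and output arrays do not overlap.

```c
#include <stddef.h>

/* Baseline (hypothetical kernel): every iteration loads x[i] and x[i - 1],
 * although x[i - 1] was already loaded as x[i] in the previous iteration. */
void smooth_baseline(const int *x, int *y, size_t n) {
    for (size_t i = 1; i < n; i++) {
        y[i] = x[i] + x[i - 1];   /* two loads per iteration */
    }
}

/* Hand-transformed version: the value loaded in one iteration is carried
 * to the next in a local variable, which a compiler can keep in a register.
 * Assumes x and y do not alias; otherwise the two versions may differ. */
void smooth_reuse(const int *x, int *y, size_t n) {
    if (n < 2) {
        return;                   /* nothing to compute */
    }
    int prev = x[0];              /* single load before the loop */
    for (size_t i = 1; i < n; i++) {
        int cur = x[i];           /* one load per iteration */
        y[i] = cur + prev;
        prev = cur;               /* register move replaces the second load */
    }
}
```

In this sketch the transformed loop issues roughly half as many loads as the baseline, at the cost of one extra live register, which is the kind of register pressure the abstract's second contribution addresses.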
Description: xvii, 164 p. : ill. ; 30 cm.
PolyU Library Call No.: [THS] LG51 .H577P COMP 2009 Wang
URI: http://hdl.handle.net/10397/2765
Rights: All rights reserved.
Appears in Collections: Thesis

Files in This Item:
File                 Description                     Size      Format
b23429926_link.htm   For PolyU Users                 162 B     HTML
b23429926_ir.pdf     For All Users (Non-printable)   1.71 MB   Adobe PDF
