1
Memory Allocation via Graph Coloring using Scratchpad Memory
Tomyo Maeshiro [email protected]
Software Engineering Language Lab, UNT
April 29, 2014
2
OUTLINE
Motivation
Problem Statement
Research Question
Research Plan
Tools
3
MOTIVATION
4
Scratchpad Memory
What is Scratchpad Memory?
“Small” fast memory. SRAM technology. Software controlled.
5
Cache Memory VS Scratchpad Memory
Cache Memory: Controlled by hardware. Uses hardware to decode tags. Included in most computers.
Scratchpad Memory: Designed for embedded systems. Fits in a smaller area. More energy efficient. Completely software controlled.
6
Why Scratchpad Memory?
Banakar et al. [1] conducted a study comparing scratchpad and cache memory.
7
PROBLEM DEFINITION
8
Memory Hierarchy
9
The registers
1st level: The registers
Register allocation
Problem: Registers are too few. It is unlikely that all data will fit in registers.
10
Memory Latency
“Time taken to transfer a data block from memory to the CPU and vice versa”
11
DRAM vs SRAM
DRAM: Cheap. Allows high densities. High memory latency.
“DRAM data rate increased by over 10x, CPU frequency increased by the same factor as well. As a result, latency hasn't really changed.” (Marc Greenberg [2])
SRAM: Fast and simple. Expensive to manufacture. Low memory latency.
12
Scratchpad Memory
Scratchpad memory benefits from the advantages of SRAM technology.
The lack of generalized compiler support for memory allocation is the reason scratchpad memory has not yet achieved its full potential.
The challenge is to create a robust, flexible compiler technique that exploits scratchpad memory's true potential.
13
RESEARCH QUESTION
Given sophisticated memory allocation support, how much improvement can scratchpad memory provide in energy efficiency and overall program execution time compared to cache memory?
14
RESEARCH PLAN
15
My approach
Create a dynamic approach combining Memory Coloring [6] and Heap Allocation [8].
Heap, stack, and global areas are allocated via graph coloring.
Does not rely on profiled data.
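As a rough illustration of what allocating data via graph coloring means, here is a minimal sketch. It is not the method of [6] or [8]: it assumes equally sized scratchpad slots, takes hypothetical live ranges as input, and uses a simple greedy coloring order. The names `color_objects`, `live_ranges`, and `num_slots` are invented for this example. Two objects interfere when their live ranges overlap; interfering objects must not share a slot, and an object with no free slot is left in main memory.

```python
from typing import Dict, Optional, Set, Tuple


def color_objects(
    live_ranges: Dict[str, Tuple[int, int]],
    num_slots: int,
) -> Dict[str, Optional[int]]:
    """Greedily assign each object a scratchpad slot (color).

    Returns a slot index per object, or None when the object is
    spilled to main memory. Assumes every object fits in one slot.
    """
    names = sorted(live_ranges)

    # Build the interference graph: an edge joins two objects whose
    # (inclusive) live ranges overlap.
    interference: Dict[str, Set[str]] = {name: set() for name in names}
    for i, a in enumerate(names):
        a_start, a_end = live_ranges[a]
        for b in names[i + 1:]:
            b_start, b_end = live_ranges[b]
            if a_start <= b_end and b_start <= a_end:
                interference[a].add(b)
                interference[b].add(a)

    # Color the most constrained objects first (highest degree),
    # breaking ties by name so the result is deterministic.
    order = sorted(names, key=lambda n: (-len(interference[n]), n))

    assignment: Dict[str, Optional[int]] = {}
    for name in order:
        used = {
            assignment[other]
            for other in interference[name]
            if assignment.get(other) is not None
        }
        free = [slot for slot in range(num_slots) if slot not in used]
        assignment[name] = free[0] if free else None  # None means spilled
    return assignment


# Hypothetical live ranges, given as (first use, last use) program points.
example_ranges = {
    "buf": (0, 10),
    "tmp": (2, 5),
    "acc": (6, 12),
    "table": (0, 12),
}
print(color_objects(example_ranges, num_slots=3))
```

In this example `tmp` and `acc` are never live at the same time, so they can share one slot. With fewer slots than the graph needs, some objects stay in main memory.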
16
Experimentation
Experiments: Scratchpad vs. cache execution times. Scratchpad vs. cache power consumption. Scratchpad compile-time penalties.
Benchmarks: MiBench (38 programs). MediaBench (27 programs). SPEC.
17
TOOLS
18
Validation Framework
19
OVPSim
Open Virtual Platform (OVP) is an open-source initiative led by Imperas.
It provides models of embedded components to enable embedded software development.
Three Components: Platform models. Processor models. Peripheral models.
20
FlexA
FlexA analyzes a program's performance and produces an approximate estimate of the program's behavior in terms of cycle count and power consumption.
The program is analyzed based upon: Execution time of each instruction. Cache or scratchpad overhead.
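The two inputs listed above can be combined into a simple cycle estimate. The sketch below is only an illustration and does not describe FlexA's actual interface or cost model. The function `estimate_cycles`, the trace format, and every cycle value are hypothetical.

```python
from typing import Dict, Iterable, Tuple


def estimate_cycles(
    trace: Iterable[Tuple[str, bool]],
    base_cycles: Dict[str, int],
    memory_overhead: int,
) -> int:
    """Sum per-instruction cycles plus an overhead per memory access.

    trace: (opcode, accesses_memory) pairs for each executed instruction.
    base_cycles: hypothetical execution time of each opcode, in cycles.
    memory_overhead: extra cycles charged per memory access; in this
        model it stands in for the cache or scratchpad overhead.
    """
    total = 0
    for opcode, accesses_memory in trace:
        total += base_cycles[opcode]
        if accesses_memory:
            total += memory_overhead
    return total


# Hypothetical three-instruction trace and cycle costs.
trace = [("load", True), ("add", False), ("store", True)]
base = {"load": 1, "add": 1, "store": 1}

spm_cycles = estimate_cycles(trace, base, memory_overhead=0)
cache_cycles = estimate_cycles(trace, base, memory_overhead=3)
print(spm_cycles, cache_cycles)
```

A single average overhead per access is a deliberate simplification. A more careful model would charge cache hits and misses separately.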
21
FlexA
FlexA experimental results:
Program             # Instructions   Cache cycles   SPM cycles   Improvement
Sum.c                        1,251          8,161        1,259        84.57%
8q.c                       787,948        798,666      787,956         1.34%
Mmul.c                   5,005,743     10,794,387    5,005,751        53.62%
search_large.c           9,839,738     14,309,029    9,839,746        31.23%
basicmath_small.c        2,451,355      2,504,794    2,451,363         2.13%
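The Improvement column appears to be the relative reduction in cycles when moving from cache to scratchpad, that is, (cache cycles - SPM cycles) / cache cycles. That reading is an assumption, but it reproduces the reported percentages to within rounding. The helper name `improvement_percent` is made up for this sketch.

```python
def improvement_percent(cache_cycles: int, spm_cycles: int) -> float:
    """Relative cycle reduction of scratchpad over cache, in percent."""
    return 100.0 * (cache_cycles - spm_cycles) / cache_cycles


# (program, cache cycles, SPM cycles) taken from the results table.
rows = [
    ("Sum.c", 8_161, 1_259),
    ("8q.c", 798_666, 787_956),
    ("Mmul.c", 10_794_387, 5_005_751),
    ("search_large.c", 14_309_029, 9_839_746),
    ("basicmath_small.c", 2_504_794, 2_451_363),
]

for program, cache_cycles, spm_cycles in rows:
    print(f"{program}: {improvement_percent(cache_cycles, spm_cycles):.2f}%")
```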
22
REFERENCES
1. Rajeshwari Banakar, Stefan Steinke, Bo-Sik Lee, M. Balakrishnan, and Peter Marwedel. Scratchpad memory: design alternative for cache on-chip memory in embedded systems. In Proceedings of the Tenth International Symposium on Hardware/Software Codesign, CODES '02, pages 73-78, New York, NY, USA, 2002. ACM.
2. Marc Greenberg. DDR4: Double the speed, double the latency? Make sure your system can handle next-generation DRAM. http://www.chipestimate.com/techtalk.php?d=2011-11-22, November 2011.
3. Keith D. Cooper and Timothy J. Harvey. Compiler-controlled memory. SIGOPS Oper. Syst. Rev., 32(5):2-11, October 1998.
23
REFERENCES
4. Preeti Ranjan Panda, Nikil D. Dutt, and Alexandru Nicolau. Efficient utilization of scratch-pad memory in embedded processor applications. In Proceedings of the 1997 European Conference on Design and Test, EDTC '97, pages 7-, Washington, DC, USA, 1997. IEEE Computer Society.
5. Jan Sjodin, Bo Froderberg, and Thomas Lindgren. Allocation of global data objects in on-chip RAM. Compiler and Architecture Support for Embedded Computing Systems, pages 205-220, 1998.
6. Lian Li, Hui Feng, and Jingling Xue. Compiler-directed scratchpad memory management via graph coloring. ACM Trans. Archit. Code Optim., 6(3):9:1-9:17, October 2009.
24
REFERENCES
7. G. J. Chaitin. Register allocation & spilling via graph coloring. In Proceedings of the 1982 SIGPLAN Symposium on Compiler Construction, SIGPLAN '82, pages 98-105, New York, NY, USA, 1982. ACM.
8. Angel Dominguez, Sumesh Udayakumaran, and Rajeev Barua. Heap data allocation to scratch-pad memory in embedded systems. J. Embedded Comput., 1(4):521-540, December 2005.