amnesiac: amnesic automatic computer€¦ · amnesiac: amnesic automatic computer trading...
TRANSCRIPT
AMNESIAC: Amnesic Automatic ComputerTrading Computation for Communication for Energy Efficiency
Ismail Akturk and Ulya R. KarpuzcuUniversity of Minnesota, Twin Cities
{aktur002, ukarpuzc}@umn.edu
Motivation
Energy of on chip communication normalized to computation
Technology Node 40nm
1.55x 5.77x
10nm
AMNESIAC: Amnesic Automatic Computer
Keckler, S. W., Dally, W. J., Khailany, B., Garland, M., and Glasco, D. “GPUs and the Future of Parallel Computing”, IEEE Micro 31, 5 (2011).
2
Motivation
(Re)computing becomes cheaper than memorizing and retrieving data
AMNESIAC: Amnesic Automatic Computer
Keckler, S. W., Dally, W. J., Khailany, B., Garland, M., and Glasco, D. “GPUs and the Future of Parallel Computing”, IEEE Micro 31, 5 (2011).
3
Energy of on chip communication normalized to computation
Technology Node 40nm
1.55x 5.77x
10nm
Amnesic* Automatic Computer• Idea: Replace energy-hungry load with sequence of instructions
4
*amnesia: noun a partial or total loss of memory (origin late 18th cent.: from Greek amnēsia ‘forgetfulness’)
AMNESIAC: Amnesic Automatic Computer
Amnesic* Automatic Computer• Idea: Replace energy-hungry load with sequence of instructions• Scope
• How to determine which data values to recompute• How to identify instructions to recompute data values• How to protect architectural state during recomputation• How to orchestrate recomputation at runtime
5AMNESIAC: Amnesic Automatic Computer
*amnesia: noun a partial or total loss of memory (origin late 18th cent.: from Greek amnēsia ‘forgetfulness’)
How to determine which data values to recompute
6
core
L1
L2
memory
load A
core
L1
L2
memory
load B
core
L1
L2
memory
load C
A
B
C
Eld,B ? Erc,B Eld,C ? Erc,CEld,A ? Erc,A
AMNESIAC: Amnesic Automatic Computer
Recomputation Slice (RSlice)
7AMNESIAC: Amnesic Automatic Computer
Overview• Amnesic Compiler
• Identifies independent RSlices• Annotates RSlices: RCMP, REC, RTN
8AMNESIAC: Amnesic Automatic Computer
Overview• Amnesic Compiler
• Identifies independent RSlices• Annotates RSlices: RCMP, REC, RTN
• Amnesic Scheduler• Decides when to recompute• Can use various policies
• Compiler• First-Level Cache Miss - FLC• Last-Level Cache Miss - LLC
9AMNESIAC: Amnesic Automatic Computer
Overview• Amnesic Compiler
• Identifies independent RSlices• Annotates RSlices: RCMP, REC, RTN
• Amnesic Scheduler• Decides when to recompute• Can use various policies
• Compiler• First-Level Cache Miss - FLC• Last-Level Cache Miss - LLC
• Amnesic Microarchitecture• Prevents corruption of architectural state• Makes RSlice inputs available at the time of recomputation
10AMNESIAC: Amnesic Automatic Computer
Amnesic Microarchitecture
11
Source and Destination Registers
…
SFile
AMNESIAC: Amnesic Automatic Computer
Amnesic Microarchitecture
12
Source and Destination Registers
…
RegisterFile[#] -> SFile[#]
Renamer SFile
AMNESIAC: Amnesic Automatic Computer
Amnesic Microarchitecture
13
Leaf Address Non-Recomputable Inputs
v
…
…
Source and Destination Registers
…
RegisterFile[#] -> SFile[#]
Hist
Renamer SFile
…
v …
AMNESIAC: Amnesic Automatic Computer
Amnesic Microarchitecture
14
Leaf Address Non-Recomputable Inputs
v
…
…
RSlice Instructions
…
Source and Destination Registers
…
RegisterFile[#] -> SFile[#]
IBuff Hist
Renamer SFile
…
v …
AMNESIAC: Amnesic Automatic Computer
Putting It All Together: Default Execution
15
0
Leaf Address Non-Recomputable Inputs
v
…
…
RSlice Instructions
…
Source and Destination Registers
…
RegisterFile[#] -> SFile[#]
IBuff Hist
Renamer SFile
…
v …
MemoryorRegisterFile
AMNESIAC: Amnesic Automatic Computer
Putting It All Together: Default Execution
16
01
Leaf Address Non-Recomputable Inputs
v
…
…
RSlice Instructions
…
Source and Destination Registers
…
RegisterFile[#] -> SFile[#]
IBuff Hist
Renamer SFile
…
v …
MemoryorRegisterFile
Fetch Logic
AMNESIAC: Amnesic Automatic Computer
Putting It All Together: During Recomputation
17
01
2
Leaf Address Non-Recomputable Inputs
v
…
…
RSlice Instructions
…
Source and Destination Registers
…
RegisterFile[#] -> SFile[#]
IBuff Hist
Renamer SFile
…
v …
MemoryorRegisterFile
Fetch Logic
AMNESIAC: Amnesic Automatic Computer
Putting It All Together: During Recomputation
18
01
2 3
4Leaf Address Non-Recomputable Inputs
v
…
…
RSlice Instructions
…
Source and Destination Registers
…
RegisterFile[#] -> SFile[#]
IBuff Hist
Renamer SFile
…
v …
MemoryorRegisterFile
Fetch Logic
ExecutionUnits
AMNESIAC: Amnesic Automatic Computer
Putting It All Together: During Recomputation
19
01
2 3
4
5
Leaf Address Non-Recomputable Inputs
v
…
…
RSlice Instructions
…
Source and Destination Registers
…
RegisterFile[#] -> SFile[#]
IBuff Hist
Renamer SFile
…
v …
MemoryorRegisterFile
Fetch Logic
ExecutionUnits
AMNESIAC: Amnesic Automatic Computer
Putting It All Together: During Recomputation
20
01
2 3
4
5
6
Leaf Address Non-Recomputable Inputs
v
…
…
RSlice Instructions
…
Source and Destination Registers
…
RegisterFile[#] -> SFile[#]
IBuff Hist
Renamer SFile
…
v …
MemoryorRegisterFile
Fetch Logic
ExecutionUnits
AMNESIAC: Amnesic Automatic Computer
Putting It All Together: During Recomputation
21
01
2 3
4
5
6
7
8
Leaf Address Non-Recomputable Inputs
v
…
…
RSlice Instructions
…
Source and Destination Registers
…
RegisterFile[#] -> SFile[#]
IBuff Hist
Renamer SFile
…
v …
MemoryorRegisterFile
Fetch Logic
ExecutionUnits
AMNESIAC: Amnesic Automatic Computer
Evaluation Setup
22
• Benchmarks: SPEC2006, NAS, Parsec, Rodinia• 33 benchmarks evaluated, 11 of them show > 10% EDP gain
• Amnesic compiler pass mimicked by binary generator Pintool• Microarchitecture and scheduler implemented in Snipersim
AMNESIAC: Amnesic Automatic Computer
Energy-Delay Product
23AMNESIAC: Amnesic Automatic Computer
mcf sx cg is ca fs fe rt bp bfs sr
Energy-Delay Product
24AMNESIAC: Amnesic Automatic Computer
mcf sx cg is ca fs fe rt bp bfs sr
Energy-Delay Product
25AMNESIAC: Amnesic Automatic Computer
mcf sx cg is ca fs fe rt bp bfs sr
Energy-Delay Product
26AMNESIAC: Amnesic Automatic Computer
mcf sx cg is ca fs fe rt bp bfs sr
mcf sx cg is ca fs fe rt bp bfs sr
Energy-Delay Product
27AMNESIAC: Amnesic Automatic Computer
up to 87% (24.92% on average ) EDP gain obtained
Instruction Mix
28
Benchmark% increase in
instruction count% decrease in load
count
mcf 4.47 6.19
sx 4.55 6.68
cg 3.97 2.11
is 17.97 49.99
ca 7.38 7.95
fs 1.83 3.08
fe 3.55 1.75
rt 1.97 6.08
bp 31.89 55.55
bfs 1.20 60.93
sr 20.02 23.33
AMNESIAC: Amnesic Automatic Computer
Benchmark% increase in
instruction count% decrease in load
count
mcf 4.47 6.19
sx 4.55 6.68
cg 3.97 2.11
is 17.97 49.99
ca 7.38 7.95
fs 1.83 3.08
fe 3.55 1.75
rt 1.97 6.08
bp 31.89 55.55
bfs 1.20 60.93
sr 20.02 23.33
Instruction Mix
29AMNESIAC: Amnesic Automatic Computer
49.99% reduction in loads, at the expense of 17.97% more instructions
Benchmark% increase in
instruction count% decrease in load
count
mcf 4.47 6.19
sx 4.55 6.68
cg 3.97 2.11
is 17.97 49.99
ca 7.38 7.95
fs 1.83 3.08
fe 3.55 1.75
rt 1.97 6.08
bp 31.89 55.55
bfs 1.20 60.93
sr 20.02 23.33
Instruction Mix
30AMNESIAC: Amnesic Automatic Computer
translates into 74.68% reduction in (load) energy
49.99% reduction in loads, at the expense of 17.97% more instructions
RSlice Length
31
sx bfs
AMNESIAC: Amnesic Automatic Computer
RSlice Length
32
sx bfs
AMNESIAC: Amnesic Automatic Computer
majority of RSlices (78.32%) have less than 10 instructions
Summary• Amnesic execution can minimize power and performance overhead of
• Data retrieval• Data communication
• Amnesic execution makes the workload more compute-intensive• Better use of classic architectures
• Up to 87% improvement in energy-delay product (avg. 24%)
33AMNESIAC: Amnesic Automatic Computer
Questions and Comments
Please send your questions and comments to
34AMNESIAC: Amnesic Automatic Computer
AMNESIAC: Amnesic Automatic ComputerTrading Computation for Communication for Energy Efficiency
Ismail Akturk and Ulya R. KarpuzcuUniversity of Minnesota, Twin Cities
{aktur002, ukarpuzc}@umn.edu