dynamically specialized datapaths for energy efficient computing
![Page 1: Dynamically Specialized Datapaths for Energy Efficient Computing](https://reader035.vdocuments.net/reader035/viewer/2022062301/56816211550346895dd23d35/html5/thumbnails/1.jpg)
Dynamically Specialized Datapaths for Energy Efficient Computing
Venkatraman Govindaraju, Chen-Han Ho, Karu Sankaralingam
Department of Computer Sciences, UW-Madison
http://www.cs.wisc.edu/vertical
![Page 2: Dynamically Specialized Datapaths for Energy Efficient Computing](https://reader035.vdocuments.net/reader035/viewer/2022062301/56816211550346895dd23d35/html5/thumbnails/2.jpg)
Hardware Improvement
Pancake!
Wedding Cake!
Cupcake!
Not exactly!
1971 1991 2011
![Page 3: Dynamically Specialized Datapaths for Energy Efficient Computing](https://reader035.vdocuments.net/reader035/viewer/2022062301/56816211550346895dd23d35/html5/thumbnails/3.jpg)
Technology Scaling
Okay, but how is a wedding cake made?
Honey, I shrunk the cooks!
![Page 4: Dynamically Specialized Datapaths for Energy Efficient Computing](https://reader035.vdocuments.net/reader035/viewer/2022062301/56816211550346895dd23d35/html5/thumbnails/4.jpg)
The CPU Approach
In-order processor
Cupcake!
C!
![Page 5: Dynamically Specialized Datapaths for Energy Efficient Computing](https://reader035.vdocuments.net/reader035/viewer/2022062301/56816211550346895dd23d35/html5/thumbnails/5.jpg)
The Advanced CPU Approach
Out-of-order, Superscalar
Wedding Cake!
WC!
Do as scheduled!
You mis-predicted!
Two ways at once!
Partial cake from refrigerator!
Partial cake to refrigerator!
Load strawberry!
Better performance, but not efficient!
Too many things to do!
![Page 6: Dynamically Specialized Datapaths for Energy Efficient Computing](https://reader035.vdocuments.net/reader035/viewer/2022062301/56816211550346895dd23d35/html5/thumbnails/6.jpg)
Hardware Specialization
• We can build a specialized hardware datapath for a certain application
• Will be efficient
• Example: GPU for graphics processing
• But...
“The Wedding Cake Team”
![Page 7: Dynamically Specialized Datapaths for Energy Efficient Computing](https://reader035.vdocuments.net/reader035/viewer/2022062301/56816211550346895dd23d35/html5/thumbnails/7.jpg)
Can I get a strawberry pancake?
What are you talking about?
Performance, Efficiency, and Flexibility?
![Page 8: Dynamically Specialized Datapaths for Energy Efficient Computing](https://reader035.vdocuments.net/reader035/viewer/2022062301/56816211550346895dd23d35/html5/thumbnails/8.jpg)
Dynamically Specialized Execution Resources : DySER
Dynamically Specialized Execution!
![Page 9: Dynamically Specialized Datapaths for Energy Efficient Computing](https://reader035.vdocuments.net/reader035/viewer/2022062301/56816211550346895dd23d35/html5/thumbnails/9.jpg)
Overview
• Dynamically Specialized Execution
• Hardware resource: DySER
– How to specialize and be dynamic?
• The compile-time support: Slicer
• HW/SW interface: ISA extensions
• Integration, performance, and conclusion
![Page 10: Dynamically Specialized Datapaths for Energy Efficient Computing](https://reader035.vdocuments.net/reader035/viewer/2022062301/56816211550346895dd23d35/html5/thumbnails/10.jpg)
A Little Peek
[Figure: processor pipeline (Fetch, Decode, Execute, Memory, WriteBack) showing I$, Decode, Register File, Exec Units, D$, and DySER]
![Page 11: Dynamically Specialized Datapaths for Energy Efficient Computing](https://reader035.vdocuments.net/reader035/viewer/2022062301/56816211550346895dd23d35/html5/thumbnails/11.jpg)
DySER: Summary
[Figure labels: Pipe, Shared Cache, DySER]
• Heterogeneous array
• ≈ 64 KB SRAM area
• Up to 10X speedup
• An average of 40% energy reduction
![Page 12: Dynamically Specialized Datapaths for Energy Efficient Computing](https://reader035.vdocuments.net/reader035/viewer/2022062301/56816211550346895dd23d35/html5/thumbnails/12.jpg)
Dynamically Specialized Execution Resources
• An array of functional units and switches
• A stateless execution unit in processor pipeline
– Pipelined
– Simple flow control
[Figure: inputs A, B, and C feeding a datapath that computes A*B+C]
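The A*B+C example can be mimicked in software as a rough illustration of what "an array of stateless functional units" means. The sketch below is not the DySER hardware; the names `FunctionalUnit`, `run_datapath`, `mul0`, and `add0` are hypothetical and chosen only for this example.

```python
import operator
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class FunctionalUnit:
    """One stateless unit: its output depends only on its two inputs."""
    name: str                       # also the name of the wire it drives
    op: Callable[[int, int], int]   # the operation the unit performs
    inputs: Tuple[str, str]         # names of the two wires feeding it


def run_datapath(units: List[FunctionalUnit],
                 ports: Dict[str, int],
                 output: str) -> int:
    """Evaluate units in the listed (topological) order and return one wire."""
    wires = dict(ports)             # input ports are the initial wire values
    for fu in units:
        a, b = (wires[w] for w in fu.inputs)
        wires[fu.name] = fu.op(a, b)
    return wires[output]


# A multiplier feeding an adder, as in the A*B+C figure (assumed layout).
MUL_ADD = [
    FunctionalUnit("mul0", operator.mul, ("A", "B")),
    FunctionalUnit("add0", operator.add, ("mul0", "C")),
]

result = run_datapath(MUL_ADD, {"A": 3, "B": 4, "C": 5}, "add0")
print(result)  # 3*4+5
```

Because no unit keeps state between invocations, the same list of units can be reused for every new set of inputs.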
![Page 13: Dynamically Specialized Datapaths for Energy Efficient Computing](https://reader035.vdocuments.net/reader035/viewer/2022062301/56816211550346895dd23d35/html5/thumbnails/13.jpg)
Dynamic Specialization
• Capture the pattern between different applications
• The specialized datapath is constructed at the granularity of functional units
– Switches for programmability
![Page 14: Dynamically Specialized Datapaths for Energy Efficient Computing](https://reader035.vdocuments.net/reader035/viewer/2022062301/56816211550346895dd23d35/html5/thumbnails/14.jpg)
How DySER Works
• Same DySER block, different pattern
• Simple switch is sufficient
– Routers are energy inefficient
• Remove per-instruction overhead
Specialization ⇒ Efficiency
[Figure labels: Circuit Switch, Packet Switch]
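The idea of "same block, different pattern" can be sketched as a fixed set of units whose routes are set once and then reused for many invocations, which is the circuit-switched flavour the slide contrasts with routers. This is a toy model under assumed names (`Block`, `UNITS`, `CONFIG_MUL_ADD`, `CONFIG_ADD_MUL`); it does not describe the actual switch microarchitecture.

```python
import operator
from typing import List, Optional, Tuple

# Fixed hardware: the block always contains the same two units (assumed layout).
UNITS = {"fu_mul": operator.mul, "fu_add": operator.add}

# A configuration is a list of (unit, source_a, source_b) routes plus the
# name of the wire that leaves the block.
Config = Tuple[List[Tuple[str, str, str]], str]

CONFIG_MUL_ADD: Config = (
    [("fu_mul", "in0", "in1"), ("fu_add", "fu_mul", "in2")], "fu_add")
CONFIG_ADD_MUL: Config = (
    [("fu_add", "in0", "in1"), ("fu_mul", "fu_add", "in2")], "fu_mul")


class Block:
    """Routes are latched once by configure(); invoke() carries data only."""

    def __init__(self) -> None:
        self.routes: List[Tuple[str, str, str]] = []
        self.out: Optional[str] = None

    def configure(self, config: Config) -> None:
        self.routes, self.out = config

    def invoke(self, *inputs: int) -> int:
        wires = {f"in{i}": v for i, v in enumerate(inputs)}
        for unit, a, b in self.routes:
            wires[unit] = UNITS[unit](wires[a], wires[b])
        return wires[self.out]


block = Block()
block.configure(CONFIG_MUL_ADD)                 # pattern 1: in0*in1 + in2
first = [block.invoke(a, 2, 1) for a in range(3)]
block.configure(CONFIG_ADD_MUL)                 # pattern 2: (in0+in1) * in2
second = block.invoke(1, 2, 3)
```

After `configure`, each `invoke` moves only operands, with no per-invocation routing decision; that is the property the sketch is meant to highlight.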
![Page 15: Dynamically Specialized Datapaths for Energy Efficient Computing](https://reader035.vdocuments.net/reader035/viewer/2022062301/56816211550346895dd23d35/html5/thumbnails/15.jpg)
Slice and Dice
• Dynamically Specialized Execution
• Hardware resource: DySER
– How to specialize and be dynamic?
• The compile-time support: Slicer
• HW/SW interface: ISA extensions
• Integration, performance, and conclusion
![Page 16: Dynamically Specialized Datapaths for Energy Efficient Computing](https://reader035.vdocuments.net/reader035/viewer/2022062301/56816211550346895dd23d35/html5/thumbnails/16.jpg)
Identifying The Specialization Target
• Applications are executed in phases
– Capture the most frequent phase
• Identify the phases
– Path profiling
• Construct path-trees
Find computation? Use DySER!
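As a loose illustration of the profiling step, the sketch below counts how often each executed path occurs and groups paths by their entry block. Grouping by entry block is only a crude stand-in for a path-tree, and the trace format, block names, and helper names are all assumptions made for this example; the slides do not specify the actual profiling algorithm.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Sequence, Tuple

Path = Tuple[str, ...]


def profile_paths(trace: Sequence[Sequence[str]]) -> Counter:
    """Count how often each path (a sequence of basic-block names) ran."""
    return Counter(tuple(p) for p in trace)


def build_path_trees(path_counts: Counter) -> Dict[str, Dict[Path, int]]:
    """Group paths by entry block: {root block: {path: execution count}}."""
    trees: Dict[str, Dict[Path, int]] = defaultdict(dict)
    for path, count in path_counts.items():
        trees[path[0]][path] = count
    return dict(trees)


def hottest_tree(trees: Dict[str, Dict[Path, int]]) -> str:
    """Return the root whose paths were executed most often in total."""
    return max(trees, key=lambda root: sum(trees[root].values()))


# Hypothetical trace: ten path executions observed by a profiler.
trace: List[List[str]] = (
    [["bb1", "bb2", "bb4"]] * 6
    + [["bb1", "bb3", "bb4"]] * 3
    + [["bb5", "bb6"]] * 1
)

counts = profile_paths(trace)
trees = build_path_trees(counts)
target = hottest_tree(trees)   # candidate to specialize
```

In this made-up trace the tree rooted at `bb1` accounts for nine of the ten executions, so it would be the first candidate for specialization.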
![Page 17: Dynamically Specialized Datapaths for Energy Efficient Computing](https://reader035.vdocuments.net/reader035/viewer/2022062301/56816211550346895dd23d35/html5/thumbnails/17.jpg)
Slicer: A Compiler for the DySER
• The instructions in path-trees are not all computations
– Slice the path-tree into a computation slice and a load slice
• Execute computation slice in DySER
• Execute load-slice in conventional processor pipeline
[Figure labels: Application, Slicer, Core, DySER, Communication]
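One simple way to picture the slicing step is a backward walk that puts memory operations, plus whatever computes their addresses, into the load slice and leaves everything else in the computation slice. The rule below is a simplification invented for illustration, not the Slicer's actual algorithm; it assumes each register is written once, and the `Instr` record and register names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Set, Tuple


@dataclass(frozen=True)
class Instr:
    op: str
    dest: str                       # "" when the instruction writes no register
    srcs: Tuple[str, ...]           # data operands
    addr: Tuple[str, ...] = ()      # registers used to form a memory address


MEMORY_OPS = {"LD", "ST"}


def slice_path(instrs: List[Instr]) -> Tuple[List[Instr], List[Instr]]:
    """Split a path into (load_slice, computation_slice), in program order."""
    addr_regs: Set[str] = set()     # registers that feed some memory address
    load_slice: List[Instr] = []
    comp_slice: List[Instr] = []
    for ins in reversed(instrs):
        if ins.op in MEMORY_OPS:
            load_slice.append(ins)
            addr_regs.update(ins.addr)
        elif ins.dest in addr_regs:
            load_slice.append(ins)  # address arithmetic stays on the core
            addr_regs.update(ins.srcs)
        else:
            comp_slice.append(ins)  # pure computation goes to the datapath
    return load_slice[::-1], comp_slice[::-1]


path = [
    Instr("ADD", "r5", ("r4", "r_size")),        # address update
    Instr("LD", "r6", (), addr=("r5",)),
    Instr("MUL", "r7", ("r6", "r2")),
    Instr("ADD", "r8", ("r7", "r3")),
    Instr("ST", "", ("r8",), addr=("r5",)),
]

load_slice, comp_slice = slice_path(path)
```

Here the multiply and the second add form the computation slice, while the load, the store, and the address update remain in the load slice for the conventional pipeline.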
![Page 18: Dynamically Specialized Datapaths for Energy Efficient Computing](https://reader035.vdocuments.net/reader035/viewer/2022062301/56816211550346895dd23d35/html5/thumbnails/18.jpg)
Working Together
• Dynamically Specialized Execution
• Hardware resource: DySER
– How to specialize and be dynamic?
• The compile-time support: Slicer
• HW/SW interface: ISA extensions
• Integration, performance, and conclusion
![Page 19: Dynamically Specialized Datapaths for Energy Efficient Computing](https://reader035.vdocuments.net/reader035/viewer/2022062301/56816211550346895dd23d35/html5/thumbnails/19.jpg)
Communication Between The DySER and Processor Core
• DySER interface: ISA extension
bb1:
    MOV control1 => R2
    MOV control2 => R3
    MOV 1 => R4
    SLL R4, target => R4
    LD reg->node => R5
    DYSER_INIT [COMPSLICE]
    DYSER_SEND R2 => DI1
    DYSER_SEND R3 => DI2
    DYSER_SEND R4 => DI3
bb2:
    DYSER_LOAD [R5+offset(state)] => DM0
    DYSER_STORE:DO2 DO1, [R5+offset(state)]
    DYSER_COMMIT
    ADD R5, sizeof(node), R5
    ADDCC R1, -1, R1
    BNE bb2
• DYSER_INIT: Initialize DySER
• DYSER_SEND: Send input from register file to DySER
• DYSER_LOAD: Send input from memory to DySER
• DYSER_STORE: Store output from DySER to memory
• DYSER_COMMIT: Commit DySER output to register file
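To make the roles of the five DYSER_* instructions more concrete, here is a small behavioural model in which each instruction becomes a method and inputs and outputs travel through FIFOs. The semantics are guessed from the slide annotations only and are not the real ISA definition; `DySERModel`, the port names, and the `a*x+b` computation are hypothetical. Unlike the slide's loop, which sends its register inputs once before the loop, this sketch re-sends every input for each invocation.

```python
from collections import deque
from typing import Callable, Deque, Dict, Iterable, Optional


class DySERModel:
    """Toy model: fires the configured function when every input FIFO has data."""

    def __init__(self) -> None:
        self.fn: Optional[Callable[..., int]] = None
        self.inputs: Dict[str, Deque[int]] = {}
        self.outputs: Deque[int] = deque()

    def init(self, fn: Callable[..., int], ports: Iterable[str]) -> None:
        """DYSER_INIT: install the computation slice and reset the FIFOs."""
        self.fn = fn
        self.inputs = {p: deque() for p in ports}
        self.outputs.clear()

    def send(self, port: str, value: int) -> None:
        """DYSER_SEND: register value -> input port."""
        self.inputs[port].append(value)
        self._fire()

    def load(self, port: str, memory: Dict[int, int], addr: int) -> None:
        """DYSER_LOAD: memory value -> input port."""
        self.send(port, memory[addr])

    def store(self, memory: Dict[int, int], addr: int) -> None:
        """DYSER_STORE: output port -> memory."""
        memory[addr] = self.outputs.popleft()

    def commit(self) -> int:
        """DYSER_COMMIT: output port -> register (returned to the caller)."""
        return self.outputs.popleft()

    def _fire(self) -> None:
        # An empty deque is falsy, so all(...) means "every port has a value".
        while self.inputs and all(self.inputs.values()):
            args = {p: q.popleft() for p, q in self.inputs.items()}
            self.outputs.append(self.fn(**args))


memory = {0: 1, 1: 2, 2: 3}
dyser = DySERModel()
dyser.init(lambda a, x, b: a * x + b, ports=("a", "x", "b"))

for i in range(3):                  # loop body, one invocation per element
    dyser.send("a", 2)
    dyser.send("b", 1)
    dyser.load("x", memory, i)
    dyser.store(memory, 10 + i)

dyser.send("a", 2)
dyser.send("b", 1)
dyser.send("x", 10)
reg = dyser.commit()                # result lands in a "register"
```

The FIFOs decouple the core from the datapath: the core only pushes operands and pops results, which matches the slide's point that the interface is a handful of ISA extensions.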
![Page 20: Dynamically Specialized Datapaths for Energy Efficient Computing](https://reader035.vdocuments.net/reader035/viewer/2022062301/56816211550346895dd23d35/html5/thumbnails/20.jpg)
Energy Efficient Bakery Is About to Open!
DySER to the rescue!
Integration!
![Page 21: Dynamically Specialized Datapaths for Energy Efficient Computing](https://reader035.vdocuments.net/reader035/viewer/2022062301/56816211550346895dd23d35/html5/thumbnails/21.jpg)
Back To Hardware
• Dynamically Specialized Execution
• Hardware resource: DySER
– How to specialize and be dynamic?
• The compile-time support: Slicer
• HW/SW interface: ISA extensions
• Integration, performance, and conclusion
![Page 22: Dynamically Specialized Datapaths for Energy Efficient Computing](https://reader035.vdocuments.net/reader035/viewer/2022062301/56816211550346895dd23d35/html5/thumbnails/22.jpg)
It Is Simple -- Integration
• DySER interface: FIFO
[Figure: processor pipeline (Fetch, Decode, Execute, Memory, WriteBack) showing I$, Decode, Register File, Exec Units, D$, and DySER]
![Page 23: Dynamically Specialized Datapaths for Energy Efficient Computing](https://reader035.vdocuments.net/reader035/viewer/2022062301/56816211550346895dd23d35/html5/thumbnails/23.jpg)
Out-of-Order Integration
• Out-of-order core integration
• DySER itself maintains no architectural state
• Use buffers to keep the state for speculative execution
![Page 24: Dynamically Specialized Datapaths for Energy Efficient Computing](https://reader035.vdocuments.net/reader035/viewer/2022062301/56816211550346895dd23d35/html5/thumbnails/24.jpg)
It Is Good – Evaluation Method
• Simulator: Wisconsin Multifacet GEMS
– Benchmarks: SPEC CPU2006, Parboil, and PARSEC
– Modified GCC compiler
– DySER with 64 functional units
• Speedup & energy reduction
– Quantify the low-overhead execution on computation slice
– Wattch-based model in GEMS
![Page 25: Dynamically Specialized Datapaths for Energy Efficient Computing](https://reader035.vdocuments.net/reader035/viewer/2022062301/56816211550346895dd23d35/html5/thumbnails/25.jpg)
Result - Performance
[Figure: bar chart of speedup (y-axis from 1 to 11) for cp, pns, sad, blackscholes, bodytrack, canneal, namd, soplex, lbm, and the geomean; series: 1-issue in-order and 2-issue out-of-order]
![Page 26: Dynamically Specialized Datapaths for Energy Efficient Computing](https://reader035.vdocuments.net/reader035/viewer/2022062301/56816211550346895dd23d35/html5/thumbnails/26.jpg)
Result – Energy Reduction
[Figure: bar chart of energy reduction in % (y-axis from 0 to 100) for cp, pns, sad, blackscholes, bodytrack, canneal, namd, soplex, lbm, and the geomean; series: 1-issue in-order and 2-issue out-of-order]
![Page 27: Dynamically Specialized Datapaths for Energy Efficient Computing](https://reader035.vdocuments.net/reader035/viewer/2022062301/56816211550346895dd23d35/html5/thumbnails/27.jpg)
It Is Flexible – Comparison
• DySER can be SIMD, can do operation-fusion, can accelerate loops
– Not enough resources?
– The Slicer can help to partition the computational slice and offload from DySER to processor core
• DySER looks like dataflow, but...
– No entire new ISA, no routers or packets, no burden to programmers
![Page 28: Dynamically Specialized Datapaths for Energy Efficient Computing](https://reader035.vdocuments.net/reader035/viewer/2022062301/56816211550346895dd23d35/html5/thumbnails/28.jpg)
Conclusion
• Hardware specialization is efficient
– Dynamic approach with moderate integration complexity and few ISA extensions
– Up to 10X speedup, ~40% average energy reduction
• Future work:
– FPGA implementation
– Comparison with other specialization approaches
  • FPGA
  • GPGPU
  • SSE, AVX
![Page 29: Dynamically Specialized Datapaths for Energy Efficient Computing](https://reader035.vdocuments.net/reader035/viewer/2022062301/56816211550346895dd23d35/html5/thumbnails/29.jpg)
Questions?
![Page 30: Dynamically Specialized Datapaths for Energy Efficient Computing](https://reader035.vdocuments.net/reader035/viewer/2022062301/56816211550346895dd23d35/html5/thumbnails/30.jpg)
Backup Slides
![Page 31: Dynamically Specialized Datapaths for Energy Efficient Computing](https://reader035.vdocuments.net/reader035/viewer/2022062301/56816211550346895dd23d35/html5/thumbnails/31.jpg)
Can This Work?

| Benchmark | Number of path-trees | Path-trees contributing 90% of execution time |
| --- | --- | --- |
| blackscholes | 9 | 3 |
| bodytrack | 322 | 9 |
| canneal | 89 | 12 |
| facesim | 906 | 22 |
| fluidanimate | 33 | 2 |
| freqmine | 151 | 31 |
| streamcluster | 61 | 1 |
| swaptions | 36 | 6 |

• We also find that applications re-execute a path-tree several times before moving to the next one
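A quick calculation on the numbers in this table shows how small the hot set is relative to all path-trees. The figures below are copied from the slide; the variable names are only for this example.

```python
# (total path-trees, path-trees that contribute 90% of execution time)
TABLE = {
    "blackscholes": (9, 3),
    "bodytrack": (322, 9),
    "canneal": (89, 12),
    "facesim": (906, 22),
    "fluidanimate": (33, 2),
    "freqmine": (151, 31),
    "streamcluster": (61, 1),
    "swaptions": (36, 6),
}

# Fraction of each benchmark's path-trees needed to cover 90% of the time.
fractions = {name: hot / total for name, (total, hot) in TABLE.items()}

total_trees = sum(total for total, _ in TABLE.values())
hot_trees = sum(hot for _, hot in TABLE.values())

for name, frac in sorted(fractions.items(), key=lambda kv: kv[1]):
    print(f"{name:14s} {frac:6.1%}")
print(f"overall        {hot_trees / total_trees:6.1%}")
```

Across the eight benchmarks, 86 of 1607 path-trees cover 90% of execution time, which is what makes specializing only the hottest ones plausible.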
![Page 32: Dynamically Specialized Datapaths for Energy Efficient Computing](https://reader035.vdocuments.net/reader035/viewer/2022062301/56816211550346895dd23d35/html5/thumbnails/32.jpg)
Related work
• Industrial effort
• Generality
• RAW
• TRIPS
• WaveScalar
• VEAL (ISCA 08)
• Scalability
• DySER
• Ambric
• Mathstar
![Page 33: Dynamically Specialized Datapaths for Energy Efficient Computing](https://reader035.vdocuments.net/reader035/viewer/2022062301/56816211550346895dd23d35/html5/thumbnails/33.jpg)
DySER Configuration
• Special configure phase
– Encode configure information in data, passing through the existing datapath
[Figure: a configuration word "S1 : L->R" passes through the switches. Switch 0: "Not mine". Switch 1: "This is it!", and it sets its route Left -> Right]
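The configure phase in the figure can be sketched as configuration words that travel along the normal data route, with each switch keeping only the word addressed to it. This is a toy reading of the slide; the word format `(switch_id, route)` and the `Switch` and `configure` names are assumptions made for the example.

```python
from typing import List, Optional, Tuple

Route = Tuple[str, str]             # (input side, output side)
ConfigWord = Tuple[int, Route]      # (target switch id, route to latch)


class Switch:
    def __init__(self, switch_id: int) -> None:
        self.switch_id = switch_id
        self.route: Optional[Route] = None

    def accept(self, word: ConfigWord) -> bool:
        """Latch the route if the word is addressed to this switch."""
        target, route = word
        if target == self.switch_id:
            self.route = route      # "This is it!"
            return True
        return False                # "Not mine": pass it along


def configure(switches: List[Switch], words: List[ConfigWord]) -> None:
    """Push each word through the chain until one switch consumes it."""
    for word in words:
        for sw in switches:
            if sw.accept(word):
                break


switches = [Switch(0), Switch(1)]
configure(switches, [(1, ("Left", "Right"))])   # the slide's "S1 : L->R"
```

After this call only switch 1 holds a route; switch 0 saw the word but left its own state untouched.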