Topics on Compilers – Spring Semester 2011
Christine Wagner – 2011/06/08
Introduction
Modulo Scheduling Challenges
Core Concepts
Implementation
Experimental Results
Conclusion
2011/06/08 Edge-centric Scheduling
Embedded computing systems in today’s portable devices
demand high performance and energy efficiency
Traditional application specific hardware: ASICs
Different functionalities on a single device (voice/data
communication, high definition video, digital photography)
High non-recurring costs for designing ASICs
Programmable hardware solutions
Coarse-Grained Reconfigurable Architectures (CGRA)
offer high computation throughput, scalability, low cost and
energy efficiency
consist of an array of functional units (FUs) and register files, often organized as a
two-dimensional grid
need a compiler to efficiently map implementations of
compute intensive loops onto the array and to exploit all
available resources
Challenge: sparse connectivity and distributed register files
Values must be explicitly routed between producing and
consuming operations
No dedicated routing resources
An FU serves either as a compute resource or as a routing resource
Approach of this paper: edge-centric modulo scheduling (EMS)
Modulo Scheduling exposes parallelism by overlapping
successive iterations of a loop
Goal: Find a valid schedule with minimal initiation interval (II)
Factors that complicate CGRA scheduling:
1. Explicit routing
VLIW: routing is implicitly guaranteed by storing intermediate
values in a multi-ported, centralized register file
CGRA: sparse connectivity and distributed register files
2. Intelligent routing
FU for computation and routing
scheduling can easily fail due to poor routing choices
minimizing routing resources
3. Heterogeneous nodes
Inexpensive and expensive nodes
Avoid scheduling inexpensive operations on expensive
nodes
4. Modulo constraint
Resources used in periodic fashion as loop kernel
repeats every II cycles
Not possible to guarantee routability by extending the
schedule
schedule can easily fail due to previously scheduled
operations
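The modulo constraint can be illustrated with a small modulo reservation table. This is only a sketch under assumptions: the class `ModuloReservationTable` and its methods are hypothetical names, not taken from the paper. A slot reserved on an FU at cycle t is also busy at t + II, t + 2·II, and so on, so moving an operation to a later cycle does not necessarily free it from a conflict.

```python
class ModuloReservationTable:
    """Tracks which (FU, cycle mod II) slots are occupied (hypothetical helper)."""

    def __init__(self, ii):
        self.ii = ii
        self.busy = {}  # (fu, cycle % ii) -> name of the operation using the slot

    def is_free(self, fu, cycle):
        return (fu, cycle % self.ii) not in self.busy

    def reserve(self, fu, cycle, op):
        if not self.is_free(fu, cycle):
            raise ValueError(f"conflict on {fu} at cycle {cycle} (mod {self.ii})")
        self.busy[(fu, cycle % self.ii)] = op


mrt = ModuloReservationTable(ii=2)
mrt.reserve("FU0", 1, "A")
free_same_row = mrt.is_free("FU0", 3)   # 3 mod 2 == 1: collides with A
free_other_row = mrt.is_free("FU0", 4)  # 4 mod 2 == 0: still available
```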
CGRA scheduling consists of two tasks:
• Placement of operations into computation slots (FU and time)
• Routing of operands
Node-centric scheduling:
• Operations are placed first and then the routing is done
• Slots are visited one by one until a solution is found
• Scheduler does not consider routing information when placing
operations
Unnecessary visits to empty
slots
Redundant routings
Example:
Assumption: C can only be
placed in (4,2) and (2,4)
(3,1): only remaining
memory access slot
Difficult to find the right
slot for placing an operation
Edge-centric scheduling:
• Operation placement integrated into the routing function
• Scheduler starts with routing the edge instead of placing the operation
up front
• When an empty slot is found, the scheduler places the operation there temporarily and
checks whether other edges connected to the consumer exist
• If so, those edges are routed recursively
• If this routing fails, routing resumes from the current slot rather than
from the starting slot
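The placement-by-routing idea above can be sketched as follows. This is a simplified illustration, not the paper's implementation: it uses breadth-first search instead of cost-ordered search, and all names (`explore`, `route_to`, `place_by_routing`) are hypothetical. It also assumes that an FU listed as its own neighbor can hold a value for one cycle. Because `explore` is a generator, a failed tentative placement simply continues the search from the current slot.

```python
from collections import deque


def explore(src, neighbors, occupied, ii, max_hops, goal=None):
    """Lazily yield (state, path) pairs reachable from src, nearest first.

    A state is (fu, cycle) and occupies the modulo slot (fu, cycle % ii).
    Each hop moves the value to a neighboring FU one cycle later.
    `path` holds the modulo slots used purely for routing.
    """
    queue = deque([(src, ())])
    seen = {src}
    while queue:
        (fu, t), path = queue.popleft()
        if t - src[1] >= max_hops:
            continue
        for nxt in neighbors[fu]:
            state = (nxt, t + 1)
            key = (nxt, (t + 1) % ii)
            if state in seen or key in path:
                continue
            if state == goal:          # reached the tentatively placed consumer
                yield state, path
                return
            if key in occupied:
                continue
            seen.add(state)
            yield state, path
            queue.append((state, path + (key,)))


def route_to(src, goal, neighbors, occupied, ii, max_hops):
    """Return the routing slots from src to goal, or None if unreachable."""
    for state, path in explore(src, neighbors, occupied, ii, max_hops, goal):
        if state == goal:
            return path
    return None


def place_by_routing(producers, neighbors, occupied, ii, max_hops=4):
    """Place one consumer by routing the edge from its first producer."""
    first, others = producers[0], producers[1:]
    for target, path in explore(first, neighbors, occupied, ii, max_hops):
        taken = set(occupied) | set(path) | {(target[0], target[1] % ii)}
        for src in others:             # route the consumer's remaining edges
            extra = route_to(src, target, neighbors, taken, ii, max_hops)
            if extra is None:
                break                  # undo; the outer search resumes
            taken |= set(extra)
        else:
            return target, taken - set(occupied)
    return None


# Three FUs in a row; producers sit on FU 0 and FU 2 at cycle 0.
neighbors = {0: [0, 1], 1: [0, 1, 2], 2: [1, 2]}
result = place_by_routing([(0, 0), (2, 0)], neighbors, {(0, 0), (2, 0)}, ii=3)
```

In this toy setup the first candidate, FU 0 at cycle 1, is rejected because the second producer cannot reach it, and the search moves on to FU 1 at cycle 1.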
Only one routing call is required
Cost assignment to slots to avoid wasting expensive nodes
Shorter compile time and better schedules
Final schedule formed by calling a routing function for each
edge of the DFG
Order in which the router visits each slot determined by a
routing cost assigned to each slot
Two main objectives when routing a single edge:
• Minimizing number of routing resources used
• Proactively avoiding routing failure: avoid using resources that will block
future routes and reserve slots for expensive operations
Recurrence edges:
Edges in a recurrence cycle
Schedule them ahead of other operations, especially when II
is close to the length of the recurrence
Edges with the highest priority
Simple edges:
Outgoing edge of an operation that has only one consumer
High-fanout edges:
Outgoing edge of an operation with multiple consumers
Priority to simple edges over high-fanout edges
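A minimal sketch of how the edge categories so far could be ranked, assuming the DFG is given as a list of (producer, consumer) pairs. The function names and the example graph are hypothetical, and the critical versus non-critical distinction is not modeled here.

```python
from collections import defaultdict


def reaches(succ, start, goal):
    """Depth-first reachability test in the DFG."""
    stack, seen = [start], set()
    while stack:
        node = stack.pop()
        if node == goal:
            return True
        if node in seen:
            continue
        seen.add(node)
        stack.extend(succ[node])
    return False


def prioritize(edges):
    """Order edges: recurrence edges, then simple edges, then high-fanout edges."""
    succ = defaultdict(list)
    for u, v in edges:
        succ[u].append(v)

    def rank(edge):
        u, v = edge
        if reaches(succ, v, u):    # consumer leads back to producer: recurrence
            return 0
        if len(succ[u]) == 1:      # producer has a single consumer: simple edge
            return 1
        return 2                   # producer has several consumers: high fanout

    return sorted(edges, key=rank)  # stable sort keeps input order within a rank


# Hypothetical DFG: c -> d -> e -> c is a recurrence cycle, a has two consumers.
edges = [("a", "b"), ("a", "c"), ("b", "f"), ("c", "d"), ("d", "e"), ("e", "c")]
order = prioritize(edges)
```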
Non-critical and critical edges:
Multiple disjoint paths between two nodes
in the DFG
Dependencies between edges in different paths
Edges on critical path are scheduled first
Example:
Recurrence cycle (5, 6, 8) scheduled first, then 0
[Figure: example DFG with the non-critical path and the critical path marked]
Generation of reduced DFG
• Conversion of DFG into reduced form by collapsing nodes
• Operation is collapsible if inexpensive and has only one producer and
one consumer
• Remove node and draw edge from producer to consumer
• New edge annotated with number of collapsed nodes
Clustering of reduced DFG by ignoring high-fanout edges
Prioritize edges
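The collapsing step can be sketched as below, assuming the DFG is a list of (producer, consumer) edges and that the set of inexpensive operations is known. The function name `reduce_dfg` and the operation names are made up for illustration; the clustering and prioritization steps are not shown.

```python
def reduce_dfg(edges, inexpensive):
    """Collapse inexpensive nodes that have exactly one producer and one consumer.

    Returns {(producer, consumer): number of collapsed nodes on that edge}.
    """
    weights = {edge: 0 for edge in edges}
    changed = True
    while changed:
        changed = False
        for node in sorted(inexpensive):
            ins = [e for e in weights if e[1] == node]
            outs = [e for e in weights if e[0] == node]
            if len(ins) != 1 or len(outs) != 1:
                continue
            p, c = ins[0][0], outs[0][1]
            if node in (p, c) or p == c or (p, c) in weights:
                continue  # skip self-loops and edges that would become parallel
            # Remove the node and connect producer to consumer directly.
            collapsed = weights.pop(ins[0]) + weights.pop(outs[0]) + 1
            weights[(p, c)] = collapsed
            changed = True
    return weights


edges = [("ld", "add"), ("add", "shl"), ("shl", "mul"), ("ld2", "mul")]
reduced = reduce_dfg(edges, inexpensive={"add", "shl"})
```

Here `add` and `shl` are folded into a single edge from `ld` to `mul` annotated with 2, which tells the router how many FU slots to keep along that route.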
Operation scheduling by calling either placement or routing
function
• Placement function only called if target operation has no placed
producers or consumers
• Routing function: decision which edge to route first
• Decision based on factors like schedule time, state-changeability of
producers or consumers and how many routing options are available
• Forward or backward routing
Routing cost calculation
• Routing cost for each available slot
• Used by router to determine the order in which to explore slots
• Three primary components:
1. Static cost: fixed cost assigned to each slot
2. Affinity cost: assigned between two operations that have common consumers; based on a
slot's distance from the already placed producers
3. Probability cost: probability of a slot to be required in the future
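One way to combine the three components is a weighted sum, sketched below. The slides do not give the actual formula, so the sum, the weights, the Manhattan distance, and every name here (`routing_cost`, `static_cost`, `related_fus`, `future_demand`) are assumptions chosen only to show how such a cost would order the slots.

```python
def manhattan(a, b):
    """Grid distance between two FU coordinates."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def routing_cost(fu, static_cost, related_fus, future_demand, weights=(1.0, 1.0, 4.0)):
    """Weighted sum of static, affinity and probability cost (assumed form)."""
    w_static, w_affinity, w_prob = weights
    affinity = sum(manhattan(fu, other) for other in related_fus)
    return (w_static * static_cost[fu]
            + w_affinity * affinity
            + w_prob * future_demand.get(fu, 0.0))


# 2x2 array; FU (1, 0) is treated as an expensive unit via its static cost.
static_cost = {(0, 0): 1.0, (0, 1): 1.0, (1, 0): 5.0, (1, 1): 1.0}
related = [(0, 0)]        # FU holding an already placed producer
demand = {(1, 1): 0.5}    # assumed probability that the slot is needed later
order = sorted(static_cost, key=lambda fu: routing_cost(fu, static_cost, related, demand))
```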
Finding a target
• After updating all routing costs, router starts finding a path from the
source to the target operation
• Router visits neighboring slots in order of their assigned costs
• When routing collapsed edges, the path goes through at least as many
FUs as the number of collapsed nodes, so that they can be expanded
later without problems
• After slot is found, scheduler checks for other edges connected to the
target and recurses to route those edges
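The cost-ordered search and the minimum path length for collapsed edges can be sketched with a priority queue. This is an illustration under assumptions: the time dimension and the modulo constraint are left out, and `route`, `targets` and `collapsed` are hypothetical names.

```python
import heapq


def route(src, targets, neighbors, cost, collapsed=0, max_hops=6):
    """Visit FUs in order of accumulated routing cost.

    A target is accepted only if the path contains at least `collapsed`
    intermediate FUs, so collapsed nodes can be expanded onto them later.
    """
    heap = [(0.0, 0, src, (src,))]
    best = {}
    while heap:
        total, hops, fu, path = heapq.heappop(heap)
        if fu in targets and hops - 1 >= collapsed:
            return total, list(path)
        if hops == max_hops or best.get((fu, hops), float("inf")) <= total:
            continue
        best[(fu, hops)] = total
        for nxt in neighbors[fu]:
            heapq.heappush(heap, (total + cost[nxt], hops + 1, nxt, path + (nxt,)))
    return None


neighbors = {"A": ["B", "C"], "B": ["C", "D"], "C": ["D"], "D": []}
cost = {"A": 0.0, "B": 1.0, "C": 5.0, "D": 1.0}
direct = route("A", {"D"}, neighbors, cost)               # ordinary edge
padded = route("A", {"D"}, neighbors, cost, collapsed=2)  # edge hiding two nodes
```

With `collapsed=2` the cheap two-hop route through B is rejected and the router accepts the longer path through B and C.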
After finding a legal schedule, collapsed nodes are expanded
onto the found FU slots
Generation of configuration memories for each component
(e.g. control bits)
If scheduling fails, scheduler increases II and repeats
scheduling
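The retry step amounts to an outer loop over candidate II values. The sketch below assumes a callback `try_schedule(ii)` that returns a schedule or `None`; the callback, the bounds, and the toy scheduler are hypothetical stand-ins for the real scheduling pass.

```python
def schedule_loop(try_schedule, min_ii, max_ii):
    """Try increasing II values until scheduling succeeds."""
    for ii in range(min_ii, max_ii + 1):
        schedule = try_schedule(ii)
        if schedule is not None:
            return ii, schedule
    raise RuntimeError(f"no schedule found for II <= {max_ii}")


def toy_try_schedule(ii, num_ops=10, num_fus=4):
    """Stand-in scheduler: succeeds once the array offers enough slots."""
    if num_ops > num_fus * ii:
        return None
    # Map each operation to an (FU, cycle) slot with cycle < ii.
    return {op: (op % num_fus, op // num_fus) for op in range(num_ops)}


ii, schedule = schedule_loop(toy_try_schedule, min_ii=1, max_ii=8)
```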
Benchmarks: media applications from embedded domain
(H.264 encoder, 3D graphics, AAC decoder, MP3 decoder)
CGRA architecture: 4x4 heterogeneous array with 4 MEM and 6
MULT FUs, a central RF, and a local RF for each FU
Loops with varying size mapped onto different configurations
Comparison with traditional, node-centric and simulated
annealing based modulo scheduling
Performance improvement of 25% over traditional modulo
scheduling
10-13% higher performance and 27-46% shorter compile time
than node-centric scheduling
Simulated annealing is the most effective strategy, but its schedule
quality comes at the cost of long compile times (EMS compiles 18x faster)
EMS achieves performance competitive with simulated
annealing
Edge-centric modulo scheduling for CGRAs
Focus on the routing process, with operation placement as a by-product
Performance improvement of 25% over traditional modulo scheduling
Reduced compilation time (18x faster than simulated annealing)
Performance heavily depends on characteristics of loop structure and underlying CGRA architecture
Thank you for listening!
Please feel free to ask questions!
Park, H., Fan, K., Mahlke, S., Oh, T., Kim, H., Kim, H.: Edge-centric Modulo
Scheduling for Coarse-Grained Reconfigurable Architectures. Proceedings
of PACT ’08, ACM New York, pp. 166–176.