Priority Based Fair Scheduling: A Memory Scheduler Design for Chip-Multiprocessor Systems
Tsinghua University, Tsinghua National Laboratory for Information Science and Technology
Background
• “Memory wall”
  – High memory access latency
• DRAM structure
  – Channel, rank, bank, row, column …
  – Various timing constraints
• Challenges of multi-core
  – High parallelism
  – More data contention
• Solutions
  – More memory channels
  – An efficient memory scheduler
Motivation
• Thread classification [TCM:Kim:2008]
  – Latency-sensitive threads
  – Bandwidth-sensitive threads
• A memory scheduler should
  – Improve system throughput
  – Avoid starvation
  – Maintain fairness among threads
Goals
• Requests of latency-sensitive threads
  – Should be issued as soon as possible
• Requests of bandwidth-sensitive threads
  – Should not be treated unfairly
• Our proposal: PBFS
  – Prioritize latency-sensitive threads
  – Avoid starvation of bandwidth-sensitive threads
Basic Idea
• Each thread gets a priority
  – Ranges from -1 to n
• Top-priority (n)
  – Latency-sensitive threads
• Medium-priority (1 to n-1)
  – Intermediate threads
• Bottom-priority (0)
  – Bandwidth-sensitive threads
• Idle (-1)
  – Finished threads or compute-intensive threads
Priority Updating Rules
• Priorities are updated dynamically
  – Once a request is issued
    • The priority of the corresponding thread is decreased by 1
  – When no thread has top-priority
    • The priorities of all threads are increased by 1
  – When a time threshold is reached
    • Identify idle threads
    • Adjust top-priority
      – Extremely unbalanced: increase top-priority
      – Extremely balanced: decrease top-priority
      – Otherwise: unchanged
      – Upper/lower boundaries are adjusted according to the active threads
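The three update rules above can be sketched in software as follows. This is only an illustrative model, not the hardware design from the slides: the class and method names are hypothetical, and the slides do not specify the idle test, how "balance" is measured, the floor at priority 0, or the exact bounds on top-priority, so the choices below (read-count ratio, the `hi_ratio`/`lo_ratio` thresholds, bounds of 1 and twice the active thread count) are assumptions.

```python
IDLE = -1  # priority of finished or compute-intensive threads


class PriorityTable:
    """Per-thread priorities in the range -1 (idle) to top (n)."""

    def __init__(self, num_threads, top):
        self.top = top
        self.prio = [top] * num_threads  # every thread starts at top-priority

    def on_issue(self, tid):
        """Rule 1: issuing a request lowers the thread's priority by 1."""
        if self.prio[tid] > 0:  # assumption: priority never drops below 0 here
            self.prio[tid] -= 1
        self._promote_if_needed()

    def _promote_if_needed(self):
        """Rule 2: if no active thread holds top-priority, raise all by 1."""
        active = [p for p in self.prio if p != IDLE]
        if active and max(active) < self.top:
            self.prio = [p if p == IDLE else p + 1 for p in self.prio]

    def on_interval(self, reads, idle_reads=0, hi_ratio=8.0, lo_ratio=2.0):
        """Rule 3: at a time threshold, mark idle threads and adjust top.

        reads[t] is the integer read count of thread t in the last interval.
        """
        active = {t for t, r in enumerate(reads) if r > idle_reads}
        for t in range(len(self.prio)):
            if t not in active:
                self.prio[t] = IDLE
            elif self.prio[t] == IDLE:
                self.prio[t] = 0  # assumption: woken threads restart at bottom
        if not active:
            return
        counts = [reads[t] for t in active]
        ratio = max(counts) / min(counts)  # assumption: balance = read ratio
        if ratio > hi_ratio:      # extremely unbalanced
            self.top += 1
        elif ratio < lo_ratio:    # extremely balanced
            self.top -= 1
        # assumption: bounds scale with the number of active threads
        self.top = max(1, min(self.top, 2 * len(active)))
        self.prio = [min(p, self.top) for p in self.prio]
        self._promote_if_needed()
```

For example, with two threads and top-priority 2, one issue from each thread brings both to priority 1, at which point rule 2 restores both to 2.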
System throughput
• Latency-sensitive threads
  – Easily obtain top-priority
  – Their requests are issued as soon as possible
• Example
  – 2-core CMP
    • Thread A: latency-sensitive
    • Thread B: bandwidth-sensitive
    • Top-priority = 2
    • Initially, both threads have priority 2
Example
[Figure: request timelines (Rq 0 to Rq 9). Rows: Thread A, Thread B, Execution. Horizontal axis: Mem. Cycle.]
Mem. Cycle   0  1  2  3  4  5  6  7  8  9 10 11
Priority A   2  2  2  1  2  2  2  2  1  2  2  2
Priority B   1  0  0  0  0  0  0  0  0  0  0  0
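How a latency-sensitive thread gets served ahead of a backlog can be sketched with a small request picker. This is a simplified model rather than the scheduler from the slides: it ignores DRAM timing constraints and idle threads, the function names are hypothetical, and breaking ties by lowest thread id is an assumption. It is not meant to reproduce the figure cycle by cycle.

```python
from collections import deque


def pick_thread(queues, prio):
    """Return the pending thread with the highest priority, or None."""
    pending = [t for t, q in enumerate(queues) if q]
    if not pending:
        return None
    # assumption: ties are broken in favour of the lowest thread id
    return max(pending, key=lambda t: (prio[t], -t))


def issue(queues, prio, top):
    """Issue one request and apply the two per-request priority rules."""
    t = pick_thread(queues, prio)
    if t is None:
        return None
    req = queues[t].popleft()
    prio[t] = max(prio[t] - 1, 0)   # the issuing thread loses one level
    if max(prio) < top:             # nobody at top-priority: raise everyone
        for i in range(len(prio)):
            prio[i] += 1
    return t, req


# Thread 0 plays Thread A (latency-sensitive), thread 1 plays Thread B.
queues = [deque(), deque([0, 1, 2, 3])]
prio = [2, 2]
issue(queues, prio, top=2)   # B issues Rq 0, B drops to 1
issue(queues, prio, top=2)   # B issues Rq 1, B drops to 0
queues[0].append(0)          # a request from A arrives
print(issue(queues, prio, top=2))  # A is served ahead of B's backlog
```

Because Thread A rarely issues requests, it stays at or near top-priority, so its occasional request is picked before the remaining requests of Thread B.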
Starvation Avoidance
• When a thread issues too many requests in a row
  – It is classified as a bandwidth-sensitive thread
  – Other threads get more chances to raise their priorities
• Example
  – 2-core CMP
    • Thread A: less bandwidth-sensitive
    • Thread B: bandwidth-sensitive
    • Top-priority = 2
    • Initially, both threads have priority 2
Example
[Figure: request timelines (Rq 0 to Rq 9). Rows: Thread A, Thread B, Execution. Horizontal axis: Mem. Cycle.]
Mem. Cycle   0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15
Priority A   2  2  2  1  1  2  1  2  1  2  1  2  1  2  2  2
Priority B   1  0  0  0  1  1  1  1  1  1  1  1  1  1  0  0
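The starvation-avoidance effect can be checked with a toy simulation in which every thread has a backlog and one request is issued per cycle. The model is an assumption-laden sketch (no DRAM timing, ties broken by lowest thread id, hypothetical function name) and does not reproduce the figure exactly; it only shows that no backlogged thread is shut out.

```python
def simulate(backlog, top, cycles):
    """Issue one request per cycle; return requests issued per thread."""
    prio = [top] * len(backlog)      # all threads start at top-priority
    left = list(backlog)             # requests still waiting per thread
    issued = [0] * len(backlog)
    for _ in range(cycles):
        pending = [t for t in range(len(left)) if left[t] > 0]
        if not pending:
            break
        # highest priority wins; assumption: ties go to the lowest id
        t = max(pending, key=lambda i: (prio[i], -i))
        left[t] -= 1
        issued[t] += 1
        prio[t] = max(prio[t] - 1, 0)        # issuing costs one level
        if max(prio) < top:                  # nobody at top: raise everyone
            prio = [p + 1 for p in prio]
    return issued


print(simulate([100, 100], top=2, cycles=20))  # [10, 10]
print(simulate([5, 100], top=2, cycles=20))    # [5, 15]
```

With two equally backlogged threads the service alternates, and once the lighter thread runs out of requests the heavier one uses the remaining cycles.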
Hardware overhead
• Hardware support is needed to
  – Record the priority of each thread
  – Monitor thread behavior (read counts within a time interval)
  – Maintain flags indicating whether a row buffer can be closed
• The storage overhead is small, and the logic is easy to implement
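A rough feel for the storage cost can be had from a back-of-the-envelope estimate. The slides give no figures, so everything here is an assumption made for illustration: the function name, one priority register and one read counter per thread, one close flag per bank, and the example parameter values.

```python
def storage_bits(num_threads, top, counter_bits, num_banks):
    """Estimate the scheduler state in bits under the assumptions above."""
    # Priorities range from -1 to top, i.e. top + 2 distinct values.
    prio_bits = (top + 1).bit_length()
    per_thread = prio_bits + counter_bits    # priority + read counter
    close_flags = num_banks                  # assumption: 1 flag per bank
    return num_threads * per_thread + close_flags


# Hypothetical configuration: 4 threads, top-priority 4,
# 16-bit read counters, 32 banks.
print(storage_bits(4, 4, 16, 32))  # 108
```

Under these assumed parameters the state fits in a little over a hundred bits, which is in line with the claim that the overhead is small.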
Evaluation
• USIMM 1.3
• Memory configurations
  – 1 channel
  – 4 channels
• Benchmarks
• Metrics
  – Execution time
  – Maximum slowdown
  – EDP (energy-delay product)
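The metrics listed above can be computed as in the sketch below. It uses the common definitions (slowdown as shared-run time over stand-alone time, EDP as energy multiplied by delay); the slides do not spell out the formulas, so treat these as assumptions, and note that the function names and the numbers in the usage lines are made up for illustration, not measured results.

```python
def max_slowdown(shared_times, alone_times):
    """Largest per-thread slowdown: time when sharing / time when alone."""
    return max(s / a for s, a in zip(shared_times, alone_times))


def edp(energy_joules, exec_time_seconds):
    """Energy-delay product: lower is better."""
    return energy_joules * exec_time_seconds


def reduction_pct(baseline, value):
    """Percentage reduction of a metric relative to a baseline."""
    return 100.0 * (baseline - value) / baseline


# Illustrative numbers only.
print(max_slowdown([12.0, 30.0], [10.0, 20.0]))  # 1.5
print(edp(2.0, 3.0))                             # 6.0
print(reduction_pct(50.0, 45.0))                 # 10.0
```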
Execution Time
• Overall
  – CLOSE: 4.2% reduction
  – PBFS: 7.5% reduction
Maximum Slowdown
• Overall
  – CLOSE: 4.7% reduction
  – PBFS: 7.0% reduction
EDP
• Overall
  – CLOSE: 9.1% reduction
  – PBFS: 13.8% reduction
Summary
• We proposed PBFS
  – Classifies threads by priority
  – Dynamically updates thread priorities
  – Maintains system throughput
  – Avoids starvation of bandwidth-sensitive threads
  – Low hardware overhead
Thanks