Priority Based Fair Scheduling: A Memory Scheduler Design for Chip-Multiprocessor Systems

Tsinghua University
Tsinghua National Laboratory for Information Science and Technology


Background

• "Memory wall"
  – High memory access latency
• DRAM structure
  – Channel, rank, bank, row, column …
  – Various timing constraints
• Challenges of multi-core
  – High parallelism
  – More data contention
• Solutions
  – More memory channels
  – An efficient memory scheduler


Motivation

• Thread classification [TCM:Kim:2008]
  – Latency-sensitive threads
  – Bandwidth-sensitive threads
• A memory scheduler should
  – Improve system throughput
  – Avoid starvation
  – Maintain fairness among threads


Goals

• Requests of latency-sensitive threads
  – To be issued ASAP
• Requests of bandwidth-sensitive threads
  – Avoid unfairness
• Our proposal: PBFS
  – Prioritize latency-sensitive threads
  – Avoid starvation of bandwidth-sensitive threads


Basic Idea

• Each thread gets a priority
  – Ranges from -1 to n
• Top priority (n)
  – Latency-sensitive threads
• Medium priority (1 to n-1)
  – Intermediate threads
• Bottom priority (0)
  – Bandwidth-sensitive threads
• Idle (-1)
  – Finished threads or compute-intensive threads
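The priority range above can be sketched as a small lookup. This is only an illustration: the function name `priority_level` and the level labels are assumptions, not names from the slides, and it assumes n is at least 1.

```python
def priority_level(priority: int, top: int) -> str:
    """Name the level of a thread priority in the range -1..top.

    `top` plays the role of n on the slide (assumed >= 1).
    The returned labels are illustrative, not taken from the slides.
    """
    if top < 1:
        raise ValueError("top priority must be at least 1")
    if not -1 <= priority <= top:
        raise ValueError(f"priority {priority} outside -1..{top}")
    if priority == -1:
        return "idle"
    if priority == top:
        return "top"
    if priority == 0:
        return "bottom"
    return "medium"


# Usage: with n = 2 there is exactly one medium level.
levels = [priority_level(p, top=2) for p in (-1, 0, 1, 2)]
```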

Priority Updating Rules

• Priorities are updated dynamically
  – Once a request is issued
    • The corresponding thread's priority is decreased by 1
  – When no thread has top priority
    • All threads' priorities are increased by 1
  – When a time threshold is reached
    • Identify idle threads
    • Adjust the top priority
      – Extremely unbalanced: increase the top priority
      – Extremely balanced: decrease the top priority
      – Other cases: unchanged
      – Upper/lower boundaries are adjusted by the number of active threads
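The first two rules can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the class and method names are hypothetical, priorities are floored at 0 when decremented, and the periodic top-priority adjustment is left out because the slides do not define how balance is measured.

```python
from dataclasses import dataclass, field

IDLE = -1  # priority value the slides use for idle threads


@dataclass
class PriorityTable:
    """Hypothetical per-thread priority table (names are assumptions)."""
    top: int
    priorities: dict = field(default_factory=dict)

    def add_thread(self, tid: str) -> None:
        # Assumption: every thread starts at top priority, as in the examples.
        self.priorities[tid] = self.top

    def mark_idle(self, tid: str) -> None:
        self.priorities[tid] = IDLE

    def on_issue(self, tid: str) -> None:
        # Rule 1: the issuing thread loses one level (floored at 0, assumed).
        if self.priorities[tid] > 0:
            self.priorities[tid] -= 1
        # Rule 2: if no active thread holds top priority, raise all by one.
        active = [t for t, p in self.priorities.items() if p != IDLE]
        if active and all(self.priorities[t] < self.top for t in active):
            for t in active:
                self.priorities[t] += 1


# Usage: two threads, top priority 2.
table = PriorityTable(top=2)
table.add_thread("A")
table.add_thread("B")
table.on_issue("B")   # B: 2 -> 1, A still holds top priority
table.on_issue("B")   # B: 1 -> 0
table.on_issue("A")   # A: 2 -> 1, nobody at top, so both are raised by one
```

After the last call, A is back at 2 and B has been promoted to 1, which is how a heavily served thread regains priority over time.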


System throughput

• Latency-sensitive threads
  – Easily obtain top priority
  – Their requests are issued as soon as possible
• Example
  – 2-core CMP
    • Thread A: latency-sensitive
    • Thread B: bandwidth-sensitive
    • Top priority = 2
    • Initially, both threads have priority 2


Example

[Figure: execution timelines of Thread A and Thread B, showing requests Rq 0 to Rq 9 against memory cycles, with each thread's priority per cycle.]

Mem. cycle:   0  1  2  3  4  5  6  7  8  9 10 11
Priority A:   2  2  2  1  2  2  2  2  1  2  2  2
Priority B:   1  0  0  0  0  0  0  0  0  0  0  0
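How priorities translate into issue order can be sketched as a selection function. This is a simplified illustration: the function name `pick_next`, the queue representation (arrival cycles per thread), and the oldest-first tie-break are assumptions, and DRAM timing constraints and row-buffer state are ignored.

```python
from collections import deque


def pick_next(queues: dict, priorities: dict):
    """Return the thread whose oldest request should issue next, or None.

    queues maps a thread id to a deque of request arrival cycles (oldest
    first). Idle threads (priority -1) and empty queues are skipped.
    Ties are broken in favor of the older request (an assumption).
    """
    ready = [t for t, q in queues.items() if q and priorities[t] >= 0]
    if not ready:
        return None
    return max(ready, key=lambda t: (priorities[t], -queues[t][0]))


# Usage: A has one late request at top priority, B has a backlog at priority 0.
queues = {"A": deque([5]), "B": deque([1, 2, 3])}
chosen = pick_next(queues, {"A": 2, "B": 0})
```

Here `chosen` is thread A: its single request bypasses B's backlog because A holds the higher priority.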

Starvation Avoidance

• When a thread continuously issues too many requests
  – It is classified as a bandwidth-sensitive thread
  – Other threads then have more chances to raise their priorities
• Example
  – 2-core CMP
    • Thread A: less bandwidth-sensitive
    • Thread B: bandwidth-sensitive
    • Top priority = 2
    • Initially, both threads have priority 2


Example

[Figure: execution timelines of Thread A and Thread B, showing requests Rq 0 to Rq 9 against memory cycles, with each thread's priority per cycle.]

Mem. cycle:   0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15
Priority A:   2  2  2  1  1  2  1  2  1  2  1  2  1  2  2  2
Priority B:   1  0  0  0  1  1  1  1  1  1  1  1  1  1  0  0
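The starvation-avoidance effect can be illustrated with a toy simulation. It is a sketch under strong assumptions, not a reproduction of the figure: both threads always have a pending request, one request issues per cycle, and ties go to the thread that has issued fewer requests so far (a hypothetical tie-break).

```python
def simulate(cycles: int, top: int = 2) -> dict:
    """Count requests issued per thread when both threads are always ready."""
    prio = {"A": top, "B": top}
    issued = {"A": 0, "B": 0}
    for _ in range(cycles):
        # Pick the highest-priority thread; tie-break on fewer issued requests.
        tid = max(prio, key=lambda t: (prio[t], -issued[t]))
        issued[tid] += 1
        # The issuing thread drops one level (floored at 0, assumed).
        prio[tid] = max(prio[tid] - 1, 0)
        # When nobody holds top priority, every thread is raised by one.
        if all(p < top for p in prio.values()):
            for t in prio:
                prio[t] += 1
    return issued


# Usage: eight cycles with two saturating threads.
counts = simulate(8)
```

Under these assumptions the two threads alternate, so neither is starved even though both keep the request queue full.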

Hardware overhead

• Hardware support is needed to
  – Record the priority of each thread
  – Monitor each thread's behavior (read counts within a time interval)
  – Maintain flags indicating whether a row buffer can be closed
• The storage overhead is small, and the design is easy to implement
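A back-of-the-envelope estimate of the storage listed above might look like this. Every parameter is a hypothetical assumption (the slides give no counter widths or flag counts): one priority field and one read counter per thread, plus one row-close flag per bank.

```python
def storage_bits(threads: int, top: int, counter_bits: int, banks: int) -> int:
    """Rough storage estimate in bits; all parameters are assumptions."""
    # A priority takes one of top + 2 values (-1..top), so it needs
    # enough bits to encode the largest index, top + 1.
    priority_bits = (top + 1).bit_length()
    per_thread = priority_bits + counter_bits
    row_close_flags = banks  # one flag per bank (assumed granularity)
    return threads * per_thread + row_close_flags


# Usage with made-up numbers: 4 threads, top priority 6,
# 16-bit read counters, 32 banks.
total = storage_bits(threads=4, top=6, counter_bits=16, banks=32)
```

With these illustrative numbers the total is 108 bits, which is consistent in spirit with the claim that the overhead is small.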


Evaluation

• Simulator: USIMM 1.3
• Memory configurations
  – 1 channel
  – 4 channels
• Benchmarks
• Metrics
  – Execution time
  – Maximum slowdown
  – EDP (energy-delay product)
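The two derived metrics can be sketched using their commonly used definitions. These are assumptions about the formulas, since the slides do not spell them out: maximum slowdown is the largest ratio of a thread's execution time when sharing memory to its time when running alone, and EDP is energy multiplied by execution time. Function and parameter names are illustrative.

```python
def max_slowdown(shared_time: dict, alone_time: dict) -> float:
    """Largest per-thread ratio of shared to stand-alone execution time."""
    return max(shared_time[t] / alone_time[t] for t in shared_time)


def energy_delay_product(energy_joules: float, exec_time_s: float) -> float:
    """EDP: system energy multiplied by execution time."""
    return energy_joules * exec_time_s


# Usage with made-up numbers (not results from the slides).
slowdown = max_slowdown({"A": 120.0, "B": 300.0}, {"A": 100.0, "B": 200.0})
edp = energy_delay_product(2.0, 3.0)
```

In this made-up case thread B is slowed most (1.5x), so it determines the maximum slowdown; a fair scheduler aims to keep that figure low.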


Execution Time

• Overall
  – CLOSE: 4.2% reduction
  – PBFS: 7.5% reduction


Maximum Slowdown



EDP



Summary

• We proposed PBFS
  – Classify threads with priorities
  – Dynamically update threads' priorities
  – Guarantee system throughput
  – Avoid starvation of bandwidth-sensitive threads
  – Low hardware overhead


Thanks
