Post on 31-Dec-2015
CBR: Sharing DRAM with Minimum Latency and Bandwidth Guarantees
Zefu Dai, Mark Jarvin and Jianwen Zhu
University of Toronto
23/4/19 University of Toronto 2
Background
Consumer electronics is part of everyday life!
[Diagram: SoC, memory controller (Mem Contr.), DRAM]
Background
A portable media player SoC example
[Figure: SoC block diagram annotated with the values 6.4, 9.6, 1.2, 164.8, 0.09, 31.0, 156.7, 94 MB/s and the label "1000x"]
Speech bubbles on the slide:
- "Give me 10 KB in 1 us, please."
- "I want the data NOW!!!"
- "I can only supply a maximum of 6.4 GB every second."
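As a rough check on the requests in the speech bubbles, the burst request can be converted to a rate. The slide does not say whether units are decimal or binary; decimal units (1 KB = 10^3 bytes, 1 GB = 10^9 bytes) are assumed here.

```latex
\[
\frac{10\ \text{KB}}{1\ \mu\text{s}}
  = \frac{10^{4}\ \text{B}}{10^{-6}\ \text{s}}
  = 10^{10}\ \text{B/s}
  = 10\ \text{GB/s}
  > 6.4\ \text{GB/s}
\]
```

Under that assumption, the burst alone asks for more than the peak rate the memory says it can supply, which is why bandwidth and latency demands have to be arbitrated rather than simply granted.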
Challenges
Simultaneously satisfy:
- Bandwidth requirements
- Latency requirements
Previous Work
QoS aware
- Bandwidth or latency is heuristically improved
QoS guaranteed
- Guaranteed minimum bandwidth and/or latency
Main Ideas
Start with the Bandwidth Guaranteed Prioritized Queuing (BGPQ) algorithm
- Bandwidth guarantee
Improve it using the Credit Borrow and Repay (CBR) mechanism
- Minimum latency guarantee
Bandwidth Guaranteed Prioritized Queuing
Combines the benefits of Priority Queuing and Weighted Fair Queuing
- Credit-based Weighted Fair Queuing
- Prioritized service for residual bandwidth allocation
Residual bandwidth:
- The bandwidth assigned to one user that is unused at a specific point in time
BGPQ Algorithm
Case 1: all queues are busy
- No residual bandwidth
- Acts as WFQ
[Diagram: queues Q0, Q1, Q2 with bandwidth shares 50%, 20%, 30% feed a multiplexer controlled by the BGPQ scheduler, which grants access to the shared resource]
Initial state: every queue has a credit of zero (Q0 = 0.0, Q1 = 0.0, Q2 = 0.0).
Step 1: calculate the dynamic credit for each queue (Q0 = 0.5, Q1 = 0.2, Q2 = 0.3).
Step 2: turn on the switch box and transfer data from the granted queue.
Step 3: subtract 1 from the credit of the granted queue (Q0 = -0.5, Q1 = 0.2, Q2 = 0.3).
One scheduling cycle is done, and the sum of credits is 0.
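The Case 1 cycle can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the function name `wfq_cycle`, the use of `Fraction` for exact credit arithmetic, and the tie-break toward the lower queue index are all illustrative choices that the slides do not specify.

```python
from fractions import Fraction


def wfq_cycle(credits, shares):
    """One credit-based WFQ scheduling cycle with every queue busy.

    Returns (granted_index, new_credits).
    """
    # Step 1: every queue earns its bandwidth share as credit.
    updated = [c + s for c, s in zip(credits, shares)]
    # Step 2: grant the queue holding the highest dynamic credit.
    # Assumption: ties go to the lowest index (highest priority).
    granted = max(range(len(updated)), key=lambda i: (updated[i], -i))
    # Step 3: the granted queue pays one full credit for the transfer.
    updated[granted] -= 1
    return granted, updated


shares = [Fraction(50, 100), Fraction(20, 100), Fraction(30, 100)]

# One cycle from the all-zero initial state, as on the slide.
granted, credits = wfq_cycle([Fraction(0)] * 3, shares)
print(granted, [float(c) for c in credits])  # 0 [-0.5, 0.2, 0.3]

# Ten cycles: grants should follow the 50/20/30 split.
counts = [0, 0, 0]
state = [Fraction(0)] * 3
for _ in range(10):
    g, state = wfq_cycle(state, shares)
    counts[g] += 1
print(counts)  # [5, 2, 3]
```

Because each cycle adds shares that sum to 1 and subtracts exactly 1, the credits always sum to zero, which matches the "Sum of credits = 0" note on the slide.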
BGPQ Algorithm
Case 2: some queues are empty
- Has residual bandwidth
- Prioritized service on residual bandwidth
[Diagram: queues Q0, Q1, Q2 with bandwidth shares 50%, 20%, 30% feed a multiplexer controlled by the BGPQ scheduler; priority Q0 > Q1 > Q2]
Before the new scheduling cycle: Q1 is empty (Q0 = -0.5, Q1 = 0.2, Q2 = 0.3).
Step 1: calculate the dynamic credit for each queue; the credit of an empty queue remains unchanged (Q0 = 0.0, Q1 = 0.2, Q2 = 0.6).
Step 2: allocate the residual bandwidth to the non-empty queue with the highest priority (Q0 = 0.2, Q1 = 0.2, Q2 = 0.6).
Step 3: transfer data from the granted queue.
Step 4: subtract 1 from the credit of the granted queue (Q0 = 0.2, Q1 = 0.2, Q2 = -0.4).
One scheduling cycle is done, and the sum of credits is 0.
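The four steps of Case 2 can be sketched in Python as below. The function name `bgpq_cycle`, the `busy` flag list, and the convention that a lower queue index means higher priority are assumptions made for illustration; the slides give only the credit values, which this sketch reproduces.

```python
from fractions import Fraction


def bgpq_cycle(credits, shares, busy):
    """One BGPQ cycle; the queue index doubles as priority (0 is highest)."""
    credits = list(credits)
    active = [i for i in range(len(shares)) if busy[i]]
    if not active:
        return None, credits  # nothing to schedule, credits untouched
    residual = Fraction(0)
    # Step 1: busy queues earn their share; an empty queue keeps its
    # credit unchanged and its share becomes residual bandwidth.
    for i, share in enumerate(shares):
        if busy[i]:
            credits[i] += share
        else:
            residual += share
    # Step 2: the residual goes to the highest-priority non-empty queue.
    credits[active[0]] += residual
    # Step 3: grant the busy queue with the highest dynamic credit.
    granted = max(active, key=lambda i: (credits[i], -i))
    # Step 4: the granted queue pays one credit.
    credits[granted] -= 1
    return granted, credits


shares = [Fraction(50, 100), Fraction(20, 100), Fraction(30, 100)]
before = [Fraction(-1, 2), Fraction(1, 5), Fraction(3, 10)]  # end of Case 1
granted, after = bgpq_cycle(before, shares, busy=[True, False, True])
print(granted, [float(c) for c in after])  # 2 [0.2, 0.2, -0.4]
```

Q2 wins the slot on credit, while Q0 banks the 20% that Q1 left unused; that banked credit is how priority shows up in later cycles.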
BGPQ Advantages
BGPQ = WFQ + PQ
- Bandwidth guarantee
- Prioritized access to residual bandwidth
Low implementation cost:
- 3 adders for credit calculation
- 1 comparator tree to find the highest dynamic credit
BGPQ Disadvantage
Low-latency, low-bandwidth requirement class:
- No minimum latency guarantee
Minimum latency:
- No need to wait for any request that has lower priority
Latency Problem of BGPQ
Example:
Optimal scheduling:
Credit Borrow and Repay Mechanism
Borrow
- Allow the low-latency requirement class to borrow the scheduling opportunity from other classes
Repay
- Return the credit later when convenient
CBR Mechanism
Case 3: Credit Borrow
- Maintain a debt queue (DebtQ) for Q0: a FIFO of borrowed IDs
[Diagram: queues Q0, Q1, Q2 with bandwidth shares 10%, 20%, 70% feed a multiplexer controlled by the CBR scheduler, which holds the DebtQ; priority Q0 > Q1 > Q2]
Step 1: calculate the dynamic credits and allocate the residual bandwidth (Q0 = 0.3, Q1 = 0.0, Q2 = 0.7).
Step 2: re-assign the scheduling opportunity to Q0 and record the borrowed ID.
Step 3: transfer data.
Step 4: subtract 1 from the credit of the originally scheduled queue (Q0 = 0.3, Q1 = 0.0, Q2 = -0.3).
One scheduling cycle is done, and the sum of credits is 0.
CBR Mechanism
Case 4: Credit Repay
- It is time to repay the credit
[Diagram: queues Q0, Q1, Q2 with bandwidth shares 10%, 20%, 70% feed a multiplexer controlled by the CBR scheduler, which holds the DebtQ; priority Q0 > Q1 > Q2]
Initial state: Q0 is empty but has debt, so it will "appear" to be non-empty (Q0 = 0.3, Q1 = 0.0, Q2 = -0.3).
Step 1: calculate the dynamic credits and allocate the residual bandwidth (Q0 = 0.6, Q1 = 0.0, Q2 = 0.4).
Step 2: return the scheduling opportunity and clear the DebtQ.
Step 3: transfer data.
Step 4: subtract 1 from the credit of the scheduled queue (Q0 = -0.4, Q1 = 0.0, Q2 = 0.4).
One scheduling cycle is done, and the sum of credits is 0.
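Cases 3 and 4 can be sketched together in Python. This is a hedged illustration rather than the authors' design: the class name `CBRScheduler`, its method and attribute names, the reading of the repay conditions as "Q0 is scheduled and either Q0 is empty or the DebtQ is full", and repaying one debt entry per cycle are all assumptions. The slides do not say what happens when the lender has nothing queued at repay time; this sketch still hands it the slot.

```python
from collections import deque
from fractions import Fraction


class CBRScheduler:
    """Sketch of BGPQ plus Credit Borrow and Repay for queue 0."""

    def __init__(self, shares, debt_depth=16):
        self.shares = list(shares)
        self.credits = [Fraction(0)] * len(self.shares)
        self.debt = deque()           # FIFO of lender queue IDs
        self.debt_depth = debt_depth  # hypothetical parameter name

    def cycle(self, has_request):
        """Run one scheduling cycle; return the queue that transfers data."""
        # A Q0 that is empty but in debt "appears" non-empty.
        busy = list(has_request)
        busy[0] = has_request[0] or bool(self.debt)
        active = [i for i, b in enumerate(busy) if b]
        if not active:
            return None
        # Step 1: accrue credits; shares of idle queues form the residual,
        # which goes to the highest-priority active queue.
        residual = Fraction(0)
        for i, share in enumerate(self.shares):
            if busy[i]:
                self.credits[i] += share
            else:
                residual += share
        self.credits[active[0]] += residual
        scheduled = max(active, key=lambda i: (self.credits[i], -i))
        served = scheduled
        debt_full = len(self.debt) >= self.debt_depth
        if scheduled != 0 and has_request[0] and not debt_full:
            # Borrow: Q0 takes the slot and records who lent it.
            self.debt.append(scheduled)
            served = 0
        elif scheduled == 0 and self.debt and (debt_full or not has_request[0]):
            # Repay: Q0's own slot goes back to the oldest lender.
            served = self.debt.popleft()
        # The originally scheduled queue always pays the credit.
        self.credits[scheduled] -= 1
        return served


sched = CBRScheduler([Fraction(1, 10), Fraction(2, 10), Fraction(7, 10)])
first = sched.cycle([True, False, True])    # Case 3: Q0 borrows from Q2
after_borrow = list(sched.credits)
second = sched.cycle([False, False, True])  # Case 4: Q0 repays Q2
after_repay = list(sched.credits)
print(first, [float(c) for c in after_borrow])  # 0 [0.3, 0.0, -0.3]
print(second, [float(c) for c in after_repay])  # 2 [-0.4, 0.0, 0.4]
```

Charging the originally scheduled queue rather than the queue that actually transfers data is what keeps the long-run bandwidth shares intact: Q0 gets its request served early, and Q2 gets the slot back without either queue's credit account being distorted.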
CBR Mechanism
Minimum latency guarantee using CBR
- No need to wait for requests in other queues
Worst case: Q0 is not empty while the DebtQ is full
- No minimum latency guarantee in this case
Implementation in FPGA
CBR MPMC top-level diagram
- Instantiation-time configurable number of ports
- Run-time programmable priority and bandwidth
Implementation in FPGA
Credit calculation circuit
Sorting Network and CBR
Implementation Cost
8-port CBR-MPMC with a 16-deep DebtQ
- Xilinx Virtex-5 XC5VLX50T
- Speedy DDR backend memory controller
Evaluation
Simulation framework
- Cycle-accurate C model of the MPMC
- Simple close-page DDR memory model
- Trace capturing and converting method
Evaluation
CPU workload trace file (from B. Jacob)
- Cache simulation on the standard SPEC2000 integer benchmarks
Irregular access pattern, low bandwidth requirement:
0.4 memory transactions per 1k instructions.
Evaluation
Accelerator workload
- ALPBench suite of parallel multimedia applications
Periodically repeated access pattern, high bandwidth requirement:
18.3 memory transactions per 1k instructions.
Results
BGPQ scheduler
- Latency: number of clock cycles
- Bandwidth: number of memory transactions per 1k clock cycles
Results
CBR scheduler with a 16-deep DebtQ
Impact of DebtQ Size
Repay conditions:
- DebtQ is full
- Q0 is empty
[Diagram: queues Q0, Q1, Q2 with bandwidth shares 10%, 20%, 70% and credits Q0 = 0.6, Q1 = 0.0, Q2 = 0.4 feed a multiplexer controlled by the CBR scheduler, which holds the DebtQ; priority Q0 > Q1 > Q2]
When the DebtQ is full, the remaining requests in Q0 will not be served with the minimum latency guarantee!
Impact of DebtQ Size
How big is big enough for the DebtQ?
- Determined by the instantaneous bandwidth requirement
Irregular access pattern means:
- Large range of DebtQ size requirements
Tradeoff
- Resource efficiency vs. performance
Results
Impact of debt queue size
Conclusions
The CBR scheduler can provide minimum bandwidth and latency guarantees
Low implementation cost and power consumption
We expect its successful use in a wide range of multimedia applications
Questions?