multicore shared memory interference analysis through hardware … · 2020. 3. 13. · luca...
TRANSCRIPT
![Page 1: Multicore shared memory interference analysis through hardware … · 2020. 3. 13. · Luca Santinelli. PLAN 1. Objectives 2. Background 3. Multicore device 4. Measurement framework](https://reader035.vdocuments.net/reader035/viewer/2022071420/6119383cf67add51185d777d/html5/thumbnails/1.jpg)
GEN-F178-3 (GEN-SCI-029)
MULTICORE SHARED MEMORY INTERFERENCE ANALYSIS THROUGHHARDWARE PERFORMANCE COUNTERS
Alfonso Mascareñas GonzálezYoucef Bouchebaba
Luca Santinelli
![Page 2: Multicore shared memory interference analysis through hardware … · 2020. 3. 13. · Luca Santinelli. PLAN 1. Objectives 2. Background 3. Multicore device 4. Measurement framework](https://reader035.vdocuments.net/reader035/viewer/2022071420/6119383cf67add51185d777d/html5/thumbnails/2.jpg)
PLAN
1. Objectives
2. Background
3. Multicore device
4. Measurement framework
5. Task design
6. Statistical application
7. Results
8. Conclusions
2
![Page 3: Multicore shared memory interference analysis through hardware … · 2020. 3. 13. · Luca Santinelli. PLAN 1. Objectives 2. Background 3. Multicore device 4. Measurement framework](https://reader035.vdocuments.net/reader035/viewer/2022071420/6119383cf67add51185d777d/html5/thumbnails/3.jpg)
OBJECTIVES
• Design and validate a Performance Monitor Hardware measurement based framework
• Analyze memory interference within a multicore system
• Check the pWCET applicability on the obtained results
3
![Page 4: Multicore shared memory interference analysis through hardware … · 2020. 3. 13. · Luca Santinelli. PLAN 1. Objectives 2. Background 3. Multicore device 4. Measurement framework](https://reader035.vdocuments.net/reader035/viewer/2022071420/6119383cf67add51185d777d/html5/thumbnails/4.jpg)
BACKGROUND
• Critical application: Meet timing conditions
• Single core vs Multicore processor systems
• Multicore systems
+ Throughput
+ SWaP (Size, Weight and Power)
- Predictability: Interference within the whole platform increases
• Timing analysis: Tasks Worst Case Execution Time (WCET) to Tasks probabilistic WCET (pWCET)
4
Memory
Core
Cache
Interconnection
Core 1
Cache
Core 2
Cache
Shared cache
Interconnection
Memory
![Page 5: Multicore shared memory interference analysis through hardware … · 2020. 3. 13. · Luca Santinelli. PLAN 1. Objectives 2. Background 3. Multicore device 4. Measurement framework](https://reader035.vdocuments.net/reader035/viewer/2022071420/6119383cf67add51185d777d/html5/thumbnails/5.jpg)
MULTICORE DEVICE: OVERVIEW
Keystone II TCI6630K2L
• 2 ARM cores @ 1.2GHz
• 4 DSP cores @ 1.2GHz
• L1, L2 cache memories
• MSM SRAM and DDR3 memories
5
![Page 6: Multicore shared memory interference analysis through hardware … · 2020. 3. 13. · Luca Santinelli. PLAN 1. Objectives 2. Background 3. Multicore device 4. Measurement framework](https://reader035.vdocuments.net/reader035/viewer/2022071420/6119383cf67add51185d777d/html5/thumbnails/6.jpg)
MULTICORE DEVICE: MEMORY ORGANIZATION
ARM1
32KB L1P 32KB L1D
ARM2
32KB L1P 32KB L1D
1MB L2 DSP3
32KB L1P 32KB L1D
1MB L2
DSP1
32KB L1P 32KB L1D
1MB L2
DSP4
32KB L1P 32KB L1D
1MB L2
DSP2
32KB L1P 32KB L1D
1MB L2
2MB MSM
2GB DDR
6
![Page 7: Multicore shared memory interference analysis through hardware … · 2020. 3. 13. · Luca Santinelli. PLAN 1. Objectives 2. Background 3. Multicore device 4. Measurement framework](https://reader035.vdocuments.net/reader035/viewer/2022071420/6119383cf67add51185d777d/html5/thumbnails/7.jpg)
MEASUREMENT FRAMEWORK
• Performance Monitor Hardware (PMH):
• Coprocessors
• Performance Monitor Unit (PMU): 6 general counters + 1 cycle specific counter
• Start-read access pattern:
1. Selection of the counter
2. Selection of the event
3. Enable counter
4. Reset counter
5. Read actual counter value (first time)
6. Run critical task
7. Read actual counter value (second time) and make the difference
Events (~ 80)
L1 data cache refill
L1 data cache access
Mispredicted branch speculatively executed
Execution cycles
L2 data cache access
L2 data cache refill
L2 data cache Write-Back
Bus access
Data memory access
…
7
![Page 8: Multicore shared memory interference analysis through hardware … · 2020. 3. 13. · Luca Santinelli. PLAN 1. Objectives 2. Background 3. Multicore device 4. Measurement framework](https://reader035.vdocuments.net/reader035/viewer/2022071420/6119383cf67add51185d777d/html5/thumbnails/8.jpg)
TASKS DESIGN
• The real-time applications:
• Critical task: The one under observation. Three stressing levels to choose (safety1, safety2, safety3)
• Non-critical tasks: Act as memory stressing source
LoopsSimple operationsMatrices: Main memory demanding source
Tasks are continuously being executed. They are structured as follows:
▪ Critical task in 1 ARM
▪ Non-critical task in 1 ARM and 4 DSPs
ARMs are managed by PikeOS
DSPs are fully bare metal
8
![Page 9: Multicore shared memory interference analysis through hardware … · 2020. 3. 13. · Luca Santinelli. PLAN 1. Objectives 2. Background 3. Multicore device 4. Measurement framework](https://reader035.vdocuments.net/reader035/viewer/2022071420/6119383cf67add51185d777d/html5/thumbnails/9.jpg)
STATISTICAL APPLICATION: pWCET & EVT
MBPTA = Measurement-Based Probabilistic Timing Analysis
MBTA = Measurement-Based Timing Analysis
EVT = Extreme Value Theorem
MBTAMeasures EVT
Relative WCET
MBPTA
Hypothesis to fulfill:
1. Stationarity
2. Short or Long range independence
3. Maximum Domain of Attraction (MDA)
Relative pWCET
9
![Page 10: Multicore shared memory interference analysis through hardware … · 2020. 3. 13. · Luca Santinelli. PLAN 1. Objectives 2. Background 3. Multicore device 4. Measurement framework](https://reader035.vdocuments.net/reader035/viewer/2022071420/6119383cf67add51185d777d/html5/thumbnails/10.jpg)
SCENARIOS: DESIGN
Four possible scenarios:
1. Critical task analysis
2. Critical task + ARM non-critical task analysis
3. Critical task + DSPs non-critical task analysis
4. Critical task + ARM and DSPs non-critical task analysis
10
ARMCritical
Task
ARMCritical
Task
ARMNon-
criticalTask
Scenario1
Scenario2
ARMCritical
Task
DSPNon-critical
Task
Scenario3DSP
Non-criticalTask
DSPNon-critical
Task
DSPNon-critical
Task
ARMCritical
Task
DSPNon-critical
Task
Scenario4DSP
Non-criticalTask
DSPNon-critical
Task
DSPNon-critical
Task
ARMNon-
criticalTask
![Page 11: Multicore shared memory interference analysis through hardware … · 2020. 3. 13. · Luca Santinelli. PLAN 1. Objectives 2. Background 3. Multicore device 4. Measurement framework](https://reader035.vdocuments.net/reader035/viewer/2022071420/6119383cf67add51185d777d/html5/thumbnails/11.jpg)
SCENARIO 1 RESULTS: EXECUTION CYCLES (SAFETY1)
Memory usage = 32KB
Memory usage = 128KB
11
38498 L1-L2
L2
139000 cycles
140957
36000 cycles
![Page 12: Multicore shared memory interference analysis through hardware … · 2020. 3. 13. · Luca Santinelli. PLAN 1. Objectives 2. Background 3. Multicore device 4. Measurement framework](https://reader035.vdocuments.net/reader035/viewer/2022071420/6119383cf67add51185d777d/html5/thumbnails/12.jpg)
SCENARIO 1 RESULTS: EXECUTION CYCLES (SAFETY1)
Memory usage = 512KB
Memory usage = 2MB
12
L2
DDR
542626 cycles
![Page 13: Multicore shared memory interference analysis through hardware … · 2020. 3. 13. · Luca Santinelli. PLAN 1. Objectives 2. Background 3. Multicore device 4. Measurement framework](https://reader035.vdocuments.net/reader035/viewer/2022071420/6119383cf67add51185d777d/html5/thumbnails/13.jpg)
13
Safety1 Safety3
SCENARIO 1 SUMMARY: EXECUTION CYCLES
![Page 14: Multicore shared memory interference analysis through hardware … · 2020. 3. 13. · Luca Santinelli. PLAN 1. Objectives 2. Background 3. Multicore device 4. Measurement framework](https://reader035.vdocuments.net/reader035/viewer/2022071420/6119383cf67add51185d777d/html5/thumbnails/14.jpg)
SCENARIO 2 RESULTS: EXECUTION CYCLES (SAFETY1)
Memory usage = 128KB
Memory usage = 2MB
Non-critical task memory usage = 2MB
14
DDR
DDR
![Page 15: Multicore shared memory interference analysis through hardware … · 2020. 3. 13. · Luca Santinelli. PLAN 1. Objectives 2. Background 3. Multicore device 4. Measurement framework](https://reader035.vdocuments.net/reader035/viewer/2022071420/6119383cf67add51185d777d/html5/thumbnails/15.jpg)
Safety1
Memory Size (KB) Mean Overhead (%) Max Overhead (%)
8 0,185 11,241
32 7,362 13,735
128 21,228 45,112
512 10,72 23,481
2048 4,091 4,363
Non-critical task memory usage = 2MB
15
SCENARIO 2 SUMMARY: EXECUTION CYCLES
![Page 16: Multicore shared memory interference analysis through hardware … · 2020. 3. 13. · Luca Santinelli. PLAN 1. Objectives 2. Background 3. Multicore device 4. Measurement framework](https://reader035.vdocuments.net/reader035/viewer/2022071420/6119383cf67add51185d777d/html5/thumbnails/16.jpg)
SCENARIO 3 RESULTS: EXECUTION CYCLES (SAFETY1)
Memory usage = 8MB
Memory usage = 8MB
Non-critical task memory usage = 12MB
0 DSPs
1 DSPs16
![Page 17: Multicore shared memory interference analysis through hardware … · 2020. 3. 13. · Luca Santinelli. PLAN 1. Objectives 2. Background 3. Multicore device 4. Measurement framework](https://reader035.vdocuments.net/reader035/viewer/2022071420/6119383cf67add51185d777d/html5/thumbnails/17.jpg)
SCENARIO 3 RESULTS: EXECUTION CYCLES (SAFETY1)
Memory usage = 8MB
Memory usage = 8MB
Non-critical task memory usage = 12MB
2 DSPs
3 DSPs17
![Page 18: Multicore shared memory interference analysis through hardware … · 2020. 3. 13. · Luca Santinelli. PLAN 1. Objectives 2. Background 3. Multicore device 4. Measurement framework](https://reader035.vdocuments.net/reader035/viewer/2022071420/6119383cf67add51185d777d/html5/thumbnails/18.jpg)
SCENARIO 3 RESULTS: EXECUTION CYCLES (SAFETY1)
Memory usage = 8MB
Non-critical task memory usage = 12MB
4 DSPs
18
![Page 19: Multicore shared memory interference analysis through hardware … · 2020. 3. 13. · Luca Santinelli. PLAN 1. Objectives 2. Background 3. Multicore device 4. Measurement framework](https://reader035.vdocuments.net/reader035/viewer/2022071420/6119383cf67add51185d777d/html5/thumbnails/19.jpg)
Safety1
Cores Mean Overhead (%) Max Overhead (%)
ARM 0 0
ARM + 1DSP 0,299 0,363
ARM + 2DSP 0,659 1,105
ARM + 3DSP 1,854 5,769
ARM + 4DSP 4,514 14,991
19
Data caches have been turned off
SCENARIO 3 SUMMARY: EXECUTION CYCLES
![Page 20: Multicore shared memory interference analysis through hardware … · 2020. 3. 13. · Luca Santinelli. PLAN 1. Objectives 2. Background 3. Multicore device 4. Measurement framework](https://reader035.vdocuments.net/reader035/viewer/2022071420/6119383cf67add51185d777d/html5/thumbnails/20.jpg)
PREDICTABILITY
EVT application to the different scenarios
• Hypothesis check
• Inverse Cumulative Distribution Function (ICDF)
• Pay attention to its convergence
20
Memory usage = 128KB
![Page 21: Multicore shared memory interference analysis through hardware … · 2020. 3. 13. · Luca Santinelli. PLAN 1. Objectives 2. Background 3. Multicore device 4. Measurement framework](https://reader035.vdocuments.net/reader035/viewer/2022071420/6119383cf67add51185d777d/html5/thumbnails/21.jpg)
CONCLUSIONS
• Measurements based on Performance Monitor Hardware successfully works
• The EVT can successfully predict the outcome
• The best placement strategy is:
1. The critical task in one ARM core
2. Non-critical tasks in the DSPs (Resource accessing arbitration may be used if needed)
3. Non-critical tasks in the second ARM (main interference source)
21