multicore shared memory interference analysis through hardware … · 2020. 3. 13. · luca...

21
GEN-F178-3 (GEN-SCI-029) MULTICORE SHARED MEMORY INTERFERENCE ANALYSIS THROUGH HARDWARE PERFORMANCE COUNTERS Alfonso Mascareñas González Youcef Bouchebaba Luca Santinelli

Upload: others

Post on 17-Mar-2021

6 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Multicore shared memory interference analysis through hardware … · 2020. 3. 13. · Luca Santinelli. PLAN 1. Objectives 2. Background 3. Multicore device 4. Measurement framework

GEN-F178-3 (GEN-SCI-029)

MULTICORE SHARED MEMORY INTERFERENCE ANALYSIS THROUGHHARDWARE PERFORMANCE COUNTERS

Alfonso Mascareñas GonzálezYoucef Bouchebaba

Luca Santinelli

Page 2: Multicore shared memory interference analysis through hardware … · 2020. 3. 13. · Luca Santinelli. PLAN 1. Objectives 2. Background 3. Multicore device 4. Measurement framework

PLAN

1. Objectives

2. Background

3. Multicore device

4. Measurement framework

5. Task design

6. Statistical application

7. Results

8. Conclusions

2

Page 3: Multicore shared memory interference analysis through hardware … · 2020. 3. 13. · Luca Santinelli. PLAN 1. Objectives 2. Background 3. Multicore device 4. Measurement framework

OBJECTIVES

• Design and validate a Performance Monitor Hardware measurement based framework

• Analyze memory interference within a multicore system

• Check the pWCET applicability on the obtained results

3

Page 4: Multicore shared memory interference analysis through hardware … · 2020. 3. 13. · Luca Santinelli. PLAN 1. Objectives 2. Background 3. Multicore device 4. Measurement framework

BACKGROUND

• Critical application: Meet timing conditions

• Single core vs Multicore processor systems

• Multicore systems

+ Throughput

+ SWaP (Size, Weight and Power)

- Predictability: Interference within the whole platform increases

• Timing analysis: Tasks Worst Case Execution Time (WCET) to Tasks probabilistic WCET (pWCET)

4

Memory

Core

Cache

Interconnection

Core 1

Cache

Core 2

Cache

Shared cache

Interconnection

Memory

Page 5: Multicore shared memory interference analysis through hardware … · 2020. 3. 13. · Luca Santinelli. PLAN 1. Objectives 2. Background 3. Multicore device 4. Measurement framework

MULTICORE DEVICE: OVERVIEW

Keystone II TCI6630K2L

• 2 ARM cores @ 1.2GHz

• 4 DSP cores @ 1.2GHz

• L1, L2 cache memories

• MSM SRAM and DDR3 memories

5

Page 6: Multicore shared memory interference analysis through hardware … · 2020. 3. 13. · Luca Santinelli. PLAN 1. Objectives 2. Background 3. Multicore device 4. Measurement framework

MULTICORE DEVICE: MEMORY ORGANIZATION

ARM1

32KB L1P 32KB L1D

ARM2

32KB L1P 32KB L1D

1MB L2 DSP3

32KB L1P 32KB L1D

1MB L2

DSP1

32KB L1P 32KB L1D

1MB L2

DSP4

32KB L1P 32KB L1D

1MB L2

DSP2

32KB L1P 32KB L1D

1MB L2

2MB MSM

2GB DDR

6

Page 7: Multicore shared memory interference analysis through hardware … · 2020. 3. 13. · Luca Santinelli. PLAN 1. Objectives 2. Background 3. Multicore device 4. Measurement framework

MEASUREMENT FRAMEWORK

• Performance Monitor Hardware (PMH):

• Coprocessors

• Performance Monitor Unit (PMU): 6 general counters + 1 cycle specific counter

• Start-read access pattern:

1. Selection of the counter

2. Selection of the event

3. Enable counter

4. Reset counter

5. Read actual counter value (first time)

6. Run critical task

7. Read actual counter value (second time) and make the difference

Events (~ 80)

L1 data cache refill

L1 data cache access

Mispredicted branch speculatively executed

Execution cycles

L2 data cache access

L2 data cache refill

L2 data cache Write-Back

Bus access

Data memory access

7

Page 8: Multicore shared memory interference analysis through hardware … · 2020. 3. 13. · Luca Santinelli. PLAN 1. Objectives 2. Background 3. Multicore device 4. Measurement framework

TASKS DESIGN

• The real-time applications:

• Critical task: The one under observation. Three stressing levels to choose (safety1, safety2, safety3)

• Non-critical tasks: Act as memory stressing source

LoopsSimple operationsMatrices: Main memory demanding source

Tasks are continuously being executed. They are structured as follows:

▪ Critical task in 1 ARM

▪ Non-critical task in 1 ARM and 4 DSPs

ARMs are managed by PikeOS

DSPs are fully bare metal

8

Page 9: Multicore shared memory interference analysis through hardware … · 2020. 3. 13. · Luca Santinelli. PLAN 1. Objectives 2. Background 3. Multicore device 4. Measurement framework

STATISTICAL APPLICATION: pWCET & EVT

MBPTA = Measurement-Based Probabilistic Timing Analysis

MBTA = Measurement-Based Timing Analysis

EVT = Extreme Value Theorem

MBTAMeasures EVT

Relative WCET

MBPTA

Hypothesis to fulfill:

1. Stationarity

2. Short or Long range independence

3. Maximum Domain of Attraction (MDA)

Relative pWCET

9

Page 10: Multicore shared memory interference analysis through hardware … · 2020. 3. 13. · Luca Santinelli. PLAN 1. Objectives 2. Background 3. Multicore device 4. Measurement framework

SCENARIOS: DESIGN

Four possible scenarios:

1. Critical task analysis

2. Critical task + ARM non-critical task analysis

3. Critical task + DSPs non-critical task analysis

4. Critical task + ARM and DSPs non-critical task analysis

10

ARMCritical

Task

ARMCritical

Task

ARMNon-

criticalTask

Scenario1

Scenario2

ARMCritical

Task

DSPNon-critical

Task

Scenario3DSP

Non-criticalTask

DSPNon-critical

Task

DSPNon-critical

Task

ARMCritical

Task

DSPNon-critical

Task

Scenario4DSP

Non-criticalTask

DSPNon-critical

Task

DSPNon-critical

Task

ARMNon-

criticalTask

Page 11: Multicore shared memory interference analysis through hardware … · 2020. 3. 13. · Luca Santinelli. PLAN 1. Objectives 2. Background 3. Multicore device 4. Measurement framework

SCENARIO 1 RESULTS: EXECUTION CYCLES (SAFETY1)

Memory usage = 32KB

Memory usage = 128KB

11

38498 L1-L2

L2

139000 cycles

140957

36000 cycles

Page 12: Multicore shared memory interference analysis through hardware … · 2020. 3. 13. · Luca Santinelli. PLAN 1. Objectives 2. Background 3. Multicore device 4. Measurement framework

SCENARIO 1 RESULTS: EXECUTION CYCLES (SAFETY1)

Memory usage = 512KB

Memory usage = 2MB

12

L2

DDR

542626 cycles

Page 13: Multicore shared memory interference analysis through hardware … · 2020. 3. 13. · Luca Santinelli. PLAN 1. Objectives 2. Background 3. Multicore device 4. Measurement framework

13

Safety1 Safety3

SCENARIO 1 SUMMARY: EXECUTION CYCLES

Page 14: Multicore shared memory interference analysis through hardware … · 2020. 3. 13. · Luca Santinelli. PLAN 1. Objectives 2. Background 3. Multicore device 4. Measurement framework

SCENARIO 2 RESULTS: EXECUTION CYCLES (SAFETY1)

Memory usage = 128KB

Memory usage = 2MB

Non-critical task memory usage = 2MB

14

DDR

DDR

Page 15: Multicore shared memory interference analysis through hardware … · 2020. 3. 13. · Luca Santinelli. PLAN 1. Objectives 2. Background 3. Multicore device 4. Measurement framework

Safety1

Memory Size (KB) Mean Overhead (%) Max Overhead (%)

8 0,185 11,241

32 7,362 13,735

128 21,228 45,112

512 10,72 23,481

2048 4,091 4,363

Non-critical task memory usage = 2MB

15

SCENARIO 2 SUMMARY: EXECUTION CYCLES

Page 16: Multicore shared memory interference analysis through hardware … · 2020. 3. 13. · Luca Santinelli. PLAN 1. Objectives 2. Background 3. Multicore device 4. Measurement framework

SCENARIO 3 RESULTS: EXECUTION CYCLES (SAFETY1)

Memory usage = 8MB

Memory usage = 8MB

Non-critical task memory usage = 12MB

0 DSPs

1 DSPs16

Page 17: Multicore shared memory interference analysis through hardware … · 2020. 3. 13. · Luca Santinelli. PLAN 1. Objectives 2. Background 3. Multicore device 4. Measurement framework

SCENARIO 3 RESULTS: EXECUTION CYCLES (SAFETY1)

Memory usage = 8MB

Memory usage = 8MB

Non-critical task memory usage = 12MB

2 DSPs

3 DSPs17

Page 18: Multicore shared memory interference analysis through hardware … · 2020. 3. 13. · Luca Santinelli. PLAN 1. Objectives 2. Background 3. Multicore device 4. Measurement framework

SCENARIO 3 RESULTS: EXECUTION CYCLES (SAFETY1)

Memory usage = 8MB

Non-critical task memory usage = 12MB

4 DSPs

18

Page 19: Multicore shared memory interference analysis through hardware … · 2020. 3. 13. · Luca Santinelli. PLAN 1. Objectives 2. Background 3. Multicore device 4. Measurement framework

Safety1

Cores Mean Overhead (%) Max Overhead (%)

ARM 0 0

ARM + 1DSP 0,299 0,363

ARM + 2DSP 0,659 1,105

ARM + 3DSP 1,854 5,769

ARM + 4DSP 4,514 14,991

19

Data caches have been turned off

SCENARIO 3 SUMMARY: EXECUTION CYCLES

Page 20: Multicore shared memory interference analysis through hardware … · 2020. 3. 13. · Luca Santinelli. PLAN 1. Objectives 2. Background 3. Multicore device 4. Measurement framework

PREDICTABILITY

EVT application to the different scenarios

• Hypothesis check

• Inverse Cumulative Distribution Function (ICDF)

• Pay attention to its convergence

20

Memory usage = 128KB

Page 21: Multicore shared memory interference analysis through hardware … · 2020. 3. 13. · Luca Santinelli. PLAN 1. Objectives 2. Background 3. Multicore device 4. Measurement framework

CONCLUSIONS

• Measurements based on Performance Monitor Hardware successfully works

• The EVT can successfully predict the outcome

• The best placement strategy is:

1. The critical task in one ARM core

2. Non-critical tasks in the DSPs (Resource accessing arbitration may be used if needed)

3. Non-critical tasks in the second ARM (main interference source)

21