Applying Control Theory to the Caches of Multiprocessors Department of EECS University of Tennessee, Knoxville Kai Ma

Post on 16-Jan-2016

Applying Control Theory to the Caches of Multiprocessors

Department of EECS, University of Tennessee, Knoxville

Kai Ma

Applying Control Theory to the Caches of Multiprocessors

The shared L2 cache is one of the most important shared on-chip resources.
• Largest consumer of area and leakage power
• One of the dominant factors in performance

Two papers:
• Relative Cache Latency Control for Performance Differentiations in Power-Constrained Chip Multiprocessors
• SHARP Control: Controlled Shared Cache Management in Chip Multiprocessors

Relative Cache Latency Control for Performance Differentiations in Power-Constrained Chip Multiprocessors

Department of EECS, University of Tennessee, Knoxville

Xiaorui Wang, Kai Ma, Yefu Wang

Background

NUCA (Non-Uniform Cache Architecture)

Key idea: Different cache banks have different access latencies.

Introduction

• The power consumed by the cache needs to be constrained.
• Under a power constraint, the performance of the caches still needs to be guaranteed.
• Why control relative latency (the ratio between the average cache access latencies of two threads)?
  1. Accelerate critical threads
  2. Reduce priority inversion
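As a rough illustration of the controlled variable, the sketch below computes the relative latency of two threads from per-bank latencies and access counts. The bank latencies, access counts, and function names are hypothetical and only serve to show the ratio; they are not taken from the paper.

```python
def average_latency(bank_latency, accesses):
    """Access-weighted mean latency (cycles) over the banks a thread uses."""
    total = sum(accesses)
    if total == 0:
        raise ValueError("thread made no cache accesses")
    return sum(l * n for l, n in zip(bank_latency, accesses)) / total


def relative_latency(avg_a, avg_b):
    """Ratio between the average cache access latencies of two threads."""
    return avg_a / avg_b


# Hypothetical NUCA banks: nearer banks are faster than farther ones.
lat_t0 = average_latency([10, 20], [300, 100])            # mostly near-bank hits
lat_t1 = average_latency([10, 20, 30], [100, 100, 200])   # many far-bank hits
ratio = relative_latency(lat_t1, lat_t0)
print(lat_t0, lat_t1, ratio)
```

A ratio above 1 here means the second thread sees slower cache accesses on average than the first, which is the quantity a set point would constrain.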

System Design

[Figure: system architecture. Threads 0 to 3 run on cores 0 to 3 and share the L2 cache, whose banks are each assigned to one of the four threads or left inactive. Relative latency control loop: latency monitors, relative latency controllers, and the cache resizing and partitioning modulator. Power control loop: power monitor and power controller.]

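Relative Latency Controller (RLC)

• PI (Proportional-Integral) controller
  • System modeling
  • Controller design
  • Control analysis

[Figure: feedback loop. The relative latency set point is compared with the relative latency (RL) measured on the shared L2 caches; the error drives the RLC, which outputs a new cache ratio. Disturbances: workload variation and total cache size variation. Example values in the figure: set point 1.5, measured relative latency 1.2, error 0.3, increase 0.2.]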

Relative Latency Model

RL model:

$$l_i(k) = \sum_{j=1}^{n_1} a_j\, l_i(k-j) + \sum_{j=1}^{n_2} b_j\, c_i(k-j)$$

• $l_i(k)$ is the relative latency between the $i$-th and $(i+1)$-th cores.
• $c_i(k)$ is the cache size ratio between the $i$-th and $(i+1)$-th cores.

System identification
• Model orders: $n_1, n_2$
• Parameters: $a_j, b_j$

Model Orders and Error

            n_1 = 0   n_1 = 1   n_1 = 2
n_2 = 1     0.25      0.17      0.17
n_2 = 2     0.22      0.17      0.17
n_2 = 3     0.18      0.15      0.15
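To make the system identification step concrete, here is a minimal sketch that fits the first-order case of the model, l(k) = a1·l(k-1) + b1·c(k-1), by ordinary least squares. The traces are synthetic, generated from hypothetical parameters (0.5 and 0.4); the slides do not state which estimator or data the authors used, so this is only one plausible way to obtain the parameters.

```python
def fit_first_order(l, c):
    """Least-squares fit of l(k) = a1*l(k-1) + b1*c(k-1) from measured traces."""
    x1 = l[:-1]            # regressor l(k-1)
    x2 = c[:len(l) - 1]    # regressor c(k-1)
    y = l[1:]              # target l(k)
    # Entries of the 2x2 normal equations.
    s11 = sum(u * u for u in x1)
    s12 = sum(u * v for u, v in zip(x1, x2))
    s22 = sum(v * v for v in x2)
    t1 = sum(u * w for u, w in zip(x1, y))
    t2 = sum(v * w for v, w in zip(x2, y))
    det = s11 * s22 - s12 * s12
    if abs(det) < 1e-12:
        raise ValueError("input does not excite the system enough")
    a1 = (t1 * s22 - t2 * s12) / det
    b1 = (t2 * s11 - t1 * s12) / det
    return a1, b1


# Synthetic, noise-free traces from hypothetical parameters.
true_a, true_b = 0.5, 0.4
c_trace = [1.0, 1.2, 0.8, 1.5, 1.1, 0.9, 1.3, 1.0]
l_trace = [1.0]
for ck in c_trace:
    l_trace.append(true_a * l_trace[-1] + true_b * ck)

a1, b1 = fit_first_order(l_trace, c_trace)
print(a1, b1)
```

Higher model orders would add more regressors; comparing the fitting error across orders is what a table of model orders and errors summarizes.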

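Controller Design

• PID controller
  • Proportional: $K_P\, e(k)$
  • Integral: $K_I \sum e(k)$
• Design: Root Locus

$$c_i(k) = c_i(k-1) + K_1\, e(k) - K_2\, e(k-1)$$

[Figure: feedback loop. The relative latency set point minus the measured relative latency gives the error $e(k)$; the controller outputs the new cache ratio to the shared L2 caches.]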
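The incremental PI law on this slide can be sketched as a short simulation. The version below assumes the usual velocity form, c(k) = c(k-1) + K1·e(k) - K2·e(k-1) with K1 = K_P + K_I and K2 = K_P, and drives a first-order plant l(k) = a1·l(k-1) + b1·c(k-1). All numeric values (plant parameters, gains, set point) are hypothetical and chosen only so that the loop is stable.

```python
def simulate_pi(set_point, a1, b1, kp, ki, steps=60):
    """Simulate the relative latency loop and return the latency trace."""
    k1, k2 = kp + ki, kp      # incremental-form gains
    l, c = 1.0, 1.0           # initial relative latency and cache ratio
    e_prev = 0.0
    history = []
    for _ in range(steps):
        l = a1 * l + b1 * c               # plant: l(k) = a1*l(k-1) + b1*c(k-1)
        e = set_point - l                 # error e(k)
        c = c + k1 * e - k2 * e_prev      # c(k) = c(k-1) + K1*e(k) - K2*e(k-1)
        e_prev = e
        history.append(l)
    return history


# Hypothetical plant and gains.
trace = simulate_pi(set_point=1.5, a1=0.5, b1=0.4, kp=0.3, ki=0.5)
print(trace[-1])
```

The integral term is what removes the steady-state error: the cache ratio keeps changing until the measured relative latency matches the set point.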

Control Analysis

• Derive the transfer function of the controller.
• Derive the transfer function of the system with system model variations:

$$l_i(k) = a_1'\, l_i(k-1) + b_1\, c_i(k-1)$$

• Derive the transfer function of the closed-loop system and compute the poles.
• Stability range: $0.96 < a_1' < 1.81$
• The control period of the power control loop is selected to be longer than the settling time of the relative latency control loop.
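The pole computation can be sketched for the first-order model l(k) = a1'·l(k-1) + b1·c(k-1) combined with an incremental PI controller c(k) = c(k-1) + K1·e(k) - K2·e(k-1). Under these assumptions the closed-loop characteristic polynomial is z² + (b1·K1 - a1' - 1)·z + (a1' - b1·K2), and the loop is stable when both roots lie inside the unit circle. The gains and b1 below are hypothetical, so the resulting range is not expected to match the one reported on the slide.

```python
import cmath


def closed_loop_poles(a1, b1, kp, ki):
    """Roots of z^2 + (b1*K1 - a1 - 1)*z + (a1 - b1*K2), K1 = kp + ki, K2 = kp."""
    k1, k2 = kp + ki, kp
    p = b1 * k1 - a1 - 1.0
    q = a1 - b1 * k2
    root = cmath.sqrt(p * p - 4.0 * q)
    return (-p + root) / 2.0, (-p - root) / 2.0


def is_stable(a1, b1, kp, ki):
    """True when every closed-loop pole is strictly inside the unit circle."""
    return all(abs(z) < 1.0 for z in closed_loop_poles(a1, b1, kp, ki))


# Scan the actual plant parameter a1' with hypothetical b1 and gains.
B1, KP, KI = 0.4, 0.3, 0.5
stable_values = [a / 100 for a in range(-100, 200) if is_stable(a / 100, B1, KP, KI)]
print(min(stable_values), max(stable_values))
```

Scanning the varied parameter this way gives a numerical estimate of the stability range for one particular controller design.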

Power Controller

System model:

$$p(k) = c \cdot s(k) + d$$

• $s(k)$ is the total cache size in the $k$-th power control period.
• $p(k)$ is the cache power in the $k$-th power control period.
• $c, d$ are parameters that depend on the applications.
• Leakage power is proportional to the cache size.
• Leakage power accounts for the largest portion of cache power.

PI Controller

Controller analysis: $c' > 0$ and $c' < 0.67$
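A short derivation may help show why the analysis yields bounds on the actual model slope. It assumes the linear power model p(k) = c'·s(k) + d, where c' is the actual (possibly different from nominal) slope, and, as a simplification of the PI controller, a pure integral law with a hypothetical gain K > 0 and power set point P_s. The bound obtained here is therefore illustrative and is not the one reported by the authors.

```latex
\begin{align*}
s(k) &= s(k-1) + K\,\bigl(P_s - p(k-1)\bigr) && \text{(assumed integral control law)}\\
p(k) - p(k-1) &= c'\,\bigl(s(k) - s(k-1)\bigr) = c' K\,\bigl(P_s - p(k-1)\bigr)\\
p(k) - P_s &= (1 - c' K)\,\bigl(p(k-1) - P_s\bigr)\\
|1 - c' K| < 1 \;&\Longleftrightarrow\; 0 < c' < \frac{2}{K}
\end{align*}
```

The power error shrinks geometrically only while the closed-loop pole 1 - c'K stays inside the unit circle, which is why both a lower and an upper bound on c' appear.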

Simulation

• Simulator: SimpleScalar with NUCA cache (Alpha 21264-like core)
• Power reading
  • Dynamic part: Wattch (with CACTI)
  • Leakage part: HotLeakage
• Workload: selected workloads from SPEC2000
• Actuator: cache bank resizing and partitioning

[Figure: four layouts of the 16 cache banks, numbered 1 to 16.]

Single Control Evaluation

[Figures: RLC set point change; power controller set point change; workload switch (annotated "Switch workloads here"); total cache bank count change.]

Relative Latency & IPC

Coordination

Cache access latencies and IPC values of the four threads on the four cores of the CMP.

Cache access latencies and IPC values of the two threads on Core 0 and Core 1 for different benchmarks.

Conclusions

Relative Cache Latency Control for Performance Differentiations in Power-Constrained Chip Multiprocessors
• Simultaneously controls power and relative latency
• Achieves the desired performance differentiation
• Theoretically analyzes the stability of the single control loops and of the coordinated system

SHARP Control: Controlled Shared Cache Management in Chip Multiprocessors

Shekhar Srikantaiah, Mahmut Kandemir, *Qian Wang

Department of CSE

*Department of MNE

The Pennsylvania State University

Introduction

• Lack of control over shared on-chip resources
  • Degraded performance isolation
  • No Quality of Service (QoS) guarantee
• It is challenging to achieve high utilization while guaranteeing QoS.
  • Static/dynamic resource reservations may lead to low resource utilization.
  • Existing heuristic adjustments cannot provide theoretical guarantees such as “settling time” or “stability range”.

Contribution

• Two-layer, control-theory-based SHARP (SHAred Resource Partitioning) architecture
• Propose an empirical model
• Design a customized application controller (Reinforced Oscillation Resistant, ROR, controller)
• Study two policies that can be used in SHARP
  • SD (Service Differentiation)
  • FSI (Fair Speedup Improvement)

$$FS = \frac{N}{\sum_{i=1}^{N} \dfrac{IPC_{app_i}(base)}{IPC_{app_i}(scheme)}}$$
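The fair speedup (FS) metric is the harmonic mean of the per-application speedups over the baseline, FS = N / Σ (IPC_base / IPC_scheme). A minimal sketch follows; the function name and the IPC values are hypothetical.

```python
def fair_speedup(ipc_base, ipc_scheme):
    """FS = N / sum_i(IPC_base_i / IPC_scheme_i): harmonic mean of speedups."""
    if len(ipc_base) != len(ipc_scheme) or not ipc_base:
        raise ValueError("need one base and one scheme IPC per application")
    n = len(ipc_base)
    return n / sum(b / s for b, s in zip(ipc_base, ipc_scheme))


# Hypothetical IPCs: the first application doubles its IPC, the second is unchanged.
fs = fair_speedup([1.0, 2.0], [2.0, 2.0])
print(fs)
```

Because it is a harmonic mean, FS rewards schemes that speed up all applications rather than one at the expense of the others.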

System Design

Why not PID?

Disadvantages of the PID (Proportional-Integral-Derivative) controller:
• Painstaking to tune the parameters
• Hard to integrate with a hierarchical architecture
• Sensitive to model variation at run time
• Static parameters
• Generic controller (not problem-specific)
• Linear-model-based controller

Application Controller

Pre-Actuation Negotiator (PAN)

• Maps an over-demanded cache partition to a feasible partition
• Policies:
  • SD (Service Differentiation)
  • FSI (Fair Speedup Improvement)

$$w_i^{*} = \mathrm{floor}\!\left( w_i \left( 1 - \frac{spill}{\sum_{i=0}^{N} w_i} \right) \right)$$

$$spill = \sum_{i=0}^{N} w_i - W$$
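One way to read the negotiation step is as proportional scaling: when the requested ways exceed the W available ways by some spill, each request is shrunk by the same factor and rounded down. The sketch below implements that reading with integer arithmetic; the function name and the demand values are hypothetical, and the policy-specific handling (SD versus FSI) is not modeled.

```python
def negotiate(demands, total_ways):
    """Scale over-demanded way requests down to a feasible partition."""
    total = sum(demands)
    spill = total - total_ways
    if spill <= 0:
        return list(demands)  # already feasible, nothing to negotiate
    # floor(w * (1 - spill / total)) equals (w * total_ways) // total,
    # computed here with integers to avoid floating-point rounding.
    return [(w * total_ways) // total for w in demands]


# Hypothetical demands from three application controllers for an 8-way cache.
feasible = negotiate([6, 6, 4], 8)
print(feasible, sum(feasible))
```

Rounding down can leave a few ways unassigned, which is consistent with the deck's later point that set points are raised when ways are underutilized.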

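SHARP Controller

• Increases the IPC set points when cache ways are underutilized
• FSI & SD policies
• The proof of guaranteed optimal utilization

[Equation: set point update rule involving $P^{ref}$, $P^{out}$, $w_j(t)$, $W$, and sums over $j = 0, \dots, N$; the exact form is not recoverable from the transcript.]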

Experimental Setup

• Simulator: Simics (full-system simulator)
• Operating system: Solaris 10
• Configuration: 2 and 8 cores
• Workload: 6 mixes of applications selected from SPEC2000

Evaluation (Application Controller)

Long run results of PID controller and ROR controller

Evaluation (FSI)

SHARP vs Baselines

Evaluation (SD)

Adaptation of IPC with the SD policy using the ROR controllers.

Sensitivity & Scalability

Sensitivity analysis for different reference points

Scalability (8 cores)

Conclusion

SHARP Control: Controlled Shared Cache Management in Chip Multiprocessors
• Proposes and designs the SHARP control architecture for shared L2 caches
• Validates SHARP with different management policies (FSI or SD)
• Achieves the desired FS and SD specifications

Critiques (1)

How should the relative latency set point be chosen?

For the purpose of accelerating critical threads, parallel workloads may be more applicable.

Critiques (2)

No stability proof

Insufficient description of how to update the parameters of the application controllers

Comparison

Aspect | Relative latency control with the power constraint | SHARP control architecture
Goal | Guarantee NUCA L2 cache relative latency under different power budgets | Improve normal L2 cache utilization while guaranteeing the QoS metrics
Design | Two-layer hierarchical design | Two-layer hierarchical design
Controller | PID | ROR
Coordination & Stability | Yes | No
Actuator | Cache bank resizing and partitioning | Cache way resizing and partitioning
Evaluation | SimpleScalar | Simics

Q & A

Thank you

Backup Slides Start

Relative Controller Evaluation (2)

Application Controller Evaluation (2)

Guaranteed Optimal Utilization Proof

The coefficients indexed by $i$ (including $K_i$) are time-varying and depend on the applications.

[Derivation: a chain of equations relating $w_i(t)$, $P_i(t)$, $P_i^{ref}$, $P_i^{out}$, $K_i(t)$, and $W$, with sums over $i = 0, \dots, N$; the individual equations are not recoverable from the transcript.]

System Design