from chaos to qos: case studies in cmp resource...

From Chaos to QoS: Case Studies in From Chaos to QoS: Case Studies in

CMP Resource ManagementCMP Resource Management

North Carolina State University

Stanford UniversityIntel Corporation

Hari Kannan, Fei Guo, Li Zhao, Ramesh Illikkal, Ravi Iyer, Don Newell, Yan Solihin, Christos Kozyrakis

2

Outline

• Motivation and Problem Statement

• QoS in Resource Management

– Platform QoS

– Case Study: Cache Resource

• Experiments and Evaluation

– Prototype Implementation

– Initial Results

• Summary

3

Motivation

• More cores are integrated on the die

– Multi-tasking becomes more common: multiple applications are running simultaneously

– Virtualized workloads become mainstream: multiple VMs are consolidated onto the same platform

• Problems in platform resource management

– Loss of efficiency

• Disparate Behavior or Disparate Resource Usage of simultaneously-running applications/VMs

– No fairness or determinism guarantees

• Cache space or memory bandwidth available is non-deterministic

– No prioritization

• Cannot map priority level to platform resource allocation

4

Problem Statement

•Not all applications are equal - Users have preferences– End-users (client) want to treat foreground preferentially– End-users (server) want to provide service differentiation

• Priority-based OS scheduling no longer sufficient– With more cores, OS will allow high and low priority

applications to run simultaneously– Low priority applications will steal platform resources from

high priority apps � loss in performance & user experience

• Platform has no support for application differentiation– No knowledge of user preferences– No support for preferential treatment

5

Example of Resource Sharing Impact

• Choose art as high priority and iperf as low priority

• High priority application suffers 10X more cache misses– Iperf has poor cache locality thus thrashes the cache

• Need for the platform to comprehend the priority of applicationsso that it can allocate hardware resources accordingly

0%

20%

40%

60%

80%

100%

120%

Art (alone) Art+Iperf

Normalized occupancy (%)

0

2

4

6

8

10

12

Normalized MPI

Art-occupancy Art-MPI

Investigate QoS policies and mechanisms to efficiently managethese shared resources in the presence of disparate applications.

6

Outline



– Platform QoS




– Initial Results

• Summary

7

Resource Management

Capitalist

• No management of resources

• If you can generate more requests, you will use more resources

• Grab as you will

• E.g. All of today’s policies

• Fair distribution of resources

• Give equal share of resources to all executing threads

• Does not necessarily guarantee equal performance

• E.g. Partitioning resources for fairness and isolation

Communist/Fair

• Focus on individual efficiency

• Provide more performance and resources to the VIP

• Limited resources to non-VIP

• E.g. Service Level Agreements, Foreground/Background

Elitist

• Focus on overall efficiency

• Give more resource to those that need it the most, less to others

• E.g. Cache-friendly vs. Unfriendly, resource-aware scheduling

Utilitarian

8

Platform QoS

SoftwareDomain

HardwareDomain

HWPolicies

Resource Monitoring

ResourceEnforcement

QoSExposure

Feedback

QoS Hints viaArchitectural Interface

Memory IOCPU Core

Cache

QoS Enabled Resources

uArch resourceusage

Cache Space

Bandwidthand Latency

I/O Response Time

Case Study

9

Platform PrioritySent through PQR

QoS Aware OS/VMM:Platform Priority added

to App state

QoS Exposure:QoS Aware OS/VMMPlatform QoS Register

CacheCacheHi LoHi Lo

LowPriority

OSOS

High Priority

Set Application’sPlatform Priority

IOIO

MemoryMemory

QOS Interface

Application PlatformPriority

App State

App State

App State

App State

Resource Monitoring:

Monitor cache utilization per priority level

- Tag cache requests

- Count usage per priority

Resource Enforcement:Enforce cache utilization for priority levels

- Way Partitioning- Capacity Partitioning

Core 0 Core 1

Platform QOS Register

Requests tagged with Priority

QoS Aware Architecture

Expose QoSInterface

Cache

10

Cache QoS Polices & Metrics

• Static– N priority levels supported in platform

– Set a threshold of cache usage for each priority level

• Dynamic– Monitor cache space usage &

performance metric at frequent intervals

– Dynamically Adjust based on

• QoS Targets: The extent to which high priority application should be improved

• QoS Constraints: The allowable degradation to low priority application or the overall performance

– Metrics

• Resource performance (Miss Rate or MPI, etc)

• Overall performance (IPC, etc)

Performance

High Priority Low Priority Overall

Dedicated Mode

Shared Mode

QoSQoS TargetTarget

Shared Mode

Dedicated ModeDedicated Mode

QoSQoS ConstraintConstraint

High priorityHigh priority

QoSQoS TargetTarget

Low priorityLow priority

QoSQoS ConstraintConstraint

Shared Mode

L2High Priority

Data

Mid Priority Data

L1

CORE 0

L1

CORE 1

APP1(High) APP2(Mid)

L1

CORE 2

APP3(Low)

Low

11

Outline



– Platform QoS




– Initial Results

• Summary

12

Prototype -- QoS aware OS/VMM

•Add QoS bits -- indicate the priority level of the application

•Set QoS register in the platform

– Special I/O instruction during each context switch

•Priority level management

Linux Kernel

Add QoS bits in process’s state data structure

…

Modify OS scheduler to set QoS register

Add sys_getQRand sys_setQRsystem calls as API for users

QoS utility

Add a tool to query/set QoS bits in user

level

App0 AppN…

13

Simulation Framework

• Software Prototype– QoS-enabled OS: Fedora Core 5 Linux– QoS-enabled VMM: Xen with Suse 9.1 Linux

• Full system simulation– Employ SoftSDV to functionally model the architecture– Employ Casper as a cache simulator

• Modified to support monitoring and cache space allocation using static/dynamic QoS policies

SoftSDV

Functional

CPU Model

Performance

Cache Model

QoS-Enabled Linux OS

( Fedora Core 5 )

APP1 APPn….

QoS-Enabled VMM

( Xen )

OS

App1 App2

VM1

OS

App1 App2

VM2

Disk image

APIs

S/WPrototype

H/WSimulation

14

Evaluation Setup

• Simulation configuration

• Applications– QoS aware OS simulation• Spec2000 benchmarks (gcc, ammp, art, applu, mcf, mesa)

– QoS aware VMM simulation• Spec2000 benchmarks (art, swim, mesa, bzip2)• Networking benchmark (Iperf)

1/2/4Cores

Unified, 256/512/2048/4096/8192 KB, 16 way, 64B line

L2 (Shared)

Unified, 32KB, 16 way, 64B line, LRUL1 (Private)

ValuesParameters

15

Static Policy on Two-core CMP

0

0.5

1

1.5

2

2.5

100% 40% 30% 20% 10%

Cache Space that Ammp can consume

Normalized MPI

0%

10%

20%

30%

40%

50%

60%

70%

80%

90%

100%

Average Cache Occupancy (%)

Gcc-space Ammp-spaceGcc-MPI Ammp-MPI

0

0.5

1

1.5

2

2.5

3

3.5

4

4.5

5

100% 40% 30% 20% 10%

Cache Space that Gcc can consume

Normalized MPI

0%

10%

20%

30%

40%

50%

60%

70%

80%

90%

100%


Ammp-space Gcc-spaceAmmp-MPI Gcc-MPI

– As we reduce the cache space available for ammp, MPI for gcc is reduced

– Gcc benefits more from cache QoS than ammp

• Probably because gcc is more sensitive to the cache size around this size range

– Low priority application exceeds its threshold

• Sharing between applications

• Capacity partitioning

– Try to find a victim of low priority if it exceeds its threshold

– Replace a high priority cache line if set is full of high priority cache lines

512K 512K

16

Static Policy on Four-core CMP

Set threshold for high, mid and low as 100%, 10%, 10%, 0%

0

0.5

1

1.5

2

2.5

3

3.5

4

Applu Art Gcc Mcf

Normalized MPI

Shared

Prioritized

High

%

Mid Mid Low

0%

20%

40%

60%

80%

100%

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21Time interval

Cache Occupancy (%)

applu art gcc mcf

AVE

0%

20%

40%

60%

80%

100%

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21

Time interval

Cache Occupancy (%)

applu art gcc mcf

AVE

– MPI for applu is reduced by 33%, MPI for art, gcc and mcfincreases by 21%, 150% and 216% respectively

– Employing cache QoS can efficiently assign a deterministic amount of cache space to various applications.

1M

17

Static Policy on Two-VM CMP

•Iperf (low pri) in Xen’s Domain 0, art (high pri) in Domain 1

•4M cache– MPI for art is reduced significantly

•2M cache– MPI for art is reduced lineally

– This is at the cost of iperf performance

– The overall MPI increases significantly when iperf is limited to 10% of the 2M cache --> resource management of small caches could adversely affect the system’s performance if the lower priority application is not allocated a minimum amount of cache space

0

0.2

0.4

0.6

0.8

1

1.2

1.4

1.6

1.8

2

100% 40% 30% 20% 10%

Cache Space that Iperf can consume

Normalized MPII

0%

10%

20%

30%

40%

50%

60%

70%

80%

90%

100%


Art-occupancy Iperf-occupancyArt-MPI Iperf-MPI

0

0.2

0.4

0.6

0.8

1

1.2

1.4

1.6

1.8

2

100% 40% 30% 20% 10%

Cache Space that Ipe rf can consume

Normalized MPII

0%

10%

20%

30%

40%

50%

60%

70%

80%

90%

100%


Art-occupancy Iperf-occupancyArt-MPI Iperf-MPI

4M 2M

18

Static Policy on Four-VM CMP

– Set threshold for high, mid, low as 100%, 20%, 20%, 10%

– MPI for art is reduced by 45%

– MPI for mesa is reduced by 10% because of the decreased interference from iperf

– Bzip2 gets some degradation

– Iperf sees more than 2x degradation

0

0.5

1

1.5

2

2.5

Art Mesa Bzip2 Iperf

Normalized MPI Shared

Prioritized

High LowMidMid

4M

19

Comparison of Various Policies

– art as high priority and iperf as low priority

– Shared mode: without prioritization

– Fair mode: each application occupies half of the cache

– Static QoS mode: set threshold for iperf as 10%

– Dynamic QoS mode: QoS target, MPI = 0.5 of the shared mode

– Dedicated mode: application occupies the whole cache

Static and dynamic QoS are efficient to approach the performanceimprovement bound (the dedicated mode)

0

0.5

1

1.5

2

2.5

Art Iperf

Normalized MPI

SharedFairStatic QoSDynamic QoSDedicated

20

Outline



– Platform QoS




– Initial Results

• Summary

21

Summary

•Motivate QoS-aware platform by showing case studies of CMP cache resource management

•Showed that it is important to provide better determinism in theplatform that supports multi-tasking and virtualization

•Described QoS aware architecture and QoS policies

•Developed two software prototypes (QoS aware Linux and QoS aware Xen)

•Simulation results have shown that these techniques efficiently manage the platform resources towards better performance for high priority level applications

22

Future Works

• Experiment more with dynamic QoS mechanism

– Detailed Specification of QoS targets & constraints

– Other algorithms for dynamic schemes

• Study the impact of QoS on more diverse applications (servers, VMs, etc)

• Generalize QoS for all other CMP resources– Core, Memory, Interconnect, I/O, etc

23

Thank You!

24

Related Work

• ICS 2004– Iyer, “CQoS: A Framework for Enabling QoS in Shared Caches of CMP

Platforms”

• PACT 2004– Kim, Chandra, Solihin, “Fair Cache Sharing and Partitioning in a Chip

Multiprocessor Architecture”

• PACT 2006– Hsu, Reinhardt, Iyer, Makineni, Newell, “Capitalist, Communist and Utilitarian

Policies: Shared Cache as a CMP Resource”

– Rafique, Lim, Thottethodi, “Architectural support for OS-driven CMP cache management”

• MICRO 2006– Qureshi, Patt, “Utility-Based Cache Partitioning: A Low-Overhead, High-

Performance, Runtime Mechanism to Partition Shared Caches”

– Nesbit, Aggarwal, Laudon, Smith, “Fair Queuing CMP Memory Systems”

– Keshavan, et al, “Molecular Caches”

from chaos to qos: case studies in cmp resource...

Documents