from chaos to qos: case studies in cmp resource...
TRANSCRIPT
From Chaos to QoS: Case Studies in From Chaos to QoS: Case Studies in
CMP Resource ManagementCMP Resource Management
North Carolina State University
Stanford UniversityIntel Corporation
Hari Kannan, Fei Guo, Li Zhao, Ramesh Illikkal, Ravi Iyer, Don Newell, Yan Solihin, Christos Kozyrakis
2
Outline
• Motivation and Problem Statement
• QoS in Resource Management
– Platform QoS
– Case Study: Cache Resource
• Experiments and Evaluation
– Prototype Implementation
– Initial Results
• Summary
3
Motivation
• More cores are integrated on the die
– Multi-tasking becomes more common: multiple applications are running simultaneously
– Virtualized workloads become mainstream: multiple VMs are consolidated onto the same platform
• Problems in platform resource management
– Loss of efficiency
• Disparate Behavior or Disparate Resource Usage of simultaneously-running applications/VMs
– No fairness or determinism guarantees
• Cache space or memory bandwidth available is non-deterministic
– No prioritization
• Cannot map priority level to platform resource allocation
4
Problem Statement
•Not all applications are equal - Users have preferences– End-users (client) want to treat foreground preferentially– End-users (server) want to provide service differentiation
• Priority-based OS scheduling no longer sufficient– With more cores, OS will allow high and low priority
applications to run simultaneously– Low priority applications will steal platform resources from
high priority apps � loss in performance & user experience
• Platform has no support for application differentiation– No knowledge of user preferences– No support for preferential treatment
5
Example of Resource Sharing Impact
• Choose art as high priority and iperf as low priority
• High priority application suffers 10X more cache misses– Iperf has poor cache locality thus thrashes the cache
• Need for the platform to comprehend the priority of applicationsso that it can allocate hardware resources accordingly
0%
20%
40%
60%
80%
100%
120%
Art (alone) Art+Iperf
Normalized occupancy (%)
0
2
4
6
8
10
12
Normalized MPI
Art-occupancy Art-MPI
Investigate QoS policies and mechanisms to efficiently managethese shared resources in the presence of disparate applications.
6
Outline
• Motivation and Problem Statement
• QoS in Resource Management
– Platform QoS
– Case Study: Cache Resource
• Experiments and Evaluation
– Prototype Implementation
– Initial Results
• Summary
7
Resource Management
Capitalist
• No management of resources
• If you can generate more requests, you will use more resources
• Grab as you will
• E.g. All of today’s policies
• Fair distribution of resources
• Give equal share of resources to all executing threads
• Does not necessarily guarantee equal performance
• E.g. Partitioning resources for fairness and isolation
Communist/Fair
• Focus on individual efficiency
• Provide more performance and resources to the VIP
• Limited resources to non-VIP
• E.g. Service Level Agreements, Foreground/Background
Elitist
• Focus on overall efficiency
• Give more resource to those that need it the most, less to others
• E.g. Cache-friendly vs. Unfriendly, resource-aware scheduling
Utilitarian
8
Platform QoS
SoftwareDomain
HardwareDomain
HWPolicies
Resource Monitoring
ResourceEnforcement
QoSExposure
Feedback
QoS Hints viaArchitectural Interface
Memory IOCPU Core
Cache
QoS Enabled Resources
uArch resourceusage
Cache Space
Bandwidthand Latency
I/O Response Time
Case Study
9
Platform PrioritySent through PQR
QoS Aware OS/VMM:Platform Priority added
to App state
QoS Exposure:QoS Aware OS/VMMPlatform QoS Register
CacheCacheHi LoHi Lo
LowPriority
OSOS
High Priority
Set Application’sPlatform Priority
IOIO
MemoryMemory
QOS Interface
Application PlatformPriority
App State
App State
App State
App State
Resource Monitoring:
Monitor cache utilization per priority level
- Tag cache requests
- Count usage per priority
Resource Enforcement:Enforce cache utilization for priority levels
- Way Partitioning- Capacity Partitioning
Core 0 Core 1
Platform QOS Register
Requests tagged with Priority
QoS Aware Architecture
Expose QoSInterface
Cache
10
Cache QoS Polices & Metrics
• Static– N priority levels supported in platform
– Set a threshold of cache usage for each priority level
• Dynamic– Monitor cache space usage &
performance metric at frequent intervals
– Dynamically Adjust based on
• QoS Targets: The extent to which high priority application should be improved
• QoS Constraints: The allowable degradation to low priority application or the overall performance
– Metrics
• Resource performance (Miss Rate or MPI, etc)
• Overall performance (IPC, etc)
Performance
High Priority Low Priority Overall
Dedicated Mode
Shared Mode
QoSQoS TargetTarget
Shared Mode
Dedicated ModeDedicated Mode
QoSQoS ConstraintConstraint
High priorityHigh priority
QoSQoS TargetTarget
Low priorityLow priority
QoSQoS ConstraintConstraint
Shared Mode
L2High Priority
Data
Mid Priority Data
L1
CORE 0
L1
CORE 1
APP1(High) APP2(Mid)
L1
CORE 2
APP3(Low)
Low
11
Outline
• Motivation and Problem Statement
• QoS in Resource Management
– Platform QoS
– Case Study: Cache Resource
• Experiments and Evaluation
– Prototype Implementation
– Initial Results
• Summary
12
Prototype -- QoS aware OS/VMM
•Add QoS bits -- indicate the priority level of the application
•Set QoS register in the platform
– Special I/O instruction during each context switch
•Priority level management
Linux Kernel
Add QoS bits in process’s state data structure
…
Modify OS scheduler to set QoS register
Add sys_getQRand sys_setQRsystem calls as API for users
QoS utility
Add a tool to query/set QoS bits in user
level
App0 AppN…
13
Simulation Framework
• Software Prototype– QoS-enabled OS: Fedora Core 5 Linux– QoS-enabled VMM: Xen with Suse 9.1 Linux
• Full system simulation– Employ SoftSDV to functionally model the architecture– Employ Casper as a cache simulator
• Modified to support monitoring and cache space allocation using static/dynamic QoS policies
SoftSDV
Functional
CPU Model
Performance
Cache Model
QoS-Enabled Linux OS
( Fedora Core 5 )
APP1 APPn….
QoS-Enabled VMM
( Xen )
OS
App1 App2
VM1
OS
App1 App2
VM2
Disk image
APIs
S/WPrototype
H/WSimulation
14
Evaluation Setup
• Simulation configuration
• Applications– QoS aware OS simulation• Spec2000 benchmarks (gcc, ammp, art, applu, mcf, mesa)
– QoS aware VMM simulation• Spec2000 benchmarks (art, swim, mesa, bzip2)• Networking benchmark (Iperf)
1/2/4Cores
Unified, 256/512/2048/4096/8192 KB, 16 way, 64B line
L2 (Shared)
Unified, 32KB, 16 way, 64B line, LRUL1 (Private)
ValuesParameters
15
Static Policy on Two-core CMP
0
0.5
1
1.5
2
2.5
100% 40% 30% 20% 10%
Cache Space that Ammp can consume
Normalized MPI
0%
10%
20%
30%
40%
50%
60%
70%
80%
90%
100%
Average Cache Occupancy (%)
Gcc-space Ammp-spaceGcc-MPI Ammp-MPI
0
0.5
1
1.5
2
2.5
3
3.5
4
4.5
5
100% 40% 30% 20% 10%
Cache Space that Gcc can consume
Normalized MPI
0%
10%
20%
30%
40%
50%
60%
70%
80%
90%
100%
Average Cache Occupancy (%)
Ammp-space Gcc-spaceAmmp-MPI Gcc-MPI
– As we reduce the cache space available for ammp, MPI for gcc is reduced
– Gcc benefits more from cache QoS than ammp
• Probably because gcc is more sensitive to the cache size around this size range
– Low priority application exceeds its threshold
• Sharing between applications
• Capacity partitioning
– Try to find a victim of low priority if it exceeds its threshold
– Replace a high priority cache line if set is full of high priority cache lines
512K 512K
16
Static Policy on Four-core CMP
Set threshold for high, mid and low as 100%, 10%, 10%, 0%
0
0.5
1
1.5
2
2.5
3
3.5
4
Applu Art Gcc Mcf
Normalized MPI
Shared
Prioritized
High
%
Mid Mid Low
0%
20%
40%
60%
80%
100%
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21Time interval
Cache Occupancy (%)
applu art gcc mcf
AVE
0%
20%
40%
60%
80%
100%
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21
Time interval
Cache Occupancy (%)
applu art gcc mcf
AVE
– MPI for applu is reduced by 33%, MPI for art, gcc and mcfincreases by 21%, 150% and 216% respectively
– Employing cache QoS can efficiently assign a deterministic amount of cache space to various applications.
1M
17
Static Policy on Two-VM CMP
•Iperf (low pri) in Xen’s Domain 0, art (high pri) in Domain 1
•4M cache– MPI for art is reduced significantly
•2M cache– MPI for art is reduced lineally
– This is at the cost of iperf performance
– The overall MPI increases significantly when iperf is limited to 10% of the 2M cache --> resource management of small caches could adversely affect the system’s performance if the lower priority application is not allocated a minimum amount of cache space
0
0.2
0.4
0.6
0.8
1
1.2
1.4
1.6
1.8
2
100% 40% 30% 20% 10%
Cache Space that Iperf can consume
Normalized MPII
0%
10%
20%
30%
40%
50%
60%
70%
80%
90%
100%
Average Cache Occupancy (%)
Art-occupancy Iperf-occupancyArt-MPI Iperf-MPI
0
0.2
0.4
0.6
0.8
1
1.2
1.4
1.6
1.8
2
100% 40% 30% 20% 10%
Cache Space that Ipe rf can consume
Normalized MPII
0%
10%
20%
30%
40%
50%
60%
70%
80%
90%
100%
Average Cache Occupancy (%)
Art-occupancy Iperf-occupancyArt-MPI Iperf-MPI
4M 2M
18
Static Policy on Four-VM CMP
– Set threshold for high, mid, low as 100%, 20%, 20%, 10%
– MPI for art is reduced by 45%
– MPI for mesa is reduced by 10% because of the decreased interference from iperf
– Bzip2 gets some degradation
– Iperf sees more than 2x degradation
0
0.5
1
1.5
2
2.5
Art Mesa Bzip2 Iperf
Normalized MPI Shared
Prioritized
High LowMidMid
4M
19
Comparison of Various Policies
– art as high priority and iperf as low priority
– Shared mode: without prioritization
– Fair mode: each application occupies half of the cache
– Static QoS mode: set threshold for iperf as 10%
– Dynamic QoS mode: QoS target, MPI = 0.5 of the shared mode
– Dedicated mode: application occupies the whole cache
Static and dynamic QoS are efficient to approach the performanceimprovement bound (the dedicated mode)
0
0.5
1
1.5
2
2.5
Art Iperf
Normalized MPI
SharedFairStatic QoSDynamic QoSDedicated
20
Outline
• Motivation and Problem Statement
• QoS in Resource Management
– Platform QoS
– Case Study: Cache Resource
• Experiments and Evaluation
– Prototype Implementation
– Initial Results
• Summary
21
Summary
•Motivate QoS-aware platform by showing case studies of CMP cache resource management
•Showed that it is important to provide better determinism in theplatform that supports multi-tasking and virtualization
•Described QoS aware architecture and QoS policies
•Developed two software prototypes (QoS aware Linux and QoS aware Xen)
•Simulation results have shown that these techniques efficiently manage the platform resources towards better performance for high priority level applications
22
Future Works
• Experiment more with dynamic QoS mechanism
– Detailed Specification of QoS targets & constraints
– Other algorithms for dynamic schemes
• Study the impact of QoS on more diverse applications (servers, VMs, etc)
• Generalize QoS for all other CMP resources– Core, Memory, Interconnect, I/O, etc
23
Thank You!
24
Related Work
• ICS 2004– Iyer, “CQoS: A Framework for Enabling QoS in Shared Caches of CMP
Platforms”
• PACT 2004– Kim, Chandra, Solihin, “Fair Cache Sharing and Partitioning in a Chip
Multiprocessor Architecture”
• PACT 2006– Hsu, Reinhardt, Iyer, Makineni, Newell, “Capitalist, Communist and Utilitarian
Policies: Shared Cache as a CMP Resource”
– Rafique, Lim, Thottethodi, “Architectural support for OS-driven CMP cache management”
• MICRO 2006– Qureshi, Patt, “Utility-Based Cache Partitioning: A Low-Overhead, High-
Performance, Runtime Mechanism to Partition Shared Caches”
– Nesbit, Aggarwal, Laudon, Smith, “Fair Queuing CMP Memory Systems”
– Keshavan, et al, “Molecular Caches”