Sabyasachi Ghosh Mark Redekopp Murali Annavaram Ming-Hsieh Department of EE USC http://usc.edu/dept/ee/scip KnightShift: Enhancing Energy Efficiency by Shifting the I/O Burden to a Management Processor

Post on 29-Mar-2015


Outline

• Datacenter energy concerns
• Direct-attached storage issues
• KnightShift solution
• IPMI
• Modifications to IPMI
• Trace description
• Results
• On-going work and conclusions

Datacenter Energy Concerns

• Datacenter energy costs are a key concern
• Common-case utilizations are very low, but not zero
• Servers are not energy efficient at low utilizations
• Consolidation and power-down are effective solutions
• Long wakeup latencies from shutdown/low-power modes are being mitigated
• Except that direct-attached storage (DAS) datacenters cannot benefit from consolidation

Direct-Attached Storage Architecture

• Data is distributed on disks attached to individual nodes
• Client requests arrive at a load balancer (1)
• Load balancer assigns the request to one node (2)
• Satisfying a request requires data from multiple nodes (3a)
− Each remote node gets the data request
− Remote nodes access their local disks (3b)
− Generate response to the requestor
• Requestor performs necessary computation on the consolidated data
• Sends a response to the client (4)
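
The request flow above can be sketched in a few lines of Python. This is only an illustrative toy, not the authors' system: the `Node`, `LoadBalancer`, and `build_cluster` names, the round-robin assignment, and the use of string concatenation as the "necessary computation" are all assumptions made for the example.

```python
from itertools import cycle


class Node:
    """A server with direct-attached disks (all names here are illustrative)."""

    def __init__(self, name, local_disk):
        self.name = name
        self.local_disk = local_disk  # dict: key -> data held on this node's disks
        self.peers = {}               # name -> Node, filled in by build_cluster()

    def read_local(self, key):
        # Step 3b: a node reads its own disk and replies to the requestor.
        return self.local_disk[key]

    def handle_client_request(self, keys, owner_of):
        # Step 3a: fetch each piece of data from the node that owns it.
        parts = []
        for key in keys:
            owner = owner_of[key]
            node = self if owner == self.name else self.peers[owner]
            parts.append(node.read_local(key))
        # Requestor computes on the consolidated data (here: concatenation).
        return "".join(parts)


class LoadBalancer:
    def __init__(self, nodes):
        self._next_node = cycle(nodes)  # round-robin is an assumption

    def dispatch(self, keys, owner_of):
        # Steps 1-2: accept the client request and assign it to one node.
        node = next(self._next_node)
        # Step 4: the chosen node's result goes back to the client.
        return node.name, node.handle_client_request(keys, owner_of)


def build_cluster():
    """Two-node toy cluster; returns (nodes, key -> owning node name)."""
    a = Node("a", {"x": "foo"})
    b = Node("b", {"y": "bar"})
    for n in (a, b):
        n.peers = {p.name: p for p in (a, b) if p is not n}
    return [a, b], {"x": "a", "y": "b"}
```

Calling `LoadBalancer(nodes).dispatch(["x", "y"], owner_of)` on the toy cluster touches both nodes' disks regardless of which node the load balancer picked, which is why no node in a DAS cluster can simply be powered down.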

Server Power under DAS

• Servers show lack of energy proportionality at low utilization
− Power at 10% utilization is (much) more than 10% of the power at peak utilization
• Energy proportionality is not just a CPU problem
− Memory, disks, fans are one major source of power consumption
− Motherboard components (voltage regulators, PCI slots) also consume power
− CPUs are in fact becoming more energy proportional
− Power scales to a limit using DVFS, clock gating, ...
• Achieving an energy-proportional server requires putting all motherboard components to sleep
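
To make the proportionality gap concrete, here is a minimal sketch assuming a simple linear power model in which a server draws a fixed idle fraction of peak power plus a part that scales with utilization. The model and the `idle_fraction=0.5` default are hypothetical values chosen for illustration; they do not come from the slides or from measured data.

```python
def server_power_fraction(utilization, idle_fraction=0.5):
    """Power as a fraction of peak under an assumed linear model.

    idle_fraction is the (hypothetical) share of peak power drawn at 0% load.
    """
    if not 0.0 <= utilization <= 1.0:
        raise ValueError("utilization must be between 0 and 1")
    return idle_fraction + (1.0 - idle_fraction) * utilization


def proportional_power_fraction(utilization):
    """A perfectly energy-proportional server: power tracks utilization."""
    return utilization
```

Under these assumed numbers, a server at 10% utilization draws about 55% of its peak power, whereas a perfectly proportional server would draw 10%.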

KnightShift as a Solution

• KnightShift: handle remote I/O requests using a low-power subsystem
− Main server sleeps during low utilization while maintaining availability of data on the disks
• Low-power subsystem is called the Knight
• Knight has the following properties
− Closely attached to the main server to access its disk data
− Electrically isolated from main server
− Capable of receiving, interpreting, and servicing remote requests
− Transparent to outside world

Intelligent Platform Management Interface

• Intelligent Platform Management Interface (IPMI) is a widely-implemented standard for out-of-band server management

• Admins can remotely monitor server health with sensors, power on/off the server, install software

• At the core of IPMI is the Baseboard Management Controller (BMC)

• BMC uses the same network interface as the primary system and even the same IP address

• Embedded CPU, flash memory, separate power rails

IPMI as a Knight

• IPMI satisfies most properties of a Knight
− Electrically isolated
− Transparently handles network packets
− However, it does not have access to the primary server disks
• Modify IPMI
− Modify IO Hub with a 2-input mux that switches between primary and Knight as needed
− BMC must be able to handle disk access requests and understand a few filesystems
− BMC is already highly capable and can do complex network packet filtering
− Knight capabilities are further enhanced when BMC supports the same ISA as the primary server
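
The proposed I/O hub change can be pictured as a selector that gives exactly one side ownership of the disks at a time. The sketch below is a software analogy only, assuming hypothetical names (`DiskMux`, `route_disk_request`) and handler callables; the slides describe a hardware mux and do not specify an interface.

```python
class DiskMux:
    """Software stand-in for a 2-input mux: one side owns the disks at a time."""

    PRIMARY = "primary"
    KNIGHT = "knight"

    def __init__(self):
        # Assumption: the primary server owns the disks by default.
        self.selected = self.PRIMARY

    def select(self, side):
        if side not in (self.PRIMARY, self.KNIGHT):
            raise ValueError(f"unknown mux input: {side!r}")
        self.selected = side


def route_disk_request(mux, request, primary_handler, knight_handler):
    """Send a disk request to whichever side the mux currently selects."""
    if mux.selected == DiskMux.PRIMARY:
        return primary_handler(request)
    return knight_handler(request)
```

When the primary server goes to sleep, a controller would call `mux.select(DiskMux.KNIGHT)` so that subsequent disk requests are served by the Knight, and switch back on wakeup.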

Using Knight for System-level Power Saving

• Primary server memory turned off
− BMC’s flash memory is used for I/O buffers
− Dirty disk data cached in primary memory is drained to disk
• Knight can handle even non-I/O requests
− Requests with limited compute demands
− Support the same ISA
− IBM ASMA supports full ISA
• Knight best for handling stateless workloads
− Many e-commerce transactions are stateless

Significantly increases primary server sleep time by turning off the entire server (except disks), not just any single component

Trace Based Evaluation

• Minute-granularity utilization traces from USC's production datacenter
− Compute, mail and NFS file server cluster
− In particular, clusters use DAS
− Detailed SAR traces collected for 9 days
• Servers underutilized as can be seen from the graph
− 10% CPU utilized for nearly 90% of the time

CPU Utilization vs. System Utilization

• CPU utilization is closely tied to overall system utilization (also shown in prior work, Fan2007)

• Figure shows CPU utilization on Y-axis and disk utilization on secondary Y-axis for SCF

Ideal Case Power Savings

• Derived power versus utilization for current servers from SpecWEB power benchmarks

• Assume power consumption in ideal servers scales quadratically with performance

• Ideal machine power at 1/10 utilization is 1/100 of the peak power

• Huge gap between current and ideal system power consumption
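
The ideal-server assumption above amounts to power growing with the square of utilization. A minimal sketch, with `ideal_power` and its parameters as illustrative names and the 300 W peak in the usage note as a made-up figure:

```python
def ideal_power(utilization, peak_power):
    """Ideal server power under the quadratic-scaling assumption.

    utilization is a fraction in [0, 1]; peak_power is in any power unit.
    """
    if not 0.0 <= utilization <= 1.0:
        raise ValueError("utilization must be between 0 and 1")
    return peak_power * utilization ** 2
```

For a hypothetical 300 W server, `ideal_power(0.1, 300.0)` is about 3 W, matching the 1/100-of-peak figure at 1/10 utilization.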

KnightShift Power Savings

• When trace shows CPU utilization < 10% assume Knight is ON

• Knight power is constant at 1/100 of primary server power

• When trace shows CPU utilization > 10% assume primary is ON

• Primary server power is proportional to utilization (based on current server data from SpecWEB)

• At wakeup, the primary consumes 100% power

[Figure: power over time, with regions labeled "Primary Server ON" and "Knight ON"]
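
The evaluation rules on this slide can be expressed as a small trace-driven estimate. This is a sketch under stated assumptions, not the authors' simulator: the function name, the treatment of exactly 10% utilization as "primary ON", charging one full trace interval at peak power for each wakeup, and reading "1/100 of primary server power" as 1/100 of peak are all choices made here for illustration. The `primary_power` curve must be supplied by the caller; the slides derive theirs from SpecWEB data, which is not reproduced here.

```python
def knightshift_energy(trace, primary_power, peak_power=1.0,
                       threshold=0.10, knight_fraction=0.01):
    """Estimate energy with and without KnightShift for a utilization trace.

    trace: utilization samples in [0, 1], one per interval (e.g. per minute).
    primary_power: function mapping utilization -> primary server power.
    Returns (baseline_energy, knightshift_energy) in power-units x intervals.
    """
    baseline = 0.0
    shifted = 0.0
    primary_on = True  # assumption: the primary starts awake
    for u in trace:
        baseline += primary_power(u)  # server always on, no Knight
        if u < threshold:
            # Knight ON: constant low power while the primary sleeps.
            shifted += knight_fraction * peak_power
            primary_on = False
        elif not primary_on:
            # Wakeup interval: primary charged at 100% power.
            shifted += peak_power
            primary_on = True
        else:
            shifted += primary_power(u)
    return baseline, shifted
```

For example, with a hypothetical curve `lambda u: 0.5 + 0.5 * u` and the trace `[0.05, 0.05, 0.5, 0.5]`, the baseline uses 2.55 units while the KnightShift estimate uses 1.77 units; these numbers only illustrate the bookkeeping and say nothing about the savings reported in the talk.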

Power Savings vs Performance Degradation

• Response time grows when operating with Knight
− Assuming a range of Knight capabilities, the response time increases by up to 11% of the original time

• Energy savings increase as Knight becomes more capable, giving more opportunities for the primary server to sleep

Conclusion

• Datacenter energy consumption is a serious concern
• Consolidating and powering down idle servers is an effective approach
− Does not work for direct-attached storage datacenters
• KnightShift uses an IPMI-based BMC as a low-power subsystem to handle remote I/O
− Knight exploits IPMI’s unique characteristics to handle remote I/O requests
• Trace-based evaluation to study the current headroom
− Traces collected for 9 days from USC datacenter for several clusters
− Headroom studies show 2.5X improvement in energy consumption with Knight
• Going forward, we plan to use a mix of analytical (queuing) models and an emulation-based implementation of KnightShift