
Supporting soft real time tasks in the Xen hypervisor

Min Lee

Joint work with A. S. Krishnakumar, P. Krishnan, Navjot Singh, Shalini Yajnik

© 2009 Avaya Inc. All rights reserved.

Overview

Real time and Xen

Default Credit Scheduler

Experiments

– Instrumentation

Laxity-based scheduler

– Laxity (A)

– Boost with event (B)

– Simple Load Balance (L)

– Cache-aware Real-time load balancing (LL)

Result

Conclusion


Real time and Xen

Near real-time applications pose challenges

– low-performance virtualization I/O

– scheduling latency

– shared-cache contention

Scheduler is central to all these issues

– scheduling latency as a first-class resource

– managing shared caches


Deployment of Generic Enterprise Telephony system.

[Figure: a single hardware server runs the Xen hypervisor hosting Dom0, Call-C, Media-S, and Sip-S. IP endpoints reach it over local area networks and a backbone network. Call-C and Sip-S handle call setup/tear down; the gateway/media server handles stream encode/decode.]



CPU0 run queue: VCPU4 VCPU2 VCPU7 VCPU1 VCPU0 VCPU5

Simple Round Robin

Consumes credits to run

Over priority

Under priority

Every 30ms, distribute credits based on weights

– E.g. 20% CPU to VM 1, 40% CPU to VM 2

– Proportional distribution

– Default weight (256)
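The proportional split can be illustrated with a minimal sketch. The domain names, the weights other than the default 256, and the total of 300 credits per period are hypothetical values chosen so that the shares come out as 20% and 40%; this is not Xen's actual accounting code.

```python
def distribute_credits(weights, total_credits):
    """Split total_credits among domains in proportion to their weights."""
    total_weight = sum(weights.values())
    return {dom: total_credits * w / total_weight for dom, w in weights.items()}


# Hypothetical domains; 256 is the default weight named on the slide.
weights = {"vm1": 256, "vm2": 512, "vm3": 512}
shares = distribute_credits(weights, total_credits=300)

for dom, credits in shares.items():
    print(f"{dom}: {credits:.0f} credits ({credits / 300:.0%} of the CPU)")
```

With these weights vm1 receives 20% of the credits and vm2 receives 40%, matching the example on the slide.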



Boost up waking-up tasks


Over priority

Under priority

VCPU3

New incoming (waking-up) task

– Probably due to external event (packet arrival)

– E.g. Domain 0



Boost up waking-up tasks

Boost priority

Over priority

Under priority

CPU0 run queue: VCPU4 VCPU2 VCPU7 VCPU1 VCPU0 VCPU5, plus the boosted VCPU3

Only one time boost-up (<10ms)



Insufficient support for real-time tasks

RT-tasks (Media-S) compete with other tasks

– Treated as just another CPU-bound task

– Miss their deadlines


Credit Crisis

Credit scheduler

– Good for CPU task

CPU allocation (credit)

CPU-bound tasks, background tasks (supported by credit)

System tasks, interactive tasks (e.g. Domain 0, supported by boost priority)

Low latency multimedia tasks (soft real time)

Boost priority

– Boost waking-up task

– Short time (<10ms)

– Good for Dom0!

Bad for Media task

– Need both CPU & Low latency!

– Competing with others


Credit Crisis

From here on, we introduce the laxity-based scheduler.

– Incremental (4 components)

Experiments

– Instrumentation

Laxity-based scheduler

– Laxity (A)

– Boost with event (B)

– Simple Load Balance (L)

– Cache-aware Real-time load balancing (LL)


Experiments (Deployment)

PESQ

– Quality of voice

– Major metric

Enterprise Telephony system

– Media server – ‘Media-S’

– Signaling server – ‘Call-C’, ‘Sip-S’

– Other VMs

• Dom0

• Cdom [Computational domain]

• Two more domains [Licensing, management]

Two flavors

– Standard (Cdom is empty)

– Cpuload (Cdom has 4 cpu-bound tasks)


Experiments (Setup)

Dell 2950 server

– 2 quad-core Xeon processors

– 4 GB of RAM

2 cores from each socket

– So private cache

4 cps (calls per second), sample one out of every 4 calls

– G.711, 20ms packetization

– 30sec hold time

– Max 240 streams through Media-S
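The load figures above fit together arithmetically, as the following sketch shows. The factor of two streams per call is an assumption (one RTP stream per direction) made here to reach the 240 streams stated on the slide; the 64 kbit/s rate is the standard G.711 codec rate.

```python
CALLS_PER_SECOND = 4       # from the slide
HOLD_TIME_S = 30           # from the slide
PACKETIZATION_MS = 20      # from the slide
STREAMS_PER_CALL = 2       # assumption: one RTP stream per direction
G711_BITRATE_BPS = 64_000  # standard G.711 rate

# Steady state: calls arriving during one hold time are all still active.
concurrent_calls = CALLS_PER_SECOND * HOLD_TIME_S
streams = concurrent_calls * STREAMS_PER_CALL

# One packet per stream every 20 ms, each carrying 20 ms of audio.
packets_per_stream_per_s = 1000 // PACKETIZATION_MS
payload_bytes = G711_BITRATE_BPS * PACKETIZATION_MS // 1000 // 8
total_packets_per_s = streams * packets_per_stream_per_s

print(concurrent_calls, streams, payload_bytes, total_packets_per_s)
```

Under these assumptions Media-S must handle a packet for every stream every 20 ms, which is why scheduling delays of a few milliseconds matter for voice quality.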

PESQ

– Compare the reference with the stream from Media-S

[Figure: test setup, shown twice on the slide. SIPp and RTP clients connect over an IP network to the server, where the Xen hypervisor hosts Dom0, Media-S, Call-C, SIP-S, C-Dom, and other domains; tcpdump is shown alongside. The diagram also carries the label COMPACT.]


Instrumentation

Custom events for xentrace

CPU utilization

Worst/average wait time

– Each priority

Time slice length

– Each priority

Cache misses

– By xenoprof

Average wait time

L2 cache misses
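As an illustration of the per-priority wait-time metrics, here is a minimal sketch of the aggregation step. The record format, a list of (priority, wait in microseconds) pairs, is hypothetical and is not the xentrace event format; the sample values are invented.

```python
from collections import defaultdict


def wait_stats(records):
    """Return {priority: (worst_wait, average_wait)} from (priority, wait_us) pairs."""
    by_priority = defaultdict(list)
    for priority, wait_us in records:
        by_priority[priority].append(wait_us)
    return {p: (max(w), sum(w) / len(w)) for p, w in by_priority.items()}


# Invented sample: each tuple is one dequeue event with the time spent waiting.
sample = [("under", 120), ("under", 80), ("over", 900), ("boost", 5), ("over", 300)]
stats = wait_stats(sample)

for priority, (worst, average) in stats.items():
    print(f"{priority}: worst={worst}us average={average}us")
```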


Reading the plots

PESQ

– Quality of voice

– 0 (bad) to 4.5

– 4.0 (toll quality)

Cumulative Density

Boxplots

– Min/Max

– 25th and 75th percentiles

– Median

Poor quality Good quality
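For readers unfamiliar with these plots, the sketch below computes the five boxplot values and one point of the cumulative distribution from a list of PESQ scores. The scores are invented for illustration and are not measurements from the experiments.

```python
import statistics


def boxplot_summary(scores):
    """Min, 25th percentile, median, 75th percentile, and max of the scores."""
    q1, median, q3 = statistics.quantiles(scores, n=4, method="inclusive")
    return {"min": min(scores), "q1": q1, "median": median,
            "q3": q3, "max": max(scores)}


def cumulative_fraction(scores, threshold):
    """Fraction of scores at or below threshold (one point on the cumulative curve)."""
    return sum(s <= threshold for s in scores) / len(scores)


scores = [3.1, 3.8, 4.0, 4.2, 4.4]  # invented PESQ values
summary = boxplot_summary(scores)
at_or_below_toll = cumulative_fraction(scores, 4.0)
print(summary, at_or_below_toll)
```

A cumulative curve that stays low until close to 4.5 means most calls scored well, so curves further to the right indicate better quality.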


Performance with default credit scheduler

Larger weights (512, 1024)

– didn’t improve much

The only configuration that worked: pinning

Media-S/Call-C requires significant CPU

– Pinned Media-S/Call-C to CPU0,1 respectively

– Pinned others to CPU2,3

– Dom0 is floating

Underutilizing CPU!


Laxity (A)

Kind of priority?

– Laxity!

– Target scheduling latency or deadline

– E.g. less than 5ms wait time in the run queue

– Parameter specified by user

Real-time task

– Has laxity value

Non-real-time task

– Doesn’t have laxity value (or conceptually infinite laxity)
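One way to represent this distinction is to give every task a laxity field that defaults to infinity, so non-real-time tasks need no special case. This is a sketch only; the class and field names are hypothetical and do not come from the scheduler's source.

```python
import math
from dataclasses import dataclass


@dataclass
class Vcpu:
    name: str
    # No laxity configured: conceptually infinite, i.e. any wait is acceptable.
    laxity_us: float = math.inf

    @property
    def is_real_time(self) -> bool:
        return math.isfinite(self.laxity_us)


media = Vcpu("Media-S", laxity_us=5000.0)  # e.g. less than 5ms in the run queue
batch = Vcpu("Cdom")                       # non-real-time task

print(media.is_real_time, batch.is_real_time)
```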


Implementation of laxity

Where to insert real-time tasks to meet their deadline

– In over priority, laxity value is ignored.

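[Figure: CPU0 run queue, each entry shown as VCPU# (length of previous time slice): VCPU4 (1us), VCPU2 (7us), VCPU7 (4us), VCPU1 (13us), VCPU0 (40us), VCPU5 (21us), with the boost, over, and under priority regions marked. Three incoming real-time tasks, with laxity 5us, 20us, and 20us & boost, are each inserted at the point that meets their deadline.]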


Prediction of wait time

Each VCPU maintains an expected run time

– Previous time slice length

The amount of CPU time it utilized in its previous run

Works reasonably well

– Min/max

– 25th and 75th percentiles

A more sophisticated formula could be used

Average difference between expected and actual wait times
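The prediction and the insertion rule can be combined in a short sketch. It uses the time slices from the run-queue figure and makes two assumptions: the first entry listed is the head of the queue, and all entries are in the same priority class. The function names are hypothetical.

```python
def expected_wait_us(queue, position):
    """Predicted wait at `position`: sum of the previous time slices ahead of it."""
    return sum(prev_slice for _, prev_slice in queue[:position])


def insert_position(queue, laxity_us):
    """Latest position whose predicted wait still fits within laxity_us."""
    position = 0
    wait = 0
    for index, (_, prev_slice) in enumerate(queue):
        if wait + prev_slice > laxity_us:
            break
        wait += prev_slice
        position = index + 1
    return position


# (name, length of previous time slice in us), assumed head first.
queue = [("VCPU4", 1), ("VCPU2", 7), ("VCPU7", 4),
         ("VCPU1", 13), ("VCPU0", 40), ("VCPU5", 21)]

pos_5us = insert_position(queue, 5)              # behind VCPU4 only
pos_20us = insert_position(queue, 20)            # behind VCPU4, VCPU2, VCPU7
pos_none = insert_position(queue, float("inf"))  # non-real-time: tail of queue
print(pos_5us, pos_20us, pos_none)
```

Inserting as late as the laxity allows, rather than at the head, leaves the tasks already queued undisturbed whenever the deadline can still be met.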


Laxity Result

Improved PESQ


Boost with event (B)

We want to boost tasks that receive an external event.

– Packet arrival.

But the credit scheduler only boosts tasks that are waking up!

– Media-S, already in the run queue, doesn’t get boosted

So, we boost it within the run queue

(1) Receiving event

CPU0 run queue: VCPU4 (1us), VCPU2 (7us), VCPU7 (4us), VCPU1 (13us), VCPU0 (40us), VCPU5 (21us)

(2) Get boosted within queue

CPU0 run queue: VCPU4 (1us), VCPU1 (13us), VCPU0 (40us), VCPU5 (21us), with VCPU2 (7us) and VCPU7 (4us) boosted
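A minimal sketch of boosting inside the run queue follows. It assumes the queue is a list kept in priority order with boosted entries at the front; the priority encoding and the function name are hypothetical, and the real scheduler's data structures may differ.

```python
BOOST, UNDER, OVER = 0, 1, 2  # hypothetical encoding: smaller runs earlier


def boost_on_event(run_queue, vcpu_name):
    """Mark a queued VCPU as boosted and move it behind any already-boosted entries.

    Returns True if the VCPU was found in the queue.
    """
    for index, (name, _) in enumerate(run_queue):
        if name == vcpu_name:
            del run_queue[index]
            # Keep arrival order among boosted entries.
            insert_at = sum(1 for _, prio in run_queue if prio == BOOST)
            run_queue.insert(insert_at, (name, BOOST))
            return True
    return False


run_queue = [("VCPU4", UNDER), ("VCPU2", UNDER), ("VCPU7", UNDER), ("VCPU1", UNDER)]
boost_on_event(run_queue, "VCPU7")  # event arrives for VCPU7 while it is queued
boost_on_event(run_queue, "VCPU2")  # a later event for VCPU2
print([name for name, _ in run_queue])
```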


Simple Load Balance (L)


Default Credit scheduler

CPU1 run queue: VCPU10 VCPU15

(1) Over or idle task?

(2) Peek peer’s Q

(3) Steal higher priority task




Laxity-based scheduler (Simple Load Balance)

CPU1 run queue: VCPU10 VCPU15 (VCPU17 and VCPU11 shown separately)

(1) Under, over or idle task?

(2) Peek peer’s Q

(3) Steal higher priority task

(4) If same priority

(a) If mine is a real-time task, don’t steal

(b) If peer’s is a real-time task, steal

(c) If both are non-real-time, compare entrance time into queue

- with some delay (2ms)
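The stealing rules of the laxity-based balancer can be sketched as a single decision function. The handling of case (c) is one reading of the slide: steal the peer's task only if it entered its queue earlier than mine and has waited at least the 2ms delay. All names and the priority encoding are hypothetical.

```python
from dataclasses import dataclass

BOOST, UNDER, OVER, IDLE = 0, 1, 2, 3  # hypothetical encoding: smaller is higher
STEAL_DELAY_MS = 2.0                   # "some delay (2ms)" from the slide


@dataclass
class Task:
    name: str
    priority: int
    real_time: bool
    enqueued_at_ms: float


def should_steal(mine, peer, now_ms):
    """Decide whether this CPU should take `peer` instead of running `mine`."""
    if peer.priority < mine.priority:   # (3) peer's task has higher priority
        return True
    if peer.priority > mine.priority:
        return False
    if mine.real_time:                  # (4a) my task is real-time: keep it
        return False
    if peer.real_time:                  # (4b) peer's task is real-time: steal
        return True
    # (4c) both non-real-time: assumed rule, see the note above.
    waited_ms = now_ms - peer.enqueued_at_ms
    return peer.enqueued_at_ms < mine.enqueued_at_ms and waited_ms >= STEAL_DELAY_MS


mine = Task("VCPU10", UNDER, real_time=False, enqueued_at_ms=10.0)
rt_peer = Task("Media-S", UNDER, real_time=True, enqueued_at_ms=11.0)
old_peer = Task("VCPU2", UNDER, real_time=False, enqueued_at_ms=5.0)
print(should_steal(mine, rt_peer, now_ms=12.0), should_steal(mine, old_peer, now_ms=12.0))
```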



[Figure, animation in four frames: CPU0's run queue starts with Media-S, VCPU2, VCPU7, VCPU1, VCPU0, VCPU5; CPU1 has VCPU10 and VCPU15; VCPU17 and VCPU11 are shown separately. In successive frames VCPU2, then VCPU7, then VCPU1 are drawn apart from the rest of CPU0's queue.]

(1) This effectively distributes real-time tasks over CPUs

(2) Also prevents starvation

(1) Kicks out non-real-time tasks if they have waited for some time

Cache-aware Real-time load balancing (LL)

Good, but…

– Ping-ponging tasks thrashes the cache

Bind RT-tasks to CPU

– Fix each RT-task to its initial CPU

– Disable load-balancing for RT-tasks

Imbalance among multiple RT-tasks

– Need a new load balancer for RT-tasks



Don’t steal peer’s real-time tasks in L

– Prefer hot cache (to low wait time)

New x-sec-load balancer

– Balance real-time tasks’ CPU utilization via bin-packing

CPU0 90% utilized by one real time task

CPU1 80% utilized by three real time tasks
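The slide names bin-packing but not a specific heuristic. The sketch below uses a common greedy one as a stand-in: place tasks in decreasing order of utilization, each on the currently least-loaded CPU. The task names and per-task utilizations are invented; they were chosen so that the result reproduces the 90% / 80% split in the example.

```python
def balance_rt_tasks(utilization, num_cpus):
    """Greedy packing: largest task first, each onto the least-loaded CPU.

    utilization maps task name -> CPU utilization in percent.
    Returns (assignment, load): per-CPU task lists and per-CPU total load.
    """
    assignment = [[] for _ in range(num_cpus)]
    load = [0] * num_cpus
    for task, util in sorted(utilization.items(), key=lambda item: -item[1]):
        cpu = min(range(num_cpus), key=lambda c: load[c])
        assignment[cpu].append(task)
        load[cpu] += util
    return assignment, load


# Invented real-time tasks and utilizations (percent of one CPU).
rt_tasks = {"rt-a": 90, "rt-b": 40, "rt-c": 25, "rt-d": 15}
assignment, load = balance_rt_tasks(rt_tasks, num_cpus=2)
print(assignment, load)
```

Balancing by utilization rather than by task count is what lets one heavy real-time task sit alone on CPU0 while three lighter ones share CPU1.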



Improved PESQ

RT

I’m Happy!

I love you!


Result (CPU utilization)

We’re fully utilizing CPUs!

[Figure: amount of work done by the CPU-hogging process under each policy: baseline (pinned), baseline, A, AL, ALL.]


Result (Various laxity values)

Lower laxity, more real-time, better quality

Various laxity values - standard configuration


Result (Various laxity values)

Lower laxity, more real-time, better quality

Various laxity values - cpuload configuration


Result (Boost with event)

Some impact

Adding Boost with event


Result (Load balance)

Load balancing is essential for multicore

PESQ improvement through load balancer in standard configuration


Result (Cache misses)

We should consider cache

Cache misses for standard configuration

[Figure: cache misses (x10000) under baseline, A, AL, and ALL: total misses, and per-domain misses for C-dom, Domain-0, Call-C, Sip-S, and Media-S.]


Conclusion

New soft real time aware scheduler for Xen

– Better support for real time applications without penalizing non-real time tasks

– Fully utilizing CPU resources

– Timeliness requirement expressed by laxity

Instrumentation to help design and measure performance

– Can also be used for other purposes

Thank you!

Extra slides


Data Centers

Problem:

– Server sprawl

– Underutilization

– Management

Solution: Consolidation

– Reduce resource wastage

– Reduced floor space

– Better power management

How?

(Adapted from ‘Xen and co.’ slides)


Server virtualization

Ability to create multiple virtual servers from a single physical server

– Allows consolidation by hosting heterogeneous OS instances over the same hardware

[Figure: Linux and Windows operating systems, running a 2-tiered e-commerce application and a single-tier streaming server, hosted as virtual servers on a VMM over shared hardware.]


Why now?

– Emergence of highly efficient virtual machine monitors

• Xen, VMware etc.

– Hardware support

• Intel, AMD, IBM etc.

– Real world example:

• Amazon EC2


Consolidation

[Figure: Jboss and mysql servers, serving clients, run on hardware with a VMM and leave resources underutilized (10% and 20% server utilization); adding CPU-intensive VMs brings server utilization to almost 100%.]



Real Time Tasks

[Figure: mysql, serving clients, shares hardware and a VMM with CPU-intensive VMs at almost 100% server utilization. The real-time (RT) task protests: “NO! OMG! Give my CPU back!”]

Just give him what he wants.

– The real-time task wants its CPU back!


Xen Virtual Machine Monitor

[Figure: Xen architecture. Labels: physical hardware (CPU, disk, NIC, memory, etc.); the Xen hypervisor with the VM scheduler, I/O virtualization, and virtual hardware (vCPU, vDisk, vNIC, vMemory, etc.); Domain 0 / driver domain; and virtual machines running modified guest OSes with applications.]



Detailed queue metrics

[Figure: Base-Cpuload, detailed queue metrics for Media-S.0, plotted against time in units of 100 sec: (a) count in under queue (times); (b) scheduled time from under queue (usec).]


Detailed queue metrics

[Figure: A-Cpuload, detailed queue metrics for Media-S.0, plotted against time in units of 100 sec: (c) count in under queue; (d) scheduled time from under queue (usec).]


[Figure: worst wait time and average wait time in the under queue (AL-standard), in usec, plotted against time in units of 100 sec. Series: Domain-0.0, Domain-0.1, Domain-0.2, Domain-0.3, aes.0, cm.0, cobar.0, ses.0, udom.0, udom.1, udom.2, udom.3, utility_server.0.]


[Figure: total misses (standard) and Cobar’s cache misses (standard), in units of 10000 misses, plotted against time in units of 10 sec, for baseline, A, AL, and ALL.]

A = 77%, AL = 72%, ALL = 37% (of baseline, respectively)

A = 86%, AL = 89%, ALL = 73% (of baseline, respectively)


Result (Cache misses)

We should consider cache

Cache misses for cpuload configuration

[Figure: cache misses (in units of 10000) under baseline, A, AL, and ALL: total misses, and per-domain misses for C-dom, Domain-0, Call-C, Sip-S, and Media-S.]
