Supporting Soft Real-Time Tasks

Min Lee, A. S. Krishnakumar, P. Krishnan, Navjot Singh, Shalini Yajnik

Paper published in VEE 2010

Page 2: Shalini xs10

© 2009 Avaya Inc. All rights reserved.

Problem Statement

• Given a mix of workloads, how do you schedule them so that
– Real-time workloads get the resources they need
– Non-real-time workloads are not starved
• Main goal:
– Present a new scheduler, based on the credit scheduler, that meets the demands of a real-time workload
– Target application studied: media processing in an IP telephony system

Page 3: Shalini xs10

Real-Time Applications

• Requirements of a real-time application such as a media server
– Highly I/O bound
– Needs timely allocation of compute resources
• Challenges when deployed on Xen
– I/O virtualization overhead
– Variable scheduling latency with mixed workloads
– Cache contention between workloads
• The scheduler is central to all of these issues!

Page 4: Shalini xs10

Target Application: Enterprise IP Telephony System

[Figure: hardware running the Xen hypervisor hosts Dom0, Call-C, Media-S, and Sip-S. It connects through a backbone network to local area networks of IP endpoints and gateway/media servers. Call-C and Sip-S handle call setup and teardown; the gateway/media server handles stream encode/decode.]

Page 5: Shalini xs10

Credit Scheduler: Credit Handling

• A VCPU consumes credits while it runs
• Every 30 ms, credits are distributed based on weights
– E.g. 20% CPU to VM 1, 40% CPU to VM 2
– Proportional distribution
– Default weight: 256

[Figure: CPU0 run queue holding VCPU4, VCPU2, VCPU7, VCPU1, VCPU0, and VCPU5, grouped into under priority and over priority.]
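A minimal sketch of weight-proportional credit distribution, to make the bullets above concrete. The 30 ms period and the default weight of 256 come from the slide; `CREDITS_PER_MS`, the `Domain` class, and the rule that a domain drops to over priority once its balance goes negative are illustrative assumptions, not the Xen implementation.

```python
from dataclasses import dataclass

ACCOUNTING_PERIOD_MS = 30  # from the slide: credits are handed out every 30 ms
DEFAULT_WEIGHT = 256       # from the slide: default weight
CREDITS_PER_MS = 10        # assumption: credits burned per millisecond of run time


@dataclass
class Domain:
    name: str
    weight: int = DEFAULT_WEIGHT
    credits: float = 0.0


def distribute_credits(domains, num_cpus=1):
    """Split one period's credit pool among domains in proportion to weight."""
    total_weight = sum(d.weight for d in domains)
    pool = CREDITS_PER_MS * ACCOUNTING_PERIOD_MS * num_cpus
    for d in domains:
        d.credits += pool * d.weight / total_weight


def consume(domain, ran_ms):
    """Charge a domain for the time it actually ran."""
    domain.credits -= CREDITS_PER_MS * ran_ms


def priority(domain):
    """Assumed rule: credits left means under priority, overdrawn means over."""
    return "under" if domain.credits >= 0 else "over"
```

With weights 256 and 512, the second domain receives twice the credits of the first, which mirrors the 20% versus 40% example on the slide.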

Page 6: Shalini xs10

Credit Scheduler: I/O Handling

• VCPUs are woken up and boosted in priority when an event arrives
• The boost is one-time only (<10 ms)

[Figure: CPU0 run queue with boost, under, and over priority sections. VCPU3 wakes on an event and joins the queue holding VCPU4, VCPU2, VCPU7, VCPU1, VCPU0, and VCPU5 at boost priority.]

Page 7: Shalini xs10

Credit Scheduler: Credit Crisis

• Credits
– Good for CPU-bound tasks
• Boost priority
– Good for low-latency tasks, e.g. I/O in Dom0
– Short boost period
• Media servers
– Need both CPU and low latency
– Timely CPU
– No support in the current scheduler

[Figure: three task classes: CPU-bound, compute-intensive tasks (supported by credits); low-latency tasks (I/O processing, interactive); multimedia tasks (soft real-time).]

Page 8: Shalini xs10

Laxity-based Scheduler

• Four components in the scheduler

– Laxity (A)

– Boost with event (B)

– Simple Load Balance (L)

– Cache-aware Real-time load balancing (LL)

Page 9: Shalini xs10

Experimental Setup

• Enterprise telephony system
– Media server: ‘Media-S’
– Signaling servers: ‘Call-C’, ‘Sip-S’
– Other VMs
  • Dom0
  • Cdom (computational domain)
  • Two more domains (licensing, management)
• Two workload scenarios
– Standard (Cdom with no load)
– Cpuload (Cdom runs 4 CPU-bound tasks)

Page 10: Shalini xs10

Experimental Setup

• Dell 2950 server
– 2 quad-core Xeon processors
– 4 GB of RAM
• 4 cores used: 2 cores from each socket
– So private cache
• 4 cps (calls per second); one out of every 4 calls is sampled
– G.711, 20 ms packetization
– 30 s hold time
– Max 240 streams through Media-S
• PESQ
– Voice quality metric
– Compares the reference with the stream from Media-S

[Figure: SIPp and RTP clients connect over an IP network to the server, where the Xen hypervisor hosts Dom0, Media-S, Call-C, SIP-S, C-Dom, and other domains; tcpdump captures the traffic.]
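The load figures on this slide can be cross-checked with a little arithmetic. The sketch below assumes two media streams per call (one in each direction), which is not stated on the slide but is consistent with the 240-stream maximum; the 64 kbit/s rate is the standard G.711 bit rate. The function name and its parameters are illustrative.

```python
def offered_load(calls_per_s=4, hold_time_s=30, packet_ms=20,
                 streams_per_call=2, codec_bits_per_s=64_000):
    """Steady-state load implied by the call rate and hold time."""
    # Calls in progress at steady state: arrival rate times hold time.
    concurrent_calls = calls_per_s * hold_time_s
    # Assumption: each call carries two streams through the media server.
    streams = concurrent_calls * streams_per_call
    # One packet per packetization interval per stream.
    packets_per_stream_per_s = 1000 // packet_ms
    # Payload bytes carried by each packet at the codec bit rate.
    payload_bytes = codec_bits_per_s // 8 * packet_ms // 1000
    return {
        "concurrent_calls": concurrent_calls,
        "streams": streams,
        "packets_per_s": streams * packets_per_stream_per_s,
        "payload_bytes_per_packet": payload_bytes,
    }
```

Under these assumptions the media server handles on the order of 12,000 packets per second at full load, which is why scheduling latency of a few milliseconds matters against a 20 ms packet interval.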

Page 11: Shalini xs10

Instrumentation

• Custom events for Xentrace
• CPU utilization
• Worst/average wait time
– For each priority
• Scheduled time
– For each priority
• Cache misses
– Measured with xenoprof

[Figures: average wait time; L2 cache misses.]

Page 12: Shalini xs10

Reading the plots

• PESQ
– Voice quality
– Ranges from 0 (bad) to 4.5
– 4.0 is toll quality
• Cumulative density
• Boxplots
– Min/max
– 25th and 75th percentiles
– Median

[Figure: example plots, with the PESQ axis running from poor quality to good quality.]

Page 13: Shalini xs10

Default Credit Scheduler: Performance

• Increasing weights to 512, 1024
– Insignificant impact
• Pinning
– Media-S and Call-C require significant CPU
– Pinned Media-S and Call-C to CPUs 0 and 1, respectively
– Pinned the others to CPUs 2 and 3
– Dom0 is floating
– Significant performance gain
– But the CPUs are underutilized!

[Figures: Weights; Pinning.]

Page 14: Shalini xs10

Laxity (A)

• A form of priority
– Target scheduling latency or deadline
– E.g. less than 5 ms wait time in the run queue
– Parameter specified by the user
• Laxity values are set for real-time domains
• No value is specified for non-real-time domains
– Conceptually infinite laxity

Page 15: Shalini xs10

Implementation of laxity

• Where to insert real-time tasks so that they meet their deadlines
– In over priority, the laxity value is ignored.

[Figure: CPU0 run queue with boost, under, and over priority sections. Each entry is labeled VCPU#(length of previous time slice): VCPU4(1us), VCPU2(7us), VCPU7(4us), VCPU1(13us), VCPU0(40us), VCPU5(21us). Incoming VCPUs with laxity 5us, 20us, and 20us with boost are inserted at different positions.]
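One plausible reading of the figure, sketched below: a real-time VCPU is placed as deep in the queue as it can go while its predicted wait (the sum of the previous time slices of the VCPUs ahead of it) stays within its laxity. The function name and the single flat queue are simplifications for illustration; the slides do not spell out the exact rule, and priority classes are ignored here.

```python
def insert_position(queue, laxity_us=None):
    """Return the deepest queue index whose predicted wait fits the laxity.

    queue: list of (vcpu_name, previous_time_slice_us), head first.
    laxity_us: target wait bound; None models the conceptually infinite
    laxity of a non-real-time VCPU, which simply goes to the tail.
    """
    if laxity_us is None:
        return len(queue)
    predicted_wait_us = 0
    for index, (_name, expected_run_us) in enumerate(queue):
        # Going behind this entry would add its expected run to our wait.
        if predicted_wait_us + expected_run_us > laxity_us:
            return index
        predicted_wait_us += expected_run_us
    return len(queue)


# Run queue values taken from the figure.
RUN_QUEUE = [("VCPU4", 1), ("VCPU2", 7), ("VCPU7", 4),
             ("VCPU1", 13), ("VCPU0", 40), ("VCPU5", 21)]
```

With the figure's values, a laxity of 5us places the newcomer right after VCPU4 (going behind VCPU2 would predict an 8us wait), while a laxity of 20us places it after VCPU7 (predicted wait 12us).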

Page 16: Shalini xs10

Prediction of wait time in runqueue

• Each VCPU maintains an expected run time
– The amount of CPU time it used in its previous run
• Works reasonably well
• A more sophisticated formula could be used

[Figure: average difference between expected and actual wait times, shown as boxplots (min/max, 25th and 75th percentiles).]
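A small sketch of the predictor described above. With `alpha=1.0` the expected run time is exactly the previous run, as on the slide. Values of `alpha` below 1 turn it into an exponentially weighted moving average, shown only as one example of a "more sophisticated formula"; that variant, the class name, and `predicted_wait_us` are assumptions for illustration.

```python
class RunTimePredictor:
    """Tracks the expected run time of one VCPU."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha        # 1.0 means "use the previous run as-is"
        self.expected_us = None   # unknown until the VCPU has run once

    def observe(self, ran_us):
        """Record how long the VCPU actually ran in its latest slice."""
        if self.expected_us is None:
            self.expected_us = float(ran_us)
        else:
            self.expected_us = (self.alpha * ran_us
                                + (1.0 - self.alpha) * self.expected_us)


def predicted_wait_us(predictors_ahead):
    """Predicted wait: sum of expected run times of the VCPUs queued ahead."""
    return sum(p.expected_us or 0.0 for p in predictors_ahead)
```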

Page 17: Shalini xs10

Laxity-based Scheduler: Performance

• Improved PESQ

Page 18: Shalini xs10

Boost with event (B)

• The credit scheduler boosts only waiting VCPUs
– Media-S in the run queue doesn’t get boosted
• Idea: boost a VCPU that receives an event
– Even if the VCPU is in the run queue with under priority

[Figure: (1) A VCPU in the CPU0 run queue (VCPU4(1us), VCPU2(7us), VCPU7(4us), VCPU1(13us), VCPU0(40us), VCPU5(21us)) receives an event. (2) It gets boosted within the queue.]
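The idea can be sketched as follows: when an event arrives for a VCPU that is already sitting in the run queue with under priority, raise it to boost priority and reorder the queue. The `VCPU` class, the numeric priority encoding, and the stable re-sort are illustrative assumptions rather than the actual Xen data structures.

```python
from dataclasses import dataclass

BOOST, UNDER, OVER = 0, 1, 2  # lower value runs first (assumed encoding)


@dataclass
class VCPU:
    name: str
    prio: int


def deliver_event(runqueue, name):
    """Boost a queued under-priority VCPU when it receives an event."""
    for vcpu in runqueue:
        if vcpu.name == name and vcpu.prio == UNDER:
            vcpu.prio = BOOST
    # list.sort is stable, so VCPUs of equal priority keep their order.
    runqueue.sort(key=lambda v: v.prio)
```

In this sketch an over-priority VCPU is left alone, matching the slide's wording that the boost applies to a VCPU with under priority.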

Page 19: Shalini xs10

Credit Scheduler: Load Balance

(1) Is the next task over priority or idle?
(2) Peek at the peer’s queue
(3) Steal a higher-priority task

[Figure: CPU0 run queue: VCPU4, VCPU2, VCPU7, VCPU1, VCPU0, VCPU5; CPU1 run queue: VCPU10, VCPU15.]

Page 20: Shalini xs10

Simple Load Balance (L)

Laxity-based scheduler (Simple Load Balance):
(1) Is the next task under priority, over priority, or idle?
(2) Peek at the peer’s queue
(3) Steal a higher-priority task
(4) If the priorities are the same:
(a) If mine is a real-time task, don’t steal
(b) If the peer’s is a real-time task, steal
(c) If both are non-real-time, compare their entrance times into the queue, with some delay (2 ms)

[Figure: CPU0 run queue: VCPU4, VCPU2, VCPU7, VCPU1, VCPU0, VCPU5; CPU1 run queue: VCPU10, VCPU15; VCPU17 and VCPU11 are also shown.]
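The stealing rules listed on this slide can be written as a single decision function. This is a sketch under assumptions: the `Task` fields, the numeric priority encoding, and the reading of rule (c) as "steal only if the peer's task entered its queue more than 2 ms before mine" are interpretations, and the trigger in step (1) is reduced to the idle case.

```python
from dataclasses import dataclass
from typing import Optional

STEAL_DELAY_MS = 2.0  # from the slide: "with some delay (2ms)"


@dataclass
class Task:
    prio: int           # lower value means higher priority (assumed encoding)
    real_time: bool
    enqueued_ms: float  # time at which the task entered its run queue


def should_steal(mine: Optional[Task], peer: Task) -> bool:
    """Decide whether this CPU should steal the peer's head task."""
    if mine is None:
        return True  # this CPU would otherwise idle
    if peer.prio != mine.prio:
        return peer.prio < mine.prio  # (3) steal only a higher-priority task
    if mine.real_time:
        return False  # (4a) keep running my real-time task
    if peer.real_time:
        return True   # (4b) help the peer's real-time task
    # (4c) both non-real-time: steal the one that has waited clearly longer
    return peer.enqueued_ms + STEAL_DELAY_MS < mine.enqueued_ms
```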

Page 21: Shalini xs10

Simple Load Balance (L)

Laxity-based scheduler (Simple Load Balance):
(1) This effectively distributes real-time tasks across CPUs
(2) It also prevents starvation

[Figure: CPU0 run queue: Media-S, VCPU2, VCPU7, VCPU1, VCPU0, VCPU5; CPU1 run queue: VCPU10, VCPU15; VCPU17 and VCPU11 are also shown.]

Page 22: Shalini xs10

Simple Load Balance (L)

Laxity-based scheduler (Simple Load Balance):
(1) Moves non-real-time tasks if they have waited for some time

[Figure: CPU0 run queue: Media-S, VCPU2, VCPU7, VCPU1, VCPU0, VCPU5; CPU1 run queue: VCPU2, VCPU10, VCPU15; VCPU17 and VCPU11 are also shown.]

Page 23: Shalini xs10

Cache-aware Real-time load balancing (LL)

• L is good, but…
– Ping-ponging tasks thrash the cache
• Solution: bind RT tasks to CPUs
– Fix each RT task to its initial CPU
– Disable load balancing for RT tasks
• This creates imbalance when there are multiple RT tasks
– Need a new load balancer for RT tasks

Page 24: Shalini xs10

Cache-aware Real-time load balancing (LL)

• Don’t steal the peer’s real-time tasks in L
– Prefer a hot cache (over low wait time)
• New x-sec-load balancer
– Balances the real-time tasks’ CPU utilization via bin-packing

[Figure: CPU0 is 90% utilized by one real-time task; CPU1 is 80% utilized by three real-time tasks.]
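The slide does not say which bin-packing heuristic is used. As an illustration only, the sketch below uses a common greedy one: sort the real-time tasks by utilization, largest first, and give each to the currently least-loaded CPU. The function name and the per-task utilization values in the example are hypothetical; only the per-CPU totals (90% and 80%) come from the slide.

```python
def balance_rt_tasks(utilization_pct, num_cpus):
    """Greedy placement of real-time tasks by CPU utilization.

    utilization_pct: dict mapping task name to its CPU utilization in percent.
    Returns (assignment, loads): the task names per CPU and each CPU's total.
    """
    loads = [0] * num_cpus
    assignment = [[] for _ in range(num_cpus)]
    # Largest tasks first, so that small tasks fill the remaining gaps.
    ordered = sorted(utilization_pct.items(), key=lambda kv: kv[1], reverse=True)
    for name, pct in ordered:
        cpu = min(range(num_cpus), key=lambda c: loads[c])  # least-loaded CPU
        assignment[cpu].append(name)
        loads[cpu] += pct
    return assignment, loads


# Hypothetical task mix chosen to reproduce the slide's per-CPU totals.
EXAMPLE_TASKS = {"rt0": 90, "rt1": 30, "rt2": 30, "rt3": 20}
```

For the hypothetical mix above on two CPUs, the heavy task gets one CPU to itself and the three lighter tasks share the other, giving the 90% and 80% totals shown in the figure.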

Page 25: Shalini xs10

Cache-aware Real-time load balancing (LL)

• Improved PESQ

Page 26: Shalini xs10

Result (CPU utilization)

• We’re fully utilizing the CPUs!

[Figure: bar chart of the amount of work done by the CPU-hogging process (y-axis, 0 to 14000) for each policy (x-axis): baseline (pinned), baseline, A, AL, ALL.]

Page 27: Shalini xs10

Result (Cache misses)

• Total cache misses are down
• Cache misses for the media server are down

[Figure: cache misses for the standard configuration, in units of 10,000 misses, under the baseline, A, AL, and ALL policies. One panel shows total misses (axis 0 to 4000); the other shows misses per domain (axis 0 to 1200) for C-dom, Domain-0, Call-C, Sip-S, and Media-S.]

Page 28: Shalini xs10

Conclusion

• New soft real-time-aware scheduler for Xen
– Better support for real-time applications without penalizing non-real-time tasks
– Fully utilizes CPU resources
– Timeliness requirements are expressed by laxity

Page 29: Shalini xs10

Backup

Page 30: Shalini xs10

Result (Various laxity values)

• Lower laxity → more real-time → better quality

[Figure: various laxity values, standard configuration.]

Page 31: Shalini xs10

Result (Various laxity values)

• Lower laxity → more real-time → better quality

[Figure: various laxity values, cpuload configuration.]

Page 32: Shalini xs10

Result (Load balance)

• Load balancing is essential on multicore systems

[Figure: PESQ improvement through the load balancer in the standard configuration.]

Page 33: Shalini xs10

Result (Boost with event)

• Some impact

[Figure: adding Boost with event.]