
Design and Implementation of a Generic Resource-Sharing Virtual-Time Dispatcher

Tal Ben-Nun, School of Engineering & CS, Hebrew University

Yoav Etsion, CS Dept., Barcelona Supercomputing Center

Dror Feitelson, School of Engineering & CS, Hebrew University

Supported by the Israel Science Foundation, grant no. 28/09

Design and Implementation of a Generic Resource-Sharing Virtual-Time Dispatcher

• Goal is to control the share of resources, not to optimize performance – important in virtualization

• Same module used for diverse resources

• Mechanism used: dispatch the most deserving client at each instant

• Selection of the deserving client using the virtual-time formalism

• Implemented and measured in Linux

Motivation

Context: VMM for server consolidation

• Multiple legacy servers share a physical platform

• Improved utilization and easier maintenance

• Flexibility in allocating resources to virtual machines

• Virtual machines typically run a single application (“appliances”)

Motivation

• Assumed goal: enforce a predefined allocation of resources to different virtual machines (“fair-share” scheduling)

  – Based on importance / SLA

  – Can change with time or due to external events

• Problem: what is “30% of the resources” when there are many different resources and diverse requirements?

Global Scheduling

• “Fair share” is usually applied to a single resource

• But what if this resource is not the bottleneck?

• Global scheduling idea:

  1) Identify the system's bottleneck resource

  2) Apply fair-share scheduling on this resource

  3) This induces appropriate allocations on the other resources

This paper: how to apply fair-share scheduling on any resource in the system

Previous Work I: Virtual Time

• Accounting is inversely proportional to the allocation

• Schedule the client that is farthest behind (illustrated below)
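As a brief illustration of this formalism (not shown on the slides): a client with relative allocation r_i that has consumed c_i units of resource time can be assigned a virtual time v_i = c_i / r_i, which advances inversely proportionally to its allocation. Dispatching the client with the smallest v_i then picks the one that is farthest behind its entitled progress.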

Previous Work II: Traffic Shaping

• Leaky bucket

– Variable requests

– Constant rate transmission

– Bucket represents the buffer

• Token bucket

– Variable requests

– Constant allocations

– Bucket represents stored capacity
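As a rough illustration of the token-bucket mechanism, here is a minimal user-space sketch in C; the names (tb_refill, tb_try_send) are made up for this example and are not part of the dispatcher:

```c
#include <stdint.h>
#include <stdbool.h>

/* Minimal token-bucket sketch: capacity accrues at a constant rate up to a
 * maximum, and a variable-size request is admitted only if enough tokens
 * are stored. All names here are illustrative. */
struct token_bucket {
    double tokens;        /* currently stored capacity */
    double capacity;      /* bucket size (burst limit) */
    double rate;          /* tokens added per second */
    uint64_t last_us;     /* time of last refill, in microseconds */
};

static void tb_refill(struct token_bucket *tb, uint64_t now_us)
{
    double elapsed = (now_us - tb->last_us) / 1e6;
    tb->tokens += elapsed * tb->rate;       /* constant-rate allocation */
    if (tb->tokens > tb->capacity)
        tb->tokens = tb->capacity;          /* excess capacity is discarded */
    tb->last_us = now_us;
}

static bool tb_try_send(struct token_bucket *tb, double size, uint64_t now_us)
{
    tb_refill(tb, now_us);
    if (tb->tokens < size)
        return false;                       /* not enough stored capacity */
    tb->tokens -= size;                     /* the request consumes tokens */
    return true;
}
```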

Putting them Together: RSVT

• “Resource sharing”: all clients make progress continuously

  – Generalization of processor sharing

• Each job has its ideal resource-sharing progress

  – This is considered to be the allocation a_i

  – Grows at a constant rate

• Each job has its actual consumption c_i

  – Grows only when the job runs

• Scheduling priority is the difference:

  p_i = a_i - c_i

Example

• Three clients

• Allocations roughly 50%, 30%, 20%

• Consumption always occurs in resource time

(Plot: consumed resource time vs. wallclock time for the three clients.)

Bookkeeping

• The set of active jobs is A

• The relative allocation of job i is r_i

• During an interval T, job k has run

• Update allocations, for every active job i:

  a_i ← a_i + T · r_i / Σ_{j∈A} r_j

• Update consumptions:

  c_k ← c_k + T (only for the job that ran); all other c_i are unchanged
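This bookkeeping step is small enough to sketch directly; the structure and function names below are illustrative assumptions, not the actual module code:

```c
#include <stddef.h>

/* One RSVT client: relative allocation r, ideal progress a, consumption c.
 * Priority is p = a - c (how far the client is behind its entitlement). */
struct rsvt_client {
    double r;        /* relative allocation (weight) */
    double a;        /* accumulated ideal allocation */
    double c;        /* accumulated actual consumption */
    int    active;   /* member of the active set A? */
};

/* After job `ran` has used the resource for an interval T, distribute T among
 * the active clients in proportion to their weights and charge T to `ran`. */
static void rsvt_account(struct rsvt_client *clients, size_t n,
                         struct rsvt_client *ran, double T)
{
    double total_r = 0.0;
    for (size_t i = 0; i < n; i++)
        if (clients[i].active)
            total_r += clients[i].r;

    if (total_r <= 0.0)
        return;                          /* no active clients: nothing to allocate */

    for (size_t i = 0; i < n; i++)
        if (clients[i].active)
            clients[i].a += T * clients[i].r / total_r;   /* a_i += T * r_i / sum r_j */

    ran->c += T;                         /* only the job that ran is charged */
}
```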

The Active Set

• Active jobs (the set A) are those that can use the resource now

• Allocations are relative to the active set

• The active set may change

– New job arrives

– Job terminates

– Job stops using resource temporarily

– Job resumes use of resource

Grace Period

• Intermittent activity: process data / send packet

• Such clients should retain their allocations even when temporarily inactive

• Thus a_i continues to grow during a grace period after the job becomes inactive

• The grace period reflects a notion of continuity

• Sub-second time scale

Rebirth

• Resumption after a very long inactive period should be treated as a new arrival

• Due to the grace period, a job that becomes inactive accrues extra allocation

• Forget this extra allocation after the rebirth period (set a_i = c_i)

• Two orders of magnitude larger than the grace period (see the sketch below)
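A sketch of how the grace period and rebirth could be applied to an idle client. The thresholds and names are illustrative assumptions; the slides only say the grace period is sub-second and the rebirth period roughly two orders of magnitude longer:

```c
#include <stdint.h>

/* Client fields as in the earlier bookkeeping sketch. */
struct rsvt_client {
    double r, a, c;
    int active;
};

/* Illustrative time scales (assumptions, all times in microseconds). */
#define GRACE_US    100000ULL      /* 100 ms: sub-second grace period   */
#define REBIRTH_US  10000000ULL    /* 10 s: ~two orders of magnitude more */

/* Called periodically for a client that last used the resource at `last_use_us`. */
void rsvt_handle_idle(struct rsvt_client *cl,
                      uint64_t last_use_us, uint64_t now_us)
{
    uint64_t idle = now_us - last_use_us;

    if (idle <= GRACE_US)
        return;                 /* grace period: a_i keeps growing as usual */

    cl->active = 0;             /* past the grace period: leave the active set */

    if (idle >= REBIRTH_US) {
        /* Rebirth: forget the extra allocation accrued during the grace
         * period, so a later resumption looks like a new arrival. */
        cl->a = cl->c;
    }
}
```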

Implementation

• Kernel module with generic functionality

  – Create / destroy module

  – Create / destroy client

  – Make request / set active / set inactive

  – Make allocations

  – Dispatch

  – Check-in (note resource usage)

• Glue code for specific subsystems

  – Currently networking and CPU

  – Plan to add disk I/O
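The generic interface might look roughly like the following header. Every name here is a placeholder invented for illustration; the slides only list the operations:

```c
/* Hypothetical header for the generic RSVT module, illustrating the kind of
 * interface the slides enumerate. None of these names are from the actual
 * implementation. */
#ifndef RSVT_H
#define RSVT_H

#include <stdint.h>

struct rsvt_module;
struct rsvt_client;

/* Create / destroy module */
struct rsvt_module *rsvt_create(void);
void rsvt_destroy(struct rsvt_module *m);

/* Create / destroy client, with a relative allocation (weight) */
struct rsvt_client *rsvt_client_create(struct rsvt_module *m, double weight);
void rsvt_client_destroy(struct rsvt_client *cl);

/* Make request / set active / set inactive */
void rsvt_request(struct rsvt_client *cl);       /* client has pending work */
void rsvt_set_active(struct rsvt_client *cl);
void rsvt_set_inactive(struct rsvt_client *cl);

/* Make allocations: divide `usable_us` of resource time (already net of
 * dead time) among the active clients according to their weights */
void rsvt_allocate(struct rsvt_module *m, uint64_t usable_us);

/* Dispatch: return the most deserving client (largest a - c), or NULL */
struct rsvt_client *rsvt_dispatch(struct rsvt_module *m);

/* Check-in: note that `cl` actually used `used_us` of the resource */
void rsvt_checkin(struct rsvt_client *cl, uint64_t used_us);

#endif /* RSVT_H */
```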

Networking Glue Code

Use the Linux QoS framework: create RSVT queueing discipline

(Diagram: App → TCP → IP → QoS layer containing the RSVT queueing discipline → NIC.)

Networking Glue Code

Non-RSVT traffic has priority (e.g. NFS traffic) and is counted as dead time

(Diagram: each outgoing packet is checked; if it is not RSVT traffic it is sent immediately, otherwise it is enqueued and the dispatcher later selects a queue and sends from it.)
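A much-simplified sketch of this decision logic. It deliberately does not use the real Linux qdisc callbacks; all helper names are placeholders:

```c
#include <stdbool.h>
#include "rsvt.h"   /* the hypothetical generic interface sketched earlier */

struct packet;

bool is_rsvt_traffic(const struct packet *p);      /* e.g. false for NFS */
void nic_transmit(struct packet *p);
void rsvt_enqueue(struct rsvt_client *cl, struct packet *p);
struct packet *rsvt_queue_head(struct rsvt_client *cl);

/* Egress path: non-RSVT traffic has priority and bypasses the dispatcher;
 * the time it occupies the link is later accounted as dead time. */
void egress(struct rsvt_client *cl, struct packet *p)
{
    if (!is_rsvt_traffic(p)) {
        nic_transmit(p);           /* send immediately; counts as dead time */
        return;
    }
    rsvt_enqueue(cl, p);           /* queue per client for the dispatcher */
}

/* When the link becomes free, let the RSVT module pick the most deserving
 * client and send the packet at the head of its queue. */
void link_idle(struct rsvt_module *m)
{
    struct rsvt_client *cl = rsvt_dispatch(m);
    if (cl) {
        struct packet *p = rsvt_queue_head(cl);
        if (p)
            nic_transmit(p);
    }
}
```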

CPU Scheduling Glue Code

• Use Linux modular scheduling core

• Add an RSVT scheduling policy

– RSVT module essentially replaces the policy runqueue

– Initial implementation only for uniprocessors

• CFS and possibly other policies also exist and have higher priority

– When they run, this is considered dead time

Timer Interrupts

• Linux employs timer interrupts (250 Hz)

• Allocations are done at these times

– Translate time into microseconds

– Subtract known dead time (unavailable to us)

– Divide among active clients according to relative allocations

– Bound divergence of allocation from consumption

• Also handles the grace period (marking idle clients inactive)

• Also handles rebirth (setting a_i = c_i; see the sketch below)
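A sketch of the per-tick allocation step under these assumptions (250 Hz ticks, times in microseconds; function names follow the hypothetical interface above):

```c
#include <stdint.h>
#include "rsvt.h"   /* hypothetical generic interface sketched earlier */

/* At 250 Hz each tick covers 4000 microseconds. The dead time measured since
 * the previous tick is subtracted before the remainder is divided among the
 * active clients. Purely illustrative. */
#define TICK_US 4000ULL

void rsvt_timer_tick(struct rsvt_module *m, uint64_t dead_time_us)
{
    uint64_t usable = TICK_US > dead_time_us ? TICK_US - dead_time_us : 0;

    /* Divide the usable time among active clients according to their
     * relative allocations; the module also bounds how far a_i may
     * diverge from c_i. */
    rsvt_allocate(m, usable);

    /* Grace-period and rebirth bookkeeping would also run here: clients idle
     * longer than the grace period are marked inactive, and after the much
     * longer rebirth period their a_i is reset to c_i. */
}
```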

Multi-Queue

• At dispatch, need to find the client with the highest priority

• But priorities change at different rates

• Solution: allow only a limited, discrete set of relative priorities

• Each priority has a separate queue

• Maintain all clients in each queue in priority order

• Only need to check the first client in each queue to find the maximum (sketched below)
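A sketch of the multi-queue lookup; the number of queues and all names are assumptions:

```c
#include <stddef.h>
#include "rsvt.h"   /* hypothetical interface sketched earlier */

/* Clients are grouped by their (discretized) relative allocation. Within a
 * queue all priorities change at the same rate, so the queue stays sorted
 * and only its head can be the overall maximum. */
#define RSVT_NUM_QUEUES 16

struct rsvt_queue {
    struct rsvt_client *head;    /* client farthest behind in this queue */
};

extern double rsvt_priority(const struct rsvt_client *cl);   /* a_i - c_i */

struct rsvt_client *rsvt_pick_next(struct rsvt_queue queues[RSVT_NUM_QUEUES])
{
    struct rsvt_client *best = NULL;
    double best_p = 0.0;

    /* One comparison per queue instead of one per client. */
    for (int q = 0; q < RSVT_NUM_QUEUES; q++) {
        struct rsvt_client *cl = queues[q].head;
        if (cl && (!best || rsvt_priority(cl) > best_p)) {
            best = cl;
            best_p = rsvt_priority(cl);
        }
    }
    return best;
}
```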

Experiment – Basic Allocations

  rate   bandwidth
  1      30.89 ± 0.05
  2      61.41 ± 0.02

Experiment – Basic Allocations

  rate   bandwidth
  1      15.69 ± 0.11
  2      30.81 ± 0.03
  3      46.10 ± 0.03

Experiment – Active Set

Experiment – Grace Period

Experiment – Rebirth

Experiment – Throttling

• Two competing MPlayers

• The one with the higher allocation does not need all of it

  – Allocation tracks consumption

Conclusions

• Demonstrated a generic virtual-time-based resource-sharing dispatcher

• Need to complete implementation

– Support for I/O scheduling

– More details, e.g. SMP support

• Building block of global scheduling vision
