xs oracle 2009 transcendent memory

53
<Insert Picture Here> Transcendent Memory on Xen Speaker: Dan Magenheimer Oracle Corporation 2009

Upload: the-linux-foundation

Post on 20-May-2015

1.042 views

Category:

Technology


2 download

DESCRIPTION

Dan Magenheimer: Transcendent Memory on Xen

TRANSCRIPT

Page 1: XS Oracle 2009 Transcendent Memory

<Insert Picture Here>

Transcendent

Memory on XenSpeaker: Dan Magenheimer

Oracle Corporation

2009

Page 2: XS Oracle 2009 Transcendent Memory

Transcendent Memory on Xen (Xen Summit 2009) - Dan Magenheimer

Agenda

• Motivation and Challenge

• Overview of Physical Memory Management

• Transcendent Memory (“tmem”) Overview

• Transcendent Memory in Action

• Status, Futures, etc.

Page 3: XS Oracle 2009 Transcendent Memory

Transcendent Memory on Xen (Xen Summit 2009) - Dan Magenheimer

Motivation

•Memory is increasingly becoming a

bottleneck in virtualized system

• Existing mechanisms have major holes

One 4-CPU physical server w/4GB RAM

Four underutilized 2-cpu virtual servers

each with 1GB RAM

X

��������

����X

page sharing

ballooning

memory overcommitment

Page 4: XS Oracle 2009 Transcendent Memory

Transcendent Memory on Xen (Xen Summit 2009) - Dan Magenheimer

The Virtualized Physical Memory

Resource Optimization Challenge

Optimize, across time, the distribution of machine

memory among a maximal set of virtual machines by:

• measuring the current and future memory need of

each running VM and

• reclaiming memory from those VMs that have an

excess of memory and either:

• providing it to VMs that need more memory or

• using it to provision additional new VMs.

• without suffering a significant performance penalty

Page 5: XS Oracle 2009 Transcendent Memory

Transcendent Memory on Xen (Xen Summit 2009) - Dan Magenheimer

The Virtualized Physical Memory

Resource Optimization Challenge

Optimize, across time, the distribution of machine memory among a maximal set of virtual machines by:

• measuring the current and future memory need of each running VM and

• reclaiming memory from those VMs that have an excess of memory and either:• providing it to VMs that need more memory or

• using it to provision additional new VMs.

• without suffering a significant performance penalty

…..Why is this a hard problem?

Page 6: XS Oracle 2009 Transcendent Memory

Transcendent Memory on Xen (Xen Summit 2009) - Dan Magenheimer

Agenda

• Motivation and Challenge

• Overview of Physical Memory Management• in an operating system

• in a virtual machine monitor (Xen)

• Transcendent Memory Overview

• Transcendent Memory In Action

• Status, Futures, etc.

Page 7: XS Oracle 2009 Transcendent Memory

Transcendent Memory on Xen (Xen Summit 2009) - Dan Magenheimer

OS Physical Memory Management

• Operating systems

are memory hogs!OS

Memory constraint

Page 8: XS Oracle 2009 Transcendent Memory

Transcendent Memory on Xen (Xen Summit 2009) - Dan Magenheimer

• Operating systems are

memory hogs!

If you give an

operating system

more memory…..

New larger memory constraint

OS

OS Physical Memory Management

Page 9: XS Oracle 2009 Transcendent Memory

Transcendent Memory on Xen (Xen Summit 2009) - Dan Magenheimer

• Operating systems are

memory hogs!

…it uses up any

memory you give it!

My name is Linux and I

am a memory hog

Memory constraint

OS Physical Memory Management

Page 10: XS Oracle 2009 Transcendent Memory

Transcendent Memory on Xen (Xen Summit 2009) - Dan Magenheimer

• What does an OS do

with all that memory?

Kernel code

Kerneldata User code

User dataPage cache

Page tables

Everythingelse

OS Physical Memory Management

Page 11: XS Oracle 2009 Transcendent Memory

Transcendent Memory on Xen (Xen Summit 2009) - Dan Magenheimer

• What does an OS do

with all that memory?pagecache

OS Physical Memory Management

Page 12: XS Oracle 2009 Transcendent Memory

Transcendent Memory on Xen (Xen Summit 2009) - Dan Magenheimer

• What does an OS do

with all that memory?pagecache

OS Physical Memory Management

Page 13: XS Oracle 2009 Transcendent Memory

Transcendent Memory on Xen (Xen Summit 2009) - Dan Magenheimer

• What does an OS do

with all that memory?

…much of the time

mostly page cache

… some of which will

be useful in the future

… and some of which

is wasted

page cache

Everything else

OS Physical Memory Management

Page 14: XS Oracle 2009 Transcendent Memory

Transcendent Memory on Xen (Xen Summit 2009) - Dan Magenheimer

Agenda

• Motivation and Challenge

• Overview of Physical Memory Management

• in an operating system

• in a virtual machine monitor (Xen)

• Transcendent Memory Overview

• Transcendent Memory In Action

• Status, Futures, etc.

Page 15: XS Oracle 2009 Transcendent Memory

Transcendent Memory on Xen (Xen Summit 2009) - Dan Magenheimer

VMM Physical Memory Management

• Xen partitions memory

• hypervisor memory

• dom0 memory

• guest memory

Dom0 is special ☺☺☺☺

guest

Page 16: XS Oracle 2009 Transcendent Memory

Transcendent Memory on Xen (Xen Summit 2009) - Dan Magenheimer

• Xen partitions memory• Xen memory

• dom0 memory

• guest 1 memory

• guest 2 memory

• whatever’s left over: “fallow” memory

guest

guest

fallow, adj., land left without a crop for one or more years

VMM Physical Memory Management

Page 17: XS Oracle 2009 Transcendent Memory

Transcendent Memory on Xen (Xen Summit 2009) - Dan Magenheimer

• Xen partitions memory• Xen memory

• dom0 memory

• guest 1 memory

• guest 2 memory

• whatever’s left over: “fallow” memory

guest

guest

fallow

fallow

fallow

fallow, adj., land left without a crop for one or more years

VMM Physical Memory Management

Page 18: XS Oracle 2009 Transcendent Memory

Transcendent Memory on Xen (Xen Summit 2009) - Dan Magenheimer

• Xen partitions memory among more guests• Xen memory

• dom0 memory

• guest 1 memory

• guest 2 memory

• guest 3…

• BUT still fallow memoryleftover

guest

guest

guest

gues

t

fallow

fallow

fallow

VMM Physical Memory Management

Page 19: XS Oracle 2009 Transcendent Memory

Transcendent Memory on Xen (Xen Summit 2009) - Dan Magenheimer

• migration

• requires fallow memory

in the target machine

• leaves behind fallow

memory in the

originating machine

guest

guest

gues

t

fallow

fallow

fallow

guest

guest

gues

t

fallow

fallow

fallow

VMM Physical Memory Management

in the presence of migration

Physical machine “A”

Physical machine “B”

Page 20: XS Oracle 2009 Transcendent Memory

Transcendent Memory on Xen (Xen Summit 2009) - Dan Magenheimer

• Use ballooning to

allow guest memory

size to grow?

• Goal: fill fallow memory

guest

guest

guest

gues

t

fallow

fallow

fallow

VMM Physical Memory Management

in the presence of ballooning

Page 21: XS Oracle 2009 Transcendent Memory

Transcendent Memory on Xen (Xen Summit 2009) - Dan Magenheimer

• Look! No more

fallow memory!

But….guest

guest

guest

guest

fallow

guest

VMM Physical Memory Management

in the presence of ballooning

Page 22: XS Oracle 2009 Transcendent Memory

Transcendent Memory on Xen (Xen Summit 2009) - Dan Magenheimer

• Look! No more fallow memory!

But….

And but…

guest

guest

guest

guest

fallow

guest

VMM Physical Memory Management

in the presence of ballooning

Page 23: XS Oracle 2009 Transcendent Memory

Transcendent Memory on Xen (Xen Summit 2009) - Dan Magenheimer

Using ballooning to take memory away:

• not instantaneous (memory inertia)

• guest can’t predict future needs

• good pages are evicted along with the bad

• don’t know how much/fast to balloon

• Too much or too fast

� thrashing or the dreaded OOM killer

VMM Physical Memory Management

in the presence of ballooning

Page 24: XS Oracle 2009 Transcendent Memory

Transcendent Memory on Xen (Xen Summit 2009) - Dan Magenheimer

The Virtualized Physical Memory

Resource Optimization Challenge

Optimize, across time, the distribution of machine memory among a maximal set of virtual machines by:

• measuring the current and future memory need of each running VM and

• reclaiming memory from those VMs that have an excess of memory and either:• providing it to VMs that need more memory or

• using it to provision additional new VMs.

• without suffering a significant performance penalty

…..This IS a hard problem!!!

Page 25: XS Oracle 2009 Transcendent Memory

Transcendent Memory on Xen (Xen Summit 2009) - Dan Magenheimer

Why this IS a hard problem!

Summary

• OS’s use as much memory as they are given

• but cannot predict the future so often guess wrong

• and often much memory owned by an OS is wasted

• Xen leaves large amounts of memory fallow

• fixed partitioning results in fragmentation

• migration requires fallow memory to succeed

• Ballooning helps but:

• can’t predict future memory needs of guests

• memory has inertia

• the price of incorrect guesses can be dire

���� NEED A NEW APPROACH TO VIRTUALIZED PHYSICAL MEMORY MANAGEMENT!!

Page 26: XS Oracle 2009 Transcendent Memory

Transcendent Memory on Xen (Xen Summit 2009) - Dan Magenheimer

Agenda

• Motivation and Challenge

• Overview of Physical Memory Management

• Transcendent Memory Overview

• Transcendent Memory In Action

• Status, Futures, etc.

Page 27: XS Oracle 2009 Transcendent Memory

Transcendent Memory on Xen (Xen Summit 2009) - Dan Magenheimer

Transcendent memory

creating the transcendent memory pool

• Step 1a: reclaim all fallow memory

• Step 1b: reclaim wasted guest

memory (e.g. via ballooning)

• Step 1c: collect it all into a pool

Transcendentmemorypool

guest

guest

guest

guest

fallow

fallow

fallow

Page 28: XS Oracle 2009 Transcendent Memory

Transcendent Memory on Xen (Xen Summit 2009) - Dan Magenheimer

Transcendent memory

creating the transcendent memory pool

• Step 2: provide indirect

access, strictly controlled by

the hypervisor and dom0

control

Transcendentmemorypool

guest

guest

guest

guest

data

data

data

data

control

Page 29: XS Oracle 2009 Transcendent Memory

Transcendent Memory on Xen (Xen Summit 2009) - Dan Magenheimer

Transcendent memory

API characteristics

Transcendent memory API

• paravirtualized (lightly)

• narrow

• well-specified

• operations are:

• synchronous

• page-oriented (one page per op)

• copy-based

• multi-faceted

• extensibleTranscendentmemorypool

guest guest

Page 30: XS Oracle 2009 Transcendent Memory

Transcendent Memory on Xen (Xen Summit 2009) - Dan Magenheimer

Transcendent memory

four different subpool types ���� four different uses

inter-domain

shared

memory?

server-side cluster

filesystem cache?

� “shared hcache”

shared

Fast swap

“device”!!

� “hswap”

“second-chance”

clean-page cache!!

� “hcache”

private

persistentephemeral

Implemented and working today (Linux + Xen)

In development

Under investigation

eph-em-er-al, adj., … transitory, existing only briefly, short-lived (i.e. NOT persistent)

Page 31: XS Oracle 2009 Transcendent Memory

Transcendent Memory on Xen (Xen Summit 2009) - Dan Magenheimer

• Requirements• guest OS must be paravirtualized

• 64-bit hypervisor and CPU

• Workload:• should exert memory pressure in at least one guest

• memory pressure in multiple guests should vary across time

• For best results:• dom0 should be configured with a fixed memory size

• guest should have a (virtual) swap disk configured

• Complementary to:• feedback-directed ballooning

• transparent content-based page sharing

Transcendent memory

caveats

Page 32: XS Oracle 2009 Transcendent Memory

Transcendent Memory on Xen (Xen Summit 2009) - Dan Magenheimer

Agenda

• Motivation and Challenge

• Overview of Physical Memory Management

• Transcendent Memory Overview

• Transcendent Memory In Action

• private-ephemeral pool � “hcache”*

• shared-ephemeral pool � “shared hcache”

• private-persistent pool � “hswap”*

• Status, Future, etc.

* called “precache” and “preswap” for Linux

Page 33: XS Oracle 2009 Transcendent Memory

Transcendent Memory on Xen (Xen Summit 2009) - Dan Magenheimer

hcache

• a second-chance clean

page cache for a guest

• “put” clean pages only

• “get” only valuable pages

• pages eventually are evicted

• coherency managed by guest

• exclusive cache semantics

inter-domain

shared memory?

server-side cluster

filesystem cache?

� “shared hcache”

shared

Fast swap

“device”!!

� “hswap”

“second-chance”

clean-page cache!!

� “hcache”

private

persistentephemeral

Transcendent Memory Pool types

Transcendentmemory pool

(private+ephemeral)

guest

“put”

“get”

Page 34: XS Oracle 2009 Transcendent Memory

Transcendent Memory on Xen (Xen Summit 2009) - Dan Magenheimer

hcache (with compression)

• Compression• Option (per-domain)

• nominally doubles available memory

• performance-space tradeoff

guest

Page 35: XS Oracle 2009 Transcendent Memory

Transcendent Memory on Xen (Xen Summit 2009) - Dan Magenheimer

hcache (multiple guests)

• second-chance page cache

for multiple guests

• Need “memory scheduler”:

• global admission/eviction policy:

• LRU queue, or

• weight balanced (future)

private ephemeraltmem pool #1

guest

guest

private ephemeraltmem pool #2

Page 36: XS Oracle 2009 Transcendent Memory

Transcendent Memory on Xen (Xen Summit 2009) - Dan Magenheimer

shared hcache (for clustering)

• guests sharing a clustered filesystem• non-exclusive

• LFU instead of LRU

• compression optional

� a server-side disk cache!

Clusteredfilesystem

inter-domain

shared memory?

server-side cluster

filesystem cache?

� “shared hcache”

shared

Fast swap

“device”!!

� “hswap”

“second-chance”

clean-page cache!!

� “hcache”

private

persistentephemeral

Transcendent Memory Pool types

SHARED ephemeraltmem pool

guest

guest

Page 37: XS Oracle 2009 Transcendent Memory

Transcendent Memory on Xen (Xen Summit 2009) - Dan Magenheimer

hswap

• over-ballooned guests

experiencing unexpected

memory pressure have an

emergency swap disk

• much faster than swapping

• persistent (“dirty”) pages OK

• prioritized higher than hcache

• limited by domain’s maxmem

Transcendent Memory Pool types

inter-domain

shared memory?

server-side cluster

filesystem cache?

� “shared hcache”

shared

Fast swap

“device”!!

� “hswap”

“second-chance”

clean-page cache!!

� “hcache”

private

persistentephemeral

Page 38: XS Oracle 2009 Transcendent Memory

Transcendent Memory on Xen (Xen Summit 2009) - Dan Magenheimer

Agenda

• Motivation and Challenge

• Overview of Physical Memory Management

• Transcendent Memory Overview

• Transcendent Memory In Action

• Status, Future, etc.

Page 39: XS Oracle 2009 Transcendent Memory

Transcendent Memory on Xen (Xen Summit 2009) - Dan Magenheimer

Current Status

• hcache and hswap fully working

• shared hcache soon

• xen-side patch ready for inclusion in xen-unstable

• ~3K line patch, but low impact on existing code

• enabled with xen boot option (off by default)

• “technology preview”

• goal: broader community usage (3.4?)

• linux-side patch ready

• low impact on existing code

• 2.6.18-xen version ready for inclusion in Xen-linux tree

• 2.6.28 version working

Page 40: XS Oracle 2009 Transcendent Memory

Transcendent Memory on Xen (Xen Summit 2009) - Dan Magenheimer

Future Work

• finish “shared hcache” work (ocfs2)

• shared-persistent pool investigation

• inter-domain communication?

• real world performance measurement/analysis

• identify tuning opportunities (e.g. scaleability) and repeat

• finish “memory scheduler”

• tmem for:

• native Linux?

• Linux containers?

• KVM?

• Hvm domains?

Page 41: XS Oracle 2009 Transcendent Memory

Transcendent Memory on Xen (Xen Summit 2009) - Dan Magenheimer

Acknowledgements

• Chris Mason (Oracle)

• Linux vfs changes for hcache

• Zhigang Wang (Oracle)

• Xen tools (xm + libxc) code

• Kurt Hackel (Oracle), various HP friends, Ian, Keir, Jeremy

• design feedback along the way

Page 42: XS Oracle 2009 Transcendent Memory

Transcendent Memory on Xen (Xen Summit 2009) - Dan Magenheimer

For more information

http://oss.oracle.com/projects/tmem

Page 43: XS Oracle 2009 Transcendent Memory

<Insert Picture Here>

Transcendent

Memory on XenSpeaker: Dan Magenheimer

Oracle Corporation

2009

Page 44: XS Oracle 2009 Transcendent Memory

Transcendent Memory on Xen (Xen Summit 2009) - Dan Magenheimer

Backup Slides

Page 45: XS Oracle 2009 Transcendent Memory

Transcendent Memory on Xen (Xen Summit 2009) - Dan Magenheimer

Transcendent Memory API

overview (API v0.0.1)

Page 46: XS Oracle 2009 Transcendent Memory

Transcendent Memory on Xen (Xen Summit 2009) - Dan Magenheimer

Transcendent memory API

op overview (API v0.0.1)

Two classes of operations:

• Create a pool

Syntax: pool_id = tmem_new_pool(uuid, flags)

• Operate on a created pool

Generic syntax:

retval = tmem_op(handle,pfn[,ofs1,ofs2,len])

Page 47: XS Oracle 2009 Transcendent Memory

Transcendent Memory on Xen (Xen Summit 2009) - Dan Magenheimer

Transcendent memory API

pool creation (API v0.0.1)

inter-domain

shared

memory?

server-side cluster

filesystem cache?

shared

Fast swap

“device”!!

�“hswap”

“second-chance”

clean- page cache!!

�“hcache”

private

persistentephemeral

Implemented and working today (Linux + Xen)

Under investigation

Syntax: pool_id = tmem_new_pool(uuid, flags)

flags: private vs. shared, ephemeral vs. persistent, page size, API version, … ???

uuid: 128-bit “share name” (for shared pools, ignored for private pools)

Page 48: XS Oracle 2009 Transcendent Memory

Transcendent Memory on Xen (Xen Summit 2009) - Dan Magenheimer

Transcendent Memory API

what is a “handle”?? (API v0.0.1)

• The “handle” used in previous slides is actually a

three-element “handle-tuple” consisting of:

• a 32-bit pool-id (obtained from tmem_new_pool())

• a 64-bit object-id

• a 32-bit page-id

• In filesystem-like usage:

• pool-id � one per filesystem

• object-id � inode

• page-id � page index into a file

retval = tmem_op(handle,pfn)� (is actually)

retval = tmem_op(pool_id,object_id,page_id,pfn)

Page 49: XS Oracle 2009 Transcendent Memory

Transcendent Memory on Xen (Xen Summit 2009) - Dan Magenheimer

Transcendent Memory API

API operations (API v0.0.1)

• tmem_new_pool(uuid,flags)

• tmem_destroy_pool(pool_id)

• tmem_put_page(pool_id,object_id,page_id,pfn)

• tmem_get_page(pool_id,object_id,page_id,empty_pfn)

• tmem_flush_page(pool_id,object_id,page_id)

• tmem_flush_object(pool_id,object_id)

• tmem_read(pool_id,object_id,page_id,pfn,

offset1,offset2,len)

• tmem_write(pool_id,object_id,page_id,pfn,

offset1,offset2,len

• tmem_xchg(pool_id,object_id,page_id,pfn,

offset1,offset2,len)

• tmem_control(TBD…)

Page 50: XS Oracle 2009 Transcendent Memory

Transcendent Memory on Xen (Xen Summit 2009) - Dan Magenheimer

Transcendent Memory API

important semantic details (v0.0.1)

• get_page on a private+ephemeral pool is destructive (auto-flush)

• implements exclusive cache semantics

• no serialization guarantees are provided for SMP VMs

• clients must ensure coherency with their own caches/data stores but implementation provides following guarantees:• put/put/get (aka “dup put”) coherency

tmem_put_page(ABC,D1);tmem_put_page(ABC,D2);

tmem_get_page(ABC,E);

E may never contain the data from D1.

(implies that on persistent pools, dup put must never fail)

• get/get coherency

tmem_get_page(ABC,E);tmem_get_page(ABC,E);

If the first get fails, the second must also fail

• all flush operations must always succeed

• return values: >=0 means success, < 0 failure (errno)

• see spec for more information

Page 51: XS Oracle 2009 Transcendent Memory

Transcendent Memory on Xen (Xen Summit 2009) - Dan Magenheimer

Transcendent memory

hcache performance(smaller is better)

0

10

20

30

40

50

60

70

seconds

pcpu=2

vcpu=2

pcpu=4

vcpu=2

pcpu=4

vcpu=4

256MB w/hcache 256MB no hcache

1024MB no hcache 2048MB no hcache

0

20

40

60

80

100

disk reads (K)

pcpu=2

vcpu=2

pcpu=4

vcpu=2

pcpu=4

vcpu=4

256MB w/hcache 256MB no hcache

1024MB no hcache 2048MB no hcache

Benchmark: Linux compile, cold page cache, pre-caching enabled (ccache)

Page 52: XS Oracle 2009 Transcendent Memory

Transcendent Memory on Xen (Xen Summit 2009) - Dan Magenheimer

Transcendent memory

hcache compensates for

underprovisioned memory

0

20

40

60

80

100

120

seconds

pcpu=4 vcpu=4

128MB w/hcache 128MB no hcache

256MB no hcache 1024MB no hcache

Benchmark: Linux compile, warm page cache, pre-caching disabled

0

100

200

300

400

500

600

disk reads (K)

pcpu=4 vcpu=4

128MB w/hcache 128MB no hcache

256MB no hcache 1024MB no hcache

Page 53: XS Oracle 2009 Transcendent Memory

Transcendent Memory on Xen (Xen Summit 2009) - Dan Magenheimer

hcache (multiple domains + compressed)

• shared compressed

extended page cache for

more than one guestguest

guest