
Message Analysis-Guided Allocation and Low-Pause Incremental Garbage Collection in a Concurrent Language

Konstantinos Sagonas
Jesper Wilhelmsson

Uppsala University, Sweden

Goals of this work

Efficiently implement concurrency through asynchronous message passing
Memory management with real-time characteristics
o Short stop-times
o High mutator utilization
Design for multithreading

Our context: Erlang

Designed for highly concurrent applications
Soft real-time
Light-weight processes
No destructive updates
Data types: atoms, numbers, PIDs, tuples, cons cells (lists), binaries (heap data)

Our context: the Erlang/OTP system

Industrial-strength implementation
Used in embedded applications
Three memory architectures [ISMM'02]:
o Private
o Shared
o Hybrid

[Figure: a process with its stack and heap]

Private heaps

[Figure: two processes, each with its own stack and heap]

Sending a message copies it into the receiver's heap: an O(|message|) copy
Garbage collection is a private business
Fast memory reclamation of terminated processes: O(1)
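To make the cost model concrete, here is a minimal sketch of copy-on-send under private heaps; the term and process representations below are invented for illustration and are far simpler than the real Erlang/OTP ones.

```c
/* Sketch of copy-on-send under private heaps (illustrative only;
   not the actual Erlang/OTP term representation). */
#include <stdlib.h>

typedef struct term {            /* a toy cons-cell-like term             */
    struct term *head, *tail;    /* NULL for leaves                       */
    long value;
} term;

typedef struct process {
    term  *heap;                 /* private heap: a simple bump allocator */
    size_t top, size;
} process;

static term *alloc_on(process *p) {
    return &p->heap[p->top++];   /* no bounds check in this sketch        */
}

/* Copying a message is O(|message|): every cell is duplicated into the
   receiver's private heap. */
static term *copy_term(process *to, const term *t) {
    if (t == NULL) return NULL;
    term *c  = alloc_on(to);
    c->value = t->value;
    c->head  = copy_term(to, t->head);
    c->tail  = copy_term(to, t->tail);
    return c;
}

/* send: the receiver gets its own copy, so each heap can be collected,
   or dropped in O(1) when its process terminates, without touching any
   other process. */
term *send(process *receiver, const term *msg) {
    return copy_term(receiver, msg);
}
```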

Shared heap

[Figure: two processes sharing one heap]

Global synchronization
Longer stop-times
No fast reclamation of process-local data

Hybrid architecture

[Figure: two processes with process-local heaps, a shared message area, and a shared area for big objects]

Allocating messages in the message area

Several possible methods:
o User annotations
o Dynamic monitoring [Petrank et al. ISMM'02]
o Static analysis-guided allocation

Static message analysis [SAS'03]

Similar to escape analysis
Allocation is process-local by default
o Possible messages are allocated in the message area
o Copy on demand
The analysis is quite precise
o Typically finds 99% of all messages
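A hedged sketch of how analysis-guided allocation and copy-on-demand could fit together; every name below (the allocators, the predicates, the may_be_message flag) is hypothetical and stands in for what the compiler and runtime actually emit.

```c
/* Sketch of analysis-guided allocation and copy-on-demand at send.
   All names are hypothetical; the real Erlang/OTP runtime differs. */
#include <stdbool.h>
#include <stdlib.h>

typedef struct { void **slots; size_t n; } mailbox;

/* Stubs standing in for the runtime's real allocators and predicates. */
static void *alloc_local(size_t words)           { return malloc(words * sizeof(void *)); }
static void *alloc_in_message_area(size_t words) { return malloc(words * sizeof(void *)); }
static bool  in_message_area(void *p)            { (void)p; return false; }
static void *copy_to_message_area(void *p)       { return p; }

/* The static message analysis tags each allocation site: data that may
   later be sent as a message goes directly into the shared message area,
   everything else stays on the process-local heap. */
void *alloc(size_t words, bool may_be_message) {
    return may_be_message ? alloc_in_message_area(words) : alloc_local(words);
}

/* Copy on demand: if the analysis missed a message (reportedly ~1% of
   them), the data is copied into the message area at send time, so a
   receiver never holds a pointer into another process's local heap. */
void send_msg(mailbox *to, void *msg) {
    if (!in_message_area(msg))
        msg = copy_to_message_area(msg);
    to->slots[to->n++] = msg;    /* assumes the mailbox has room (sketch) */
}
```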

Process-local heaps: private business, no synchronization required

Message area: two generations
o Copying collector in the young generation: fast allocation
o Mark-and-sweep in the old generation: prevents repeated copying of old objects
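To make the "fast allocation" point concrete: allocation in a copying young generation is a pointer bump plus a limit check. A minimal sketch, with made-up field names:

```c
/* Bump-pointer allocation in the nursery (sketch; field names are made up). */
#include <stddef.h>

typedef struct {
    char *ntop;     /* next free byte in the nursery                     */
    char *nlimit;   /* end of the nursery                                */
} nursery;

void *ma_alloc(nursery *n, size_t bytes) {
    if (n->ntop + bytes > n->nlimit)
        return NULL;             /* caller triggers a (minor) collection */
    void *p  = n->ntop;
    n->ntop += bytes;
    return p;
}
```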

Garbage Collection in the Hybrid Architecture

GC of the message area is a bottleneck: its root-set consists of all stacks and all process-local heaps. Two techniques reduce the cost:

1. Generational process scanning
2. Remembered sets in the local heaps

This is not enough: we need an incremental collector for the Message Area!
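A speculative sketch of what root-set construction with remembered sets might look like; the process layout and function names are invented for illustration and are not the Erlang/OTP data structures.

```c
/* Root scanning for a message-area collection (illustrative layout). */
#include <stddef.h>

typedef struct {
    void **stack;    size_t stack_len;
    void **rem_set;  size_t rem_set_len;  /* local-heap slots known to
                                             point into the message area */
} proc;

/* Without remembered sets every word of every local heap is a potential
   root; with them, only the stack and the recorded slots are scanned. */
void scan_roots(proc *procs, size_t nprocs, void (*visit_root)(void **)) {
    for (size_t i = 0; i < nprocs; i++) {
        for (size_t j = 0; j < procs[i].stack_len; j++)
            visit_root(&procs[i].stack[j]);
        for (size_t j = 0; j < procs[i].rem_set_len; j++)
            visit_root(&procs[i].rem_set[j]);
    }
}
```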

Properties of the incremental collector

No overhead on mutator

No space overhead on heap objects

Short stop-times

High mutator utilization

Organization of the Message Area

Young generation: nursery and from-space
o Nursery and from-space always have a constant size (100k words)
o Pointers into the nursery: Ntop, Nlimit, and the allocation limit
Fwd: storage area for forwarding pointers, of bounded size
Old generation: list of arbitrary-sized areas; free-list, first-fit allocation
Black-map: bit-array used to mark objects in mark-and-sweep
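The layout above can be summarized in a rough C sketch; the field names, types, and the packing of all areas into one struct are assumptions made for illustration, not the actual runtime layout.

```c
/* Rough sketch of the message-area organization described above. */
#include <stddef.h>
#include <stdint.h>

#define NURSERY_WORDS 100000          /* nursery/from-space: constant 100k words */

typedef struct free_block {           /* old generation: list of arbitrary-sized */
    size_t words;                     /* areas with free-list, first-fit          */
    struct free_block *next;          /* allocation                               */
} free_block;

typedef struct {
    uintptr_t  nursery[NURSERY_WORDS];    /* young gen: fresh allocations        */
    uintptr_t  from_space[NURSERY_WORDS]; /* young gen: being evacuated          */
    uintptr_t *ntop;                      /* next free word in the nursery       */
    uintptr_t *nlimit;                    /* hard end of the nursery             */
    uintptr_t *alloc_limit;               /* where the mutator yields to the GC  */

    uintptr_t *fwd;                       /* separate storage for forwarding     */
                                          /* pointers: no per-object overhead    */
    uint8_t   *black_map;                 /* bit-array marking old-generation    */
                                          /* objects during mark-and-sweep       */
    free_block *old_gen_free;             /* old-generation free list            */
} message_area;
```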

Incremental collector

Two approaches to choose from:

Work-based

Reclaim n live words each step

Time-based

A step takes no more than t ms

n and t are user-specified

Work-based collection

The mutator wants to allocate need words

reclaim = max(n, need)

Allocation limit = Ntop + reclaim
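A small sketch of the work-based policy at the end of a collection step, transcribing the two formulas above; the struct and the final clamp to Nlimit are this sketch's own assumptions.

```c
/* Work-based scheduling of GC steps (sketch). */
#include <stddef.h>
#include <stdint.h>

typedef struct { uintptr_t *ntop, *nlimit, *alloc_limit; } nursery_state;

/* n: user-specified live words to reclaim per step.
   need: what the mutator is currently trying to allocate. */
void work_based_step_done(nursery_state *na, size_t n, size_t need) {
    size_t reclaim  = n > need ? n : need;   /* reclaim = max(n, need)            */
    na->alloc_limit = na->ntop + reclaim;    /* allocation limit = Ntop + reclaim */
    if (na->alloc_limit > na->nlimit)        /* clamp to the end of the nursery   */
        na->alloc_limit = na->nlimit;        /* (this sketch's own safety check)  */
}
```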

Time-based collection

1. User annotations (as in Metronome)

2. Dynamic worst-case calculation

How much can the mutator allocate?

How much live data is there?

ΔGC = reclaimed after this GC step − reclaimed before this GC step
GCsteps = (total to reclaim − reclaimed after this GC step) / ΔGC
wM = Nfree / GCsteps

Allocation limit = Ntop + wM
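The same idea for the time-based policy, transcribing the formulas above; the names, the estimate of the total amount to reclaim in the cycle, and the guards against division by zero are assumptions of this illustration.

```c
/* Time-based scheduling of GC steps (sketch). */
#include <stddef.h>
#include <stdint.h>

typedef struct { uintptr_t *ntop, *nlimit, *alloc_limit; } nursery_state;

void time_based_step_done(nursery_state *na,
                          size_t reclaimed_before,   /* before this GC step      */
                          size_t reclaimed_after,    /* after this GC step       */
                          size_t to_reclaim_total)   /* estimate for whole cycle */
{
    size_t delta_gc = reclaimed_after - reclaimed_before;          /* ΔGC        */
    size_t gc_steps = delta_gc
        ? (to_reclaim_total - reclaimed_after) / delta_gc          /* GCsteps    */
        : 1;
    if (gc_steps == 0) gc_steps = 1;                               /* guard      */
    size_t n_free = (size_t)(na->nlimit - na->ntop);               /* Nfree      */
    size_t w_m    = n_free / gc_steps;                             /* wM         */
    na->alloc_limit = na->ntop + w_m;        /* allocation limit = Ntop + wM     */
}
```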

Collecting the Message Area

[Animation: three processes P1, P2, P3 with their local heaps; the message area's from-space, nursery, and forwarding area; and the collector's process queue]

The collector takes processes off the process queue one at a time and scans their stacks and local heaps for roots, copying the live message data they reference out of the from-space. Between collection steps the mutator runs until it reaches the allocation limit.

Cheap write barrier: the send operation links the receiver onto a list (the process queue), so a process that receives a message during the collection will be scanned again.
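A sketch of what such a cheap write barrier could look like: the only extra work is in send, which links the receiving process onto the collector's process queue. The data structures are invented for illustration and differ from the real runtime's.

```c
/* Cheap write barrier in the send operation (sketch). */
#include <stdbool.h>
#include <stddef.h>

typedef struct proc {
    struct proc *gc_next;        /* link in the collector's process queue */
    bool         queued;
    void       **mailbox;
    size_t       mailbox_len;
} proc;

typedef struct { proc *head; bool collecting; } process_queue;

void send_message(process_queue *q, proc *receiver, void *msg) {
    receiver->mailbox[receiver->mailbox_len++] = msg;    /* deliver the message  */
    if (q->collecting && !receiver->queued) {            /* the barrier itself:  */
        receiver->queued  = true;                        /* make sure the        */
        receiver->gc_next = q->head;                     /* receiver is scanned  */
        q->head = receiver;                              /* (again) by the GC    */
    }
}
```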

Performance evaluation: Settings

Intel Xeon 2.4 GHz, 1 GB RAM, Linux
Processes start with small process-local heaps (233 words, grown when needed)
We measure active CPU time
o using hardware performance monitors

Performance evaluation: Benchmarks

Mnesia – distributed database system: 1,109 processes, 2,892,855 messages
Yaws – HTTP web server: 420 processes, 2,275,467 messages
Adhoc – data mining application: 137 processes, 246,021 messages

Stop-times – Time-based

[Plots of stop-times for Mnesia and Yaws, t = 1 ms]

Stop-times – Work-based

[Plots of stop-times for Adhoc and Yaws, n = 2 words; means 3 and 9, geometric means 2 and 1]

Stop-times – Work-based

[Plots of stop-times for Adhoc and Yaws, n = 100 words; means 53 and 268, geometric means 46 and 36; x-axis: time (s)]

Message area total GC times, incremental vs. non-incremental (times in ms)

Benchmark   MA GC, n = 2   MA GC, n = 100   MA GC, n = 1000   MA GC, non-inc.
Mnesia               182              164               156                88
Yaws                 373              374               242               153
Adhoc                244              203                78                27

Runtimes – Incremental (times in ms)

Benchmark   Mutator   Local GC   MA, n = 2   MA, n = 100   MA, n = 1000
Mnesia       52,906      4,439         182           164            156
Yaws        237,629     11,728         373           374            242
Adhoc        61,045      8,194         244           203             78

Minimum Mutator Utilization

The fraction of time that the mutator executes in any time window [Cheng & Blelloch, PLDI 2001]
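As an illustration of the metric, here is a sketch that computes MMU for one window size from a log of GC pauses; the pause-log format and function names are hypothetical and not part of any tool used in the paper.

```c
/* Minimum mutator utilization for one window size (sketch). */
#include <stddef.h>

typedef struct { double start, end; } pause_t;   /* one GC pause, in ms */

/* Mutator time inside the window [t, t+w). */
static double mutator_time(const pause_t *p, size_t n, double t, double w) {
    double gc = 0.0;
    for (size_t i = 0; i < n; i++) {
        double s = p[i].start > t     ? p[i].start : t;
        double e = p[i].end   < t + w ? p[i].end   : t + w;
        if (e > s) gc += e - s;
    }
    return w - gc;
}

/* MMU(w) = min over all windows of (mutator time in the window) / w.
   The minimum is attained by a window that starts at a pause start or
   ends at a pause end, so only those candidates are tested. */
double mmu(const pause_t *p, size_t n, double total, double w) {
    double min_util = 1.0;
    for (size_t i = 0; i < 2 * n; i++) {
        double t = (i < n) ? p[i].start : p[i - n].end - w;
        if (t + w > total) t = total - w;
        if (t < 0) t = 0.0;
        double u = mutator_time(p, n, t, w) / w;
        if (u < min_util) min_util = u;
    }
    return min_util;
}
```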

Mutator Utilization – Work-based

[Plots of mutator utilization for Adhoc and Yaws, n = 100 words]

Concluding Remarks

The memory allocator is guided by the intended use of data

Incremental garbage collector:
o High mutator utilization
o Small overhead on total runtime
o No mutator overhead
o Small space overhead

Really short stop-times!

Runtimes, incremental vs. non-incremental (times in ms)

Benchmark   Inc. mutator   Non-inc. mutator
Mnesia            52,906             53,276
Yaws             237,629            240,985
Adhoc             61,045             61,578

Total GC times, incremental vs. non-incremental (times in ms)

Benchmark   Inc. local GC   Non-inc. local GC
Mnesia              4,439               4,487
Yaws               11,728              11,359
Adhoc               8,194               7,848

Mutator Utilization – Time-based

[Plots of mutator utilization for Mnesia and Yaws, t = 1 ms]
