
Message Analysis-Guided Allocation and Low-Pause Incremental Garbage Collection in a Concurrent Language

Konstantinos Sagonas
Jesper Wilhelmsson

Uppsala University, Sweden

Goals of this work

Efficiently implement concurrency through asynchronous message passing
Memory management with real-time characteristics
o Short stop-times
o High mutator utilization
Design for multithreading

Our context: Erlang

Designed for highly concurrent applications
Soft real-time
Light-weight processes
No destructive updates
Data types: atoms, numbers, PIDs, tuples, cons cells (lists), binaries (heap data)

Our context: the Erlang/OTP system

Industrial-strength implementation
Used in embedded applications
Three memory architectures [ISMM'02]:
o Private
o Shared
o Hybrid

[Figure: a process with its stack and heap]

Private heaps

[Figure: two processes, each with its own stack and heap]

Sending a message copies it into the receiver's heap: an O(|message|) copy
Garbage collection is a private business
Fast memory reclamation of terminated processes: O(1)
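To make the cost model concrete, here is a minimal sketch of copy-on-send under private heaps; the term and process representations below are invented for illustration and are far simpler than the real Erlang/OTP ones.

```c
/* Sketch of copy-on-send under private heaps (illustrative only;
   not the actual Erlang/OTP term representation). */
#include <stdlib.h>

typedef struct term {            /* a toy cons-cell-like term             */
    struct term *head, *tail;    /* NULL for leaves                       */
    long value;
} term;

typedef struct process {
    term  *heap;                 /* private heap: a simple bump allocator */
    size_t top, size;
} process;

static term *alloc_on(process *p) {
    return &p->heap[p->top++];   /* no bounds check in this sketch        */
}

/* Copying a message is O(|message|): every cell is duplicated into the
   receiver's private heap. */
static term *copy_term(process *to, const term *t) {
    if (t == NULL) return NULL;
    term *c  = alloc_on(to);
    c->value = t->value;
    c->head  = copy_term(to, t->head);
    c->tail  = copy_term(to, t->tail);
    return c;
}

/* send: the receiver gets its own copy, so each heap can be collected,
   or dropped in O(1) when its process terminates, without touching any
   other process. */
term *send(process *receiver, const term *msg) {
    return copy_term(receiver, msg);
}
```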

Shared heap

[Figure: two processes sharing one heap]

Global synchronization
Longer stop-times
No fast reclamation of process-local data

Hybrid architecture

[Figure: two processes with process-local heaps, a shared message area, and a shared area for big objects]

Allocating messages in the message area

Several possible methods:
o User annotations
o Dynamic monitoring [Petrank et al. ISMM'02]
o Static analysis-guided allocation

Static message analysis [SAS'03]

Similar to escape analysis
Allocation is process-local by default
o Possible messages are allocated in the message area
o Copy on demand
The analysis is quite precise
o Typically finds 99% of all messages
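A hedged sketch of how analysis-guided allocation and copy-on-demand could fit together; every name below (the allocators, the predicates, the may_be_message flag) is hypothetical and stands in for what the compiler and runtime actually emit.

```c
/* Sketch of analysis-guided allocation and copy-on-demand at send.
   All names are hypothetical; the real Erlang/OTP runtime differs. */
#include <stdbool.h>
#include <stdlib.h>

typedef struct { void **slots; size_t n; } mailbox;

/* Stubs standing in for the runtime's real allocators and predicates. */
static void *alloc_local(size_t words)           { return malloc(words * sizeof(void *)); }
static void *alloc_in_message_area(size_t words) { return malloc(words * sizeof(void *)); }
static bool  in_message_area(void *p)            { (void)p; return false; }
static void *copy_to_message_area(void *p)       { return p; }

/* The static message analysis tags each allocation site: data that may
   later be sent as a message goes directly into the shared message area,
   everything else stays on the process-local heap. */
void *alloc(size_t words, bool may_be_message) {
    return may_be_message ? alloc_in_message_area(words) : alloc_local(words);
}

/* Copy on demand: if the analysis missed a message (reportedly ~1% of
   them), the data is copied into the message area at send time, so a
   receiver never holds a pointer into another process's local heap. */
void send_msg(mailbox *to, void *msg) {
    if (!in_message_area(msg))
        msg = copy_to_message_area(msg);
    to->slots[to->n++] = msg;    /* assumes the mailbox has room (sketch) */
}
```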

Process-local heaps: private business, no synchronization required

Message area: two generations
o Copying collector in the young generation: fast allocation
o Mark-and-sweep in the old generation: prevents repeated copying of old objects
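To make the "fast allocation" point concrete: allocation in a copying young generation is a pointer bump plus a limit check. A minimal sketch, with made-up field names:

```c
/* Bump-pointer allocation in the nursery (sketch; field names are made up). */
#include <stddef.h>

typedef struct {
    char *ntop;     /* next free byte in the nursery                     */
    char *nlimit;   /* end of the nursery                                */
} nursery;

void *ma_alloc(nursery *n, size_t bytes) {
    if (n->ntop + bytes > n->nlimit)
        return NULL;             /* caller triggers a (minor) collection */
    void *p  = n->ntop;
    n->ntop += bytes;
    return p;
}
```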

Garbage Collection in the Hybrid Architecture

GC of the message area is a bottleneck: its root-set consists of all stacks and all process-local heaps. Two techniques reduce the cost:

1. Generational process scanning
2. Remembered sets in the local heaps

This is not enough: we need an incremental collector for the Message Area!
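A speculative sketch of what root-set construction with remembered sets might look like; the process layout and function names are invented for illustration and are not the Erlang/OTP data structures.

```c
/* Root scanning for a message-area collection (illustrative layout). */
#include <stddef.h>

typedef struct {
    void **stack;    size_t stack_len;
    void **rem_set;  size_t rem_set_len;  /* local-heap slots known to
                                             point into the message area */
} proc;

/* Without remembered sets every word of every local heap is a potential
   root; with them, only the stack and the recorded slots are scanned. */
void scan_roots(proc *procs, size_t nprocs, void (*visit_root)(void **)) {
    for (size_t i = 0; i < nprocs; i++) {
        for (size_t j = 0; j < procs[i].stack_len; j++)
            visit_root(&procs[i].stack[j]);
        for (size_t j = 0; j < procs[i].rem_set_len; j++)
            visit_root(&procs[i].rem_set[j]);
    }
}
```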

Properties of the incremental collector

No overhead on mutator

No space overhead on heap objects

Short stop-times

High mutator utilization

Organization of the Message Area

Young generation: nursery and from-space
o Nursery and from-space always have a constant size (100k words)
o Pointers into the nursery: Ntop, Nlimit, and the allocation limit
Fwd: storage area for forwarding pointers, of bounded size
Old generation: list of arbitrary-sized areas; free-list, first-fit allocation
Black-map: bit-array used to mark objects in mark-and-sweep
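The layout above can be summarized in a rough C sketch; the field names, types, and the packing of all areas into one struct are assumptions made for illustration, not the actual runtime layout.

```c
/* Rough sketch of the message-area organization described above. */
#include <stddef.h>
#include <stdint.h>

#define NURSERY_WORDS 100000          /* nursery/from-space: constant 100k words */

typedef struct free_block {           /* old generation: list of arbitrary-sized */
    size_t words;                     /* areas with free-list, first-fit          */
    struct free_block *next;          /* allocation                               */
} free_block;

typedef struct {
    uintptr_t  nursery[NURSERY_WORDS];    /* young gen: fresh allocations        */
    uintptr_t  from_space[NURSERY_WORDS]; /* young gen: being evacuated          */
    uintptr_t *ntop;                      /* next free word in the nursery       */
    uintptr_t *nlimit;                    /* hard end of the nursery             */
    uintptr_t *alloc_limit;               /* where the mutator yields to the GC  */

    uintptr_t *fwd;                       /* separate storage for forwarding     */
                                          /* pointers: no per-object overhead    */
    uint8_t   *black_map;                 /* bit-array marking old-generation    */
                                          /* objects during mark-and-sweep       */
    free_block *old_gen_free;             /* old-generation free list            */
} message_area;
```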

Incremental collector

Two approaches to choose from:

Work-based

Reclaim n live words each step

Time-based

A step takes no more than t ms

n and t are user-specified

Work-based collection

The mutator wants to allocate need words

reclaim = max(n, need)

Allocation limit = Ntop + reclaim
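A small sketch of the work-based policy at the end of a collection step, transcribing the two formulas above; the struct and the final clamp to Nlimit are this sketch's own assumptions.

```c
/* Work-based scheduling of GC steps (sketch). */
#include <stddef.h>
#include <stdint.h>

typedef struct { uintptr_t *ntop, *nlimit, *alloc_limit; } nursery_state;

/* n: user-specified live words to reclaim per step.
   need: what the mutator is currently trying to allocate. */
void work_based_step_done(nursery_state *na, size_t n, size_t need) {
    size_t reclaim  = n > need ? n : need;   /* reclaim = max(n, need)            */
    na->alloc_limit = na->ntop + reclaim;    /* allocation limit = Ntop + reclaim */
    if (na->alloc_limit > na->nlimit)        /* clamp to the end of the nursery   */
        na->alloc_limit = na->nlimit;        /* (this sketch's own safety check)  */
}
```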

Time-based collection

1. User annotations (as in Metronome)

2. Dynamic worst-case calculation

How much can the mutator allocate?

How much live data is there?

ΔGC = reclaimed after this GC step − reclaimed before this GC step
GCsteps = (total to reclaim − reclaimed after this GC step) / ΔGC
wM = Nfree / GCsteps

Allocation limit = Ntop + wM
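The same idea for the time-based policy, transcribing the formulas above; the names, the estimate of the total amount to reclaim in the cycle, and the guards against division by zero are assumptions of this illustration.

```c
/* Time-based scheduling of GC steps (sketch). */
#include <stddef.h>
#include <stdint.h>

typedef struct { uintptr_t *ntop, *nlimit, *alloc_limit; } nursery_state;

void time_based_step_done(nursery_state *na,
                          size_t reclaimed_before,   /* before this GC step      */
                          size_t reclaimed_after,    /* after this GC step       */
                          size_t to_reclaim_total)   /* estimate for whole cycle */
{
    size_t delta_gc = reclaimed_after - reclaimed_before;          /* ΔGC        */
    size_t gc_steps = delta_gc
        ? (to_reclaim_total - reclaimed_after) / delta_gc          /* GCsteps    */
        : 1;
    if (gc_steps == 0) gc_steps = 1;                               /* guard      */
    size_t n_free = (size_t)(na->nlimit - na->ntop);               /* Nfree      */
    size_t w_m    = n_free / gc_steps;                             /* wM         */
    na->alloc_limit = na->ntop + w_m;        /* allocation limit = Ntop + wM     */
}
```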

Collecting the Message Area

[Animation: three processes P1, P2, P3 with their local heaps; the message area's from-space, nursery, and forwarding area; and the collector's process queue]

The collector takes processes off the process queue one at a time and scans their stacks and local heaps for roots, copying the live message data they reference out of the from-space. Between collection steps the mutator runs until it reaches the allocation limit.

Cheap write barrier: the send operation links the receiver onto a list (the process queue), so a process that receives a message during the collection will be scanned again.
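A sketch of what such a cheap write barrier could look like: the only extra work is in send, which links the receiving process onto the collector's process queue. The data structures are invented for illustration and differ from the real runtime's.

```c
/* Cheap write barrier in the send operation (sketch). */
#include <stdbool.h>
#include <stddef.h>

typedef struct proc {
    struct proc *gc_next;        /* link in the collector's process queue */
    bool         queued;
    void       **mailbox;
    size_t       mailbox_len;
} proc;

typedef struct { proc *head; bool collecting; } process_queue;

void send_message(process_queue *q, proc *receiver, void *msg) {
    receiver->mailbox[receiver->mailbox_len++] = msg;    /* deliver the message  */
    if (q->collecting && !receiver->queued) {            /* the barrier itself:  */
        receiver->queued  = true;                        /* make sure the        */
        receiver->gc_next = q->head;                     /* receiver is scanned  */
        q->head = receiver;                              /* (again) by the GC    */
    }
}
```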

Performance evaluation: Settings

Intel Xeon 2.4 GHz, 1 GB RAM, Linux
Processes start with small process-local heaps (233 words, grown when needed)
We measure active CPU time
o using hardware performance monitors

Performance evaluation: Benchmarks

Mnesia – distributed database system: 1,109 processes, 2,892,855 messages
Yaws – HTTP web server: 420 processes, 2,275,467 messages
Adhoc – data mining application: 137 processes, 246,021 messages

Stop-times – Time-based

[Plots of stop-times for Mnesia and Yaws, t = 1 ms]

Stop-times – Work-based

[Plots of stop-times for Adhoc and Yaws, n = 2 words; means 3 and 9, geometric means 2 and 1]

Stop-times – Work-based

[Plots of stop-times for Adhoc and Yaws, n = 100 words; means 53 and 268, geometric means 46 and 36; x-axis: time (s)]

Message area total GC times, incremental vs. non-incremental (times in ms)

Benchmark   MA GC, n = 2   MA GC, n = 100   MA GC, n = 1000   MA GC, non-inc.
Mnesia               182              164               156                88
Yaws                 373              374               242               153
Adhoc                244              203                78                27

Runtimes – Incremental (times in ms)

Benchmark   Mutator   Local GC   MA, n = 2   MA, n = 100   MA, n = 1000
Mnesia       52,906      4,439         182           164            156
Yaws        237,629     11,728         373           374            242
Adhoc        61,045      8,194         244           203             78

Minimum Mutator Utilization

The fraction of time that the mutator executes in any time window [Cheng & Blelloch, PLDI 2001]
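As an illustration of the metric, here is a sketch that computes MMU for one window size from a log of GC pauses; the pause-log format and function names are hypothetical and not part of any tool used in the paper.

```c
/* Minimum mutator utilization for one window size (sketch). */
#include <stddef.h>

typedef struct { double start, end; } pause_t;   /* one GC pause, in ms */

/* Mutator time inside the window [t, t+w). */
static double mutator_time(const pause_t *p, size_t n, double t, double w) {
    double gc = 0.0;
    for (size_t i = 0; i < n; i++) {
        double s = p[i].start > t     ? p[i].start : t;
        double e = p[i].end   < t + w ? p[i].end   : t + w;
        if (e > s) gc += e - s;
    }
    return w - gc;
}

/* MMU(w) = min over all windows of (mutator time in the window) / w.
   The minimum is attained by a window that starts at a pause start or
   ends at a pause end, so only those candidates are tested. */
double mmu(const pause_t *p, size_t n, double total, double w) {
    double min_util = 1.0;
    for (size_t i = 0; i < 2 * n; i++) {
        double t = (i < n) ? p[i].start : p[i - n].end - w;
        if (t + w > total) t = total - w;
        if (t < 0) t = 0.0;
        double u = mutator_time(p, n, t, w) / w;
        if (u < min_util) min_util = u;
    }
    return min_util;
}
```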

Mutator Utilization – Work-based

[Plots of mutator utilization for Adhoc and Yaws, n = 100 words]

Concluding Remarks

The memory allocator is guided by the intended use of data

Incremental garbage collector:
o High mutator utilization
o Small overhead on total runtime
o No mutator overhead
o Small space overhead

Really short stop-times!

Runtimes, incremental vs. non-incremental (times in ms)

Benchmark   Inc. mutator   Non-inc. mutator
Mnesia            52,906             53,276
Yaws             237,629            240,985
Adhoc             61,045             61,578

Total GC times, incremental vs. non-incremental (times in ms)

Benchmark   Inc. local GC   Non-inc. local GC
Mnesia              4,439               4,487
Yaws               11,728              11,359
Adhoc               8,194               7,848

Mutator Utilization – Time-based

[Plots of mutator utilization for Mnesia and Yaws, t = 1 ms]
