taming dynamic memory...pool allocator - stl containers vector only if chunk sizes match vector size...

119
1/56 Taming Dynamic Memory An Introduction to Custom Allocators Andreas Weis BMW AG code::dive, November 7, 2018

Upload: others

Post on 18-Aug-2021

7 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Taming Dynamic Memory...Pool Allocator - STL containers vector only if chunk sizes match vector size list and map are a perfect t, as the size of each node is known beforehand (though

156

Taming Dynamic MemoryAn Introduction to Custom Allocators

Andreas Weis

BMW AG

codedive November 7 2018

256

About me

ComicSansMS

DerGhulbus

Co-organizer of the Munich C++ User Group

Currently working as a Software Architect for BMW

356

Overview

Whatrsquos wrong with global new and delete

Local allocators

Alternative allocation strategies

Allocator support in C++

456

General purpose allocator

auto p1 = allocate(3)

auto p2 = allocate(4)

auto p3 = allocate(1)

auto p4 = allocate(3)

auto p5 = allocate(2)

456

General purpose allocator

auto p1 = allocate(3)

auto p2 = allocate(4)

auto p3 = allocate(1)

auto p4 = allocate(3)

auto p5 = allocate(2)

456

General purpose allocator

auto p1 = allocate(3)

auto p2 = allocate(4)

auto p3 = allocate(1)

auto p4 = allocate(3)

auto p5 = allocate(2)

456

General purpose allocator

auto p1 = allocate(3)

auto p2 = allocate(4)

auto p3 = allocate(1)

auto p4 = allocate(3)

auto p5 = allocate(2)

556

General purpose allocator

deallocate(p2)

auto p6 = allocate(2)

556

General purpose allocator

deallocate(p2)

auto p6 = allocate(2)

556

General purpose allocator

deallocate(p2)

auto p6 = allocate(2)

556

General purpose allocator

deallocate(p2)

auto p6 = allocate(2)

656

Fragmentation

auto p7 = allocate(4)

Runtime Error

656

Fragmentation

auto p7 = allocate(4)

Runtime Error

756

Coalescing

deallocate(p4)

deallocate(p3)

756

Coalescing

deallocate(p4)

deallocate(p3)

756

Coalescing

deallocate(p4)

deallocate(p3)

756

Coalescing

deallocate(p4)

deallocate(p3)

756

Coalescing

deallocate(p4)

deallocate(p3)

856

Problems with default allocator

Complex runtime behavior

What is the maximum memory usageWhat is the worst-case execution time for an allocation or deallocation

Shared global state

Reasoning about allocator behavior requires global knowledge of the whole programThe singular resource global allocator is a potential bottleneck

Itrsquos not just about performance

956

From Global to Local

auto p1 = allocate (42)

deallocate(p1)

1056

From Global to Local

Allocator alloc

auto p1 = allocallocate (42)

allocdeallocate(p1)

1156

Problems with default allocator

Complex runtime behavior

What is the maximum memory usageWhat is the worst-case execution time for an allocation or deallocation

Shared global state X

1256

Monotonic Allocator

auto p1 = monotallocate(3)

auto p2 = monotallocate(2)

auto p3 = monotallocate(4)

1256

Monotonic Allocator

auto p1 = monotallocate(3)

auto p2 = monotallocate(2)

auto p3 = monotallocate(4)

1256

Monotonic Allocator

auto p1 = monotallocate(3)

auto p2 = monotallocate(2)

auto p3 = monotallocate(4)

1256

Monotonic Allocator

auto p1 = monotallocate(3)

auto p2 = monotallocate(2)

auto p3 = monotallocate(4)

1356

Monotonic Allocator

monotdeallocate(p2)

monotdeallocate(p1)

monotdeallocate(p3)

auto p4 = allocate(2)

1356

Monotonic Allocator

monotdeallocate(p2)

monotdeallocate(p1)

monotdeallocate(p3)

auto p4 = allocate(2)

1356

Monotonic Allocator

monotdeallocate(p2)

monotdeallocate(p1)

monotdeallocate(p3)

auto p4 = allocate(2)

1356

Monotonic Allocator

monotdeallocate(p2)

monotdeallocate(p1)

monotdeallocate(p3)

auto p4 = allocate(2)

1356

Monotonic Allocator

monotdeallocate(p2)

monotdeallocate(p1)

monotdeallocate(p3)

auto p4 = allocate(2)

1456

Monotonic Allocator - Reclamation

monotdeallocate(p4)

monotrelease()

1456

Monotonic Allocator - Reclamation

monotdeallocate(p4)

monotrelease()

1456

Monotonic Allocator - Reclamation

monotdeallocate(p4)

monotrelease()

1456

Monotonic Allocator - Reclamation

monotdeallocate(p4)

monotrelease()

1556

Monotonic Allocator

Deterministic runtime cost

Extremely efficient

No fragmentation

Easy to implement

Trivial to make thread-safe

But

Memory can only be reclaimed all at once

1656

Where is this actually useful

Frames in a video game

Handling of a single event in an event-driven system

Cyclic execution in a real-time system

Containers that are initialized but not changed after

static state - Objects that will never be destroyed

1756

Monotonic Allocator - stdvector

1756

Monotonic Allocator - stdvector

1756

Monotonic Allocator - stdvector

1856

Monotonic Allocator - STL containers

vector should reserve final size upfront

list and map work fine but deleted elements are not reclaimed individually

deque works really well

unordered map has same problem as vector when growing but the load-factortriggered rehash is less predictable

1956

unordered map

Exact layout depends on hash function and inserted values

1956

unordered map

Exact layout depends on hash function and inserted values

1956

unordered map

Exact layout depends on hash function and inserted values

1956

unordered map

Exact layout depends on hash function and inserted values

2056

Stack Allocator

monotdeallocate(p3)

2056

Stack Allocator

monotdeallocate(p3)

2056

Stack Allocator

monotdeallocate(p3)

2056

Stack Allocator

monotdeallocate(p3)

2156

Stack Allocator

Strict LIFO-ordering of allocations and deallocations

No way for the implementation to check whether the deallocation order is correct

2256

Stack Allocator

monotdeallocate(p3)

p3 == top Xtop =

2256

Stack Allocator

monotdeallocate(p3)

p3 == top X

top =

2256

Stack Allocator

monotdeallocate(p3)

p3 == top Xtop =

2256

Stack Allocator

monotdeallocate(p3)

p3 == top X

top =

2356

Stack Allocator

Strict LIFO-ordering of allocations and deallocations

No way for the implementation to check whether the deallocation order is correct

Free-pointer is reset to the same pointer passed to the deallocate call

Padding bytes may be lost to internal fragmentation

2356

Stack Allocator

Strict LIFO-ordering of allocations and deallocations

No way for the implementation to check whether the deallocation order is correct

Free-pointer is reset to the same pointer passed to the deallocate call

Padding bytes may be lost to internal fragmentation

2356

Stack Allocator

Strict LIFO-ordering of allocations and deallocations

No way for the implementation to check whether the deallocation order is correct

Free-pointer is reset to the same pointer passed to the deallocate call

Padding bytes may be lost to internal fragmentation

2456

Padding

2456

Padding

2556

Alignment

0xdeadbeef =

d e a d b e e f

1101 1110 1010 1101 1011 1110 1110 1111

2656

Alignment

0xdeadbeef =

d e a d b e e f

1101 1110 1010 1101 1011 1110 1110 1111

No alignment (1-byte aligned)

2756

Alignment

0xdeadbeef =

d e a d b e e c

1101 1110 1010 1101 1011 1110 1110 1100

4-byte aligned

2856

Alignment

0xdeadbeef =

d e a d b e e 8

1101 1110 1010 1101 1011 1110 1110 1000

8-byte aligned

2956

Alignment

Alignment refers to the least-significant bits of the object address being 0

Alignment requirements are always specified in powers of 2

Each built-in C++ type has a natural alignment requirement (typicallyalignof(T) == sizeof(T))

This is why structs sometimes insert padding bytes between members

3056

Alignment

Default allocator typically returns addresses aligned to alignof(max align t)which is big enough for all built-in types

Users may extend the alignment requirement for custom data types usingalignas

void allocate(std size_t bytes

std size_t alignment )

3156

Padding

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3356

Monotonic Allocator - Extensions

Auxiliary data structure required

Runtime cost of deallocation now linear in number of allocations (amortized O(1))

Auxiliary nodes have their own alignment requirements

Where to store the auxiliary nodes

3456

Monotonic Allocator - Extensions

3456

Monotonic Allocator - Extensions

3556

Internal or External

External headers have better cache behavior when iterating the list

External headers might have stricter alignment requirements than data

Internal headers have better cache behavior when adjacent data is hot

Internal headers require managed memory to be readable (think GPUs)

Where does the storage for external headers come from Same buffer Differentbuffer How big

rArr No easy answers

3656

The Bottom Line

Even seemingly simple extensions get complicated very quickly

Donrsquot try to increase generality through clever extensionsOnly consider modifications if itrsquos a perfect fit for your use case

3756

But what if I need to reclaim memory

minusrarr Pool Allocator

3856

Pool Allocator

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4156

Pool Allocator - Diffusion

4256

Pool Allocator

Deterministic runtime cost

No external fragmentation

Easy to make thread-safe

But

Cannot serve allocations bigger than chunk size

High waste through internal fragmentation if sizes of objects vary a lot

4356

Pool Allocator - STL containers

vector only if chunk sizes match vector size

list and map are a perfect fit as the size of each node is known beforehand(though this knowledge is implementation-specific)

Similar for deque

unordered map is again complicated

4456

But what if I do need different sizes

minusrarr Multipool Allocator

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4656

Multipool Allocator

Very powerful allocator

Runtime is still deterministic if number of pools is known beforehand

Maximum amount of waste through internal fragmentation can be controlledprecisely

Difficult to set up How many pools do I need What chunk sizes What poolsizes

Solid building block for a general purpose allocator

4756

Allocator support in C++

stdvectorltT AllocatorltTgtgt v

vpush back()

This is not the class allocating the memory

4756

Allocator support in C++

stdvectorltT AllocatorltTgtgt v

vpush back()

This is not the class allocating the memory

4856

Allocator support in C++

Historically C++ used Allocators to abstract over different models of addressingmemoryAs such Allocators in C++ are ldquostatelessrdquo

In C++ an Allocator is merely a handle to a memory resource1

1Arthur OrsquoDwyer - An Allocator is a Handle to the Heap

4856

Allocator support in C++

Historically C++ used Allocators to abstract over different models of addressingmemoryAs such Allocators in C++ are ldquostatelessrdquo

In C++ an Allocator is merely a handle to a memory resource1

1Arthur OrsquoDwyer - An Allocator is a Handle to the Heap

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

5056

C++ Memory Resources2

stdpmrmemory resource - Abstract base class for all resources that can bewrapped in a stdpmrpolymorphic allocator

stdpmrnew delete resource - Global allocator

stdpmrmonotonic buffer resource - Monotonic allocator

stdpmrunsynchronized pool resource synchronized pool resource

- Multipool

stdpmrnull memory resource - Allocation always fails

2Pablo Halpern - Allocators The Good Parts

5156

Chaining

explicit monotonic_buffer_resource(

stdpmr memory_resource upstream )

Each memory resource has an upstream counterpart

If the resource runs out of memory it tries to allocate more memory from upstream

5256

Chaining

Possible uses of Chaining

Fixed-size vs dynamic storage for allocators

Combination of different allocation strategies

Injection points for special purpose allocators for debugging and profiling

5356

Therersquos no universal interface for allocators

Are size and alignment parameters passed to deallocate

Is realloc supported

How are out-of-memory errors reported

Is extended alignment supported

What is the return value for an allocation of size 0

Different memory regions for internal data structures and allocated memory

5456

Donrsquot underestimate the global allocator

Competitive performance in the general case

Security features (ASLR secure erase of freed memory)

Debugging amp Profiling (Valgrind Windows Debug Runtime)

Cache Coloring

Local allocators are no free lunch

5556

Wrapping up

No one-size-fits-all mdash Each allocator has its Achilles heel

Global allocator is a good solution for the general case

But you can do better with special allocators for special use cases in terms ofperformance3 as well as reliability

C++ has good support for local allocators but the terminology is a bit off

Different libraries have different concepts of allocators

No free lunch You need to understand your use case before you can chose theright allocator

3John Lakos - Local (Arena) Allocators

5656

Thanks for your attention

Page 2: Taming Dynamic Memory...Pool Allocator - STL containers vector only if chunk sizes match vector size list and map are a perfect t, as the size of each node is known beforehand (though

256

About me

ComicSansMS

DerGhulbus

Co-organizer of the Munich C++ User Group

Currently working as a Software Architect for BMW

356

Overview

Whatrsquos wrong with global new and delete

Local allocators

Alternative allocation strategies

Allocator support in C++

456

General purpose allocator

auto p1 = allocate(3)

auto p2 = allocate(4)

auto p3 = allocate(1)

auto p4 = allocate(3)

auto p5 = allocate(2)

456

General purpose allocator

auto p1 = allocate(3)

auto p2 = allocate(4)

auto p3 = allocate(1)

auto p4 = allocate(3)

auto p5 = allocate(2)

456

General purpose allocator

auto p1 = allocate(3)

auto p2 = allocate(4)

auto p3 = allocate(1)

auto p4 = allocate(3)

auto p5 = allocate(2)

456

General purpose allocator

auto p1 = allocate(3)

auto p2 = allocate(4)

auto p3 = allocate(1)

auto p4 = allocate(3)

auto p5 = allocate(2)

556

General purpose allocator

deallocate(p2)

auto p6 = allocate(2)

556

General purpose allocator

deallocate(p2)

auto p6 = allocate(2)

556

General purpose allocator

deallocate(p2)

auto p6 = allocate(2)

556

General purpose allocator

deallocate(p2)

auto p6 = allocate(2)

656

Fragmentation

auto p7 = allocate(4)

Runtime Error

656

Fragmentation

auto p7 = allocate(4)

Runtime Error

756

Coalescing

deallocate(p4)

deallocate(p3)

756

Coalescing

deallocate(p4)

deallocate(p3)

756

Coalescing

deallocate(p4)

deallocate(p3)

756

Coalescing

deallocate(p4)

deallocate(p3)

756

Coalescing

deallocate(p4)

deallocate(p3)

856

Problems with default allocator

Complex runtime behavior

What is the maximum memory usageWhat is the worst-case execution time for an allocation or deallocation

Shared global state

Reasoning about allocator behavior requires global knowledge of the whole programThe singular resource global allocator is a potential bottleneck

Itrsquos not just about performance

956

From Global to Local

auto p1 = allocate (42)

deallocate(p1)

1056

From Global to Local

Allocator alloc

auto p1 = allocallocate (42)

allocdeallocate(p1)

1156

Problems with default allocator

Complex runtime behavior

What is the maximum memory usageWhat is the worst-case execution time for an allocation or deallocation

Shared global state X

1256

Monotonic Allocator

auto p1 = monotallocate(3)

auto p2 = monotallocate(2)

auto p3 = monotallocate(4)

1256

Monotonic Allocator

auto p1 = monotallocate(3)

auto p2 = monotallocate(2)

auto p3 = monotallocate(4)

1256

Monotonic Allocator

auto p1 = monotallocate(3)

auto p2 = monotallocate(2)

auto p3 = monotallocate(4)

1256

Monotonic Allocator

auto p1 = monotallocate(3)

auto p2 = monotallocate(2)

auto p3 = monotallocate(4)

1356

Monotonic Allocator

monotdeallocate(p2)

monotdeallocate(p1)

monotdeallocate(p3)

auto p4 = allocate(2)

1356

Monotonic Allocator

monotdeallocate(p2)

monotdeallocate(p1)

monotdeallocate(p3)

auto p4 = allocate(2)

1356

Monotonic Allocator

monotdeallocate(p2)

monotdeallocate(p1)

monotdeallocate(p3)

auto p4 = allocate(2)

1356

Monotonic Allocator

monotdeallocate(p2)

monotdeallocate(p1)

monotdeallocate(p3)

auto p4 = allocate(2)

1356

Monotonic Allocator

monotdeallocate(p2)

monotdeallocate(p1)

monotdeallocate(p3)

auto p4 = allocate(2)

1456

Monotonic Allocator - Reclamation

monotdeallocate(p4)

monotrelease()

1456

Monotonic Allocator - Reclamation

monotdeallocate(p4)

monotrelease()

1456

Monotonic Allocator - Reclamation

monotdeallocate(p4)

monotrelease()

1456

Monotonic Allocator - Reclamation

monotdeallocate(p4)

monotrelease()

1556

Monotonic Allocator

Deterministic runtime cost

Extremely efficient

No fragmentation

Easy to implement

Trivial to make thread-safe

But

Memory can only be reclaimed all at once

1656

Where is this actually useful

Frames in a video game

Handling of a single event in an event-driven system

Cyclic execution in a real-time system

Containers that are initialized but not changed after

static state - Objects that will never be destroyed

1756

Monotonic Allocator - stdvector

1756

Monotonic Allocator - stdvector

1756

Monotonic Allocator - stdvector

1856

Monotonic Allocator - STL containers

vector should reserve final size upfront

list and map work fine but deleted elements are not reclaimed individually

deque works really well

unordered map has same problem as vector when growing but the load-factortriggered rehash is less predictable

1956

unordered map

Exact layout depends on hash function and inserted values

1956

unordered map

Exact layout depends on hash function and inserted values

1956

unordered map

Exact layout depends on hash function and inserted values

1956

unordered map

Exact layout depends on hash function and inserted values

2056

Stack Allocator

monotdeallocate(p3)

2056

Stack Allocator

monotdeallocate(p3)

2056

Stack Allocator

monotdeallocate(p3)

2056

Stack Allocator

monotdeallocate(p3)

2156

Stack Allocator

Strict LIFO-ordering of allocations and deallocations

No way for the implementation to check whether the deallocation order is correct

2256

Stack Allocator

monotdeallocate(p3)

p3 == top Xtop =

2256

Stack Allocator

monotdeallocate(p3)

p3 == top X

top =

2256

Stack Allocator

monotdeallocate(p3)

p3 == top Xtop =

2256

Stack Allocator

monotdeallocate(p3)

p3 == top X

top =

2356

Stack Allocator

Strict LIFO-ordering of allocations and deallocations

No way for the implementation to check whether the deallocation order is correct

Free-pointer is reset to the same pointer passed to the deallocate call

Padding bytes may be lost to internal fragmentation

2356

Stack Allocator

Strict LIFO-ordering of allocations and deallocations

No way for the implementation to check whether the deallocation order is correct

Free-pointer is reset to the same pointer passed to the deallocate call

Padding bytes may be lost to internal fragmentation

2356

Stack Allocator

Strict LIFO-ordering of allocations and deallocations

No way for the implementation to check whether the deallocation order is correct

Free-pointer is reset to the same pointer passed to the deallocate call

Padding bytes may be lost to internal fragmentation

2456

Padding

2456

Padding

2556

Alignment

0xdeadbeef =

d e a d b e e f

1101 1110 1010 1101 1011 1110 1110 1111

2656

Alignment

0xdeadbeef =

d e a d b e e f

1101 1110 1010 1101 1011 1110 1110 1111

No alignment (1-byte aligned)

2756

Alignment

0xdeadbeef =

d e a d b e e c

1101 1110 1010 1101 1011 1110 1110 1100

4-byte aligned

2856

Alignment

0xdeadbeef =

d e a d b e e 8

1101 1110 1010 1101 1011 1110 1110 1000

8-byte aligned

2956

Alignment

Alignment refers to the least-significant bits of the object address being 0

Alignment requirements are always specified in powers of 2

Each built-in C++ type has a natural alignment requirement (typicallyalignof(T) == sizeof(T))

This is why structs sometimes insert padding bytes between members

3056

Alignment

Default allocator typically returns addresses aligned to alignof(max align t)which is big enough for all built-in types

Users may extend the alignment requirement for custom data types usingalignas

void allocate(std size_t bytes

std size_t alignment )

3156

Padding

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3356

Monotonic Allocator - Extensions

Auxiliary data structure required

Runtime cost of deallocation now linear in number of allocations (amortized O(1))

Auxiliary nodes have their own alignment requirements

Where to store the auxiliary nodes

3456

Monotonic Allocator - Extensions

3456

Monotonic Allocator - Extensions

3556

Internal or External

External headers have better cache behavior when iterating the list

External headers might have stricter alignment requirements than data

Internal headers have better cache behavior when adjacent data is hot

Internal headers require managed memory to be readable (think GPUs)

Where does the storage for external headers come from Same buffer Differentbuffer How big

rArr No easy answers

3656

The Bottom Line

Even seemingly simple extensions get complicated very quickly

Donrsquot try to increase generality through clever extensionsOnly consider modifications if itrsquos a perfect fit for your use case

3756

But what if I need to reclaim memory

minusrarr Pool Allocator

3856

Pool Allocator

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4156

Pool Allocator - Diffusion

4256

Pool Allocator

Deterministic runtime cost

No external fragmentation

Easy to make thread-safe

But

Cannot serve allocations bigger than chunk size

High waste through internal fragmentation if sizes of objects vary a lot

4356

Pool Allocator - STL containers

vector only if chunk sizes match vector size

list and map are a perfect fit as the size of each node is known beforehand(though this knowledge is implementation-specific)

Similar for deque

unordered map is again complicated

4456

But what if I do need different sizes

minusrarr Multipool Allocator

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4656

Multipool Allocator

Very powerful allocator

Runtime is still deterministic if number of pools is known beforehand

Maximum amount of waste through internal fragmentation can be controlledprecisely

Difficult to set up How many pools do I need What chunk sizes What poolsizes

Solid building block for a general purpose allocator

4756

Allocator support in C++

stdvectorltT AllocatorltTgtgt v

vpush back()

This is not the class allocating the memory

4756

Allocator support in C++

stdvectorltT AllocatorltTgtgt v

vpush back()

This is not the class allocating the memory

4856

Allocator support in C++

Historically C++ used Allocators to abstract over different models of addressingmemoryAs such Allocators in C++ are ldquostatelessrdquo

In C++ an Allocator is merely a handle to a memory resource1

1Arthur OrsquoDwyer - An Allocator is a Handle to the Heap

4856

Allocator support in C++

Historically C++ used Allocators to abstract over different models of addressingmemoryAs such Allocators in C++ are ldquostatelessrdquo

In C++ an Allocator is merely a handle to a memory resource1

1Arthur OrsquoDwyer - An Allocator is a Handle to the Heap

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

5056

C++ Memory Resources2

stdpmrmemory resource - Abstract base class for all resources that can bewrapped in a stdpmrpolymorphic allocator

stdpmrnew delete resource - Global allocator

stdpmrmonotonic buffer resource - Monotonic allocator

stdpmrunsynchronized pool resource synchronized pool resource

- Multipool

stdpmrnull memory resource - Allocation always fails

2Pablo Halpern - Allocators The Good Parts

5156

Chaining

explicit monotonic_buffer_resource(

stdpmr memory_resource upstream )

Each memory resource has an upstream counterpart

If the resource runs out of memory it tries to allocate more memory from upstream

5256

Chaining

Possible uses of Chaining

Fixed-size vs dynamic storage for allocators

Combination of different allocation strategies

Injection points for special purpose allocators for debugging and profiling

5356

Therersquos no universal interface for allocators

Are size and alignment parameters passed to deallocate

Is realloc supported

How are out-of-memory errors reported

Is extended alignment supported

What is the return value for an allocation of size 0

Different memory regions for internal data structures and allocated memory

5456

Donrsquot underestimate the global allocator

Competitive performance in the general case

Security features (ASLR secure erase of freed memory)

Debugging amp Profiling (Valgrind Windows Debug Runtime)

Cache Coloring

Local allocators are no free lunch

5556

Wrapping up

No one-size-fits-all mdash Each allocator has its Achilles heel

Global allocator is a good solution for the general case

But you can do better with special allocators for special use cases in terms ofperformance3 as well as reliability

C++ has good support for local allocators but the terminology is a bit off

Different libraries have different concepts of allocators

No free lunch You need to understand your use case before you can chose theright allocator

3John Lakos - Local (Arena) Allocators

5656

Thanks for your attention

Page 3: Taming Dynamic Memory...Pool Allocator - STL containers vector only if chunk sizes match vector size list and map are a perfect t, as the size of each node is known beforehand (though

356

Overview

Whatrsquos wrong with global new and delete

Local allocators

Alternative allocation strategies

Allocator support in C++

456

General purpose allocator

auto p1 = allocate(3)

auto p2 = allocate(4)

auto p3 = allocate(1)

auto p4 = allocate(3)

auto p5 = allocate(2)

456

General purpose allocator

auto p1 = allocate(3)

auto p2 = allocate(4)

auto p3 = allocate(1)

auto p4 = allocate(3)

auto p5 = allocate(2)

456

General purpose allocator

auto p1 = allocate(3)

auto p2 = allocate(4)

auto p3 = allocate(1)

auto p4 = allocate(3)

auto p5 = allocate(2)

456

General purpose allocator

auto p1 = allocate(3)

auto p2 = allocate(4)

auto p3 = allocate(1)

auto p4 = allocate(3)

auto p5 = allocate(2)

556

General purpose allocator

deallocate(p2)

auto p6 = allocate(2)

556

General purpose allocator

deallocate(p2)

auto p6 = allocate(2)

556

General purpose allocator

deallocate(p2)

auto p6 = allocate(2)

556

General purpose allocator

deallocate(p2)

auto p6 = allocate(2)

656

Fragmentation

auto p7 = allocate(4)

Runtime Error

656

Fragmentation

auto p7 = allocate(4)

Runtime Error

756

Coalescing

deallocate(p4)

deallocate(p3)

756

Coalescing

deallocate(p4)

deallocate(p3)

756

Coalescing

deallocate(p4)

deallocate(p3)

756

Coalescing

deallocate(p4)

deallocate(p3)

756

Coalescing

deallocate(p4)

deallocate(p3)

856

Problems with default allocator

Complex runtime behavior

What is the maximum memory usageWhat is the worst-case execution time for an allocation or deallocation

Shared global state

Reasoning about allocator behavior requires global knowledge of the whole programThe singular resource global allocator is a potential bottleneck

Itrsquos not just about performance

956

From Global to Local

auto p1 = allocate (42)

deallocate(p1)

1056

From Global to Local

Allocator alloc

auto p1 = allocallocate (42)

allocdeallocate(p1)

1156

Problems with default allocator

Complex runtime behavior

What is the maximum memory usageWhat is the worst-case execution time for an allocation or deallocation

Shared global state X

1256

Monotonic Allocator

auto p1 = monotallocate(3)

auto p2 = monotallocate(2)

auto p3 = monotallocate(4)

1256

Monotonic Allocator

auto p1 = monotallocate(3)

auto p2 = monotallocate(2)

auto p3 = monotallocate(4)

1256

Monotonic Allocator

auto p1 = monotallocate(3)

auto p2 = monotallocate(2)

auto p3 = monotallocate(4)

1256

Monotonic Allocator

auto p1 = monotallocate(3)

auto p2 = monotallocate(2)

auto p3 = monotallocate(4)

1356

Monotonic Allocator

monotdeallocate(p2)

monotdeallocate(p1)

monotdeallocate(p3)

auto p4 = allocate(2)

1356

Monotonic Allocator

monotdeallocate(p2)

monotdeallocate(p1)

monotdeallocate(p3)

auto p4 = allocate(2)

1356

Monotonic Allocator

monotdeallocate(p2)

monotdeallocate(p1)

monotdeallocate(p3)

auto p4 = allocate(2)

1356

Monotonic Allocator

monotdeallocate(p2)

monotdeallocate(p1)

monotdeallocate(p3)

auto p4 = allocate(2)

1356

Monotonic Allocator

monotdeallocate(p2)

monotdeallocate(p1)

monotdeallocate(p3)

auto p4 = allocate(2)

1456

Monotonic Allocator - Reclamation

monotdeallocate(p4)

monotrelease()

1456

Monotonic Allocator - Reclamation

monotdeallocate(p4)

monotrelease()

1456

Monotonic Allocator - Reclamation

monotdeallocate(p4)

monotrelease()

1456

Monotonic Allocator - Reclamation

monotdeallocate(p4)

monotrelease()

1556

Monotonic Allocator

Deterministic runtime cost

Extremely efficient

No fragmentation

Easy to implement

Trivial to make thread-safe

But

Memory can only be reclaimed all at once

1656

Where is this actually useful

Frames in a video game

Handling of a single event in an event-driven system

Cyclic execution in a real-time system

Containers that are initialized but not changed after

static state - Objects that will never be destroyed

1756

Monotonic Allocator - stdvector

1756

Monotonic Allocator - stdvector

1756

Monotonic Allocator - stdvector

1856

Monotonic Allocator - STL containers

vector should reserve final size upfront

list and map work fine but deleted elements are not reclaimed individually

deque works really well

unordered map has same problem as vector when growing but the load-factortriggered rehash is less predictable

1956

unordered map

Exact layout depends on hash function and inserted values

1956

unordered map

Exact layout depends on hash function and inserted values

1956

unordered map

Exact layout depends on hash function and inserted values

1956

unordered map

Exact layout depends on hash function and inserted values

2056

Stack Allocator

monotdeallocate(p3)

2056

Stack Allocator

monotdeallocate(p3)

2056

Stack Allocator

monotdeallocate(p3)

2056

Stack Allocator

monotdeallocate(p3)

2156

Stack Allocator

Strict LIFO-ordering of allocations and deallocations

No way for the implementation to check whether the deallocation order is correct

2256

Stack Allocator

monotdeallocate(p3)

p3 == top Xtop =

2256

Stack Allocator

monotdeallocate(p3)

p3 == top X

top =

2256

Stack Allocator

monotdeallocate(p3)

p3 == top Xtop =

2256

Stack Allocator

monotdeallocate(p3)

p3 == top X

top =

2356

Stack Allocator

Strict LIFO-ordering of allocations and deallocations

No way for the implementation to check whether the deallocation order is correct

Free-pointer is reset to the same pointer passed to the deallocate call

Padding bytes may be lost to internal fragmentation

2356

Stack Allocator

Strict LIFO-ordering of allocations and deallocations

No way for the implementation to check whether the deallocation order is correct

Free-pointer is reset to the same pointer passed to the deallocate call

Padding bytes may be lost to internal fragmentation

2356

Stack Allocator

Strict LIFO-ordering of allocations and deallocations

No way for the implementation to check whether the deallocation order is correct

Free-pointer is reset to the same pointer passed to the deallocate call

Padding bytes may be lost to internal fragmentation

2456

Padding

2456

Padding

2556

Alignment

0xdeadbeef =

d e a d b e e f

1101 1110 1010 1101 1011 1110 1110 1111

2656

Alignment

0xdeadbeef =

d e a d b e e f

1101 1110 1010 1101 1011 1110 1110 1111

No alignment (1-byte aligned)

2756

Alignment

0xdeadbeef =

d e a d b e e c

1101 1110 1010 1101 1011 1110 1110 1100

4-byte aligned

2856

Alignment

0xdeadbeef =

d e a d b e e 8

1101 1110 1010 1101 1011 1110 1110 1000

8-byte aligned

2956

Alignment

Alignment refers to the least-significant bits of the object address being 0

Alignment requirements are always specified in powers of 2

Each built-in C++ type has a natural alignment requirement (typicallyalignof(T) == sizeof(T))

This is why structs sometimes insert padding bytes between members

3056

Alignment

Default allocator typically returns addresses aligned to alignof(max align t)which is big enough for all built-in types

Users may extend the alignment requirement for custom data types usingalignas

void allocate(std size_t bytes

std size_t alignment )

3156

Padding

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3356

Monotonic Allocator - Extensions

Auxiliary data structure required

Runtime cost of deallocation now linear in number of allocations (amortized O(1))

Auxiliary nodes have their own alignment requirements

Where to store the auxiliary nodes

3456

Monotonic Allocator - Extensions

3456

Monotonic Allocator - Extensions

3556

Internal or External

External headers have better cache behavior when iterating the list

External headers might have stricter alignment requirements than data

Internal headers have better cache behavior when adjacent data is hot

Internal headers require managed memory to be readable (think GPUs)

Where does the storage for external headers come from Same buffer Differentbuffer How big

rArr No easy answers

3656

The Bottom Line

Even seemingly simple extensions get complicated very quickly

Donrsquot try to increase generality through clever extensionsOnly consider modifications if itrsquos a perfect fit for your use case

3756

But what if I need to reclaim memory

minusrarr Pool Allocator

3856

Pool Allocator

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4156

Pool Allocator - Diffusion

4256

Pool Allocator

Deterministic runtime cost

No external fragmentation

Easy to make thread-safe

But

Cannot serve allocations bigger than chunk size

High waste through internal fragmentation if sizes of objects vary a lot

4356

Pool Allocator - STL containers

vector only if chunk sizes match vector size

list and map are a perfect fit as the size of each node is known beforehand(though this knowledge is implementation-specific)

Similar for deque

unordered map is again complicated

4456

But what if I do need different sizes

minusrarr Multipool Allocator

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4656

Multipool Allocator

Very powerful allocator

Runtime is still deterministic if number of pools is known beforehand

Maximum amount of waste through internal fragmentation can be controlledprecisely

Difficult to set up How many pools do I need What chunk sizes What poolsizes

Solid building block for a general purpose allocator

4756

Allocator support in C++

stdvectorltT AllocatorltTgtgt v

vpush back()

This is not the class allocating the memory

4756

Allocator support in C++

stdvectorltT AllocatorltTgtgt v

vpush back()

This is not the class allocating the memory

4856

Allocator support in C++

Historically C++ used Allocators to abstract over different models of addressingmemoryAs such Allocators in C++ are ldquostatelessrdquo

In C++ an Allocator is merely a handle to a memory resource1

1Arthur OrsquoDwyer - An Allocator is a Handle to the Heap

4856

Allocator support in C++

Historically C++ used Allocators to abstract over different models of addressingmemoryAs such Allocators in C++ are ldquostatelessrdquo

In C++ an Allocator is merely a handle to a memory resource1

1Arthur OrsquoDwyer - An Allocator is a Handle to the Heap

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

5056

C++ Memory Resources2

stdpmrmemory resource - Abstract base class for all resources that can bewrapped in a stdpmrpolymorphic allocator

stdpmrnew delete resource - Global allocator

stdpmrmonotonic buffer resource - Monotonic allocator

stdpmrunsynchronized pool resource synchronized pool resource

- Multipool

stdpmrnull memory resource - Allocation always fails

2Pablo Halpern - Allocators The Good Parts

5156

Chaining

explicit monotonic_buffer_resource(

stdpmr memory_resource upstream )

Each memory resource has an upstream counterpart

If the resource runs out of memory it tries to allocate more memory from upstream

5256

Chaining

Possible uses of Chaining

Fixed-size vs dynamic storage for allocators

Combination of different allocation strategies

Injection points for special purpose allocators for debugging and profiling

5356

Therersquos no universal interface for allocators

Are size and alignment parameters passed to deallocate

Is realloc supported

How are out-of-memory errors reported

Is extended alignment supported

What is the return value for an allocation of size 0

Different memory regions for internal data structures and allocated memory

5456

Donrsquot underestimate the global allocator

Competitive performance in the general case

Security features (ASLR secure erase of freed memory)

Debugging amp Profiling (Valgrind Windows Debug Runtime)

Cache Coloring

Local allocators are no free lunch

5556

Wrapping up

No one-size-fits-all mdash Each allocator has its Achilles heel

Global allocator is a good solution for the general case

But you can do better with special allocators for special use cases in terms ofperformance3 as well as reliability

C++ has good support for local allocators but the terminology is a bit off

Different libraries have different concepts of allocators

No free lunch You need to understand your use case before you can chose theright allocator

3John Lakos - Local (Arena) Allocators

5656

Thanks for your attention

Page 4: Taming Dynamic Memory...Pool Allocator - STL containers vector only if chunk sizes match vector size list and map are a perfect t, as the size of each node is known beforehand (though

456

General purpose allocator

auto p1 = allocate(3)

auto p2 = allocate(4)

auto p3 = allocate(1)

auto p4 = allocate(3)

auto p5 = allocate(2)

456

General purpose allocator

auto p1 = allocate(3)

auto p2 = allocate(4)

auto p3 = allocate(1)

auto p4 = allocate(3)

auto p5 = allocate(2)

456

General purpose allocator

auto p1 = allocate(3)

auto p2 = allocate(4)

auto p3 = allocate(1)

auto p4 = allocate(3)

auto p5 = allocate(2)

456

General purpose allocator

auto p1 = allocate(3)

auto p2 = allocate(4)

auto p3 = allocate(1)

auto p4 = allocate(3)

auto p5 = allocate(2)

556

General purpose allocator

deallocate(p2)

auto p6 = allocate(2)

556

General purpose allocator

deallocate(p2)

auto p6 = allocate(2)

556

General purpose allocator

deallocate(p2)

auto p6 = allocate(2)

556

General purpose allocator

deallocate(p2)

auto p6 = allocate(2)

656

Fragmentation

auto p7 = allocate(4)

Runtime Error

656

Fragmentation

auto p7 = allocate(4)

Runtime Error

756

Coalescing

deallocate(p4)

deallocate(p3)

756

Coalescing

deallocate(p4)

deallocate(p3)

756

Coalescing

deallocate(p4)

deallocate(p3)

756

Coalescing

deallocate(p4)

deallocate(p3)

756

Coalescing

deallocate(p4)

deallocate(p3)

856

Problems with default allocator

Complex runtime behavior

What is the maximum memory usageWhat is the worst-case execution time for an allocation or deallocation

Shared global state

Reasoning about allocator behavior requires global knowledge of the whole programThe singular resource global allocator is a potential bottleneck

Itrsquos not just about performance

956

From Global to Local

auto p1 = allocate (42)

deallocate(p1)

1056

From Global to Local

Allocator alloc

auto p1 = allocallocate (42)

allocdeallocate(p1)

1156

Problems with default allocator

Complex runtime behavior

What is the maximum memory usageWhat is the worst-case execution time for an allocation or deallocation

Shared global state X

1256

Monotonic Allocator

auto p1 = monotallocate(3)

auto p2 = monotallocate(2)

auto p3 = monotallocate(4)

1256

Monotonic Allocator

auto p1 = monotallocate(3)

auto p2 = monotallocate(2)

auto p3 = monotallocate(4)

1256

Monotonic Allocator

auto p1 = monotallocate(3)

auto p2 = monotallocate(2)

auto p3 = monotallocate(4)

1256

Monotonic Allocator

auto p1 = monotallocate(3)

auto p2 = monotallocate(2)

auto p3 = monotallocate(4)

1356

Monotonic Allocator

monotdeallocate(p2)

monotdeallocate(p1)

monotdeallocate(p3)

auto p4 = allocate(2)

1356

Monotonic Allocator

monotdeallocate(p2)

monotdeallocate(p1)

monotdeallocate(p3)

auto p4 = allocate(2)

1356

Monotonic Allocator

monotdeallocate(p2)

monotdeallocate(p1)

monotdeallocate(p3)

auto p4 = allocate(2)

1356

Monotonic Allocator

monotdeallocate(p2)

monotdeallocate(p1)

monotdeallocate(p3)

auto p4 = allocate(2)

1356

Monotonic Allocator

monotdeallocate(p2)

monotdeallocate(p1)

monotdeallocate(p3)

auto p4 = allocate(2)

1456

Monotonic Allocator - Reclamation

monotdeallocate(p4)

monotrelease()

1456

Monotonic Allocator - Reclamation

monotdeallocate(p4)

monotrelease()

1456

Monotonic Allocator - Reclamation

monotdeallocate(p4)

monotrelease()

1456

Monotonic Allocator - Reclamation

monotdeallocate(p4)

monotrelease()

1556

Monotonic Allocator

Deterministic runtime cost

Extremely efficient

No fragmentation

Easy to implement

Trivial to make thread-safe

But

Memory can only be reclaimed all at once

1656

Where is this actually useful

Frames in a video game

Handling of a single event in an event-driven system

Cyclic execution in a real-time system

Containers that are initialized but not changed after

static state - Objects that will never be destroyed

1756

Monotonic Allocator - stdvector

1756

Monotonic Allocator - stdvector

1756

Monotonic Allocator - stdvector

1856

Monotonic Allocator - STL containers

vector should reserve final size upfront

list and map work fine but deleted elements are not reclaimed individually

deque works really well

unordered map has same problem as vector when growing but the load-factortriggered rehash is less predictable

1956

unordered map

Exact layout depends on hash function and inserted values

1956

unordered map

Exact layout depends on hash function and inserted values

1956

unordered map

Exact layout depends on hash function and inserted values

1956

unordered map

Exact layout depends on hash function and inserted values

2056

Stack Allocator

monotdeallocate(p3)

2056

Stack Allocator

monotdeallocate(p3)

2056

Stack Allocator

monotdeallocate(p3)

2056

Stack Allocator

monotdeallocate(p3)

2156

Stack Allocator

Strict LIFO-ordering of allocations and deallocations

No way for the implementation to check whether the deallocation order is correct

2256

Stack Allocator

monotdeallocate(p3)

p3 == top Xtop =

2256

Stack Allocator

monotdeallocate(p3)

p3 == top X

top =

2256

Stack Allocator

monotdeallocate(p3)

p3 == top Xtop =

2256

Stack Allocator

monotdeallocate(p3)

p3 == top X

top =

2356

Stack Allocator

Strict LIFO-ordering of allocations and deallocations

No way for the implementation to check whether the deallocation order is correct

Free-pointer is reset to the same pointer passed to the deallocate call

Padding bytes may be lost to internal fragmentation

2356

Stack Allocator

Strict LIFO-ordering of allocations and deallocations

No way for the implementation to check whether the deallocation order is correct

Free-pointer is reset to the same pointer passed to the deallocate call

Padding bytes may be lost to internal fragmentation

2356

Stack Allocator

Strict LIFO-ordering of allocations and deallocations

No way for the implementation to check whether the deallocation order is correct

Free-pointer is reset to the same pointer passed to the deallocate call

Padding bytes may be lost to internal fragmentation

2456

Padding

2456

Padding

2556

Alignment

0xdeadbeef =

d e a d b e e f

1101 1110 1010 1101 1011 1110 1110 1111

2656

Alignment

0xdeadbeef =

d e a d b e e f

1101 1110 1010 1101 1011 1110 1110 1111

No alignment (1-byte aligned)

2756

Alignment

0xdeadbeef =

d e a d b e e c

1101 1110 1010 1101 1011 1110 1110 1100

4-byte aligned

2856

Alignment

0xdeadbeef =

d e a d b e e 8

1101 1110 1010 1101 1011 1110 1110 1000

8-byte aligned

2956

Alignment

Alignment refers to the least-significant bits of the object address being 0

Alignment requirements are always specified in powers of 2

Each built-in C++ type has a natural alignment requirement (typicallyalignof(T) == sizeof(T))

This is why structs sometimes insert padding bytes between members

3056

Alignment

Default allocator typically returns addresses aligned to alignof(max align t)which is big enough for all built-in types

Users may extend the alignment requirement for custom data types usingalignas

void allocate(std size_t bytes

std size_t alignment )

3156

Padding

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3356

Monotonic Allocator - Extensions

Auxiliary data structure required

Runtime cost of deallocation now linear in number of allocations (amortized O(1))

Auxiliary nodes have their own alignment requirements

Where to store the auxiliary nodes

3456

Monotonic Allocator - Extensions

3456

Monotonic Allocator - Extensions

3556

Internal or External

External headers have better cache behavior when iterating the list

External headers might have stricter alignment requirements than data

Internal headers have better cache behavior when adjacent data is hot

Internal headers require managed memory to be readable (think GPUs)

Where does the storage for external headers come from Same buffer Differentbuffer How big

rArr No easy answers

3656

The Bottom Line

Even seemingly simple extensions get complicated very quickly

Donrsquot try to increase generality through clever extensionsOnly consider modifications if itrsquos a perfect fit for your use case

3756

But what if I need to reclaim memory

minusrarr Pool Allocator

3856

Pool Allocator

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4156

Pool Allocator - Diffusion

4256

Pool Allocator

Deterministic runtime cost

No external fragmentation

Easy to make thread-safe

But

Cannot serve allocations bigger than chunk size

High waste through internal fragmentation if sizes of objects vary a lot

4356

Pool Allocator - STL containers

vector only if chunk sizes match vector size

list and map are a perfect fit as the size of each node is known beforehand(though this knowledge is implementation-specific)

Similar for deque

unordered map is again complicated

4456

But what if I do need different sizes

minusrarr Multipool Allocator

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4656

Multipool Allocator

Very powerful allocator

Runtime is still deterministic if number of pools is known beforehand

Maximum amount of waste through internal fragmentation can be controlledprecisely

Difficult to set up How many pools do I need What chunk sizes What poolsizes

Solid building block for a general purpose allocator

4756

Allocator support in C++

stdvectorltT AllocatorltTgtgt v

vpush back()

This is not the class allocating the memory

4756

Allocator support in C++

stdvectorltT AllocatorltTgtgt v

vpush back()

This is not the class allocating the memory

4856

Allocator support in C++

Historically C++ used Allocators to abstract over different models of addressingmemoryAs such Allocators in C++ are ldquostatelessrdquo

In C++ an Allocator is merely a handle to a memory resource1

1Arthur OrsquoDwyer - An Allocator is a Handle to the Heap

4856

Allocator support in C++

Historically C++ used Allocators to abstract over different models of addressingmemoryAs such Allocators in C++ are ldquostatelessrdquo

In C++ an Allocator is merely a handle to a memory resource1

1Arthur OrsquoDwyer - An Allocator is a Handle to the Heap

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

5056

C++ Memory Resources2

stdpmrmemory resource - Abstract base class for all resources that can bewrapped in a stdpmrpolymorphic allocator

stdpmrnew delete resource - Global allocator

stdpmrmonotonic buffer resource - Monotonic allocator

stdpmrunsynchronized pool resource synchronized pool resource

- Multipool

stdpmrnull memory resource - Allocation always fails

2Pablo Halpern - Allocators The Good Parts

5156

Chaining

explicit monotonic_buffer_resource(

stdpmr memory_resource upstream )

Each memory resource has an upstream counterpart

If the resource runs out of memory it tries to allocate more memory from upstream

5256

Chaining

Possible uses of Chaining

Fixed-size vs dynamic storage for allocators

Combination of different allocation strategies

Injection points for special purpose allocators for debugging and profiling

5356

Therersquos no universal interface for allocators

Are size and alignment parameters passed to deallocate

Is realloc supported

How are out-of-memory errors reported

Is extended alignment supported

What is the return value for an allocation of size 0

Different memory regions for internal data structures and allocated memory

5456

Donrsquot underestimate the global allocator

Competitive performance in the general case

Security features (ASLR secure erase of freed memory)

Debugging amp Profiling (Valgrind Windows Debug Runtime)

Cache Coloring

Local allocators are no free lunch

5556

Wrapping up

No one-size-fits-all mdash Each allocator has its Achilles heel

Global allocator is a good solution for the general case

But you can do better with special allocators for special use cases in terms ofperformance3 as well as reliability

C++ has good support for local allocators but the terminology is a bit off

Different libraries have different concepts of allocators

No free lunch You need to understand your use case before you can chose theright allocator

3John Lakos - Local (Arena) Allocators

5656

Thanks for your attention

Page 5: Taming Dynamic Memory...Pool Allocator - STL containers vector only if chunk sizes match vector size list and map are a perfect t, as the size of each node is known beforehand (though

456

General purpose allocator

auto p1 = allocate(3)

auto p2 = allocate(4)

auto p3 = allocate(1)

auto p4 = allocate(3)

auto p5 = allocate(2)

456

General purpose allocator

auto p1 = allocate(3)

auto p2 = allocate(4)

auto p3 = allocate(1)

auto p4 = allocate(3)

auto p5 = allocate(2)

456

General purpose allocator

auto p1 = allocate(3)

auto p2 = allocate(4)

auto p3 = allocate(1)

auto p4 = allocate(3)

auto p5 = allocate(2)

556

General purpose allocator

deallocate(p2)

auto p6 = allocate(2)

556

General purpose allocator

deallocate(p2)

auto p6 = allocate(2)

556

General purpose allocator

deallocate(p2)

auto p6 = allocate(2)

556

General purpose allocator

deallocate(p2)

auto p6 = allocate(2)

656

Fragmentation

auto p7 = allocate(4)

Runtime Error

656

Fragmentation

auto p7 = allocate(4)

Runtime Error

756

Coalescing

deallocate(p4)

deallocate(p3)

756

Coalescing

deallocate(p4)

deallocate(p3)

756

Coalescing

deallocate(p4)

deallocate(p3)

756

Coalescing

deallocate(p4)

deallocate(p3)

756

Coalescing

deallocate(p4)

deallocate(p3)

856

Problems with default allocator

Complex runtime behavior

What is the maximum memory usageWhat is the worst-case execution time for an allocation or deallocation

Shared global state

Reasoning about allocator behavior requires global knowledge of the whole programThe singular resource global allocator is a potential bottleneck

Itrsquos not just about performance

956

From Global to Local

auto p1 = allocate (42)

deallocate(p1)

1056

From Global to Local

Allocator alloc

auto p1 = allocallocate (42)

allocdeallocate(p1)

1156

Problems with default allocator

Complex runtime behavior

What is the maximum memory usageWhat is the worst-case execution time for an allocation or deallocation

Shared global state X

1256

Monotonic Allocator

auto p1 = monotallocate(3)

auto p2 = monotallocate(2)

auto p3 = monotallocate(4)

1256

Monotonic Allocator

auto p1 = monotallocate(3)

auto p2 = monotallocate(2)

auto p3 = monotallocate(4)

1256

Monotonic Allocator

auto p1 = monotallocate(3)

auto p2 = monotallocate(2)

auto p3 = monotallocate(4)

1256

Monotonic Allocator

auto p1 = monotallocate(3)

auto p2 = monotallocate(2)

auto p3 = monotallocate(4)

1356

Monotonic Allocator

monotdeallocate(p2)

monotdeallocate(p1)

monotdeallocate(p3)

auto p4 = allocate(2)

1356

Monotonic Allocator

monotdeallocate(p2)

monotdeallocate(p1)

monotdeallocate(p3)

auto p4 = allocate(2)

1356

Monotonic Allocator

monotdeallocate(p2)

monotdeallocate(p1)

monotdeallocate(p3)

auto p4 = allocate(2)

1356

Monotonic Allocator

monotdeallocate(p2)

monotdeallocate(p1)

monotdeallocate(p3)

auto p4 = allocate(2)

1356

Monotonic Allocator

monotdeallocate(p2)

monotdeallocate(p1)

monotdeallocate(p3)

auto p4 = allocate(2)

1456

Monotonic Allocator - Reclamation

monotdeallocate(p4)

monotrelease()

1456

Monotonic Allocator - Reclamation

monotdeallocate(p4)

monotrelease()

1456

Monotonic Allocator - Reclamation

monotdeallocate(p4)

monotrelease()

1456

Monotonic Allocator - Reclamation

monotdeallocate(p4)

monotrelease()

1556

Monotonic Allocator

Deterministic runtime cost

Extremely efficient

No fragmentation

Easy to implement

Trivial to make thread-safe

But

Memory can only be reclaimed all at once

1656

Where is this actually useful

Frames in a video game

Handling of a single event in an event-driven system

Cyclic execution in a real-time system

Containers that are initialized but not changed after

static state - Objects that will never be destroyed

1756

Monotonic Allocator - stdvector

1756

Monotonic Allocator - stdvector

1756

Monotonic Allocator - stdvector

1856

Monotonic Allocator - STL containers

vector should reserve final size upfront

list and map work fine but deleted elements are not reclaimed individually

deque works really well

unordered map has same problem as vector when growing but the load-factortriggered rehash is less predictable

1956

unordered map

Exact layout depends on hash function and inserted values

1956

unordered map

Exact layout depends on hash function and inserted values

1956

unordered map

Exact layout depends on hash function and inserted values

1956

unordered map

Exact layout depends on hash function and inserted values

2056

Stack Allocator

monotdeallocate(p3)

2056

Stack Allocator

monotdeallocate(p3)

2056

Stack Allocator

monotdeallocate(p3)

2056

Stack Allocator

monotdeallocate(p3)

2156

Stack Allocator

Strict LIFO-ordering of allocations and deallocations

No way for the implementation to check whether the deallocation order is correct

2256

Stack Allocator

monotdeallocate(p3)

p3 == top Xtop =

2256

Stack Allocator

monotdeallocate(p3)

p3 == top X

top =

2256

Stack Allocator

monotdeallocate(p3)

p3 == top Xtop =

2256

Stack Allocator

monotdeallocate(p3)

p3 == top X

top =

2356

Stack Allocator

Strict LIFO-ordering of allocations and deallocations

No way for the implementation to check whether the deallocation order is correct

Free-pointer is reset to the same pointer passed to the deallocate call

Padding bytes may be lost to internal fragmentation

2356

Stack Allocator

Strict LIFO-ordering of allocations and deallocations

No way for the implementation to check whether the deallocation order is correct

Free-pointer is reset to the same pointer passed to the deallocate call

Padding bytes may be lost to internal fragmentation

2356

Stack Allocator

Strict LIFO-ordering of allocations and deallocations

No way for the implementation to check whether the deallocation order is correct

Free-pointer is reset to the same pointer passed to the deallocate call

Padding bytes may be lost to internal fragmentation

2456

Padding

2456

Padding

2556

Alignment

0xdeadbeef =

d e a d b e e f

1101 1110 1010 1101 1011 1110 1110 1111

2656

Alignment

0xdeadbeef =

d e a d b e e f

1101 1110 1010 1101 1011 1110 1110 1111

No alignment (1-byte aligned)

2756

Alignment

0xdeadbeef =

d e a d b e e c

1101 1110 1010 1101 1011 1110 1110 1100

4-byte aligned

2856

Alignment

0xdeadbeef =

d e a d b e e 8

1101 1110 1010 1101 1011 1110 1110 1000

8-byte aligned

2956

Alignment

Alignment refers to the least-significant bits of the object address being 0

Alignment requirements are always specified in powers of 2

Each built-in C++ type has a natural alignment requirement (typicallyalignof(T) == sizeof(T))

This is why structs sometimes insert padding bytes between members

3056

Alignment

Default allocator typically returns addresses aligned to alignof(max align t)which is big enough for all built-in types

Users may extend the alignment requirement for custom data types usingalignas

void allocate(std size_t bytes

std size_t alignment )

3156

Padding

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3356

Monotonic Allocator - Extensions

Auxiliary data structure required

Runtime cost of deallocation now linear in number of allocations (amortized O(1))

Auxiliary nodes have their own alignment requirements

Where to store the auxiliary nodes

3456

Monotonic Allocator - Extensions

3456

Monotonic Allocator - Extensions

3556

Internal or External

External headers have better cache behavior when iterating the list

External headers might have stricter alignment requirements than data

Internal headers have better cache behavior when adjacent data is hot

Internal headers require managed memory to be readable (think GPUs)

Where does the storage for external headers come from Same buffer Differentbuffer How big

rArr No easy answers

3656

The Bottom Line

Even seemingly simple extensions get complicated very quickly

Donrsquot try to increase generality through clever extensionsOnly consider modifications if itrsquos a perfect fit for your use case

3756

But what if I need to reclaim memory

minusrarr Pool Allocator

3856

Pool Allocator

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4156

Pool Allocator - Diffusion

4256

Pool Allocator

Deterministic runtime cost

No external fragmentation

Easy to make thread-safe

But

Cannot serve allocations bigger than chunk size

High waste through internal fragmentation if sizes of objects vary a lot

4356

Pool Allocator - STL containers

vector only if chunk sizes match vector size

list and map are a perfect fit as the size of each node is known beforehand(though this knowledge is implementation-specific)

Similar for deque

unordered map is again complicated

4456

But what if I do need different sizes

minusrarr Multipool Allocator

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4656

Multipool Allocator

Very powerful allocator

Runtime is still deterministic if number of pools is known beforehand

Maximum amount of waste through internal fragmentation can be controlledprecisely

Difficult to set up How many pools do I need What chunk sizes What poolsizes

Solid building block for a general purpose allocator

4756

Allocator support in C++

stdvectorltT AllocatorltTgtgt v

vpush back()

This is not the class allocating the memory

4756

Allocator support in C++

stdvectorltT AllocatorltTgtgt v

vpush back()

This is not the class allocating the memory

4856

Allocator support in C++

Historically C++ used Allocators to abstract over different models of addressingmemoryAs such Allocators in C++ are ldquostatelessrdquo

In C++ an Allocator is merely a handle to a memory resource1

1Arthur OrsquoDwyer - An Allocator is a Handle to the Heap

4856

Allocator support in C++

Historically C++ used Allocators to abstract over different models of addressingmemoryAs such Allocators in C++ are ldquostatelessrdquo

In C++ an Allocator is merely a handle to a memory resource1

1Arthur OrsquoDwyer - An Allocator is a Handle to the Heap

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

5056

C++ Memory Resources2

stdpmrmemory resource - Abstract base class for all resources that can bewrapped in a stdpmrpolymorphic allocator

stdpmrnew delete resource - Global allocator

stdpmrmonotonic buffer resource - Monotonic allocator

stdpmrunsynchronized pool resource synchronized pool resource

- Multipool

stdpmrnull memory resource - Allocation always fails

2Pablo Halpern - Allocators The Good Parts

5156

Chaining

explicit monotonic_buffer_resource(

stdpmr memory_resource upstream )

Each memory resource has an upstream counterpart

If the resource runs out of memory it tries to allocate more memory from upstream

5256

Chaining

Possible uses of Chaining

Fixed-size vs dynamic storage for allocators

Combination of different allocation strategies

Injection points for special purpose allocators for debugging and profiling

5356

Therersquos no universal interface for allocators

Are size and alignment parameters passed to deallocate

Is realloc supported

How are out-of-memory errors reported

Is extended alignment supported

What is the return value for an allocation of size 0

Different memory regions for internal data structures and allocated memory

5456

Donrsquot underestimate the global allocator

Competitive performance in the general case

Security features (ASLR secure erase of freed memory)

Debugging amp Profiling (Valgrind Windows Debug Runtime)

Cache Coloring

Local allocators are no free lunch

5556

Wrapping up

No one-size-fits-all mdash Each allocator has its Achilles heel

Global allocator is a good solution for the general case

But you can do better with special allocators for special use cases in terms ofperformance3 as well as reliability

C++ has good support for local allocators but the terminology is a bit off

Different libraries have different concepts of allocators

No free lunch You need to understand your use case before you can chose theright allocator

3John Lakos - Local (Arena) Allocators

5656

Thanks for your attention

Page 6: Taming Dynamic Memory...Pool Allocator - STL containers vector only if chunk sizes match vector size list and map are a perfect t, as the size of each node is known beforehand (though

456

General purpose allocator

auto p1 = allocate(3)

auto p2 = allocate(4)

auto p3 = allocate(1)

auto p4 = allocate(3)

auto p5 = allocate(2)

456

General purpose allocator

auto p1 = allocate(3)

auto p2 = allocate(4)

auto p3 = allocate(1)

auto p4 = allocate(3)

auto p5 = allocate(2)

556

General purpose allocator

deallocate(p2)

auto p6 = allocate(2)

556

General purpose allocator

deallocate(p2)

auto p6 = allocate(2)

556

General purpose allocator

deallocate(p2)

auto p6 = allocate(2)

556

General purpose allocator

deallocate(p2)

auto p6 = allocate(2)

656

Fragmentation

auto p7 = allocate(4)

Runtime Error

656

Fragmentation

auto p7 = allocate(4)

Runtime Error

756

Coalescing

deallocate(p4)

deallocate(p3)

756

Coalescing

deallocate(p4)

deallocate(p3)

756

Coalescing

deallocate(p4)

deallocate(p3)

756

Coalescing

deallocate(p4)

deallocate(p3)

756

Coalescing

deallocate(p4)

deallocate(p3)

856

Problems with default allocator

Complex runtime behavior

What is the maximum memory usageWhat is the worst-case execution time for an allocation or deallocation

Shared global state

Reasoning about allocator behavior requires global knowledge of the whole programThe singular resource global allocator is a potential bottleneck

Itrsquos not just about performance

956

From Global to Local

auto p1 = allocate (42)

deallocate(p1)

1056

From Global to Local

Allocator alloc

auto p1 = allocallocate (42)

allocdeallocate(p1)

1156

Problems with default allocator

Complex runtime behavior

What is the maximum memory usageWhat is the worst-case execution time for an allocation or deallocation

Shared global state X

1256

Monotonic Allocator

auto p1 = monotallocate(3)

auto p2 = monotallocate(2)

auto p3 = monotallocate(4)

1256

Monotonic Allocator

auto p1 = monotallocate(3)

auto p2 = monotallocate(2)

auto p3 = monotallocate(4)

1256

Monotonic Allocator

auto p1 = monotallocate(3)

auto p2 = monotallocate(2)

auto p3 = monotallocate(4)

1256

Monotonic Allocator

auto p1 = monotallocate(3)

auto p2 = monotallocate(2)

auto p3 = monotallocate(4)

1356

Monotonic Allocator

monotdeallocate(p2)

monotdeallocate(p1)

monotdeallocate(p3)

auto p4 = allocate(2)

1356

Monotonic Allocator

monotdeallocate(p2)

monotdeallocate(p1)

monotdeallocate(p3)

auto p4 = allocate(2)

1356

Monotonic Allocator

monotdeallocate(p2)

monotdeallocate(p1)

monotdeallocate(p3)

auto p4 = allocate(2)

1356

Monotonic Allocator

monotdeallocate(p2)

monotdeallocate(p1)

monotdeallocate(p3)

auto p4 = allocate(2)

1356

Monotonic Allocator

monotdeallocate(p2)

monotdeallocate(p1)

monotdeallocate(p3)

auto p4 = allocate(2)

1456

Monotonic Allocator - Reclamation

monotdeallocate(p4)

monotrelease()

1456

Monotonic Allocator - Reclamation

monotdeallocate(p4)

monotrelease()

1456

Monotonic Allocator - Reclamation

monotdeallocate(p4)

monotrelease()

1456

Monotonic Allocator - Reclamation

monotdeallocate(p4)

monotrelease()

1556

Monotonic Allocator

Deterministic runtime cost

Extremely efficient

No fragmentation

Easy to implement

Trivial to make thread-safe

But

Memory can only be reclaimed all at once

1656

Where is this actually useful

Frames in a video game

Handling of a single event in an event-driven system

Cyclic execution in a real-time system

Containers that are initialized but not changed after

static state - Objects that will never be destroyed

1756

Monotonic Allocator - stdvector

1756

Monotonic Allocator - stdvector

1756

Monotonic Allocator - stdvector

1856

Monotonic Allocator - STL containers

vector should reserve final size upfront

list and map work fine but deleted elements are not reclaimed individually

deque works really well

unordered map has same problem as vector when growing but the load-factortriggered rehash is less predictable

1956

unordered map

Exact layout depends on hash function and inserted values

1956

unordered map

Exact layout depends on hash function and inserted values

1956

unordered map

Exact layout depends on hash function and inserted values

1956

unordered map

Exact layout depends on hash function and inserted values

2056

Stack Allocator

monotdeallocate(p3)

2056

Stack Allocator

monotdeallocate(p3)

2056

Stack Allocator

monotdeallocate(p3)

2056

Stack Allocator

monotdeallocate(p3)

2156

Stack Allocator

Strict LIFO-ordering of allocations and deallocations

No way for the implementation to check whether the deallocation order is correct

2256

Stack Allocator

monotdeallocate(p3)

p3 == top Xtop =

2256

Stack Allocator

monotdeallocate(p3)

p3 == top X

top =

2256

Stack Allocator

monotdeallocate(p3)

p3 == top Xtop =

2256

Stack Allocator

monotdeallocate(p3)

p3 == top X

top =

2356

Stack Allocator

Strict LIFO-ordering of allocations and deallocations

No way for the implementation to check whether the deallocation order is correct

Free-pointer is reset to the same pointer passed to the deallocate call

Padding bytes may be lost to internal fragmentation

2356

Stack Allocator

Strict LIFO-ordering of allocations and deallocations

No way for the implementation to check whether the deallocation order is correct

Free-pointer is reset to the same pointer passed to the deallocate call

Padding bytes may be lost to internal fragmentation

2356

Stack Allocator

Strict LIFO-ordering of allocations and deallocations

No way for the implementation to check whether the deallocation order is correct

Free-pointer is reset to the same pointer passed to the deallocate call

Padding bytes may be lost to internal fragmentation

2456

Padding

2456

Padding

2556

Alignment

0xdeadbeef =

d e a d b e e f

1101 1110 1010 1101 1011 1110 1110 1111

2656

Alignment

0xdeadbeef =

d e a d b e e f

1101 1110 1010 1101 1011 1110 1110 1111

No alignment (1-byte aligned)

2756

Alignment

0xdeadbeef =

d e a d b e e c

1101 1110 1010 1101 1011 1110 1110 1100

4-byte aligned

2856

Alignment

0xdeadbeef =

d e a d b e e 8

1101 1110 1010 1101 1011 1110 1110 1000

8-byte aligned

2956

Alignment

Alignment refers to the least-significant bits of the object address being 0

Alignment requirements are always specified in powers of 2

Each built-in C++ type has a natural alignment requirement (typicallyalignof(T) == sizeof(T))

This is why structs sometimes insert padding bytes between members

3056

Alignment

Default allocator typically returns addresses aligned to alignof(max align t)which is big enough for all built-in types

Users may extend the alignment requirement for custom data types usingalignas

void allocate(std size_t bytes

std size_t alignment )

3156

Padding

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3356

Monotonic Allocator - Extensions

Auxiliary data structure required

Runtime cost of deallocation now linear in number of allocations (amortized O(1))

Auxiliary nodes have their own alignment requirements

Where to store the auxiliary nodes

3456

Monotonic Allocator - Extensions

3456

Monotonic Allocator - Extensions

3556

Internal or External

External headers have better cache behavior when iterating the list

External headers might have stricter alignment requirements than data

Internal headers have better cache behavior when adjacent data is hot

Internal headers require managed memory to be readable (think GPUs)

Where does the storage for external headers come from Same buffer Differentbuffer How big

rArr No easy answers

3656

The Bottom Line

Even seemingly simple extensions get complicated very quickly

Donrsquot try to increase generality through clever extensionsOnly consider modifications if itrsquos a perfect fit for your use case

3756

But what if I need to reclaim memory

minusrarr Pool Allocator

3856

Pool Allocator

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4156

Pool Allocator - Diffusion

4256

Pool Allocator

Deterministic runtime cost

No external fragmentation

Easy to make thread-safe

But

Cannot serve allocations bigger than chunk size

High waste through internal fragmentation if sizes of objects vary a lot

4356

Pool Allocator - STL containers

vector only if chunk sizes match vector size

list and map are a perfect fit as the size of each node is known beforehand(though this knowledge is implementation-specific)

Similar for deque

unordered map is again complicated

4456

But what if I do need different sizes

minusrarr Multipool Allocator

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4656

Multipool Allocator

Very powerful allocator

Runtime is still deterministic if number of pools is known beforehand

Maximum amount of waste through internal fragmentation can be controlledprecisely

Difficult to set up How many pools do I need What chunk sizes What poolsizes

Solid building block for a general purpose allocator

4756

Allocator support in C++

stdvectorltT AllocatorltTgtgt v

vpush back()

This is not the class allocating the memory

4756

Allocator support in C++

stdvectorltT AllocatorltTgtgt v

vpush back()

This is not the class allocating the memory

4856

Allocator support in C++

Historically C++ used Allocators to abstract over different models of addressingmemoryAs such Allocators in C++ are ldquostatelessrdquo

In C++ an Allocator is merely a handle to a memory resource1

1Arthur OrsquoDwyer - An Allocator is a Handle to the Heap

4856

Allocator support in C++

Historically C++ used Allocators to abstract over different models of addressingmemoryAs such Allocators in C++ are ldquostatelessrdquo

In C++ an Allocator is merely a handle to a memory resource1

1Arthur OrsquoDwyer - An Allocator is a Handle to the Heap

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

5056

C++ Memory Resources2

stdpmrmemory resource - Abstract base class for all resources that can bewrapped in a stdpmrpolymorphic allocator

stdpmrnew delete resource - Global allocator

stdpmrmonotonic buffer resource - Monotonic allocator

stdpmrunsynchronized pool resource synchronized pool resource

- Multipool

stdpmrnull memory resource - Allocation always fails

2Pablo Halpern - Allocators The Good Parts

5156

Chaining

explicit monotonic_buffer_resource(

stdpmr memory_resource upstream )

Each memory resource has an upstream counterpart

If the resource runs out of memory it tries to allocate more memory from upstream

5256

Chaining

Possible uses of Chaining

Fixed-size vs dynamic storage for allocators

Combination of different allocation strategies

Injection points for special purpose allocators for debugging and profiling

5356

Therersquos no universal interface for allocators

Are size and alignment parameters passed to deallocate

Is realloc supported

How are out-of-memory errors reported

Is extended alignment supported

What is the return value for an allocation of size 0

Different memory regions for internal data structures and allocated memory

5456

Donrsquot underestimate the global allocator

Competitive performance in the general case

Security features (ASLR secure erase of freed memory)

Debugging amp Profiling (Valgrind Windows Debug Runtime)

Cache Coloring

Local allocators are no free lunch

5556

Wrapping up

No one-size-fits-all mdash Each allocator has its Achilles heel

Global allocator is a good solution for the general case

But you can do better with special allocators for special use cases in terms ofperformance3 as well as reliability

C++ has good support for local allocators but the terminology is a bit off

Different libraries have different concepts of allocators

No free lunch You need to understand your use case before you can chose theright allocator

3John Lakos - Local (Arena) Allocators

5656

Thanks for your attention

Page 7: Taming Dynamic Memory...Pool Allocator - STL containers vector only if chunk sizes match vector size list and map are a perfect t, as the size of each node is known beforehand (though

456

General purpose allocator

auto p1 = allocate(3)

auto p2 = allocate(4)

auto p3 = allocate(1)

auto p4 = allocate(3)

auto p5 = allocate(2)

556

General purpose allocator

deallocate(p2)

auto p6 = allocate(2)

556

General purpose allocator

deallocate(p2)

auto p6 = allocate(2)

556

General purpose allocator

deallocate(p2)

auto p6 = allocate(2)

556

General purpose allocator

deallocate(p2)

auto p6 = allocate(2)

656

Fragmentation

auto p7 = allocate(4)

Runtime Error

656

Fragmentation

auto p7 = allocate(4)

Runtime Error

756

Coalescing

deallocate(p4)

deallocate(p3)

756

Coalescing

deallocate(p4)

deallocate(p3)

756

Coalescing

deallocate(p4)

deallocate(p3)

756

Coalescing

deallocate(p4)

deallocate(p3)

756

Coalescing

deallocate(p4)

deallocate(p3)

856

Problems with default allocator

Complex runtime behavior

What is the maximum memory usageWhat is the worst-case execution time for an allocation or deallocation

Shared global state

Reasoning about allocator behavior requires global knowledge of the whole programThe singular resource global allocator is a potential bottleneck

Itrsquos not just about performance

956

From Global to Local

auto p1 = allocate (42)

deallocate(p1)

1056

From Global to Local

Allocator alloc

auto p1 = allocallocate (42)

allocdeallocate(p1)

1156

Problems with default allocator

Complex runtime behavior

What is the maximum memory usageWhat is the worst-case execution time for an allocation or deallocation

Shared global state X

1256

Monotonic Allocator

auto p1 = monotallocate(3)

auto p2 = monotallocate(2)

auto p3 = monotallocate(4)

1256

Monotonic Allocator

auto p1 = monotallocate(3)

auto p2 = monotallocate(2)

auto p3 = monotallocate(4)

1256

Monotonic Allocator

auto p1 = monotallocate(3)

auto p2 = monotallocate(2)

auto p3 = monotallocate(4)

1256

Monotonic Allocator

auto p1 = monotallocate(3)

auto p2 = monotallocate(2)

auto p3 = monotallocate(4)

1356

Monotonic Allocator

monotdeallocate(p2)

monotdeallocate(p1)

monotdeallocate(p3)

auto p4 = allocate(2)

1356

Monotonic Allocator

monotdeallocate(p2)

monotdeallocate(p1)

monotdeallocate(p3)

auto p4 = allocate(2)

1356

Monotonic Allocator

monotdeallocate(p2)

monotdeallocate(p1)

monotdeallocate(p3)

auto p4 = allocate(2)

1356

Monotonic Allocator

monotdeallocate(p2)

monotdeallocate(p1)

monotdeallocate(p3)

auto p4 = allocate(2)

1356

Monotonic Allocator

monotdeallocate(p2)

monotdeallocate(p1)

monotdeallocate(p3)

auto p4 = allocate(2)

1456

Monotonic Allocator - Reclamation

monotdeallocate(p4)

monotrelease()

1456

Monotonic Allocator - Reclamation

monotdeallocate(p4)

monotrelease()

1456

Monotonic Allocator - Reclamation

monotdeallocate(p4)

monotrelease()

1456

Monotonic Allocator - Reclamation

monotdeallocate(p4)

monotrelease()

1556

Monotonic Allocator

Deterministic runtime cost

Extremely efficient

No fragmentation

Easy to implement

Trivial to make thread-safe

But

Memory can only be reclaimed all at once

1656

Where is this actually useful

Frames in a video game

Handling of a single event in an event-driven system

Cyclic execution in a real-time system

Containers that are initialized but not changed after

static state - Objects that will never be destroyed

1756

Monotonic Allocator - stdvector

1756

Monotonic Allocator - stdvector

1756

Monotonic Allocator - stdvector

1856

Monotonic Allocator - STL containers

vector should reserve final size upfront

list and map work fine but deleted elements are not reclaimed individually

deque works really well

unordered map has same problem as vector when growing but the load-factortriggered rehash is less predictable

1956

unordered map

Exact layout depends on hash function and inserted values

1956

unordered map

Exact layout depends on hash function and inserted values

1956

unordered map

Exact layout depends on hash function and inserted values

1956

unordered map

Exact layout depends on hash function and inserted values

2056

Stack Allocator

monotdeallocate(p3)

2056

Stack Allocator

monotdeallocate(p3)

2056

Stack Allocator

monotdeallocate(p3)

2056

Stack Allocator

monotdeallocate(p3)

2156

Stack Allocator

Strict LIFO-ordering of allocations and deallocations

No way for the implementation to check whether the deallocation order is correct

2256

Stack Allocator

monotdeallocate(p3)

p3 == top Xtop =

2256

Stack Allocator

monotdeallocate(p3)

p3 == top X

top =

2256

Stack Allocator

monotdeallocate(p3)

p3 == top Xtop =

2256

Stack Allocator

monotdeallocate(p3)

p3 == top X

top =

2356

Stack Allocator

Strict LIFO-ordering of allocations and deallocations

No way for the implementation to check whether the deallocation order is correct

Free-pointer is reset to the same pointer passed to the deallocate call

Padding bytes may be lost to internal fragmentation

2356

Stack Allocator

Strict LIFO-ordering of allocations and deallocations

No way for the implementation to check whether the deallocation order is correct

Free-pointer is reset to the same pointer passed to the deallocate call

Padding bytes may be lost to internal fragmentation

2356

Stack Allocator

Strict LIFO-ordering of allocations and deallocations

No way for the implementation to check whether the deallocation order is correct

Free-pointer is reset to the same pointer passed to the deallocate call

Padding bytes may be lost to internal fragmentation

2456

Padding

2456

Padding

2556

Alignment

0xdeadbeef =

d e a d b e e f

1101 1110 1010 1101 1011 1110 1110 1111

2656

Alignment

0xdeadbeef =

d e a d b e e f

1101 1110 1010 1101 1011 1110 1110 1111

No alignment (1-byte aligned)

2756

Alignment

0xdeadbeef =

d e a d b e e c

1101 1110 1010 1101 1011 1110 1110 1100

4-byte aligned

2856

Alignment

0xdeadbeef =

d e a d b e e 8

1101 1110 1010 1101 1011 1110 1110 1000

8-byte aligned

2956

Alignment

Alignment refers to the least-significant bits of the object address being 0

Alignment requirements are always specified in powers of 2

Each built-in C++ type has a natural alignment requirement (typicallyalignof(T) == sizeof(T))

This is why structs sometimes insert padding bytes between members

3056

Alignment

Default allocator typically returns addresses aligned to alignof(max align t)which is big enough for all built-in types

Users may extend the alignment requirement for custom data types usingalignas

void allocate(std size_t bytes

std size_t alignment )

3156

Padding

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3356

Monotonic Allocator - Extensions

Auxiliary data structure required

Runtime cost of deallocation now linear in number of allocations (amortized O(1))

Auxiliary nodes have their own alignment requirements

Where to store the auxiliary nodes

3456

Monotonic Allocator - Extensions

3456

Monotonic Allocator - Extensions

3556

Internal or External

External headers have better cache behavior when iterating the list

External headers might have stricter alignment requirements than data

Internal headers have better cache behavior when adjacent data is hot

Internal headers require managed memory to be readable (think GPUs)

Where does the storage for external headers come from Same buffer Differentbuffer How big

rArr No easy answers

3656

The Bottom Line

Even seemingly simple extensions get complicated very quickly

Donrsquot try to increase generality through clever extensionsOnly consider modifications if itrsquos a perfect fit for your use case

3756

But what if I need to reclaim memory

minusrarr Pool Allocator

3856

Pool Allocator

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4156

Pool Allocator - Diffusion

4256

Pool Allocator

Deterministic runtime cost

No external fragmentation

Easy to make thread-safe

But

Cannot serve allocations bigger than chunk size

High waste through internal fragmentation if sizes of objects vary a lot

4356

Pool Allocator - STL containers

vector only if chunk sizes match vector size

list and map are a perfect fit as the size of each node is known beforehand(though this knowledge is implementation-specific)

Similar for deque

unordered map is again complicated

4456

But what if I do need different sizes

minusrarr Multipool Allocator

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4656

Multipool Allocator

Very powerful allocator

Runtime is still deterministic if number of pools is known beforehand

Maximum amount of waste through internal fragmentation can be controlledprecisely

Difficult to set up How many pools do I need What chunk sizes What poolsizes

Solid building block for a general purpose allocator

4756

Allocator support in C++

stdvectorltT AllocatorltTgtgt v

vpush back()

This is not the class allocating the memory

4756

Allocator support in C++

stdvectorltT AllocatorltTgtgt v

vpush back()

This is not the class allocating the memory

4856

Allocator support in C++

Historically C++ used Allocators to abstract over different models of addressingmemoryAs such Allocators in C++ are ldquostatelessrdquo

In C++ an Allocator is merely a handle to a memory resource1

1Arthur OrsquoDwyer - An Allocator is a Handle to the Heap

4856

Allocator support in C++

Historically C++ used Allocators to abstract over different models of addressingmemoryAs such Allocators in C++ are ldquostatelessrdquo

In C++ an Allocator is merely a handle to a memory resource1

1Arthur OrsquoDwyer - An Allocator is a Handle to the Heap

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

5056

C++ Memory Resources2

stdpmrmemory resource - Abstract base class for all resources that can bewrapped in a stdpmrpolymorphic allocator

stdpmrnew delete resource - Global allocator

stdpmrmonotonic buffer resource - Monotonic allocator

stdpmrunsynchronized pool resource synchronized pool resource

- Multipool

stdpmrnull memory resource - Allocation always fails

2Pablo Halpern - Allocators The Good Parts

5156

Chaining

explicit monotonic_buffer_resource(

stdpmr memory_resource upstream )

Each memory resource has an upstream counterpart

If the resource runs out of memory it tries to allocate more memory from upstream

5256

Chaining

Possible uses of Chaining

Fixed-size vs dynamic storage for allocators

Combination of different allocation strategies

Injection points for special purpose allocators for debugging and profiling

5356

Therersquos no universal interface for allocators

Are size and alignment parameters passed to deallocate

Is realloc supported

How are out-of-memory errors reported

Is extended alignment supported

What is the return value for an allocation of size 0

Different memory regions for internal data structures and allocated memory

5456

Donrsquot underestimate the global allocator

Competitive performance in the general case

Security features (ASLR secure erase of freed memory)

Debugging amp Profiling (Valgrind Windows Debug Runtime)

Cache Coloring

Local allocators are no free lunch

5556

Wrapping up

No one-size-fits-all mdash Each allocator has its Achilles heel

Global allocator is a good solution for the general case

But you can do better with special allocators for special use cases in terms ofperformance3 as well as reliability

C++ has good support for local allocators but the terminology is a bit off

Different libraries have different concepts of allocators

No free lunch You need to understand your use case before you can chose theright allocator

3John Lakos - Local (Arena) Allocators

5656

Thanks for your attention

Page 8: Taming Dynamic Memory...Pool Allocator - STL containers vector only if chunk sizes match vector size list and map are a perfect t, as the size of each node is known beforehand (though

556

General purpose allocator

deallocate(p2)

auto p6 = allocate(2)

556

General purpose allocator

deallocate(p2)

auto p6 = allocate(2)

556

General purpose allocator

deallocate(p2)

auto p6 = allocate(2)

556

General purpose allocator

deallocate(p2)

auto p6 = allocate(2)

656

Fragmentation

auto p7 = allocate(4)

Runtime Error

656

Fragmentation

auto p7 = allocate(4)

Runtime Error

756

Coalescing

deallocate(p4)

deallocate(p3)

756

Coalescing

deallocate(p4)

deallocate(p3)

756

Coalescing

deallocate(p4)

deallocate(p3)

756

Coalescing

deallocate(p4)

deallocate(p3)

756

Coalescing

deallocate(p4)

deallocate(p3)

856

Problems with default allocator

Complex runtime behavior

What is the maximum memory usageWhat is the worst-case execution time for an allocation or deallocation

Shared global state

Reasoning about allocator behavior requires global knowledge of the whole programThe singular resource global allocator is a potential bottleneck

Itrsquos not just about performance

956

From Global to Local

auto p1 = allocate (42)

deallocate(p1)

1056

From Global to Local

Allocator alloc

auto p1 = allocallocate (42)

allocdeallocate(p1)

1156

Problems with default allocator

Complex runtime behavior

What is the maximum memory usageWhat is the worst-case execution time for an allocation or deallocation

Shared global state X

1256

Monotonic Allocator

auto p1 = monotallocate(3)

auto p2 = monotallocate(2)

auto p3 = monotallocate(4)

1256

Monotonic Allocator

auto p1 = monotallocate(3)

auto p2 = monotallocate(2)

auto p3 = monotallocate(4)

1256

Monotonic Allocator

auto p1 = monotallocate(3)

auto p2 = monotallocate(2)

auto p3 = monotallocate(4)

1256

Monotonic Allocator

auto p1 = monotallocate(3)

auto p2 = monotallocate(2)

auto p3 = monotallocate(4)

1356

Monotonic Allocator

monotdeallocate(p2)

monotdeallocate(p1)

monotdeallocate(p3)

auto p4 = allocate(2)

1356

Monotonic Allocator

monotdeallocate(p2)

monotdeallocate(p1)

monotdeallocate(p3)

auto p4 = allocate(2)

1356

Monotonic Allocator

monotdeallocate(p2)

monotdeallocate(p1)

monotdeallocate(p3)

auto p4 = allocate(2)

1356

Monotonic Allocator

monotdeallocate(p2)

monotdeallocate(p1)

monotdeallocate(p3)

auto p4 = allocate(2)

1356

Monotonic Allocator

monotdeallocate(p2)

monotdeallocate(p1)

monotdeallocate(p3)

auto p4 = allocate(2)

1456

Monotonic Allocator - Reclamation

monotdeallocate(p4)

monotrelease()

1456

Monotonic Allocator - Reclamation

monotdeallocate(p4)

monotrelease()

1456

Monotonic Allocator - Reclamation

monotdeallocate(p4)

monotrelease()

1456

Monotonic Allocator - Reclamation

monotdeallocate(p4)

monotrelease()

1556

Monotonic Allocator

Deterministic runtime cost

Extremely efficient

No fragmentation

Easy to implement

Trivial to make thread-safe

But

Memory can only be reclaimed all at once

1656

Where is this actually useful

Frames in a video game

Handling of a single event in an event-driven system

Cyclic execution in a real-time system

Containers that are initialized but not changed after

static state - Objects that will never be destroyed

1756

Monotonic Allocator - stdvector

1756

Monotonic Allocator - stdvector

1756

Monotonic Allocator - stdvector

1856

Monotonic Allocator - STL containers

vector should reserve final size upfront

list and map work fine but deleted elements are not reclaimed individually

deque works really well

unordered map has same problem as vector when growing but the load-factortriggered rehash is less predictable

1956

unordered map

Exact layout depends on hash function and inserted values

1956

unordered map

Exact layout depends on hash function and inserted values

1956

unordered map

Exact layout depends on hash function and inserted values

1956

unordered map

Exact layout depends on hash function and inserted values

2056

Stack Allocator

monotdeallocate(p3)

2056

Stack Allocator

monotdeallocate(p3)

2056

Stack Allocator

monotdeallocate(p3)

2056

Stack Allocator

monotdeallocate(p3)

2156

Stack Allocator

Strict LIFO-ordering of allocations and deallocations

No way for the implementation to check whether the deallocation order is correct

2256

Stack Allocator

monotdeallocate(p3)

p3 == top Xtop =

2256

Stack Allocator

monotdeallocate(p3)

p3 == top X

top =

2256

Stack Allocator

monotdeallocate(p3)

p3 == top Xtop =

2256

Stack Allocator

monotdeallocate(p3)

p3 == top X

top =

2356

Stack Allocator

Strict LIFO-ordering of allocations and deallocations

No way for the implementation to check whether the deallocation order is correct

Free-pointer is reset to the same pointer passed to the deallocate call

Padding bytes may be lost to internal fragmentation

2356

Stack Allocator

Strict LIFO-ordering of allocations and deallocations

No way for the implementation to check whether the deallocation order is correct

Free-pointer is reset to the same pointer passed to the deallocate call

Padding bytes may be lost to internal fragmentation

2356

Stack Allocator

Strict LIFO-ordering of allocations and deallocations

No way for the implementation to check whether the deallocation order is correct

Free-pointer is reset to the same pointer passed to the deallocate call

Padding bytes may be lost to internal fragmentation

2456

Padding

2456

Padding

2556

Alignment

0xdeadbeef =

d e a d b e e f

1101 1110 1010 1101 1011 1110 1110 1111

2656

Alignment

0xdeadbeef =

d e a d b e e f

1101 1110 1010 1101 1011 1110 1110 1111

No alignment (1-byte aligned)

2756

Alignment

0xdeadbeef =

d e a d b e e c

1101 1110 1010 1101 1011 1110 1110 1100

4-byte aligned

2856

Alignment

0xdeadbeef =

d e a d b e e 8

1101 1110 1010 1101 1011 1110 1110 1000

8-byte aligned

2956

Alignment

Alignment refers to the least-significant bits of the object address being 0

Alignment requirements are always specified in powers of 2

Each built-in C++ type has a natural alignment requirement (typicallyalignof(T) == sizeof(T))

This is why structs sometimes insert padding bytes between members

3056

Alignment

Default allocator typically returns addresses aligned to alignof(max align t)which is big enough for all built-in types

Users may extend the alignment requirement for custom data types usingalignas

void allocate(std size_t bytes

std size_t alignment )

3156

Padding

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3356

Monotonic Allocator - Extensions

Auxiliary data structure required

Runtime cost of deallocation now linear in number of allocations (amortized O(1))

Auxiliary nodes have their own alignment requirements

Where to store the auxiliary nodes

3456

Monotonic Allocator - Extensions

3456

Monotonic Allocator - Extensions

3556

Internal or External

External headers have better cache behavior when iterating the list

External headers might have stricter alignment requirements than data

Internal headers have better cache behavior when adjacent data is hot

Internal headers require managed memory to be readable (think GPUs)

Where does the storage for external headers come from Same buffer Differentbuffer How big

rArr No easy answers

3656

The Bottom Line

Even seemingly simple extensions get complicated very quickly

Donrsquot try to increase generality through clever extensionsOnly consider modifications if itrsquos a perfect fit for your use case

3756

But what if I need to reclaim memory

minusrarr Pool Allocator

3856

Pool Allocator

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4156

Pool Allocator - Diffusion

4256

Pool Allocator

Deterministic runtime cost

No external fragmentation

Easy to make thread-safe

But

Cannot serve allocations bigger than chunk size

High waste through internal fragmentation if sizes of objects vary a lot

4356

Pool Allocator - STL containers

vector only if chunk sizes match vector size

list and map are a perfect fit as the size of each node is known beforehand(though this knowledge is implementation-specific)

Similar for deque

unordered map is again complicated

4456

But what if I do need different sizes

minusrarr Multipool Allocator

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4656

Multipool Allocator

Very powerful allocator

Runtime is still deterministic if number of pools is known beforehand

Maximum amount of waste through internal fragmentation can be controlledprecisely

Difficult to set up How many pools do I need What chunk sizes What poolsizes

Solid building block for a general purpose allocator

4756

Allocator support in C++

stdvectorltT AllocatorltTgtgt v

vpush back()

This is not the class allocating the memory

4756

Allocator support in C++

stdvectorltT AllocatorltTgtgt v

vpush back()

This is not the class allocating the memory

4856

Allocator support in C++

Historically C++ used Allocators to abstract over different models of addressingmemoryAs such Allocators in C++ are ldquostatelessrdquo

In C++ an Allocator is merely a handle to a memory resource1

1Arthur OrsquoDwyer - An Allocator is a Handle to the Heap

4856

Allocator support in C++

Historically C++ used Allocators to abstract over different models of addressingmemoryAs such Allocators in C++ are ldquostatelessrdquo

In C++ an Allocator is merely a handle to a memory resource1

1Arthur OrsquoDwyer - An Allocator is a Handle to the Heap

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

5056

C++ Memory Resources2

stdpmrmemory resource - Abstract base class for all resources that can bewrapped in a stdpmrpolymorphic allocator

stdpmrnew delete resource - Global allocator

stdpmrmonotonic buffer resource - Monotonic allocator

stdpmrunsynchronized pool resource synchronized pool resource

- Multipool

stdpmrnull memory resource - Allocation always fails

2Pablo Halpern - Allocators The Good Parts

5156

Chaining

explicit monotonic_buffer_resource(

stdpmr memory_resource upstream )

Each memory resource has an upstream counterpart

If the resource runs out of memory it tries to allocate more memory from upstream

5256

Chaining

Possible uses of Chaining

Fixed-size vs dynamic storage for allocators

Combination of different allocation strategies

Injection points for special purpose allocators for debugging and profiling

5356

Therersquos no universal interface for allocators

Are size and alignment parameters passed to deallocate

Is realloc supported

How are out-of-memory errors reported

Is extended alignment supported

What is the return value for an allocation of size 0

Different memory regions for internal data structures and allocated memory

5456

Donrsquot underestimate the global allocator

Competitive performance in the general case

Security features (ASLR secure erase of freed memory)

Debugging amp Profiling (Valgrind Windows Debug Runtime)

Cache Coloring

Local allocators are no free lunch

5556

Wrapping up

No one-size-fits-all mdash Each allocator has its Achilles heel

Global allocator is a good solution for the general case

But you can do better with special allocators for special use cases in terms ofperformance3 as well as reliability

C++ has good support for local allocators but the terminology is a bit off

Different libraries have different concepts of allocators

No free lunch You need to understand your use case before you can chose theright allocator

3John Lakos - Local (Arena) Allocators

5656

Thanks for your attention

Page 9: Taming Dynamic Memory...Pool Allocator - STL containers vector only if chunk sizes match vector size list and map are a perfect t, as the size of each node is known beforehand (though

556

General purpose allocator

deallocate(p2)

auto p6 = allocate(2)

556

General purpose allocator

deallocate(p2)

auto p6 = allocate(2)

556

General purpose allocator

deallocate(p2)

auto p6 = allocate(2)

656

Fragmentation

auto p7 = allocate(4)

Runtime Error

656

Fragmentation

auto p7 = allocate(4)

Runtime Error

756

Coalescing

deallocate(p4)

deallocate(p3)

756

Coalescing

deallocate(p4)

deallocate(p3)

756

Coalescing

deallocate(p4)

deallocate(p3)

756

Coalescing

deallocate(p4)

deallocate(p3)

756

Coalescing

deallocate(p4)

deallocate(p3)

856

Problems with default allocator

Complex runtime behavior

What is the maximum memory usageWhat is the worst-case execution time for an allocation or deallocation

Shared global state

Reasoning about allocator behavior requires global knowledge of the whole programThe singular resource global allocator is a potential bottleneck

Itrsquos not just about performance

956

From Global to Local

auto p1 = allocate (42)

deallocate(p1)

1056

From Global to Local

Allocator alloc

auto p1 = allocallocate (42)

allocdeallocate(p1)

1156

Problems with default allocator

Complex runtime behavior

What is the maximum memory usageWhat is the worst-case execution time for an allocation or deallocation

Shared global state X

1256

Monotonic Allocator

auto p1 = monotallocate(3)

auto p2 = monotallocate(2)

auto p3 = monotallocate(4)

1256

Monotonic Allocator

auto p1 = monotallocate(3)

auto p2 = monotallocate(2)

auto p3 = monotallocate(4)

1256

Monotonic Allocator

auto p1 = monotallocate(3)

auto p2 = monotallocate(2)

auto p3 = monotallocate(4)

1256

Monotonic Allocator

auto p1 = monotallocate(3)

auto p2 = monotallocate(2)

auto p3 = monotallocate(4)

1356

Monotonic Allocator

monotdeallocate(p2)

monotdeallocate(p1)

monotdeallocate(p3)

auto p4 = allocate(2)

1356

Monotonic Allocator

monotdeallocate(p2)

monotdeallocate(p1)

monotdeallocate(p3)

auto p4 = allocate(2)

1356

Monotonic Allocator

monotdeallocate(p2)

monotdeallocate(p1)

monotdeallocate(p3)

auto p4 = allocate(2)

1356

Monotonic Allocator

monotdeallocate(p2)

monotdeallocate(p1)

monotdeallocate(p3)

auto p4 = allocate(2)

1356

Monotonic Allocator

monotdeallocate(p2)

monotdeallocate(p1)

monotdeallocate(p3)

auto p4 = allocate(2)

1456

Monotonic Allocator - Reclamation

monotdeallocate(p4)

monotrelease()

1456

Monotonic Allocator - Reclamation

monotdeallocate(p4)

monotrelease()

1456

Monotonic Allocator - Reclamation

monotdeallocate(p4)

monotrelease()

1456

Monotonic Allocator - Reclamation

monotdeallocate(p4)

monotrelease()

1556

Monotonic Allocator

Deterministic runtime cost

Extremely efficient

No fragmentation

Easy to implement

Trivial to make thread-safe

But

Memory can only be reclaimed all at once

1656

Where is this actually useful

Frames in a video game

Handling of a single event in an event-driven system

Cyclic execution in a real-time system

Containers that are initialized but not changed after

static state - Objects that will never be destroyed

1756

Monotonic Allocator - stdvector

1756

Monotonic Allocator - stdvector

1756

Monotonic Allocator - stdvector

1856

Monotonic Allocator - STL containers

vector should reserve final size upfront

list and map work fine but deleted elements are not reclaimed individually

deque works really well

unordered map has same problem as vector when growing but the load-factortriggered rehash is less predictable

1956

unordered map

Exact layout depends on hash function and inserted values

1956

unordered map

Exact layout depends on hash function and inserted values

1956

unordered map

Exact layout depends on hash function and inserted values

1956

unordered map

Exact layout depends on hash function and inserted values

2056

Stack Allocator

monotdeallocate(p3)

2056

Stack Allocator

monotdeallocate(p3)

2056

Stack Allocator

monotdeallocate(p3)

2056

Stack Allocator

monotdeallocate(p3)

2156

Stack Allocator

Strict LIFO-ordering of allocations and deallocations

No way for the implementation to check whether the deallocation order is correct

2256

Stack Allocator

monotdeallocate(p3)

p3 == top Xtop =

2256

Stack Allocator

monotdeallocate(p3)

p3 == top X

top =

2256

Stack Allocator

monotdeallocate(p3)

p3 == top Xtop =

2256

Stack Allocator

monotdeallocate(p3)

p3 == top X

top =

2356

Stack Allocator

Strict LIFO-ordering of allocations and deallocations

No way for the implementation to check whether the deallocation order is correct

Free-pointer is reset to the same pointer passed to the deallocate call

Padding bytes may be lost to internal fragmentation

2356

Stack Allocator

Strict LIFO-ordering of allocations and deallocations

No way for the implementation to check whether the deallocation order is correct

Free-pointer is reset to the same pointer passed to the deallocate call

Padding bytes may be lost to internal fragmentation

2356

Stack Allocator

Strict LIFO-ordering of allocations and deallocations

No way for the implementation to check whether the deallocation order is correct

Free-pointer is reset to the same pointer passed to the deallocate call

Padding bytes may be lost to internal fragmentation

2456

Padding

2456

Padding

2556

Alignment

0xdeadbeef =

d e a d b e e f

1101 1110 1010 1101 1011 1110 1110 1111

2656

Alignment

0xdeadbeef =

d e a d b e e f

1101 1110 1010 1101 1011 1110 1110 1111

No alignment (1-byte aligned)

2756

Alignment

0xdeadbeef =

d e a d b e e c

1101 1110 1010 1101 1011 1110 1110 1100

4-byte aligned

2856

Alignment

0xdeadbeef =

d e a d b e e 8

1101 1110 1010 1101 1011 1110 1110 1000

8-byte aligned

2956

Alignment

Alignment refers to the least-significant bits of the object address being 0

Alignment requirements are always specified in powers of 2

Each built-in C++ type has a natural alignment requirement (typicallyalignof(T) == sizeof(T))

This is why structs sometimes insert padding bytes between members

3056

Alignment

Default allocator typically returns addresses aligned to alignof(max align t)which is big enough for all built-in types

Users may extend the alignment requirement for custom data types usingalignas

void allocate(std size_t bytes

std size_t alignment )

3156

Padding

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3356

Monotonic Allocator - Extensions

Auxiliary data structure required

Runtime cost of deallocation now linear in number of allocations (amortized O(1))

Auxiliary nodes have their own alignment requirements

Where to store the auxiliary nodes

3456

Monotonic Allocator - Extensions

3456

Monotonic Allocator - Extensions

3556

Internal or External

External headers have better cache behavior when iterating the list

External headers might have stricter alignment requirements than data

Internal headers have better cache behavior when adjacent data is hot

Internal headers require managed memory to be readable (think GPUs)

Where does the storage for external headers come from Same buffer Differentbuffer How big

rArr No easy answers

3656

The Bottom Line

Even seemingly simple extensions get complicated very quickly

Donrsquot try to increase generality through clever extensionsOnly consider modifications if itrsquos a perfect fit for your use case

3756

But what if I need to reclaim memory

minusrarr Pool Allocator

3856

Pool Allocator

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4156

Pool Allocator - Diffusion

4256

Pool Allocator

Deterministic runtime cost

No external fragmentation

Easy to make thread-safe

But

Cannot serve allocations bigger than chunk size

High waste through internal fragmentation if sizes of objects vary a lot

4356

Pool Allocator - STL containers

vector only if chunk sizes match vector size

list and map are a perfect fit as the size of each node is known beforehand(though this knowledge is implementation-specific)

Similar for deque

unordered map is again complicated

4456

But what if I do need different sizes

minusrarr Multipool Allocator

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4656

Multipool Allocator

Very powerful allocator

Runtime is still deterministic if number of pools is known beforehand

Maximum amount of waste through internal fragmentation can be controlledprecisely

Difficult to set up How many pools do I need What chunk sizes What poolsizes

Solid building block for a general purpose allocator

4756

Allocator support in C++

stdvectorltT AllocatorltTgtgt v

vpush back()

This is not the class allocating the memory

4756

Allocator support in C++

stdvectorltT AllocatorltTgtgt v

vpush back()

This is not the class allocating the memory

4856

Allocator support in C++

Historically C++ used Allocators to abstract over different models of addressingmemoryAs such Allocators in C++ are ldquostatelessrdquo

In C++ an Allocator is merely a handle to a memory resource1

1Arthur OrsquoDwyer - An Allocator is a Handle to the Heap

4856

Allocator support in C++

Historically C++ used Allocators to abstract over different models of addressingmemoryAs such Allocators in C++ are ldquostatelessrdquo

In C++ an Allocator is merely a handle to a memory resource1

1Arthur OrsquoDwyer - An Allocator is a Handle to the Heap

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

5056

C++ Memory Resources2

stdpmrmemory resource - Abstract base class for all resources that can bewrapped in a stdpmrpolymorphic allocator

stdpmrnew delete resource - Global allocator

stdpmrmonotonic buffer resource - Monotonic allocator

stdpmrunsynchronized pool resource synchronized pool resource

- Multipool

stdpmrnull memory resource - Allocation always fails

2Pablo Halpern - Allocators The Good Parts

5156

Chaining

explicit monotonic_buffer_resource(

stdpmr memory_resource upstream )

Each memory resource has an upstream counterpart

If the resource runs out of memory it tries to allocate more memory from upstream

5256

Chaining

Possible uses of Chaining

Fixed-size vs dynamic storage for allocators

Combination of different allocation strategies

Injection points for special purpose allocators for debugging and profiling

5356

Therersquos no universal interface for allocators

Are size and alignment parameters passed to deallocate

Is realloc supported

How are out-of-memory errors reported

Is extended alignment supported

What is the return value for an allocation of size 0

Different memory regions for internal data structures and allocated memory

5456

Donrsquot underestimate the global allocator

Competitive performance in the general case

Security features (ASLR secure erase of freed memory)

Debugging amp Profiling (Valgrind Windows Debug Runtime)

Cache Coloring

Local allocators are no free lunch

5556

Wrapping up

No one-size-fits-all mdash Each allocator has its Achilles heel

Global allocator is a good solution for the general case

But you can do better with special allocators for special use cases in terms ofperformance3 as well as reliability

C++ has good support for local allocators but the terminology is a bit off

Different libraries have different concepts of allocators

No free lunch You need to understand your use case before you can chose theright allocator

3John Lakos - Local (Arena) Allocators

5656

Thanks for your attention

Page 10: Taming Dynamic Memory...Pool Allocator - STL containers vector only if chunk sizes match vector size list and map are a perfect t, as the size of each node is known beforehand (though

556

General purpose allocator

deallocate(p2)

auto p6 = allocate(2)

556

General purpose allocator

deallocate(p2)

auto p6 = allocate(2)

656

Fragmentation

auto p7 = allocate(4)

Runtime Error

656

Fragmentation

auto p7 = allocate(4)

Runtime Error

756

Coalescing

deallocate(p4)

deallocate(p3)

756

Coalescing

deallocate(p4)

deallocate(p3)

756

Coalescing

deallocate(p4)

deallocate(p3)

756

Coalescing

deallocate(p4)

deallocate(p3)

756

Coalescing

deallocate(p4)

deallocate(p3)

856

Problems with default allocator

Complex runtime behavior

What is the maximum memory usageWhat is the worst-case execution time for an allocation or deallocation

Shared global state

Reasoning about allocator behavior requires global knowledge of the whole programThe singular resource global allocator is a potential bottleneck

Itrsquos not just about performance

956

From Global to Local

auto p1 = allocate (42)

deallocate(p1)

1056

From Global to Local

Allocator alloc

auto p1 = allocallocate (42)

allocdeallocate(p1)

1156

Problems with default allocator

Complex runtime behavior

What is the maximum memory usageWhat is the worst-case execution time for an allocation or deallocation

Shared global state X

1256

Monotonic Allocator

auto p1 = monotallocate(3)

auto p2 = monotallocate(2)

auto p3 = monotallocate(4)

1256

Monotonic Allocator

auto p1 = monotallocate(3)

auto p2 = monotallocate(2)

auto p3 = monotallocate(4)

1256

Monotonic Allocator

auto p1 = monotallocate(3)

auto p2 = monotallocate(2)

auto p3 = monotallocate(4)

1256

Monotonic Allocator

auto p1 = monotallocate(3)

auto p2 = monotallocate(2)

auto p3 = monotallocate(4)

1356

Monotonic Allocator

monotdeallocate(p2)

monotdeallocate(p1)

monotdeallocate(p3)

auto p4 = allocate(2)

1356

Monotonic Allocator

monotdeallocate(p2)

monotdeallocate(p1)

monotdeallocate(p3)

auto p4 = allocate(2)

1356

Monotonic Allocator

monotdeallocate(p2)

monotdeallocate(p1)

monotdeallocate(p3)

auto p4 = allocate(2)

1356

Monotonic Allocator

monotdeallocate(p2)

monotdeallocate(p1)

monotdeallocate(p3)

auto p4 = allocate(2)

1356

Monotonic Allocator

monotdeallocate(p2)

monotdeallocate(p1)

monotdeallocate(p3)

auto p4 = allocate(2)

1456

Monotonic Allocator - Reclamation

monotdeallocate(p4)

monotrelease()

1456

Monotonic Allocator - Reclamation

monotdeallocate(p4)

monotrelease()

1456

Monotonic Allocator - Reclamation

monotdeallocate(p4)

monotrelease()

1456

Monotonic Allocator - Reclamation

monotdeallocate(p4)

monotrelease()

1556

Monotonic Allocator

Deterministic runtime cost

Extremely efficient

No fragmentation

Easy to implement

Trivial to make thread-safe

But

Memory can only be reclaimed all at once

1656

Where is this actually useful

Frames in a video game

Handling of a single event in an event-driven system

Cyclic execution in a real-time system

Containers that are initialized but not changed after

static state - Objects that will never be destroyed

1756

Monotonic Allocator - stdvector

1756

Monotonic Allocator - stdvector

1756

Monotonic Allocator - stdvector

1856

Monotonic Allocator - STL containers

vector should reserve final size upfront

list and map work fine but deleted elements are not reclaimed individually

deque works really well

unordered map has same problem as vector when growing but the load-factortriggered rehash is less predictable

1956

unordered map

Exact layout depends on hash function and inserted values

1956

unordered map

Exact layout depends on hash function and inserted values

1956

unordered map

Exact layout depends on hash function and inserted values

1956

unordered map

Exact layout depends on hash function and inserted values

2056

Stack Allocator

monotdeallocate(p3)

2056

Stack Allocator

monotdeallocate(p3)

2056

Stack Allocator

monotdeallocate(p3)

2056

Stack Allocator

monotdeallocate(p3)

2156

Stack Allocator

Strict LIFO-ordering of allocations and deallocations

No way for the implementation to check whether the deallocation order is correct

2256

Stack Allocator

monotdeallocate(p3)

p3 == top Xtop =

2256

Stack Allocator

monotdeallocate(p3)

p3 == top X

top =

2256

Stack Allocator

monotdeallocate(p3)

p3 == top Xtop =

2256

Stack Allocator

monotdeallocate(p3)

p3 == top X

top =

2356

Stack Allocator

Strict LIFO-ordering of allocations and deallocations

No way for the implementation to check whether the deallocation order is correct

Free-pointer is reset to the same pointer passed to the deallocate call

Padding bytes may be lost to internal fragmentation

2356

Stack Allocator

Strict LIFO-ordering of allocations and deallocations

No way for the implementation to check whether the deallocation order is correct

Free-pointer is reset to the same pointer passed to the deallocate call

Padding bytes may be lost to internal fragmentation

2356

Stack Allocator

Strict LIFO-ordering of allocations and deallocations

No way for the implementation to check whether the deallocation order is correct

Free-pointer is reset to the same pointer passed to the deallocate call

Padding bytes may be lost to internal fragmentation

2456

Padding

2456

Padding

2556

Alignment

0xdeadbeef =

d e a d b e e f

1101 1110 1010 1101 1011 1110 1110 1111

2656

Alignment

0xdeadbeef =

d e a d b e e f

1101 1110 1010 1101 1011 1110 1110 1111

No alignment (1-byte aligned)

2756

Alignment

0xdeadbeef =

d e a d b e e c

1101 1110 1010 1101 1011 1110 1110 1100

4-byte aligned

2856

Alignment

0xdeadbeef =

d e a d b e e 8

1101 1110 1010 1101 1011 1110 1110 1000

8-byte aligned

2956

Alignment

Alignment refers to the least-significant bits of the object address being 0

Alignment requirements are always specified in powers of 2

Each built-in C++ type has a natural alignment requirement (typicallyalignof(T) == sizeof(T))

This is why structs sometimes insert padding bytes between members

3056

Alignment

Default allocator typically returns addresses aligned to alignof(max align t)which is big enough for all built-in types

Users may extend the alignment requirement for custom data types usingalignas

void allocate(std size_t bytes

std size_t alignment )

3156

Padding

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3356

Monotonic Allocator - Extensions

Auxiliary data structure required

Runtime cost of deallocation now linear in number of allocations (amortized O(1))

Auxiliary nodes have their own alignment requirements

Where to store the auxiliary nodes

3456

Monotonic Allocator - Extensions

3456

Monotonic Allocator - Extensions

3556

Internal or External

External headers have better cache behavior when iterating the list

External headers might have stricter alignment requirements than data

Internal headers have better cache behavior when adjacent data is hot

Internal headers require managed memory to be readable (think GPUs)

Where does the storage for external headers come from Same buffer Differentbuffer How big

rArr No easy answers

3656

The Bottom Line

Even seemingly simple extensions get complicated very quickly

Donrsquot try to increase generality through clever extensionsOnly consider modifications if itrsquos a perfect fit for your use case

3756

But what if I need to reclaim memory

minusrarr Pool Allocator

3856

Pool Allocator

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4156

Pool Allocator - Diffusion

4256

Pool Allocator

Deterministic runtime cost

No external fragmentation

Easy to make thread-safe

But

Cannot serve allocations bigger than chunk size

High waste through internal fragmentation if sizes of objects vary a lot

4356

Pool Allocator - STL containers

vector only if chunk sizes match vector size

list and map are a perfect fit as the size of each node is known beforehand(though this knowledge is implementation-specific)

Similar for deque

unordered map is again complicated

4456

But what if I do need different sizes

minusrarr Multipool Allocator

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4656

Multipool Allocator

Very powerful allocator

Runtime is still deterministic if number of pools is known beforehand

Maximum amount of waste through internal fragmentation can be controlledprecisely

Difficult to set up How many pools do I need What chunk sizes What poolsizes

Solid building block for a general purpose allocator

4756

Allocator support in C++

stdvectorltT AllocatorltTgtgt v

vpush back()

This is not the class allocating the memory

4756

Allocator support in C++

stdvectorltT AllocatorltTgtgt v

vpush back()

This is not the class allocating the memory

4856

Allocator support in C++

Historically C++ used Allocators to abstract over different models of addressingmemoryAs such Allocators in C++ are ldquostatelessrdquo

In C++ an Allocator is merely a handle to a memory resource1

1Arthur OrsquoDwyer - An Allocator is a Handle to the Heap

4856

Allocator support in C++

Historically C++ used Allocators to abstract over different models of addressingmemoryAs such Allocators in C++ are ldquostatelessrdquo

In C++ an Allocator is merely a handle to a memory resource1

1Arthur OrsquoDwyer - An Allocator is a Handle to the Heap

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

5056

C++ Memory Resources2

stdpmrmemory resource - Abstract base class for all resources that can bewrapped in a stdpmrpolymorphic allocator

stdpmrnew delete resource - Global allocator

stdpmrmonotonic buffer resource - Monotonic allocator

stdpmrunsynchronized pool resource synchronized pool resource

- Multipool

stdpmrnull memory resource - Allocation always fails

2Pablo Halpern - Allocators The Good Parts

5156

Chaining

explicit monotonic_buffer_resource(

stdpmr memory_resource upstream )

Each memory resource has an upstream counterpart

If the resource runs out of memory it tries to allocate more memory from upstream

5256

Chaining

Possible uses of Chaining

Fixed-size vs dynamic storage for allocators

Combination of different allocation strategies

Injection points for special purpose allocators for debugging and profiling

5356

Therersquos no universal interface for allocators

Are size and alignment parameters passed to deallocate

Is realloc supported

How are out-of-memory errors reported

Is extended alignment supported

What is the return value for an allocation of size 0

Different memory regions for internal data structures and allocated memory

5456

Donrsquot underestimate the global allocator

Competitive performance in the general case

Security features (ASLR secure erase of freed memory)

Debugging amp Profiling (Valgrind Windows Debug Runtime)

Cache Coloring

Local allocators are no free lunch

5556

Wrapping up

No one-size-fits-all mdash Each allocator has its Achilles heel

Global allocator is a good solution for the general case

But you can do better with special allocators for special use cases in terms ofperformance3 as well as reliability

C++ has good support for local allocators but the terminology is a bit off

Different libraries have different concepts of allocators

No free lunch You need to understand your use case before you can chose theright allocator

3John Lakos - Local (Arena) Allocators

5656

Thanks for your attention

Page 11: Taming Dynamic Memory...Pool Allocator - STL containers vector only if chunk sizes match vector size list and map are a perfect t, as the size of each node is known beforehand (though

556

General purpose allocator

deallocate(p2)

auto p6 = allocate(2)

656

Fragmentation

auto p7 = allocate(4)

Runtime Error

656

Fragmentation

auto p7 = allocate(4)

Runtime Error

756

Coalescing

deallocate(p4)

deallocate(p3)

756

Coalescing

deallocate(p4)

deallocate(p3)

756

Coalescing

deallocate(p4)

deallocate(p3)

756

Coalescing

deallocate(p4)

deallocate(p3)

756

Coalescing

deallocate(p4)

deallocate(p3)

856

Problems with default allocator

Complex runtime behavior

What is the maximum memory usageWhat is the worst-case execution time for an allocation or deallocation

Shared global state

Reasoning about allocator behavior requires global knowledge of the whole programThe singular resource global allocator is a potential bottleneck

Itrsquos not just about performance

956

From Global to Local

auto p1 = allocate (42)

deallocate(p1)

1056

From Global to Local

Allocator alloc

auto p1 = allocallocate (42)

allocdeallocate(p1)

1156

Problems with default allocator

Complex runtime behavior

What is the maximum memory usageWhat is the worst-case execution time for an allocation or deallocation

Shared global state X

1256

Monotonic Allocator

auto p1 = monotallocate(3)

auto p2 = monotallocate(2)

auto p3 = monotallocate(4)

1256

Monotonic Allocator

auto p1 = monotallocate(3)

auto p2 = monotallocate(2)

auto p3 = monotallocate(4)

1256

Monotonic Allocator

auto p1 = monotallocate(3)

auto p2 = monotallocate(2)

auto p3 = monotallocate(4)

1256

Monotonic Allocator

auto p1 = monotallocate(3)

auto p2 = monotallocate(2)

auto p3 = monotallocate(4)

1356

Monotonic Allocator

monotdeallocate(p2)

monotdeallocate(p1)

monotdeallocate(p3)

auto p4 = allocate(2)

1356

Monotonic Allocator

monotdeallocate(p2)

monotdeallocate(p1)

monotdeallocate(p3)

auto p4 = allocate(2)

1356

Monotonic Allocator

monotdeallocate(p2)

monotdeallocate(p1)

monotdeallocate(p3)

auto p4 = allocate(2)

1356

Monotonic Allocator

monotdeallocate(p2)

monotdeallocate(p1)

monotdeallocate(p3)

auto p4 = allocate(2)

1356

Monotonic Allocator

monotdeallocate(p2)

monotdeallocate(p1)

monotdeallocate(p3)

auto p4 = allocate(2)

1456

Monotonic Allocator - Reclamation

monotdeallocate(p4)

monotrelease()

1456

Monotonic Allocator - Reclamation

monotdeallocate(p4)

monotrelease()

1456

Monotonic Allocator - Reclamation

monotdeallocate(p4)

monotrelease()

1456

Monotonic Allocator - Reclamation

monotdeallocate(p4)

monotrelease()

1556

Monotonic Allocator

Deterministic runtime cost

Extremely efficient

No fragmentation

Easy to implement

Trivial to make thread-safe

But

Memory can only be reclaimed all at once

1656

Where is this actually useful

Frames in a video game

Handling of a single event in an event-driven system

Cyclic execution in a real-time system

Containers that are initialized but not changed after

static state - Objects that will never be destroyed

1756

Monotonic Allocator - stdvector

1756

Monotonic Allocator - stdvector

1756

Monotonic Allocator - stdvector

1856

Monotonic Allocator - STL containers

vector should reserve final size upfront

list and map work fine but deleted elements are not reclaimed individually

deque works really well

unordered map has same problem as vector when growing but the load-factortriggered rehash is less predictable

1956

unordered map

Exact layout depends on hash function and inserted values

1956

unordered map

Exact layout depends on hash function and inserted values

1956

unordered map

Exact layout depends on hash function and inserted values

1956

unordered map

Exact layout depends on hash function and inserted values

2056

Stack Allocator

monotdeallocate(p3)

2056

Stack Allocator

monotdeallocate(p3)

2056

Stack Allocator

monotdeallocate(p3)

2056

Stack Allocator

monotdeallocate(p3)

2156

Stack Allocator

Strict LIFO-ordering of allocations and deallocations

No way for the implementation to check whether the deallocation order is correct

2256

Stack Allocator

monotdeallocate(p3)

p3 == top Xtop =

2256

Stack Allocator

monotdeallocate(p3)

p3 == top X

top =

2256

Stack Allocator

monotdeallocate(p3)

p3 == top Xtop =

2256

Stack Allocator

monotdeallocate(p3)

p3 == top X

top =

2356

Stack Allocator

Strict LIFO-ordering of allocations and deallocations

No way for the implementation to check whether the deallocation order is correct

Free-pointer is reset to the same pointer passed to the deallocate call

Padding bytes may be lost to internal fragmentation

2356

Stack Allocator

Strict LIFO-ordering of allocations and deallocations

No way for the implementation to check whether the deallocation order is correct

Free-pointer is reset to the same pointer passed to the deallocate call

Padding bytes may be lost to internal fragmentation

2356

Stack Allocator

Strict LIFO-ordering of allocations and deallocations

No way for the implementation to check whether the deallocation order is correct

Free-pointer is reset to the same pointer passed to the deallocate call

Padding bytes may be lost to internal fragmentation

2456

Padding

2456

Padding

2556

Alignment

0xdeadbeef =

d e a d b e e f

1101 1110 1010 1101 1011 1110 1110 1111

2656

Alignment

0xdeadbeef =

d e a d b e e f

1101 1110 1010 1101 1011 1110 1110 1111

No alignment (1-byte aligned)

2756

Alignment

0xdeadbeef =

d e a d b e e c

1101 1110 1010 1101 1011 1110 1110 1100

4-byte aligned

2856

Alignment

0xdeadbeef =

d e a d b e e 8

1101 1110 1010 1101 1011 1110 1110 1000

8-byte aligned

2956

Alignment

Alignment refers to the least-significant bits of the object address being 0

Alignment requirements are always specified in powers of 2

Each built-in C++ type has a natural alignment requirement (typicallyalignof(T) == sizeof(T))

This is why structs sometimes insert padding bytes between members

3056

Alignment

Default allocator typically returns addresses aligned to alignof(max align t)which is big enough for all built-in types

Users may extend the alignment requirement for custom data types usingalignas

void allocate(std size_t bytes

std size_t alignment )

3156

Padding

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3356

Monotonic Allocator - Extensions

Auxiliary data structure required

Runtime cost of deallocation now linear in number of allocations (amortized O(1))

Auxiliary nodes have their own alignment requirements

Where to store the auxiliary nodes

3456

Monotonic Allocator - Extensions

3456

Monotonic Allocator - Extensions

3556

Internal or External

External headers have better cache behavior when iterating the list

External headers might have stricter alignment requirements than data

Internal headers have better cache behavior when adjacent data is hot

Internal headers require managed memory to be readable (think GPUs)

Where does the storage for external headers come from Same buffer Differentbuffer How big

rArr No easy answers

3656

The Bottom Line

Even seemingly simple extensions get complicated very quickly

Donrsquot try to increase generality through clever extensionsOnly consider modifications if itrsquos a perfect fit for your use case

3756

But what if I need to reclaim memory

minusrarr Pool Allocator

3856

Pool Allocator

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4156

Pool Allocator - Diffusion

4256

Pool Allocator

Deterministic runtime cost

No external fragmentation

Easy to make thread-safe

But

Cannot serve allocations bigger than chunk size

High waste through internal fragmentation if sizes of objects vary a lot

4356

Pool Allocator - STL containers

vector only if chunk sizes match vector size

list and map are a perfect fit as the size of each node is known beforehand(though this knowledge is implementation-specific)

Similar for deque

unordered map is again complicated

4456

But what if I do need different sizes

minusrarr Multipool Allocator

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4656

Multipool Allocator

Very powerful allocator

Runtime is still deterministic if number of pools is known beforehand

Maximum amount of waste through internal fragmentation can be controlledprecisely

Difficult to set up How many pools do I need What chunk sizes What poolsizes

Solid building block for a general purpose allocator

4756

Allocator support in C++

stdvectorltT AllocatorltTgtgt v

vpush back()

This is not the class allocating the memory

4756

Allocator support in C++

stdvectorltT AllocatorltTgtgt v

vpush back()

This is not the class allocating the memory

4856

Allocator support in C++

Historically C++ used Allocators to abstract over different models of addressingmemoryAs such Allocators in C++ are ldquostatelessrdquo

In C++ an Allocator is merely a handle to a memory resource1

1Arthur OrsquoDwyer - An Allocator is a Handle to the Heap

4856

Allocator support in C++

Historically C++ used Allocators to abstract over different models of addressingmemoryAs such Allocators in C++ are ldquostatelessrdquo

In C++ an Allocator is merely a handle to a memory resource1

1Arthur OrsquoDwyer - An Allocator is a Handle to the Heap

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

5056

C++ Memory Resources2

stdpmrmemory resource - Abstract base class for all resources that can bewrapped in a stdpmrpolymorphic allocator

stdpmrnew delete resource - Global allocator

stdpmrmonotonic buffer resource - Monotonic allocator

stdpmrunsynchronized pool resource synchronized pool resource

- Multipool

stdpmrnull memory resource - Allocation always fails

2Pablo Halpern - Allocators The Good Parts

5156

Chaining

explicit monotonic_buffer_resource(

stdpmr memory_resource upstream )

Each memory resource has an upstream counterpart

If the resource runs out of memory it tries to allocate more memory from upstream

5256

Chaining

Possible uses of Chaining

Fixed-size vs dynamic storage for allocators

Combination of different allocation strategies

Injection points for special purpose allocators for debugging and profiling

5356

Therersquos no universal interface for allocators

Are size and alignment parameters passed to deallocate

Is realloc supported

How are out-of-memory errors reported

Is extended alignment supported

What is the return value for an allocation of size 0

Different memory regions for internal data structures and allocated memory

5456

Donrsquot underestimate the global allocator

Competitive performance in the general case

Security features (ASLR secure erase of freed memory)

Debugging amp Profiling (Valgrind Windows Debug Runtime)

Cache Coloring

Local allocators are no free lunch

5556

Wrapping up

No one-size-fits-all mdash Each allocator has its Achilles heel

Global allocator is a good solution for the general case

But you can do better with special allocators for special use cases in terms ofperformance3 as well as reliability

C++ has good support for local allocators but the terminology is a bit off

Different libraries have different concepts of allocators

No free lunch You need to understand your use case before you can chose theright allocator

3John Lakos - Local (Arena) Allocators

5656

Thanks for your attention

Page 12: Taming Dynamic Memory...Pool Allocator - STL containers vector only if chunk sizes match vector size list and map are a perfect t, as the size of each node is known beforehand (though

656

Fragmentation

auto p7 = allocate(4)

Runtime Error

656

Fragmentation

auto p7 = allocate(4)

Runtime Error

756

Coalescing

deallocate(p4)

deallocate(p3)

756

Coalescing

deallocate(p4)

deallocate(p3)

756

Coalescing

deallocate(p4)

deallocate(p3)

756

Coalescing

deallocate(p4)

deallocate(p3)

756

Coalescing

deallocate(p4)

deallocate(p3)

856

Problems with default allocator

Complex runtime behavior

What is the maximum memory usageWhat is the worst-case execution time for an allocation or deallocation

Shared global state

Reasoning about allocator behavior requires global knowledge of the whole programThe singular resource global allocator is a potential bottleneck

Itrsquos not just about performance

956

From Global to Local

auto p1 = allocate (42)

deallocate(p1)

1056

From Global to Local

Allocator alloc

auto p1 = allocallocate (42)

allocdeallocate(p1)

1156

Problems with default allocator

Complex runtime behavior

What is the maximum memory usageWhat is the worst-case execution time for an allocation or deallocation

Shared global state X

1256

Monotonic Allocator

auto p1 = monotallocate(3)

auto p2 = monotallocate(2)

auto p3 = monotallocate(4)

1256

Monotonic Allocator

auto p1 = monotallocate(3)

auto p2 = monotallocate(2)

auto p3 = monotallocate(4)

1256

Monotonic Allocator

auto p1 = monotallocate(3)

auto p2 = monotallocate(2)

auto p3 = monotallocate(4)

1256

Monotonic Allocator

auto p1 = monotallocate(3)

auto p2 = monotallocate(2)

auto p3 = monotallocate(4)

1356

Monotonic Allocator

monotdeallocate(p2)

monotdeallocate(p1)

monotdeallocate(p3)

auto p4 = allocate(2)

1356

Monotonic Allocator

monotdeallocate(p2)

monotdeallocate(p1)

monotdeallocate(p3)

auto p4 = allocate(2)

1356

Monotonic Allocator

monotdeallocate(p2)

monotdeallocate(p1)

monotdeallocate(p3)

auto p4 = allocate(2)

1356

Monotonic Allocator

monotdeallocate(p2)

monotdeallocate(p1)

monotdeallocate(p3)

auto p4 = allocate(2)

1356

Monotonic Allocator

monotdeallocate(p2)

monotdeallocate(p1)

monotdeallocate(p3)

auto p4 = allocate(2)

1456

Monotonic Allocator - Reclamation

monotdeallocate(p4)

monotrelease()

1456

Monotonic Allocator - Reclamation

monotdeallocate(p4)

monotrelease()

1456

Monotonic Allocator - Reclamation

monotdeallocate(p4)

monotrelease()

1456

Monotonic Allocator - Reclamation

monotdeallocate(p4)

monotrelease()

1556

Monotonic Allocator

Deterministic runtime cost

Extremely efficient

No fragmentation

Easy to implement

Trivial to make thread-safe

But

Memory can only be reclaimed all at once

1656

Where is this actually useful

Frames in a video game

Handling of a single event in an event-driven system

Cyclic execution in a real-time system

Containers that are initialized but not changed after

static state - Objects that will never be destroyed

1756

Monotonic Allocator - stdvector

1756

Monotonic Allocator - stdvector

1756

Monotonic Allocator - stdvector

1856

Monotonic Allocator - STL containers

vector should reserve final size upfront

list and map work fine but deleted elements are not reclaimed individually

deque works really well

unordered map has same problem as vector when growing but the load-factortriggered rehash is less predictable

1956

unordered map

Exact layout depends on hash function and inserted values

1956

unordered map

Exact layout depends on hash function and inserted values

1956

unordered map

Exact layout depends on hash function and inserted values

1956

unordered map

Exact layout depends on hash function and inserted values

2056

Stack Allocator

monotdeallocate(p3)

2056

Stack Allocator

monotdeallocate(p3)

2056

Stack Allocator

monotdeallocate(p3)

2056

Stack Allocator

monotdeallocate(p3)

2156

Stack Allocator

Strict LIFO-ordering of allocations and deallocations

No way for the implementation to check whether the deallocation order is correct

2256

Stack Allocator

monotdeallocate(p3)

p3 == top Xtop =

2256

Stack Allocator

monotdeallocate(p3)

p3 == top X

top =

2256

Stack Allocator

monotdeallocate(p3)

p3 == top Xtop =

2256

Stack Allocator

monotdeallocate(p3)

p3 == top X

top =

2356

Stack Allocator

Strict LIFO-ordering of allocations and deallocations

No way for the implementation to check whether the deallocation order is correct

Free-pointer is reset to the same pointer passed to the deallocate call

Padding bytes may be lost to internal fragmentation

2356

Stack Allocator

Strict LIFO-ordering of allocations and deallocations

No way for the implementation to check whether the deallocation order is correct

Free-pointer is reset to the same pointer passed to the deallocate call

Padding bytes may be lost to internal fragmentation

2356

Stack Allocator

Strict LIFO-ordering of allocations and deallocations

No way for the implementation to check whether the deallocation order is correct

Free-pointer is reset to the same pointer passed to the deallocate call

Padding bytes may be lost to internal fragmentation

2456

Padding

2456

Padding

2556

Alignment

0xdeadbeef =

d e a d b e e f

1101 1110 1010 1101 1011 1110 1110 1111

2656

Alignment

0xdeadbeef =

d e a d b e e f

1101 1110 1010 1101 1011 1110 1110 1111

No alignment (1-byte aligned)

2756

Alignment

0xdeadbeef =

d e a d b e e c

1101 1110 1010 1101 1011 1110 1110 1100

4-byte aligned

2856

Alignment

0xdeadbeef =

d e a d b e e 8

1101 1110 1010 1101 1011 1110 1110 1000

8-byte aligned

2956

Alignment

Alignment refers to the least-significant bits of the object address being 0

Alignment requirements are always specified in powers of 2

Each built-in C++ type has a natural alignment requirement (typicallyalignof(T) == sizeof(T))

This is why structs sometimes insert padding bytes between members

3056

Alignment

Default allocator typically returns addresses aligned to alignof(max align t)which is big enough for all built-in types

Users may extend the alignment requirement for custom data types usingalignas

void allocate(std size_t bytes

std size_t alignment )

3156

Padding

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3356

Monotonic Allocator - Extensions

Auxiliary data structure required

Runtime cost of deallocation now linear in number of allocations (amortized O(1))

Auxiliary nodes have their own alignment requirements

Where to store the auxiliary nodes

3456

Monotonic Allocator - Extensions

3456

Monotonic Allocator - Extensions

3556

Internal or External

External headers have better cache behavior when iterating the list

External headers might have stricter alignment requirements than data

Internal headers have better cache behavior when adjacent data is hot

Internal headers require managed memory to be readable (think GPUs)

Where does the storage for external headers come from Same buffer Differentbuffer How big

rArr No easy answers

3656

The Bottom Line

Even seemingly simple extensions get complicated very quickly

Donrsquot try to increase generality through clever extensionsOnly consider modifications if itrsquos a perfect fit for your use case

3756

But what if I need to reclaim memory

minusrarr Pool Allocator

3856

Pool Allocator

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4156

Pool Allocator - Diffusion

4256

Pool Allocator

Deterministic runtime cost

No external fragmentation

Easy to make thread-safe

But

Cannot serve allocations bigger than chunk size

High waste through internal fragmentation if sizes of objects vary a lot

4356

Pool Allocator - STL containers

vector only if chunk sizes match vector size

list and map are a perfect fit as the size of each node is known beforehand(though this knowledge is implementation-specific)

Similar for deque

unordered map is again complicated

4456

But what if I do need different sizes

minusrarr Multipool Allocator

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4656

Multipool Allocator

Very powerful allocator

Runtime is still deterministic if number of pools is known beforehand

Maximum amount of waste through internal fragmentation can be controlledprecisely

Difficult to set up How many pools do I need What chunk sizes What poolsizes

Solid building block for a general purpose allocator

4756

Allocator support in C++

stdvectorltT AllocatorltTgtgt v

vpush back()

This is not the class allocating the memory

4756

Allocator support in C++

stdvectorltT AllocatorltTgtgt v

vpush back()

This is not the class allocating the memory

4856

Allocator support in C++

Historically C++ used Allocators to abstract over different models of addressingmemoryAs such Allocators in C++ are ldquostatelessrdquo

In C++ an Allocator is merely a handle to a memory resource1

1Arthur OrsquoDwyer - An Allocator is a Handle to the Heap

4856

Allocator support in C++

Historically C++ used Allocators to abstract over different models of addressingmemoryAs such Allocators in C++ are ldquostatelessrdquo

In C++ an Allocator is merely a handle to a memory resource1

1Arthur OrsquoDwyer - An Allocator is a Handle to the Heap

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

5056

C++ Memory Resources2

stdpmrmemory resource - Abstract base class for all resources that can bewrapped in a stdpmrpolymorphic allocator

stdpmrnew delete resource - Global allocator

stdpmrmonotonic buffer resource - Monotonic allocator

stdpmrunsynchronized pool resource synchronized pool resource

- Multipool

stdpmrnull memory resource - Allocation always fails

2Pablo Halpern - Allocators The Good Parts

5156

Chaining

explicit monotonic_buffer_resource(

stdpmr memory_resource upstream )

Each memory resource has an upstream counterpart

If the resource runs out of memory it tries to allocate more memory from upstream

5256

Chaining

Possible uses of Chaining

Fixed-size vs dynamic storage for allocators

Combination of different allocation strategies

Injection points for special purpose allocators for debugging and profiling

5356

Therersquos no universal interface for allocators

Are size and alignment parameters passed to deallocate

Is realloc supported

How are out-of-memory errors reported

Is extended alignment supported

What is the return value for an allocation of size 0

Different memory regions for internal data structures and allocated memory

5456

Donrsquot underestimate the global allocator

Competitive performance in the general case

Security features (ASLR secure erase of freed memory)

Debugging amp Profiling (Valgrind Windows Debug Runtime)

Cache Coloring

Local allocators are no free lunch

5556

Wrapping up

No one-size-fits-all mdash Each allocator has its Achilles heel

Global allocator is a good solution for the general case

But you can do better with special allocators for special use cases in terms ofperformance3 as well as reliability

C++ has good support for local allocators but the terminology is a bit off

Different libraries have different concepts of allocators

No free lunch You need to understand your use case before you can chose theright allocator

3John Lakos - Local (Arena) Allocators

5656

Thanks for your attention

Page 13: Taming Dynamic Memory...Pool Allocator - STL containers vector only if chunk sizes match vector size list and map are a perfect t, as the size of each node is known beforehand (though

656

Fragmentation

auto p7 = allocate(4)

Runtime Error

756

Coalescing

deallocate(p4)

deallocate(p3)

756

Coalescing

deallocate(p4)

deallocate(p3)

756

Coalescing

deallocate(p4)

deallocate(p3)

756

Coalescing

deallocate(p4)

deallocate(p3)

756

Coalescing

deallocate(p4)

deallocate(p3)

856

Problems with default allocator

Complex runtime behavior

What is the maximum memory usageWhat is the worst-case execution time for an allocation or deallocation

Shared global state

Reasoning about allocator behavior requires global knowledge of the whole programThe singular resource global allocator is a potential bottleneck

Itrsquos not just about performance

956

From Global to Local

auto p1 = allocate (42)

deallocate(p1)

1056

From Global to Local

Allocator alloc

auto p1 = allocallocate (42)

allocdeallocate(p1)

1156

Problems with default allocator

Complex runtime behavior

What is the maximum memory usageWhat is the worst-case execution time for an allocation or deallocation

Shared global state X

1256

Monotonic Allocator

auto p1 = monotallocate(3)

auto p2 = monotallocate(2)

auto p3 = monotallocate(4)

1256

Monotonic Allocator

auto p1 = monotallocate(3)

auto p2 = monotallocate(2)

auto p3 = monotallocate(4)

1256

Monotonic Allocator

auto p1 = monotallocate(3)

auto p2 = monotallocate(2)

auto p3 = monotallocate(4)

1256

Monotonic Allocator

auto p1 = monotallocate(3)

auto p2 = monotallocate(2)

auto p3 = monotallocate(4)

1356

Monotonic Allocator

monotdeallocate(p2)

monotdeallocate(p1)

monotdeallocate(p3)

auto p4 = allocate(2)

1356

Monotonic Allocator

monotdeallocate(p2)

monotdeallocate(p1)

monotdeallocate(p3)

auto p4 = allocate(2)

1356

Monotonic Allocator

monotdeallocate(p2)

monotdeallocate(p1)

monotdeallocate(p3)

auto p4 = allocate(2)

1356

Monotonic Allocator

monotdeallocate(p2)

monotdeallocate(p1)

monotdeallocate(p3)

auto p4 = allocate(2)

1356

Monotonic Allocator

monotdeallocate(p2)

monotdeallocate(p1)

monotdeallocate(p3)

auto p4 = allocate(2)

1456

Monotonic Allocator - Reclamation

monotdeallocate(p4)

monotrelease()

1456

Monotonic Allocator - Reclamation

monotdeallocate(p4)

monotrelease()

1456

Monotonic Allocator - Reclamation

monotdeallocate(p4)

monotrelease()

1456

Monotonic Allocator - Reclamation

monotdeallocate(p4)

monotrelease()

1556

Monotonic Allocator

Deterministic runtime cost

Extremely efficient

No fragmentation

Easy to implement

Trivial to make thread-safe

But

Memory can only be reclaimed all at once

1656

Where is this actually useful

Frames in a video game

Handling of a single event in an event-driven system

Cyclic execution in a real-time system

Containers that are initialized but not changed after

static state - Objects that will never be destroyed

1756

Monotonic Allocator - stdvector

1756

Monotonic Allocator - stdvector

1756

Monotonic Allocator - stdvector

1856

Monotonic Allocator - STL containers

vector should reserve final size upfront

list and map work fine but deleted elements are not reclaimed individually

deque works really well

unordered map has same problem as vector when growing but the load-factortriggered rehash is less predictable

1956

unordered map

Exact layout depends on hash function and inserted values

1956

unordered map

Exact layout depends on hash function and inserted values

1956

unordered map

Exact layout depends on hash function and inserted values

1956

unordered map

Exact layout depends on hash function and inserted values

2056

Stack Allocator

monotdeallocate(p3)

2056

Stack Allocator

monotdeallocate(p3)

2056

Stack Allocator

monotdeallocate(p3)

2056

Stack Allocator

monotdeallocate(p3)

2156

Stack Allocator

Strict LIFO-ordering of allocations and deallocations

No way for the implementation to check whether the deallocation order is correct

2256

Stack Allocator

monotdeallocate(p3)

p3 == top Xtop =

2256

Stack Allocator

monotdeallocate(p3)

p3 == top X

top =

2256

Stack Allocator

monotdeallocate(p3)

p3 == top Xtop =

2256

Stack Allocator

monotdeallocate(p3)

p3 == top X

top =

2356

Stack Allocator

Strict LIFO-ordering of allocations and deallocations

No way for the implementation to check whether the deallocation order is correct

Free-pointer is reset to the same pointer passed to the deallocate call

Padding bytes may be lost to internal fragmentation

2356

Stack Allocator

Strict LIFO-ordering of allocations and deallocations

No way for the implementation to check whether the deallocation order is correct

Free-pointer is reset to the same pointer passed to the deallocate call

Padding bytes may be lost to internal fragmentation

2356

Stack Allocator

Strict LIFO-ordering of allocations and deallocations

No way for the implementation to check whether the deallocation order is correct

Free-pointer is reset to the same pointer passed to the deallocate call

Padding bytes may be lost to internal fragmentation

2456

Padding

2456

Padding

2556

Alignment

0xdeadbeef =

d e a d b e e f

1101 1110 1010 1101 1011 1110 1110 1111

2656

Alignment

0xdeadbeef =

d e a d b e e f

1101 1110 1010 1101 1011 1110 1110 1111

No alignment (1-byte aligned)

2756

Alignment

0xdeadbeef =

d e a d b e e c

1101 1110 1010 1101 1011 1110 1110 1100

4-byte aligned

2856

Alignment

0xdeadbeef =

d e a d b e e 8

1101 1110 1010 1101 1011 1110 1110 1000

8-byte aligned

2956

Alignment

Alignment refers to the least-significant bits of the object address being 0

Alignment requirements are always specified in powers of 2

Each built-in C++ type has a natural alignment requirement (typicallyalignof(T) == sizeof(T))

This is why structs sometimes insert padding bytes between members

3056

Alignment

Default allocator typically returns addresses aligned to alignof(max align t)which is big enough for all built-in types

Users may extend the alignment requirement for custom data types usingalignas

void allocate(std size_t bytes

std size_t alignment )

3156

Padding

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3356

Monotonic Allocator - Extensions

Auxiliary data structure required

Runtime cost of deallocation now linear in number of allocations (amortized O(1))

Auxiliary nodes have their own alignment requirements

Where to store the auxiliary nodes

3456

Monotonic Allocator - Extensions

3456

Monotonic Allocator - Extensions

3556

Internal or External

External headers have better cache behavior when iterating the list

External headers might have stricter alignment requirements than data

Internal headers have better cache behavior when adjacent data is hot

Internal headers require managed memory to be readable (think GPUs)

Where does the storage for external headers come from Same buffer Differentbuffer How big

rArr No easy answers

3656

The Bottom Line

Even seemingly simple extensions get complicated very quickly

Donrsquot try to increase generality through clever extensionsOnly consider modifications if itrsquos a perfect fit for your use case

3756

But what if I need to reclaim memory

minusrarr Pool Allocator

3856

Pool Allocator

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4156

Pool Allocator - Diffusion

4256

Pool Allocator

Deterministic runtime cost

No external fragmentation

Easy to make thread-safe

But

Cannot serve allocations bigger than chunk size

High waste through internal fragmentation if sizes of objects vary a lot

4356

Pool Allocator - STL containers

vector only if chunk sizes match vector size

list and map are a perfect fit as the size of each node is known beforehand(though this knowledge is implementation-specific)

Similar for deque

unordered map is again complicated

4456

But what if I do need different sizes

minusrarr Multipool Allocator

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4656

Multipool Allocator

Very powerful allocator

Runtime is still deterministic if number of pools is known beforehand

Maximum amount of waste through internal fragmentation can be controlledprecisely

Difficult to set up How many pools do I need What chunk sizes What poolsizes

Solid building block for a general purpose allocator

4756

Allocator support in C++

stdvectorltT AllocatorltTgtgt v

vpush back()

This is not the class allocating the memory

4756

Allocator support in C++

stdvectorltT AllocatorltTgtgt v

vpush back()

This is not the class allocating the memory

4856

Allocator support in C++

Historically C++ used Allocators to abstract over different models of addressingmemoryAs such Allocators in C++ are ldquostatelessrdquo

In C++ an Allocator is merely a handle to a memory resource1

1Arthur OrsquoDwyer - An Allocator is a Handle to the Heap

4856

Allocator support in C++

Historically C++ used Allocators to abstract over different models of addressingmemoryAs such Allocators in C++ are ldquostatelessrdquo

In C++ an Allocator is merely a handle to a memory resource1

1Arthur OrsquoDwyer - An Allocator is a Handle to the Heap

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

5056

C++ Memory Resources2

stdpmrmemory resource - Abstract base class for all resources that can bewrapped in a stdpmrpolymorphic allocator

stdpmrnew delete resource - Global allocator

stdpmrmonotonic buffer resource - Monotonic allocator

stdpmrunsynchronized pool resource synchronized pool resource

- Multipool

stdpmrnull memory resource - Allocation always fails

2Pablo Halpern - Allocators The Good Parts

5156

Chaining

explicit monotonic_buffer_resource(

stdpmr memory_resource upstream )

Each memory resource has an upstream counterpart

If the resource runs out of memory it tries to allocate more memory from upstream

5256

Chaining

Possible uses of Chaining

Fixed-size vs dynamic storage for allocators

Combination of different allocation strategies

Injection points for special purpose allocators for debugging and profiling

5356

Therersquos no universal interface for allocators

Are size and alignment parameters passed to deallocate

Is realloc supported

How are out-of-memory errors reported

Is extended alignment supported

What is the return value for an allocation of size 0

Different memory regions for internal data structures and allocated memory

5456

Donrsquot underestimate the global allocator

Competitive performance in the general case

Security features (ASLR secure erase of freed memory)

Debugging amp Profiling (Valgrind Windows Debug Runtime)

Cache Coloring

Local allocators are no free lunch

5556

Wrapping up

No one-size-fits-all mdash Each allocator has its Achilles heel

Global allocator is a good solution for the general case

But you can do better with special allocators for special use cases in terms ofperformance3 as well as reliability

C++ has good support for local allocators but the terminology is a bit off

Different libraries have different concepts of allocators

No free lunch You need to understand your use case before you can chose theright allocator

3John Lakos - Local (Arena) Allocators

5656

Thanks for your attention

Page 14: Taming Dynamic Memory...Pool Allocator - STL containers vector only if chunk sizes match vector size list and map are a perfect t, as the size of each node is known beforehand (though

756

Coalescing

deallocate(p4)

deallocate(p3)

756

Coalescing

deallocate(p4)

deallocate(p3)

756

Coalescing

deallocate(p4)

deallocate(p3)

756

Coalescing

deallocate(p4)

deallocate(p3)

756

Coalescing

deallocate(p4)

deallocate(p3)

856

Problems with default allocator

Complex runtime behavior

What is the maximum memory usageWhat is the worst-case execution time for an allocation or deallocation

Shared global state

Reasoning about allocator behavior requires global knowledge of the whole programThe singular resource global allocator is a potential bottleneck

Itrsquos not just about performance

956

From Global to Local

auto p1 = allocate (42)

deallocate(p1)

1056

From Global to Local

Allocator alloc

auto p1 = allocallocate (42)

allocdeallocate(p1)

1156

Problems with default allocator

Complex runtime behavior

What is the maximum memory usageWhat is the worst-case execution time for an allocation or deallocation

Shared global state X

1256

Monotonic Allocator

auto p1 = monotallocate(3)

auto p2 = monotallocate(2)

auto p3 = monotallocate(4)

1256

Monotonic Allocator

auto p1 = monotallocate(3)

auto p2 = monotallocate(2)

auto p3 = monotallocate(4)

1256

Monotonic Allocator

auto p1 = monotallocate(3)

auto p2 = monotallocate(2)

auto p3 = monotallocate(4)

1256

Monotonic Allocator

auto p1 = monotallocate(3)

auto p2 = monotallocate(2)

auto p3 = monotallocate(4)

1356

Monotonic Allocator

monotdeallocate(p2)

monotdeallocate(p1)

monotdeallocate(p3)

auto p4 = allocate(2)

1356

Monotonic Allocator

monotdeallocate(p2)

monotdeallocate(p1)

monotdeallocate(p3)

auto p4 = allocate(2)

1356

Monotonic Allocator

monotdeallocate(p2)

monotdeallocate(p1)

monotdeallocate(p3)

auto p4 = allocate(2)

1356

Monotonic Allocator

monotdeallocate(p2)

monotdeallocate(p1)

monotdeallocate(p3)

auto p4 = allocate(2)

1356

Monotonic Allocator

monotdeallocate(p2)

monotdeallocate(p1)

monotdeallocate(p3)

auto p4 = allocate(2)

1456

Monotonic Allocator - Reclamation

monotdeallocate(p4)

monotrelease()

1456

Monotonic Allocator - Reclamation

monotdeallocate(p4)

monotrelease()

1456

Monotonic Allocator - Reclamation

monotdeallocate(p4)

monotrelease()

1456

Monotonic Allocator - Reclamation

monotdeallocate(p4)

monotrelease()

1556

Monotonic Allocator

Deterministic runtime cost

Extremely efficient

No fragmentation

Easy to implement

Trivial to make thread-safe

But

Memory can only be reclaimed all at once

1656

Where is this actually useful

Frames in a video game

Handling of a single event in an event-driven system

Cyclic execution in a real-time system

Containers that are initialized but not changed after

static state - Objects that will never be destroyed

1756

Monotonic Allocator - stdvector

1756

Monotonic Allocator - stdvector

1756

Monotonic Allocator - stdvector

1856

Monotonic Allocator - STL containers

vector should reserve final size upfront

list and map work fine but deleted elements are not reclaimed individually

deque works really well

unordered map has same problem as vector when growing but the load-factortriggered rehash is less predictable

1956

unordered map

Exact layout depends on hash function and inserted values

1956

unordered map

Exact layout depends on hash function and inserted values

1956

unordered map

Exact layout depends on hash function and inserted values

1956

unordered map

Exact layout depends on hash function and inserted values

2056

Stack Allocator

monotdeallocate(p3)

2056

Stack Allocator

monotdeallocate(p3)

2056

Stack Allocator

monotdeallocate(p3)

2056

Stack Allocator

monotdeallocate(p3)

2156

Stack Allocator

Strict LIFO-ordering of allocations and deallocations

No way for the implementation to check whether the deallocation order is correct

2256

Stack Allocator

monotdeallocate(p3)

p3 == top Xtop =

2256

Stack Allocator

monotdeallocate(p3)

p3 == top X

top =

2256

Stack Allocator

monotdeallocate(p3)

p3 == top Xtop =

2256

Stack Allocator

monotdeallocate(p3)

p3 == top X

top =

2356

Stack Allocator

Strict LIFO-ordering of allocations and deallocations

No way for the implementation to check whether the deallocation order is correct

Free-pointer is reset to the same pointer passed to the deallocate call

Padding bytes may be lost to internal fragmentation

2356

Stack Allocator

Strict LIFO-ordering of allocations and deallocations

No way for the implementation to check whether the deallocation order is correct

Free-pointer is reset to the same pointer passed to the deallocate call

Padding bytes may be lost to internal fragmentation

2356

Stack Allocator

Strict LIFO-ordering of allocations and deallocations

No way for the implementation to check whether the deallocation order is correct

Free-pointer is reset to the same pointer passed to the deallocate call

Padding bytes may be lost to internal fragmentation

2456

Padding

2456

Padding

2556

Alignment

0xdeadbeef =

d e a d b e e f

1101 1110 1010 1101 1011 1110 1110 1111

2656

Alignment

0xdeadbeef =

d e a d b e e f

1101 1110 1010 1101 1011 1110 1110 1111

No alignment (1-byte aligned)

2756

Alignment

0xdeadbeef =

d e a d b e e c

1101 1110 1010 1101 1011 1110 1110 1100

4-byte aligned

2856

Alignment

0xdeadbeef =

d e a d b e e 8

1101 1110 1010 1101 1011 1110 1110 1000

8-byte aligned

2956

Alignment

Alignment refers to the least-significant bits of the object address being 0

Alignment requirements are always specified in powers of 2

Each built-in C++ type has a natural alignment requirement (typicallyalignof(T) == sizeof(T))

This is why structs sometimes insert padding bytes between members

3056

Alignment

Default allocator typically returns addresses aligned to alignof(max align t)which is big enough for all built-in types

Users may extend the alignment requirement for custom data types usingalignas

void allocate(std size_t bytes

std size_t alignment )

3156

Padding

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3356

Monotonic Allocator - Extensions

Auxiliary data structure required

Runtime cost of deallocation now linear in number of allocations (amortized O(1))

Auxiliary nodes have their own alignment requirements

Where to store the auxiliary nodes

3456

Monotonic Allocator - Extensions

3456

Monotonic Allocator - Extensions

3556

Internal or External

External headers have better cache behavior when iterating the list

External headers might have stricter alignment requirements than data

Internal headers have better cache behavior when adjacent data is hot

Internal headers require managed memory to be readable (think GPUs)

Where does the storage for external headers come from Same buffer Differentbuffer How big

rArr No easy answers

3656

The Bottom Line

Even seemingly simple extensions get complicated very quickly

Donrsquot try to increase generality through clever extensionsOnly consider modifications if itrsquos a perfect fit for your use case

3756

But what if I need to reclaim memory

minusrarr Pool Allocator

3856

Pool Allocator

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4156

Pool Allocator - Diffusion

4256

Pool Allocator

Deterministic runtime cost

No external fragmentation

Easy to make thread-safe

But

Cannot serve allocations bigger than chunk size

High waste through internal fragmentation if sizes of objects vary a lot

4356

Pool Allocator - STL containers

vector only if chunk sizes match vector size

list and map are a perfect fit as the size of each node is known beforehand(though this knowledge is implementation-specific)

Similar for deque

unordered map is again complicated

4456

But what if I do need different sizes

minusrarr Multipool Allocator

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4656

Multipool Allocator

Very powerful allocator

Runtime is still deterministic if number of pools is known beforehand

Maximum amount of waste through internal fragmentation can be controlledprecisely

Difficult to set up How many pools do I need What chunk sizes What poolsizes

Solid building block for a general purpose allocator

4756

Allocator support in C++

stdvectorltT AllocatorltTgtgt v

vpush back()

This is not the class allocating the memory

4756

Allocator support in C++

stdvectorltT AllocatorltTgtgt v

vpush back()

This is not the class allocating the memory

4856

Allocator support in C++

Historically C++ used Allocators to abstract over different models of addressingmemoryAs such Allocators in C++ are ldquostatelessrdquo

In C++ an Allocator is merely a handle to a memory resource1

1Arthur OrsquoDwyer - An Allocator is a Handle to the Heap

4856

Allocator support in C++

Historically C++ used Allocators to abstract over different models of addressingmemoryAs such Allocators in C++ are ldquostatelessrdquo

In C++ an Allocator is merely a handle to a memory resource1

1Arthur OrsquoDwyer - An Allocator is a Handle to the Heap

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

5056

C++ Memory Resources2

stdpmrmemory resource - Abstract base class for all resources that can bewrapped in a stdpmrpolymorphic allocator

stdpmrnew delete resource - Global allocator

stdpmrmonotonic buffer resource - Monotonic allocator

stdpmrunsynchronized pool resource synchronized pool resource

- Multipool

stdpmrnull memory resource - Allocation always fails

2Pablo Halpern - Allocators The Good Parts

5156

Chaining

explicit monotonic_buffer_resource(

stdpmr memory_resource upstream )

Each memory resource has an upstream counterpart

If the resource runs out of memory it tries to allocate more memory from upstream

5256

Chaining

Possible uses of Chaining

Fixed-size vs dynamic storage for allocators

Combination of different allocation strategies

Injection points for special purpose allocators for debugging and profiling

5356

Therersquos no universal interface for allocators

Are size and alignment parameters passed to deallocate

Is realloc supported

How are out-of-memory errors reported

Is extended alignment supported

What is the return value for an allocation of size 0

Different memory regions for internal data structures and allocated memory

5456

Donrsquot underestimate the global allocator

Competitive performance in the general case

Security features (ASLR secure erase of freed memory)

Debugging amp Profiling (Valgrind Windows Debug Runtime)

Cache Coloring

Local allocators are no free lunch

5556

Wrapping up

No one-size-fits-all mdash Each allocator has its Achilles heel

Global allocator is a good solution for the general case

But you can do better with special allocators for special use cases in terms ofperformance3 as well as reliability

C++ has good support for local allocators but the terminology is a bit off

Different libraries have different concepts of allocators

No free lunch You need to understand your use case before you can chose theright allocator

3John Lakos - Local (Arena) Allocators

5656

Thanks for your attention

Page 15: Taming Dynamic Memory...Pool Allocator - STL containers vector only if chunk sizes match vector size list and map are a perfect t, as the size of each node is known beforehand (though

756

Coalescing

deallocate(p4)

deallocate(p3)

756

Coalescing

deallocate(p4)

deallocate(p3)

756

Coalescing

deallocate(p4)

deallocate(p3)

756

Coalescing

deallocate(p4)

deallocate(p3)

856

Problems with default allocator

Complex runtime behavior

What is the maximum memory usageWhat is the worst-case execution time for an allocation or deallocation

Shared global state

Reasoning about allocator behavior requires global knowledge of the whole programThe singular resource global allocator is a potential bottleneck

Itrsquos not just about performance

956

From Global to Local

auto p1 = allocate (42)

deallocate(p1)

1056

From Global to Local

Allocator alloc

auto p1 = allocallocate (42)

allocdeallocate(p1)

1156

Problems with default allocator

Complex runtime behavior

What is the maximum memory usageWhat is the worst-case execution time for an allocation or deallocation

Shared global state X

1256

Monotonic Allocator

auto p1 = monotallocate(3)

auto p2 = monotallocate(2)

auto p3 = monotallocate(4)

1256

Monotonic Allocator

auto p1 = monotallocate(3)

auto p2 = monotallocate(2)

auto p3 = monotallocate(4)

1256

Monotonic Allocator

auto p1 = monotallocate(3)

auto p2 = monotallocate(2)

auto p3 = monotallocate(4)

1256

Monotonic Allocator

auto p1 = monotallocate(3)

auto p2 = monotallocate(2)

auto p3 = monotallocate(4)

1356

Monotonic Allocator

monotdeallocate(p2)

monotdeallocate(p1)

monotdeallocate(p3)

auto p4 = allocate(2)

1356

Monotonic Allocator

monotdeallocate(p2)

monotdeallocate(p1)

monotdeallocate(p3)

auto p4 = allocate(2)

1356

Monotonic Allocator

monotdeallocate(p2)

monotdeallocate(p1)

monotdeallocate(p3)

auto p4 = allocate(2)

1356

Monotonic Allocator

monotdeallocate(p2)

monotdeallocate(p1)

monotdeallocate(p3)

auto p4 = allocate(2)

1356

Monotonic Allocator

monotdeallocate(p2)

monotdeallocate(p1)

monotdeallocate(p3)

auto p4 = allocate(2)

1456

Monotonic Allocator - Reclamation

monotdeallocate(p4)

monotrelease()

1456

Monotonic Allocator - Reclamation

monotdeallocate(p4)

monotrelease()

1456

Monotonic Allocator - Reclamation

monotdeallocate(p4)

monotrelease()

1456

Monotonic Allocator - Reclamation

monotdeallocate(p4)

monotrelease()

1556

Monotonic Allocator

Deterministic runtime cost

Extremely efficient

No fragmentation

Easy to implement

Trivial to make thread-safe

But

Memory can only be reclaimed all at once

1656

Where is this actually useful

Frames in a video game

Handling of a single event in an event-driven system

Cyclic execution in a real-time system

Containers that are initialized but not changed after

static state - Objects that will never be destroyed

1756

Monotonic Allocator - stdvector

1756

Monotonic Allocator - stdvector

1756

Monotonic Allocator - stdvector

1856

Monotonic Allocator - STL containers

vector should reserve final size upfront

list and map work fine but deleted elements are not reclaimed individually

deque works really well

unordered map has same problem as vector when growing but the load-factortriggered rehash is less predictable

1956

unordered map

Exact layout depends on hash function and inserted values

1956

unordered map

Exact layout depends on hash function and inserted values

1956

unordered map

Exact layout depends on hash function and inserted values

1956

unordered map

Exact layout depends on hash function and inserted values

2056

Stack Allocator

monotdeallocate(p3)

2056

Stack Allocator

monotdeallocate(p3)

2056

Stack Allocator

monotdeallocate(p3)

2056

Stack Allocator

monotdeallocate(p3)

2156

Stack Allocator

Strict LIFO-ordering of allocations and deallocations

No way for the implementation to check whether the deallocation order is correct

2256

Stack Allocator

monotdeallocate(p3)

p3 == top Xtop =

2256

Stack Allocator

monotdeallocate(p3)

p3 == top X

top =

2256

Stack Allocator

monotdeallocate(p3)

p3 == top Xtop =

2256

Stack Allocator

monotdeallocate(p3)

p3 == top X

top =

2356

Stack Allocator

Strict LIFO-ordering of allocations and deallocations

No way for the implementation to check whether the deallocation order is correct

Free-pointer is reset to the same pointer passed to the deallocate call

Padding bytes may be lost to internal fragmentation

2356

Stack Allocator

Strict LIFO-ordering of allocations and deallocations

No way for the implementation to check whether the deallocation order is correct

Free-pointer is reset to the same pointer passed to the deallocate call

Padding bytes may be lost to internal fragmentation

2356

Stack Allocator

Strict LIFO-ordering of allocations and deallocations

No way for the implementation to check whether the deallocation order is correct

Free-pointer is reset to the same pointer passed to the deallocate call

Padding bytes may be lost to internal fragmentation

2456

Padding

2456

Padding

2556

Alignment

0xdeadbeef =

d e a d b e e f

1101 1110 1010 1101 1011 1110 1110 1111

2656

Alignment

0xdeadbeef =

d e a d b e e f

1101 1110 1010 1101 1011 1110 1110 1111

No alignment (1-byte aligned)

2756

Alignment

0xdeadbeef =

d e a d b e e c

1101 1110 1010 1101 1011 1110 1110 1100

4-byte aligned

2856

Alignment

0xdeadbeef =

d e a d b e e 8

1101 1110 1010 1101 1011 1110 1110 1000

8-byte aligned

2956

Alignment

Alignment refers to the least-significant bits of the object address being 0

Alignment requirements are always specified in powers of 2

Each built-in C++ type has a natural alignment requirement (typicallyalignof(T) == sizeof(T))

This is why structs sometimes insert padding bytes between members

3056

Alignment

Default allocator typically returns addresses aligned to alignof(max align t)which is big enough for all built-in types

Users may extend the alignment requirement for custom data types usingalignas

void allocate(std size_t bytes

std size_t alignment )

3156

Padding

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3356

Monotonic Allocator - Extensions

Auxiliary data structure required

Runtime cost of deallocation now linear in number of allocations (amortized O(1))

Auxiliary nodes have their own alignment requirements

Where to store the auxiliary nodes

3456

Monotonic Allocator - Extensions

3456

Monotonic Allocator - Extensions

3556

Internal or External

External headers have better cache behavior when iterating the list

External headers might have stricter alignment requirements than data

Internal headers have better cache behavior when adjacent data is hot

Internal headers require managed memory to be readable (think GPUs)

Where does the storage for external headers come from Same buffer Differentbuffer How big

rArr No easy answers

3656

The Bottom Line

Even seemingly simple extensions get complicated very quickly

Donrsquot try to increase generality through clever extensionsOnly consider modifications if itrsquos a perfect fit for your use case

3756

But what if I need to reclaim memory

minusrarr Pool Allocator

3856

Pool Allocator

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4156

Pool Allocator - Diffusion

4256

Pool Allocator

Deterministic runtime cost

No external fragmentation

Easy to make thread-safe

But

Cannot serve allocations bigger than chunk size

High waste through internal fragmentation if sizes of objects vary a lot

4356

Pool Allocator - STL containers

vector only if chunk sizes match vector size

list and map are a perfect fit as the size of each node is known beforehand(though this knowledge is implementation-specific)

Similar for deque

unordered map is again complicated

4456

But what if I do need different sizes

minusrarr Multipool Allocator

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4656

Multipool Allocator

Very powerful allocator

Runtime is still deterministic if number of pools is known beforehand

Maximum amount of waste through internal fragmentation can be controlledprecisely

Difficult to set up How many pools do I need What chunk sizes What poolsizes

Solid building block for a general purpose allocator

4756

Allocator support in C++

stdvectorltT AllocatorltTgtgt v

vpush back()

This is not the class allocating the memory

4756

Allocator support in C++

stdvectorltT AllocatorltTgtgt v

vpush back()

This is not the class allocating the memory

4856

Allocator support in C++

Historically C++ used Allocators to abstract over different models of addressingmemoryAs such Allocators in C++ are ldquostatelessrdquo

In C++ an Allocator is merely a handle to a memory resource1

1Arthur OrsquoDwyer - An Allocator is a Handle to the Heap

4856

Allocator support in C++

Historically C++ used Allocators to abstract over different models of addressingmemoryAs such Allocators in C++ are ldquostatelessrdquo

In C++ an Allocator is merely a handle to a memory resource1

1Arthur OrsquoDwyer - An Allocator is a Handle to the Heap

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

5056

C++ Memory Resources2

stdpmrmemory resource - Abstract base class for all resources that can bewrapped in a stdpmrpolymorphic allocator

stdpmrnew delete resource - Global allocator

stdpmrmonotonic buffer resource - Monotonic allocator

stdpmrunsynchronized pool resource synchronized pool resource

- Multipool

stdpmrnull memory resource - Allocation always fails

2Pablo Halpern - Allocators The Good Parts

5156

Chaining

explicit monotonic_buffer_resource(

stdpmr memory_resource upstream )

Each memory resource has an upstream counterpart

If the resource runs out of memory it tries to allocate more memory from upstream

5256

Chaining

Possible uses of Chaining

Fixed-size vs dynamic storage for allocators

Combination of different allocation strategies

Injection points for special purpose allocators for debugging and profiling

5356

Therersquos no universal interface for allocators

Are size and alignment parameters passed to deallocate

Is realloc supported

How are out-of-memory errors reported

Is extended alignment supported

What is the return value for an allocation of size 0

Different memory regions for internal data structures and allocated memory

5456

Donrsquot underestimate the global allocator

Competitive performance in the general case

Security features (ASLR secure erase of freed memory)

Debugging amp Profiling (Valgrind Windows Debug Runtime)

Cache Coloring

Local allocators are no free lunch

5556

Wrapping up

No one-size-fits-all mdash Each allocator has its Achilles heel

Global allocator is a good solution for the general case

But you can do better with special allocators for special use cases in terms ofperformance3 as well as reliability

C++ has good support for local allocators but the terminology is a bit off

Different libraries have different concepts of allocators

No free lunch You need to understand your use case before you can chose theright allocator

3John Lakos - Local (Arena) Allocators

5656

Thanks for your attention

Page 16: Taming Dynamic Memory...Pool Allocator - STL containers vector only if chunk sizes match vector size list and map are a perfect t, as the size of each node is known beforehand (though

756

Coalescing

deallocate(p4)

deallocate(p3)

756

Coalescing

deallocate(p4)

deallocate(p3)

756

Coalescing

deallocate(p4)

deallocate(p3)

856

Problems with default allocator

Complex runtime behavior

What is the maximum memory usageWhat is the worst-case execution time for an allocation or deallocation

Shared global state

Reasoning about allocator behavior requires global knowledge of the whole programThe singular resource global allocator is a potential bottleneck

Itrsquos not just about performance

956

From Global to Local

auto p1 = allocate (42)

deallocate(p1)

1056

From Global to Local

Allocator alloc

auto p1 = allocallocate (42)

allocdeallocate(p1)

1156

Problems with default allocator

Complex runtime behavior

What is the maximum memory usageWhat is the worst-case execution time for an allocation or deallocation

Shared global state X

1256

Monotonic Allocator

auto p1 = monotallocate(3)

auto p2 = monotallocate(2)

auto p3 = monotallocate(4)

1256

Monotonic Allocator

auto p1 = monotallocate(3)

auto p2 = monotallocate(2)

auto p3 = monotallocate(4)

1256

Monotonic Allocator

auto p1 = monotallocate(3)

auto p2 = monotallocate(2)

auto p3 = monotallocate(4)

1256

Monotonic Allocator

auto p1 = monotallocate(3)

auto p2 = monotallocate(2)

auto p3 = monotallocate(4)

1356

Monotonic Allocator

monotdeallocate(p2)

monotdeallocate(p1)

monotdeallocate(p3)

auto p4 = allocate(2)

1356

Monotonic Allocator

monotdeallocate(p2)

monotdeallocate(p1)

monotdeallocate(p3)

auto p4 = allocate(2)

1356

Monotonic Allocator

monotdeallocate(p2)

monotdeallocate(p1)

monotdeallocate(p3)

auto p4 = allocate(2)

1356

Monotonic Allocator

monotdeallocate(p2)

monotdeallocate(p1)

monotdeallocate(p3)

auto p4 = allocate(2)

1356

Monotonic Allocator

monotdeallocate(p2)

monotdeallocate(p1)

monotdeallocate(p3)

auto p4 = allocate(2)

1456

Monotonic Allocator - Reclamation

monotdeallocate(p4)

monotrelease()

1456

Monotonic Allocator - Reclamation

monotdeallocate(p4)

monotrelease()

1456

Monotonic Allocator - Reclamation

monotdeallocate(p4)

monotrelease()

1456

Monotonic Allocator - Reclamation

monotdeallocate(p4)

monotrelease()

1556

Monotonic Allocator

Deterministic runtime cost

Extremely efficient

No fragmentation

Easy to implement

Trivial to make thread-safe

But

Memory can only be reclaimed all at once

1656

Where is this actually useful

Frames in a video game

Handling of a single event in an event-driven system

Cyclic execution in a real-time system

Containers that are initialized but not changed after

static state - Objects that will never be destroyed

1756

Monotonic Allocator - stdvector

1756

Monotonic Allocator - stdvector

1756

Monotonic Allocator - stdvector

1856

Monotonic Allocator - STL containers

vector should reserve final size upfront

list and map work fine but deleted elements are not reclaimed individually

deque works really well

unordered map has same problem as vector when growing but the load-factortriggered rehash is less predictable

1956

unordered map

Exact layout depends on hash function and inserted values

1956

unordered map

Exact layout depends on hash function and inserted values

1956

unordered map

Exact layout depends on hash function and inserted values

1956

unordered map

Exact layout depends on hash function and inserted values

2056

Stack Allocator

monotdeallocate(p3)

2056

Stack Allocator

monotdeallocate(p3)

2056

Stack Allocator

monotdeallocate(p3)

2056

Stack Allocator

monotdeallocate(p3)

2156

Stack Allocator

Strict LIFO-ordering of allocations and deallocations

No way for the implementation to check whether the deallocation order is correct

2256

Stack Allocator

monotdeallocate(p3)

p3 == top Xtop =

2256

Stack Allocator

monotdeallocate(p3)

p3 == top X

top =

2256

Stack Allocator

monotdeallocate(p3)

p3 == top Xtop =

2256

Stack Allocator

monotdeallocate(p3)

p3 == top X

top =

2356

Stack Allocator

Strict LIFO-ordering of allocations and deallocations

No way for the implementation to check whether the deallocation order is correct

Free-pointer is reset to the same pointer passed to the deallocate call

Padding bytes may be lost to internal fragmentation

2356

Stack Allocator

Strict LIFO-ordering of allocations and deallocations

No way for the implementation to check whether the deallocation order is correct

Free-pointer is reset to the same pointer passed to the deallocate call

Padding bytes may be lost to internal fragmentation

2356

Stack Allocator

Strict LIFO-ordering of allocations and deallocations

No way for the implementation to check whether the deallocation order is correct

Free-pointer is reset to the same pointer passed to the deallocate call

Padding bytes may be lost to internal fragmentation

2456

Padding

2456

Padding

2556

Alignment

0xdeadbeef =

d e a d b e e f

1101 1110 1010 1101 1011 1110 1110 1111

2656

Alignment

0xdeadbeef =

d e a d b e e f

1101 1110 1010 1101 1011 1110 1110 1111

No alignment (1-byte aligned)

2756

Alignment

0xdeadbeef =

d e a d b e e c

1101 1110 1010 1101 1011 1110 1110 1100

4-byte aligned

2856

Alignment

0xdeadbeef =

d e a d b e e 8

1101 1110 1010 1101 1011 1110 1110 1000

8-byte aligned

2956

Alignment

Alignment refers to the least-significant bits of the object address being 0

Alignment requirements are always specified in powers of 2

Each built-in C++ type has a natural alignment requirement (typicallyalignof(T) == sizeof(T))

This is why structs sometimes insert padding bytes between members

3056

Alignment

Default allocator typically returns addresses aligned to alignof(max align t)which is big enough for all built-in types

Users may extend the alignment requirement for custom data types usingalignas

void allocate(std size_t bytes

std size_t alignment )

3156

Padding

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3356

Monotonic Allocator - Extensions

Auxiliary data structure required

Runtime cost of deallocation now linear in number of allocations (amortized O(1))

Auxiliary nodes have their own alignment requirements

Where to store the auxiliary nodes

3456

Monotonic Allocator - Extensions

3456

Monotonic Allocator - Extensions

3556

Internal or External

External headers have better cache behavior when iterating the list

External headers might have stricter alignment requirements than data

Internal headers have better cache behavior when adjacent data is hot

Internal headers require managed memory to be readable (think GPUs)

Where does the storage for external headers come from Same buffer Differentbuffer How big

rArr No easy answers

3656

The Bottom Line

Even seemingly simple extensions get complicated very quickly

Donrsquot try to increase generality through clever extensionsOnly consider modifications if itrsquos a perfect fit for your use case

3756

But what if I need to reclaim memory

minusrarr Pool Allocator

3856

Pool Allocator

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4156

Pool Allocator - Diffusion

4256

Pool Allocator

Deterministic runtime cost

No external fragmentation

Easy to make thread-safe

But

Cannot serve allocations bigger than chunk size

High waste through internal fragmentation if sizes of objects vary a lot

4356

Pool Allocator - STL containers

vector only if chunk sizes match vector size

list and map are a perfect fit as the size of each node is known beforehand(though this knowledge is implementation-specific)

Similar for deque

unordered map is again complicated

4456

But what if I do need different sizes

minusrarr Multipool Allocator

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4656

Multipool Allocator

Very powerful allocator

Runtime is still deterministic if number of pools is known beforehand

Maximum amount of waste through internal fragmentation can be controlledprecisely

Difficult to set up How many pools do I need What chunk sizes What poolsizes

Solid building block for a general purpose allocator

4756

Allocator support in C++

stdvectorltT AllocatorltTgtgt v

vpush back()

This is not the class allocating the memory

4756

Allocator support in C++

stdvectorltT AllocatorltTgtgt v

vpush back()

This is not the class allocating the memory

4856

Allocator support in C++

Historically C++ used Allocators to abstract over different models of addressingmemoryAs such Allocators in C++ are ldquostatelessrdquo

In C++ an Allocator is merely a handle to a memory resource1

1Arthur OrsquoDwyer - An Allocator is a Handle to the Heap

4856

Allocator support in C++

Historically C++ used Allocators to abstract over different models of addressingmemoryAs such Allocators in C++ are ldquostatelessrdquo

In C++ an Allocator is merely a handle to a memory resource1

1Arthur OrsquoDwyer - An Allocator is a Handle to the Heap

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

5056

C++ Memory Resources2

stdpmrmemory resource - Abstract base class for all resources that can bewrapped in a stdpmrpolymorphic allocator

stdpmrnew delete resource - Global allocator

stdpmrmonotonic buffer resource - Monotonic allocator

stdpmrunsynchronized pool resource synchronized pool resource

- Multipool

stdpmrnull memory resource - Allocation always fails

2Pablo Halpern - Allocators The Good Parts

5156

Chaining

explicit monotonic_buffer_resource(

stdpmr memory_resource upstream )

Each memory resource has an upstream counterpart

If the resource runs out of memory it tries to allocate more memory from upstream

5256

Chaining

Possible uses of Chaining

Fixed-size vs dynamic storage for allocators

Combination of different allocation strategies

Injection points for special purpose allocators for debugging and profiling

5356

Therersquos no universal interface for allocators

Are size and alignment parameters passed to deallocate

Is realloc supported

How are out-of-memory errors reported

Is extended alignment supported

What is the return value for an allocation of size 0

Different memory regions for internal data structures and allocated memory

5456

Donrsquot underestimate the global allocator

Competitive performance in the general case

Security features (ASLR secure erase of freed memory)

Debugging amp Profiling (Valgrind Windows Debug Runtime)

Cache Coloring

Local allocators are no free lunch

5556

Wrapping up

No one-size-fits-all mdash Each allocator has its Achilles heel

Global allocator is a good solution for the general case

But you can do better with special allocators for special use cases in terms ofperformance3 as well as reliability

C++ has good support for local allocators but the terminology is a bit off

Different libraries have different concepts of allocators

No free lunch You need to understand your use case before you can chose theright allocator

3John Lakos - Local (Arena) Allocators

5656

Thanks for your attention

Page 17: Taming Dynamic Memory...Pool Allocator - STL containers vector only if chunk sizes match vector size list and map are a perfect t, as the size of each node is known beforehand (though

756

Coalescing

deallocate(p4)

deallocate(p3)

756

Coalescing

deallocate(p4)

deallocate(p3)

856

Problems with default allocator

Complex runtime behavior

What is the maximum memory usageWhat is the worst-case execution time for an allocation or deallocation

Shared global state

Reasoning about allocator behavior requires global knowledge of the whole programThe singular resource global allocator is a potential bottleneck

Itrsquos not just about performance

956

From Global to Local

auto p1 = allocate (42)

deallocate(p1)

1056

From Global to Local

Allocator alloc

auto p1 = allocallocate (42)

allocdeallocate(p1)

1156

Problems with default allocator

Complex runtime behavior

What is the maximum memory usageWhat is the worst-case execution time for an allocation or deallocation

Shared global state X

1256

Monotonic Allocator

auto p1 = monotallocate(3)

auto p2 = monotallocate(2)

auto p3 = monotallocate(4)

1256

Monotonic Allocator

auto p1 = monotallocate(3)

auto p2 = monotallocate(2)

auto p3 = monotallocate(4)

1256

Monotonic Allocator

auto p1 = monotallocate(3)

auto p2 = monotallocate(2)

auto p3 = monotallocate(4)

1256

Monotonic Allocator

auto p1 = monotallocate(3)

auto p2 = monotallocate(2)

auto p3 = monotallocate(4)

1356

Monotonic Allocator

monotdeallocate(p2)

monotdeallocate(p1)

monotdeallocate(p3)

auto p4 = allocate(2)

1356

Monotonic Allocator

monotdeallocate(p2)

monotdeallocate(p1)

monotdeallocate(p3)

auto p4 = allocate(2)

1356

Monotonic Allocator

monotdeallocate(p2)

monotdeallocate(p1)

monotdeallocate(p3)

auto p4 = allocate(2)

1356

Monotonic Allocator

monotdeallocate(p2)

monotdeallocate(p1)

monotdeallocate(p3)

auto p4 = allocate(2)

1356

Monotonic Allocator

monotdeallocate(p2)

monotdeallocate(p1)

monotdeallocate(p3)

auto p4 = allocate(2)

1456

Monotonic Allocator - Reclamation

monotdeallocate(p4)

monotrelease()

1456

Monotonic Allocator - Reclamation

monotdeallocate(p4)

monotrelease()

1456

Monotonic Allocator - Reclamation

monotdeallocate(p4)

monotrelease()

1456

Monotonic Allocator - Reclamation

monotdeallocate(p4)

monotrelease()

1556

Monotonic Allocator

Deterministic runtime cost

Extremely efficient

No fragmentation

Easy to implement

Trivial to make thread-safe

But

Memory can only be reclaimed all at once

1656

Where is this actually useful

Frames in a video game

Handling of a single event in an event-driven system

Cyclic execution in a real-time system

Containers that are initialized but not changed after

static state - Objects that will never be destroyed

1756

Monotonic Allocator - stdvector

1756

Monotonic Allocator - stdvector

1756

Monotonic Allocator - stdvector

1856

Monotonic Allocator - STL containers

vector should reserve final size upfront

list and map work fine but deleted elements are not reclaimed individually

deque works really well

unordered map has same problem as vector when growing but the load-factortriggered rehash is less predictable

1956

unordered map

Exact layout depends on hash function and inserted values

1956

unordered map

Exact layout depends on hash function and inserted values

1956

unordered map

Exact layout depends on hash function and inserted values

1956

unordered map

Exact layout depends on hash function and inserted values

2056

Stack Allocator

monotdeallocate(p3)

2056

Stack Allocator

monotdeallocate(p3)

2056

Stack Allocator

monotdeallocate(p3)

2056

Stack Allocator

monotdeallocate(p3)

2156

Stack Allocator

Strict LIFO-ordering of allocations and deallocations

No way for the implementation to check whether the deallocation order is correct

2256

Stack Allocator

monotdeallocate(p3)

p3 == top Xtop =

2256

Stack Allocator

monotdeallocate(p3)

p3 == top X

top =

2256

Stack Allocator

monotdeallocate(p3)

p3 == top Xtop =

2256

Stack Allocator

monotdeallocate(p3)

p3 == top X

top =

2356

Stack Allocator

Strict LIFO-ordering of allocations and deallocations

No way for the implementation to check whether the deallocation order is correct

Free-pointer is reset to the same pointer passed to the deallocate call

Padding bytes may be lost to internal fragmentation

2356

Stack Allocator

Strict LIFO-ordering of allocations and deallocations

No way for the implementation to check whether the deallocation order is correct

Free-pointer is reset to the same pointer passed to the deallocate call

Padding bytes may be lost to internal fragmentation

2356

Stack Allocator

Strict LIFO-ordering of allocations and deallocations

No way for the implementation to check whether the deallocation order is correct

Free-pointer is reset to the same pointer passed to the deallocate call

Padding bytes may be lost to internal fragmentation

2456

Padding

2456

Padding

2556

Alignment

0xdeadbeef =

d e a d b e e f

1101 1110 1010 1101 1011 1110 1110 1111

2656

Alignment

0xdeadbeef =

d e a d b e e f

1101 1110 1010 1101 1011 1110 1110 1111

No alignment (1-byte aligned)

2756

Alignment

0xdeadbeef =

d e a d b e e c

1101 1110 1010 1101 1011 1110 1110 1100

4-byte aligned

2856

Alignment

0xdeadbeef =

d e a d b e e 8

1101 1110 1010 1101 1011 1110 1110 1000

8-byte aligned

2956

Alignment

Alignment refers to the least-significant bits of the object address being 0

Alignment requirements are always specified in powers of 2

Each built-in C++ type has a natural alignment requirement (typicallyalignof(T) == sizeof(T))

This is why structs sometimes insert padding bytes between members

3056

Alignment

Default allocator typically returns addresses aligned to alignof(max align t)which is big enough for all built-in types

Users may extend the alignment requirement for custom data types usingalignas

void allocate(std size_t bytes

std size_t alignment )

3156

Padding

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3356

Monotonic Allocator - Extensions

Auxiliary data structure required

Runtime cost of deallocation now linear in number of allocations (amortized O(1))

Auxiliary nodes have their own alignment requirements

Where to store the auxiliary nodes

3456

Monotonic Allocator - Extensions

3456

Monotonic Allocator - Extensions

3556

Internal or External

External headers have better cache behavior when iterating the list

External headers might have stricter alignment requirements than data

Internal headers have better cache behavior when adjacent data is hot

Internal headers require managed memory to be readable (think GPUs)

Where does the storage for external headers come from Same buffer Differentbuffer How big

rArr No easy answers

3656

The Bottom Line

Even seemingly simple extensions get complicated very quickly

Donrsquot try to increase generality through clever extensionsOnly consider modifications if itrsquos a perfect fit for your use case

3756

But what if I need to reclaim memory

minusrarr Pool Allocator

3856

Pool Allocator

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4156

Pool Allocator - Diffusion

4256

Pool Allocator

Deterministic runtime cost

No external fragmentation

Easy to make thread-safe

But

Cannot serve allocations bigger than chunk size

High waste through internal fragmentation if sizes of objects vary a lot

4356

Pool Allocator - STL containers

vector only if chunk sizes match vector size

list and map are a perfect fit as the size of each node is known beforehand(though this knowledge is implementation-specific)

Similar for deque

unordered map is again complicated

4456

But what if I do need different sizes

minusrarr Multipool Allocator

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4656

Multipool Allocator

Very powerful allocator

Runtime is still deterministic if number of pools is known beforehand

Maximum amount of waste through internal fragmentation can be controlledprecisely

Difficult to set up How many pools do I need What chunk sizes What poolsizes

Solid building block for a general purpose allocator

4756

Allocator support in C++

stdvectorltT AllocatorltTgtgt v

vpush back()

This is not the class allocating the memory

4756

Allocator support in C++

stdvectorltT AllocatorltTgtgt v

vpush back()

This is not the class allocating the memory

4856

Allocator support in C++

Historically C++ used Allocators to abstract over different models of addressingmemoryAs such Allocators in C++ are ldquostatelessrdquo

In C++ an Allocator is merely a handle to a memory resource1

1Arthur OrsquoDwyer - An Allocator is a Handle to the Heap

4856

Allocator support in C++

Historically C++ used Allocators to abstract over different models of addressingmemoryAs such Allocators in C++ are ldquostatelessrdquo

In C++ an Allocator is merely a handle to a memory resource1

1Arthur OrsquoDwyer - An Allocator is a Handle to the Heap

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

5056

C++ Memory Resources2

stdpmrmemory resource - Abstract base class for all resources that can bewrapped in a stdpmrpolymorphic allocator

stdpmrnew delete resource - Global allocator

stdpmrmonotonic buffer resource - Monotonic allocator

stdpmrunsynchronized pool resource synchronized pool resource

- Multipool

stdpmrnull memory resource - Allocation always fails

2Pablo Halpern - Allocators The Good Parts

5156

Chaining

explicit monotonic_buffer_resource(

stdpmr memory_resource upstream )

Each memory resource has an upstream counterpart

If the resource runs out of memory it tries to allocate more memory from upstream

5256

Chaining

Possible uses of Chaining

Fixed-size vs dynamic storage for allocators

Combination of different allocation strategies

Injection points for special purpose allocators for debugging and profiling

5356

Therersquos no universal interface for allocators

Are size and alignment parameters passed to deallocate

Is realloc supported

How are out-of-memory errors reported

Is extended alignment supported

What is the return value for an allocation of size 0

Different memory regions for internal data structures and allocated memory

5456

Donrsquot underestimate the global allocator

Competitive performance in the general case

Security features (ASLR secure erase of freed memory)

Debugging amp Profiling (Valgrind Windows Debug Runtime)

Cache Coloring

Local allocators are no free lunch

5556

Wrapping up

No one-size-fits-all mdash Each allocator has its Achilles heel

Global allocator is a good solution for the general case

But you can do better with special allocators for special use cases in terms ofperformance3 as well as reliability

C++ has good support for local allocators but the terminology is a bit off

Different libraries have different concepts of allocators

No free lunch You need to understand your use case before you can chose theright allocator

3John Lakos - Local (Arena) Allocators

5656

Thanks for your attention

Page 18: Taming Dynamic Memory...Pool Allocator - STL containers vector only if chunk sizes match vector size list and map are a perfect t, as the size of each node is known beforehand (though

756

Coalescing

deallocate(p4)

deallocate(p3)

856

Problems with default allocator

Complex runtime behavior

What is the maximum memory usageWhat is the worst-case execution time for an allocation or deallocation

Shared global state

Reasoning about allocator behavior requires global knowledge of the whole programThe singular resource global allocator is a potential bottleneck

Itrsquos not just about performance

956

From Global to Local

auto p1 = allocate (42)

deallocate(p1)

1056

From Global to Local

Allocator alloc

auto p1 = allocallocate (42)

allocdeallocate(p1)

1156

Problems with default allocator

Complex runtime behavior

What is the maximum memory usageWhat is the worst-case execution time for an allocation or deallocation

Shared global state X

1256

Monotonic Allocator

auto p1 = monotallocate(3)

auto p2 = monotallocate(2)

auto p3 = monotallocate(4)

1256

Monotonic Allocator

auto p1 = monotallocate(3)

auto p2 = monotallocate(2)

auto p3 = monotallocate(4)

1256

Monotonic Allocator

auto p1 = monotallocate(3)

auto p2 = monotallocate(2)

auto p3 = monotallocate(4)

1256

Monotonic Allocator

auto p1 = monotallocate(3)

auto p2 = monotallocate(2)

auto p3 = monotallocate(4)

1356

Monotonic Allocator

monotdeallocate(p2)

monotdeallocate(p1)

monotdeallocate(p3)

auto p4 = allocate(2)

1356

Monotonic Allocator

monotdeallocate(p2)

monotdeallocate(p1)

monotdeallocate(p3)

auto p4 = allocate(2)

1356

Monotonic Allocator

monotdeallocate(p2)

monotdeallocate(p1)

monotdeallocate(p3)

auto p4 = allocate(2)

1356

Monotonic Allocator

monotdeallocate(p2)

monotdeallocate(p1)

monotdeallocate(p3)

auto p4 = allocate(2)

1356

Monotonic Allocator

monotdeallocate(p2)

monotdeallocate(p1)

monotdeallocate(p3)

auto p4 = allocate(2)

1456

Monotonic Allocator - Reclamation

monotdeallocate(p4)

monotrelease()

1456

Monotonic Allocator - Reclamation

monotdeallocate(p4)

monotrelease()

1456

Monotonic Allocator - Reclamation

monotdeallocate(p4)

monotrelease()

1456

Monotonic Allocator - Reclamation

monotdeallocate(p4)

monotrelease()

1556

Monotonic Allocator

Deterministic runtime cost

Extremely efficient

No fragmentation

Easy to implement

Trivial to make thread-safe

But

Memory can only be reclaimed all at once

1656

Where is this actually useful

Frames in a video game

Handling of a single event in an event-driven system

Cyclic execution in a real-time system

Containers that are initialized but not changed after

static state - Objects that will never be destroyed

1756

Monotonic Allocator - stdvector

1756

Monotonic Allocator - stdvector

1756

Monotonic Allocator - stdvector

1856

Monotonic Allocator - STL containers

vector should reserve final size upfront

list and map work fine but deleted elements are not reclaimed individually

deque works really well

unordered map has same problem as vector when growing but the load-factortriggered rehash is less predictable

1956

unordered map

Exact layout depends on hash function and inserted values

1956

unordered map

Exact layout depends on hash function and inserted values

1956

unordered map

Exact layout depends on hash function and inserted values

1956

unordered map

Exact layout depends on hash function and inserted values

2056

Stack Allocator

monotdeallocate(p3)

2056

Stack Allocator

monotdeallocate(p3)

2056

Stack Allocator

monotdeallocate(p3)

2056

Stack Allocator

monotdeallocate(p3)

2156

Stack Allocator

Strict LIFO-ordering of allocations and deallocations

No way for the implementation to check whether the deallocation order is correct

2256

Stack Allocator

monotdeallocate(p3)

p3 == top Xtop =

2256

Stack Allocator

monotdeallocate(p3)

p3 == top X

top =

2256

Stack Allocator

monotdeallocate(p3)

p3 == top Xtop =

2256

Stack Allocator

monotdeallocate(p3)

p3 == top X

top =

2356

Stack Allocator

Strict LIFO-ordering of allocations and deallocations

No way for the implementation to check whether the deallocation order is correct

Free-pointer is reset to the same pointer passed to the deallocate call

Padding bytes may be lost to internal fragmentation

2356

Stack Allocator

Strict LIFO-ordering of allocations and deallocations

No way for the implementation to check whether the deallocation order is correct

Free-pointer is reset to the same pointer passed to the deallocate call

Padding bytes may be lost to internal fragmentation

2356

Stack Allocator

Strict LIFO-ordering of allocations and deallocations

No way for the implementation to check whether the deallocation order is correct

Free-pointer is reset to the same pointer passed to the deallocate call

Padding bytes may be lost to internal fragmentation

2456

Padding

2456

Padding

2556

Alignment

0xdeadbeef =

d e a d b e e f

1101 1110 1010 1101 1011 1110 1110 1111

2656

Alignment

0xdeadbeef =

d e a d b e e f

1101 1110 1010 1101 1011 1110 1110 1111

No alignment (1-byte aligned)

2756

Alignment

0xdeadbeef =

d e a d b e e c

1101 1110 1010 1101 1011 1110 1110 1100

4-byte aligned

2856

Alignment

0xdeadbeef =

d e a d b e e 8

1101 1110 1010 1101 1011 1110 1110 1000

8-byte aligned

2956

Alignment

Alignment refers to the least-significant bits of the object address being 0

Alignment requirements are always specified in powers of 2

Each built-in C++ type has a natural alignment requirement (typicallyalignof(T) == sizeof(T))

This is why structs sometimes insert padding bytes between members

3056

Alignment

Default allocator typically returns addresses aligned to alignof(max align t)which is big enough for all built-in types

Users may extend the alignment requirement for custom data types usingalignas

void allocate(std size_t bytes

std size_t alignment )

3156

Padding

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3356

Monotonic Allocator - Extensions

Auxiliary data structure required

Runtime cost of deallocation now linear in number of allocations (amortized O(1))

Auxiliary nodes have their own alignment requirements

Where to store the auxiliary nodes

3456

Monotonic Allocator - Extensions

3456

Monotonic Allocator - Extensions

3556

Internal or External

External headers have better cache behavior when iterating the list

External headers might have stricter alignment requirements than data

Internal headers have better cache behavior when adjacent data is hot

Internal headers require managed memory to be readable (think GPUs)

Where does the storage for external headers come from Same buffer Differentbuffer How big

rArr No easy answers

3656

The Bottom Line

Even seemingly simple extensions get complicated very quickly

Donrsquot try to increase generality through clever extensionsOnly consider modifications if itrsquos a perfect fit for your use case

3756

But what if I need to reclaim memory

minusrarr Pool Allocator

3856

Pool Allocator

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4156

Pool Allocator - Diffusion

4256

Pool Allocator

Deterministic runtime cost

No external fragmentation

Easy to make thread-safe

But

Cannot serve allocations bigger than chunk size

High waste through internal fragmentation if sizes of objects vary a lot

4356

Pool Allocator - STL containers

vector only if chunk sizes match vector size

list and map are a perfect fit as the size of each node is known beforehand(though this knowledge is implementation-specific)

Similar for deque

unordered map is again complicated

4456

But what if I do need different sizes

minusrarr Multipool Allocator

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4656

Multipool Allocator

Very powerful allocator

Runtime is still deterministic if number of pools is known beforehand

Maximum amount of waste through internal fragmentation can be controlledprecisely

Difficult to set up How many pools do I need What chunk sizes What poolsizes

Solid building block for a general purpose allocator

4756

Allocator support in C++

stdvectorltT AllocatorltTgtgt v

vpush back()

This is not the class allocating the memory

4756

Allocator support in C++

stdvectorltT AllocatorltTgtgt v

vpush back()

This is not the class allocating the memory

4856

Allocator support in C++

Historically C++ used Allocators to abstract over different models of addressingmemoryAs such Allocators in C++ are ldquostatelessrdquo

In C++ an Allocator is merely a handle to a memory resource1

1Arthur OrsquoDwyer - An Allocator is a Handle to the Heap

4856

Allocator support in C++

Historically C++ used Allocators to abstract over different models of addressingmemoryAs such Allocators in C++ are ldquostatelessrdquo

In C++ an Allocator is merely a handle to a memory resource1

1Arthur OrsquoDwyer - An Allocator is a Handle to the Heap

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

5056

C++ Memory Resources2

stdpmrmemory resource - Abstract base class for all resources that can bewrapped in a stdpmrpolymorphic allocator

stdpmrnew delete resource - Global allocator

stdpmrmonotonic buffer resource - Monotonic allocator

stdpmrunsynchronized pool resource synchronized pool resource

- Multipool

stdpmrnull memory resource - Allocation always fails

2Pablo Halpern - Allocators The Good Parts

5156

Chaining

explicit monotonic_buffer_resource(

stdpmr memory_resource upstream )

Each memory resource has an upstream counterpart

If the resource runs out of memory it tries to allocate more memory from upstream

5256

Chaining

Possible uses of Chaining

Fixed-size vs dynamic storage for allocators

Combination of different allocation strategies

Injection points for special purpose allocators for debugging and profiling

5356

Therersquos no universal interface for allocators

Are size and alignment parameters passed to deallocate

Is realloc supported

How are out-of-memory errors reported

Is extended alignment supported

What is the return value for an allocation of size 0

Different memory regions for internal data structures and allocated memory

5456

Donrsquot underestimate the global allocator

Competitive performance in the general case

Security features (ASLR secure erase of freed memory)

Debugging amp Profiling (Valgrind Windows Debug Runtime)

Cache Coloring

Local allocators are no free lunch

5556

Wrapping up

No one-size-fits-all mdash Each allocator has its Achilles heel

Global allocator is a good solution for the general case

But you can do better with special allocators for special use cases in terms ofperformance3 as well as reliability

C++ has good support for local allocators but the terminology is a bit off

Different libraries have different concepts of allocators

No free lunch You need to understand your use case before you can chose theright allocator

3John Lakos - Local (Arena) Allocators

5656

Thanks for your attention

Page 19: Taming Dynamic Memory...Pool Allocator - STL containers vector only if chunk sizes match vector size list and map are a perfect t, as the size of each node is known beforehand (though

856

Problems with default allocator

Complex runtime behavior

What is the maximum memory usageWhat is the worst-case execution time for an allocation or deallocation

Shared global state

Reasoning about allocator behavior requires global knowledge of the whole programThe singular resource global allocator is a potential bottleneck

Itrsquos not just about performance

956

From Global to Local

auto p1 = allocate (42)

deallocate(p1)

1056

From Global to Local

Allocator alloc

auto p1 = allocallocate (42)

allocdeallocate(p1)

1156

Problems with default allocator

Complex runtime behavior

What is the maximum memory usageWhat is the worst-case execution time for an allocation or deallocation

Shared global state X

1256

Monotonic Allocator

auto p1 = monotallocate(3)

auto p2 = monotallocate(2)

auto p3 = monotallocate(4)

1256

Monotonic Allocator

auto p1 = monotallocate(3)

auto p2 = monotallocate(2)

auto p3 = monotallocate(4)

1256

Monotonic Allocator

auto p1 = monotallocate(3)

auto p2 = monotallocate(2)

auto p3 = monotallocate(4)

1256

Monotonic Allocator

auto p1 = monotallocate(3)

auto p2 = monotallocate(2)

auto p3 = monotallocate(4)

1356

Monotonic Allocator

monotdeallocate(p2)

monotdeallocate(p1)

monotdeallocate(p3)

auto p4 = allocate(2)

1356

Monotonic Allocator

monotdeallocate(p2)

monotdeallocate(p1)

monotdeallocate(p3)

auto p4 = allocate(2)

1356

Monotonic Allocator

monotdeallocate(p2)

monotdeallocate(p1)

monotdeallocate(p3)

auto p4 = allocate(2)

1356

Monotonic Allocator

monotdeallocate(p2)

monotdeallocate(p1)

monotdeallocate(p3)

auto p4 = allocate(2)

1356

Monotonic Allocator

monotdeallocate(p2)

monotdeallocate(p1)

monotdeallocate(p3)

auto p4 = allocate(2)

1456

Monotonic Allocator - Reclamation

monotdeallocate(p4)

monotrelease()

1456

Monotonic Allocator - Reclamation

monotdeallocate(p4)

monotrelease()

1456

Monotonic Allocator - Reclamation

monotdeallocate(p4)

monotrelease()

1456

Monotonic Allocator - Reclamation

monotdeallocate(p4)

monotrelease()

1556

Monotonic Allocator

Deterministic runtime cost

Extremely efficient

No fragmentation

Easy to implement

Trivial to make thread-safe

But

Memory can only be reclaimed all at once

1656

Where is this actually useful

Frames in a video game

Handling of a single event in an event-driven system

Cyclic execution in a real-time system

Containers that are initialized but not changed after

static state - Objects that will never be destroyed

1756

Monotonic Allocator - stdvector

1756

Monotonic Allocator - stdvector

1756

Monotonic Allocator - stdvector

1856

Monotonic Allocator - STL containers

vector should reserve final size upfront

list and map work fine but deleted elements are not reclaimed individually

deque works really well

unordered map has same problem as vector when growing but the load-factortriggered rehash is less predictable

1956

unordered map

Exact layout depends on hash function and inserted values

1956

unordered map

Exact layout depends on hash function and inserted values

1956

unordered map

Exact layout depends on hash function and inserted values

1956

unordered map

Exact layout depends on hash function and inserted values

2056

Stack Allocator

monotdeallocate(p3)

2056

Stack Allocator

monotdeallocate(p3)

2056

Stack Allocator

monotdeallocate(p3)

2056

Stack Allocator

monotdeallocate(p3)

2156

Stack Allocator

Strict LIFO-ordering of allocations and deallocations

No way for the implementation to check whether the deallocation order is correct

2256

Stack Allocator

monotdeallocate(p3)

p3 == top Xtop =

2256

Stack Allocator

monotdeallocate(p3)

p3 == top X

top =

2256

Stack Allocator

monotdeallocate(p3)

p3 == top Xtop =

2256

Stack Allocator

monotdeallocate(p3)

p3 == top X

top =

2356

Stack Allocator

Strict LIFO-ordering of allocations and deallocations

No way for the implementation to check whether the deallocation order is correct

Free-pointer is reset to the same pointer passed to the deallocate call

Padding bytes may be lost to internal fragmentation

2356

Stack Allocator

Strict LIFO-ordering of allocations and deallocations

No way for the implementation to check whether the deallocation order is correct

Free-pointer is reset to the same pointer passed to the deallocate call

Padding bytes may be lost to internal fragmentation

2356

Stack Allocator

Strict LIFO-ordering of allocations and deallocations

No way for the implementation to check whether the deallocation order is correct

Free-pointer is reset to the same pointer passed to the deallocate call

Padding bytes may be lost to internal fragmentation

2456

Padding

2456

Padding

2556

Alignment

0xdeadbeef =

d e a d b e e f

1101 1110 1010 1101 1011 1110 1110 1111

2656

Alignment

0xdeadbeef =

d e a d b e e f

1101 1110 1010 1101 1011 1110 1110 1111

No alignment (1-byte aligned)

2756

Alignment

0xdeadbeef =

d e a d b e e c

1101 1110 1010 1101 1011 1110 1110 1100

4-byte aligned

2856

Alignment

0xdeadbeef =

d e a d b e e 8

1101 1110 1010 1101 1011 1110 1110 1000

8-byte aligned

2956

Alignment

Alignment refers to the least-significant bits of the object address being 0

Alignment requirements are always specified in powers of 2

Each built-in C++ type has a natural alignment requirement (typicallyalignof(T) == sizeof(T))

This is why structs sometimes insert padding bytes between members

3056

Alignment

Default allocator typically returns addresses aligned to alignof(max align t)which is big enough for all built-in types

Users may extend the alignment requirement for custom data types usingalignas

void allocate(std size_t bytes

std size_t alignment )

3156

Padding

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3356

Monotonic Allocator - Extensions

Auxiliary data structure required

Runtime cost of deallocation now linear in number of allocations (amortized O(1))

Auxiliary nodes have their own alignment requirements

Where to store the auxiliary nodes

3456

Monotonic Allocator - Extensions

3456

Monotonic Allocator - Extensions

3556

Internal or External

External headers have better cache behavior when iterating the list

External headers might have stricter alignment requirements than data

Internal headers have better cache behavior when adjacent data is hot

Internal headers require managed memory to be readable (think GPUs)

Where does the storage for external headers come from Same buffer Differentbuffer How big

rArr No easy answers

3656

The Bottom Line

Even seemingly simple extensions get complicated very quickly

Donrsquot try to increase generality through clever extensionsOnly consider modifications if itrsquos a perfect fit for your use case

3756

But what if I need to reclaim memory

minusrarr Pool Allocator

3856

Pool Allocator

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4156

Pool Allocator - Diffusion

4256

Pool Allocator

Deterministic runtime cost

No external fragmentation

Easy to make thread-safe

But

Cannot serve allocations bigger than chunk size

High waste through internal fragmentation if sizes of objects vary a lot

4356

Pool Allocator - STL containers

vector only if chunk sizes match vector size

list and map are a perfect fit as the size of each node is known beforehand(though this knowledge is implementation-specific)

Similar for deque

unordered map is again complicated

4456

But what if I do need different sizes

minusrarr Multipool Allocator

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4656

Multipool Allocator

Very powerful allocator

Runtime is still deterministic if number of pools is known beforehand

Maximum amount of waste through internal fragmentation can be controlledprecisely

Difficult to set up How many pools do I need What chunk sizes What poolsizes

Solid building block for a general purpose allocator

4756

Allocator support in C++

stdvectorltT AllocatorltTgtgt v

vpush back()

This is not the class allocating the memory

4756

Allocator support in C++

stdvectorltT AllocatorltTgtgt v

vpush back()

This is not the class allocating the memory

4856

Allocator support in C++

Historically C++ used Allocators to abstract over different models of addressingmemoryAs such Allocators in C++ are ldquostatelessrdquo

In C++ an Allocator is merely a handle to a memory resource1

1Arthur OrsquoDwyer - An Allocator is a Handle to the Heap

4856

Allocator support in C++

Historically C++ used Allocators to abstract over different models of addressingmemoryAs such Allocators in C++ are ldquostatelessrdquo

In C++ an Allocator is merely a handle to a memory resource1

1Arthur OrsquoDwyer - An Allocator is a Handle to the Heap

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

5056

C++ Memory Resources2

stdpmrmemory resource - Abstract base class for all resources that can bewrapped in a stdpmrpolymorphic allocator

stdpmrnew delete resource - Global allocator

stdpmrmonotonic buffer resource - Monotonic allocator

stdpmrunsynchronized pool resource synchronized pool resource

- Multipool

stdpmrnull memory resource - Allocation always fails

2Pablo Halpern - Allocators The Good Parts

5156

Chaining

explicit monotonic_buffer_resource(

stdpmr memory_resource upstream )

Each memory resource has an upstream counterpart

If the resource runs out of memory it tries to allocate more memory from upstream

5256

Chaining

Possible uses of Chaining

Fixed-size vs dynamic storage for allocators

Combination of different allocation strategies

Injection points for special purpose allocators for debugging and profiling

5356

Therersquos no universal interface for allocators

Are size and alignment parameters passed to deallocate

Is realloc supported

How are out-of-memory errors reported

Is extended alignment supported

What is the return value for an allocation of size 0

Different memory regions for internal data structures and allocated memory

5456

Donrsquot underestimate the global allocator

Competitive performance in the general case

Security features (ASLR secure erase of freed memory)

Debugging amp Profiling (Valgrind Windows Debug Runtime)

Cache Coloring

Local allocators are no free lunch

5556

Wrapping up

No one-size-fits-all mdash Each allocator has its Achilles heel

Global allocator is a good solution for the general case

But you can do better with special allocators for special use cases in terms ofperformance3 as well as reliability

C++ has good support for local allocators but the terminology is a bit off

Different libraries have different concepts of allocators

No free lunch You need to understand your use case before you can chose theright allocator

3John Lakos - Local (Arena) Allocators

5656

Thanks for your attention

Page 20: Taming Dynamic Memory...Pool Allocator - STL containers vector only if chunk sizes match vector size list and map are a perfect t, as the size of each node is known beforehand (though

956

From Global to Local

auto p1 = allocate (42)

deallocate(p1)

1056

From Global to Local

Allocator alloc

auto p1 = allocallocate (42)

allocdeallocate(p1)

1156

Problems with default allocator

Complex runtime behavior

What is the maximum memory usageWhat is the worst-case execution time for an allocation or deallocation

Shared global state X

1256

Monotonic Allocator

auto p1 = monotallocate(3)

auto p2 = monotallocate(2)

auto p3 = monotallocate(4)

1256

Monotonic Allocator

auto p1 = monotallocate(3)

auto p2 = monotallocate(2)

auto p3 = monotallocate(4)

1256

Monotonic Allocator

auto p1 = monotallocate(3)

auto p2 = monotallocate(2)

auto p3 = monotallocate(4)

1256

Monotonic Allocator

auto p1 = monotallocate(3)

auto p2 = monotallocate(2)

auto p3 = monotallocate(4)

1356

Monotonic Allocator

monotdeallocate(p2)

monotdeallocate(p1)

monotdeallocate(p3)

auto p4 = allocate(2)

1356

Monotonic Allocator

monotdeallocate(p2)

monotdeallocate(p1)

monotdeallocate(p3)

auto p4 = allocate(2)

1356

Monotonic Allocator

monotdeallocate(p2)

monotdeallocate(p1)

monotdeallocate(p3)

auto p4 = allocate(2)

1356

Monotonic Allocator

monotdeallocate(p2)

monotdeallocate(p1)

monotdeallocate(p3)

auto p4 = allocate(2)

1356

Monotonic Allocator

monotdeallocate(p2)

monotdeallocate(p1)

monotdeallocate(p3)

auto p4 = allocate(2)

1456

Monotonic Allocator - Reclamation

monotdeallocate(p4)

monotrelease()

1456

Monotonic Allocator - Reclamation

monotdeallocate(p4)

monotrelease()

1456

Monotonic Allocator - Reclamation

monotdeallocate(p4)

monotrelease()

1456

Monotonic Allocator - Reclamation

monotdeallocate(p4)

monotrelease()

1556

Monotonic Allocator

Deterministic runtime cost

Extremely efficient

No fragmentation

Easy to implement

Trivial to make thread-safe

But

Memory can only be reclaimed all at once

1656

Where is this actually useful

Frames in a video game

Handling of a single event in an event-driven system

Cyclic execution in a real-time system

Containers that are initialized but not changed after

static state - Objects that will never be destroyed

1756

Monotonic Allocator - stdvector

1756

Monotonic Allocator - stdvector

1756

Monotonic Allocator - stdvector

1856

Monotonic Allocator - STL containers

vector should reserve final size upfront

list and map work fine but deleted elements are not reclaimed individually

deque works really well

unordered map has same problem as vector when growing but the load-factortriggered rehash is less predictable

1956

unordered map

Exact layout depends on hash function and inserted values

1956

unordered map

Exact layout depends on hash function and inserted values

1956

unordered map

Exact layout depends on hash function and inserted values

1956

unordered map

Exact layout depends on hash function and inserted values

2056

Stack Allocator

monotdeallocate(p3)

2056

Stack Allocator

monotdeallocate(p3)

2056

Stack Allocator

monotdeallocate(p3)

2056

Stack Allocator

monotdeallocate(p3)

2156

Stack Allocator

Strict LIFO-ordering of allocations and deallocations

No way for the implementation to check whether the deallocation order is correct

2256

Stack Allocator

monotdeallocate(p3)

p3 == top Xtop =

2256

Stack Allocator

monotdeallocate(p3)

p3 == top X

top =

2256

Stack Allocator

monotdeallocate(p3)

p3 == top Xtop =

2256

Stack Allocator

monotdeallocate(p3)

p3 == top X

top =

2356

Stack Allocator

Strict LIFO-ordering of allocations and deallocations

No way for the implementation to check whether the deallocation order is correct

Free-pointer is reset to the same pointer passed to the deallocate call

Padding bytes may be lost to internal fragmentation

2356

Stack Allocator

Strict LIFO-ordering of allocations and deallocations

No way for the implementation to check whether the deallocation order is correct

Free-pointer is reset to the same pointer passed to the deallocate call

Padding bytes may be lost to internal fragmentation

2356

Stack Allocator

Strict LIFO-ordering of allocations and deallocations

No way for the implementation to check whether the deallocation order is correct

Free-pointer is reset to the same pointer passed to the deallocate call

Padding bytes may be lost to internal fragmentation

2456

Padding

2456

Padding

2556

Alignment

0xdeadbeef =

d e a d b e e f

1101 1110 1010 1101 1011 1110 1110 1111

2656

Alignment

0xdeadbeef =

d e a d b e e f

1101 1110 1010 1101 1011 1110 1110 1111

No alignment (1-byte aligned)

2756

Alignment

0xdeadbeef =

d e a d b e e c

1101 1110 1010 1101 1011 1110 1110 1100

4-byte aligned

2856

Alignment

0xdeadbeef =

d e a d b e e 8

1101 1110 1010 1101 1011 1110 1110 1000

8-byte aligned

2956

Alignment

Alignment refers to the least-significant bits of the object address being 0

Alignment requirements are always specified in powers of 2

Each built-in C++ type has a natural alignment requirement (typicallyalignof(T) == sizeof(T))

This is why structs sometimes insert padding bytes between members

3056

Alignment

Default allocator typically returns addresses aligned to alignof(max align t)which is big enough for all built-in types

Users may extend the alignment requirement for custom data types usingalignas

void allocate(std size_t bytes

std size_t alignment )

3156

Padding

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3356

Monotonic Allocator - Extensions

Auxiliary data structure required

Runtime cost of deallocation now linear in number of allocations (amortized O(1))

Auxiliary nodes have their own alignment requirements

Where to store the auxiliary nodes

3456

Monotonic Allocator - Extensions

3456

Monotonic Allocator - Extensions

3556

Internal or External

External headers have better cache behavior when iterating the list

External headers might have stricter alignment requirements than data

Internal headers have better cache behavior when adjacent data is hot

Internal headers require managed memory to be readable (think GPUs)

Where does the storage for external headers come from Same buffer Differentbuffer How big

rArr No easy answers

3656

The Bottom Line

Even seemingly simple extensions get complicated very quickly

Donrsquot try to increase generality through clever extensionsOnly consider modifications if itrsquos a perfect fit for your use case

3756

But what if I need to reclaim memory

minusrarr Pool Allocator

3856

Pool Allocator

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4156

Pool Allocator - Diffusion

4256

Pool Allocator

Deterministic runtime cost

No external fragmentation

Easy to make thread-safe

But

Cannot serve allocations bigger than chunk size

High waste through internal fragmentation if sizes of objects vary a lot

4356

Pool Allocator - STL containers

vector only if chunk sizes match vector size

list and map are a perfect fit as the size of each node is known beforehand(though this knowledge is implementation-specific)

Similar for deque

unordered map is again complicated

4456

But what if I do need different sizes

minusrarr Multipool Allocator

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4656

Multipool Allocator

Very powerful allocator

Runtime is still deterministic if number of pools is known beforehand

Maximum amount of waste through internal fragmentation can be controlledprecisely

Difficult to set up How many pools do I need What chunk sizes What poolsizes

Solid building block for a general purpose allocator

4756

Allocator support in C++

stdvectorltT AllocatorltTgtgt v

vpush back()

This is not the class allocating the memory

4756

Allocator support in C++

stdvectorltT AllocatorltTgtgt v

vpush back()

This is not the class allocating the memory

4856

Allocator support in C++

Historically C++ used Allocators to abstract over different models of addressingmemoryAs such Allocators in C++ are ldquostatelessrdquo

In C++ an Allocator is merely a handle to a memory resource1

1Arthur OrsquoDwyer - An Allocator is a Handle to the Heap

4856

Allocator support in C++

Historically C++ used Allocators to abstract over different models of addressingmemoryAs such Allocators in C++ are ldquostatelessrdquo

In C++ an Allocator is merely a handle to a memory resource1

1Arthur OrsquoDwyer - An Allocator is a Handle to the Heap

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

5056

C++ Memory Resources2

stdpmrmemory resource - Abstract base class for all resources that can bewrapped in a stdpmrpolymorphic allocator

stdpmrnew delete resource - Global allocator

stdpmrmonotonic buffer resource - Monotonic allocator

stdpmrunsynchronized pool resource synchronized pool resource

- Multipool

stdpmrnull memory resource - Allocation always fails

2Pablo Halpern - Allocators The Good Parts

5156

Chaining

explicit monotonic_buffer_resource(

stdpmr memory_resource upstream )

Each memory resource has an upstream counterpart

If the resource runs out of memory it tries to allocate more memory from upstream

5256

Chaining

Possible uses of Chaining

Fixed-size vs dynamic storage for allocators

Combination of different allocation strategies

Injection points for special purpose allocators for debugging and profiling

5356

Therersquos no universal interface for allocators

Are size and alignment parameters passed to deallocate

Is realloc supported

How are out-of-memory errors reported

Is extended alignment supported

What is the return value for an allocation of size 0

Different memory regions for internal data structures and allocated memory

5456

Donrsquot underestimate the global allocator

Competitive performance in the general case

Security features (ASLR secure erase of freed memory)

Debugging amp Profiling (Valgrind Windows Debug Runtime)

Cache Coloring

Local allocators are no free lunch

5556

Wrapping up

No one-size-fits-all mdash Each allocator has its Achilles heel

Global allocator is a good solution for the general case

But you can do better with special allocators for special use cases in terms ofperformance3 as well as reliability

C++ has good support for local allocators but the terminology is a bit off

Different libraries have different concepts of allocators

No free lunch You need to understand your use case before you can chose theright allocator

3John Lakos - Local (Arena) Allocators

5656

Thanks for your attention

Page 21: Taming Dynamic Memory...Pool Allocator - STL containers vector only if chunk sizes match vector size list and map are a perfect t, as the size of each node is known beforehand (though

1056

From Global to Local

Allocator alloc

auto p1 = allocallocate (42)

allocdeallocate(p1)

1156

Problems with default allocator

Complex runtime behavior

What is the maximum memory usageWhat is the worst-case execution time for an allocation or deallocation

Shared global state X

1256

Monotonic Allocator

auto p1 = monotallocate(3)

auto p2 = monotallocate(2)

auto p3 = monotallocate(4)

1256

Monotonic Allocator

auto p1 = monotallocate(3)

auto p2 = monotallocate(2)

auto p3 = monotallocate(4)

1256

Monotonic Allocator

auto p1 = monotallocate(3)

auto p2 = monotallocate(2)

auto p3 = monotallocate(4)

1256

Monotonic Allocator

auto p1 = monotallocate(3)

auto p2 = monotallocate(2)

auto p3 = monotallocate(4)

1356

Monotonic Allocator

monotdeallocate(p2)

monotdeallocate(p1)

monotdeallocate(p3)

auto p4 = allocate(2)

1356

Monotonic Allocator

monotdeallocate(p2)

monotdeallocate(p1)

monotdeallocate(p3)

auto p4 = allocate(2)

1356

Monotonic Allocator

monotdeallocate(p2)

monotdeallocate(p1)

monotdeallocate(p3)

auto p4 = allocate(2)

1356

Monotonic Allocator

monotdeallocate(p2)

monotdeallocate(p1)

monotdeallocate(p3)

auto p4 = allocate(2)

1356

Monotonic Allocator

monotdeallocate(p2)

monotdeallocate(p1)

monotdeallocate(p3)

auto p4 = allocate(2)

1456

Monotonic Allocator - Reclamation

monotdeallocate(p4)

monotrelease()

1456

Monotonic Allocator - Reclamation

monotdeallocate(p4)

monotrelease()

1456

Monotonic Allocator - Reclamation

monotdeallocate(p4)

monotrelease()

1456

Monotonic Allocator - Reclamation

monotdeallocate(p4)

monotrelease()

1556

Monotonic Allocator

Deterministic runtime cost

Extremely efficient

No fragmentation

Easy to implement

Trivial to make thread-safe

But

Memory can only be reclaimed all at once

1656

Where is this actually useful

Frames in a video game

Handling of a single event in an event-driven system

Cyclic execution in a real-time system

Containers that are initialized but not changed after

static state - Objects that will never be destroyed

1756

Monotonic Allocator - stdvector

1756

Monotonic Allocator - stdvector

1756

Monotonic Allocator - stdvector

1856

Monotonic Allocator - STL containers

vector should reserve final size upfront

list and map work fine but deleted elements are not reclaimed individually

deque works really well

unordered map has same problem as vector when growing but the load-factortriggered rehash is less predictable

1956

unordered map

Exact layout depends on hash function and inserted values

1956

unordered map

Exact layout depends on hash function and inserted values

1956

unordered map

Exact layout depends on hash function and inserted values

1956

unordered map

Exact layout depends on hash function and inserted values

2056

Stack Allocator

monotdeallocate(p3)

2056

Stack Allocator

monotdeallocate(p3)

2056

Stack Allocator

monotdeallocate(p3)

2056

Stack Allocator

monotdeallocate(p3)

2156

Stack Allocator

Strict LIFO-ordering of allocations and deallocations

No way for the implementation to check whether the deallocation order is correct

2256

Stack Allocator

monotdeallocate(p3)

p3 == top Xtop =

2256

Stack Allocator

monotdeallocate(p3)

p3 == top X

top =

2256

Stack Allocator

monotdeallocate(p3)

p3 == top Xtop =

2256

Stack Allocator

monotdeallocate(p3)

p3 == top X

top =

2356

Stack Allocator

Strict LIFO-ordering of allocations and deallocations

No way for the implementation to check whether the deallocation order is correct

Free-pointer is reset to the same pointer passed to the deallocate call

Padding bytes may be lost to internal fragmentation

2356

Stack Allocator

Strict LIFO-ordering of allocations and deallocations

No way for the implementation to check whether the deallocation order is correct

Free-pointer is reset to the same pointer passed to the deallocate call

Padding bytes may be lost to internal fragmentation

2356

Stack Allocator

Strict LIFO-ordering of allocations and deallocations

No way for the implementation to check whether the deallocation order is correct

Free-pointer is reset to the same pointer passed to the deallocate call

Padding bytes may be lost to internal fragmentation

2456

Padding

2456

Padding

2556

Alignment

0xdeadbeef =

d e a d b e e f

1101 1110 1010 1101 1011 1110 1110 1111

2656

Alignment

0xdeadbeef =

d e a d b e e f

1101 1110 1010 1101 1011 1110 1110 1111

No alignment (1-byte aligned)

2756

Alignment

0xdeadbeef =

d e a d b e e c

1101 1110 1010 1101 1011 1110 1110 1100

4-byte aligned

2856

Alignment

0xdeadbeef =

d e a d b e e 8

1101 1110 1010 1101 1011 1110 1110 1000

8-byte aligned

2956

Alignment

Alignment refers to the least-significant bits of the object address being 0

Alignment requirements are always specified in powers of 2

Each built-in C++ type has a natural alignment requirement (typicallyalignof(T) == sizeof(T))

This is why structs sometimes insert padding bytes between members

3056

Alignment

Default allocator typically returns addresses aligned to alignof(max align t)which is big enough for all built-in types

Users may extend the alignment requirement for custom data types usingalignas

void allocate(std size_t bytes

std size_t alignment )

3156

Padding

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3356

Monotonic Allocator - Extensions

Auxiliary data structure required

Runtime cost of deallocation now linear in number of allocations (amortized O(1))

Auxiliary nodes have their own alignment requirements

Where to store the auxiliary nodes

3456

Monotonic Allocator - Extensions

3456

Monotonic Allocator - Extensions

3556

Internal or External

External headers have better cache behavior when iterating the list

External headers might have stricter alignment requirements than data

Internal headers have better cache behavior when adjacent data is hot

Internal headers require managed memory to be readable (think GPUs)

Where does the storage for external headers come from Same buffer Differentbuffer How big

rArr No easy answers

3656

The Bottom Line

Even seemingly simple extensions get complicated very quickly

Donrsquot try to increase generality through clever extensionsOnly consider modifications if itrsquos a perfect fit for your use case

3756

But what if I need to reclaim memory

minusrarr Pool Allocator

3856

Pool Allocator

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4156

Pool Allocator - Diffusion

4256

Pool Allocator

Deterministic runtime cost

No external fragmentation

Easy to make thread-safe

But

Cannot serve allocations bigger than chunk size

High waste through internal fragmentation if sizes of objects vary a lot

4356

Pool Allocator - STL containers

vector only if chunk sizes match vector size

list and map are a perfect fit as the size of each node is known beforehand(though this knowledge is implementation-specific)

Similar for deque

unordered map is again complicated

4456

But what if I do need different sizes

minusrarr Multipool Allocator

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4656

Multipool Allocator

Very powerful allocator

Runtime is still deterministic if number of pools is known beforehand

Maximum amount of waste through internal fragmentation can be controlledprecisely

Difficult to set up How many pools do I need What chunk sizes What poolsizes

Solid building block for a general purpose allocator

4756

Allocator support in C++

stdvectorltT AllocatorltTgtgt v

vpush back()

This is not the class allocating the memory

4756

Allocator support in C++

stdvectorltT AllocatorltTgtgt v

vpush back()

This is not the class allocating the memory

4856

Allocator support in C++

Historically C++ used Allocators to abstract over different models of addressingmemoryAs such Allocators in C++ are ldquostatelessrdquo

In C++ an Allocator is merely a handle to a memory resource1

1Arthur OrsquoDwyer - An Allocator is a Handle to the Heap

4856

Allocator support in C++

Historically C++ used Allocators to abstract over different models of addressingmemoryAs such Allocators in C++ are ldquostatelessrdquo

In C++ an Allocator is merely a handle to a memory resource1

1Arthur OrsquoDwyer - An Allocator is a Handle to the Heap

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

5056

C++ Memory Resources2

stdpmrmemory resource - Abstract base class for all resources that can bewrapped in a stdpmrpolymorphic allocator

stdpmrnew delete resource - Global allocator

stdpmrmonotonic buffer resource - Monotonic allocator

stdpmrunsynchronized pool resource synchronized pool resource

- Multipool

stdpmrnull memory resource - Allocation always fails

2Pablo Halpern - Allocators The Good Parts

5156

Chaining

explicit monotonic_buffer_resource(

stdpmr memory_resource upstream )

Each memory resource has an upstream counterpart

If the resource runs out of memory it tries to allocate more memory from upstream

5256

Chaining

Possible uses of Chaining

Fixed-size vs dynamic storage for allocators

Combination of different allocation strategies

Injection points for special purpose allocators for debugging and profiling

5356

Therersquos no universal interface for allocators

Are size and alignment parameters passed to deallocate

Is realloc supported

How are out-of-memory errors reported

Is extended alignment supported

What is the return value for an allocation of size 0

Different memory regions for internal data structures and allocated memory

5456

Donrsquot underestimate the global allocator

Competitive performance in the general case

Security features (ASLR secure erase of freed memory)

Debugging amp Profiling (Valgrind Windows Debug Runtime)

Cache Coloring

Local allocators are no free lunch

5556

Wrapping up

No one-size-fits-all mdash Each allocator has its Achilles heel

Global allocator is a good solution for the general case

But you can do better with special allocators for special use cases in terms ofperformance3 as well as reliability

C++ has good support for local allocators but the terminology is a bit off

Different libraries have different concepts of allocators

No free lunch You need to understand your use case before you can chose theright allocator

3John Lakos - Local (Arena) Allocators

5656

Thanks for your attention

Page 22: Taming Dynamic Memory...Pool Allocator - STL containers vector only if chunk sizes match vector size list and map are a perfect t, as the size of each node is known beforehand (though

1156

Problems with default allocator

Complex runtime behavior

What is the maximum memory usageWhat is the worst-case execution time for an allocation or deallocation

Shared global state X

1256

Monotonic Allocator

auto p1 = monotallocate(3)

auto p2 = monotallocate(2)

auto p3 = monotallocate(4)

1256

Monotonic Allocator

auto p1 = monotallocate(3)

auto p2 = monotallocate(2)

auto p3 = monotallocate(4)

1256

Monotonic Allocator

auto p1 = monotallocate(3)

auto p2 = monotallocate(2)

auto p3 = monotallocate(4)

1256

Monotonic Allocator

auto p1 = monotallocate(3)

auto p2 = monotallocate(2)

auto p3 = monotallocate(4)

1356

Monotonic Allocator

monotdeallocate(p2)

monotdeallocate(p1)

monotdeallocate(p3)

auto p4 = allocate(2)

1356

Monotonic Allocator

monotdeallocate(p2)

monotdeallocate(p1)

monotdeallocate(p3)

auto p4 = allocate(2)

1356

Monotonic Allocator

monotdeallocate(p2)

monotdeallocate(p1)

monotdeallocate(p3)

auto p4 = allocate(2)

1356

Monotonic Allocator

monotdeallocate(p2)

monotdeallocate(p1)

monotdeallocate(p3)

auto p4 = allocate(2)

1356

Monotonic Allocator

monotdeallocate(p2)

monotdeallocate(p1)

monotdeallocate(p3)

auto p4 = allocate(2)

1456

Monotonic Allocator - Reclamation

monotdeallocate(p4)

monotrelease()

1456

Monotonic Allocator - Reclamation

monotdeallocate(p4)

monotrelease()

1456

Monotonic Allocator - Reclamation

monotdeallocate(p4)

monotrelease()

1456

Monotonic Allocator - Reclamation

monotdeallocate(p4)

monotrelease()

1556

Monotonic Allocator

Deterministic runtime cost

Extremely efficient

No fragmentation

Easy to implement

Trivial to make thread-safe

But

Memory can only be reclaimed all at once

1656

Where is this actually useful

Frames in a video game

Handling of a single event in an event-driven system

Cyclic execution in a real-time system

Containers that are initialized but not changed after

static state - Objects that will never be destroyed

1756

Monotonic Allocator - stdvector

1756

Monotonic Allocator - stdvector

1756

Monotonic Allocator - stdvector

1856

Monotonic Allocator - STL containers

vector should reserve final size upfront

list and map work fine but deleted elements are not reclaimed individually

deque works really well

unordered map has same problem as vector when growing but the load-factortriggered rehash is less predictable

1956

unordered map

Exact layout depends on hash function and inserted values

1956

unordered map

Exact layout depends on hash function and inserted values

1956

unordered map

Exact layout depends on hash function and inserted values

1956

unordered map

Exact layout depends on hash function and inserted values

2056

Stack Allocator

monotdeallocate(p3)

2056

Stack Allocator

monotdeallocate(p3)

2056

Stack Allocator

monotdeallocate(p3)

2056

Stack Allocator

monotdeallocate(p3)

2156

Stack Allocator

Strict LIFO-ordering of allocations and deallocations

No way for the implementation to check whether the deallocation order is correct

2256

Stack Allocator

monotdeallocate(p3)

p3 == top Xtop =

2256

Stack Allocator

monotdeallocate(p3)

p3 == top X

top =

2256

Stack Allocator

monotdeallocate(p3)

p3 == top Xtop =

2256

Stack Allocator

monotdeallocate(p3)

p3 == top X

top =

2356

Stack Allocator

Strict LIFO-ordering of allocations and deallocations

No way for the implementation to check whether the deallocation order is correct

Free-pointer is reset to the same pointer passed to the deallocate call

Padding bytes may be lost to internal fragmentation

2356

Stack Allocator

Strict LIFO-ordering of allocations and deallocations

No way for the implementation to check whether the deallocation order is correct

Free-pointer is reset to the same pointer passed to the deallocate call

Padding bytes may be lost to internal fragmentation

2356

Stack Allocator

Strict LIFO-ordering of allocations and deallocations

No way for the implementation to check whether the deallocation order is correct

Free-pointer is reset to the same pointer passed to the deallocate call

Padding bytes may be lost to internal fragmentation

2456

Padding

2456

Padding

2556

Alignment

0xdeadbeef =

d e a d b e e f

1101 1110 1010 1101 1011 1110 1110 1111

2656

Alignment

0xdeadbeef =

d e a d b e e f

1101 1110 1010 1101 1011 1110 1110 1111

No alignment (1-byte aligned)

2756

Alignment

0xdeadbeef =

d e a d b e e c

1101 1110 1010 1101 1011 1110 1110 1100

4-byte aligned

2856

Alignment

0xdeadbeef =

d e a d b e e 8

1101 1110 1010 1101 1011 1110 1110 1000

8-byte aligned

2956

Alignment

Alignment refers to the least-significant bits of the object address being 0

Alignment requirements are always specified in powers of 2

Each built-in C++ type has a natural alignment requirement (typicallyalignof(T) == sizeof(T))

This is why structs sometimes insert padding bytes between members

3056

Alignment

Default allocator typically returns addresses aligned to alignof(max align t)which is big enough for all built-in types

Users may extend the alignment requirement for custom data types usingalignas

void allocate(std size_t bytes

std size_t alignment )

3156

Padding

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3356

Monotonic Allocator - Extensions

Auxiliary data structure required

Runtime cost of deallocation now linear in number of allocations (amortized O(1))

Auxiliary nodes have their own alignment requirements

Where to store the auxiliary nodes

3456

Monotonic Allocator - Extensions

3456

Monotonic Allocator - Extensions

3556

Internal or External

External headers have better cache behavior when iterating the list

External headers might have stricter alignment requirements than data

Internal headers have better cache behavior when adjacent data is hot

Internal headers require managed memory to be readable (think GPUs)

Where does the storage for external headers come from Same buffer Differentbuffer How big

rArr No easy answers

3656

The Bottom Line

Even seemingly simple extensions get complicated very quickly

Donrsquot try to increase generality through clever extensionsOnly consider modifications if itrsquos a perfect fit for your use case

3756

But what if I need to reclaim memory

minusrarr Pool Allocator

3856

Pool Allocator

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4156

Pool Allocator - Diffusion

4256

Pool Allocator

Deterministic runtime cost

No external fragmentation

Easy to make thread-safe

But

Cannot serve allocations bigger than chunk size

High waste through internal fragmentation if sizes of objects vary a lot

4356

Pool Allocator - STL containers

vector only if chunk sizes match vector size

list and map are a perfect fit as the size of each node is known beforehand(though this knowledge is implementation-specific)

Similar for deque

unordered map is again complicated

4456

But what if I do need different sizes

minusrarr Multipool Allocator

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4656

Multipool Allocator

Very powerful allocator

Runtime is still deterministic if number of pools is known beforehand

Maximum amount of waste through internal fragmentation can be controlledprecisely

Difficult to set up How many pools do I need What chunk sizes What poolsizes

Solid building block for a general purpose allocator

4756

Allocator support in C++

stdvectorltT AllocatorltTgtgt v

vpush back()

This is not the class allocating the memory

4756

Allocator support in C++

stdvectorltT AllocatorltTgtgt v

vpush back()

This is not the class allocating the memory

4856

Allocator support in C++

Historically C++ used Allocators to abstract over different models of addressingmemoryAs such Allocators in C++ are ldquostatelessrdquo

In C++ an Allocator is merely a handle to a memory resource1

1Arthur OrsquoDwyer - An Allocator is a Handle to the Heap

4856

Allocator support in C++

Historically C++ used Allocators to abstract over different models of addressingmemoryAs such Allocators in C++ are ldquostatelessrdquo

In C++ an Allocator is merely a handle to a memory resource1

1Arthur OrsquoDwyer - An Allocator is a Handle to the Heap

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

5056

C++ Memory Resources2

stdpmrmemory resource - Abstract base class for all resources that can bewrapped in a stdpmrpolymorphic allocator

stdpmrnew delete resource - Global allocator

stdpmrmonotonic buffer resource - Monotonic allocator

stdpmrunsynchronized pool resource synchronized pool resource

- Multipool

stdpmrnull memory resource - Allocation always fails

2Pablo Halpern - Allocators The Good Parts

5156

Chaining

explicit monotonic_buffer_resource(

stdpmr memory_resource upstream )

Each memory resource has an upstream counterpart

If the resource runs out of memory it tries to allocate more memory from upstream

5256

Chaining

Possible uses of Chaining

Fixed-size vs dynamic storage for allocators

Combination of different allocation strategies

Injection points for special purpose allocators for debugging and profiling

5356

Therersquos no universal interface for allocators

Are size and alignment parameters passed to deallocate

Is realloc supported

How are out-of-memory errors reported

Is extended alignment supported

What is the return value for an allocation of size 0

Different memory regions for internal data structures and allocated memory

5456

Donrsquot underestimate the global allocator

Competitive performance in the general case

Security features (ASLR secure erase of freed memory)

Debugging amp Profiling (Valgrind Windows Debug Runtime)

Cache Coloring

Local allocators are no free lunch

5556

Wrapping up

No one-size-fits-all mdash Each allocator has its Achilles heel

Global allocator is a good solution for the general case

But you can do better with special allocators for special use cases in terms ofperformance3 as well as reliability

C++ has good support for local allocators but the terminology is a bit off

Different libraries have different concepts of allocators

No free lunch You need to understand your use case before you can chose theright allocator

3John Lakos - Local (Arena) Allocators

5656

Thanks for your attention

Page 23: Taming Dynamic Memory...Pool Allocator - STL containers vector only if chunk sizes match vector size list and map are a perfect t, as the size of each node is known beforehand (though

1256

Monotonic Allocator

auto p1 = monotallocate(3)

auto p2 = monotallocate(2)

auto p3 = monotallocate(4)

1256

Monotonic Allocator

auto p1 = monotallocate(3)

auto p2 = monotallocate(2)

auto p3 = monotallocate(4)

1256

Monotonic Allocator

auto p1 = monotallocate(3)

auto p2 = monotallocate(2)

auto p3 = monotallocate(4)

1256

Monotonic Allocator

auto p1 = monotallocate(3)

auto p2 = monotallocate(2)

auto p3 = monotallocate(4)

1356

Monotonic Allocator

monotdeallocate(p2)

monotdeallocate(p1)

monotdeallocate(p3)

auto p4 = allocate(2)

1356

Monotonic Allocator

monotdeallocate(p2)

monotdeallocate(p1)

monotdeallocate(p3)

auto p4 = allocate(2)

1356

Monotonic Allocator

monotdeallocate(p2)

monotdeallocate(p1)

monotdeallocate(p3)

auto p4 = allocate(2)

1356

Monotonic Allocator

monotdeallocate(p2)

monotdeallocate(p1)

monotdeallocate(p3)

auto p4 = allocate(2)

1356

Monotonic Allocator

monotdeallocate(p2)

monotdeallocate(p1)

monotdeallocate(p3)

auto p4 = allocate(2)

1456

Monotonic Allocator - Reclamation

monotdeallocate(p4)

monotrelease()

1456

Monotonic Allocator - Reclamation

monotdeallocate(p4)

monotrelease()

1456

Monotonic Allocator - Reclamation

monotdeallocate(p4)

monotrelease()

1456

Monotonic Allocator - Reclamation

monotdeallocate(p4)

monotrelease()

1556

Monotonic Allocator

Deterministic runtime cost

Extremely efficient

No fragmentation

Easy to implement

Trivial to make thread-safe

But

Memory can only be reclaimed all at once

1656

Where is this actually useful

Frames in a video game

Handling of a single event in an event-driven system

Cyclic execution in a real-time system

Containers that are initialized but not changed after

static state - Objects that will never be destroyed

1756

Monotonic Allocator - stdvector

1756

Monotonic Allocator - stdvector

1756

Monotonic Allocator - stdvector

1856

Monotonic Allocator - STL containers

vector should reserve final size upfront

list and map work fine but deleted elements are not reclaimed individually

deque works really well

unordered map has same problem as vector when growing but the load-factortriggered rehash is less predictable

1956

unordered map

Exact layout depends on hash function and inserted values

1956

unordered map

Exact layout depends on hash function and inserted values

1956

unordered map

Exact layout depends on hash function and inserted values

1956

unordered map

Exact layout depends on hash function and inserted values

2056

Stack Allocator

monotdeallocate(p3)

2056

Stack Allocator

monotdeallocate(p3)

2056

Stack Allocator

monotdeallocate(p3)

2056

Stack Allocator

monotdeallocate(p3)

2156

Stack Allocator

Strict LIFO-ordering of allocations and deallocations

No way for the implementation to check whether the deallocation order is correct

2256

Stack Allocator

monotdeallocate(p3)

p3 == top Xtop =

2256

Stack Allocator

monotdeallocate(p3)

p3 == top X

top =

2256

Stack Allocator

monotdeallocate(p3)

p3 == top Xtop =

2256

Stack Allocator

monotdeallocate(p3)

p3 == top X

top =

2356

Stack Allocator

Strict LIFO-ordering of allocations and deallocations

No way for the implementation to check whether the deallocation order is correct

Free-pointer is reset to the same pointer passed to the deallocate call

Padding bytes may be lost to internal fragmentation

2356

Stack Allocator

Strict LIFO-ordering of allocations and deallocations

No way for the implementation to check whether the deallocation order is correct

Free-pointer is reset to the same pointer passed to the deallocate call

Padding bytes may be lost to internal fragmentation

2356

Stack Allocator

Strict LIFO-ordering of allocations and deallocations

No way for the implementation to check whether the deallocation order is correct

Free-pointer is reset to the same pointer passed to the deallocate call

Padding bytes may be lost to internal fragmentation

2456

Padding

2456

Padding

2556

Alignment

0xdeadbeef =

d e a d b e e f

1101 1110 1010 1101 1011 1110 1110 1111

2656

Alignment

0xdeadbeef =

d e a d b e e f

1101 1110 1010 1101 1011 1110 1110 1111

No alignment (1-byte aligned)

2756

Alignment

0xdeadbeef =

d e a d b e e c

1101 1110 1010 1101 1011 1110 1110 1100

4-byte aligned

2856

Alignment

0xdeadbeef =

d e a d b e e 8

1101 1110 1010 1101 1011 1110 1110 1000

8-byte aligned

2956

Alignment

Alignment refers to the least-significant bits of the object address being 0

Alignment requirements are always specified in powers of 2

Each built-in C++ type has a natural alignment requirement (typicallyalignof(T) == sizeof(T))

This is why structs sometimes insert padding bytes between members

3056

Alignment

Default allocator typically returns addresses aligned to alignof(max align t)which is big enough for all built-in types

Users may extend the alignment requirement for custom data types usingalignas

void allocate(std size_t bytes

std size_t alignment )

3156

Padding

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3356

Monotonic Allocator - Extensions

Auxiliary data structure required

Runtime cost of deallocation now linear in number of allocations (amortized O(1))

Auxiliary nodes have their own alignment requirements

Where to store the auxiliary nodes

3456

Monotonic Allocator - Extensions

3456

Monotonic Allocator - Extensions

3556

Internal or External

External headers have better cache behavior when iterating the list

External headers might have stricter alignment requirements than data

Internal headers have better cache behavior when adjacent data is hot

Internal headers require managed memory to be readable (think GPUs)

Where does the storage for external headers come from Same buffer Differentbuffer How big

rArr No easy answers

3656

The Bottom Line

Even seemingly simple extensions get complicated very quickly

Donrsquot try to increase generality through clever extensionsOnly consider modifications if itrsquos a perfect fit for your use case

3756

But what if I need to reclaim memory

minusrarr Pool Allocator

3856

Pool Allocator

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4156

Pool Allocator - Diffusion

4256

Pool Allocator

Deterministic runtime cost

No external fragmentation

Easy to make thread-safe

But

Cannot serve allocations bigger than chunk size

High waste through internal fragmentation if sizes of objects vary a lot

4356

Pool Allocator - STL containers

vector only if chunk sizes match vector size

list and map are a perfect fit as the size of each node is known beforehand(though this knowledge is implementation-specific)

Similar for deque

unordered map is again complicated

4456

But what if I do need different sizes

minusrarr Multipool Allocator

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4656

Multipool Allocator

Very powerful allocator

Runtime is still deterministic if number of pools is known beforehand

Maximum amount of waste through internal fragmentation can be controlledprecisely

Difficult to set up How many pools do I need What chunk sizes What poolsizes

Solid building block for a general purpose allocator

4756

Allocator support in C++

stdvectorltT AllocatorltTgtgt v

vpush back()

This is not the class allocating the memory

4756

Allocator support in C++

stdvectorltT AllocatorltTgtgt v

vpush back()

This is not the class allocating the memory

4856

Allocator support in C++

Historically C++ used Allocators to abstract over different models of addressingmemoryAs such Allocators in C++ are ldquostatelessrdquo

In C++ an Allocator is merely a handle to a memory resource1

1Arthur OrsquoDwyer - An Allocator is a Handle to the Heap

4856

Allocator support in C++

Historically C++ used Allocators to abstract over different models of addressingmemoryAs such Allocators in C++ are ldquostatelessrdquo

In C++ an Allocator is merely a handle to a memory resource1

1Arthur OrsquoDwyer - An Allocator is a Handle to the Heap

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

5056

C++ Memory Resources2

stdpmrmemory resource - Abstract base class for all resources that can bewrapped in a stdpmrpolymorphic allocator

stdpmrnew delete resource - Global allocator

stdpmrmonotonic buffer resource - Monotonic allocator

stdpmrunsynchronized pool resource synchronized pool resource

- Multipool

stdpmrnull memory resource - Allocation always fails

2Pablo Halpern - Allocators The Good Parts

5156

Chaining

explicit monotonic_buffer_resource(

stdpmr memory_resource upstream )

Each memory resource has an upstream counterpart

If the resource runs out of memory it tries to allocate more memory from upstream

5256

Chaining

Possible uses of Chaining

Fixed-size vs dynamic storage for allocators

Combination of different allocation strategies

Injection points for special purpose allocators for debugging and profiling

5356

Therersquos no universal interface for allocators

Are size and alignment parameters passed to deallocate

Is realloc supported

How are out-of-memory errors reported

Is extended alignment supported

What is the return value for an allocation of size 0

Different memory regions for internal data structures and allocated memory

5456

Donrsquot underestimate the global allocator

Competitive performance in the general case

Security features (ASLR secure erase of freed memory)

Debugging amp Profiling (Valgrind Windows Debug Runtime)

Cache Coloring

Local allocators are no free lunch

5556

Wrapping up

No one-size-fits-all mdash Each allocator has its Achilles heel

Global allocator is a good solution for the general case

But you can do better with special allocators for special use cases in terms ofperformance3 as well as reliability

C++ has good support for local allocators but the terminology is a bit off

Different libraries have different concepts of allocators

No free lunch You need to understand your use case before you can chose theright allocator

3John Lakos - Local (Arena) Allocators

5656

Thanks for your attention

Page 24: Taming Dynamic Memory...Pool Allocator - STL containers vector only if chunk sizes match vector size list and map are a perfect t, as the size of each node is known beforehand (though

1256

Monotonic Allocator

auto p1 = monotallocate(3)

auto p2 = monotallocate(2)

auto p3 = monotallocate(4)

1256

Monotonic Allocator

auto p1 = monotallocate(3)

auto p2 = monotallocate(2)

auto p3 = monotallocate(4)

1256

Monotonic Allocator

auto p1 = monotallocate(3)

auto p2 = monotallocate(2)

auto p3 = monotallocate(4)

1356

Monotonic Allocator

monotdeallocate(p2)

monotdeallocate(p1)

monotdeallocate(p3)

auto p4 = allocate(2)

1356

Monotonic Allocator

monotdeallocate(p2)

monotdeallocate(p1)

monotdeallocate(p3)

auto p4 = allocate(2)

1356

Monotonic Allocator

monotdeallocate(p2)

monotdeallocate(p1)

monotdeallocate(p3)

auto p4 = allocate(2)

1356

Monotonic Allocator

monotdeallocate(p2)

monotdeallocate(p1)

monotdeallocate(p3)

auto p4 = allocate(2)

1356

Monotonic Allocator

monotdeallocate(p2)

monotdeallocate(p1)

monotdeallocate(p3)

auto p4 = allocate(2)

1456

Monotonic Allocator - Reclamation

monotdeallocate(p4)

monotrelease()

1456

Monotonic Allocator - Reclamation

monotdeallocate(p4)

monotrelease()

1456

Monotonic Allocator - Reclamation

monotdeallocate(p4)

monotrelease()

1456

Monotonic Allocator - Reclamation

monotdeallocate(p4)

monotrelease()

1556

Monotonic Allocator

Deterministic runtime cost

Extremely efficient

No fragmentation

Easy to implement

Trivial to make thread-safe

But

Memory can only be reclaimed all at once

1656

Where is this actually useful

Frames in a video game

Handling of a single event in an event-driven system

Cyclic execution in a real-time system

Containers that are initialized but not changed after

static state - Objects that will never be destroyed

1756

Monotonic Allocator - stdvector

1756

Monotonic Allocator - stdvector

1756

Monotonic Allocator - stdvector

1856

Monotonic Allocator - STL containers

vector should reserve final size upfront

list and map work fine but deleted elements are not reclaimed individually

deque works really well

unordered map has same problem as vector when growing but the load-factortriggered rehash is less predictable

1956

unordered map

Exact layout depends on hash function and inserted values

1956

unordered map

Exact layout depends on hash function and inserted values

1956

unordered map

Exact layout depends on hash function and inserted values

1956

unordered map

Exact layout depends on hash function and inserted values

2056

Stack Allocator

monotdeallocate(p3)

2056

Stack Allocator

monotdeallocate(p3)

2056

Stack Allocator

monotdeallocate(p3)

2056

Stack Allocator

monotdeallocate(p3)

2156

Stack Allocator

Strict LIFO-ordering of allocations and deallocations

No way for the implementation to check whether the deallocation order is correct

2256

Stack Allocator

monotdeallocate(p3)

p3 == top Xtop =

2256

Stack Allocator

monotdeallocate(p3)

p3 == top X

top =

2256

Stack Allocator

monotdeallocate(p3)

p3 == top Xtop =

2256

Stack Allocator

monotdeallocate(p3)

p3 == top X

top =

2356

Stack Allocator

Strict LIFO-ordering of allocations and deallocations

No way for the implementation to check whether the deallocation order is correct

Free-pointer is reset to the same pointer passed to the deallocate call

Padding bytes may be lost to internal fragmentation

2356

Stack Allocator

Strict LIFO-ordering of allocations and deallocations

No way for the implementation to check whether the deallocation order is correct

Free-pointer is reset to the same pointer passed to the deallocate call

Padding bytes may be lost to internal fragmentation

2356

Stack Allocator

Strict LIFO-ordering of allocations and deallocations

No way for the implementation to check whether the deallocation order is correct

Free-pointer is reset to the same pointer passed to the deallocate call

Padding bytes may be lost to internal fragmentation

2456

Padding

2456

Padding

2556

Alignment

0xdeadbeef =

d e a d b e e f

1101 1110 1010 1101 1011 1110 1110 1111

2656

Alignment

0xdeadbeef =

d e a d b e e f

1101 1110 1010 1101 1011 1110 1110 1111

No alignment (1-byte aligned)

2756

Alignment

0xdeadbeef =

d e a d b e e c

1101 1110 1010 1101 1011 1110 1110 1100

4-byte aligned

2856

Alignment

0xdeadbeef =

d e a d b e e 8

1101 1110 1010 1101 1011 1110 1110 1000

8-byte aligned

2956

Alignment

Alignment refers to the least-significant bits of the object address being 0

Alignment requirements are always specified in powers of 2

Each built-in C++ type has a natural alignment requirement (typicallyalignof(T) == sizeof(T))

This is why structs sometimes insert padding bytes between members

3056

Alignment

Default allocator typically returns addresses aligned to alignof(max align t)which is big enough for all built-in types

Users may extend the alignment requirement for custom data types usingalignas

void allocate(std size_t bytes

std size_t alignment )

3156

Padding

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3356

Monotonic Allocator - Extensions

Auxiliary data structure required

Runtime cost of deallocation now linear in number of allocations (amortized O(1))

Auxiliary nodes have their own alignment requirements

Where to store the auxiliary nodes

3456

Monotonic Allocator - Extensions

3456

Monotonic Allocator - Extensions

3556

Internal or External

External headers have better cache behavior when iterating the list

External headers might have stricter alignment requirements than data

Internal headers have better cache behavior when adjacent data is hot

Internal headers require managed memory to be readable (think GPUs)

Where does the storage for external headers come from Same buffer Differentbuffer How big

rArr No easy answers

3656

The Bottom Line

Even seemingly simple extensions get complicated very quickly

Donrsquot try to increase generality through clever extensionsOnly consider modifications if itrsquos a perfect fit for your use case

3756

But what if I need to reclaim memory

minusrarr Pool Allocator

3856

Pool Allocator

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4156

Pool Allocator - Diffusion

4256

Pool Allocator

Deterministic runtime cost

No external fragmentation

Easy to make thread-safe

But

Cannot serve allocations bigger than chunk size

High waste through internal fragmentation if sizes of objects vary a lot

4356

Pool Allocator - STL containers

vector only if chunk sizes match vector size

list and map are a perfect fit as the size of each node is known beforehand(though this knowledge is implementation-specific)

Similar for deque

unordered map is again complicated

4456

But what if I do need different sizes

minusrarr Multipool Allocator

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4656

Multipool Allocator

Very powerful allocator

Runtime is still deterministic if number of pools is known beforehand

Maximum amount of waste through internal fragmentation can be controlledprecisely

Difficult to set up How many pools do I need What chunk sizes What poolsizes

Solid building block for a general purpose allocator

4756

Allocator support in C++

stdvectorltT AllocatorltTgtgt v

vpush back()

This is not the class allocating the memory

4756

Allocator support in C++

stdvectorltT AllocatorltTgtgt v

vpush back()

This is not the class allocating the memory

4856

Allocator support in C++

Historically C++ used Allocators to abstract over different models of addressingmemoryAs such Allocators in C++ are ldquostatelessrdquo

In C++ an Allocator is merely a handle to a memory resource1

1Arthur OrsquoDwyer - An Allocator is a Handle to the Heap

4856

Allocator support in C++

Historically C++ used Allocators to abstract over different models of addressingmemoryAs such Allocators in C++ are ldquostatelessrdquo

In C++ an Allocator is merely a handle to a memory resource1

1Arthur OrsquoDwyer - An Allocator is a Handle to the Heap

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

5056

C++ Memory Resources2

stdpmrmemory resource - Abstract base class for all resources that can bewrapped in a stdpmrpolymorphic allocator

stdpmrnew delete resource - Global allocator

stdpmrmonotonic buffer resource - Monotonic allocator

stdpmrunsynchronized pool resource synchronized pool resource

- Multipool

stdpmrnull memory resource - Allocation always fails

2Pablo Halpern - Allocators The Good Parts

5156

Chaining

explicit monotonic_buffer_resource(

stdpmr memory_resource upstream )

Each memory resource has an upstream counterpart

If the resource runs out of memory it tries to allocate more memory from upstream

5256

Chaining

Possible uses of Chaining

Fixed-size vs dynamic storage for allocators

Combination of different allocation strategies

Injection points for special purpose allocators for debugging and profiling

5356

Therersquos no universal interface for allocators

Are size and alignment parameters passed to deallocate

Is realloc supported

How are out-of-memory errors reported

Is extended alignment supported

What is the return value for an allocation of size 0

Different memory regions for internal data structures and allocated memory

5456

Donrsquot underestimate the global allocator

Competitive performance in the general case

Security features (ASLR secure erase of freed memory)

Debugging amp Profiling (Valgrind Windows Debug Runtime)

Cache Coloring

Local allocators are no free lunch

5556

Wrapping up

No one-size-fits-all mdash Each allocator has its Achilles heel

Global allocator is a good solution for the general case

But you can do better with special allocators for special use cases in terms ofperformance3 as well as reliability

C++ has good support for local allocators but the terminology is a bit off

Different libraries have different concepts of allocators

No free lunch You need to understand your use case before you can chose theright allocator

3John Lakos - Local (Arena) Allocators

5656

Thanks for your attention

Page 25: Taming Dynamic Memory...Pool Allocator - STL containers vector only if chunk sizes match vector size list and map are a perfect t, as the size of each node is known beforehand (though

1256

Monotonic Allocator

auto p1 = monotallocate(3)

auto p2 = monotallocate(2)

auto p3 = monotallocate(4)

1256

Monotonic Allocator

auto p1 = monotallocate(3)

auto p2 = monotallocate(2)

auto p3 = monotallocate(4)

1356

Monotonic Allocator

monotdeallocate(p2)

monotdeallocate(p1)

monotdeallocate(p3)

auto p4 = allocate(2)

1356

Monotonic Allocator

monotdeallocate(p2)

monotdeallocate(p1)

monotdeallocate(p3)

auto p4 = allocate(2)

1356

Monotonic Allocator

monotdeallocate(p2)

monotdeallocate(p1)

monotdeallocate(p3)

auto p4 = allocate(2)

1356

Monotonic Allocator

monotdeallocate(p2)

monotdeallocate(p1)

monotdeallocate(p3)

auto p4 = allocate(2)

1356

Monotonic Allocator

monotdeallocate(p2)

monotdeallocate(p1)

monotdeallocate(p3)

auto p4 = allocate(2)

1456

Monotonic Allocator - Reclamation

monotdeallocate(p4)

monotrelease()

1456

Monotonic Allocator - Reclamation

monotdeallocate(p4)

monotrelease()

1456

Monotonic Allocator - Reclamation

monotdeallocate(p4)

monotrelease()

1456

Monotonic Allocator - Reclamation

monotdeallocate(p4)

monotrelease()

1556

Monotonic Allocator

Deterministic runtime cost

Extremely efficient

No fragmentation

Easy to implement

Trivial to make thread-safe

But

Memory can only be reclaimed all at once

1656

Where is this actually useful

Frames in a video game

Handling of a single event in an event-driven system

Cyclic execution in a real-time system

Containers that are initialized but not changed after

static state - Objects that will never be destroyed

1756

Monotonic Allocator - stdvector

1756

Monotonic Allocator - stdvector

1756

Monotonic Allocator - stdvector

1856

Monotonic Allocator - STL containers

vector should reserve final size upfront

list and map work fine but deleted elements are not reclaimed individually

deque works really well

unordered map has same problem as vector when growing but the load-factortriggered rehash is less predictable

1956

unordered map

Exact layout depends on hash function and inserted values

1956

unordered map

Exact layout depends on hash function and inserted values

1956

unordered map

Exact layout depends on hash function and inserted values

1956

unordered map

Exact layout depends on hash function and inserted values

2056

Stack Allocator

monotdeallocate(p3)

2056

Stack Allocator

monotdeallocate(p3)

2056

Stack Allocator

monotdeallocate(p3)

2056

Stack Allocator

monotdeallocate(p3)

2156

Stack Allocator

Strict LIFO-ordering of allocations and deallocations

No way for the implementation to check whether the deallocation order is correct

2256

Stack Allocator

monotdeallocate(p3)

p3 == top Xtop =

2256

Stack Allocator

monotdeallocate(p3)

p3 == top X

top =

2256

Stack Allocator

monotdeallocate(p3)

p3 == top Xtop =

2256

Stack Allocator

monotdeallocate(p3)

p3 == top X

top =

2356

Stack Allocator

Strict LIFO-ordering of allocations and deallocations

No way for the implementation to check whether the deallocation order is correct

Free-pointer is reset to the same pointer passed to the deallocate call

Padding bytes may be lost to internal fragmentation

2356

Stack Allocator

Strict LIFO-ordering of allocations and deallocations

No way for the implementation to check whether the deallocation order is correct

Free-pointer is reset to the same pointer passed to the deallocate call

Padding bytes may be lost to internal fragmentation

2356

Stack Allocator

Strict LIFO-ordering of allocations and deallocations

No way for the implementation to check whether the deallocation order is correct

Free-pointer is reset to the same pointer passed to the deallocate call

Padding bytes may be lost to internal fragmentation

2456

Padding

2456

Padding

2556

Alignment

0xdeadbeef =

d e a d b e e f

1101 1110 1010 1101 1011 1110 1110 1111

2656

Alignment

0xdeadbeef =

d e a d b e e f

1101 1110 1010 1101 1011 1110 1110 1111

No alignment (1-byte aligned)

2756

Alignment

0xdeadbeef =

d e a d b e e c

1101 1110 1010 1101 1011 1110 1110 1100

4-byte aligned

2856

Alignment

0xdeadbeef =

d e a d b e e 8

1101 1110 1010 1101 1011 1110 1110 1000

8-byte aligned

2956

Alignment

Alignment refers to the least-significant bits of the object address being 0

Alignment requirements are always specified in powers of 2

Each built-in C++ type has a natural alignment requirement (typicallyalignof(T) == sizeof(T))

This is why structs sometimes insert padding bytes between members

3056

Alignment

Default allocator typically returns addresses aligned to alignof(max align t)which is big enough for all built-in types

Users may extend the alignment requirement for custom data types usingalignas

void allocate(std size_t bytes

std size_t alignment )

3156

Padding

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3356

Monotonic Allocator - Extensions

Auxiliary data structure required

Runtime cost of deallocation now linear in number of allocations (amortized O(1))

Auxiliary nodes have their own alignment requirements

Where to store the auxiliary nodes

3456

Monotonic Allocator - Extensions

3456

Monotonic Allocator - Extensions

3556

Internal or External

External headers have better cache behavior when iterating the list

External headers might have stricter alignment requirements than data

Internal headers have better cache behavior when adjacent data is hot

Internal headers require managed memory to be readable (think GPUs)

Where does the storage for external headers come from Same buffer Differentbuffer How big

rArr No easy answers

3656

The Bottom Line

Even seemingly simple extensions get complicated very quickly

Donrsquot try to increase generality through clever extensionsOnly consider modifications if itrsquos a perfect fit for your use case

3756

But what if I need to reclaim memory

minusrarr Pool Allocator

3856

Pool Allocator

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4156

Pool Allocator - Diffusion

4256

Pool Allocator

Deterministic runtime cost

No external fragmentation

Easy to make thread-safe

But

Cannot serve allocations bigger than chunk size

High waste through internal fragmentation if sizes of objects vary a lot

4356

Pool Allocator - STL containers

vector only if chunk sizes match vector size

list and map are a perfect fit as the size of each node is known beforehand(though this knowledge is implementation-specific)

Similar for deque

unordered map is again complicated

4456

But what if I do need different sizes

minusrarr Multipool Allocator

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4656

Multipool Allocator

Very powerful allocator

Runtime is still deterministic if number of pools is known beforehand

Maximum amount of waste through internal fragmentation can be controlledprecisely

Difficult to set up How many pools do I need What chunk sizes What poolsizes

Solid building block for a general purpose allocator

4756

Allocator support in C++

stdvectorltT AllocatorltTgtgt v

vpush back()

This is not the class allocating the memory

4756

Allocator support in C++

stdvectorltT AllocatorltTgtgt v

vpush back()

This is not the class allocating the memory

4856

Allocator support in C++

Historically C++ used Allocators to abstract over different models of addressingmemoryAs such Allocators in C++ are ldquostatelessrdquo

In C++ an Allocator is merely a handle to a memory resource1

1Arthur OrsquoDwyer - An Allocator is a Handle to the Heap

4856

Allocator support in C++

Historically C++ used Allocators to abstract over different models of addressingmemoryAs such Allocators in C++ are ldquostatelessrdquo

In C++ an Allocator is merely a handle to a memory resource1

1Arthur OrsquoDwyer - An Allocator is a Handle to the Heap

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

5056

C++ Memory Resources2

stdpmrmemory resource - Abstract base class for all resources that can bewrapped in a stdpmrpolymorphic allocator

stdpmrnew delete resource - Global allocator

stdpmrmonotonic buffer resource - Monotonic allocator

stdpmrunsynchronized pool resource synchronized pool resource

- Multipool

stdpmrnull memory resource - Allocation always fails

2Pablo Halpern - Allocators The Good Parts

5156

Chaining

explicit monotonic_buffer_resource(

stdpmr memory_resource upstream )

Each memory resource has an upstream counterpart

If the resource runs out of memory it tries to allocate more memory from upstream

5256

Chaining

Possible uses of Chaining

Fixed-size vs dynamic storage for allocators

Combination of different allocation strategies

Injection points for special purpose allocators for debugging and profiling

5356

Therersquos no universal interface for allocators

Are size and alignment parameters passed to deallocate

Is realloc supported

How are out-of-memory errors reported

Is extended alignment supported

What is the return value for an allocation of size 0

Different memory regions for internal data structures and allocated memory

5456

Donrsquot underestimate the global allocator

Competitive performance in the general case

Security features (ASLR secure erase of freed memory)

Debugging amp Profiling (Valgrind Windows Debug Runtime)

Cache Coloring

Local allocators are no free lunch

5556

Wrapping up

No one-size-fits-all mdash Each allocator has its Achilles heel

Global allocator is a good solution for the general case

But you can do better with special allocators for special use cases in terms ofperformance3 as well as reliability

C++ has good support for local allocators but the terminology is a bit off

Different libraries have different concepts of allocators

No free lunch You need to understand your use case before you can chose theright allocator

3John Lakos - Local (Arena) Allocators

5656

Thanks for your attention

Page 26: Taming Dynamic Memory...Pool Allocator - STL containers vector only if chunk sizes match vector size list and map are a perfect t, as the size of each node is known beforehand (though

1256

Monotonic Allocator

auto p1 = monotallocate(3)

auto p2 = monotallocate(2)

auto p3 = monotallocate(4)

1356

Monotonic Allocator

monotdeallocate(p2)

monotdeallocate(p1)

monotdeallocate(p3)

auto p4 = allocate(2)

1356

Monotonic Allocator

monotdeallocate(p2)

monotdeallocate(p1)

monotdeallocate(p3)

auto p4 = allocate(2)

1356

Monotonic Allocator

monotdeallocate(p2)

monotdeallocate(p1)

monotdeallocate(p3)

auto p4 = allocate(2)

1356

Monotonic Allocator

monotdeallocate(p2)

monotdeallocate(p1)

monotdeallocate(p3)

auto p4 = allocate(2)

1356

Monotonic Allocator

monotdeallocate(p2)

monotdeallocate(p1)

monotdeallocate(p3)

auto p4 = allocate(2)

1456

Monotonic Allocator - Reclamation

monotdeallocate(p4)

monotrelease()

1456

Monotonic Allocator - Reclamation

monotdeallocate(p4)

monotrelease()

1456

Monotonic Allocator - Reclamation

monotdeallocate(p4)

monotrelease()

1456

Monotonic Allocator - Reclamation

monotdeallocate(p4)

monotrelease()

1556

Monotonic Allocator

Deterministic runtime cost

Extremely efficient

No fragmentation

Easy to implement

Trivial to make thread-safe

But

Memory can only be reclaimed all at once

1656

Where is this actually useful

Frames in a video game

Handling of a single event in an event-driven system

Cyclic execution in a real-time system

Containers that are initialized but not changed after

static state - Objects that will never be destroyed

1756

Monotonic Allocator - stdvector

1756

Monotonic Allocator - stdvector

1756

Monotonic Allocator - stdvector

1856

Monotonic Allocator - STL containers

vector should reserve final size upfront

list and map work fine but deleted elements are not reclaimed individually

deque works really well

unordered map has same problem as vector when growing but the load-factortriggered rehash is less predictable

1956

unordered map

Exact layout depends on hash function and inserted values

1956

unordered map

Exact layout depends on hash function and inserted values

1956

unordered map

Exact layout depends on hash function and inserted values

1956

unordered map

Exact layout depends on hash function and inserted values

2056

Stack Allocator

monotdeallocate(p3)

2056

Stack Allocator

monotdeallocate(p3)

2056

Stack Allocator

monotdeallocate(p3)

2056

Stack Allocator

monotdeallocate(p3)

2156

Stack Allocator

Strict LIFO-ordering of allocations and deallocations

No way for the implementation to check whether the deallocation order is correct

2256

Stack Allocator

monotdeallocate(p3)

p3 == top Xtop =

2256

Stack Allocator

monotdeallocate(p3)

p3 == top X

top =

2256

Stack Allocator

monotdeallocate(p3)

p3 == top Xtop =

2256

Stack Allocator

monotdeallocate(p3)

p3 == top X

top =

2356

Stack Allocator

Strict LIFO-ordering of allocations and deallocations

No way for the implementation to check whether the deallocation order is correct

Free-pointer is reset to the same pointer passed to the deallocate call

Padding bytes may be lost to internal fragmentation

2356

Stack Allocator

Strict LIFO-ordering of allocations and deallocations

No way for the implementation to check whether the deallocation order is correct

Free-pointer is reset to the same pointer passed to the deallocate call

Padding bytes may be lost to internal fragmentation

2356

Stack Allocator

Strict LIFO-ordering of allocations and deallocations

No way for the implementation to check whether the deallocation order is correct

Free-pointer is reset to the same pointer passed to the deallocate call

Padding bytes may be lost to internal fragmentation

2456

Padding

2456

Padding

2556

Alignment

0xdeadbeef =

d e a d b e e f

1101 1110 1010 1101 1011 1110 1110 1111

2656

Alignment

0xdeadbeef =

d e a d b e e f

1101 1110 1010 1101 1011 1110 1110 1111

No alignment (1-byte aligned)

2756

Alignment

0xdeadbeef =

d e a d b e e c

1101 1110 1010 1101 1011 1110 1110 1100

4-byte aligned

2856

Alignment

0xdeadbeef =

d e a d b e e 8

1101 1110 1010 1101 1011 1110 1110 1000

8-byte aligned

2956

Alignment

Alignment refers to the least-significant bits of the object address being 0

Alignment requirements are always specified in powers of 2

Each built-in C++ type has a natural alignment requirement (typicallyalignof(T) == sizeof(T))

This is why structs sometimes insert padding bytes between members

3056

Alignment

Default allocator typically returns addresses aligned to alignof(max align t)which is big enough for all built-in types

Users may extend the alignment requirement for custom data types usingalignas

void allocate(std size_t bytes

std size_t alignment )

3156

Padding

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3356

Monotonic Allocator - Extensions

Auxiliary data structure required

Runtime cost of deallocation now linear in number of allocations (amortized O(1))

Auxiliary nodes have their own alignment requirements

Where to store the auxiliary nodes

3456

Monotonic Allocator - Extensions

3456

Monotonic Allocator - Extensions

3556

Internal or External

External headers have better cache behavior when iterating the list

External headers might have stricter alignment requirements than data

Internal headers have better cache behavior when adjacent data is hot

Internal headers require managed memory to be readable (think GPUs)

Where does the storage for external headers come from Same buffer Differentbuffer How big

rArr No easy answers

3656

The Bottom Line

Even seemingly simple extensions get complicated very quickly

Donrsquot try to increase generality through clever extensionsOnly consider modifications if itrsquos a perfect fit for your use case

3756

But what if I need to reclaim memory

minusrarr Pool Allocator

3856

Pool Allocator

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4156

Pool Allocator - Diffusion

4256

Pool Allocator

Deterministic runtime cost

No external fragmentation

Easy to make thread-safe

But

Cannot serve allocations bigger than chunk size

High waste through internal fragmentation if sizes of objects vary a lot

4356

Pool Allocator - STL containers

vector only if chunk sizes match vector size

list and map are a perfect fit as the size of each node is known beforehand(though this knowledge is implementation-specific)

Similar for deque

unordered map is again complicated

4456

But what if I do need different sizes

minusrarr Multipool Allocator

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4656

Multipool Allocator

Very powerful allocator

Runtime is still deterministic if number of pools is known beforehand

Maximum amount of waste through internal fragmentation can be controlledprecisely

Difficult to set up How many pools do I need What chunk sizes What poolsizes

Solid building block for a general purpose allocator

4756

Allocator support in C++

stdvectorltT AllocatorltTgtgt v

vpush back()

This is not the class allocating the memory

4756

Allocator support in C++

stdvectorltT AllocatorltTgtgt v

vpush back()

This is not the class allocating the memory

4856

Allocator support in C++

Historically C++ used Allocators to abstract over different models of addressingmemoryAs such Allocators in C++ are ldquostatelessrdquo

In C++ an Allocator is merely a handle to a memory resource1

1Arthur OrsquoDwyer - An Allocator is a Handle to the Heap

4856

Allocator support in C++

Historically C++ used Allocators to abstract over different models of addressingmemoryAs such Allocators in C++ are ldquostatelessrdquo

In C++ an Allocator is merely a handle to a memory resource1

1Arthur OrsquoDwyer - An Allocator is a Handle to the Heap

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

5056

C++ Memory Resources2

stdpmrmemory resource - Abstract base class for all resources that can bewrapped in a stdpmrpolymorphic allocator

stdpmrnew delete resource - Global allocator

stdpmrmonotonic buffer resource - Monotonic allocator

stdpmrunsynchronized pool resource synchronized pool resource

- Multipool

stdpmrnull memory resource - Allocation always fails

2Pablo Halpern - Allocators The Good Parts

5156

Chaining

explicit monotonic_buffer_resource(

stdpmr memory_resource upstream )

Each memory resource has an upstream counterpart

If the resource runs out of memory it tries to allocate more memory from upstream

5256

Chaining

Possible uses of Chaining

Fixed-size vs dynamic storage for allocators

Combination of different allocation strategies

Injection points for special purpose allocators for debugging and profiling

5356

Therersquos no universal interface for allocators

Are size and alignment parameters passed to deallocate

Is realloc supported

How are out-of-memory errors reported

Is extended alignment supported

What is the return value for an allocation of size 0

Different memory regions for internal data structures and allocated memory

5456

Donrsquot underestimate the global allocator

Competitive performance in the general case

Security features (ASLR secure erase of freed memory)

Debugging amp Profiling (Valgrind Windows Debug Runtime)

Cache Coloring

Local allocators are no free lunch

5556

Wrapping up

No one-size-fits-all mdash Each allocator has its Achilles heel

Global allocator is a good solution for the general case

But you can do better with special allocators for special use cases in terms ofperformance3 as well as reliability

C++ has good support for local allocators but the terminology is a bit off

Different libraries have different concepts of allocators

No free lunch You need to understand your use case before you can chose theright allocator

3John Lakos - Local (Arena) Allocators

5656

Thanks for your attention

Page 27: Taming Dynamic Memory...Pool Allocator - STL containers vector only if chunk sizes match vector size list and map are a perfect t, as the size of each node is known beforehand (though

1356

Monotonic Allocator

monotdeallocate(p2)

monotdeallocate(p1)

monotdeallocate(p3)

auto p4 = allocate(2)

1356

Monotonic Allocator

monotdeallocate(p2)

monotdeallocate(p1)

monotdeallocate(p3)

auto p4 = allocate(2)

1356

Monotonic Allocator

monotdeallocate(p2)

monotdeallocate(p1)

monotdeallocate(p3)

auto p4 = allocate(2)

1356

Monotonic Allocator

monotdeallocate(p2)

monotdeallocate(p1)

monotdeallocate(p3)

auto p4 = allocate(2)

1356

Monotonic Allocator

monotdeallocate(p2)

monotdeallocate(p1)

monotdeallocate(p3)

auto p4 = allocate(2)

1456

Monotonic Allocator - Reclamation

monotdeallocate(p4)

monotrelease()

1456

Monotonic Allocator - Reclamation

monotdeallocate(p4)

monotrelease()

1456

Monotonic Allocator - Reclamation

monotdeallocate(p4)

monotrelease()

1456

Monotonic Allocator - Reclamation

monotdeallocate(p4)

monotrelease()

1556

Monotonic Allocator

Deterministic runtime cost

Extremely efficient

No fragmentation

Easy to implement

Trivial to make thread-safe

But

Memory can only be reclaimed all at once

1656

Where is this actually useful

Frames in a video game

Handling of a single event in an event-driven system

Cyclic execution in a real-time system

Containers that are initialized but not changed after

static state - Objects that will never be destroyed

1756

Monotonic Allocator - stdvector

1756

Monotonic Allocator - stdvector

1756

Monotonic Allocator - stdvector

1856

Monotonic Allocator - STL containers

vector should reserve final size upfront

list and map work fine but deleted elements are not reclaimed individually

deque works really well

unordered map has same problem as vector when growing but the load-factortriggered rehash is less predictable

1956

unordered map

Exact layout depends on hash function and inserted values

1956

unordered map

Exact layout depends on hash function and inserted values

1956

unordered map

Exact layout depends on hash function and inserted values

1956

unordered map

Exact layout depends on hash function and inserted values

2056

Stack Allocator

monotdeallocate(p3)

2056

Stack Allocator

monotdeallocate(p3)

2056

Stack Allocator

monotdeallocate(p3)

2056

Stack Allocator

monotdeallocate(p3)

2156

Stack Allocator

Strict LIFO-ordering of allocations and deallocations

No way for the implementation to check whether the deallocation order is correct

2256

Stack Allocator

monotdeallocate(p3)

p3 == top Xtop =

2256

Stack Allocator

monotdeallocate(p3)

p3 == top X

top =

2256

Stack Allocator

monotdeallocate(p3)

p3 == top Xtop =

2256

Stack Allocator

monotdeallocate(p3)

p3 == top X

top =

2356

Stack Allocator

Strict LIFO-ordering of allocations and deallocations

No way for the implementation to check whether the deallocation order is correct

Free-pointer is reset to the same pointer passed to the deallocate call

Padding bytes may be lost to internal fragmentation

2356

Stack Allocator

Strict LIFO-ordering of allocations and deallocations

No way for the implementation to check whether the deallocation order is correct

Free-pointer is reset to the same pointer passed to the deallocate call

Padding bytes may be lost to internal fragmentation

2356

Stack Allocator

Strict LIFO-ordering of allocations and deallocations

No way for the implementation to check whether the deallocation order is correct

Free-pointer is reset to the same pointer passed to the deallocate call

Padding bytes may be lost to internal fragmentation

2456

Padding

2456

Padding

2556

Alignment

0xdeadbeef =

d e a d b e e f

1101 1110 1010 1101 1011 1110 1110 1111

2656

Alignment

0xdeadbeef =

d e a d b e e f

1101 1110 1010 1101 1011 1110 1110 1111

No alignment (1-byte aligned)

2756

Alignment

0xdeadbeef =

d e a d b e e c

1101 1110 1010 1101 1011 1110 1110 1100

4-byte aligned

2856

Alignment

0xdeadbeef =

d e a d b e e 8

1101 1110 1010 1101 1011 1110 1110 1000

8-byte aligned

2956

Alignment

Alignment refers to the least-significant bits of the object address being 0

Alignment requirements are always specified in powers of 2

Each built-in C++ type has a natural alignment requirement (typicallyalignof(T) == sizeof(T))

This is why structs sometimes insert padding bytes between members

3056

Alignment

Default allocator typically returns addresses aligned to alignof(max align t)which is big enough for all built-in types

Users may extend the alignment requirement for custom data types usingalignas

void allocate(std size_t bytes

std size_t alignment )

3156

Padding

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3356

Monotonic Allocator - Extensions

Auxiliary data structure required

Runtime cost of deallocation now linear in number of allocations (amortized O(1))

Auxiliary nodes have their own alignment requirements

Where to store the auxiliary nodes

3456

Monotonic Allocator - Extensions

3456

Monotonic Allocator - Extensions

3556

Internal or External

External headers have better cache behavior when iterating the list

External headers might have stricter alignment requirements than data

Internal headers have better cache behavior when adjacent data is hot

Internal headers require managed memory to be readable (think GPUs)

Where does the storage for external headers come from Same buffer Differentbuffer How big

rArr No easy answers

3656

The Bottom Line

Even seemingly simple extensions get complicated very quickly

Donrsquot try to increase generality through clever extensionsOnly consider modifications if itrsquos a perfect fit for your use case

3756

But what if I need to reclaim memory

minusrarr Pool Allocator

3856

Pool Allocator

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4156

Pool Allocator - Diffusion

4256

Pool Allocator

Deterministic runtime cost

No external fragmentation

Easy to make thread-safe

But

Cannot serve allocations bigger than chunk size

High waste through internal fragmentation if sizes of objects vary a lot

4356

Pool Allocator - STL containers

vector only if chunk sizes match vector size

list and map are a perfect fit as the size of each node is known beforehand(though this knowledge is implementation-specific)

Similar for deque

unordered map is again complicated

4456

But what if I do need different sizes

minusrarr Multipool Allocator

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4656

Multipool Allocator

Very powerful allocator

Runtime is still deterministic if number of pools is known beforehand

Maximum amount of waste through internal fragmentation can be controlledprecisely

Difficult to set up How many pools do I need What chunk sizes What poolsizes

Solid building block for a general purpose allocator

4756

Allocator support in C++

stdvectorltT AllocatorltTgtgt v

vpush back()

This is not the class allocating the memory

4756

Allocator support in C++

stdvectorltT AllocatorltTgtgt v

vpush back()

This is not the class allocating the memory

4856

Allocator support in C++

Historically C++ used Allocators to abstract over different models of addressingmemoryAs such Allocators in C++ are ldquostatelessrdquo

In C++ an Allocator is merely a handle to a memory resource1

1Arthur OrsquoDwyer - An Allocator is a Handle to the Heap

4856

Allocator support in C++

Historically C++ used Allocators to abstract over different models of addressingmemoryAs such Allocators in C++ are ldquostatelessrdquo

In C++ an Allocator is merely a handle to a memory resource1

1Arthur OrsquoDwyer - An Allocator is a Handle to the Heap

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

5056

C++ Memory Resources2

stdpmrmemory resource - Abstract base class for all resources that can bewrapped in a stdpmrpolymorphic allocator

stdpmrnew delete resource - Global allocator

stdpmrmonotonic buffer resource - Monotonic allocator

stdpmrunsynchronized pool resource synchronized pool resource

- Multipool

stdpmrnull memory resource - Allocation always fails

2Pablo Halpern - Allocators The Good Parts

5156

Chaining

explicit monotonic_buffer_resource(

stdpmr memory_resource upstream )

Each memory resource has an upstream counterpart

If the resource runs out of memory it tries to allocate more memory from upstream

5256

Chaining

Possible uses of Chaining

Fixed-size vs dynamic storage for allocators

Combination of different allocation strategies

Injection points for special purpose allocators for debugging and profiling

5356

Therersquos no universal interface for allocators

Are size and alignment parameters passed to deallocate

Is realloc supported

How are out-of-memory errors reported

Is extended alignment supported

What is the return value for an allocation of size 0

Different memory regions for internal data structures and allocated memory

5456

Donrsquot underestimate the global allocator

Competitive performance in the general case

Security features (ASLR secure erase of freed memory)

Debugging amp Profiling (Valgrind Windows Debug Runtime)

Cache Coloring

Local allocators are no free lunch

5556

Wrapping up

No one-size-fits-all mdash Each allocator has its Achilles heel

Global allocator is a good solution for the general case

But you can do better with special allocators for special use cases in terms ofperformance3 as well as reliability

C++ has good support for local allocators but the terminology is a bit off

Different libraries have different concepts of allocators

No free lunch You need to understand your use case before you can chose theright allocator

3John Lakos - Local (Arena) Allocators

5656

Thanks for your attention

Page 28: Taming Dynamic Memory...Pool Allocator - STL containers vector only if chunk sizes match vector size list and map are a perfect t, as the size of each node is known beforehand (though

1356

Monotonic Allocator

monotdeallocate(p2)

monotdeallocate(p1)

monotdeallocate(p3)

auto p4 = allocate(2)

1356

Monotonic Allocator

monotdeallocate(p2)

monotdeallocate(p1)

monotdeallocate(p3)

auto p4 = allocate(2)

1356

Monotonic Allocator

monotdeallocate(p2)

monotdeallocate(p1)

monotdeallocate(p3)

auto p4 = allocate(2)

1356

Monotonic Allocator

monotdeallocate(p2)

monotdeallocate(p1)

monotdeallocate(p3)

auto p4 = allocate(2)

1456

Monotonic Allocator - Reclamation

monotdeallocate(p4)

monotrelease()

1456

Monotonic Allocator - Reclamation

monotdeallocate(p4)

monotrelease()

1456

Monotonic Allocator - Reclamation

monotdeallocate(p4)

monotrelease()

1456

Monotonic Allocator - Reclamation

monotdeallocate(p4)

monotrelease()

1556

Monotonic Allocator

Deterministic runtime cost

Extremely efficient

No fragmentation

Easy to implement

Trivial to make thread-safe

But

Memory can only be reclaimed all at once

1656

Where is this actually useful

Frames in a video game

Handling of a single event in an event-driven system

Cyclic execution in a real-time system

Containers that are initialized but not changed after

static state - Objects that will never be destroyed

1756

Monotonic Allocator - stdvector

1756

Monotonic Allocator - stdvector

1756

Monotonic Allocator - stdvector

1856

Monotonic Allocator - STL containers

vector should reserve final size upfront

list and map work fine but deleted elements are not reclaimed individually

deque works really well

unordered map has same problem as vector when growing but the load-factortriggered rehash is less predictable

1956

unordered map

Exact layout depends on hash function and inserted values

1956

unordered map

Exact layout depends on hash function and inserted values

1956

unordered map

Exact layout depends on hash function and inserted values

1956

unordered map

Exact layout depends on hash function and inserted values

2056

Stack Allocator

monotdeallocate(p3)

2056

Stack Allocator

monotdeallocate(p3)

2056

Stack Allocator

monotdeallocate(p3)

2056

Stack Allocator

monotdeallocate(p3)

2156

Stack Allocator

Strict LIFO-ordering of allocations and deallocations

No way for the implementation to check whether the deallocation order is correct

2256

Stack Allocator

monotdeallocate(p3)

p3 == top Xtop =

2256

Stack Allocator

monotdeallocate(p3)

p3 == top X

top =

2256

Stack Allocator

monotdeallocate(p3)

p3 == top Xtop =

2256

Stack Allocator

monotdeallocate(p3)

p3 == top X

top =

2356

Stack Allocator

Strict LIFO-ordering of allocations and deallocations

No way for the implementation to check whether the deallocation order is correct

Free-pointer is reset to the same pointer passed to the deallocate call

Padding bytes may be lost to internal fragmentation

2356

Stack Allocator

Strict LIFO-ordering of allocations and deallocations

No way for the implementation to check whether the deallocation order is correct

Free-pointer is reset to the same pointer passed to the deallocate call

Padding bytes may be lost to internal fragmentation

2356

Stack Allocator

Strict LIFO-ordering of allocations and deallocations

No way for the implementation to check whether the deallocation order is correct

Free-pointer is reset to the same pointer passed to the deallocate call

Padding bytes may be lost to internal fragmentation

2456

Padding

2456

Padding

2556

Alignment

0xdeadbeef =

d e a d b e e f

1101 1110 1010 1101 1011 1110 1110 1111

2656

Alignment

0xdeadbeef =

d e a d b e e f

1101 1110 1010 1101 1011 1110 1110 1111

No alignment (1-byte aligned)

2756

Alignment

0xdeadbeef =

d e a d b e e c

1101 1110 1010 1101 1011 1110 1110 1100

4-byte aligned

2856

Alignment

0xdeadbeef =

d e a d b e e 8

1101 1110 1010 1101 1011 1110 1110 1000

8-byte aligned

2956

Alignment

Alignment refers to the least-significant bits of the object address being 0

Alignment requirements are always specified in powers of 2

Each built-in C++ type has a natural alignment requirement (typicallyalignof(T) == sizeof(T))

This is why structs sometimes insert padding bytes between members

3056

Alignment

Default allocator typically returns addresses aligned to alignof(max align t)which is big enough for all built-in types

Users may extend the alignment requirement for custom data types usingalignas

void allocate(std size_t bytes

std size_t alignment )

3156

Padding

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3356

Monotonic Allocator - Extensions

Auxiliary data structure required

Runtime cost of deallocation now linear in number of allocations (amortized O(1))

Auxiliary nodes have their own alignment requirements

Where to store the auxiliary nodes

3456

Monotonic Allocator - Extensions

3456

Monotonic Allocator - Extensions

3556

Internal or External

External headers have better cache behavior when iterating the list

External headers might have stricter alignment requirements than data

Internal headers have better cache behavior when adjacent data is hot

Internal headers require managed memory to be readable (think GPUs)

Where does the storage for external headers come from Same buffer Differentbuffer How big

rArr No easy answers

3656

The Bottom Line

Even seemingly simple extensions get complicated very quickly

Donrsquot try to increase generality through clever extensionsOnly consider modifications if itrsquos a perfect fit for your use case

3756

But what if I need to reclaim memory

minusrarr Pool Allocator

3856

Pool Allocator

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4156

Pool Allocator - Diffusion

4256

Pool Allocator

Deterministic runtime cost

No external fragmentation

Easy to make thread-safe

But

Cannot serve allocations bigger than chunk size

High waste through internal fragmentation if sizes of objects vary a lot

4356

Pool Allocator - STL containers

vector only if chunk sizes match vector size

list and map are a perfect fit as the size of each node is known beforehand(though this knowledge is implementation-specific)

Similar for deque

unordered map is again complicated

4456

But what if I do need different sizes

minusrarr Multipool Allocator

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4656

Multipool Allocator

Very powerful allocator

Runtime is still deterministic if number of pools is known beforehand

Maximum amount of waste through internal fragmentation can be controlledprecisely

Difficult to set up How many pools do I need What chunk sizes What poolsizes

Solid building block for a general purpose allocator

4756

Allocator support in C++

stdvectorltT AllocatorltTgtgt v

vpush back()

This is not the class allocating the memory

4756

Allocator support in C++

stdvectorltT AllocatorltTgtgt v

vpush back()

This is not the class allocating the memory

4856

Allocator support in C++

Historically C++ used Allocators to abstract over different models of addressingmemoryAs such Allocators in C++ are ldquostatelessrdquo

In C++ an Allocator is merely a handle to a memory resource1

1Arthur OrsquoDwyer - An Allocator is a Handle to the Heap

4856

Allocator support in C++

Historically C++ used Allocators to abstract over different models of addressingmemoryAs such Allocators in C++ are ldquostatelessrdquo

In C++ an Allocator is merely a handle to a memory resource1

1Arthur OrsquoDwyer - An Allocator is a Handle to the Heap

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

5056

C++ Memory Resources2

stdpmrmemory resource - Abstract base class for all resources that can bewrapped in a stdpmrpolymorphic allocator

stdpmrnew delete resource - Global allocator

stdpmrmonotonic buffer resource - Monotonic allocator

stdpmrunsynchronized pool resource synchronized pool resource

- Multipool

stdpmrnull memory resource - Allocation always fails

2Pablo Halpern - Allocators The Good Parts

5156

Chaining

explicit monotonic_buffer_resource(

stdpmr memory_resource upstream )

Each memory resource has an upstream counterpart

If the resource runs out of memory it tries to allocate more memory from upstream

5256

Chaining

Possible uses of Chaining

Fixed-size vs dynamic storage for allocators

Combination of different allocation strategies

Injection points for special purpose allocators for debugging and profiling

5356

Therersquos no universal interface for allocators

Are size and alignment parameters passed to deallocate

Is realloc supported

How are out-of-memory errors reported

Is extended alignment supported

What is the return value for an allocation of size 0

Different memory regions for internal data structures and allocated memory

5456

Donrsquot underestimate the global allocator

Competitive performance in the general case

Security features (ASLR secure erase of freed memory)

Debugging amp Profiling (Valgrind Windows Debug Runtime)

Cache Coloring

Local allocators are no free lunch

5556

Wrapping up

No one-size-fits-all mdash Each allocator has its Achilles heel

Global allocator is a good solution for the general case

But you can do better with special allocators for special use cases in terms ofperformance3 as well as reliability

C++ has good support for local allocators but the terminology is a bit off

Different libraries have different concepts of allocators

No free lunch You need to understand your use case before you can chose theright allocator

3John Lakos - Local (Arena) Allocators

5656

Thanks for your attention

Page 29: Taming Dynamic Memory...Pool Allocator - STL containers vector only if chunk sizes match vector size list and map are a perfect t, as the size of each node is known beforehand (though

1356

Monotonic Allocator

monotdeallocate(p2)

monotdeallocate(p1)

monotdeallocate(p3)

auto p4 = allocate(2)

1356

Monotonic Allocator

monotdeallocate(p2)

monotdeallocate(p1)

monotdeallocate(p3)

auto p4 = allocate(2)

1356

Monotonic Allocator

monotdeallocate(p2)

monotdeallocate(p1)

monotdeallocate(p3)

auto p4 = allocate(2)

1456

Monotonic Allocator - Reclamation

monotdeallocate(p4)

monotrelease()

1456

Monotonic Allocator - Reclamation

monotdeallocate(p4)

monotrelease()

1456

Monotonic Allocator - Reclamation

monotdeallocate(p4)

monotrelease()

1456

Monotonic Allocator - Reclamation

monotdeallocate(p4)

monotrelease()

1556

Monotonic Allocator

Deterministic runtime cost

Extremely efficient

No fragmentation

Easy to implement

Trivial to make thread-safe

But

Memory can only be reclaimed all at once

1656

Where is this actually useful

Frames in a video game

Handling of a single event in an event-driven system

Cyclic execution in a real-time system

Containers that are initialized but not changed after

static state - Objects that will never be destroyed

1756

Monotonic Allocator - stdvector

1756

Monotonic Allocator - stdvector

1756

Monotonic Allocator - stdvector

1856

Monotonic Allocator - STL containers

vector should reserve final size upfront

list and map work fine but deleted elements are not reclaimed individually

deque works really well

unordered map has same problem as vector when growing but the load-factortriggered rehash is less predictable

1956

unordered map

Exact layout depends on hash function and inserted values

1956

unordered map

Exact layout depends on hash function and inserted values

1956

unordered map

Exact layout depends on hash function and inserted values

1956

unordered map

Exact layout depends on hash function and inserted values

2056

Stack Allocator

monotdeallocate(p3)

2056

Stack Allocator

monotdeallocate(p3)

2056

Stack Allocator

monotdeallocate(p3)

2056

Stack Allocator

monotdeallocate(p3)

2156

Stack Allocator

Strict LIFO-ordering of allocations and deallocations

No way for the implementation to check whether the deallocation order is correct

2256

Stack Allocator

monotdeallocate(p3)

p3 == top Xtop =

2256

Stack Allocator

monotdeallocate(p3)

p3 == top X

top =

2256

Stack Allocator

monotdeallocate(p3)

p3 == top Xtop =

2256

Stack Allocator

monotdeallocate(p3)

p3 == top X

top =

2356

Stack Allocator

Strict LIFO-ordering of allocations and deallocations

No way for the implementation to check whether the deallocation order is correct

Free-pointer is reset to the same pointer passed to the deallocate call

Padding bytes may be lost to internal fragmentation

2356

Stack Allocator

Strict LIFO-ordering of allocations and deallocations

No way for the implementation to check whether the deallocation order is correct

Free-pointer is reset to the same pointer passed to the deallocate call

Padding bytes may be lost to internal fragmentation

2356

Stack Allocator

Strict LIFO-ordering of allocations and deallocations

No way for the implementation to check whether the deallocation order is correct

Free-pointer is reset to the same pointer passed to the deallocate call

Padding bytes may be lost to internal fragmentation

2456

Padding

2456

Padding

2556

Alignment

0xdeadbeef =

d e a d b e e f

1101 1110 1010 1101 1011 1110 1110 1111

2656

Alignment

0xdeadbeef =

d e a d b e e f

1101 1110 1010 1101 1011 1110 1110 1111

No alignment (1-byte aligned)

2756

Alignment

0xdeadbeef =

d e a d b e e c

1101 1110 1010 1101 1011 1110 1110 1100

4-byte aligned

2856

Alignment

0xdeadbeef =

d e a d b e e 8

1101 1110 1010 1101 1011 1110 1110 1000

8-byte aligned

2956

Alignment

Alignment refers to the least-significant bits of the object address being 0

Alignment requirements are always specified in powers of 2

Each built-in C++ type has a natural alignment requirement (typicallyalignof(T) == sizeof(T))

This is why structs sometimes insert padding bytes between members

3056

Alignment

Default allocator typically returns addresses aligned to alignof(max align t)which is big enough for all built-in types

Users may extend the alignment requirement for custom data types usingalignas

void allocate(std size_t bytes

std size_t alignment )

3156

Padding

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3356

Monotonic Allocator - Extensions

Auxiliary data structure required

Runtime cost of deallocation now linear in number of allocations (amortized O(1))

Auxiliary nodes have their own alignment requirements

Where to store the auxiliary nodes

3456

Monotonic Allocator - Extensions

3456

Monotonic Allocator - Extensions

3556

Internal or External

External headers have better cache behavior when iterating the list

External headers might have stricter alignment requirements than data

Internal headers have better cache behavior when adjacent data is hot

Internal headers require managed memory to be readable (think GPUs)

Where does the storage for external headers come from Same buffer Differentbuffer How big

rArr No easy answers

3656

The Bottom Line

Even seemingly simple extensions get complicated very quickly

Donrsquot try to increase generality through clever extensionsOnly consider modifications if itrsquos a perfect fit for your use case

3756

But what if I need to reclaim memory

minusrarr Pool Allocator

3856

Pool Allocator

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4156

Pool Allocator - Diffusion

4256

Pool Allocator

Deterministic runtime cost

No external fragmentation

Easy to make thread-safe

But

Cannot serve allocations bigger than chunk size

High waste through internal fragmentation if sizes of objects vary a lot

4356

Pool Allocator - STL containers

vector only if chunk sizes match vector size

list and map are a perfect fit as the size of each node is known beforehand(though this knowledge is implementation-specific)

Similar for deque

unordered map is again complicated

4456

But what if I do need different sizes

minusrarr Multipool Allocator

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4656

Multipool Allocator

Very powerful allocator

Runtime is still deterministic if number of pools is known beforehand

Maximum amount of waste through internal fragmentation can be controlledprecisely

Difficult to set up How many pools do I need What chunk sizes What poolsizes

Solid building block for a general purpose allocator

4756

Allocator support in C++

stdvectorltT AllocatorltTgtgt v

vpush back()

This is not the class allocating the memory

4756

Allocator support in C++

stdvectorltT AllocatorltTgtgt v

vpush back()

This is not the class allocating the memory

4856

Allocator support in C++

Historically C++ used Allocators to abstract over different models of addressingmemoryAs such Allocators in C++ are ldquostatelessrdquo

In C++ an Allocator is merely a handle to a memory resource1

1Arthur OrsquoDwyer - An Allocator is a Handle to the Heap

4856

Allocator support in C++

Historically C++ used Allocators to abstract over different models of addressingmemoryAs such Allocators in C++ are ldquostatelessrdquo

In C++ an Allocator is merely a handle to a memory resource1

1Arthur OrsquoDwyer - An Allocator is a Handle to the Heap

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

5056

C++ Memory Resources2

stdpmrmemory resource - Abstract base class for all resources that can bewrapped in a stdpmrpolymorphic allocator

stdpmrnew delete resource - Global allocator

stdpmrmonotonic buffer resource - Monotonic allocator

stdpmrunsynchronized pool resource synchronized pool resource

- Multipool

stdpmrnull memory resource - Allocation always fails

2Pablo Halpern - Allocators The Good Parts

5156

Chaining

explicit monotonic_buffer_resource(

stdpmr memory_resource upstream )

Each memory resource has an upstream counterpart

If the resource runs out of memory it tries to allocate more memory from upstream

5256

Chaining

Possible uses of Chaining

Fixed-size vs dynamic storage for allocators

Combination of different allocation strategies

Injection points for special purpose allocators for debugging and profiling

5356

Therersquos no universal interface for allocators

Are size and alignment parameters passed to deallocate

Is realloc supported

How are out-of-memory errors reported

Is extended alignment supported

What is the return value for an allocation of size 0

Different memory regions for internal data structures and allocated memory

5456

Donrsquot underestimate the global allocator

Competitive performance in the general case

Security features (ASLR secure erase of freed memory)

Debugging amp Profiling (Valgrind Windows Debug Runtime)

Cache Coloring

Local allocators are no free lunch

5556

Wrapping up

No one-size-fits-all mdash Each allocator has its Achilles heel

Global allocator is a good solution for the general case

But you can do better with special allocators for special use cases in terms ofperformance3 as well as reliability

C++ has good support for local allocators but the terminology is a bit off

Different libraries have different concepts of allocators

No free lunch You need to understand your use case before you can chose theright allocator

3John Lakos - Local (Arena) Allocators

5656

Thanks for your attention

Page 30: Taming Dynamic Memory...Pool Allocator - STL containers vector only if chunk sizes match vector size list and map are a perfect t, as the size of each node is known beforehand (though

1356

Monotonic Allocator

monotdeallocate(p2)

monotdeallocate(p1)

monotdeallocate(p3)

auto p4 = allocate(2)

1356

Monotonic Allocator

monotdeallocate(p2)

monotdeallocate(p1)

monotdeallocate(p3)

auto p4 = allocate(2)

1456

Monotonic Allocator - Reclamation

monotdeallocate(p4)

monotrelease()

1456

Monotonic Allocator - Reclamation

monotdeallocate(p4)

monotrelease()

1456

Monotonic Allocator - Reclamation

monotdeallocate(p4)

monotrelease()

1456

Monotonic Allocator - Reclamation

monotdeallocate(p4)

monotrelease()

1556

Monotonic Allocator

Deterministic runtime cost

Extremely efficient

No fragmentation

Easy to implement

Trivial to make thread-safe

But

Memory can only be reclaimed all at once

1656

Where is this actually useful

Frames in a video game

Handling of a single event in an event-driven system

Cyclic execution in a real-time system

Containers that are initialized but not changed after

static state - Objects that will never be destroyed

1756

Monotonic Allocator - stdvector

1756

Monotonic Allocator - stdvector

1756

Monotonic Allocator - stdvector

1856

Monotonic Allocator - STL containers

vector should reserve final size upfront

list and map work fine but deleted elements are not reclaimed individually

deque works really well

unordered map has same problem as vector when growing but the load-factortriggered rehash is less predictable

1956

unordered map

Exact layout depends on hash function and inserted values

1956

unordered map

Exact layout depends on hash function and inserted values

1956

unordered map

Exact layout depends on hash function and inserted values

1956

unordered map

Exact layout depends on hash function and inserted values

2056

Stack Allocator

monotdeallocate(p3)

2056

Stack Allocator

monotdeallocate(p3)

2056

Stack Allocator

monotdeallocate(p3)

2056

Stack Allocator

monotdeallocate(p3)

2156

Stack Allocator

Strict LIFO-ordering of allocations and deallocations

No way for the implementation to check whether the deallocation order is correct

2256

Stack Allocator

monotdeallocate(p3)

p3 == top Xtop =

2256

Stack Allocator

monotdeallocate(p3)

p3 == top X

top =

2256

Stack Allocator

monotdeallocate(p3)

p3 == top Xtop =

2256

Stack Allocator

monotdeallocate(p3)

p3 == top X

top =

2356

Stack Allocator

Strict LIFO-ordering of allocations and deallocations

No way for the implementation to check whether the deallocation order is correct

Free-pointer is reset to the same pointer passed to the deallocate call

Padding bytes may be lost to internal fragmentation

2356

Stack Allocator

Strict LIFO-ordering of allocations and deallocations

No way for the implementation to check whether the deallocation order is correct

Free-pointer is reset to the same pointer passed to the deallocate call

Padding bytes may be lost to internal fragmentation

2356

Stack Allocator

Strict LIFO-ordering of allocations and deallocations

No way for the implementation to check whether the deallocation order is correct

Free-pointer is reset to the same pointer passed to the deallocate call

Padding bytes may be lost to internal fragmentation

2456

Padding

2456

Padding

2556

Alignment

0xdeadbeef =

d e a d b e e f

1101 1110 1010 1101 1011 1110 1110 1111

2656

Alignment

0xdeadbeef =

d e a d b e e f

1101 1110 1010 1101 1011 1110 1110 1111

No alignment (1-byte aligned)

2756

Alignment

0xdeadbeef =

d e a d b e e c

1101 1110 1010 1101 1011 1110 1110 1100

4-byte aligned

2856

Alignment

0xdeadbeef =

d e a d b e e 8

1101 1110 1010 1101 1011 1110 1110 1000

8-byte aligned

2956

Alignment

Alignment refers to the least-significant bits of the object address being 0

Alignment requirements are always specified in powers of 2

Each built-in C++ type has a natural alignment requirement (typicallyalignof(T) == sizeof(T))

This is why structs sometimes insert padding bytes between members

3056

Alignment

Default allocator typically returns addresses aligned to alignof(max align t)which is big enough for all built-in types

Users may extend the alignment requirement for custom data types usingalignas

void allocate(std size_t bytes

std size_t alignment )

3156

Padding

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3356

Monotonic Allocator - Extensions

Auxiliary data structure required

Runtime cost of deallocation now linear in number of allocations (amortized O(1))

Auxiliary nodes have their own alignment requirements

Where to store the auxiliary nodes

3456

Monotonic Allocator - Extensions

3456

Monotonic Allocator - Extensions

3556

Internal or External

External headers have better cache behavior when iterating the list

External headers might have stricter alignment requirements than data

Internal headers have better cache behavior when adjacent data is hot

Internal headers require managed memory to be readable (think GPUs)

Where does the storage for external headers come from Same buffer Differentbuffer How big

rArr No easy answers

3656

The Bottom Line

Even seemingly simple extensions get complicated very quickly

Donrsquot try to increase generality through clever extensionsOnly consider modifications if itrsquos a perfect fit for your use case

3756

But what if I need to reclaim memory

minusrarr Pool Allocator

3856

Pool Allocator

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4156

Pool Allocator - Diffusion

4256

Pool Allocator

Deterministic runtime cost

No external fragmentation

Easy to make thread-safe

But

Cannot serve allocations bigger than chunk size

High waste through internal fragmentation if sizes of objects vary a lot

4356

Pool Allocator - STL containers

vector only if chunk sizes match vector size

list and map are a perfect fit as the size of each node is known beforehand(though this knowledge is implementation-specific)

Similar for deque

unordered map is again complicated

4456

But what if I do need different sizes

minusrarr Multipool Allocator

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4656

Multipool Allocator

Very powerful allocator

Runtime is still deterministic if number of pools is known beforehand

Maximum amount of waste through internal fragmentation can be controlledprecisely

Difficult to set up How many pools do I need What chunk sizes What poolsizes

Solid building block for a general purpose allocator

4756

Allocator support in C++

stdvectorltT AllocatorltTgtgt v

vpush back()

This is not the class allocating the memory

4756

Allocator support in C++

stdvectorltT AllocatorltTgtgt v

vpush back()

This is not the class allocating the memory

4856

Allocator support in C++

Historically C++ used Allocators to abstract over different models of addressingmemoryAs such Allocators in C++ are ldquostatelessrdquo

In C++ an Allocator is merely a handle to a memory resource1

1Arthur OrsquoDwyer - An Allocator is a Handle to the Heap

4856

Allocator support in C++

Historically C++ used Allocators to abstract over different models of addressingmemoryAs such Allocators in C++ are ldquostatelessrdquo

In C++ an Allocator is merely a handle to a memory resource1

1Arthur OrsquoDwyer - An Allocator is a Handle to the Heap

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

5056

C++ Memory Resources2

stdpmrmemory resource - Abstract base class for all resources that can bewrapped in a stdpmrpolymorphic allocator

stdpmrnew delete resource - Global allocator

stdpmrmonotonic buffer resource - Monotonic allocator

stdpmrunsynchronized pool resource synchronized pool resource

- Multipool

stdpmrnull memory resource - Allocation always fails

2Pablo Halpern - Allocators The Good Parts

5156

Chaining

explicit monotonic_buffer_resource(

stdpmr memory_resource upstream )

Each memory resource has an upstream counterpart

If the resource runs out of memory it tries to allocate more memory from upstream

5256

Chaining

Possible uses of Chaining

Fixed-size vs dynamic storage for allocators

Combination of different allocation strategies

Injection points for special purpose allocators for debugging and profiling

5356

Therersquos no universal interface for allocators

Are size and alignment parameters passed to deallocate

Is realloc supported

How are out-of-memory errors reported

Is extended alignment supported

What is the return value for an allocation of size 0

Different memory regions for internal data structures and allocated memory

5456

Donrsquot underestimate the global allocator

Competitive performance in the general case

Security features (ASLR secure erase of freed memory)

Debugging amp Profiling (Valgrind Windows Debug Runtime)

Cache Coloring

Local allocators are no free lunch

5556

Wrapping up

No one-size-fits-all mdash Each allocator has its Achilles heel

Global allocator is a good solution for the general case

But you can do better with special allocators for special use cases in terms ofperformance3 as well as reliability

C++ has good support for local allocators but the terminology is a bit off

Different libraries have different concepts of allocators

No free lunch You need to understand your use case before you can chose theright allocator

3John Lakos - Local (Arena) Allocators

5656

Thanks for your attention

Page 31: Taming Dynamic Memory...Pool Allocator - STL containers vector only if chunk sizes match vector size list and map are a perfect t, as the size of each node is known beforehand (though

1356

Monotonic Allocator

monotdeallocate(p2)

monotdeallocate(p1)

monotdeallocate(p3)

auto p4 = allocate(2)

1456

Monotonic Allocator - Reclamation

monotdeallocate(p4)

monotrelease()

1456

Monotonic Allocator - Reclamation

monotdeallocate(p4)

monotrelease()

1456

Monotonic Allocator - Reclamation

monotdeallocate(p4)

monotrelease()

1456

Monotonic Allocator - Reclamation

monotdeallocate(p4)

monotrelease()

1556

Monotonic Allocator

Deterministic runtime cost

Extremely efficient

No fragmentation

Easy to implement

Trivial to make thread-safe

But

Memory can only be reclaimed all at once

1656

Where is this actually useful

Frames in a video game

Handling of a single event in an event-driven system

Cyclic execution in a real-time system

Containers that are initialized but not changed after

static state - Objects that will never be destroyed

1756

Monotonic Allocator - stdvector

1756

Monotonic Allocator - stdvector

1756

Monotonic Allocator - stdvector

1856

Monotonic Allocator - STL containers

vector should reserve final size upfront

list and map work fine but deleted elements are not reclaimed individually

deque works really well

unordered map has same problem as vector when growing but the load-factortriggered rehash is less predictable

1956

unordered map

Exact layout depends on hash function and inserted values

1956

unordered map

Exact layout depends on hash function and inserted values

1956

unordered map

Exact layout depends on hash function and inserted values

1956

unordered map

Exact layout depends on hash function and inserted values

2056

Stack Allocator

monotdeallocate(p3)

2056

Stack Allocator

monotdeallocate(p3)

2056

Stack Allocator

monotdeallocate(p3)

2056

Stack Allocator

monotdeallocate(p3)

2156

Stack Allocator

Strict LIFO-ordering of allocations and deallocations

No way for the implementation to check whether the deallocation order is correct

2256

Stack Allocator

monotdeallocate(p3)

p3 == top Xtop =

2256

Stack Allocator

monotdeallocate(p3)

p3 == top X

top =

2256

Stack Allocator

monotdeallocate(p3)

p3 == top Xtop =

2256

Stack Allocator

monotdeallocate(p3)

p3 == top X

top =

2356

Stack Allocator

Strict LIFO-ordering of allocations and deallocations

No way for the implementation to check whether the deallocation order is correct

Free-pointer is reset to the same pointer passed to the deallocate call

Padding bytes may be lost to internal fragmentation

2356

Stack Allocator

Strict LIFO-ordering of allocations and deallocations

No way for the implementation to check whether the deallocation order is correct

Free-pointer is reset to the same pointer passed to the deallocate call

Padding bytes may be lost to internal fragmentation

2356

Stack Allocator

Strict LIFO-ordering of allocations and deallocations

No way for the implementation to check whether the deallocation order is correct

Free-pointer is reset to the same pointer passed to the deallocate call

Padding bytes may be lost to internal fragmentation

2456

Padding

2456

Padding

2556

Alignment

0xdeadbeef =

d e a d b e e f

1101 1110 1010 1101 1011 1110 1110 1111

2656

Alignment

0xdeadbeef =

d e a d b e e f

1101 1110 1010 1101 1011 1110 1110 1111

No alignment (1-byte aligned)

2756

Alignment

0xdeadbeef =

d e a d b e e c

1101 1110 1010 1101 1011 1110 1110 1100

4-byte aligned

2856

Alignment

0xdeadbeef =

d e a d b e e 8

1101 1110 1010 1101 1011 1110 1110 1000

8-byte aligned

2956

Alignment

Alignment refers to the least-significant bits of the object address being 0

Alignment requirements are always specified in powers of 2

Each built-in C++ type has a natural alignment requirement (typicallyalignof(T) == sizeof(T))

This is why structs sometimes insert padding bytes between members

3056

Alignment

Default allocator typically returns addresses aligned to alignof(max align t)which is big enough for all built-in types

Users may extend the alignment requirement for custom data types usingalignas

void allocate(std size_t bytes

std size_t alignment )

3156

Padding

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3356

Monotonic Allocator - Extensions

Auxiliary data structure required

Runtime cost of deallocation now linear in number of allocations (amortized O(1))

Auxiliary nodes have their own alignment requirements

Where to store the auxiliary nodes

3456

Monotonic Allocator - Extensions

3456

Monotonic Allocator - Extensions

3556

Internal or External

External headers have better cache behavior when iterating the list

External headers might have stricter alignment requirements than data

Internal headers have better cache behavior when adjacent data is hot

Internal headers require managed memory to be readable (think GPUs)

Where does the storage for external headers come from Same buffer Differentbuffer How big

rArr No easy answers

3656

The Bottom Line

Even seemingly simple extensions get complicated very quickly

Donrsquot try to increase generality through clever extensionsOnly consider modifications if itrsquos a perfect fit for your use case

3756

But what if I need to reclaim memory

minusrarr Pool Allocator

3856

Pool Allocator

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4156

Pool Allocator - Diffusion

4256

Pool Allocator

Deterministic runtime cost

No external fragmentation

Easy to make thread-safe

But

Cannot serve allocations bigger than chunk size

High waste through internal fragmentation if sizes of objects vary a lot

4356

Pool Allocator - STL containers

vector only if chunk sizes match vector size

list and map are a perfect fit as the size of each node is known beforehand(though this knowledge is implementation-specific)

Similar for deque

unordered map is again complicated

4456

But what if I do need different sizes

minusrarr Multipool Allocator

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4656

Multipool Allocator

Very powerful allocator

Runtime is still deterministic if number of pools is known beforehand

Maximum amount of waste through internal fragmentation can be controlledprecisely

Difficult to set up How many pools do I need What chunk sizes What poolsizes

Solid building block for a general purpose allocator

4756

Allocator support in C++

stdvectorltT AllocatorltTgtgt v

vpush back()

This is not the class allocating the memory

4756

Allocator support in C++

stdvectorltT AllocatorltTgtgt v

vpush back()

This is not the class allocating the memory

4856

Allocator support in C++

Historically C++ used Allocators to abstract over different models of addressingmemoryAs such Allocators in C++ are ldquostatelessrdquo

In C++ an Allocator is merely a handle to a memory resource1

1Arthur OrsquoDwyer - An Allocator is a Handle to the Heap

4856

Allocator support in C++

Historically C++ used Allocators to abstract over different models of addressingmemoryAs such Allocators in C++ are ldquostatelessrdquo

In C++ an Allocator is merely a handle to a memory resource1

1Arthur OrsquoDwyer - An Allocator is a Handle to the Heap

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

5056

C++ Memory Resources2

stdpmrmemory resource - Abstract base class for all resources that can bewrapped in a stdpmrpolymorphic allocator

stdpmrnew delete resource - Global allocator

stdpmrmonotonic buffer resource - Monotonic allocator

stdpmrunsynchronized pool resource synchronized pool resource

- Multipool

stdpmrnull memory resource - Allocation always fails

2Pablo Halpern - Allocators The Good Parts

5156

Chaining

explicit monotonic_buffer_resource(

stdpmr memory_resource upstream )

Each memory resource has an upstream counterpart

If the resource runs out of memory it tries to allocate more memory from upstream

5256

Chaining

Possible uses of Chaining

Fixed-size vs dynamic storage for allocators

Combination of different allocation strategies

Injection points for special purpose allocators for debugging and profiling

5356

Therersquos no universal interface for allocators

Are size and alignment parameters passed to deallocate

Is realloc supported

How are out-of-memory errors reported

Is extended alignment supported

What is the return value for an allocation of size 0

Different memory regions for internal data structures and allocated memory

5456

Donrsquot underestimate the global allocator

Competitive performance in the general case

Security features (ASLR secure erase of freed memory)

Debugging amp Profiling (Valgrind Windows Debug Runtime)

Cache Coloring

Local allocators are no free lunch

5556

Wrapping up

No one-size-fits-all mdash Each allocator has its Achilles heel

Global allocator is a good solution for the general case

But you can do better with special allocators for special use cases in terms ofperformance3 as well as reliability

C++ has good support for local allocators but the terminology is a bit off

Different libraries have different concepts of allocators

No free lunch You need to understand your use case before you can chose theright allocator

3John Lakos - Local (Arena) Allocators

5656

Thanks for your attention

Page 32: Taming Dynamic Memory...Pool Allocator - STL containers vector only if chunk sizes match vector size list and map are a perfect t, as the size of each node is known beforehand (though

1456

Monotonic Allocator - Reclamation

monotdeallocate(p4)

monotrelease()

1456

Monotonic Allocator - Reclamation

monotdeallocate(p4)

monotrelease()

1456

Monotonic Allocator - Reclamation

monotdeallocate(p4)

monotrelease()

1456

Monotonic Allocator - Reclamation

monotdeallocate(p4)

monotrelease()

1556

Monotonic Allocator

Deterministic runtime cost

Extremely efficient

No fragmentation

Easy to implement

Trivial to make thread-safe

But

Memory can only be reclaimed all at once

1656

Where is this actually useful

Frames in a video game

Handling of a single event in an event-driven system

Cyclic execution in a real-time system

Containers that are initialized but not changed after

static state - Objects that will never be destroyed

1756

Monotonic Allocator - stdvector

1756

Monotonic Allocator - stdvector

1756

Monotonic Allocator - stdvector

1856

Monotonic Allocator - STL containers

vector should reserve final size upfront

list and map work fine but deleted elements are not reclaimed individually

deque works really well

unordered map has same problem as vector when growing but the load-factortriggered rehash is less predictable

1956

unordered map

Exact layout depends on hash function and inserted values

1956

unordered map

Exact layout depends on hash function and inserted values

1956

unordered map

Exact layout depends on hash function and inserted values

1956

unordered map

Exact layout depends on hash function and inserted values

2056

Stack Allocator

monotdeallocate(p3)

2056

Stack Allocator

monotdeallocate(p3)

2056

Stack Allocator

monotdeallocate(p3)

2056

Stack Allocator

monotdeallocate(p3)

2156

Stack Allocator

Strict LIFO-ordering of allocations and deallocations

No way for the implementation to check whether the deallocation order is correct

2256

Stack Allocator

monotdeallocate(p3)

p3 == top Xtop =

2256

Stack Allocator

monotdeallocate(p3)

p3 == top X

top =

2256

Stack Allocator

monotdeallocate(p3)

p3 == top Xtop =

2256

Stack Allocator

monotdeallocate(p3)

p3 == top X

top =

2356

Stack Allocator

Strict LIFO-ordering of allocations and deallocations

No way for the implementation to check whether the deallocation order is correct

Free-pointer is reset to the same pointer passed to the deallocate call

Padding bytes may be lost to internal fragmentation

2356

Stack Allocator

Strict LIFO-ordering of allocations and deallocations

No way for the implementation to check whether the deallocation order is correct

Free-pointer is reset to the same pointer passed to the deallocate call

Padding bytes may be lost to internal fragmentation

2356

Stack Allocator

Strict LIFO-ordering of allocations and deallocations

No way for the implementation to check whether the deallocation order is correct

Free-pointer is reset to the same pointer passed to the deallocate call

Padding bytes may be lost to internal fragmentation

2456

Padding

2456

Padding

2556

Alignment

0xdeadbeef =

d e a d b e e f

1101 1110 1010 1101 1011 1110 1110 1111

2656

Alignment

0xdeadbeef =

d e a d b e e f

1101 1110 1010 1101 1011 1110 1110 1111

No alignment (1-byte aligned)

2756

Alignment

0xdeadbeef =

d e a d b e e c

1101 1110 1010 1101 1011 1110 1110 1100

4-byte aligned

2856

Alignment

0xdeadbeef =

d e a d b e e 8

1101 1110 1010 1101 1011 1110 1110 1000

8-byte aligned

2956

Alignment

Alignment refers to the least-significant bits of the object address being 0

Alignment requirements are always specified in powers of 2

Each built-in C++ type has a natural alignment requirement (typicallyalignof(T) == sizeof(T))

This is why structs sometimes insert padding bytes between members

3056

Alignment

Default allocator typically returns addresses aligned to alignof(max align t)which is big enough for all built-in types

Users may extend the alignment requirement for custom data types usingalignas

void allocate(std size_t bytes

std size_t alignment )

3156

Padding

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3356

Monotonic Allocator - Extensions

Auxiliary data structure required

Runtime cost of deallocation now linear in number of allocations (amortized O(1))

Auxiliary nodes have their own alignment requirements

Where to store the auxiliary nodes

3456

Monotonic Allocator - Extensions

3456

Monotonic Allocator - Extensions

3556

Internal or External

External headers have better cache behavior when iterating the list

External headers might have stricter alignment requirements than data

Internal headers have better cache behavior when adjacent data is hot

Internal headers require managed memory to be readable (think GPUs)

Where does the storage for external headers come from Same buffer Differentbuffer How big

rArr No easy answers

3656

The Bottom Line

Even seemingly simple extensions get complicated very quickly

Donrsquot try to increase generality through clever extensionsOnly consider modifications if itrsquos a perfect fit for your use case

3756

But what if I need to reclaim memory

minusrarr Pool Allocator

3856

Pool Allocator

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4156

Pool Allocator - Diffusion

4256

Pool Allocator

Deterministic runtime cost

No external fragmentation

Easy to make thread-safe

But

Cannot serve allocations bigger than chunk size

High waste through internal fragmentation if sizes of objects vary a lot

4356

Pool Allocator - STL containers

vector only if chunk sizes match vector size

list and map are a perfect fit as the size of each node is known beforehand(though this knowledge is implementation-specific)

Similar for deque

unordered map is again complicated

4456

But what if I do need different sizes

minusrarr Multipool Allocator

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4656

Multipool Allocator

Very powerful allocator

Runtime is still deterministic if number of pools is known beforehand

Maximum amount of waste through internal fragmentation can be controlledprecisely

Difficult to set up How many pools do I need What chunk sizes What poolsizes

Solid building block for a general purpose allocator

4756

Allocator support in C++

stdvectorltT AllocatorltTgtgt v

vpush back()

This is not the class allocating the memory

4756

Allocator support in C++

stdvectorltT AllocatorltTgtgt v

vpush back()

This is not the class allocating the memory

4856

Allocator support in C++

Historically C++ used Allocators to abstract over different models of addressingmemoryAs such Allocators in C++ are ldquostatelessrdquo

In C++ an Allocator is merely a handle to a memory resource1

1Arthur OrsquoDwyer - An Allocator is a Handle to the Heap

4856

Allocator support in C++

Historically C++ used Allocators to abstract over different models of addressingmemoryAs such Allocators in C++ are ldquostatelessrdquo

In C++ an Allocator is merely a handle to a memory resource1

1Arthur OrsquoDwyer - An Allocator is a Handle to the Heap

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

5056

C++ Memory Resources2

stdpmrmemory resource - Abstract base class for all resources that can bewrapped in a stdpmrpolymorphic allocator

stdpmrnew delete resource - Global allocator

stdpmrmonotonic buffer resource - Monotonic allocator

stdpmrunsynchronized pool resource synchronized pool resource

- Multipool

stdpmrnull memory resource - Allocation always fails

2Pablo Halpern - Allocators The Good Parts

5156

Chaining

explicit monotonic_buffer_resource(

stdpmr memory_resource upstream )

Each memory resource has an upstream counterpart

If the resource runs out of memory it tries to allocate more memory from upstream

5256

Chaining

Possible uses of Chaining

Fixed-size vs dynamic storage for allocators

Combination of different allocation strategies

Injection points for special purpose allocators for debugging and profiling

5356

Therersquos no universal interface for allocators

Are size and alignment parameters passed to deallocate

Is realloc supported

How are out-of-memory errors reported

Is extended alignment supported

What is the return value for an allocation of size 0

Different memory regions for internal data structures and allocated memory

5456

Donrsquot underestimate the global allocator

Competitive performance in the general case

Security features (ASLR secure erase of freed memory)

Debugging amp Profiling (Valgrind Windows Debug Runtime)

Cache Coloring

Local allocators are no free lunch

5556

Wrapping up

No one-size-fits-all mdash Each allocator has its Achilles heel

Global allocator is a good solution for the general case

But you can do better with special allocators for special use cases in terms ofperformance3 as well as reliability

C++ has good support for local allocators but the terminology is a bit off

Different libraries have different concepts of allocators

No free lunch You need to understand your use case before you can chose theright allocator

3John Lakos - Local (Arena) Allocators

5656

Thanks for your attention

Page 33: Taming Dynamic Memory...Pool Allocator - STL containers vector only if chunk sizes match vector size list and map are a perfect t, as the size of each node is known beforehand (though

1456

Monotonic Allocator - Reclamation

monotdeallocate(p4)

monotrelease()

1456

Monotonic Allocator - Reclamation

monotdeallocate(p4)

monotrelease()

1456

Monotonic Allocator - Reclamation

monotdeallocate(p4)

monotrelease()

1556

Monotonic Allocator

Deterministic runtime cost

Extremely efficient

No fragmentation

Easy to implement

Trivial to make thread-safe

But

Memory can only be reclaimed all at once

1656

Where is this actually useful

Frames in a video game

Handling of a single event in an event-driven system

Cyclic execution in a real-time system

Containers that are initialized but not changed after

static state - Objects that will never be destroyed

1756

Monotonic Allocator - stdvector

1756

Monotonic Allocator - stdvector

1756

Monotonic Allocator - stdvector

1856

Monotonic Allocator - STL containers

vector should reserve final size upfront

list and map work fine but deleted elements are not reclaimed individually

deque works really well

unordered map has same problem as vector when growing but the load-factortriggered rehash is less predictable

1956

unordered map

Exact layout depends on hash function and inserted values

1956

unordered map

Exact layout depends on hash function and inserted values

1956

unordered map

Exact layout depends on hash function and inserted values

1956

unordered map

Exact layout depends on hash function and inserted values

2056

Stack Allocator

monotdeallocate(p3)

2056

Stack Allocator

monotdeallocate(p3)

2056

Stack Allocator

monotdeallocate(p3)

2056

Stack Allocator

monotdeallocate(p3)

2156

Stack Allocator

Strict LIFO-ordering of allocations and deallocations

No way for the implementation to check whether the deallocation order is correct

2256

Stack Allocator

monotdeallocate(p3)

p3 == top Xtop =

2256

Stack Allocator

monotdeallocate(p3)

p3 == top X

top =

2256

Stack Allocator

monotdeallocate(p3)

p3 == top Xtop =

2256

Stack Allocator

monotdeallocate(p3)

p3 == top X

top =

2356

Stack Allocator

Strict LIFO-ordering of allocations and deallocations

No way for the implementation to check whether the deallocation order is correct

Free-pointer is reset to the same pointer passed to the deallocate call

Padding bytes may be lost to internal fragmentation

2356

Stack Allocator

Strict LIFO-ordering of allocations and deallocations

No way for the implementation to check whether the deallocation order is correct

Free-pointer is reset to the same pointer passed to the deallocate call

Padding bytes may be lost to internal fragmentation

2356

Stack Allocator

Strict LIFO-ordering of allocations and deallocations

No way for the implementation to check whether the deallocation order is correct

Free-pointer is reset to the same pointer passed to the deallocate call

Padding bytes may be lost to internal fragmentation

2456

Padding

2456

Padding

2556

Alignment

0xdeadbeef =

d e a d b e e f

1101 1110 1010 1101 1011 1110 1110 1111

2656

Alignment

0xdeadbeef =

d e a d b e e f

1101 1110 1010 1101 1011 1110 1110 1111

No alignment (1-byte aligned)

2756

Alignment

0xdeadbeef =

d e a d b e e c

1101 1110 1010 1101 1011 1110 1110 1100

4-byte aligned

2856

Alignment

0xdeadbeef =

d e a d b e e 8

1101 1110 1010 1101 1011 1110 1110 1000

8-byte aligned

2956

Alignment

Alignment refers to the least-significant bits of the object address being 0

Alignment requirements are always specified in powers of 2

Each built-in C++ type has a natural alignment requirement (typicallyalignof(T) == sizeof(T))

This is why structs sometimes insert padding bytes between members

3056

Alignment

Default allocator typically returns addresses aligned to alignof(max align t)which is big enough for all built-in types

Users may extend the alignment requirement for custom data types usingalignas

void allocate(std size_t bytes

std size_t alignment )

3156

Padding

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3356

Monotonic Allocator - Extensions

Auxiliary data structure required

Runtime cost of deallocation now linear in number of allocations (amortized O(1))

Auxiliary nodes have their own alignment requirements

Where to store the auxiliary nodes

3456

Monotonic Allocator - Extensions

3456

Monotonic Allocator - Extensions

3556

Internal or External

External headers have better cache behavior when iterating the list

External headers might have stricter alignment requirements than data

Internal headers have better cache behavior when adjacent data is hot

Internal headers require managed memory to be readable (think GPUs)

Where does the storage for external headers come from Same buffer Differentbuffer How big

rArr No easy answers

3656

The Bottom Line

Even seemingly simple extensions get complicated very quickly

Donrsquot try to increase generality through clever extensionsOnly consider modifications if itrsquos a perfect fit for your use case

3756

But what if I need to reclaim memory

minusrarr Pool Allocator

3856

Pool Allocator

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4156

Pool Allocator - Diffusion

4256

Pool Allocator

Deterministic runtime cost

No external fragmentation

Easy to make thread-safe

But

Cannot serve allocations bigger than chunk size

High waste through internal fragmentation if sizes of objects vary a lot

4356

Pool Allocator - STL containers

vector only if chunk sizes match vector size

list and map are a perfect fit as the size of each node is known beforehand(though this knowledge is implementation-specific)

Similar for deque

unordered map is again complicated

4456

But what if I do need different sizes

minusrarr Multipool Allocator

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4656

Multipool Allocator

Very powerful allocator

Runtime is still deterministic if number of pools is known beforehand

Maximum amount of waste through internal fragmentation can be controlledprecisely

Difficult to set up How many pools do I need What chunk sizes What poolsizes

Solid building block for a general purpose allocator

4756

Allocator support in C++

stdvectorltT AllocatorltTgtgt v

vpush back()

This is not the class allocating the memory

4756

Allocator support in C++

stdvectorltT AllocatorltTgtgt v

vpush back()

This is not the class allocating the memory

4856

Allocator support in C++

Historically C++ used Allocators to abstract over different models of addressingmemoryAs such Allocators in C++ are ldquostatelessrdquo

In C++ an Allocator is merely a handle to a memory resource1

1Arthur OrsquoDwyer - An Allocator is a Handle to the Heap

4856

Allocator support in C++

Historically C++ used Allocators to abstract over different models of addressingmemoryAs such Allocators in C++ are ldquostatelessrdquo

In C++ an Allocator is merely a handle to a memory resource1

1Arthur OrsquoDwyer - An Allocator is a Handle to the Heap

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

5056

C++ Memory Resources2

stdpmrmemory resource - Abstract base class for all resources that can bewrapped in a stdpmrpolymorphic allocator

stdpmrnew delete resource - Global allocator

stdpmrmonotonic buffer resource - Monotonic allocator

stdpmrunsynchronized pool resource synchronized pool resource

- Multipool

stdpmrnull memory resource - Allocation always fails

2Pablo Halpern - Allocators The Good Parts

5156

Chaining

explicit monotonic_buffer_resource(

stdpmr memory_resource upstream )

Each memory resource has an upstream counterpart

If the resource runs out of memory it tries to allocate more memory from upstream

5256

Chaining

Possible uses of Chaining

Fixed-size vs dynamic storage for allocators

Combination of different allocation strategies

Injection points for special purpose allocators for debugging and profiling

5356

Therersquos no universal interface for allocators

Are size and alignment parameters passed to deallocate

Is realloc supported

How are out-of-memory errors reported

Is extended alignment supported

What is the return value for an allocation of size 0

Different memory regions for internal data structures and allocated memory

5456

Donrsquot underestimate the global allocator

Competitive performance in the general case

Security features (ASLR secure erase of freed memory)

Debugging amp Profiling (Valgrind Windows Debug Runtime)

Cache Coloring

Local allocators are no free lunch

5556

Wrapping up

No one-size-fits-all mdash Each allocator has its Achilles heel

Global allocator is a good solution for the general case

But you can do better with special allocators for special use cases in terms ofperformance3 as well as reliability

C++ has good support for local allocators but the terminology is a bit off

Different libraries have different concepts of allocators

No free lunch You need to understand your use case before you can chose theright allocator

3John Lakos - Local (Arena) Allocators

5656

Thanks for your attention

Page 34: Taming Dynamic Memory...Pool Allocator - STL containers vector only if chunk sizes match vector size list and map are a perfect t, as the size of each node is known beforehand (though

1456

Monotonic Allocator - Reclamation

monotdeallocate(p4)

monotrelease()

1456

Monotonic Allocator - Reclamation

monotdeallocate(p4)

monotrelease()

1556

Monotonic Allocator

Deterministic runtime cost

Extremely efficient

No fragmentation

Easy to implement

Trivial to make thread-safe

But

Memory can only be reclaimed all at once

1656

Where is this actually useful

Frames in a video game

Handling of a single event in an event-driven system

Cyclic execution in a real-time system

Containers that are initialized but not changed after

static state - Objects that will never be destroyed

1756

Monotonic Allocator - stdvector

1756

Monotonic Allocator - stdvector

1756

Monotonic Allocator - stdvector

1856

Monotonic Allocator - STL containers

vector should reserve final size upfront

list and map work fine but deleted elements are not reclaimed individually

deque works really well

unordered map has same problem as vector when growing but the load-factortriggered rehash is less predictable

1956

unordered map

Exact layout depends on hash function and inserted values

1956

unordered map

Exact layout depends on hash function and inserted values

1956

unordered map

Exact layout depends on hash function and inserted values

1956

unordered map

Exact layout depends on hash function and inserted values

2056

Stack Allocator

monotdeallocate(p3)

2056

Stack Allocator

monotdeallocate(p3)

2056

Stack Allocator

monotdeallocate(p3)

2056

Stack Allocator

monotdeallocate(p3)

2156

Stack Allocator

Strict LIFO-ordering of allocations and deallocations

No way for the implementation to check whether the deallocation order is correct

2256

Stack Allocator

monotdeallocate(p3)

p3 == top Xtop =

2256

Stack Allocator

monotdeallocate(p3)

p3 == top X

top =

2256

Stack Allocator

monotdeallocate(p3)

p3 == top Xtop =

2256

Stack Allocator

monotdeallocate(p3)

p3 == top X

top =

2356

Stack Allocator

Strict LIFO-ordering of allocations and deallocations

No way for the implementation to check whether the deallocation order is correct

Free-pointer is reset to the same pointer passed to the deallocate call

Padding bytes may be lost to internal fragmentation

2356

Stack Allocator

Strict LIFO-ordering of allocations and deallocations

No way for the implementation to check whether the deallocation order is correct

Free-pointer is reset to the same pointer passed to the deallocate call

Padding bytes may be lost to internal fragmentation

2356

Stack Allocator

Strict LIFO-ordering of allocations and deallocations

No way for the implementation to check whether the deallocation order is correct

Free-pointer is reset to the same pointer passed to the deallocate call

Padding bytes may be lost to internal fragmentation

2456

Padding

2456

Padding

2556

Alignment

0xdeadbeef =

d e a d b e e f

1101 1110 1010 1101 1011 1110 1110 1111

2656

Alignment

0xdeadbeef =

d e a d b e e f

1101 1110 1010 1101 1011 1110 1110 1111

No alignment (1-byte aligned)

2756

Alignment

0xdeadbeef =

d e a d b e e c

1101 1110 1010 1101 1011 1110 1110 1100

4-byte aligned

2856

Alignment

0xdeadbeef =

d e a d b e e 8

1101 1110 1010 1101 1011 1110 1110 1000

8-byte aligned

2956

Alignment

Alignment refers to the least-significant bits of the object address being 0

Alignment requirements are always specified in powers of 2

Each built-in C++ type has a natural alignment requirement (typicallyalignof(T) == sizeof(T))

This is why structs sometimes insert padding bytes between members

3056

Alignment

Default allocator typically returns addresses aligned to alignof(max align t)which is big enough for all built-in types

Users may extend the alignment requirement for custom data types usingalignas

void allocate(std size_t bytes

std size_t alignment )

3156

Padding

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3356

Monotonic Allocator - Extensions

Auxiliary data structure required

Runtime cost of deallocation now linear in number of allocations (amortized O(1))

Auxiliary nodes have their own alignment requirements

Where to store the auxiliary nodes

3456

Monotonic Allocator - Extensions

3456

Monotonic Allocator - Extensions

3556

Internal or External

External headers have better cache behavior when iterating the list

External headers might have stricter alignment requirements than data

Internal headers have better cache behavior when adjacent data is hot

Internal headers require managed memory to be readable (think GPUs)

Where does the storage for external headers come from Same buffer Differentbuffer How big

rArr No easy answers

3656

The Bottom Line

Even seemingly simple extensions get complicated very quickly

Donrsquot try to increase generality through clever extensionsOnly consider modifications if itrsquos a perfect fit for your use case

3756

But what if I need to reclaim memory

minusrarr Pool Allocator

3856

Pool Allocator

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4156

Pool Allocator - Diffusion

4256

Pool Allocator

Deterministic runtime cost

No external fragmentation

Easy to make thread-safe

But

Cannot serve allocations bigger than chunk size

High waste through internal fragmentation if sizes of objects vary a lot

4356

Pool Allocator - STL containers

vector only if chunk sizes match vector size

list and map are a perfect fit as the size of each node is known beforehand(though this knowledge is implementation-specific)

Similar for deque

unordered map is again complicated

4456

But what if I do need different sizes

minusrarr Multipool Allocator

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4656

Multipool Allocator

Very powerful allocator

Runtime is still deterministic if number of pools is known beforehand

Maximum amount of waste through internal fragmentation can be controlledprecisely

Difficult to set up How many pools do I need What chunk sizes What poolsizes

Solid building block for a general purpose allocator

4756

Allocator support in C++

stdvectorltT AllocatorltTgtgt v

vpush back()

This is not the class allocating the memory

4756

Allocator support in C++

stdvectorltT AllocatorltTgtgt v

vpush back()

This is not the class allocating the memory

4856

Allocator support in C++

Historically C++ used Allocators to abstract over different models of addressingmemoryAs such Allocators in C++ are ldquostatelessrdquo

In C++ an Allocator is merely a handle to a memory resource1

1Arthur OrsquoDwyer - An Allocator is a Handle to the Heap

4856

Allocator support in C++

Historically C++ used Allocators to abstract over different models of addressingmemoryAs such Allocators in C++ are ldquostatelessrdquo

In C++ an Allocator is merely a handle to a memory resource1

1Arthur OrsquoDwyer - An Allocator is a Handle to the Heap

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

5056

C++ Memory Resources2

stdpmrmemory resource - Abstract base class for all resources that can bewrapped in a stdpmrpolymorphic allocator

stdpmrnew delete resource - Global allocator

stdpmrmonotonic buffer resource - Monotonic allocator

stdpmrunsynchronized pool resource synchronized pool resource

- Multipool

stdpmrnull memory resource - Allocation always fails

2Pablo Halpern - Allocators The Good Parts

5156

Chaining

explicit monotonic_buffer_resource(

stdpmr memory_resource upstream )

Each memory resource has an upstream counterpart

If the resource runs out of memory it tries to allocate more memory from upstream

5256

Chaining

Possible uses of Chaining

Fixed-size vs dynamic storage for allocators

Combination of different allocation strategies

Injection points for special purpose allocators for debugging and profiling

5356

Therersquos no universal interface for allocators

Are size and alignment parameters passed to deallocate

Is realloc supported

How are out-of-memory errors reported

Is extended alignment supported

What is the return value for an allocation of size 0

Different memory regions for internal data structures and allocated memory

5456

Donrsquot underestimate the global allocator

Competitive performance in the general case

Security features (ASLR secure erase of freed memory)

Debugging amp Profiling (Valgrind Windows Debug Runtime)

Cache Coloring

Local allocators are no free lunch

5556

Wrapping up

No one-size-fits-all mdash Each allocator has its Achilles heel

Global allocator is a good solution for the general case

But you can do better with special allocators for special use cases in terms ofperformance3 as well as reliability

C++ has good support for local allocators but the terminology is a bit off

Different libraries have different concepts of allocators

No free lunch You need to understand your use case before you can chose theright allocator

3John Lakos - Local (Arena) Allocators

5656

Thanks for your attention

Page 35: Taming Dynamic Memory...Pool Allocator - STL containers vector only if chunk sizes match vector size list and map are a perfect t, as the size of each node is known beforehand (though

1456

Monotonic Allocator - Reclamation

monotdeallocate(p4)

monotrelease()

1556

Monotonic Allocator

Deterministic runtime cost

Extremely efficient

No fragmentation

Easy to implement

Trivial to make thread-safe

But

Memory can only be reclaimed all at once

1656

Where is this actually useful

Frames in a video game

Handling of a single event in an event-driven system

Cyclic execution in a real-time system

Containers that are initialized but not changed after

static state - Objects that will never be destroyed

1756

Monotonic Allocator - stdvector

1756

Monotonic Allocator - stdvector

1756

Monotonic Allocator - stdvector

1856

Monotonic Allocator - STL containers

vector should reserve final size upfront

list and map work fine but deleted elements are not reclaimed individually

deque works really well

unordered map has same problem as vector when growing but the load-factortriggered rehash is less predictable

1956

unordered map

Exact layout depends on hash function and inserted values

1956

unordered map

Exact layout depends on hash function and inserted values

1956

unordered map

Exact layout depends on hash function and inserted values

1956

unordered map

Exact layout depends on hash function and inserted values

2056

Stack Allocator

monotdeallocate(p3)

2056

Stack Allocator

monotdeallocate(p3)

2056

Stack Allocator

monotdeallocate(p3)

2056

Stack Allocator

monotdeallocate(p3)

2156

Stack Allocator

Strict LIFO-ordering of allocations and deallocations

No way for the implementation to check whether the deallocation order is correct

2256

Stack Allocator

monotdeallocate(p3)

p3 == top Xtop =

2256

Stack Allocator

monotdeallocate(p3)

p3 == top X

top =

2256

Stack Allocator

monotdeallocate(p3)

p3 == top Xtop =

2256

Stack Allocator

monotdeallocate(p3)

p3 == top X

top =

2356

Stack Allocator

Strict LIFO-ordering of allocations and deallocations

No way for the implementation to check whether the deallocation order is correct

Free-pointer is reset to the same pointer passed to the deallocate call

Padding bytes may be lost to internal fragmentation

2356

Stack Allocator

Strict LIFO-ordering of allocations and deallocations

No way for the implementation to check whether the deallocation order is correct

Free-pointer is reset to the same pointer passed to the deallocate call

Padding bytes may be lost to internal fragmentation

2356

Stack Allocator

Strict LIFO-ordering of allocations and deallocations

No way for the implementation to check whether the deallocation order is correct

Free-pointer is reset to the same pointer passed to the deallocate call

Padding bytes may be lost to internal fragmentation

2456

Padding

2456

Padding

2556

Alignment

0xdeadbeef =

d e a d b e e f

1101 1110 1010 1101 1011 1110 1110 1111

2656

Alignment

0xdeadbeef =

d e a d b e e f

1101 1110 1010 1101 1011 1110 1110 1111

No alignment (1-byte aligned)

2756

Alignment

0xdeadbeef =

d e a d b e e c

1101 1110 1010 1101 1011 1110 1110 1100

4-byte aligned

2856

Alignment

0xdeadbeef =

d e a d b e e 8

1101 1110 1010 1101 1011 1110 1110 1000

8-byte aligned

2956

Alignment

Alignment refers to the least-significant bits of the object address being 0

Alignment requirements are always specified in powers of 2

Each built-in C++ type has a natural alignment requirement (typicallyalignof(T) == sizeof(T))

This is why structs sometimes insert padding bytes between members

3056

Alignment

Default allocator typically returns addresses aligned to alignof(max align t)which is big enough for all built-in types

Users may extend the alignment requirement for custom data types usingalignas

void allocate(std size_t bytes

std size_t alignment )

3156

Padding

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3356

Monotonic Allocator - Extensions

Auxiliary data structure required

Runtime cost of deallocation now linear in number of allocations (amortized O(1))

Auxiliary nodes have their own alignment requirements

Where to store the auxiliary nodes

3456

Monotonic Allocator - Extensions

3456

Monotonic Allocator - Extensions

3556

Internal or External

External headers have better cache behavior when iterating the list

External headers might have stricter alignment requirements than data

Internal headers have better cache behavior when adjacent data is hot

Internal headers require managed memory to be readable (think GPUs)

Where does the storage for external headers come from Same buffer Differentbuffer How big

rArr No easy answers

3656

The Bottom Line

Even seemingly simple extensions get complicated very quickly

Donrsquot try to increase generality through clever extensionsOnly consider modifications if itrsquos a perfect fit for your use case

3756

But what if I need to reclaim memory

minusrarr Pool Allocator

3856

Pool Allocator

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4156

Pool Allocator - Diffusion

4256

Pool Allocator

Deterministic runtime cost

No external fragmentation

Easy to make thread-safe

But

Cannot serve allocations bigger than chunk size

High waste through internal fragmentation if sizes of objects vary a lot

4356

Pool Allocator - STL containers

vector only if chunk sizes match vector size

list and map are a perfect fit as the size of each node is known beforehand(though this knowledge is implementation-specific)

Similar for deque

unordered map is again complicated

4456

But what if I do need different sizes

minusrarr Multipool Allocator

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4656

Multipool Allocator

Very powerful allocator

Runtime is still deterministic if number of pools is known beforehand

Maximum amount of waste through internal fragmentation can be controlledprecisely

Difficult to set up How many pools do I need What chunk sizes What poolsizes

Solid building block for a general purpose allocator

4756

Allocator support in C++

stdvectorltT AllocatorltTgtgt v

vpush back()

This is not the class allocating the memory

4756

Allocator support in C++

stdvectorltT AllocatorltTgtgt v

vpush back()

This is not the class allocating the memory

4856

Allocator support in C++

Historically C++ used Allocators to abstract over different models of addressingmemoryAs such Allocators in C++ are ldquostatelessrdquo

In C++ an Allocator is merely a handle to a memory resource1

1Arthur OrsquoDwyer - An Allocator is a Handle to the Heap

4856

Allocator support in C++

Historically C++ used Allocators to abstract over different models of addressingmemoryAs such Allocators in C++ are ldquostatelessrdquo

In C++ an Allocator is merely a handle to a memory resource1

1Arthur OrsquoDwyer - An Allocator is a Handle to the Heap

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

5056

C++ Memory Resources2

stdpmrmemory resource - Abstract base class for all resources that can bewrapped in a stdpmrpolymorphic allocator

stdpmrnew delete resource - Global allocator

stdpmrmonotonic buffer resource - Monotonic allocator

stdpmrunsynchronized pool resource synchronized pool resource

- Multipool

stdpmrnull memory resource - Allocation always fails

2Pablo Halpern - Allocators The Good Parts

5156

Chaining

explicit monotonic_buffer_resource(

stdpmr memory_resource upstream )

Each memory resource has an upstream counterpart

If the resource runs out of memory it tries to allocate more memory from upstream

5256

Chaining

Possible uses of Chaining

Fixed-size vs dynamic storage for allocators

Combination of different allocation strategies

Injection points for special purpose allocators for debugging and profiling

5356

Therersquos no universal interface for allocators

Are size and alignment parameters passed to deallocate

Is realloc supported

How are out-of-memory errors reported

Is extended alignment supported

What is the return value for an allocation of size 0

Different memory regions for internal data structures and allocated memory

5456

Donrsquot underestimate the global allocator

Competitive performance in the general case

Security features (ASLR secure erase of freed memory)

Debugging amp Profiling (Valgrind Windows Debug Runtime)

Cache Coloring

Local allocators are no free lunch

5556

Wrapping up

No one-size-fits-all mdash Each allocator has its Achilles heel

Global allocator is a good solution for the general case

But you can do better with special allocators for special use cases in terms ofperformance3 as well as reliability

C++ has good support for local allocators but the terminology is a bit off

Different libraries have different concepts of allocators

No free lunch You need to understand your use case before you can chose theright allocator

3John Lakos - Local (Arena) Allocators

5656

Thanks for your attention

Page 36: Taming Dynamic Memory...Pool Allocator - STL containers vector only if chunk sizes match vector size list and map are a perfect t, as the size of each node is known beforehand (though

1556

Monotonic Allocator

Deterministic runtime cost

Extremely efficient

No fragmentation

Easy to implement

Trivial to make thread-safe

But

Memory can only be reclaimed all at once

1656

Where is this actually useful

Frames in a video game

Handling of a single event in an event-driven system

Cyclic execution in a real-time system

Containers that are initialized but not changed after

static state - Objects that will never be destroyed

1756

Monotonic Allocator - stdvector

1756

Monotonic Allocator - stdvector

1756

Monotonic Allocator - stdvector

1856

Monotonic Allocator - STL containers

vector should reserve final size upfront

list and map work fine but deleted elements are not reclaimed individually

deque works really well

unordered map has same problem as vector when growing but the load-factortriggered rehash is less predictable

1956

unordered map

Exact layout depends on hash function and inserted values

1956

unordered map

Exact layout depends on hash function and inserted values

1956

unordered map

Exact layout depends on hash function and inserted values

1956

unordered map

Exact layout depends on hash function and inserted values

2056

Stack Allocator

monotdeallocate(p3)

2056

Stack Allocator

monotdeallocate(p3)

2056

Stack Allocator

monotdeallocate(p3)

2056

Stack Allocator

monotdeallocate(p3)

2156

Stack Allocator

Strict LIFO-ordering of allocations and deallocations

No way for the implementation to check whether the deallocation order is correct

2256

Stack Allocator

monotdeallocate(p3)

p3 == top Xtop =

2256

Stack Allocator

monotdeallocate(p3)

p3 == top X

top =

2256

Stack Allocator

monotdeallocate(p3)

p3 == top Xtop =

2256

Stack Allocator

monotdeallocate(p3)

p3 == top X

top =

2356

Stack Allocator

Strict LIFO-ordering of allocations and deallocations

No way for the implementation to check whether the deallocation order is correct

Free-pointer is reset to the same pointer passed to the deallocate call

Padding bytes may be lost to internal fragmentation

2356

Stack Allocator

Strict LIFO-ordering of allocations and deallocations

No way for the implementation to check whether the deallocation order is correct

Free-pointer is reset to the same pointer passed to the deallocate call

Padding bytes may be lost to internal fragmentation

2356

Stack Allocator

Strict LIFO-ordering of allocations and deallocations

No way for the implementation to check whether the deallocation order is correct

Free-pointer is reset to the same pointer passed to the deallocate call

Padding bytes may be lost to internal fragmentation

2456

Padding

2456

Padding

2556

Alignment

0xdeadbeef =

d e a d b e e f

1101 1110 1010 1101 1011 1110 1110 1111

2656

Alignment

0xdeadbeef =

d e a d b e e f

1101 1110 1010 1101 1011 1110 1110 1111

No alignment (1-byte aligned)

2756

Alignment

0xdeadbeef =

d e a d b e e c

1101 1110 1010 1101 1011 1110 1110 1100

4-byte aligned

2856

Alignment

0xdeadbeef =

d e a d b e e 8

1101 1110 1010 1101 1011 1110 1110 1000

8-byte aligned

2956

Alignment

Alignment refers to the least-significant bits of the object address being 0

Alignment requirements are always specified in powers of 2

Each built-in C++ type has a natural alignment requirement (typicallyalignof(T) == sizeof(T))

This is why structs sometimes insert padding bytes between members

3056

Alignment

Default allocator typically returns addresses aligned to alignof(max align t)which is big enough for all built-in types

Users may extend the alignment requirement for custom data types usingalignas

void allocate(std size_t bytes

std size_t alignment )

3156

Padding

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3356

Monotonic Allocator - Extensions

Auxiliary data structure required

Runtime cost of deallocation now linear in number of allocations (amortized O(1))

Auxiliary nodes have their own alignment requirements

Where to store the auxiliary nodes

3456

Monotonic Allocator - Extensions

3456

Monotonic Allocator - Extensions

3556

Internal or External

External headers have better cache behavior when iterating the list

External headers might have stricter alignment requirements than data

Internal headers have better cache behavior when adjacent data is hot

Internal headers require managed memory to be readable (think GPUs)

Where does the storage for external headers come from Same buffer Differentbuffer How big

rArr No easy answers

3656

The Bottom Line

Even seemingly simple extensions get complicated very quickly

Donrsquot try to increase generality through clever extensionsOnly consider modifications if itrsquos a perfect fit for your use case

3756

But what if I need to reclaim memory

minusrarr Pool Allocator

3856

Pool Allocator

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4156

Pool Allocator - Diffusion

4256

Pool Allocator

Deterministic runtime cost

No external fragmentation

Easy to make thread-safe

But

Cannot serve allocations bigger than chunk size

High waste through internal fragmentation if sizes of objects vary a lot

4356

Pool Allocator - STL containers

vector only if chunk sizes match vector size

list and map are a perfect fit as the size of each node is known beforehand(though this knowledge is implementation-specific)

Similar for deque

unordered map is again complicated

4456

But what if I do need different sizes

minusrarr Multipool Allocator

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4656

Multipool Allocator

Very powerful allocator

Runtime is still deterministic if number of pools is known beforehand

Maximum amount of waste through internal fragmentation can be controlledprecisely

Difficult to set up How many pools do I need What chunk sizes What poolsizes

Solid building block for a general purpose allocator

4756

Allocator support in C++

stdvectorltT AllocatorltTgtgt v

vpush back()

This is not the class allocating the memory

4756

Allocator support in C++

stdvectorltT AllocatorltTgtgt v

vpush back()

This is not the class allocating the memory

4856

Allocator support in C++

Historically C++ used Allocators to abstract over different models of addressingmemoryAs such Allocators in C++ are ldquostatelessrdquo

In C++ an Allocator is merely a handle to a memory resource1

1Arthur OrsquoDwyer - An Allocator is a Handle to the Heap

4856

Allocator support in C++

Historically C++ used Allocators to abstract over different models of addressingmemoryAs such Allocators in C++ are ldquostatelessrdquo

In C++ an Allocator is merely a handle to a memory resource1

1Arthur OrsquoDwyer - An Allocator is a Handle to the Heap

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

5056

C++ Memory Resources2

stdpmrmemory resource - Abstract base class for all resources that can bewrapped in a stdpmrpolymorphic allocator

stdpmrnew delete resource - Global allocator

stdpmrmonotonic buffer resource - Monotonic allocator

stdpmrunsynchronized pool resource synchronized pool resource

- Multipool

stdpmrnull memory resource - Allocation always fails

2Pablo Halpern - Allocators The Good Parts

5156

Chaining

explicit monotonic_buffer_resource(

stdpmr memory_resource upstream )

Each memory resource has an upstream counterpart

If the resource runs out of memory it tries to allocate more memory from upstream

5256

Chaining

Possible uses of Chaining

Fixed-size vs dynamic storage for allocators

Combination of different allocation strategies

Injection points for special purpose allocators for debugging and profiling

5356

Therersquos no universal interface for allocators

Are size and alignment parameters passed to deallocate

Is realloc supported

How are out-of-memory errors reported

Is extended alignment supported

What is the return value for an allocation of size 0

Different memory regions for internal data structures and allocated memory

5456

Donrsquot underestimate the global allocator

Competitive performance in the general case

Security features (ASLR secure erase of freed memory)

Debugging amp Profiling (Valgrind Windows Debug Runtime)

Cache Coloring

Local allocators are no free lunch

5556

Wrapping up

No one-size-fits-all mdash Each allocator has its Achilles heel

Global allocator is a good solution for the general case

But you can do better with special allocators for special use cases in terms ofperformance3 as well as reliability

C++ has good support for local allocators but the terminology is a bit off

Different libraries have different concepts of allocators

No free lunch You need to understand your use case before you can chose theright allocator

3John Lakos - Local (Arena) Allocators

5656

Thanks for your attention

Page 37: Taming Dynamic Memory...Pool Allocator - STL containers vector only if chunk sizes match vector size list and map are a perfect t, as the size of each node is known beforehand (though

1656

Where is this actually useful

Frames in a video game

Handling of a single event in an event-driven system

Cyclic execution in a real-time system

Containers that are initialized but not changed after

static state - Objects that will never be destroyed

1756

Monotonic Allocator - stdvector

1756

Monotonic Allocator - stdvector

1756

Monotonic Allocator - stdvector

1856

Monotonic Allocator - STL containers

vector should reserve final size upfront

list and map work fine but deleted elements are not reclaimed individually

deque works really well

unordered map has same problem as vector when growing but the load-factortriggered rehash is less predictable

1956

unordered map

Exact layout depends on hash function and inserted values

1956

unordered map

Exact layout depends on hash function and inserted values

1956

unordered map

Exact layout depends on hash function and inserted values

1956

unordered map

Exact layout depends on hash function and inserted values

2056

Stack Allocator

monotdeallocate(p3)

2056

Stack Allocator

monotdeallocate(p3)

2056

Stack Allocator

monotdeallocate(p3)

2056

Stack Allocator

monotdeallocate(p3)

2156

Stack Allocator

Strict LIFO-ordering of allocations and deallocations

No way for the implementation to check whether the deallocation order is correct

2256

Stack Allocator

monotdeallocate(p3)

p3 == top Xtop =

2256

Stack Allocator

monotdeallocate(p3)

p3 == top X

top =

2256

Stack Allocator

monotdeallocate(p3)

p3 == top Xtop =

2256

Stack Allocator

monotdeallocate(p3)

p3 == top X

top =

2356

Stack Allocator

Strict LIFO-ordering of allocations and deallocations

No way for the implementation to check whether the deallocation order is correct

Free-pointer is reset to the same pointer passed to the deallocate call

Padding bytes may be lost to internal fragmentation

2356

Stack Allocator

Strict LIFO-ordering of allocations and deallocations

No way for the implementation to check whether the deallocation order is correct

Free-pointer is reset to the same pointer passed to the deallocate call

Padding bytes may be lost to internal fragmentation

2356

Stack Allocator

Strict LIFO-ordering of allocations and deallocations

No way for the implementation to check whether the deallocation order is correct

Free-pointer is reset to the same pointer passed to the deallocate call

Padding bytes may be lost to internal fragmentation

2456

Padding

2456

Padding

2556

Alignment

0xdeadbeef =

d e a d b e e f

1101 1110 1010 1101 1011 1110 1110 1111

2656

Alignment

0xdeadbeef =

d e a d b e e f

1101 1110 1010 1101 1011 1110 1110 1111

No alignment (1-byte aligned)

2756

Alignment

0xdeadbeef =

d e a d b e e c

1101 1110 1010 1101 1011 1110 1110 1100

4-byte aligned

2856

Alignment

0xdeadbeef =

d e a d b e e 8

1101 1110 1010 1101 1011 1110 1110 1000

8-byte aligned

2956

Alignment

Alignment refers to the least-significant bits of the object address being 0

Alignment requirements are always specified in powers of 2

Each built-in C++ type has a natural alignment requirement (typicallyalignof(T) == sizeof(T))

This is why structs sometimes insert padding bytes between members

3056

Alignment

Default allocator typically returns addresses aligned to alignof(max align t)which is big enough for all built-in types

Users may extend the alignment requirement for custom data types usingalignas

void allocate(std size_t bytes

std size_t alignment )

3156

Padding

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3356

Monotonic Allocator - Extensions

Auxiliary data structure required

Runtime cost of deallocation now linear in number of allocations (amortized O(1))

Auxiliary nodes have their own alignment requirements

Where to store the auxiliary nodes

3456

Monotonic Allocator - Extensions

3456

Monotonic Allocator - Extensions

3556

Internal or External

External headers have better cache behavior when iterating the list

External headers might have stricter alignment requirements than data

Internal headers have better cache behavior when adjacent data is hot

Internal headers require managed memory to be readable (think GPUs)

Where does the storage for external headers come from Same buffer Differentbuffer How big

rArr No easy answers

3656

The Bottom Line

Even seemingly simple extensions get complicated very quickly

Donrsquot try to increase generality through clever extensionsOnly consider modifications if itrsquos a perfect fit for your use case

3756

But what if I need to reclaim memory

minusrarr Pool Allocator

3856

Pool Allocator

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4156

Pool Allocator - Diffusion

4256

Pool Allocator

Deterministic runtime cost

No external fragmentation

Easy to make thread-safe

But

Cannot serve allocations bigger than chunk size

High waste through internal fragmentation if sizes of objects vary a lot

4356

Pool Allocator - STL containers

vector only if chunk sizes match vector size

list and map are a perfect fit as the size of each node is known beforehand(though this knowledge is implementation-specific)

Similar for deque

unordered map is again complicated

4456

But what if I do need different sizes

minusrarr Multipool Allocator

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4656

Multipool Allocator

Very powerful allocator

Runtime is still deterministic if number of pools is known beforehand

Maximum amount of waste through internal fragmentation can be controlledprecisely

Difficult to set up How many pools do I need What chunk sizes What poolsizes

Solid building block for a general purpose allocator

4756

Allocator support in C++

stdvectorltT AllocatorltTgtgt v

vpush back()

This is not the class allocating the memory

4756

Allocator support in C++

stdvectorltT AllocatorltTgtgt v

vpush back()

This is not the class allocating the memory

4856

Allocator support in C++

Historically C++ used Allocators to abstract over different models of addressingmemoryAs such Allocators in C++ are ldquostatelessrdquo

In C++ an Allocator is merely a handle to a memory resource1

1Arthur OrsquoDwyer - An Allocator is a Handle to the Heap

4856

Allocator support in C++

Historically C++ used Allocators to abstract over different models of addressingmemoryAs such Allocators in C++ are ldquostatelessrdquo

In C++ an Allocator is merely a handle to a memory resource1

1Arthur OrsquoDwyer - An Allocator is a Handle to the Heap

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

5056

C++ Memory Resources2

stdpmrmemory resource - Abstract base class for all resources that can bewrapped in a stdpmrpolymorphic allocator

stdpmrnew delete resource - Global allocator

stdpmrmonotonic buffer resource - Monotonic allocator

stdpmrunsynchronized pool resource synchronized pool resource

- Multipool

stdpmrnull memory resource - Allocation always fails

2Pablo Halpern - Allocators The Good Parts

5156

Chaining

explicit monotonic_buffer_resource(

stdpmr memory_resource upstream )

Each memory resource has an upstream counterpart

If the resource runs out of memory it tries to allocate more memory from upstream

5256

Chaining

Possible uses of Chaining

Fixed-size vs dynamic storage for allocators

Combination of different allocation strategies

Injection points for special purpose allocators for debugging and profiling

5356

Therersquos no universal interface for allocators

Are size and alignment parameters passed to deallocate

Is realloc supported

How are out-of-memory errors reported

Is extended alignment supported

What is the return value for an allocation of size 0

Different memory regions for internal data structures and allocated memory

5456

Donrsquot underestimate the global allocator

Competitive performance in the general case

Security features (ASLR secure erase of freed memory)

Debugging amp Profiling (Valgrind Windows Debug Runtime)

Cache Coloring

Local allocators are no free lunch

5556

Wrapping up

No one-size-fits-all mdash Each allocator has its Achilles heel

Global allocator is a good solution for the general case

But you can do better with special allocators for special use cases in terms ofperformance3 as well as reliability

C++ has good support for local allocators but the terminology is a bit off

Different libraries have different concepts of allocators

No free lunch You need to understand your use case before you can chose theright allocator

3John Lakos - Local (Arena) Allocators

5656

Thanks for your attention

Page 38: Taming Dynamic Memory...Pool Allocator - STL containers vector only if chunk sizes match vector size list and map are a perfect t, as the size of each node is known beforehand (though

1756

Monotonic Allocator - stdvector

1756

Monotonic Allocator - stdvector

1756

Monotonic Allocator - stdvector

1856

Monotonic Allocator - STL containers

vector should reserve final size upfront

list and map work fine but deleted elements are not reclaimed individually

deque works really well

unordered map has same problem as vector when growing but the load-factortriggered rehash is less predictable

1956

unordered map

Exact layout depends on hash function and inserted values

1956

unordered map

Exact layout depends on hash function and inserted values

1956

unordered map

Exact layout depends on hash function and inserted values

1956

unordered map

Exact layout depends on hash function and inserted values

2056

Stack Allocator

monotdeallocate(p3)

2056

Stack Allocator

monotdeallocate(p3)

2056

Stack Allocator

monotdeallocate(p3)

2056

Stack Allocator

monotdeallocate(p3)

2156

Stack Allocator

Strict LIFO-ordering of allocations and deallocations

No way for the implementation to check whether the deallocation order is correct

2256

Stack Allocator

monotdeallocate(p3)

p3 == top Xtop =

2256

Stack Allocator

monotdeallocate(p3)

p3 == top X

top =

2256

Stack Allocator

monotdeallocate(p3)

p3 == top Xtop =

2256

Stack Allocator

monotdeallocate(p3)

p3 == top X

top =

2356

Stack Allocator

Strict LIFO-ordering of allocations and deallocations

No way for the implementation to check whether the deallocation order is correct

Free-pointer is reset to the same pointer passed to the deallocate call

Padding bytes may be lost to internal fragmentation

2356

Stack Allocator

Strict LIFO-ordering of allocations and deallocations

No way for the implementation to check whether the deallocation order is correct

Free-pointer is reset to the same pointer passed to the deallocate call

Padding bytes may be lost to internal fragmentation

2356

Stack Allocator

Strict LIFO-ordering of allocations and deallocations

No way for the implementation to check whether the deallocation order is correct

Free-pointer is reset to the same pointer passed to the deallocate call

Padding bytes may be lost to internal fragmentation

2456

Padding

2456

Padding

2556

Alignment

0xdeadbeef =

d e a d b e e f

1101 1110 1010 1101 1011 1110 1110 1111

2656

Alignment

0xdeadbeef =

d e a d b e e f

1101 1110 1010 1101 1011 1110 1110 1111

No alignment (1-byte aligned)

2756

Alignment

0xdeadbeef =

d e a d b e e c

1101 1110 1010 1101 1011 1110 1110 1100

4-byte aligned

2856

Alignment

0xdeadbeef =

d e a d b e e 8

1101 1110 1010 1101 1011 1110 1110 1000

8-byte aligned

2956

Alignment

Alignment refers to the least-significant bits of the object address being 0

Alignment requirements are always specified in powers of 2

Each built-in C++ type has a natural alignment requirement (typicallyalignof(T) == sizeof(T))

This is why structs sometimes insert padding bytes between members

3056

Alignment

Default allocator typically returns addresses aligned to alignof(max align t)which is big enough for all built-in types

Users may extend the alignment requirement for custom data types usingalignas

void allocate(std size_t bytes

std size_t alignment )

3156

Padding

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3356

Monotonic Allocator - Extensions

Auxiliary data structure required

Runtime cost of deallocation now linear in number of allocations (amortized O(1))

Auxiliary nodes have their own alignment requirements

Where to store the auxiliary nodes

3456

Monotonic Allocator - Extensions

3456

Monotonic Allocator - Extensions

3556

Internal or External

External headers have better cache behavior when iterating the list

External headers might have stricter alignment requirements than data

Internal headers have better cache behavior when adjacent data is hot

Internal headers require managed memory to be readable (think GPUs)

Where does the storage for external headers come from Same buffer Differentbuffer How big

rArr No easy answers

3656

The Bottom Line

Even seemingly simple extensions get complicated very quickly

Donrsquot try to increase generality through clever extensionsOnly consider modifications if itrsquos a perfect fit for your use case

3756

But what if I need to reclaim memory

minusrarr Pool Allocator

3856

Pool Allocator

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4156

Pool Allocator - Diffusion

4256

Pool Allocator

Deterministic runtime cost

No external fragmentation

Easy to make thread-safe

But

Cannot serve allocations bigger than chunk size

High waste through internal fragmentation if sizes of objects vary a lot

4356

Pool Allocator - STL containers

vector only if chunk sizes match vector size

list and map are a perfect fit as the size of each node is known beforehand(though this knowledge is implementation-specific)

Similar for deque

unordered map is again complicated

4456

But what if I do need different sizes

minusrarr Multipool Allocator

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4656

Multipool Allocator

Very powerful allocator

Runtime is still deterministic if number of pools is known beforehand

Maximum amount of waste through internal fragmentation can be controlledprecisely

Difficult to set up How many pools do I need What chunk sizes What poolsizes

Solid building block for a general purpose allocator

4756

Allocator support in C++

stdvectorltT AllocatorltTgtgt v

vpush back()

This is not the class allocating the memory

4756

Allocator support in C++

stdvectorltT AllocatorltTgtgt v

vpush back()

This is not the class allocating the memory

4856

Allocator support in C++

Historically C++ used Allocators to abstract over different models of addressingmemoryAs such Allocators in C++ are ldquostatelessrdquo

In C++ an Allocator is merely a handle to a memory resource1

1Arthur OrsquoDwyer - An Allocator is a Handle to the Heap

4856

Allocator support in C++

Historically C++ used Allocators to abstract over different models of addressingmemoryAs such Allocators in C++ are ldquostatelessrdquo

In C++ an Allocator is merely a handle to a memory resource1

1Arthur OrsquoDwyer - An Allocator is a Handle to the Heap

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

5056

C++ Memory Resources2

stdpmrmemory resource - Abstract base class for all resources that can bewrapped in a stdpmrpolymorphic allocator

stdpmrnew delete resource - Global allocator

stdpmrmonotonic buffer resource - Monotonic allocator

stdpmrunsynchronized pool resource synchronized pool resource

- Multipool

stdpmrnull memory resource - Allocation always fails

2Pablo Halpern - Allocators The Good Parts

5156

Chaining

explicit monotonic_buffer_resource(

stdpmr memory_resource upstream )

Each memory resource has an upstream counterpart

If the resource runs out of memory it tries to allocate more memory from upstream

5256

Chaining

Possible uses of Chaining

Fixed-size vs dynamic storage for allocators

Combination of different allocation strategies

Injection points for special purpose allocators for debugging and profiling

5356

Therersquos no universal interface for allocators

Are size and alignment parameters passed to deallocate

Is realloc supported

How are out-of-memory errors reported

Is extended alignment supported

What is the return value for an allocation of size 0

Different memory regions for internal data structures and allocated memory

5456

Donrsquot underestimate the global allocator

Competitive performance in the general case

Security features (ASLR secure erase of freed memory)

Debugging amp Profiling (Valgrind Windows Debug Runtime)

Cache Coloring

Local allocators are no free lunch

5556

Wrapping up

No one-size-fits-all mdash Each allocator has its Achilles heel

Global allocator is a good solution for the general case

But you can do better with special allocators for special use cases in terms ofperformance3 as well as reliability

C++ has good support for local allocators but the terminology is a bit off

Different libraries have different concepts of allocators

No free lunch You need to understand your use case before you can chose theright allocator

3John Lakos - Local (Arena) Allocators

5656

Thanks for your attention

Page 39: Taming Dynamic Memory...Pool Allocator - STL containers vector only if chunk sizes match vector size list and map are a perfect t, as the size of each node is known beforehand (though

1756

Monotonic Allocator - stdvector

1756

Monotonic Allocator - stdvector

1856

Monotonic Allocator - STL containers

vector should reserve final size upfront

list and map work fine but deleted elements are not reclaimed individually

deque works really well

unordered map has same problem as vector when growing but the load-factortriggered rehash is less predictable

1956

unordered map

Exact layout depends on hash function and inserted values

1956

unordered map

Exact layout depends on hash function and inserted values

1956

unordered map

Exact layout depends on hash function and inserted values

1956

unordered map

Exact layout depends on hash function and inserted values

2056

Stack Allocator

monotdeallocate(p3)

2056

Stack Allocator

monotdeallocate(p3)

2056

Stack Allocator

monotdeallocate(p3)

2056

Stack Allocator

monotdeallocate(p3)

2156

Stack Allocator

Strict LIFO-ordering of allocations and deallocations

No way for the implementation to check whether the deallocation order is correct

2256

Stack Allocator

monotdeallocate(p3)

p3 == top Xtop =

2256

Stack Allocator

monotdeallocate(p3)

p3 == top X

top =

2256

Stack Allocator

monotdeallocate(p3)

p3 == top Xtop =

2256

Stack Allocator

monotdeallocate(p3)

p3 == top X

top =

2356

Stack Allocator

Strict LIFO-ordering of allocations and deallocations

No way for the implementation to check whether the deallocation order is correct

Free-pointer is reset to the same pointer passed to the deallocate call

Padding bytes may be lost to internal fragmentation

2356

Stack Allocator

Strict LIFO-ordering of allocations and deallocations

No way for the implementation to check whether the deallocation order is correct

Free-pointer is reset to the same pointer passed to the deallocate call

Padding bytes may be lost to internal fragmentation

2356

Stack Allocator

Strict LIFO-ordering of allocations and deallocations

No way for the implementation to check whether the deallocation order is correct

Free-pointer is reset to the same pointer passed to the deallocate call

Padding bytes may be lost to internal fragmentation

2456

Padding

2456

Padding

2556

Alignment

0xdeadbeef =

d e a d b e e f

1101 1110 1010 1101 1011 1110 1110 1111

2656

Alignment

0xdeadbeef =

d e a d b e e f

1101 1110 1010 1101 1011 1110 1110 1111

No alignment (1-byte aligned)

2756

Alignment

0xdeadbeef =

d e a d b e e c

1101 1110 1010 1101 1011 1110 1110 1100

4-byte aligned

2856

Alignment

0xdeadbeef =

d e a d b e e 8

1101 1110 1010 1101 1011 1110 1110 1000

8-byte aligned

2956

Alignment

Alignment refers to the least-significant bits of the object address being 0

Alignment requirements are always specified in powers of 2

Each built-in C++ type has a natural alignment requirement (typicallyalignof(T) == sizeof(T))

This is why structs sometimes insert padding bytes between members

3056

Alignment

Default allocator typically returns addresses aligned to alignof(max align t)which is big enough for all built-in types

Users may extend the alignment requirement for custom data types usingalignas

void allocate(std size_t bytes

std size_t alignment )

3156

Padding

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3356

Monotonic Allocator - Extensions

Auxiliary data structure required

Runtime cost of deallocation now linear in number of allocations (amortized O(1))

Auxiliary nodes have their own alignment requirements

Where to store the auxiliary nodes

3456

Monotonic Allocator - Extensions

3456

Monotonic Allocator - Extensions

3556

Internal or External

External headers have better cache behavior when iterating the list

External headers might have stricter alignment requirements than data

Internal headers have better cache behavior when adjacent data is hot

Internal headers require managed memory to be readable (think GPUs)

Where does the storage for external headers come from Same buffer Differentbuffer How big

rArr No easy answers

3656

The Bottom Line

Even seemingly simple extensions get complicated very quickly

Donrsquot try to increase generality through clever extensionsOnly consider modifications if itrsquos a perfect fit for your use case

3756

But what if I need to reclaim memory

minusrarr Pool Allocator

3856

Pool Allocator

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4156

Pool Allocator - Diffusion

4256

Pool Allocator

Deterministic runtime cost

No external fragmentation

Easy to make thread-safe

But

Cannot serve allocations bigger than chunk size

High waste through internal fragmentation if sizes of objects vary a lot

4356

Pool Allocator - STL containers

vector only if chunk sizes match vector size

list and map are a perfect fit as the size of each node is known beforehand(though this knowledge is implementation-specific)

Similar for deque

unordered map is again complicated

4456

But what if I do need different sizes

minusrarr Multipool Allocator

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4656

Multipool Allocator

Very powerful allocator

Runtime is still deterministic if number of pools is known beforehand

Maximum amount of waste through internal fragmentation can be controlledprecisely

Difficult to set up How many pools do I need What chunk sizes What poolsizes

Solid building block for a general purpose allocator

4756

Allocator support in C++

stdvectorltT AllocatorltTgtgt v

vpush back()

This is not the class allocating the memory

4756

Allocator support in C++

stdvectorltT AllocatorltTgtgt v

vpush back()

This is not the class allocating the memory

4856

Allocator support in C++

Historically C++ used Allocators to abstract over different models of addressingmemoryAs such Allocators in C++ are ldquostatelessrdquo

In C++ an Allocator is merely a handle to a memory resource1

1Arthur OrsquoDwyer - An Allocator is a Handle to the Heap

4856

Allocator support in C++

Historically C++ used Allocators to abstract over different models of addressingmemoryAs such Allocators in C++ are ldquostatelessrdquo

In C++ an Allocator is merely a handle to a memory resource1

1Arthur OrsquoDwyer - An Allocator is a Handle to the Heap

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

5056

C++ Memory Resources2

stdpmrmemory resource - Abstract base class for all resources that can bewrapped in a stdpmrpolymorphic allocator

stdpmrnew delete resource - Global allocator

stdpmrmonotonic buffer resource - Monotonic allocator

stdpmrunsynchronized pool resource synchronized pool resource

- Multipool

stdpmrnull memory resource - Allocation always fails

2Pablo Halpern - Allocators The Good Parts

5156

Chaining

explicit monotonic_buffer_resource(

stdpmr memory_resource upstream )

Each memory resource has an upstream counterpart

If the resource runs out of memory it tries to allocate more memory from upstream

5256

Chaining

Possible uses of Chaining

Fixed-size vs dynamic storage for allocators

Combination of different allocation strategies

Injection points for special purpose allocators for debugging and profiling

5356

Therersquos no universal interface for allocators

Are size and alignment parameters passed to deallocate

Is realloc supported

How are out-of-memory errors reported

Is extended alignment supported

What is the return value for an allocation of size 0

Different memory regions for internal data structures and allocated memory

5456

Donrsquot underestimate the global allocator

Competitive performance in the general case

Security features (ASLR secure erase of freed memory)

Debugging amp Profiling (Valgrind Windows Debug Runtime)

Cache Coloring

Local allocators are no free lunch

5556

Wrapping up

No one-size-fits-all mdash Each allocator has its Achilles heel

Global allocator is a good solution for the general case

But you can do better with special allocators for special use cases in terms ofperformance3 as well as reliability

C++ has good support for local allocators but the terminology is a bit off

Different libraries have different concepts of allocators

No free lunch You need to understand your use case before you can chose theright allocator

3John Lakos - Local (Arena) Allocators

5656

Thanks for your attention

Page 40: Taming Dynamic Memory...Pool Allocator - STL containers vector only if chunk sizes match vector size list and map are a perfect t, as the size of each node is known beforehand (though

1756

Monotonic Allocator - stdvector

1856

Monotonic Allocator - STL containers

vector should reserve final size upfront

list and map work fine but deleted elements are not reclaimed individually

deque works really well

unordered map has same problem as vector when growing but the load-factortriggered rehash is less predictable

1956

unordered map

Exact layout depends on hash function and inserted values

1956

unordered map

Exact layout depends on hash function and inserted values

1956

unordered map

Exact layout depends on hash function and inserted values

1956

unordered map

Exact layout depends on hash function and inserted values

2056

Stack Allocator

monotdeallocate(p3)

2056

Stack Allocator

monotdeallocate(p3)

2056

Stack Allocator

monotdeallocate(p3)

2056

Stack Allocator

monotdeallocate(p3)

2156

Stack Allocator

Strict LIFO-ordering of allocations and deallocations

No way for the implementation to check whether the deallocation order is correct

2256

Stack Allocator

monotdeallocate(p3)

p3 == top Xtop =

2256

Stack Allocator

monotdeallocate(p3)

p3 == top X

top =

2256

Stack Allocator

monotdeallocate(p3)

p3 == top Xtop =

2256

Stack Allocator

monotdeallocate(p3)

p3 == top X

top =

2356

Stack Allocator

Strict LIFO-ordering of allocations and deallocations

No way for the implementation to check whether the deallocation order is correct

Free-pointer is reset to the same pointer passed to the deallocate call

Padding bytes may be lost to internal fragmentation

2356

Stack Allocator

Strict LIFO-ordering of allocations and deallocations

No way for the implementation to check whether the deallocation order is correct

Free-pointer is reset to the same pointer passed to the deallocate call

Padding bytes may be lost to internal fragmentation

2356

Stack Allocator

Strict LIFO-ordering of allocations and deallocations

No way for the implementation to check whether the deallocation order is correct

Free-pointer is reset to the same pointer passed to the deallocate call

Padding bytes may be lost to internal fragmentation

2456

Padding

2456

Padding

2556

Alignment

0xdeadbeef =

d e a d b e e f

1101 1110 1010 1101 1011 1110 1110 1111

2656

Alignment

0xdeadbeef =

d e a d b e e f

1101 1110 1010 1101 1011 1110 1110 1111

No alignment (1-byte aligned)

2756

Alignment

0xdeadbeef =

d e a d b e e c

1101 1110 1010 1101 1011 1110 1110 1100

4-byte aligned

2856

Alignment

0xdeadbeef =

d e a d b e e 8

1101 1110 1010 1101 1011 1110 1110 1000

8-byte aligned

2956

Alignment

Alignment refers to the least-significant bits of the object address being 0

Alignment requirements are always specified in powers of 2

Each built-in C++ type has a natural alignment requirement (typicallyalignof(T) == sizeof(T))

This is why structs sometimes insert padding bytes between members

3056

Alignment

Default allocator typically returns addresses aligned to alignof(max align t)which is big enough for all built-in types

Users may extend the alignment requirement for custom data types usingalignas

void allocate(std size_t bytes

std size_t alignment )

3156

Padding

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3356

Monotonic Allocator - Extensions

Auxiliary data structure required

Runtime cost of deallocation now linear in number of allocations (amortized O(1))

Auxiliary nodes have their own alignment requirements

Where to store the auxiliary nodes

3456

Monotonic Allocator - Extensions

3456

Monotonic Allocator - Extensions

3556

Internal or External

External headers have better cache behavior when iterating the list

External headers might have stricter alignment requirements than data

Internal headers have better cache behavior when adjacent data is hot

Internal headers require managed memory to be readable (think GPUs)

Where does the storage for external headers come from Same buffer Differentbuffer How big

rArr No easy answers

3656

The Bottom Line

Even seemingly simple extensions get complicated very quickly

Donrsquot try to increase generality through clever extensionsOnly consider modifications if itrsquos a perfect fit for your use case

3756

But what if I need to reclaim memory

minusrarr Pool Allocator

3856

Pool Allocator

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4156

Pool Allocator - Diffusion

4256

Pool Allocator

Deterministic runtime cost

No external fragmentation

Easy to make thread-safe

But

Cannot serve allocations bigger than chunk size

High waste through internal fragmentation if sizes of objects vary a lot

4356

Pool Allocator - STL containers

vector only if chunk sizes match vector size

list and map are a perfect fit as the size of each node is known beforehand(though this knowledge is implementation-specific)

Similar for deque

unordered map is again complicated

4456

But what if I do need different sizes

minusrarr Multipool Allocator

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4656

Multipool Allocator

Very powerful allocator

Runtime is still deterministic if number of pools is known beforehand

Maximum amount of waste through internal fragmentation can be controlledprecisely

Difficult to set up How many pools do I need What chunk sizes What poolsizes

Solid building block for a general purpose allocator

4756

Allocator support in C++

stdvectorltT AllocatorltTgtgt v

vpush back()

This is not the class allocating the memory

4756

Allocator support in C++

stdvectorltT AllocatorltTgtgt v

vpush back()

This is not the class allocating the memory

4856

Allocator support in C++

Historically C++ used Allocators to abstract over different models of addressingmemoryAs such Allocators in C++ are ldquostatelessrdquo

In C++ an Allocator is merely a handle to a memory resource1

1Arthur OrsquoDwyer - An Allocator is a Handle to the Heap

4856

Allocator support in C++

Historically C++ used Allocators to abstract over different models of addressingmemoryAs such Allocators in C++ are ldquostatelessrdquo

In C++ an Allocator is merely a handle to a memory resource1

1Arthur OrsquoDwyer - An Allocator is a Handle to the Heap

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

5056

C++ Memory Resources2

stdpmrmemory resource - Abstract base class for all resources that can bewrapped in a stdpmrpolymorphic allocator

stdpmrnew delete resource - Global allocator

stdpmrmonotonic buffer resource - Monotonic allocator

stdpmrunsynchronized pool resource synchronized pool resource

- Multipool

stdpmrnull memory resource - Allocation always fails

2Pablo Halpern - Allocators The Good Parts

5156

Chaining

explicit monotonic_buffer_resource(

stdpmr memory_resource upstream )

Each memory resource has an upstream counterpart

If the resource runs out of memory it tries to allocate more memory from upstream

5256

Chaining

Possible uses of Chaining

Fixed-size vs dynamic storage for allocators

Combination of different allocation strategies

Injection points for special purpose allocators for debugging and profiling

5356

Therersquos no universal interface for allocators

Are size and alignment parameters passed to deallocate

Is realloc supported

How are out-of-memory errors reported

Is extended alignment supported

What is the return value for an allocation of size 0

Different memory regions for internal data structures and allocated memory

5456

Donrsquot underestimate the global allocator

Competitive performance in the general case

Security features (ASLR secure erase of freed memory)

Debugging amp Profiling (Valgrind Windows Debug Runtime)

Cache Coloring

Local allocators are no free lunch

5556

Wrapping up

No one-size-fits-all mdash Each allocator has its Achilles heel

Global allocator is a good solution for the general case

But you can do better with special allocators for special use cases in terms ofperformance3 as well as reliability

C++ has good support for local allocators but the terminology is a bit off

Different libraries have different concepts of allocators

No free lunch You need to understand your use case before you can chose theright allocator

3John Lakos - Local (Arena) Allocators

5656

Thanks for your attention

Page 41: Taming Dynamic Memory...Pool Allocator - STL containers vector only if chunk sizes match vector size list and map are a perfect t, as the size of each node is known beforehand (though

1856

Monotonic Allocator - STL containers

vector should reserve final size upfront

list and map work fine but deleted elements are not reclaimed individually

deque works really well

unordered map has same problem as vector when growing but the load-factortriggered rehash is less predictable

1956

unordered map

Exact layout depends on hash function and inserted values

1956

unordered map

Exact layout depends on hash function and inserted values

1956

unordered map

Exact layout depends on hash function and inserted values

1956

unordered map

Exact layout depends on hash function and inserted values

2056

Stack Allocator

monotdeallocate(p3)

2056

Stack Allocator

monotdeallocate(p3)

2056

Stack Allocator

monotdeallocate(p3)

2056

Stack Allocator

monotdeallocate(p3)

2156

Stack Allocator

Strict LIFO-ordering of allocations and deallocations

No way for the implementation to check whether the deallocation order is correct

2256

Stack Allocator

monotdeallocate(p3)

p3 == top Xtop =

2256

Stack Allocator

monotdeallocate(p3)

p3 == top X

top =

2256

Stack Allocator

monotdeallocate(p3)

p3 == top Xtop =

2256

Stack Allocator

monotdeallocate(p3)

p3 == top X

top =

2356

Stack Allocator

Strict LIFO-ordering of allocations and deallocations

No way for the implementation to check whether the deallocation order is correct

Free-pointer is reset to the same pointer passed to the deallocate call

Padding bytes may be lost to internal fragmentation

2356

Stack Allocator

Strict LIFO-ordering of allocations and deallocations

No way for the implementation to check whether the deallocation order is correct

Free-pointer is reset to the same pointer passed to the deallocate call

Padding bytes may be lost to internal fragmentation

2356

Stack Allocator

Strict LIFO-ordering of allocations and deallocations

No way for the implementation to check whether the deallocation order is correct

Free-pointer is reset to the same pointer passed to the deallocate call

Padding bytes may be lost to internal fragmentation

2456

Padding

2456

Padding

2556

Alignment

0xdeadbeef =

d e a d b e e f

1101 1110 1010 1101 1011 1110 1110 1111

2656

Alignment

0xdeadbeef =

d e a d b e e f

1101 1110 1010 1101 1011 1110 1110 1111

No alignment (1-byte aligned)

2756

Alignment

0xdeadbeef =

d e a d b e e c

1101 1110 1010 1101 1011 1110 1110 1100

4-byte aligned

2856

Alignment

0xdeadbeef =

d e a d b e e 8

1101 1110 1010 1101 1011 1110 1110 1000

8-byte aligned

2956

Alignment

Alignment refers to the least-significant bits of the object address being 0

Alignment requirements are always specified in powers of 2

Each built-in C++ type has a natural alignment requirement (typicallyalignof(T) == sizeof(T))

This is why structs sometimes insert padding bytes between members

3056

Alignment

Default allocator typically returns addresses aligned to alignof(max align t)which is big enough for all built-in types

Users may extend the alignment requirement for custom data types usingalignas

void allocate(std size_t bytes

std size_t alignment )

3156

Padding

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3356

Monotonic Allocator - Extensions

Auxiliary data structure required

Runtime cost of deallocation now linear in number of allocations (amortized O(1))

Auxiliary nodes have their own alignment requirements

Where to store the auxiliary nodes

3456

Monotonic Allocator - Extensions

3456

Monotonic Allocator - Extensions

3556

Internal or External

External headers have better cache behavior when iterating the list

External headers might have stricter alignment requirements than data

Internal headers have better cache behavior when adjacent data is hot

Internal headers require managed memory to be readable (think GPUs)

Where does the storage for external headers come from Same buffer Differentbuffer How big

rArr No easy answers

3656

The Bottom Line

Even seemingly simple extensions get complicated very quickly

Donrsquot try to increase generality through clever extensionsOnly consider modifications if itrsquos a perfect fit for your use case

3756

But what if I need to reclaim memory

minusrarr Pool Allocator

3856

Pool Allocator

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4156

Pool Allocator - Diffusion

4256

Pool Allocator

Deterministic runtime cost

No external fragmentation

Easy to make thread-safe

But

Cannot serve allocations bigger than chunk size

High waste through internal fragmentation if sizes of objects vary a lot

4356

Pool Allocator - STL containers

vector only if chunk sizes match vector size

list and map are a perfect fit as the size of each node is known beforehand(though this knowledge is implementation-specific)

Similar for deque

unordered map is again complicated

4456

But what if I do need different sizes

minusrarr Multipool Allocator

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4656

Multipool Allocator

Very powerful allocator

Runtime is still deterministic if number of pools is known beforehand

Maximum amount of waste through internal fragmentation can be controlledprecisely

Difficult to set up How many pools do I need What chunk sizes What poolsizes

Solid building block for a general purpose allocator

4756

Allocator support in C++

stdvectorltT AllocatorltTgtgt v

vpush back()

This is not the class allocating the memory

4756

Allocator support in C++

stdvectorltT AllocatorltTgtgt v

vpush back()

This is not the class allocating the memory

4856

Allocator support in C++

Historically C++ used Allocators to abstract over different models of addressingmemoryAs such Allocators in C++ are ldquostatelessrdquo

In C++ an Allocator is merely a handle to a memory resource1

1Arthur OrsquoDwyer - An Allocator is a Handle to the Heap

4856

Allocator support in C++

Historically C++ used Allocators to abstract over different models of addressingmemoryAs such Allocators in C++ are ldquostatelessrdquo

In C++ an Allocator is merely a handle to a memory resource1

1Arthur OrsquoDwyer - An Allocator is a Handle to the Heap

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

5056

C++ Memory Resources2

stdpmrmemory resource - Abstract base class for all resources that can bewrapped in a stdpmrpolymorphic allocator

stdpmrnew delete resource - Global allocator

stdpmrmonotonic buffer resource - Monotonic allocator

stdpmrunsynchronized pool resource synchronized pool resource

- Multipool

stdpmrnull memory resource - Allocation always fails

2Pablo Halpern - Allocators The Good Parts

5156

Chaining

explicit monotonic_buffer_resource(

stdpmr memory_resource upstream )

Each memory resource has an upstream counterpart

If the resource runs out of memory it tries to allocate more memory from upstream

5256

Chaining

Possible uses of Chaining

Fixed-size vs dynamic storage for allocators

Combination of different allocation strategies

Injection points for special purpose allocators for debugging and profiling

5356

Therersquos no universal interface for allocators

Are size and alignment parameters passed to deallocate

Is realloc supported

How are out-of-memory errors reported

Is extended alignment supported

What is the return value for an allocation of size 0

Different memory regions for internal data structures and allocated memory

5456

Donrsquot underestimate the global allocator

Competitive performance in the general case

Security features (ASLR secure erase of freed memory)

Debugging amp Profiling (Valgrind Windows Debug Runtime)

Cache Coloring

Local allocators are no free lunch

5556

Wrapping up

No one-size-fits-all mdash Each allocator has its Achilles heel

Global allocator is a good solution for the general case

But you can do better with special allocators for special use cases in terms ofperformance3 as well as reliability

C++ has good support for local allocators but the terminology is a bit off

Different libraries have different concepts of allocators

No free lunch You need to understand your use case before you can chose theright allocator

3John Lakos - Local (Arena) Allocators

5656

Thanks for your attention

Page 42: Taming Dynamic Memory...Pool Allocator - STL containers vector only if chunk sizes match vector size list and map are a perfect t, as the size of each node is known beforehand (though

1956

unordered map

Exact layout depends on hash function and inserted values

1956

unordered map

Exact layout depends on hash function and inserted values

1956

unordered map

Exact layout depends on hash function and inserted values

1956

unordered map

Exact layout depends on hash function and inserted values

2056

Stack Allocator

monotdeallocate(p3)

2056

Stack Allocator

monotdeallocate(p3)

2056

Stack Allocator

monotdeallocate(p3)

2056

Stack Allocator

monotdeallocate(p3)

2156

Stack Allocator

Strict LIFO-ordering of allocations and deallocations

No way for the implementation to check whether the deallocation order is correct

2256

Stack Allocator

monotdeallocate(p3)

p3 == top Xtop =

2256

Stack Allocator

monotdeallocate(p3)

p3 == top X

top =

2256

Stack Allocator

monotdeallocate(p3)

p3 == top Xtop =

2256

Stack Allocator

monotdeallocate(p3)

p3 == top X

top =

2356

Stack Allocator

Strict LIFO-ordering of allocations and deallocations

No way for the implementation to check whether the deallocation order is correct

Free-pointer is reset to the same pointer passed to the deallocate call

Padding bytes may be lost to internal fragmentation

2356

Stack Allocator

Strict LIFO-ordering of allocations and deallocations

No way for the implementation to check whether the deallocation order is correct

Free-pointer is reset to the same pointer passed to the deallocate call

Padding bytes may be lost to internal fragmentation

2356

Stack Allocator

Strict LIFO-ordering of allocations and deallocations

No way for the implementation to check whether the deallocation order is correct

Free-pointer is reset to the same pointer passed to the deallocate call

Padding bytes may be lost to internal fragmentation

2456

Padding

2456

Padding

2556

Alignment

0xdeadbeef =

d e a d b e e f

1101 1110 1010 1101 1011 1110 1110 1111

2656

Alignment

0xdeadbeef =

d e a d b e e f

1101 1110 1010 1101 1011 1110 1110 1111

No alignment (1-byte aligned)

2756

Alignment

0xdeadbeef =

d e a d b e e c

1101 1110 1010 1101 1011 1110 1110 1100

4-byte aligned

2856

Alignment

0xdeadbeef =

d e a d b e e 8

1101 1110 1010 1101 1011 1110 1110 1000

8-byte aligned

2956

Alignment

Alignment refers to the least-significant bits of the object address being 0

Alignment requirements are always specified in powers of 2

Each built-in C++ type has a natural alignment requirement (typicallyalignof(T) == sizeof(T))

This is why structs sometimes insert padding bytes between members

3056

Alignment

Default allocator typically returns addresses aligned to alignof(max align t)which is big enough for all built-in types

Users may extend the alignment requirement for custom data types usingalignas

void allocate(std size_t bytes

std size_t alignment )

3156

Padding

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3356

Monotonic Allocator - Extensions

Auxiliary data structure required

Runtime cost of deallocation now linear in number of allocations (amortized O(1))

Auxiliary nodes have their own alignment requirements

Where to store the auxiliary nodes

3456

Monotonic Allocator - Extensions

3456

Monotonic Allocator - Extensions

3556

Internal or External

External headers have better cache behavior when iterating the list

External headers might have stricter alignment requirements than data

Internal headers have better cache behavior when adjacent data is hot

Internal headers require managed memory to be readable (think GPUs)

Where does the storage for external headers come from Same buffer Differentbuffer How big

rArr No easy answers

3656

The Bottom Line

Even seemingly simple extensions get complicated very quickly

Donrsquot try to increase generality through clever extensionsOnly consider modifications if itrsquos a perfect fit for your use case

3756

But what if I need to reclaim memory

minusrarr Pool Allocator

3856

Pool Allocator

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4156

Pool Allocator - Diffusion

4256

Pool Allocator

Deterministic runtime cost

No external fragmentation

Easy to make thread-safe

But

Cannot serve allocations bigger than chunk size

High waste through internal fragmentation if sizes of objects vary a lot

4356

Pool Allocator - STL containers

vector only if chunk sizes match vector size

list and map are a perfect fit as the size of each node is known beforehand(though this knowledge is implementation-specific)

Similar for deque

unordered map is again complicated

4456

But what if I do need different sizes

minusrarr Multipool Allocator

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4656

Multipool Allocator

Very powerful allocator

Runtime is still deterministic if number of pools is known beforehand

Maximum amount of waste through internal fragmentation can be controlledprecisely

Difficult to set up How many pools do I need What chunk sizes What poolsizes

Solid building block for a general purpose allocator

4756

Allocator support in C++

stdvectorltT AllocatorltTgtgt v

vpush back()

This is not the class allocating the memory

4756

Allocator support in C++

stdvectorltT AllocatorltTgtgt v

vpush back()

This is not the class allocating the memory

4856

Allocator support in C++

Historically C++ used Allocators to abstract over different models of addressingmemoryAs such Allocators in C++ are ldquostatelessrdquo

In C++ an Allocator is merely a handle to a memory resource1

1Arthur OrsquoDwyer - An Allocator is a Handle to the Heap

4856

Allocator support in C++

Historically C++ used Allocators to abstract over different models of addressingmemoryAs such Allocators in C++ are ldquostatelessrdquo

In C++ an Allocator is merely a handle to a memory resource1

1Arthur OrsquoDwyer - An Allocator is a Handle to the Heap

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

5056

C++ Memory Resources2

stdpmrmemory resource - Abstract base class for all resources that can bewrapped in a stdpmrpolymorphic allocator

stdpmrnew delete resource - Global allocator

stdpmrmonotonic buffer resource - Monotonic allocator

stdpmrunsynchronized pool resource synchronized pool resource

- Multipool

stdpmrnull memory resource - Allocation always fails

2Pablo Halpern - Allocators The Good Parts

5156

Chaining

explicit monotonic_buffer_resource(

stdpmr memory_resource upstream )

Each memory resource has an upstream counterpart

If the resource runs out of memory it tries to allocate more memory from upstream

5256

Chaining

Possible uses of Chaining

Fixed-size vs dynamic storage for allocators

Combination of different allocation strategies

Injection points for special purpose allocators for debugging and profiling

5356

Therersquos no universal interface for allocators

Are size and alignment parameters passed to deallocate

Is realloc supported

How are out-of-memory errors reported

Is extended alignment supported

What is the return value for an allocation of size 0

Different memory regions for internal data structures and allocated memory

5456

Donrsquot underestimate the global allocator

Competitive performance in the general case

Security features (ASLR secure erase of freed memory)

Debugging amp Profiling (Valgrind Windows Debug Runtime)

Cache Coloring

Local allocators are no free lunch

5556

Wrapping up

No one-size-fits-all mdash Each allocator has its Achilles heel

Global allocator is a good solution for the general case

But you can do better with special allocators for special use cases in terms ofperformance3 as well as reliability

C++ has good support for local allocators but the terminology is a bit off

Different libraries have different concepts of allocators

No free lunch You need to understand your use case before you can chose theright allocator

3John Lakos - Local (Arena) Allocators

5656

Thanks for your attention

Page 43: Taming Dynamic Memory...Pool Allocator - STL containers vector only if chunk sizes match vector size list and map are a perfect t, as the size of each node is known beforehand (though

1956

unordered map

Exact layout depends on hash function and inserted values

1956

unordered map

Exact layout depends on hash function and inserted values

1956

unordered map

Exact layout depends on hash function and inserted values

2056

Stack Allocator

monotdeallocate(p3)

2056

Stack Allocator

monotdeallocate(p3)

2056

Stack Allocator

monotdeallocate(p3)

2056

Stack Allocator

monotdeallocate(p3)

2156

Stack Allocator

Strict LIFO-ordering of allocations and deallocations

No way for the implementation to check whether the deallocation order is correct

2256

Stack Allocator

monotdeallocate(p3)

p3 == top Xtop =

2256

Stack Allocator

monotdeallocate(p3)

p3 == top X

top =

2256

Stack Allocator

monotdeallocate(p3)

p3 == top Xtop =

2256

Stack Allocator

monotdeallocate(p3)

p3 == top X

top =

2356

Stack Allocator

Strict LIFO-ordering of allocations and deallocations

No way for the implementation to check whether the deallocation order is correct

Free-pointer is reset to the same pointer passed to the deallocate call

Padding bytes may be lost to internal fragmentation

2356

Stack Allocator

Strict LIFO-ordering of allocations and deallocations

No way for the implementation to check whether the deallocation order is correct

Free-pointer is reset to the same pointer passed to the deallocate call

Padding bytes may be lost to internal fragmentation

2356

Stack Allocator

Strict LIFO-ordering of allocations and deallocations

No way for the implementation to check whether the deallocation order is correct

Free-pointer is reset to the same pointer passed to the deallocate call

Padding bytes may be lost to internal fragmentation

2456

Padding

2456

Padding

2556

Alignment

0xdeadbeef =

d e a d b e e f

1101 1110 1010 1101 1011 1110 1110 1111

2656

Alignment

0xdeadbeef =

d e a d b e e f

1101 1110 1010 1101 1011 1110 1110 1111

No alignment (1-byte aligned)

2756

Alignment

0xdeadbeef =

d e a d b e e c

1101 1110 1010 1101 1011 1110 1110 1100

4-byte aligned

2856

Alignment

0xdeadbeef =

d e a d b e e 8

1101 1110 1010 1101 1011 1110 1110 1000

8-byte aligned

2956

Alignment

Alignment refers to the least-significant bits of the object address being 0

Alignment requirements are always specified in powers of 2

Each built-in C++ type has a natural alignment requirement (typicallyalignof(T) == sizeof(T))

This is why structs sometimes insert padding bytes between members

3056

Alignment

Default allocator typically returns addresses aligned to alignof(max align t)which is big enough for all built-in types

Users may extend the alignment requirement for custom data types usingalignas

void allocate(std size_t bytes

std size_t alignment )

3156

Padding

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3356

Monotonic Allocator - Extensions

Auxiliary data structure required

Runtime cost of deallocation now linear in number of allocations (amortized O(1))

Auxiliary nodes have their own alignment requirements

Where to store the auxiliary nodes

3456

Monotonic Allocator - Extensions

3456

Monotonic Allocator - Extensions

3556

Internal or External

External headers have better cache behavior when iterating the list

External headers might have stricter alignment requirements than data

Internal headers have better cache behavior when adjacent data is hot

Internal headers require managed memory to be readable (think GPUs)

Where does the storage for external headers come from Same buffer Differentbuffer How big

rArr No easy answers

3656

The Bottom Line

Even seemingly simple extensions get complicated very quickly

Donrsquot try to increase generality through clever extensionsOnly consider modifications if itrsquos a perfect fit for your use case

3756

But what if I need to reclaim memory

minusrarr Pool Allocator

3856

Pool Allocator

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4156

Pool Allocator - Diffusion

4256

Pool Allocator

Deterministic runtime cost

No external fragmentation

Easy to make thread-safe

But

Cannot serve allocations bigger than chunk size

High waste through internal fragmentation if sizes of objects vary a lot

4356

Pool Allocator - STL containers

vector only if chunk sizes match vector size

list and map are a perfect fit as the size of each node is known beforehand(though this knowledge is implementation-specific)

Similar for deque

unordered map is again complicated

4456

But what if I do need different sizes

minusrarr Multipool Allocator

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4656

Multipool Allocator

Very powerful allocator

Runtime is still deterministic if number of pools is known beforehand

Maximum amount of waste through internal fragmentation can be controlledprecisely

Difficult to set up How many pools do I need What chunk sizes What poolsizes

Solid building block for a general purpose allocator

4756

Allocator support in C++

stdvectorltT AllocatorltTgtgt v

vpush back()

This is not the class allocating the memory

4756

Allocator support in C++

stdvectorltT AllocatorltTgtgt v

vpush back()

This is not the class allocating the memory

4856

Allocator support in C++

Historically C++ used Allocators to abstract over different models of addressingmemoryAs such Allocators in C++ are ldquostatelessrdquo

In C++ an Allocator is merely a handle to a memory resource1

1Arthur OrsquoDwyer - An Allocator is a Handle to the Heap

4856

Allocator support in C++

Historically C++ used Allocators to abstract over different models of addressingmemoryAs such Allocators in C++ are ldquostatelessrdquo

In C++ an Allocator is merely a handle to a memory resource1

1Arthur OrsquoDwyer - An Allocator is a Handle to the Heap

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

5056

C++ Memory Resources2

stdpmrmemory resource - Abstract base class for all resources that can bewrapped in a stdpmrpolymorphic allocator

stdpmrnew delete resource - Global allocator

stdpmrmonotonic buffer resource - Monotonic allocator

stdpmrunsynchronized pool resource synchronized pool resource

- Multipool

stdpmrnull memory resource - Allocation always fails

2Pablo Halpern - Allocators The Good Parts

5156

Chaining

explicit monotonic_buffer_resource(

stdpmr memory_resource upstream )

Each memory resource has an upstream counterpart

If the resource runs out of memory it tries to allocate more memory from upstream

5256

Chaining

Possible uses of Chaining

Fixed-size vs dynamic storage for allocators

Combination of different allocation strategies

Injection points for special purpose allocators for debugging and profiling

5356

Therersquos no universal interface for allocators

Are size and alignment parameters passed to deallocate

Is realloc supported

How are out-of-memory errors reported

Is extended alignment supported

What is the return value for an allocation of size 0

Different memory regions for internal data structures and allocated memory

5456

Donrsquot underestimate the global allocator

Competitive performance in the general case

Security features (ASLR secure erase of freed memory)

Debugging amp Profiling (Valgrind Windows Debug Runtime)

Cache Coloring

Local allocators are no free lunch

5556

Wrapping up

No one-size-fits-all mdash Each allocator has its Achilles heel

Global allocator is a good solution for the general case

But you can do better with special allocators for special use cases in terms ofperformance3 as well as reliability

C++ has good support for local allocators but the terminology is a bit off

Different libraries have different concepts of allocators

No free lunch You need to understand your use case before you can chose theright allocator

3John Lakos - Local (Arena) Allocators

5656

Thanks for your attention

Page 44: Taming Dynamic Memory...Pool Allocator - STL containers vector only if chunk sizes match vector size list and map are a perfect t, as the size of each node is known beforehand (though

1956

unordered map

Exact layout depends on hash function and inserted values

1956

unordered map

Exact layout depends on hash function and inserted values

2056

Stack Allocator

monotdeallocate(p3)

2056

Stack Allocator

monotdeallocate(p3)

2056

Stack Allocator

monotdeallocate(p3)

2056

Stack Allocator

monotdeallocate(p3)

2156

Stack Allocator

Strict LIFO-ordering of allocations and deallocations

No way for the implementation to check whether the deallocation order is correct

2256

Stack Allocator

monotdeallocate(p3)

p3 == top Xtop =

2256

Stack Allocator

monotdeallocate(p3)

p3 == top X

top =

2256

Stack Allocator

monotdeallocate(p3)

p3 == top Xtop =

2256

Stack Allocator

monotdeallocate(p3)

p3 == top X

top =

2356

Stack Allocator

Strict LIFO-ordering of allocations and deallocations

No way for the implementation to check whether the deallocation order is correct

Free-pointer is reset to the same pointer passed to the deallocate call

Padding bytes may be lost to internal fragmentation

2356

Stack Allocator

Strict LIFO-ordering of allocations and deallocations

No way for the implementation to check whether the deallocation order is correct

Free-pointer is reset to the same pointer passed to the deallocate call

Padding bytes may be lost to internal fragmentation

2356

Stack Allocator

Strict LIFO-ordering of allocations and deallocations

No way for the implementation to check whether the deallocation order is correct

Free-pointer is reset to the same pointer passed to the deallocate call

Padding bytes may be lost to internal fragmentation

2456

Padding

2456

Padding

2556

Alignment

0xdeadbeef =

d e a d b e e f

1101 1110 1010 1101 1011 1110 1110 1111

2656

Alignment

0xdeadbeef =

d e a d b e e f

1101 1110 1010 1101 1011 1110 1110 1111

No alignment (1-byte aligned)

2756

Alignment

0xdeadbeef =

d e a d b e e c

1101 1110 1010 1101 1011 1110 1110 1100

4-byte aligned

2856

Alignment

0xdeadbeef =

d e a d b e e 8

1101 1110 1010 1101 1011 1110 1110 1000

8-byte aligned

2956

Alignment

Alignment refers to the least-significant bits of the object address being 0

Alignment requirements are always specified in powers of 2

Each built-in C++ type has a natural alignment requirement (typicallyalignof(T) == sizeof(T))

This is why structs sometimes insert padding bytes between members

3056

Alignment

Default allocator typically returns addresses aligned to alignof(max align t)which is big enough for all built-in types

Users may extend the alignment requirement for custom data types usingalignas

void allocate(std size_t bytes

std size_t alignment )

3156

Padding

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3356

Monotonic Allocator - Extensions

Auxiliary data structure required

Runtime cost of deallocation now linear in number of allocations (amortized O(1))

Auxiliary nodes have their own alignment requirements

Where to store the auxiliary nodes

3456

Monotonic Allocator - Extensions

3456

Monotonic Allocator - Extensions

3556

Internal or External

External headers have better cache behavior when iterating the list

External headers might have stricter alignment requirements than data

Internal headers have better cache behavior when adjacent data is hot

Internal headers require managed memory to be readable (think GPUs)

Where does the storage for external headers come from Same buffer Differentbuffer How big

rArr No easy answers

3656

The Bottom Line

Even seemingly simple extensions get complicated very quickly

Donrsquot try to increase generality through clever extensionsOnly consider modifications if itrsquos a perfect fit for your use case

3756

But what if I need to reclaim memory

minusrarr Pool Allocator

3856

Pool Allocator

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4156

Pool Allocator - Diffusion

4256

Pool Allocator

Deterministic runtime cost

No external fragmentation

Easy to make thread-safe

But

Cannot serve allocations bigger than chunk size

High waste through internal fragmentation if sizes of objects vary a lot

4356

Pool Allocator - STL containers

vector only if chunk sizes match vector size

list and map are a perfect fit as the size of each node is known beforehand(though this knowledge is implementation-specific)

Similar for deque

unordered map is again complicated

4456

But what if I do need different sizes

minusrarr Multipool Allocator

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4656

Multipool Allocator

Very powerful allocator

Runtime is still deterministic if number of pools is known beforehand

Maximum amount of waste through internal fragmentation can be controlledprecisely

Difficult to set up How many pools do I need What chunk sizes What poolsizes

Solid building block for a general purpose allocator

4756

Allocator support in C++

stdvectorltT AllocatorltTgtgt v

vpush back()

This is not the class allocating the memory

4756

Allocator support in C++

stdvectorltT AllocatorltTgtgt v

vpush back()

This is not the class allocating the memory

4856

Allocator support in C++

Historically C++ used Allocators to abstract over different models of addressingmemoryAs such Allocators in C++ are ldquostatelessrdquo

In C++ an Allocator is merely a handle to a memory resource1

1Arthur OrsquoDwyer - An Allocator is a Handle to the Heap

4856

Allocator support in C++

Historically C++ used Allocators to abstract over different models of addressingmemoryAs such Allocators in C++ are ldquostatelessrdquo

In C++ an Allocator is merely a handle to a memory resource1

1Arthur OrsquoDwyer - An Allocator is a Handle to the Heap

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

5056

C++ Memory Resources2

stdpmrmemory resource - Abstract base class for all resources that can bewrapped in a stdpmrpolymorphic allocator

stdpmrnew delete resource - Global allocator

stdpmrmonotonic buffer resource - Monotonic allocator

stdpmrunsynchronized pool resource synchronized pool resource

- Multipool

stdpmrnull memory resource - Allocation always fails

2Pablo Halpern - Allocators The Good Parts

5156

Chaining

explicit monotonic_buffer_resource(

stdpmr memory_resource upstream )

Each memory resource has an upstream counterpart

If the resource runs out of memory it tries to allocate more memory from upstream

5256

Chaining

Possible uses of Chaining

Fixed-size vs dynamic storage for allocators

Combination of different allocation strategies

Injection points for special purpose allocators for debugging and profiling

5356

Therersquos no universal interface for allocators

Are size and alignment parameters passed to deallocate

Is realloc supported

How are out-of-memory errors reported

Is extended alignment supported

What is the return value for an allocation of size 0

Different memory regions for internal data structures and allocated memory

5456

Donrsquot underestimate the global allocator

Competitive performance in the general case

Security features (ASLR secure erase of freed memory)

Debugging amp Profiling (Valgrind Windows Debug Runtime)

Cache Coloring

Local allocators are no free lunch

5556

Wrapping up

No one-size-fits-all mdash Each allocator has its Achilles heel

Global allocator is a good solution for the general case

But you can do better with special allocators for special use cases in terms ofperformance3 as well as reliability

C++ has good support for local allocators but the terminology is a bit off

Different libraries have different concepts of allocators

No free lunch You need to understand your use case before you can chose theright allocator

3John Lakos - Local (Arena) Allocators

5656

Thanks for your attention

Page 45: Taming Dynamic Memory...Pool Allocator - STL containers vector only if chunk sizes match vector size list and map are a perfect t, as the size of each node is known beforehand (though

1956

unordered map

Exact layout depends on hash function and inserted values

2056

Stack Allocator

monotdeallocate(p3)

2056

Stack Allocator

monotdeallocate(p3)

2056

Stack Allocator

monotdeallocate(p3)

2056

Stack Allocator

monotdeallocate(p3)

2156

Stack Allocator

Strict LIFO-ordering of allocations and deallocations

No way for the implementation to check whether the deallocation order is correct

2256

Stack Allocator

monotdeallocate(p3)

p3 == top Xtop =

2256

Stack Allocator

monotdeallocate(p3)

p3 == top X

top =

2256

Stack Allocator

monotdeallocate(p3)

p3 == top Xtop =

2256

Stack Allocator

monotdeallocate(p3)

p3 == top X

top =

2356

Stack Allocator

Strict LIFO-ordering of allocations and deallocations

No way for the implementation to check whether the deallocation order is correct

Free-pointer is reset to the same pointer passed to the deallocate call

Padding bytes may be lost to internal fragmentation

2356

Stack Allocator

Strict LIFO-ordering of allocations and deallocations

No way for the implementation to check whether the deallocation order is correct

Free-pointer is reset to the same pointer passed to the deallocate call

Padding bytes may be lost to internal fragmentation

2356

Stack Allocator

Strict LIFO-ordering of allocations and deallocations

No way for the implementation to check whether the deallocation order is correct

Free-pointer is reset to the same pointer passed to the deallocate call

Padding bytes may be lost to internal fragmentation

2456

Padding

2456

Padding

2556

Alignment

0xdeadbeef =

d e a d b e e f

1101 1110 1010 1101 1011 1110 1110 1111

2656

Alignment

0xdeadbeef =

d e a d b e e f

1101 1110 1010 1101 1011 1110 1110 1111

No alignment (1-byte aligned)

2756

Alignment

0xdeadbeef =

d e a d b e e c

1101 1110 1010 1101 1011 1110 1110 1100

4-byte aligned

2856

Alignment

0xdeadbeef =

d e a d b e e 8

1101 1110 1010 1101 1011 1110 1110 1000

8-byte aligned

2956

Alignment

Alignment refers to the least-significant bits of the object address being 0

Alignment requirements are always specified in powers of 2

Each built-in C++ type has a natural alignment requirement (typicallyalignof(T) == sizeof(T))

This is why structs sometimes insert padding bytes between members

3056

Alignment

Default allocator typically returns addresses aligned to alignof(max align t)which is big enough for all built-in types

Users may extend the alignment requirement for custom data types usingalignas

void allocate(std size_t bytes

std size_t alignment )

3156

Padding

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3356

Monotonic Allocator - Extensions

Auxiliary data structure required

Runtime cost of deallocation now linear in number of allocations (amortized O(1))

Auxiliary nodes have their own alignment requirements

Where to store the auxiliary nodes

3456

Monotonic Allocator - Extensions

3456

Monotonic Allocator - Extensions

3556

Internal or External

External headers have better cache behavior when iterating the list

External headers might have stricter alignment requirements than data

Internal headers have better cache behavior when adjacent data is hot

Internal headers require managed memory to be readable (think GPUs)

Where does the storage for external headers come from Same buffer Differentbuffer How big

rArr No easy answers

3656

The Bottom Line

Even seemingly simple extensions get complicated very quickly

Donrsquot try to increase generality through clever extensionsOnly consider modifications if itrsquos a perfect fit for your use case

3756

But what if I need to reclaim memory

minusrarr Pool Allocator

3856

Pool Allocator

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4156

Pool Allocator - Diffusion

4256

Pool Allocator

Deterministic runtime cost

No external fragmentation

Easy to make thread-safe

But

Cannot serve allocations bigger than chunk size

High waste through internal fragmentation if sizes of objects vary a lot

4356

Pool Allocator - STL containers

vector only if chunk sizes match vector size

list and map are a perfect fit as the size of each node is known beforehand(though this knowledge is implementation-specific)

Similar for deque

unordered map is again complicated

4456

But what if I do need different sizes

minusrarr Multipool Allocator

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4656

Multipool Allocator

Very powerful allocator

Runtime is still deterministic if number of pools is known beforehand

Maximum amount of waste through internal fragmentation can be controlledprecisely

Difficult to set up How many pools do I need What chunk sizes What poolsizes

Solid building block for a general purpose allocator

4756

Allocator support in C++

stdvectorltT AllocatorltTgtgt v

vpush back()

This is not the class allocating the memory

4756

Allocator support in C++

stdvectorltT AllocatorltTgtgt v

vpush back()

This is not the class allocating the memory

4856

Allocator support in C++

Historically C++ used Allocators to abstract over different models of addressingmemoryAs such Allocators in C++ are ldquostatelessrdquo

In C++ an Allocator is merely a handle to a memory resource1

1Arthur OrsquoDwyer - An Allocator is a Handle to the Heap

4856

Allocator support in C++

Historically C++ used Allocators to abstract over different models of addressingmemoryAs such Allocators in C++ are ldquostatelessrdquo

In C++ an Allocator is merely a handle to a memory resource1

1Arthur OrsquoDwyer - An Allocator is a Handle to the Heap

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

5056

C++ Memory Resources2

stdpmrmemory resource - Abstract base class for all resources that can bewrapped in a stdpmrpolymorphic allocator

stdpmrnew delete resource - Global allocator

stdpmrmonotonic buffer resource - Monotonic allocator

stdpmrunsynchronized pool resource synchronized pool resource

- Multipool

stdpmrnull memory resource - Allocation always fails

2Pablo Halpern - Allocators The Good Parts

5156

Chaining

explicit monotonic_buffer_resource(

stdpmr memory_resource upstream )

Each memory resource has an upstream counterpart

If the resource runs out of memory it tries to allocate more memory from upstream

5256

Chaining

Possible uses of Chaining

Fixed-size vs dynamic storage for allocators

Combination of different allocation strategies

Injection points for special purpose allocators for debugging and profiling

5356

Therersquos no universal interface for allocators

Are size and alignment parameters passed to deallocate

Is realloc supported

How are out-of-memory errors reported

Is extended alignment supported

What is the return value for an allocation of size 0

Different memory regions for internal data structures and allocated memory

5456

Donrsquot underestimate the global allocator

Competitive performance in the general case

Security features (ASLR secure erase of freed memory)

Debugging amp Profiling (Valgrind Windows Debug Runtime)

Cache Coloring

Local allocators are no free lunch

5556

Wrapping up

No one-size-fits-all mdash Each allocator has its Achilles heel

Global allocator is a good solution for the general case

But you can do better with special allocators for special use cases in terms ofperformance3 as well as reliability

C++ has good support for local allocators but the terminology is a bit off

Different libraries have different concepts of allocators

No free lunch You need to understand your use case before you can chose theright allocator

3John Lakos - Local (Arena) Allocators

5656

Thanks for your attention

Page 46: Taming Dynamic Memory...Pool Allocator - STL containers vector only if chunk sizes match vector size list and map are a perfect t, as the size of each node is known beforehand (though

2056

Stack Allocator

monotdeallocate(p3)

2056

Stack Allocator

monotdeallocate(p3)

2056

Stack Allocator

monotdeallocate(p3)

2056

Stack Allocator

monotdeallocate(p3)

2156

Stack Allocator

Strict LIFO-ordering of allocations and deallocations

No way for the implementation to check whether the deallocation order is correct

2256

Stack Allocator

monotdeallocate(p3)

p3 == top Xtop =

2256

Stack Allocator

monotdeallocate(p3)

p3 == top X

top =

2256

Stack Allocator

monotdeallocate(p3)

p3 == top Xtop =

2256

Stack Allocator

monotdeallocate(p3)

p3 == top X

top =

2356

Stack Allocator

Strict LIFO-ordering of allocations and deallocations

No way for the implementation to check whether the deallocation order is correct

Free-pointer is reset to the same pointer passed to the deallocate call

Padding bytes may be lost to internal fragmentation

2356

Stack Allocator

Strict LIFO-ordering of allocations and deallocations

No way for the implementation to check whether the deallocation order is correct

Free-pointer is reset to the same pointer passed to the deallocate call

Padding bytes may be lost to internal fragmentation

2356

Stack Allocator

Strict LIFO-ordering of allocations and deallocations

No way for the implementation to check whether the deallocation order is correct

Free-pointer is reset to the same pointer passed to the deallocate call

Padding bytes may be lost to internal fragmentation

2456

Padding

2456

Padding

2556

Alignment

0xdeadbeef =

d e a d b e e f

1101 1110 1010 1101 1011 1110 1110 1111

2656

Alignment

0xdeadbeef =

d e a d b e e f

1101 1110 1010 1101 1011 1110 1110 1111

No alignment (1-byte aligned)

2756

Alignment

0xdeadbeef =

d e a d b e e c

1101 1110 1010 1101 1011 1110 1110 1100

4-byte aligned

2856

Alignment

0xdeadbeef =

d e a d b e e 8

1101 1110 1010 1101 1011 1110 1110 1000

8-byte aligned

2956

Alignment

Alignment refers to the least-significant bits of the object address being 0

Alignment requirements are always specified in powers of 2

Each built-in C++ type has a natural alignment requirement (typicallyalignof(T) == sizeof(T))

This is why structs sometimes insert padding bytes between members

3056

Alignment

Default allocator typically returns addresses aligned to alignof(max align t)which is big enough for all built-in types

Users may extend the alignment requirement for custom data types usingalignas

void allocate(std size_t bytes

std size_t alignment )

3156

Padding

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3356

Monotonic Allocator - Extensions

Auxiliary data structure required

Runtime cost of deallocation now linear in number of allocations (amortized O(1))

Auxiliary nodes have their own alignment requirements

Where to store the auxiliary nodes

3456

Monotonic Allocator - Extensions

3456

Monotonic Allocator - Extensions

3556

Internal or External

External headers have better cache behavior when iterating the list

External headers might have stricter alignment requirements than data

Internal headers have better cache behavior when adjacent data is hot

Internal headers require managed memory to be readable (think GPUs)

Where does the storage for external headers come from Same buffer Differentbuffer How big

rArr No easy answers

3656

The Bottom Line

Even seemingly simple extensions get complicated very quickly

Donrsquot try to increase generality through clever extensionsOnly consider modifications if itrsquos a perfect fit for your use case

3756

But what if I need to reclaim memory

minusrarr Pool Allocator

3856

Pool Allocator

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4156

Pool Allocator - Diffusion

4256

Pool Allocator

Deterministic runtime cost

No external fragmentation

Easy to make thread-safe

But

Cannot serve allocations bigger than chunk size

High waste through internal fragmentation if sizes of objects vary a lot

4356

Pool Allocator - STL containers

vector only if chunk sizes match vector size

list and map are a perfect fit as the size of each node is known beforehand(though this knowledge is implementation-specific)

Similar for deque

unordered map is again complicated

4456

But what if I do need different sizes

minusrarr Multipool Allocator

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4656

Multipool Allocator

Very powerful allocator

Runtime is still deterministic if number of pools is known beforehand

Maximum amount of waste through internal fragmentation can be controlledprecisely

Difficult to set up How many pools do I need What chunk sizes What poolsizes

Solid building block for a general purpose allocator

4756

Allocator support in C++

stdvectorltT AllocatorltTgtgt v

vpush back()

This is not the class allocating the memory

4756

Allocator support in C++

stdvectorltT AllocatorltTgtgt v

vpush back()

This is not the class allocating the memory

4856

Allocator support in C++

Historically C++ used Allocators to abstract over different models of addressingmemoryAs such Allocators in C++ are ldquostatelessrdquo

In C++ an Allocator is merely a handle to a memory resource1

1Arthur OrsquoDwyer - An Allocator is a Handle to the Heap

4856

Allocator support in C++

Historically C++ used Allocators to abstract over different models of addressingmemoryAs such Allocators in C++ are ldquostatelessrdquo

In C++ an Allocator is merely a handle to a memory resource1

1Arthur OrsquoDwyer - An Allocator is a Handle to the Heap

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

5056

C++ Memory Resources2

stdpmrmemory resource - Abstract base class for all resources that can bewrapped in a stdpmrpolymorphic allocator

stdpmrnew delete resource - Global allocator

stdpmrmonotonic buffer resource - Monotonic allocator

stdpmrunsynchronized pool resource synchronized pool resource

- Multipool

stdpmrnull memory resource - Allocation always fails

2Pablo Halpern - Allocators The Good Parts

5156

Chaining

explicit monotonic_buffer_resource(

stdpmr memory_resource upstream )

Each memory resource has an upstream counterpart

If the resource runs out of memory it tries to allocate more memory from upstream

5256

Chaining

Possible uses of Chaining

Fixed-size vs dynamic storage for allocators

Combination of different allocation strategies

Injection points for special purpose allocators for debugging and profiling

5356

Therersquos no universal interface for allocators

Are size and alignment parameters passed to deallocate

Is realloc supported

How are out-of-memory errors reported

Is extended alignment supported

What is the return value for an allocation of size 0

Different memory regions for internal data structures and allocated memory

5456

Donrsquot underestimate the global allocator

Competitive performance in the general case

Security features (ASLR secure erase of freed memory)

Debugging amp Profiling (Valgrind Windows Debug Runtime)

Cache Coloring

Local allocators are no free lunch

5556

Wrapping up

No one-size-fits-all mdash Each allocator has its Achilles heel

Global allocator is a good solution for the general case

But you can do better with special allocators for special use cases in terms ofperformance3 as well as reliability

C++ has good support for local allocators but the terminology is a bit off

Different libraries have different concepts of allocators

No free lunch You need to understand your use case before you can chose theright allocator

3John Lakos - Local (Arena) Allocators

5656

Thanks for your attention

Page 47: Taming Dynamic Memory...Pool Allocator - STL containers vector only if chunk sizes match vector size list and map are a perfect t, as the size of each node is known beforehand (though

2056

Stack Allocator

monotdeallocate(p3)

2056

Stack Allocator

monotdeallocate(p3)

2056

Stack Allocator

monotdeallocate(p3)

2156

Stack Allocator

Strict LIFO-ordering of allocations and deallocations

No way for the implementation to check whether the deallocation order is correct

2256

Stack Allocator

monotdeallocate(p3)

p3 == top Xtop =

2256

Stack Allocator

monotdeallocate(p3)

p3 == top X

top =

2256

Stack Allocator

monotdeallocate(p3)

p3 == top Xtop =

2256

Stack Allocator

monotdeallocate(p3)

p3 == top X

top =

2356

Stack Allocator

Strict LIFO-ordering of allocations and deallocations

No way for the implementation to check whether the deallocation order is correct

Free-pointer is reset to the same pointer passed to the deallocate call

Padding bytes may be lost to internal fragmentation

2356

Stack Allocator

Strict LIFO-ordering of allocations and deallocations

No way for the implementation to check whether the deallocation order is correct

Free-pointer is reset to the same pointer passed to the deallocate call

Padding bytes may be lost to internal fragmentation

2356

Stack Allocator

Strict LIFO-ordering of allocations and deallocations

No way for the implementation to check whether the deallocation order is correct

Free-pointer is reset to the same pointer passed to the deallocate call

Padding bytes may be lost to internal fragmentation

2456

Padding

2456

Padding

2556

Alignment

0xdeadbeef =

d e a d b e e f

1101 1110 1010 1101 1011 1110 1110 1111

2656

Alignment

0xdeadbeef =

d e a d b e e f

1101 1110 1010 1101 1011 1110 1110 1111

No alignment (1-byte aligned)

2756

Alignment

0xdeadbeef =

d e a d b e e c

1101 1110 1010 1101 1011 1110 1110 1100

4-byte aligned

2856

Alignment

0xdeadbeef =

d e a d b e e 8

1101 1110 1010 1101 1011 1110 1110 1000

8-byte aligned

2956

Alignment

Alignment refers to the least-significant bits of the object address being 0

Alignment requirements are always specified in powers of 2

Each built-in C++ type has a natural alignment requirement (typicallyalignof(T) == sizeof(T))

This is why structs sometimes insert padding bytes between members

3056

Alignment

Default allocator typically returns addresses aligned to alignof(max align t)which is big enough for all built-in types

Users may extend the alignment requirement for custom data types usingalignas

void allocate(std size_t bytes

std size_t alignment )

3156

Padding

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3356

Monotonic Allocator - Extensions

Auxiliary data structure required

Runtime cost of deallocation now linear in number of allocations (amortized O(1))

Auxiliary nodes have their own alignment requirements

Where to store the auxiliary nodes

3456

Monotonic Allocator - Extensions

3456

Monotonic Allocator - Extensions

3556

Internal or External

External headers have better cache behavior when iterating the list

External headers might have stricter alignment requirements than data

Internal headers have better cache behavior when adjacent data is hot

Internal headers require managed memory to be readable (think GPUs)

Where does the storage for external headers come from Same buffer Differentbuffer How big

rArr No easy answers

3656

The Bottom Line

Even seemingly simple extensions get complicated very quickly

Donrsquot try to increase generality through clever extensionsOnly consider modifications if itrsquos a perfect fit for your use case

3756

But what if I need to reclaim memory

minusrarr Pool Allocator

3856

Pool Allocator

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4156

Pool Allocator - Diffusion

4256

Pool Allocator

Deterministic runtime cost

No external fragmentation

Easy to make thread-safe

But

Cannot serve allocations bigger than chunk size

High waste through internal fragmentation if sizes of objects vary a lot

4356

Pool Allocator - STL containers

vector only if chunk sizes match vector size

list and map are a perfect fit as the size of each node is known beforehand(though this knowledge is implementation-specific)

Similar for deque

unordered map is again complicated

4456

But what if I do need different sizes

minusrarr Multipool Allocator

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4656

Multipool Allocator

Very powerful allocator

Runtime is still deterministic if number of pools is known beforehand

Maximum amount of waste through internal fragmentation can be controlledprecisely

Difficult to set up How many pools do I need What chunk sizes What poolsizes

Solid building block for a general purpose allocator

4756

Allocator support in C++

stdvectorltT AllocatorltTgtgt v

vpush back()

This is not the class allocating the memory

4756

Allocator support in C++

stdvectorltT AllocatorltTgtgt v

vpush back()

This is not the class allocating the memory

4856

Allocator support in C++

Historically C++ used Allocators to abstract over different models of addressingmemoryAs such Allocators in C++ are ldquostatelessrdquo

In C++ an Allocator is merely a handle to a memory resource1

1Arthur OrsquoDwyer - An Allocator is a Handle to the Heap

4856

Allocator support in C++

Historically C++ used Allocators to abstract over different models of addressingmemoryAs such Allocators in C++ are ldquostatelessrdquo

In C++ an Allocator is merely a handle to a memory resource1

1Arthur OrsquoDwyer - An Allocator is a Handle to the Heap

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

5056

C++ Memory Resources2

stdpmrmemory resource - Abstract base class for all resources that can bewrapped in a stdpmrpolymorphic allocator

stdpmrnew delete resource - Global allocator

stdpmrmonotonic buffer resource - Monotonic allocator

stdpmrunsynchronized pool resource synchronized pool resource

- Multipool

stdpmrnull memory resource - Allocation always fails

2Pablo Halpern - Allocators The Good Parts

5156

Chaining

explicit monotonic_buffer_resource(

stdpmr memory_resource upstream )

Each memory resource has an upstream counterpart

If the resource runs out of memory it tries to allocate more memory from upstream

5256

Chaining

Possible uses of Chaining

Fixed-size vs dynamic storage for allocators

Combination of different allocation strategies

Injection points for special purpose allocators for debugging and profiling

5356

Therersquos no universal interface for allocators

Are size and alignment parameters passed to deallocate

Is realloc supported

How are out-of-memory errors reported

Is extended alignment supported

What is the return value for an allocation of size 0

Different memory regions for internal data structures and allocated memory

5456

Donrsquot underestimate the global allocator

Competitive performance in the general case

Security features (ASLR secure erase of freed memory)

Debugging amp Profiling (Valgrind Windows Debug Runtime)

Cache Coloring

Local allocators are no free lunch

5556

Wrapping up

No one-size-fits-all mdash Each allocator has its Achilles heel

Global allocator is a good solution for the general case

But you can do better with special allocators for special use cases in terms ofperformance3 as well as reliability

C++ has good support for local allocators but the terminology is a bit off

Different libraries have different concepts of allocators

No free lunch You need to understand your use case before you can chose theright allocator

3John Lakos - Local (Arena) Allocators

5656

Thanks for your attention

Page 48: Taming Dynamic Memory...Pool Allocator - STL containers vector only if chunk sizes match vector size list and map are a perfect t, as the size of each node is known beforehand (though

2056

Stack Allocator

monotdeallocate(p3)

2056

Stack Allocator

monotdeallocate(p3)

2156

Stack Allocator

Strict LIFO-ordering of allocations and deallocations

No way for the implementation to check whether the deallocation order is correct

2256

Stack Allocator

monotdeallocate(p3)

p3 == top Xtop =

2256

Stack Allocator

monotdeallocate(p3)

p3 == top X

top =

2256

Stack Allocator

monotdeallocate(p3)

p3 == top Xtop =

2256

Stack Allocator

monotdeallocate(p3)

p3 == top X

top =

2356

Stack Allocator

Strict LIFO-ordering of allocations and deallocations

No way for the implementation to check whether the deallocation order is correct

Free-pointer is reset to the same pointer passed to the deallocate call

Padding bytes may be lost to internal fragmentation

2356

Stack Allocator

Strict LIFO-ordering of allocations and deallocations

No way for the implementation to check whether the deallocation order is correct

Free-pointer is reset to the same pointer passed to the deallocate call

Padding bytes may be lost to internal fragmentation

2356

Stack Allocator

Strict LIFO-ordering of allocations and deallocations

No way for the implementation to check whether the deallocation order is correct

Free-pointer is reset to the same pointer passed to the deallocate call

Padding bytes may be lost to internal fragmentation

2456

Padding

2456

Padding

2556

Alignment

0xdeadbeef =

d e a d b e e f

1101 1110 1010 1101 1011 1110 1110 1111

2656

Alignment

0xdeadbeef =

d e a d b e e f

1101 1110 1010 1101 1011 1110 1110 1111

No alignment (1-byte aligned)

2756

Alignment

0xdeadbeef =

d e a d b e e c

1101 1110 1010 1101 1011 1110 1110 1100

4-byte aligned

2856

Alignment

0xdeadbeef =

d e a d b e e 8

1101 1110 1010 1101 1011 1110 1110 1000

8-byte aligned

2956

Alignment

Alignment refers to the least-significant bits of the object address being 0

Alignment requirements are always specified in powers of 2

Each built-in C++ type has a natural alignment requirement (typicallyalignof(T) == sizeof(T))

This is why structs sometimes insert padding bytes between members

3056

Alignment

Default allocator typically returns addresses aligned to alignof(max align t)which is big enough for all built-in types

Users may extend the alignment requirement for custom data types usingalignas

void allocate(std size_t bytes

std size_t alignment )

3156

Padding

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3356

Monotonic Allocator - Extensions

Auxiliary data structure required

Runtime cost of deallocation now linear in number of allocations (amortized O(1))

Auxiliary nodes have their own alignment requirements

Where to store the auxiliary nodes

3456

Monotonic Allocator - Extensions

3456

Monotonic Allocator - Extensions

3556

Internal or External

External headers have better cache behavior when iterating the list

External headers might have stricter alignment requirements than data

Internal headers have better cache behavior when adjacent data is hot

Internal headers require managed memory to be readable (think GPUs)

Where does the storage for external headers come from Same buffer Differentbuffer How big

rArr No easy answers

3656

The Bottom Line

Even seemingly simple extensions get complicated very quickly

Donrsquot try to increase generality through clever extensionsOnly consider modifications if itrsquos a perfect fit for your use case

3756

But what if I need to reclaim memory

minusrarr Pool Allocator

3856

Pool Allocator

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4156

Pool Allocator - Diffusion

4256

Pool Allocator

Deterministic runtime cost

No external fragmentation

Easy to make thread-safe

But

Cannot serve allocations bigger than chunk size

High waste through internal fragmentation if sizes of objects vary a lot

4356

Pool Allocator - STL containers

vector only if chunk sizes match vector size

list and map are a perfect fit as the size of each node is known beforehand(though this knowledge is implementation-specific)

Similar for deque

unordered map is again complicated

4456

But what if I do need different sizes

minusrarr Multipool Allocator

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4656

Multipool Allocator

Very powerful allocator

Runtime is still deterministic if number of pools is known beforehand

Maximum amount of waste through internal fragmentation can be controlledprecisely

Difficult to set up How many pools do I need What chunk sizes What poolsizes

Solid building block for a general purpose allocator

4756

Allocator support in C++

stdvectorltT AllocatorltTgtgt v

vpush back()

This is not the class allocating the memory

4756

Allocator support in C++

stdvectorltT AllocatorltTgtgt v

vpush back()

This is not the class allocating the memory

4856

Allocator support in C++

Historically C++ used Allocators to abstract over different models of addressingmemoryAs such Allocators in C++ are ldquostatelessrdquo

In C++ an Allocator is merely a handle to a memory resource1

1Arthur OrsquoDwyer - An Allocator is a Handle to the Heap

4856

Allocator support in C++

Historically C++ used Allocators to abstract over different models of addressingmemoryAs such Allocators in C++ are ldquostatelessrdquo

In C++ an Allocator is merely a handle to a memory resource1

1Arthur OrsquoDwyer - An Allocator is a Handle to the Heap

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

5056

C++ Memory Resources2

stdpmrmemory resource - Abstract base class for all resources that can bewrapped in a stdpmrpolymorphic allocator

stdpmrnew delete resource - Global allocator

stdpmrmonotonic buffer resource - Monotonic allocator

stdpmrunsynchronized pool resource synchronized pool resource

- Multipool

stdpmrnull memory resource - Allocation always fails

2Pablo Halpern - Allocators The Good Parts

5156

Chaining

explicit monotonic_buffer_resource(

stdpmr memory_resource upstream )

Each memory resource has an upstream counterpart

If the resource runs out of memory it tries to allocate more memory from upstream

5256

Chaining

Possible uses of Chaining

Fixed-size vs dynamic storage for allocators

Combination of different allocation strategies

Injection points for special purpose allocators for debugging and profiling

5356

Therersquos no universal interface for allocators

Are size and alignment parameters passed to deallocate

Is realloc supported

How are out-of-memory errors reported

Is extended alignment supported

What is the return value for an allocation of size 0

Different memory regions for internal data structures and allocated memory

5456

Donrsquot underestimate the global allocator

Competitive performance in the general case

Security features (ASLR secure erase of freed memory)

Debugging amp Profiling (Valgrind Windows Debug Runtime)

Cache Coloring

Local allocators are no free lunch

5556

Wrapping up

No one-size-fits-all mdash Each allocator has its Achilles heel

Global allocator is a good solution for the general case

But you can do better with special allocators for special use cases in terms ofperformance3 as well as reliability

C++ has good support for local allocators but the terminology is a bit off

Different libraries have different concepts of allocators

No free lunch You need to understand your use case before you can chose theright allocator

3John Lakos - Local (Arena) Allocators

5656

Thanks for your attention

Page 49: Taming Dynamic Memory...Pool Allocator - STL containers vector only if chunk sizes match vector size list and map are a perfect t, as the size of each node is known beforehand (though

2056

Stack Allocator

monotdeallocate(p3)

2156

Stack Allocator

Strict LIFO-ordering of allocations and deallocations

No way for the implementation to check whether the deallocation order is correct

2256

Stack Allocator

monotdeallocate(p3)

p3 == top Xtop =

2256

Stack Allocator

monotdeallocate(p3)

p3 == top X

top =

2256

Stack Allocator

monotdeallocate(p3)

p3 == top Xtop =

2256

Stack Allocator

monotdeallocate(p3)

p3 == top X

top =

2356

Stack Allocator

Strict LIFO-ordering of allocations and deallocations

No way for the implementation to check whether the deallocation order is correct

Free-pointer is reset to the same pointer passed to the deallocate call

Padding bytes may be lost to internal fragmentation

2356

Stack Allocator

Strict LIFO-ordering of allocations and deallocations

No way for the implementation to check whether the deallocation order is correct

Free-pointer is reset to the same pointer passed to the deallocate call

Padding bytes may be lost to internal fragmentation

2356

Stack Allocator

Strict LIFO-ordering of allocations and deallocations

No way for the implementation to check whether the deallocation order is correct

Free-pointer is reset to the same pointer passed to the deallocate call

Padding bytes may be lost to internal fragmentation

2456

Padding

2456

Padding

2556

Alignment

0xdeadbeef =

d e a d b e e f

1101 1110 1010 1101 1011 1110 1110 1111

2656

Alignment

0xdeadbeef =

d e a d b e e f

1101 1110 1010 1101 1011 1110 1110 1111

No alignment (1-byte aligned)

2756

Alignment

0xdeadbeef =

d e a d b e e c

1101 1110 1010 1101 1011 1110 1110 1100

4-byte aligned

2856

Alignment

0xdeadbeef =

d e a d b e e 8

1101 1110 1010 1101 1011 1110 1110 1000

8-byte aligned

2956

Alignment

Alignment refers to the least-significant bits of the object address being 0

Alignment requirements are always specified in powers of 2

Each built-in C++ type has a natural alignment requirement (typicallyalignof(T) == sizeof(T))

This is why structs sometimes insert padding bytes between members

3056

Alignment

Default allocator typically returns addresses aligned to alignof(max align t)which is big enough for all built-in types

Users may extend the alignment requirement for custom data types usingalignas

void allocate(std size_t bytes

std size_t alignment )

3156

Padding

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3356

Monotonic Allocator - Extensions

Auxiliary data structure required

Runtime cost of deallocation now linear in number of allocations (amortized O(1))

Auxiliary nodes have their own alignment requirements

Where to store the auxiliary nodes

3456

Monotonic Allocator - Extensions

3456

Monotonic Allocator - Extensions

3556

Internal or External

External headers have better cache behavior when iterating the list

External headers might have stricter alignment requirements than data

Internal headers have better cache behavior when adjacent data is hot

Internal headers require managed memory to be readable (think GPUs)

Where does the storage for external headers come from Same buffer Differentbuffer How big

rArr No easy answers

3656

The Bottom Line

Even seemingly simple extensions get complicated very quickly

Donrsquot try to increase generality through clever extensionsOnly consider modifications if itrsquos a perfect fit for your use case

3756

But what if I need to reclaim memory

minusrarr Pool Allocator

3856

Pool Allocator

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4156

Pool Allocator - Diffusion

4256

Pool Allocator

Deterministic runtime cost

No external fragmentation

Easy to make thread-safe

But

Cannot serve allocations bigger than chunk size

High waste through internal fragmentation if sizes of objects vary a lot

4356

Pool Allocator - STL containers

vector only if chunk sizes match vector size

list and map are a perfect fit as the size of each node is known beforehand(though this knowledge is implementation-specific)

Similar for deque

unordered map is again complicated

4456

But what if I do need different sizes

minusrarr Multipool Allocator

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4656

Multipool Allocator

Very powerful allocator

Runtime is still deterministic if number of pools is known beforehand

Maximum amount of waste through internal fragmentation can be controlledprecisely

Difficult to set up How many pools do I need What chunk sizes What poolsizes

Solid building block for a general purpose allocator

4756

Allocator support in C++

stdvectorltT AllocatorltTgtgt v

vpush back()

This is not the class allocating the memory

4756

Allocator support in C++

stdvectorltT AllocatorltTgtgt v

vpush back()

This is not the class allocating the memory

4856

Allocator support in C++

Historically C++ used Allocators to abstract over different models of addressingmemoryAs such Allocators in C++ are ldquostatelessrdquo

In C++ an Allocator is merely a handle to a memory resource1

1Arthur OrsquoDwyer - An Allocator is a Handle to the Heap

4856

Allocator support in C++

Historically C++ used Allocators to abstract over different models of addressingmemoryAs such Allocators in C++ are ldquostatelessrdquo

In C++ an Allocator is merely a handle to a memory resource1

1Arthur OrsquoDwyer - An Allocator is a Handle to the Heap

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

5056

C++ Memory Resources2

stdpmrmemory resource - Abstract base class for all resources that can bewrapped in a stdpmrpolymorphic allocator

stdpmrnew delete resource - Global allocator

stdpmrmonotonic buffer resource - Monotonic allocator

stdpmrunsynchronized pool resource synchronized pool resource

- Multipool

stdpmrnull memory resource - Allocation always fails

2Pablo Halpern - Allocators The Good Parts

5156

Chaining

explicit monotonic_buffer_resource(

stdpmr memory_resource upstream )

Each memory resource has an upstream counterpart

If the resource runs out of memory it tries to allocate more memory from upstream

5256

Chaining

Possible uses of Chaining

Fixed-size vs dynamic storage for allocators

Combination of different allocation strategies

Injection points for special purpose allocators for debugging and profiling

5356

Therersquos no universal interface for allocators

Are size and alignment parameters passed to deallocate

Is realloc supported

How are out-of-memory errors reported

Is extended alignment supported

What is the return value for an allocation of size 0

Different memory regions for internal data structures and allocated memory

5456

Donrsquot underestimate the global allocator

Competitive performance in the general case

Security features (ASLR secure erase of freed memory)

Debugging amp Profiling (Valgrind Windows Debug Runtime)

Cache Coloring

Local allocators are no free lunch

5556

Wrapping up

No one-size-fits-all mdash Each allocator has its Achilles heel

Global allocator is a good solution for the general case

But you can do better with special allocators for special use cases in terms ofperformance3 as well as reliability

C++ has good support for local allocators but the terminology is a bit off

Different libraries have different concepts of allocators

No free lunch You need to understand your use case before you can chose theright allocator

3John Lakos - Local (Arena) Allocators

5656

Thanks for your attention

Page 50: Taming Dynamic Memory...Pool Allocator - STL containers vector only if chunk sizes match vector size list and map are a perfect t, as the size of each node is known beforehand (though

2156

Stack Allocator

Strict LIFO-ordering of allocations and deallocations

No way for the implementation to check whether the deallocation order is correct

2256

Stack Allocator

monotdeallocate(p3)

p3 == top Xtop =

2256

Stack Allocator

monotdeallocate(p3)

p3 == top X

top =

2256

Stack Allocator

monotdeallocate(p3)

p3 == top Xtop =

2256

Stack Allocator

monotdeallocate(p3)

p3 == top X

top =

2356

Stack Allocator

Strict LIFO-ordering of allocations and deallocations

No way for the implementation to check whether the deallocation order is correct

Free-pointer is reset to the same pointer passed to the deallocate call

Padding bytes may be lost to internal fragmentation

2356

Stack Allocator

Strict LIFO-ordering of allocations and deallocations

No way for the implementation to check whether the deallocation order is correct

Free-pointer is reset to the same pointer passed to the deallocate call

Padding bytes may be lost to internal fragmentation

2356

Stack Allocator

Strict LIFO-ordering of allocations and deallocations

No way for the implementation to check whether the deallocation order is correct

Free-pointer is reset to the same pointer passed to the deallocate call

Padding bytes may be lost to internal fragmentation

2456

Padding

2456

Padding

2556

Alignment

0xdeadbeef =

d e a d b e e f

1101 1110 1010 1101 1011 1110 1110 1111

2656

Alignment

0xdeadbeef =

d e a d b e e f

1101 1110 1010 1101 1011 1110 1110 1111

No alignment (1-byte aligned)

2756

Alignment

0xdeadbeef =

d e a d b e e c

1101 1110 1010 1101 1011 1110 1110 1100

4-byte aligned

2856

Alignment

0xdeadbeef =

d e a d b e e 8

1101 1110 1010 1101 1011 1110 1110 1000

8-byte aligned

2956

Alignment

Alignment refers to the least-significant bits of the object address being 0

Alignment requirements are always specified in powers of 2

Each built-in C++ type has a natural alignment requirement (typicallyalignof(T) == sizeof(T))

This is why structs sometimes insert padding bytes between members

3056

Alignment

Default allocator typically returns addresses aligned to alignof(max align t)which is big enough for all built-in types

Users may extend the alignment requirement for custom data types usingalignas

void allocate(std size_t bytes

std size_t alignment )

3156

Padding

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3356

Monotonic Allocator - Extensions

Auxiliary data structure required

Runtime cost of deallocation now linear in number of allocations (amortized O(1))

Auxiliary nodes have their own alignment requirements

Where to store the auxiliary nodes

3456

Monotonic Allocator - Extensions

3456

Monotonic Allocator - Extensions

3556

Internal or External

External headers have better cache behavior when iterating the list

External headers might have stricter alignment requirements than data

Internal headers have better cache behavior when adjacent data is hot

Internal headers require managed memory to be readable (think GPUs)

Where does the storage for external headers come from Same buffer Differentbuffer How big

rArr No easy answers

3656

The Bottom Line

Even seemingly simple extensions get complicated very quickly

Donrsquot try to increase generality through clever extensionsOnly consider modifications if itrsquos a perfect fit for your use case

3756

But what if I need to reclaim memory

minusrarr Pool Allocator

3856

Pool Allocator

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4156

Pool Allocator - Diffusion

4256

Pool Allocator

Deterministic runtime cost

No external fragmentation

Easy to make thread-safe

But

Cannot serve allocations bigger than chunk size

High waste through internal fragmentation if sizes of objects vary a lot

4356

Pool Allocator - STL containers

vector only if chunk sizes match vector size

list and map are a perfect fit as the size of each node is known beforehand(though this knowledge is implementation-specific)

Similar for deque

unordered map is again complicated

4456

But what if I do need different sizes

minusrarr Multipool Allocator

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4656

Multipool Allocator

Very powerful allocator

Runtime is still deterministic if number of pools is known beforehand

Maximum amount of waste through internal fragmentation can be controlledprecisely

Difficult to set up How many pools do I need What chunk sizes What poolsizes

Solid building block for a general purpose allocator

4756

Allocator support in C++

stdvectorltT AllocatorltTgtgt v

vpush back()

This is not the class allocating the memory

4756

Allocator support in C++

stdvectorltT AllocatorltTgtgt v

vpush back()

This is not the class allocating the memory

4856

Allocator support in C++

Historically C++ used Allocators to abstract over different models of addressingmemoryAs such Allocators in C++ are ldquostatelessrdquo

In C++ an Allocator is merely a handle to a memory resource1

1Arthur OrsquoDwyer - An Allocator is a Handle to the Heap

4856

Allocator support in C++

Historically C++ used Allocators to abstract over different models of addressingmemoryAs such Allocators in C++ are ldquostatelessrdquo

In C++ an Allocator is merely a handle to a memory resource1

1Arthur OrsquoDwyer - An Allocator is a Handle to the Heap

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

5056

C++ Memory Resources2

stdpmrmemory resource - Abstract base class for all resources that can bewrapped in a stdpmrpolymorphic allocator

stdpmrnew delete resource - Global allocator

stdpmrmonotonic buffer resource - Monotonic allocator

stdpmrunsynchronized pool resource synchronized pool resource

- Multipool

stdpmrnull memory resource - Allocation always fails

2Pablo Halpern - Allocators The Good Parts

5156

Chaining

explicit monotonic_buffer_resource(

stdpmr memory_resource upstream )

Each memory resource has an upstream counterpart

If the resource runs out of memory it tries to allocate more memory from upstream

5256

Chaining

Possible uses of Chaining

Fixed-size vs dynamic storage for allocators

Combination of different allocation strategies

Injection points for special purpose allocators for debugging and profiling

5356

Therersquos no universal interface for allocators

Are size and alignment parameters passed to deallocate

Is realloc supported

How are out-of-memory errors reported

Is extended alignment supported

What is the return value for an allocation of size 0

Different memory regions for internal data structures and allocated memory

5456

Donrsquot underestimate the global allocator

Competitive performance in the general case

Security features (ASLR secure erase of freed memory)

Debugging amp Profiling (Valgrind Windows Debug Runtime)

Cache Coloring

Local allocators are no free lunch

5556

Wrapping up

No one-size-fits-all mdash Each allocator has its Achilles heel

Global allocator is a good solution for the general case

But you can do better with special allocators for special use cases in terms ofperformance3 as well as reliability

C++ has good support for local allocators but the terminology is a bit off

Different libraries have different concepts of allocators

No free lunch You need to understand your use case before you can chose theright allocator

3John Lakos - Local (Arena) Allocators

5656

Thanks for your attention

Page 51: Taming Dynamic Memory...Pool Allocator - STL containers vector only if chunk sizes match vector size list and map are a perfect t, as the size of each node is known beforehand (though

2256

Stack Allocator

monotdeallocate(p3)

p3 == top Xtop =

2256

Stack Allocator

monotdeallocate(p3)

p3 == top X

top =

2256

Stack Allocator

monotdeallocate(p3)

p3 == top Xtop =

2256

Stack Allocator

monotdeallocate(p3)

p3 == top X

top =

2356

Stack Allocator

Strict LIFO-ordering of allocations and deallocations

No way for the implementation to check whether the deallocation order is correct

Free-pointer is reset to the same pointer passed to the deallocate call

Padding bytes may be lost to internal fragmentation

2356

Stack Allocator

Strict LIFO-ordering of allocations and deallocations

No way for the implementation to check whether the deallocation order is correct

Free-pointer is reset to the same pointer passed to the deallocate call

Padding bytes may be lost to internal fragmentation

2356

Stack Allocator

Strict LIFO-ordering of allocations and deallocations

No way for the implementation to check whether the deallocation order is correct

Free-pointer is reset to the same pointer passed to the deallocate call

Padding bytes may be lost to internal fragmentation

2456

Padding

2456

Padding

2556

Alignment

0xdeadbeef =

d e a d b e e f

1101 1110 1010 1101 1011 1110 1110 1111

2656

Alignment

0xdeadbeef =

d e a d b e e f

1101 1110 1010 1101 1011 1110 1110 1111

No alignment (1-byte aligned)

2756

Alignment

0xdeadbeef =

d e a d b e e c

1101 1110 1010 1101 1011 1110 1110 1100

4-byte aligned

2856

Alignment

0xdeadbeef =

d e a d b e e 8

1101 1110 1010 1101 1011 1110 1110 1000

8-byte aligned

2956

Alignment

Alignment refers to the least-significant bits of the object address being 0

Alignment requirements are always specified in powers of 2

Each built-in C++ type has a natural alignment requirement (typicallyalignof(T) == sizeof(T))

This is why structs sometimes insert padding bytes between members

3056

Alignment

Default allocator typically returns addresses aligned to alignof(max align t)which is big enough for all built-in types

Users may extend the alignment requirement for custom data types usingalignas

void allocate(std size_t bytes

std size_t alignment )

3156

Padding

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3356

Monotonic Allocator - Extensions

Auxiliary data structure required

Runtime cost of deallocation now linear in number of allocations (amortized O(1))

Auxiliary nodes have their own alignment requirements

Where to store the auxiliary nodes

3456

Monotonic Allocator - Extensions

3456

Monotonic Allocator - Extensions

3556

Internal or External

External headers have better cache behavior when iterating the list

External headers might have stricter alignment requirements than data

Internal headers have better cache behavior when adjacent data is hot

Internal headers require managed memory to be readable (think GPUs)

Where does the storage for external headers come from Same buffer Differentbuffer How big

rArr No easy answers

3656

The Bottom Line

Even seemingly simple extensions get complicated very quickly

Donrsquot try to increase generality through clever extensionsOnly consider modifications if itrsquos a perfect fit for your use case

3756

But what if I need to reclaim memory

minusrarr Pool Allocator

3856

Pool Allocator

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4156

Pool Allocator - Diffusion

4256

Pool Allocator

Deterministic runtime cost

No external fragmentation

Easy to make thread-safe

But

Cannot serve allocations bigger than chunk size

High waste through internal fragmentation if sizes of objects vary a lot

4356

Pool Allocator - STL containers

vector only if chunk sizes match vector size

list and map are a perfect fit as the size of each node is known beforehand(though this knowledge is implementation-specific)

Similar for deque

unordered map is again complicated

4456

But what if I do need different sizes

minusrarr Multipool Allocator

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4656

Multipool Allocator

Very powerful allocator

Runtime is still deterministic if number of pools is known beforehand

Maximum amount of waste through internal fragmentation can be controlledprecisely

Difficult to set up How many pools do I need What chunk sizes What poolsizes

Solid building block for a general purpose allocator

4756

Allocator support in C++

stdvectorltT AllocatorltTgtgt v

vpush back()

This is not the class allocating the memory

4756

Allocator support in C++

stdvectorltT AllocatorltTgtgt v

vpush back()

This is not the class allocating the memory

4856

Allocator support in C++

Historically C++ used Allocators to abstract over different models of addressingmemoryAs such Allocators in C++ are ldquostatelessrdquo

In C++ an Allocator is merely a handle to a memory resource1

1Arthur OrsquoDwyer - An Allocator is a Handle to the Heap

4856

Allocator support in C++

Historically C++ used Allocators to abstract over different models of addressingmemoryAs such Allocators in C++ are ldquostatelessrdquo

In C++ an Allocator is merely a handle to a memory resource1

1Arthur OrsquoDwyer - An Allocator is a Handle to the Heap

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

5056

C++ Memory Resources2

stdpmrmemory resource - Abstract base class for all resources that can bewrapped in a stdpmrpolymorphic allocator

stdpmrnew delete resource - Global allocator

stdpmrmonotonic buffer resource - Monotonic allocator

stdpmrunsynchronized pool resource synchronized pool resource

- Multipool

stdpmrnull memory resource - Allocation always fails

2Pablo Halpern - Allocators The Good Parts

5156

Chaining

explicit monotonic_buffer_resource(

stdpmr memory_resource upstream )

Each memory resource has an upstream counterpart

If the resource runs out of memory it tries to allocate more memory from upstream

5256

Chaining

Possible uses of Chaining

Fixed-size vs dynamic storage for allocators

Combination of different allocation strategies

Injection points for special purpose allocators for debugging and profiling

5356

Therersquos no universal interface for allocators

Are size and alignment parameters passed to deallocate

Is realloc supported

How are out-of-memory errors reported

Is extended alignment supported

What is the return value for an allocation of size 0

Different memory regions for internal data structures and allocated memory

5456

Donrsquot underestimate the global allocator

Competitive performance in the general case

Security features (ASLR secure erase of freed memory)

Debugging amp Profiling (Valgrind Windows Debug Runtime)

Cache Coloring

Local allocators are no free lunch

5556

Wrapping up

No one-size-fits-all mdash Each allocator has its Achilles heel

Global allocator is a good solution for the general case

But you can do better with special allocators for special use cases in terms ofperformance3 as well as reliability

C++ has good support for local allocators but the terminology is a bit off

Different libraries have different concepts of allocators

No free lunch You need to understand your use case before you can chose theright allocator

3John Lakos - Local (Arena) Allocators

5656

Thanks for your attention

Page 52: Taming Dynamic Memory...Pool Allocator - STL containers vector only if chunk sizes match vector size list and map are a perfect t, as the size of each node is known beforehand (though

2256

Stack Allocator

monotdeallocate(p3)

p3 == top X

top =

2256

Stack Allocator

monotdeallocate(p3)

p3 == top Xtop =

2256

Stack Allocator

monotdeallocate(p3)

p3 == top X

top =

2356

Stack Allocator

Strict LIFO-ordering of allocations and deallocations

No way for the implementation to check whether the deallocation order is correct

Free-pointer is reset to the same pointer passed to the deallocate call

Padding bytes may be lost to internal fragmentation

2356

Stack Allocator

Strict LIFO-ordering of allocations and deallocations

No way for the implementation to check whether the deallocation order is correct

Free-pointer is reset to the same pointer passed to the deallocate call

Padding bytes may be lost to internal fragmentation

2356

Stack Allocator

Strict LIFO-ordering of allocations and deallocations

No way for the implementation to check whether the deallocation order is correct

Free-pointer is reset to the same pointer passed to the deallocate call

Padding bytes may be lost to internal fragmentation

2456

Padding

2456

Padding

2556

Alignment

0xdeadbeef =

d e a d b e e f

1101 1110 1010 1101 1011 1110 1110 1111

2656

Alignment

0xdeadbeef =

d e a d b e e f

1101 1110 1010 1101 1011 1110 1110 1111

No alignment (1-byte aligned)

2756

Alignment

0xdeadbeef =

d e a d b e e c

1101 1110 1010 1101 1011 1110 1110 1100

4-byte aligned

2856

Alignment

0xdeadbeef =

d e a d b e e 8

1101 1110 1010 1101 1011 1110 1110 1000

8-byte aligned

2956

Alignment

Alignment refers to the least-significant bits of the object address being 0

Alignment requirements are always specified in powers of 2

Each built-in C++ type has a natural alignment requirement (typicallyalignof(T) == sizeof(T))

This is why structs sometimes insert padding bytes between members

3056

Alignment

Default allocator typically returns addresses aligned to alignof(max align t)which is big enough for all built-in types

Users may extend the alignment requirement for custom data types usingalignas

void allocate(std size_t bytes

std size_t alignment )

3156

Padding

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3356

Monotonic Allocator - Extensions

Auxiliary data structure required

Runtime cost of deallocation now linear in number of allocations (amortized O(1))

Auxiliary nodes have their own alignment requirements

Where to store the auxiliary nodes

3456

Monotonic Allocator - Extensions

3456

Monotonic Allocator - Extensions

3556

Internal or External

External headers have better cache behavior when iterating the list

External headers might have stricter alignment requirements than data

Internal headers have better cache behavior when adjacent data is hot

Internal headers require managed memory to be readable (think GPUs)

Where does the storage for external headers come from Same buffer Differentbuffer How big

rArr No easy answers

3656

The Bottom Line

Even seemingly simple extensions get complicated very quickly

Donrsquot try to increase generality through clever extensionsOnly consider modifications if itrsquos a perfect fit for your use case

3756

But what if I need to reclaim memory

minusrarr Pool Allocator

3856

Pool Allocator

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4156

Pool Allocator - Diffusion

4256

Pool Allocator

Deterministic runtime cost

No external fragmentation

Easy to make thread-safe

But

Cannot serve allocations bigger than chunk size

High waste through internal fragmentation if sizes of objects vary a lot

4356

Pool Allocator - STL containers

vector only if chunk sizes match vector size

list and map are a perfect fit as the size of each node is known beforehand(though this knowledge is implementation-specific)

Similar for deque

unordered map is again complicated

4456

But what if I do need different sizes

minusrarr Multipool Allocator

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4656

Multipool Allocator

Very powerful allocator

Runtime is still deterministic if number of pools is known beforehand

Maximum amount of waste through internal fragmentation can be controlledprecisely

Difficult to set up How many pools do I need What chunk sizes What poolsizes

Solid building block for a general purpose allocator

4756

Allocator support in C++

stdvectorltT AllocatorltTgtgt v

vpush back()

This is not the class allocating the memory

4756

Allocator support in C++

stdvectorltT AllocatorltTgtgt v

vpush back()

This is not the class allocating the memory

4856

Allocator support in C++

Historically C++ used Allocators to abstract over different models of addressingmemoryAs such Allocators in C++ are ldquostatelessrdquo

In C++ an Allocator is merely a handle to a memory resource1

1Arthur OrsquoDwyer - An Allocator is a Handle to the Heap

4856

Allocator support in C++

Historically C++ used Allocators to abstract over different models of addressingmemoryAs such Allocators in C++ are ldquostatelessrdquo

In C++ an Allocator is merely a handle to a memory resource1

1Arthur OrsquoDwyer - An Allocator is a Handle to the Heap

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

5056

C++ Memory Resources2

stdpmrmemory resource - Abstract base class for all resources that can bewrapped in a stdpmrpolymorphic allocator

stdpmrnew delete resource - Global allocator

stdpmrmonotonic buffer resource - Monotonic allocator

stdpmrunsynchronized pool resource synchronized pool resource

- Multipool

stdpmrnull memory resource - Allocation always fails

2Pablo Halpern - Allocators The Good Parts

5156

Chaining

explicit monotonic_buffer_resource(

stdpmr memory_resource upstream )

Each memory resource has an upstream counterpart

If the resource runs out of memory it tries to allocate more memory from upstream

5256

Chaining

Possible uses of Chaining

Fixed-size vs dynamic storage for allocators

Combination of different allocation strategies

Injection points for special purpose allocators for debugging and profiling

5356

Therersquos no universal interface for allocators

Are size and alignment parameters passed to deallocate

Is realloc supported

How are out-of-memory errors reported

Is extended alignment supported

What is the return value for an allocation of size 0

Different memory regions for internal data structures and allocated memory

5456

Donrsquot underestimate the global allocator

Competitive performance in the general case

Security features (ASLR secure erase of freed memory)

Debugging amp Profiling (Valgrind Windows Debug Runtime)

Cache Coloring

Local allocators are no free lunch

5556

Wrapping up

No one-size-fits-all mdash Each allocator has its Achilles heel

Global allocator is a good solution for the general case

But you can do better with special allocators for special use cases in terms ofperformance3 as well as reliability

C++ has good support for local allocators but the terminology is a bit off

Different libraries have different concepts of allocators

No free lunch You need to understand your use case before you can chose theright allocator

3John Lakos - Local (Arena) Allocators

5656

Thanks for your attention

Page 53: Taming Dynamic Memory...Pool Allocator - STL containers vector only if chunk sizes match vector size list and map are a perfect t, as the size of each node is known beforehand (though

2256

Stack Allocator

monotdeallocate(p3)

p3 == top Xtop =

2256

Stack Allocator

monotdeallocate(p3)

p3 == top X

top =

2356

Stack Allocator

Strict LIFO-ordering of allocations and deallocations

No way for the implementation to check whether the deallocation order is correct

Free-pointer is reset to the same pointer passed to the deallocate call

Padding bytes may be lost to internal fragmentation

2356

Stack Allocator

Strict LIFO-ordering of allocations and deallocations

No way for the implementation to check whether the deallocation order is correct

Free-pointer is reset to the same pointer passed to the deallocate call

Padding bytes may be lost to internal fragmentation

2356

Stack Allocator

Strict LIFO-ordering of allocations and deallocations

No way for the implementation to check whether the deallocation order is correct

Free-pointer is reset to the same pointer passed to the deallocate call

Padding bytes may be lost to internal fragmentation

2456

Padding

2456

Padding

2556

Alignment

0xdeadbeef =

d e a d b e e f

1101 1110 1010 1101 1011 1110 1110 1111

2656

Alignment

0xdeadbeef =

d e a d b e e f

1101 1110 1010 1101 1011 1110 1110 1111

No alignment (1-byte aligned)

2756

Alignment

0xdeadbeef =

d e a d b e e c

1101 1110 1010 1101 1011 1110 1110 1100

4-byte aligned

2856

Alignment

0xdeadbeef =

d e a d b e e 8

1101 1110 1010 1101 1011 1110 1110 1000

8-byte aligned

2956

Alignment

Alignment refers to the least-significant bits of the object address being 0

Alignment requirements are always specified in powers of 2

Each built-in C++ type has a natural alignment requirement (typicallyalignof(T) == sizeof(T))

This is why structs sometimes insert padding bytes between members

3056

Alignment

Default allocator typically returns addresses aligned to alignof(max align t)which is big enough for all built-in types

Users may extend the alignment requirement for custom data types usingalignas

void allocate(std size_t bytes

std size_t alignment )

3156

Padding

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3356

Monotonic Allocator - Extensions

Auxiliary data structure required

Runtime cost of deallocation now linear in number of allocations (amortized O(1))

Auxiliary nodes have their own alignment requirements

Where to store the auxiliary nodes

3456

Monotonic Allocator - Extensions

3456

Monotonic Allocator - Extensions

3556

Internal or External

External headers have better cache behavior when iterating the list

External headers might have stricter alignment requirements than data

Internal headers have better cache behavior when adjacent data is hot

Internal headers require managed memory to be readable (think GPUs)

Where does the storage for external headers come from Same buffer Differentbuffer How big

rArr No easy answers

3656

The Bottom Line

Even seemingly simple extensions get complicated very quickly

Donrsquot try to increase generality through clever extensionsOnly consider modifications if itrsquos a perfect fit for your use case

3756

But what if I need to reclaim memory

minusrarr Pool Allocator

3856

Pool Allocator

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4156

Pool Allocator - Diffusion

4256

Pool Allocator

Deterministic runtime cost

No external fragmentation

Easy to make thread-safe

But

Cannot serve allocations bigger than chunk size

High waste through internal fragmentation if sizes of objects vary a lot

4356

Pool Allocator - STL containers

vector only if chunk sizes match vector size

list and map are a perfect fit as the size of each node is known beforehand(though this knowledge is implementation-specific)

Similar for deque

unordered map is again complicated

4456

But what if I do need different sizes

minusrarr Multipool Allocator

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4656

Multipool Allocator

Very powerful allocator

Runtime is still deterministic if number of pools is known beforehand

Maximum amount of waste through internal fragmentation can be controlledprecisely

Difficult to set up How many pools do I need What chunk sizes What poolsizes

Solid building block for a general purpose allocator

4756

Allocator support in C++

stdvectorltT AllocatorltTgtgt v

vpush back()

This is not the class allocating the memory

4756

Allocator support in C++

stdvectorltT AllocatorltTgtgt v

vpush back()

This is not the class allocating the memory

4856

Allocator support in C++

Historically C++ used Allocators to abstract over different models of addressingmemoryAs such Allocators in C++ are ldquostatelessrdquo

In C++ an Allocator is merely a handle to a memory resource1

1Arthur OrsquoDwyer - An Allocator is a Handle to the Heap

4856

Allocator support in C++

Historically C++ used Allocators to abstract over different models of addressingmemoryAs such Allocators in C++ are ldquostatelessrdquo

In C++ an Allocator is merely a handle to a memory resource1

1Arthur OrsquoDwyer - An Allocator is a Handle to the Heap

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

5056

C++ Memory Resources2

stdpmrmemory resource - Abstract base class for all resources that can bewrapped in a stdpmrpolymorphic allocator

stdpmrnew delete resource - Global allocator

stdpmrmonotonic buffer resource - Monotonic allocator

stdpmrunsynchronized pool resource synchronized pool resource

- Multipool

stdpmrnull memory resource - Allocation always fails

2Pablo Halpern - Allocators The Good Parts

5156

Chaining

explicit monotonic_buffer_resource(

stdpmr memory_resource upstream )

Each memory resource has an upstream counterpart

If the resource runs out of memory it tries to allocate more memory from upstream

5256

Chaining

Possible uses of Chaining

Fixed-size vs dynamic storage for allocators

Combination of different allocation strategies

Injection points for special purpose allocators for debugging and profiling

5356

Therersquos no universal interface for allocators

Are size and alignment parameters passed to deallocate

Is realloc supported

How are out-of-memory errors reported

Is extended alignment supported

What is the return value for an allocation of size 0

Different memory regions for internal data structures and allocated memory

5456

Donrsquot underestimate the global allocator

Competitive performance in the general case

Security features (ASLR secure erase of freed memory)

Debugging amp Profiling (Valgrind Windows Debug Runtime)

Cache Coloring

Local allocators are no free lunch

5556

Wrapping up

No one-size-fits-all mdash Each allocator has its Achilles heel

Global allocator is a good solution for the general case

But you can do better with special allocators for special use cases in terms ofperformance3 as well as reliability

C++ has good support for local allocators but the terminology is a bit off

Different libraries have different concepts of allocators

No free lunch You need to understand your use case before you can chose theright allocator

3John Lakos - Local (Arena) Allocators

5656

Thanks for your attention

Page 54: Taming Dynamic Memory...Pool Allocator - STL containers vector only if chunk sizes match vector size list and map are a perfect t, as the size of each node is known beforehand (though

2256

Stack Allocator

monotdeallocate(p3)

p3 == top X

top =

2356

Stack Allocator

Strict LIFO-ordering of allocations and deallocations

No way for the implementation to check whether the deallocation order is correct

Free-pointer is reset to the same pointer passed to the deallocate call

Padding bytes may be lost to internal fragmentation

2356

Stack Allocator

Strict LIFO-ordering of allocations and deallocations

No way for the implementation to check whether the deallocation order is correct

Free-pointer is reset to the same pointer passed to the deallocate call

Padding bytes may be lost to internal fragmentation

2356

Stack Allocator

Strict LIFO-ordering of allocations and deallocations

No way for the implementation to check whether the deallocation order is correct

Free-pointer is reset to the same pointer passed to the deallocate call

Padding bytes may be lost to internal fragmentation

2456

Padding

2456

Padding

2556

Alignment

0xdeadbeef =

d e a d b e e f

1101 1110 1010 1101 1011 1110 1110 1111

2656

Alignment

0xdeadbeef =

d e a d b e e f

1101 1110 1010 1101 1011 1110 1110 1111

No alignment (1-byte aligned)

2756

Alignment

0xdeadbeef =

d e a d b e e c

1101 1110 1010 1101 1011 1110 1110 1100

4-byte aligned

2856

Alignment

0xdeadbeef =

d e a d b e e 8

1101 1110 1010 1101 1011 1110 1110 1000

8-byte aligned

2956

Alignment

Alignment refers to the least-significant bits of the object address being 0

Alignment requirements are always specified in powers of 2

Each built-in C++ type has a natural alignment requirement (typicallyalignof(T) == sizeof(T))

This is why structs sometimes insert padding bytes between members

3056

Alignment

Default allocator typically returns addresses aligned to alignof(max align t)which is big enough for all built-in types

Users may extend the alignment requirement for custom data types usingalignas

void allocate(std size_t bytes

std size_t alignment )

3156

Padding

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3356

Monotonic Allocator - Extensions

Auxiliary data structure required

Runtime cost of deallocation now linear in number of allocations (amortized O(1))

Auxiliary nodes have their own alignment requirements

Where to store the auxiliary nodes

3456

Monotonic Allocator - Extensions

3456

Monotonic Allocator - Extensions

3556

Internal or External

External headers have better cache behavior when iterating the list

External headers might have stricter alignment requirements than data

Internal headers have better cache behavior when adjacent data is hot

Internal headers require managed memory to be readable (think GPUs)

Where does the storage for external headers come from Same buffer Differentbuffer How big

rArr No easy answers

3656

The Bottom Line

Even seemingly simple extensions get complicated very quickly

Donrsquot try to increase generality through clever extensionsOnly consider modifications if itrsquos a perfect fit for your use case

3756

But what if I need to reclaim memory

minusrarr Pool Allocator

3856

Pool Allocator

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4156

Pool Allocator - Diffusion

4256

Pool Allocator

Deterministic runtime cost

No external fragmentation

Easy to make thread-safe

But

Cannot serve allocations bigger than chunk size

High waste through internal fragmentation if sizes of objects vary a lot

4356

Pool Allocator - STL containers

vector only if chunk sizes match vector size

list and map are a perfect fit as the size of each node is known beforehand(though this knowledge is implementation-specific)

Similar for deque

unordered map is again complicated

4456

But what if I do need different sizes

minusrarr Multipool Allocator

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4656

Multipool Allocator

Very powerful allocator

Runtime is still deterministic if number of pools is known beforehand

Maximum amount of waste through internal fragmentation can be controlledprecisely

Difficult to set up How many pools do I need What chunk sizes What poolsizes

Solid building block for a general purpose allocator

4756

Allocator support in C++

stdvectorltT AllocatorltTgtgt v

vpush back()

This is not the class allocating the memory

4756

Allocator support in C++

stdvectorltT AllocatorltTgtgt v

vpush back()

This is not the class allocating the memory

4856

Allocator support in C++

Historically C++ used Allocators to abstract over different models of addressingmemoryAs such Allocators in C++ are ldquostatelessrdquo

In C++ an Allocator is merely a handle to a memory resource1

1Arthur OrsquoDwyer - An Allocator is a Handle to the Heap

4856

Allocator support in C++

Historically C++ used Allocators to abstract over different models of addressingmemoryAs such Allocators in C++ are ldquostatelessrdquo

In C++ an Allocator is merely a handle to a memory resource1

1Arthur OrsquoDwyer - An Allocator is a Handle to the Heap

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

5056

C++ Memory Resources2

stdpmrmemory resource - Abstract base class for all resources that can bewrapped in a stdpmrpolymorphic allocator

stdpmrnew delete resource - Global allocator

stdpmrmonotonic buffer resource - Monotonic allocator

stdpmrunsynchronized pool resource synchronized pool resource

- Multipool

stdpmrnull memory resource - Allocation always fails

2Pablo Halpern - Allocators The Good Parts

5156

Chaining

explicit monotonic_buffer_resource(

stdpmr memory_resource upstream )

Each memory resource has an upstream counterpart

If the resource runs out of memory it tries to allocate more memory from upstream

5256

Chaining

Possible uses of Chaining

Fixed-size vs dynamic storage for allocators

Combination of different allocation strategies

Injection points for special purpose allocators for debugging and profiling

5356

Therersquos no universal interface for allocators

Are size and alignment parameters passed to deallocate

Is realloc supported

How are out-of-memory errors reported

Is extended alignment supported

What is the return value for an allocation of size 0

Different memory regions for internal data structures and allocated memory

5456

Donrsquot underestimate the global allocator

Competitive performance in the general case

Security features (ASLR secure erase of freed memory)

Debugging amp Profiling (Valgrind Windows Debug Runtime)

Cache Coloring

Local allocators are no free lunch

5556

Wrapping up

No one-size-fits-all mdash Each allocator has its Achilles heel

Global allocator is a good solution for the general case

But you can do better with special allocators for special use cases in terms ofperformance3 as well as reliability

C++ has good support for local allocators but the terminology is a bit off

Different libraries have different concepts of allocators

No free lunch You need to understand your use case before you can chose theright allocator

3John Lakos - Local (Arena) Allocators

5656

Thanks for your attention

Page 55: Taming Dynamic Memory...Pool Allocator - STL containers vector only if chunk sizes match vector size list and map are a perfect t, as the size of each node is known beforehand (though

2356

Stack Allocator

Strict LIFO-ordering of allocations and deallocations

No way for the implementation to check whether the deallocation order is correct

Free-pointer is reset to the same pointer passed to the deallocate call

Padding bytes may be lost to internal fragmentation

2356

Stack Allocator

Strict LIFO-ordering of allocations and deallocations

No way for the implementation to check whether the deallocation order is correct

Free-pointer is reset to the same pointer passed to the deallocate call

Padding bytes may be lost to internal fragmentation

2356

Stack Allocator

Strict LIFO-ordering of allocations and deallocations

No way for the implementation to check whether the deallocation order is correct

Free-pointer is reset to the same pointer passed to the deallocate call

Padding bytes may be lost to internal fragmentation

2456

Padding

2456

Padding

2556

Alignment

0xdeadbeef =

d e a d b e e f

1101 1110 1010 1101 1011 1110 1110 1111

2656

Alignment

0xdeadbeef =

d e a d b e e f

1101 1110 1010 1101 1011 1110 1110 1111

No alignment (1-byte aligned)

2756

Alignment

0xdeadbeef =

d e a d b e e c

1101 1110 1010 1101 1011 1110 1110 1100

4-byte aligned

2856

Alignment

0xdeadbeef =

d e a d b e e 8

1101 1110 1010 1101 1011 1110 1110 1000

8-byte aligned

2956

Alignment

Alignment refers to the least-significant bits of the object address being 0

Alignment requirements are always specified in powers of 2

Each built-in C++ type has a natural alignment requirement (typicallyalignof(T) == sizeof(T))

This is why structs sometimes insert padding bytes between members

3056

Alignment

Default allocator typically returns addresses aligned to alignof(max align t)which is big enough for all built-in types

Users may extend the alignment requirement for custom data types usingalignas

void allocate(std size_t bytes

std size_t alignment )

3156

Padding

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3356

Monotonic Allocator - Extensions

Auxiliary data structure required

Runtime cost of deallocation now linear in number of allocations (amortized O(1))

Auxiliary nodes have their own alignment requirements

Where to store the auxiliary nodes

3456

Monotonic Allocator - Extensions

3456

Monotonic Allocator - Extensions

3556

Internal or External

External headers have better cache behavior when iterating the list

External headers might have stricter alignment requirements than data

Internal headers have better cache behavior when adjacent data is hot

Internal headers require managed memory to be readable (think GPUs)

Where does the storage for external headers come from Same buffer Differentbuffer How big

rArr No easy answers

3656

The Bottom Line

Even seemingly simple extensions get complicated very quickly

Donrsquot try to increase generality through clever extensionsOnly consider modifications if itrsquos a perfect fit for your use case

3756

But what if I need to reclaim memory

minusrarr Pool Allocator

3856

Pool Allocator

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4156

Pool Allocator - Diffusion

4256

Pool Allocator

Deterministic runtime cost

No external fragmentation

Easy to make thread-safe

But

Cannot serve allocations bigger than chunk size

High waste through internal fragmentation if sizes of objects vary a lot

4356

Pool Allocator - STL containers

vector only if chunk sizes match vector size

list and map are a perfect fit as the size of each node is known beforehand(though this knowledge is implementation-specific)

Similar for deque

unordered map is again complicated

4456

But what if I do need different sizes

minusrarr Multipool Allocator

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4656

Multipool Allocator

Very powerful allocator

Runtime is still deterministic if number of pools is known beforehand

Maximum amount of waste through internal fragmentation can be controlledprecisely

Difficult to set up How many pools do I need What chunk sizes What poolsizes

Solid building block for a general purpose allocator

4756

Allocator support in C++

stdvectorltT AllocatorltTgtgt v

vpush back()

This is not the class allocating the memory

4756

Allocator support in C++

stdvectorltT AllocatorltTgtgt v

vpush back()

This is not the class allocating the memory

4856

Allocator support in C++

Historically C++ used Allocators to abstract over different models of addressingmemoryAs such Allocators in C++ are ldquostatelessrdquo

In C++ an Allocator is merely a handle to a memory resource1

1Arthur OrsquoDwyer - An Allocator is a Handle to the Heap

4856

Allocator support in C++

Historically C++ used Allocators to abstract over different models of addressingmemoryAs such Allocators in C++ are ldquostatelessrdquo

In C++ an Allocator is merely a handle to a memory resource1

1Arthur OrsquoDwyer - An Allocator is a Handle to the Heap

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

5056

C++ Memory Resources2

stdpmrmemory resource - Abstract base class for all resources that can bewrapped in a stdpmrpolymorphic allocator

stdpmrnew delete resource - Global allocator

stdpmrmonotonic buffer resource - Monotonic allocator

stdpmrunsynchronized pool resource synchronized pool resource

- Multipool

stdpmrnull memory resource - Allocation always fails

2Pablo Halpern - Allocators The Good Parts

5156

Chaining

explicit monotonic_buffer_resource(

stdpmr memory_resource upstream )

Each memory resource has an upstream counterpart

If the resource runs out of memory it tries to allocate more memory from upstream

5256

Chaining

Possible uses of Chaining

Fixed-size vs dynamic storage for allocators

Combination of different allocation strategies

Injection points for special purpose allocators for debugging and profiling

5356

Therersquos no universal interface for allocators

Are size and alignment parameters passed to deallocate

Is realloc supported

How are out-of-memory errors reported

Is extended alignment supported

What is the return value for an allocation of size 0

Different memory regions for internal data structures and allocated memory

5456

Donrsquot underestimate the global allocator

Competitive performance in the general case

Security features (ASLR secure erase of freed memory)

Debugging amp Profiling (Valgrind Windows Debug Runtime)

Cache Coloring

Local allocators are no free lunch

5556

Wrapping up

No one-size-fits-all mdash Each allocator has its Achilles heel

Global allocator is a good solution for the general case

But you can do better with special allocators for special use cases in terms ofperformance3 as well as reliability

C++ has good support for local allocators but the terminology is a bit off

Different libraries have different concepts of allocators

No free lunch You need to understand your use case before you can chose theright allocator

3John Lakos - Local (Arena) Allocators

5656

Thanks for your attention

Page 56: Taming Dynamic Memory...Pool Allocator - STL containers vector only if chunk sizes match vector size list and map are a perfect t, as the size of each node is known beforehand (though

2356

Stack Allocator

Strict LIFO-ordering of allocations and deallocations

No way for the implementation to check whether the deallocation order is correct

Free-pointer is reset to the same pointer passed to the deallocate call

Padding bytes may be lost to internal fragmentation

2356

Stack Allocator

Strict LIFO-ordering of allocations and deallocations

No way for the implementation to check whether the deallocation order is correct

Free-pointer is reset to the same pointer passed to the deallocate call

Padding bytes may be lost to internal fragmentation

2456

Padding

2456

Padding

2556

Alignment

0xdeadbeef =

d e a d b e e f

1101 1110 1010 1101 1011 1110 1110 1111

2656

Alignment

0xdeadbeef =

d e a d b e e f

1101 1110 1010 1101 1011 1110 1110 1111

No alignment (1-byte aligned)

2756

Alignment

0xdeadbeef =

d e a d b e e c

1101 1110 1010 1101 1011 1110 1110 1100

4-byte aligned

2856

Alignment

0xdeadbeef =

d e a d b e e 8

1101 1110 1010 1101 1011 1110 1110 1000

8-byte aligned

2956

Alignment

Alignment refers to the least-significant bits of the object address being 0

Alignment requirements are always specified in powers of 2

Each built-in C++ type has a natural alignment requirement (typicallyalignof(T) == sizeof(T))

This is why structs sometimes insert padding bytes between members

3056

Alignment

Default allocator typically returns addresses aligned to alignof(max align t)which is big enough for all built-in types

Users may extend the alignment requirement for custom data types usingalignas

void allocate(std size_t bytes

std size_t alignment )

3156

Padding

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3356

Monotonic Allocator - Extensions

Auxiliary data structure required

Runtime cost of deallocation now linear in number of allocations (amortized O(1))

Auxiliary nodes have their own alignment requirements

Where to store the auxiliary nodes

3456

Monotonic Allocator - Extensions

3456

Monotonic Allocator - Extensions

3556

Internal or External

External headers have better cache behavior when iterating the list

External headers might have stricter alignment requirements than data

Internal headers have better cache behavior when adjacent data is hot

Internal headers require managed memory to be readable (think GPUs)

Where does the storage for external headers come from Same buffer Differentbuffer How big

rArr No easy answers

3656

The Bottom Line

Even seemingly simple extensions get complicated very quickly

Donrsquot try to increase generality through clever extensionsOnly consider modifications if itrsquos a perfect fit for your use case

3756

But what if I need to reclaim memory

minusrarr Pool Allocator

3856

Pool Allocator

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4156

Pool Allocator - Diffusion

4256

Pool Allocator

Deterministic runtime cost

No external fragmentation

Easy to make thread-safe

But

Cannot serve allocations bigger than chunk size

High waste through internal fragmentation if sizes of objects vary a lot

4356

Pool Allocator - STL containers

vector only if chunk sizes match vector size

list and map are a perfect fit as the size of each node is known beforehand(though this knowledge is implementation-specific)

Similar for deque

unordered map is again complicated

4456

But what if I do need different sizes

minusrarr Multipool Allocator

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4656

Multipool Allocator

Very powerful allocator

Runtime is still deterministic if number of pools is known beforehand

Maximum amount of waste through internal fragmentation can be controlledprecisely

Difficult to set up How many pools do I need What chunk sizes What poolsizes

Solid building block for a general purpose allocator

4756

Allocator support in C++

stdvectorltT AllocatorltTgtgt v

vpush back()

This is not the class allocating the memory

4756

Allocator support in C++

stdvectorltT AllocatorltTgtgt v

vpush back()

This is not the class allocating the memory

4856

Allocator support in C++

Historically C++ used Allocators to abstract over different models of addressingmemoryAs such Allocators in C++ are ldquostatelessrdquo

In C++ an Allocator is merely a handle to a memory resource1

1Arthur OrsquoDwyer - An Allocator is a Handle to the Heap

4856

Allocator support in C++

Historically C++ used Allocators to abstract over different models of addressingmemoryAs such Allocators in C++ are ldquostatelessrdquo

In C++ an Allocator is merely a handle to a memory resource1

1Arthur OrsquoDwyer - An Allocator is a Handle to the Heap

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

5056

C++ Memory Resources2

stdpmrmemory resource - Abstract base class for all resources that can bewrapped in a stdpmrpolymorphic allocator

stdpmrnew delete resource - Global allocator

stdpmrmonotonic buffer resource - Monotonic allocator

stdpmrunsynchronized pool resource synchronized pool resource

- Multipool

stdpmrnull memory resource - Allocation always fails

2Pablo Halpern - Allocators The Good Parts

5156

Chaining

explicit monotonic_buffer_resource(

stdpmr memory_resource upstream )

Each memory resource has an upstream counterpart

If the resource runs out of memory it tries to allocate more memory from upstream

5256

Chaining

Possible uses of Chaining

Fixed-size vs dynamic storage for allocators

Combination of different allocation strategies

Injection points for special purpose allocators for debugging and profiling

5356

Therersquos no universal interface for allocators

Are size and alignment parameters passed to deallocate

Is realloc supported

How are out-of-memory errors reported

Is extended alignment supported

What is the return value for an allocation of size 0

Different memory regions for internal data structures and allocated memory

5456

Donrsquot underestimate the global allocator

Competitive performance in the general case

Security features (ASLR secure erase of freed memory)

Debugging amp Profiling (Valgrind Windows Debug Runtime)

Cache Coloring

Local allocators are no free lunch

5556

Wrapping up

No one-size-fits-all mdash Each allocator has its Achilles heel

Global allocator is a good solution for the general case

But you can do better with special allocators for special use cases in terms ofperformance3 as well as reliability

C++ has good support for local allocators but the terminology is a bit off

Different libraries have different concepts of allocators

No free lunch You need to understand your use case before you can chose theright allocator

3John Lakos - Local (Arena) Allocators

5656

Thanks for your attention

Page 57: Taming Dynamic Memory...Pool Allocator - STL containers vector only if chunk sizes match vector size list and map are a perfect t, as the size of each node is known beforehand (though

2356

Stack Allocator

Strict LIFO-ordering of allocations and deallocations

No way for the implementation to check whether the deallocation order is correct

Free-pointer is reset to the same pointer passed to the deallocate call

Padding bytes may be lost to internal fragmentation

2456

Padding

2456

Padding

2556

Alignment

0xdeadbeef =

d e a d b e e f

1101 1110 1010 1101 1011 1110 1110 1111

2656

Alignment

0xdeadbeef =

d e a d b e e f

1101 1110 1010 1101 1011 1110 1110 1111

No alignment (1-byte aligned)

2756

Alignment

0xdeadbeef =

d e a d b e e c

1101 1110 1010 1101 1011 1110 1110 1100

4-byte aligned

2856

Alignment

0xdeadbeef =

d e a d b e e 8

1101 1110 1010 1101 1011 1110 1110 1000

8-byte aligned

2956

Alignment

Alignment refers to the least-significant bits of the object address being 0

Alignment requirements are always specified in powers of 2

Each built-in C++ type has a natural alignment requirement (typicallyalignof(T) == sizeof(T))

This is why structs sometimes insert padding bytes between members

3056

Alignment

Default allocator typically returns addresses aligned to alignof(max align t)which is big enough for all built-in types

Users may extend the alignment requirement for custom data types usingalignas

void allocate(std size_t bytes

std size_t alignment )

3156

Padding

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3356

Monotonic Allocator - Extensions

Auxiliary data structure required

Runtime cost of deallocation now linear in number of allocations (amortized O(1))

Auxiliary nodes have their own alignment requirements

Where to store the auxiliary nodes

3456

Monotonic Allocator - Extensions

3456

Monotonic Allocator - Extensions

3556

Internal or External

External headers have better cache behavior when iterating the list

External headers might have stricter alignment requirements than data

Internal headers have better cache behavior when adjacent data is hot

Internal headers require managed memory to be readable (think GPUs)

Where does the storage for external headers come from Same buffer Differentbuffer How big

rArr No easy answers

3656

The Bottom Line

Even seemingly simple extensions get complicated very quickly

Donrsquot try to increase generality through clever extensionsOnly consider modifications if itrsquos a perfect fit for your use case

3756

But what if I need to reclaim memory

minusrarr Pool Allocator

3856

Pool Allocator

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4156

Pool Allocator - Diffusion

4256

Pool Allocator

Deterministic runtime cost

No external fragmentation

Easy to make thread-safe

But

Cannot serve allocations bigger than chunk size

High waste through internal fragmentation if sizes of objects vary a lot

4356

Pool Allocator - STL containers

vector only if chunk sizes match vector size

list and map are a perfect fit as the size of each node is known beforehand(though this knowledge is implementation-specific)

Similar for deque

unordered map is again complicated

4456

But what if I do need different sizes

minusrarr Multipool Allocator

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4656

Multipool Allocator

Very powerful allocator

Runtime is still deterministic if number of pools is known beforehand

Maximum amount of waste through internal fragmentation can be controlledprecisely

Difficult to set up How many pools do I need What chunk sizes What poolsizes

Solid building block for a general purpose allocator

4756

Allocator support in C++

stdvectorltT AllocatorltTgtgt v

vpush back()

This is not the class allocating the memory

4756

Allocator support in C++

stdvectorltT AllocatorltTgtgt v

vpush back()

This is not the class allocating the memory

4856

Allocator support in C++

Historically C++ used Allocators to abstract over different models of addressingmemoryAs such Allocators in C++ are ldquostatelessrdquo

In C++ an Allocator is merely a handle to a memory resource1

1Arthur OrsquoDwyer - An Allocator is a Handle to the Heap

4856

Allocator support in C++

Historically C++ used Allocators to abstract over different models of addressingmemoryAs such Allocators in C++ are ldquostatelessrdquo

In C++ an Allocator is merely a handle to a memory resource1

1Arthur OrsquoDwyer - An Allocator is a Handle to the Heap

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

5056

C++ Memory Resources2

stdpmrmemory resource - Abstract base class for all resources that can bewrapped in a stdpmrpolymorphic allocator

stdpmrnew delete resource - Global allocator

stdpmrmonotonic buffer resource - Monotonic allocator

stdpmrunsynchronized pool resource synchronized pool resource

- Multipool

stdpmrnull memory resource - Allocation always fails

2Pablo Halpern - Allocators The Good Parts

5156

Chaining

explicit monotonic_buffer_resource(

stdpmr memory_resource upstream )

Each memory resource has an upstream counterpart

If the resource runs out of memory it tries to allocate more memory from upstream

5256

Chaining

Possible uses of Chaining

Fixed-size vs dynamic storage for allocators

Combination of different allocation strategies

Injection points for special purpose allocators for debugging and profiling

5356

Therersquos no universal interface for allocators

Are size and alignment parameters passed to deallocate

Is realloc supported

How are out-of-memory errors reported

Is extended alignment supported

What is the return value for an allocation of size 0

Different memory regions for internal data structures and allocated memory

5456

Donrsquot underestimate the global allocator

Competitive performance in the general case

Security features (ASLR secure erase of freed memory)

Debugging amp Profiling (Valgrind Windows Debug Runtime)

Cache Coloring

Local allocators are no free lunch

5556

Wrapping up

No one-size-fits-all mdash Each allocator has its Achilles heel

Global allocator is a good solution for the general case

But you can do better with special allocators for special use cases in terms ofperformance3 as well as reliability

C++ has good support for local allocators but the terminology is a bit off

Different libraries have different concepts of allocators

No free lunch You need to understand your use case before you can chose theright allocator

3John Lakos - Local (Arena) Allocators

5656

Thanks for your attention

Page 58: Taming Dynamic Memory...Pool Allocator - STL containers vector only if chunk sizes match vector size list and map are a perfect t, as the size of each node is known beforehand (though

2456

Padding

2456

Padding

2556

Alignment

0xdeadbeef =

d e a d b e e f

1101 1110 1010 1101 1011 1110 1110 1111

2656

Alignment

0xdeadbeef =

d e a d b e e f

1101 1110 1010 1101 1011 1110 1110 1111

No alignment (1-byte aligned)

2756

Alignment

0xdeadbeef =

d e a d b e e c

1101 1110 1010 1101 1011 1110 1110 1100

4-byte aligned

2856

Alignment

0xdeadbeef =

d e a d b e e 8

1101 1110 1010 1101 1011 1110 1110 1000

8-byte aligned

2956

Alignment

Alignment refers to the least-significant bits of the object address being 0

Alignment requirements are always specified in powers of 2

Each built-in C++ type has a natural alignment requirement (typicallyalignof(T) == sizeof(T))

This is why structs sometimes insert padding bytes between members

3056

Alignment

Default allocator typically returns addresses aligned to alignof(max align t)which is big enough for all built-in types

Users may extend the alignment requirement for custom data types usingalignas

void allocate(std size_t bytes

std size_t alignment )

3156

Padding

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3356

Monotonic Allocator - Extensions

Auxiliary data structure required

Runtime cost of deallocation now linear in number of allocations (amortized O(1))

Auxiliary nodes have their own alignment requirements

Where to store the auxiliary nodes

3456

Monotonic Allocator - Extensions

3456

Monotonic Allocator - Extensions

3556

Internal or External

External headers have better cache behavior when iterating the list

External headers might have stricter alignment requirements than data

Internal headers have better cache behavior when adjacent data is hot

Internal headers require managed memory to be readable (think GPUs)

Where does the storage for external headers come from Same buffer Differentbuffer How big

rArr No easy answers

3656

The Bottom Line

Even seemingly simple extensions get complicated very quickly

Donrsquot try to increase generality through clever extensionsOnly consider modifications if itrsquos a perfect fit for your use case

3756

But what if I need to reclaim memory

minusrarr Pool Allocator

3856

Pool Allocator

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4156

Pool Allocator - Diffusion

4256

Pool Allocator

Deterministic runtime cost

No external fragmentation

Easy to make thread-safe

But

Cannot serve allocations bigger than chunk size

High waste through internal fragmentation if sizes of objects vary a lot

4356

Pool Allocator - STL containers

vector only if chunk sizes match vector size

list and map are a perfect fit as the size of each node is known beforehand(though this knowledge is implementation-specific)

Similar for deque

unordered map is again complicated

4456

But what if I do need different sizes

minusrarr Multipool Allocator

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4656

Multipool Allocator

Very powerful allocator

Runtime is still deterministic if number of pools is known beforehand

Maximum amount of waste through internal fragmentation can be controlledprecisely

Difficult to set up How many pools do I need What chunk sizes What poolsizes

Solid building block for a general purpose allocator

4756

Allocator support in C++

stdvectorltT AllocatorltTgtgt v

vpush back()

This is not the class allocating the memory

4756

Allocator support in C++

stdvectorltT AllocatorltTgtgt v

vpush back()

This is not the class allocating the memory

4856

Allocator support in C++

Historically C++ used Allocators to abstract over different models of addressingmemoryAs such Allocators in C++ are ldquostatelessrdquo

In C++ an Allocator is merely a handle to a memory resource1

1Arthur OrsquoDwyer - An Allocator is a Handle to the Heap

4856

Allocator support in C++

Historically C++ used Allocators to abstract over different models of addressingmemoryAs such Allocators in C++ are ldquostatelessrdquo

In C++ an Allocator is merely a handle to a memory resource1

1Arthur OrsquoDwyer - An Allocator is a Handle to the Heap

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

5056

C++ Memory Resources2

stdpmrmemory resource - Abstract base class for all resources that can bewrapped in a stdpmrpolymorphic allocator

stdpmrnew delete resource - Global allocator

stdpmrmonotonic buffer resource - Monotonic allocator

stdpmrunsynchronized pool resource synchronized pool resource

- Multipool

stdpmrnull memory resource - Allocation always fails

2Pablo Halpern - Allocators The Good Parts

5156

Chaining

explicit monotonic_buffer_resource(

stdpmr memory_resource upstream )

Each memory resource has an upstream counterpart

If the resource runs out of memory it tries to allocate more memory from upstream

5256

Chaining

Possible uses of Chaining

Fixed-size vs dynamic storage for allocators

Combination of different allocation strategies

Injection points for special purpose allocators for debugging and profiling

5356

Therersquos no universal interface for allocators

Are size and alignment parameters passed to deallocate

Is realloc supported

How are out-of-memory errors reported

Is extended alignment supported

What is the return value for an allocation of size 0

Different memory regions for internal data structures and allocated memory

5456

Donrsquot underestimate the global allocator

Competitive performance in the general case

Security features (ASLR secure erase of freed memory)

Debugging amp Profiling (Valgrind Windows Debug Runtime)

Cache Coloring

Local allocators are no free lunch

5556

Wrapping up

No one-size-fits-all mdash Each allocator has its Achilles heel

Global allocator is a good solution for the general case

But you can do better with special allocators for special use cases in terms ofperformance3 as well as reliability

C++ has good support for local allocators but the terminology is a bit off

Different libraries have different concepts of allocators

No free lunch You need to understand your use case before you can chose theright allocator

3John Lakos - Local (Arena) Allocators

5656

Thanks for your attention

Page 59: Taming Dynamic Memory...Pool Allocator - STL containers vector only if chunk sizes match vector size list and map are a perfect t, as the size of each node is known beforehand (though

2456

Padding

2556

Alignment

0xdeadbeef =

d e a d b e e f

1101 1110 1010 1101 1011 1110 1110 1111

2656

Alignment

0xdeadbeef =

d e a d b e e f

1101 1110 1010 1101 1011 1110 1110 1111

No alignment (1-byte aligned)

2756

Alignment

0xdeadbeef =

d e a d b e e c

1101 1110 1010 1101 1011 1110 1110 1100

4-byte aligned

2856

Alignment

0xdeadbeef =

d e a d b e e 8

1101 1110 1010 1101 1011 1110 1110 1000

8-byte aligned

2956

Alignment

Alignment refers to the least-significant bits of the object address being 0

Alignment requirements are always specified in powers of 2

Each built-in C++ type has a natural alignment requirement (typicallyalignof(T) == sizeof(T))

This is why structs sometimes insert padding bytes between members

3056

Alignment

Default allocator typically returns addresses aligned to alignof(max align t)which is big enough for all built-in types

Users may extend the alignment requirement for custom data types usingalignas

void allocate(std size_t bytes

std size_t alignment )

3156

Padding

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3356

Monotonic Allocator - Extensions

Auxiliary data structure required

Runtime cost of deallocation now linear in number of allocations (amortized O(1))

Auxiliary nodes have their own alignment requirements

Where to store the auxiliary nodes

3456

Monotonic Allocator - Extensions

3456

Monotonic Allocator - Extensions

3556

Internal or External

External headers have better cache behavior when iterating the list

External headers might have stricter alignment requirements than data

Internal headers have better cache behavior when adjacent data is hot

Internal headers require managed memory to be readable (think GPUs)

Where does the storage for external headers come from Same buffer Differentbuffer How big

rArr No easy answers

3656

The Bottom Line

Even seemingly simple extensions get complicated very quickly

Donrsquot try to increase generality through clever extensionsOnly consider modifications if itrsquos a perfect fit for your use case

3756

But what if I need to reclaim memory

minusrarr Pool Allocator

3856

Pool Allocator

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4156

Pool Allocator - Diffusion

4256

Pool Allocator

Deterministic runtime cost

No external fragmentation

Easy to make thread-safe

But

Cannot serve allocations bigger than chunk size

High waste through internal fragmentation if sizes of objects vary a lot

4356

Pool Allocator - STL containers

vector only if chunk sizes match vector size

list and map are a perfect fit as the size of each node is known beforehand(though this knowledge is implementation-specific)

Similar for deque

unordered map is again complicated

4456

But what if I do need different sizes

minusrarr Multipool Allocator

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4656

Multipool Allocator

Very powerful allocator

Runtime is still deterministic if number of pools is known beforehand

Maximum amount of waste through internal fragmentation can be controlledprecisely

Difficult to set up How many pools do I need What chunk sizes What poolsizes

Solid building block for a general purpose allocator

4756

Allocator support in C++

stdvectorltT AllocatorltTgtgt v

vpush back()

This is not the class allocating the memory

4756

Allocator support in C++

stdvectorltT AllocatorltTgtgt v

vpush back()

This is not the class allocating the memory

4856

Allocator support in C++

Historically C++ used Allocators to abstract over different models of addressingmemoryAs such Allocators in C++ are ldquostatelessrdquo

In C++ an Allocator is merely a handle to a memory resource1

1Arthur OrsquoDwyer - An Allocator is a Handle to the Heap

4856

Allocator support in C++

Historically C++ used Allocators to abstract over different models of addressingmemoryAs such Allocators in C++ are ldquostatelessrdquo

In C++ an Allocator is merely a handle to a memory resource1

1Arthur OrsquoDwyer - An Allocator is a Handle to the Heap

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

5056

C++ Memory Resources2

stdpmrmemory resource - Abstract base class for all resources that can bewrapped in a stdpmrpolymorphic allocator

stdpmrnew delete resource - Global allocator

stdpmrmonotonic buffer resource - Monotonic allocator

stdpmrunsynchronized pool resource synchronized pool resource

- Multipool

stdpmrnull memory resource - Allocation always fails

2Pablo Halpern - Allocators The Good Parts

5156

Chaining

explicit monotonic_buffer_resource(

stdpmr memory_resource upstream )

Each memory resource has an upstream counterpart

If the resource runs out of memory it tries to allocate more memory from upstream

5256

Chaining

Possible uses of Chaining

Fixed-size vs dynamic storage for allocators

Combination of different allocation strategies

Injection points for special purpose allocators for debugging and profiling

5356

Therersquos no universal interface for allocators

Are size and alignment parameters passed to deallocate

Is realloc supported

How are out-of-memory errors reported

Is extended alignment supported

What is the return value for an allocation of size 0

Different memory regions for internal data structures and allocated memory

5456

Donrsquot underestimate the global allocator

Competitive performance in the general case

Security features (ASLR secure erase of freed memory)

Debugging amp Profiling (Valgrind Windows Debug Runtime)

Cache Coloring

Local allocators are no free lunch

5556

Wrapping up

No one-size-fits-all mdash Each allocator has its Achilles heel

Global allocator is a good solution for the general case

But you can do better with special allocators for special use cases in terms ofperformance3 as well as reliability

C++ has good support for local allocators but the terminology is a bit off

Different libraries have different concepts of allocators

No free lunch You need to understand your use case before you can chose theright allocator

3John Lakos - Local (Arena) Allocators

5656

Thanks for your attention

Page 60: Taming Dynamic Memory...Pool Allocator - STL containers vector only if chunk sizes match vector size list and map are a perfect t, as the size of each node is known beforehand (though

2556

Alignment

0xdeadbeef =

d e a d b e e f

1101 1110 1010 1101 1011 1110 1110 1111

2656

Alignment

0xdeadbeef =

d e a d b e e f

1101 1110 1010 1101 1011 1110 1110 1111

No alignment (1-byte aligned)

2756

Alignment

0xdeadbeef =

d e a d b e e c

1101 1110 1010 1101 1011 1110 1110 1100

4-byte aligned

2856

Alignment

0xdeadbeef =

d e a d b e e 8

1101 1110 1010 1101 1011 1110 1110 1000

8-byte aligned

2956

Alignment

Alignment refers to the least-significant bits of the object address being 0

Alignment requirements are always specified in powers of 2

Each built-in C++ type has a natural alignment requirement (typicallyalignof(T) == sizeof(T))

This is why structs sometimes insert padding bytes between members

3056

Alignment

Default allocator typically returns addresses aligned to alignof(max align t)which is big enough for all built-in types

Users may extend the alignment requirement for custom data types usingalignas

void allocate(std size_t bytes

std size_t alignment )

3156

Padding

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3356

Monotonic Allocator - Extensions

Auxiliary data structure required

Runtime cost of deallocation now linear in number of allocations (amortized O(1))

Auxiliary nodes have their own alignment requirements

Where to store the auxiliary nodes

3456

Monotonic Allocator - Extensions

3456

Monotonic Allocator - Extensions

3556

Internal or External

External headers have better cache behavior when iterating the list

External headers might have stricter alignment requirements than data

Internal headers have better cache behavior when adjacent data is hot

Internal headers require managed memory to be readable (think GPUs)

Where does the storage for external headers come from Same buffer Differentbuffer How big

rArr No easy answers

3656

The Bottom Line

Even seemingly simple extensions get complicated very quickly

Donrsquot try to increase generality through clever extensionsOnly consider modifications if itrsquos a perfect fit for your use case

3756

But what if I need to reclaim memory

minusrarr Pool Allocator

3856

Pool Allocator

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4156

Pool Allocator - Diffusion

4256

Pool Allocator

Deterministic runtime cost

No external fragmentation

Easy to make thread-safe

But

Cannot serve allocations bigger than chunk size

High waste through internal fragmentation if sizes of objects vary a lot

4356

Pool Allocator - STL containers

vector only if chunk sizes match vector size

list and map are a perfect fit as the size of each node is known beforehand(though this knowledge is implementation-specific)

Similar for deque

unordered map is again complicated

4456

But what if I do need different sizes

minusrarr Multipool Allocator

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4656

Multipool Allocator

Very powerful allocator

Runtime is still deterministic if number of pools is known beforehand

Maximum amount of waste through internal fragmentation can be controlledprecisely

Difficult to set up How many pools do I need What chunk sizes What poolsizes

Solid building block for a general purpose allocator

4756

Allocator support in C++

stdvectorltT AllocatorltTgtgt v

vpush back()

This is not the class allocating the memory

4756

Allocator support in C++

stdvectorltT AllocatorltTgtgt v

vpush back()

This is not the class allocating the memory

4856

Allocator support in C++

Historically C++ used Allocators to abstract over different models of addressingmemoryAs such Allocators in C++ are ldquostatelessrdquo

In C++ an Allocator is merely a handle to a memory resource1

1Arthur OrsquoDwyer - An Allocator is a Handle to the Heap

4856

Allocator support in C++

Historically C++ used Allocators to abstract over different models of addressingmemoryAs such Allocators in C++ are ldquostatelessrdquo

In C++ an Allocator is merely a handle to a memory resource1

1Arthur OrsquoDwyer - An Allocator is a Handle to the Heap

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

5056

C++ Memory Resources2

stdpmrmemory resource - Abstract base class for all resources that can bewrapped in a stdpmrpolymorphic allocator

stdpmrnew delete resource - Global allocator

stdpmrmonotonic buffer resource - Monotonic allocator

stdpmrunsynchronized pool resource synchronized pool resource

- Multipool

stdpmrnull memory resource - Allocation always fails

2Pablo Halpern - Allocators The Good Parts

5156

Chaining

explicit monotonic_buffer_resource(

stdpmr memory_resource upstream )

Each memory resource has an upstream counterpart

If the resource runs out of memory it tries to allocate more memory from upstream

5256

Chaining

Possible uses of Chaining

Fixed-size vs dynamic storage for allocators

Combination of different allocation strategies

Injection points for special purpose allocators for debugging and profiling

5356

Therersquos no universal interface for allocators

Are size and alignment parameters passed to deallocate

Is realloc supported

How are out-of-memory errors reported

Is extended alignment supported

What is the return value for an allocation of size 0

Different memory regions for internal data structures and allocated memory

5456

Donrsquot underestimate the global allocator

Competitive performance in the general case

Security features (ASLR secure erase of freed memory)

Debugging amp Profiling (Valgrind Windows Debug Runtime)

Cache Coloring

Local allocators are no free lunch

5556

Wrapping up

No one-size-fits-all mdash Each allocator has its Achilles heel

Global allocator is a good solution for the general case

But you can do better with special allocators for special use cases in terms ofperformance3 as well as reliability

C++ has good support for local allocators but the terminology is a bit off

Different libraries have different concepts of allocators

No free lunch You need to understand your use case before you can chose theright allocator

3John Lakos - Local (Arena) Allocators

5656

Thanks for your attention

Page 61: Taming Dynamic Memory...Pool Allocator - STL containers vector only if chunk sizes match vector size list and map are a perfect t, as the size of each node is known beforehand (though

2656

Alignment

0xdeadbeef =

d e a d b e e f

1101 1110 1010 1101 1011 1110 1110 1111

No alignment (1-byte aligned)

2756

Alignment

0xdeadbeef =

d e a d b e e c

1101 1110 1010 1101 1011 1110 1110 1100

4-byte aligned

2856

Alignment

0xdeadbeef =

d e a d b e e 8

1101 1110 1010 1101 1011 1110 1110 1000

8-byte aligned

2956

Alignment

Alignment refers to the least-significant bits of the object address being 0

Alignment requirements are always specified in powers of 2

Each built-in C++ type has a natural alignment requirement (typicallyalignof(T) == sizeof(T))

This is why structs sometimes insert padding bytes between members

3056

Alignment

Default allocator typically returns addresses aligned to alignof(max align t)which is big enough for all built-in types

Users may extend the alignment requirement for custom data types usingalignas

void allocate(std size_t bytes

std size_t alignment )

3156

Padding

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3356

Monotonic Allocator - Extensions

Auxiliary data structure required

Runtime cost of deallocation now linear in number of allocations (amortized O(1))

Auxiliary nodes have their own alignment requirements

Where to store the auxiliary nodes

3456

Monotonic Allocator - Extensions

3456

Monotonic Allocator - Extensions

3556

Internal or External

External headers have better cache behavior when iterating the list

External headers might have stricter alignment requirements than data

Internal headers have better cache behavior when adjacent data is hot

Internal headers require managed memory to be readable (think GPUs)

Where does the storage for external headers come from Same buffer Differentbuffer How big

rArr No easy answers

3656

The Bottom Line

Even seemingly simple extensions get complicated very quickly

Donrsquot try to increase generality through clever extensionsOnly consider modifications if itrsquos a perfect fit for your use case

3756

But what if I need to reclaim memory

minusrarr Pool Allocator

3856

Pool Allocator

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4156

Pool Allocator - Diffusion

4256

Pool Allocator

Deterministic runtime cost

No external fragmentation

Easy to make thread-safe

But

Cannot serve allocations bigger than chunk size

High waste through internal fragmentation if sizes of objects vary a lot

4356

Pool Allocator - STL containers

vector only if chunk sizes match vector size

list and map are a perfect fit as the size of each node is known beforehand(though this knowledge is implementation-specific)

Similar for deque

unordered map is again complicated

4456

But what if I do need different sizes

minusrarr Multipool Allocator

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4656

Multipool Allocator

Very powerful allocator

Runtime is still deterministic if number of pools is known beforehand

Maximum amount of waste through internal fragmentation can be controlledprecisely

Difficult to set up How many pools do I need What chunk sizes What poolsizes

Solid building block for a general purpose allocator

4756

Allocator support in C++

stdvectorltT AllocatorltTgtgt v

vpush back()

This is not the class allocating the memory

4756

Allocator support in C++

stdvectorltT AllocatorltTgtgt v

vpush back()

This is not the class allocating the memory

4856

Allocator support in C++

Historically C++ used Allocators to abstract over different models of addressingmemoryAs such Allocators in C++ are ldquostatelessrdquo

In C++ an Allocator is merely a handle to a memory resource1

1Arthur OrsquoDwyer - An Allocator is a Handle to the Heap

4856

Allocator support in C++

Historically C++ used Allocators to abstract over different models of addressingmemoryAs such Allocators in C++ are ldquostatelessrdquo

In C++ an Allocator is merely a handle to a memory resource1

1Arthur OrsquoDwyer - An Allocator is a Handle to the Heap

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

5056

C++ Memory Resources2

stdpmrmemory resource - Abstract base class for all resources that can bewrapped in a stdpmrpolymorphic allocator

stdpmrnew delete resource - Global allocator

stdpmrmonotonic buffer resource - Monotonic allocator

stdpmrunsynchronized pool resource synchronized pool resource

- Multipool

stdpmrnull memory resource - Allocation always fails

2Pablo Halpern - Allocators The Good Parts

5156

Chaining

explicit monotonic_buffer_resource(

stdpmr memory_resource upstream )

Each memory resource has an upstream counterpart

If the resource runs out of memory it tries to allocate more memory from upstream

5256

Chaining

Possible uses of Chaining

Fixed-size vs dynamic storage for allocators

Combination of different allocation strategies

Injection points for special purpose allocators for debugging and profiling

5356

Therersquos no universal interface for allocators

Are size and alignment parameters passed to deallocate

Is realloc supported

How are out-of-memory errors reported

Is extended alignment supported

What is the return value for an allocation of size 0

Different memory regions for internal data structures and allocated memory

5456

Donrsquot underestimate the global allocator

Competitive performance in the general case

Security features (ASLR secure erase of freed memory)

Debugging amp Profiling (Valgrind Windows Debug Runtime)

Cache Coloring

Local allocators are no free lunch

5556

Wrapping up

No one-size-fits-all mdash Each allocator has its Achilles heel

Global allocator is a good solution for the general case

But you can do better with special allocators for special use cases in terms ofperformance3 as well as reliability

C++ has good support for local allocators but the terminology is a bit off

Different libraries have different concepts of allocators

No free lunch You need to understand your use case before you can chose theright allocator

3John Lakos - Local (Arena) Allocators

5656

Thanks for your attention

Page 62: Taming Dynamic Memory...Pool Allocator - STL containers vector only if chunk sizes match vector size list and map are a perfect t, as the size of each node is known beforehand (though

2756

Alignment

0xdeadbeef =

d e a d b e e c

1101 1110 1010 1101 1011 1110 1110 1100

4-byte aligned

2856

Alignment

0xdeadbeef =

d e a d b e e 8

1101 1110 1010 1101 1011 1110 1110 1000

8-byte aligned

2956

Alignment

Alignment refers to the least-significant bits of the object address being 0

Alignment requirements are always specified in powers of 2

Each built-in C++ type has a natural alignment requirement (typicallyalignof(T) == sizeof(T))

This is why structs sometimes insert padding bytes between members

3056

Alignment

Default allocator typically returns addresses aligned to alignof(max align t)which is big enough for all built-in types

Users may extend the alignment requirement for custom data types usingalignas

void allocate(std size_t bytes

std size_t alignment )

3156

Padding

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3356

Monotonic Allocator - Extensions

Auxiliary data structure required

Runtime cost of deallocation now linear in number of allocations (amortized O(1))

Auxiliary nodes have their own alignment requirements

Where to store the auxiliary nodes

3456

Monotonic Allocator - Extensions

3456

Monotonic Allocator - Extensions

3556

Internal or External

External headers have better cache behavior when iterating the list

External headers might have stricter alignment requirements than data

Internal headers have better cache behavior when adjacent data is hot

Internal headers require managed memory to be readable (think GPUs)

Where does the storage for external headers come from Same buffer Differentbuffer How big

rArr No easy answers

3656

The Bottom Line

Even seemingly simple extensions get complicated very quickly

Donrsquot try to increase generality through clever extensionsOnly consider modifications if itrsquos a perfect fit for your use case

3756

But what if I need to reclaim memory

minusrarr Pool Allocator

3856

Pool Allocator

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4156

Pool Allocator - Diffusion

4256

Pool Allocator

Deterministic runtime cost

No external fragmentation

Easy to make thread-safe

But

Cannot serve allocations bigger than chunk size

High waste through internal fragmentation if sizes of objects vary a lot

4356

Pool Allocator - STL containers

vector only if chunk sizes match vector size

list and map are a perfect fit as the size of each node is known beforehand(though this knowledge is implementation-specific)

Similar for deque

unordered map is again complicated

4456

But what if I do need different sizes

minusrarr Multipool Allocator

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4656

Multipool Allocator

Very powerful allocator

Runtime is still deterministic if number of pools is known beforehand

Maximum amount of waste through internal fragmentation can be controlledprecisely

Difficult to set up How many pools do I need What chunk sizes What poolsizes

Solid building block for a general purpose allocator

4756

Allocator support in C++

stdvectorltT AllocatorltTgtgt v

vpush back()

This is not the class allocating the memory

4756

Allocator support in C++

stdvectorltT AllocatorltTgtgt v

vpush back()

This is not the class allocating the memory

4856

Allocator support in C++

Historically C++ used Allocators to abstract over different models of addressingmemoryAs such Allocators in C++ are ldquostatelessrdquo

In C++ an Allocator is merely a handle to a memory resource1

1Arthur OrsquoDwyer - An Allocator is a Handle to the Heap

4856

Allocator support in C++

Historically C++ used Allocators to abstract over different models of addressingmemoryAs such Allocators in C++ are ldquostatelessrdquo

In C++ an Allocator is merely a handle to a memory resource1

1Arthur OrsquoDwyer - An Allocator is a Handle to the Heap

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

5056

C++ Memory Resources2

stdpmrmemory resource - Abstract base class for all resources that can bewrapped in a stdpmrpolymorphic allocator

stdpmrnew delete resource - Global allocator

stdpmrmonotonic buffer resource - Monotonic allocator

stdpmrunsynchronized pool resource synchronized pool resource

- Multipool

stdpmrnull memory resource - Allocation always fails

2Pablo Halpern - Allocators The Good Parts

5156

Chaining

explicit monotonic_buffer_resource(

stdpmr memory_resource upstream )

Each memory resource has an upstream counterpart

If the resource runs out of memory it tries to allocate more memory from upstream

5256

Chaining

Possible uses of Chaining

Fixed-size vs dynamic storage for allocators

Combination of different allocation strategies

Injection points for special purpose allocators for debugging and profiling

5356

Therersquos no universal interface for allocators

Are size and alignment parameters passed to deallocate

Is realloc supported

How are out-of-memory errors reported

Is extended alignment supported

What is the return value for an allocation of size 0

Different memory regions for internal data structures and allocated memory

5456

Donrsquot underestimate the global allocator

Competitive performance in the general case

Security features (ASLR secure erase of freed memory)

Debugging amp Profiling (Valgrind Windows Debug Runtime)

Cache Coloring

Local allocators are no free lunch

5556

Wrapping up

No one-size-fits-all mdash Each allocator has its Achilles heel

Global allocator is a good solution for the general case

But you can do better with special allocators for special use cases in terms ofperformance3 as well as reliability

C++ has good support for local allocators but the terminology is a bit off

Different libraries have different concepts of allocators

No free lunch You need to understand your use case before you can chose theright allocator

3John Lakos - Local (Arena) Allocators

5656

Thanks for your attention

Page 63: Taming Dynamic Memory...Pool Allocator - STL containers vector only if chunk sizes match vector size list and map are a perfect t, as the size of each node is known beforehand (though

2856

Alignment

0xdeadbeef =

d e a d b e e 8

1101 1110 1010 1101 1011 1110 1110 1000

8-byte aligned

2956

Alignment

Alignment refers to the least-significant bits of the object address being 0

Alignment requirements are always specified in powers of 2

Each built-in C++ type has a natural alignment requirement (typicallyalignof(T) == sizeof(T))

This is why structs sometimes insert padding bytes between members

3056

Alignment

Default allocator typically returns addresses aligned to alignof(max align t)which is big enough for all built-in types

Users may extend the alignment requirement for custom data types usingalignas

void allocate(std size_t bytes

std size_t alignment )

3156

Padding

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3356

Monotonic Allocator - Extensions

Auxiliary data structure required

Runtime cost of deallocation now linear in number of allocations (amortized O(1))

Auxiliary nodes have their own alignment requirements

Where to store the auxiliary nodes

3456

Monotonic Allocator - Extensions

3456

Monotonic Allocator - Extensions

3556

Internal or External

External headers have better cache behavior when iterating the list

External headers might have stricter alignment requirements than data

Internal headers have better cache behavior when adjacent data is hot

Internal headers require managed memory to be readable (think GPUs)

Where does the storage for external headers come from Same buffer Differentbuffer How big

rArr No easy answers

3656

The Bottom Line

Even seemingly simple extensions get complicated very quickly

Donrsquot try to increase generality through clever extensionsOnly consider modifications if itrsquos a perfect fit for your use case

3756

But what if I need to reclaim memory

minusrarr Pool Allocator

3856

Pool Allocator

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4156

Pool Allocator - Diffusion

4256

Pool Allocator

Deterministic runtime cost

No external fragmentation

Easy to make thread-safe

But

Cannot serve allocations bigger than chunk size

High waste through internal fragmentation if sizes of objects vary a lot

4356

Pool Allocator - STL containers

vector only if chunk sizes match vector size

list and map are a perfect fit as the size of each node is known beforehand(though this knowledge is implementation-specific)

Similar for deque

unordered map is again complicated

4456

But what if I do need different sizes

minusrarr Multipool Allocator

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4656

Multipool Allocator

Very powerful allocator

Runtime is still deterministic if number of pools is known beforehand

Maximum amount of waste through internal fragmentation can be controlledprecisely

Difficult to set up How many pools do I need What chunk sizes What poolsizes

Solid building block for a general purpose allocator

4756

Allocator support in C++

stdvectorltT AllocatorltTgtgt v

vpush back()

This is not the class allocating the memory

4756

Allocator support in C++

stdvectorltT AllocatorltTgtgt v

vpush back()

This is not the class allocating the memory

4856

Allocator support in C++

Historically C++ used Allocators to abstract over different models of addressingmemoryAs such Allocators in C++ are ldquostatelessrdquo

In C++ an Allocator is merely a handle to a memory resource1

1Arthur OrsquoDwyer - An Allocator is a Handle to the Heap

4856

Allocator support in C++

Historically C++ used Allocators to abstract over different models of addressingmemoryAs such Allocators in C++ are ldquostatelessrdquo

In C++ an Allocator is merely a handle to a memory resource1

1Arthur OrsquoDwyer - An Allocator is a Handle to the Heap

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

5056

C++ Memory Resources2

stdpmrmemory resource - Abstract base class for all resources that can bewrapped in a stdpmrpolymorphic allocator

stdpmrnew delete resource - Global allocator

stdpmrmonotonic buffer resource - Monotonic allocator

stdpmrunsynchronized pool resource synchronized pool resource

- Multipool

stdpmrnull memory resource - Allocation always fails

2Pablo Halpern - Allocators The Good Parts

5156

Chaining

explicit monotonic_buffer_resource(

stdpmr memory_resource upstream )

Each memory resource has an upstream counterpart

If the resource runs out of memory it tries to allocate more memory from upstream

5256

Chaining

Possible uses of Chaining

Fixed-size vs dynamic storage for allocators

Combination of different allocation strategies

Injection points for special purpose allocators for debugging and profiling

5356

Therersquos no universal interface for allocators

Are size and alignment parameters passed to deallocate

Is realloc supported

How are out-of-memory errors reported

Is extended alignment supported

What is the return value for an allocation of size 0

Different memory regions for internal data structures and allocated memory

5456

Donrsquot underestimate the global allocator

Competitive performance in the general case

Security features (ASLR secure erase of freed memory)

Debugging amp Profiling (Valgrind Windows Debug Runtime)

Cache Coloring

Local allocators are no free lunch

5556

Wrapping up

No one-size-fits-all mdash Each allocator has its Achilles heel

Global allocator is a good solution for the general case

But you can do better with special allocators for special use cases in terms ofperformance3 as well as reliability

C++ has good support for local allocators but the terminology is a bit off

Different libraries have different concepts of allocators

No free lunch You need to understand your use case before you can chose theright allocator

3John Lakos - Local (Arena) Allocators

5656

Thanks for your attention

Page 64: Taming Dynamic Memory...Pool Allocator - STL containers vector only if chunk sizes match vector size list and map are a perfect t, as the size of each node is known beforehand (though

2956

Alignment

Alignment refers to the least-significant bits of the object address being 0

Alignment requirements are always specified in powers of 2

Each built-in C++ type has a natural alignment requirement (typicallyalignof(T) == sizeof(T))

This is why structs sometimes insert padding bytes between members

3056

Alignment

Default allocator typically returns addresses aligned to alignof(max align t)which is big enough for all built-in types

Users may extend the alignment requirement for custom data types usingalignas

void allocate(std size_t bytes

std size_t alignment )

3156

Padding

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3356

Monotonic Allocator - Extensions

Auxiliary data structure required

Runtime cost of deallocation now linear in number of allocations (amortized O(1))

Auxiliary nodes have their own alignment requirements

Where to store the auxiliary nodes

3456

Monotonic Allocator - Extensions

3456

Monotonic Allocator - Extensions

3556

Internal or External

External headers have better cache behavior when iterating the list

External headers might have stricter alignment requirements than data

Internal headers have better cache behavior when adjacent data is hot

Internal headers require managed memory to be readable (think GPUs)

Where does the storage for external headers come from Same buffer Differentbuffer How big

rArr No easy answers

3656

The Bottom Line

Even seemingly simple extensions get complicated very quickly

Donrsquot try to increase generality through clever extensionsOnly consider modifications if itrsquos a perfect fit for your use case

3756

But what if I need to reclaim memory

minusrarr Pool Allocator

3856

Pool Allocator

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4156

Pool Allocator - Diffusion

4256

Pool Allocator

Deterministic runtime cost

No external fragmentation

Easy to make thread-safe

But

Cannot serve allocations bigger than chunk size

High waste through internal fragmentation if sizes of objects vary a lot

4356

Pool Allocator - STL containers

vector only if chunk sizes match vector size

list and map are a perfect fit as the size of each node is known beforehand(though this knowledge is implementation-specific)

Similar for deque

unordered map is again complicated

4456

But what if I do need different sizes

minusrarr Multipool Allocator

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4656

Multipool Allocator

Very powerful allocator

Runtime is still deterministic if number of pools is known beforehand

Maximum amount of waste through internal fragmentation can be controlledprecisely

Difficult to set up How many pools do I need What chunk sizes What poolsizes

Solid building block for a general purpose allocator

4756

Allocator support in C++

stdvectorltT AllocatorltTgtgt v

vpush back()

This is not the class allocating the memory

4756

Allocator support in C++

stdvectorltT AllocatorltTgtgt v

vpush back()

This is not the class allocating the memory

4856

Allocator support in C++

Historically C++ used Allocators to abstract over different models of addressingmemoryAs such Allocators in C++ are ldquostatelessrdquo

In C++ an Allocator is merely a handle to a memory resource1

1Arthur OrsquoDwyer - An Allocator is a Handle to the Heap

4856

Allocator support in C++

Historically C++ used Allocators to abstract over different models of addressingmemoryAs such Allocators in C++ are ldquostatelessrdquo

In C++ an Allocator is merely a handle to a memory resource1

1Arthur OrsquoDwyer - An Allocator is a Handle to the Heap

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

5056

C++ Memory Resources2

stdpmrmemory resource - Abstract base class for all resources that can bewrapped in a stdpmrpolymorphic allocator

stdpmrnew delete resource - Global allocator

stdpmrmonotonic buffer resource - Monotonic allocator

stdpmrunsynchronized pool resource synchronized pool resource

- Multipool

stdpmrnull memory resource - Allocation always fails

2Pablo Halpern - Allocators The Good Parts

5156

Chaining

explicit monotonic_buffer_resource(

stdpmr memory_resource upstream )

Each memory resource has an upstream counterpart

If the resource runs out of memory it tries to allocate more memory from upstream

5256

Chaining

Possible uses of Chaining

Fixed-size vs dynamic storage for allocators

Combination of different allocation strategies

Injection points for special purpose allocators for debugging and profiling

5356

Therersquos no universal interface for allocators

Are size and alignment parameters passed to deallocate

Is realloc supported

How are out-of-memory errors reported

Is extended alignment supported

What is the return value for an allocation of size 0

Different memory regions for internal data structures and allocated memory

5456

Donrsquot underestimate the global allocator

Competitive performance in the general case

Security features (ASLR secure erase of freed memory)

Debugging amp Profiling (Valgrind Windows Debug Runtime)

Cache Coloring

Local allocators are no free lunch

5556

Wrapping up

No one-size-fits-all mdash Each allocator has its Achilles heel

Global allocator is a good solution for the general case

But you can do better with special allocators for special use cases in terms ofperformance3 as well as reliability

C++ has good support for local allocators but the terminology is a bit off

Different libraries have different concepts of allocators

No free lunch You need to understand your use case before you can chose theright allocator

3John Lakos - Local (Arena) Allocators

5656

Thanks for your attention

Page 65: Taming Dynamic Memory...Pool Allocator - STL containers vector only if chunk sizes match vector size list and map are a perfect t, as the size of each node is known beforehand (though

3056

Alignment

Default allocator typically returns addresses aligned to alignof(max align t)which is big enough for all built-in types

Users may extend the alignment requirement for custom data types usingalignas

void allocate(std size_t bytes

std size_t alignment )

3156

Padding

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3356

Monotonic Allocator - Extensions

Auxiliary data structure required

Runtime cost of deallocation now linear in number of allocations (amortized O(1))

Auxiliary nodes have their own alignment requirements

Where to store the auxiliary nodes

3456

Monotonic Allocator - Extensions

3456

Monotonic Allocator - Extensions

3556

Internal or External

External headers have better cache behavior when iterating the list

External headers might have stricter alignment requirements than data

Internal headers have better cache behavior when adjacent data is hot

Internal headers require managed memory to be readable (think GPUs)

Where does the storage for external headers come from Same buffer Differentbuffer How big

rArr No easy answers

3656

The Bottom Line

Even seemingly simple extensions get complicated very quickly

Donrsquot try to increase generality through clever extensionsOnly consider modifications if itrsquos a perfect fit for your use case

3756

But what if I need to reclaim memory

minusrarr Pool Allocator

3856

Pool Allocator

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4156

Pool Allocator - Diffusion

4256

Pool Allocator

Deterministic runtime cost

No external fragmentation

Easy to make thread-safe

But

Cannot serve allocations bigger than chunk size

High waste through internal fragmentation if sizes of objects vary a lot

4356

Pool Allocator - STL containers

vector only if chunk sizes match vector size

list and map are a perfect fit as the size of each node is known beforehand(though this knowledge is implementation-specific)

Similar for deque

unordered map is again complicated

4456

But what if I do need different sizes

minusrarr Multipool Allocator

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4656

Multipool Allocator

Very powerful allocator

Runtime is still deterministic if number of pools is known beforehand

Maximum amount of waste through internal fragmentation can be controlledprecisely

Difficult to set up How many pools do I need What chunk sizes What poolsizes

Solid building block for a general purpose allocator

4756

Allocator support in C++

stdvectorltT AllocatorltTgtgt v

vpush back()

This is not the class allocating the memory

4756

Allocator support in C++

stdvectorltT AllocatorltTgtgt v

vpush back()

This is not the class allocating the memory

4856

Allocator support in C++

Historically C++ used Allocators to abstract over different models of addressingmemoryAs such Allocators in C++ are ldquostatelessrdquo

In C++ an Allocator is merely a handle to a memory resource1

1Arthur OrsquoDwyer - An Allocator is a Handle to the Heap

4856

Allocator support in C++

Historically C++ used Allocators to abstract over different models of addressingmemoryAs such Allocators in C++ are ldquostatelessrdquo

In C++ an Allocator is merely a handle to a memory resource1

1Arthur OrsquoDwyer - An Allocator is a Handle to the Heap

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

5056

C++ Memory Resources2

stdpmrmemory resource - Abstract base class for all resources that can bewrapped in a stdpmrpolymorphic allocator

stdpmrnew delete resource - Global allocator

stdpmrmonotonic buffer resource - Monotonic allocator

stdpmrunsynchronized pool resource synchronized pool resource

- Multipool

stdpmrnull memory resource - Allocation always fails

2Pablo Halpern - Allocators The Good Parts

5156

Chaining

explicit monotonic_buffer_resource(

stdpmr memory_resource upstream )

Each memory resource has an upstream counterpart

If the resource runs out of memory it tries to allocate more memory from upstream

5256

Chaining

Possible uses of Chaining

Fixed-size vs dynamic storage for allocators

Combination of different allocation strategies

Injection points for special purpose allocators for debugging and profiling

5356

Therersquos no universal interface for allocators

Are size and alignment parameters passed to deallocate

Is realloc supported

How are out-of-memory errors reported

Is extended alignment supported

What is the return value for an allocation of size 0

Different memory regions for internal data structures and allocated memory

5456

Donrsquot underestimate the global allocator

Competitive performance in the general case

Security features (ASLR secure erase of freed memory)

Debugging amp Profiling (Valgrind Windows Debug Runtime)

Cache Coloring

Local allocators are no free lunch

5556

Wrapping up

No one-size-fits-all mdash Each allocator has its Achilles heel

Global allocator is a good solution for the general case

But you can do better with special allocators for special use cases in terms ofperformance3 as well as reliability

C++ has good support for local allocators but the terminology is a bit off

Different libraries have different concepts of allocators

No free lunch You need to understand your use case before you can chose theright allocator

3John Lakos - Local (Arena) Allocators

5656

Thanks for your attention

Page 66: Taming Dynamic Memory...Pool Allocator - STL containers vector only if chunk sizes match vector size list and map are a perfect t, as the size of each node is known beforehand (though

3156

Padding

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3356

Monotonic Allocator - Extensions

Auxiliary data structure required

Runtime cost of deallocation now linear in number of allocations (amortized O(1))

Auxiliary nodes have their own alignment requirements

Where to store the auxiliary nodes

3456

Monotonic Allocator - Extensions

3456

Monotonic Allocator - Extensions

3556

Internal or External

External headers have better cache behavior when iterating the list

External headers might have stricter alignment requirements than data

Internal headers have better cache behavior when adjacent data is hot

Internal headers require managed memory to be readable (think GPUs)

Where does the storage for external headers come from Same buffer Differentbuffer How big

rArr No easy answers

3656

The Bottom Line

Even seemingly simple extensions get complicated very quickly

Donrsquot try to increase generality through clever extensionsOnly consider modifications if itrsquos a perfect fit for your use case

3756

But what if I need to reclaim memory

minusrarr Pool Allocator

3856

Pool Allocator

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4156

Pool Allocator - Diffusion

4256

Pool Allocator

Deterministic runtime cost

No external fragmentation

Easy to make thread-safe

But

Cannot serve allocations bigger than chunk size

High waste through internal fragmentation if sizes of objects vary a lot

4356

Pool Allocator - STL containers

vector only if chunk sizes match vector size

list and map are a perfect fit as the size of each node is known beforehand(though this knowledge is implementation-specific)

Similar for deque

unordered map is again complicated

4456

But what if I do need different sizes

minusrarr Multipool Allocator

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4656

Multipool Allocator

Very powerful allocator

Runtime is still deterministic if number of pools is known beforehand

Maximum amount of waste through internal fragmentation can be controlledprecisely

Difficult to set up How many pools do I need What chunk sizes What poolsizes

Solid building block for a general purpose allocator

4756

Allocator support in C++

stdvectorltT AllocatorltTgtgt v

vpush back()

This is not the class allocating the memory

4756

Allocator support in C++

stdvectorltT AllocatorltTgtgt v

vpush back()

This is not the class allocating the memory

4856

Allocator support in C++

Historically C++ used Allocators to abstract over different models of addressingmemoryAs such Allocators in C++ are ldquostatelessrdquo

In C++ an Allocator is merely a handle to a memory resource1

1Arthur OrsquoDwyer - An Allocator is a Handle to the Heap

4856

Allocator support in C++

Historically C++ used Allocators to abstract over different models of addressingmemoryAs such Allocators in C++ are ldquostatelessrdquo

In C++ an Allocator is merely a handle to a memory resource1

1Arthur OrsquoDwyer - An Allocator is a Handle to the Heap

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

5056

C++ Memory Resources2

stdpmrmemory resource - Abstract base class for all resources that can bewrapped in a stdpmrpolymorphic allocator

stdpmrnew delete resource - Global allocator

stdpmrmonotonic buffer resource - Monotonic allocator

stdpmrunsynchronized pool resource synchronized pool resource

- Multipool

stdpmrnull memory resource - Allocation always fails

2Pablo Halpern - Allocators The Good Parts

5156

Chaining

explicit monotonic_buffer_resource(

stdpmr memory_resource upstream )

Each memory resource has an upstream counterpart

If the resource runs out of memory it tries to allocate more memory from upstream

5256

Chaining

Possible uses of Chaining

Fixed-size vs dynamic storage for allocators

Combination of different allocation strategies

Injection points for special purpose allocators for debugging and profiling

5356

Therersquos no universal interface for allocators

Are size and alignment parameters passed to deallocate

Is realloc supported

How are out-of-memory errors reported

Is extended alignment supported

What is the return value for an allocation of size 0

Different memory regions for internal data structures and allocated memory

5456

Donrsquot underestimate the global allocator

Competitive performance in the general case

Security features (ASLR secure erase of freed memory)

Debugging amp Profiling (Valgrind Windows Debug Runtime)

Cache Coloring

Local allocators are no free lunch

5556

Wrapping up

No one-size-fits-all mdash Each allocator has its Achilles heel

Global allocator is a good solution for the general case

But you can do better with special allocators for special use cases in terms ofperformance3 as well as reliability

C++ has good support for local allocators but the terminology is a bit off

Different libraries have different concepts of allocators

No free lunch You need to understand your use case before you can chose theright allocator

3John Lakos - Local (Arena) Allocators

5656

Thanks for your attention

Page 67: Taming Dynamic Memory...Pool Allocator - STL containers vector only if chunk sizes match vector size list and map are a perfect t, as the size of each node is known beforehand (though

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3356

Monotonic Allocator - Extensions

Auxiliary data structure required

Runtime cost of deallocation now linear in number of allocations (amortized O(1))

Auxiliary nodes have their own alignment requirements

Where to store the auxiliary nodes

3456

Monotonic Allocator - Extensions

3456

Monotonic Allocator - Extensions

3556

Internal or External

External headers have better cache behavior when iterating the list

External headers might have stricter alignment requirements than data

Internal headers have better cache behavior when adjacent data is hot

Internal headers require managed memory to be readable (think GPUs)

Where does the storage for external headers come from Same buffer Differentbuffer How big

rArr No easy answers

3656

The Bottom Line

Even seemingly simple extensions get complicated very quickly

Donrsquot try to increase generality through clever extensionsOnly consider modifications if itrsquos a perfect fit for your use case

3756

But what if I need to reclaim memory

minusrarr Pool Allocator

3856

Pool Allocator

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4156

Pool Allocator - Diffusion

4256

Pool Allocator

Deterministic runtime cost

No external fragmentation

Easy to make thread-safe

But

Cannot serve allocations bigger than chunk size

High waste through internal fragmentation if sizes of objects vary a lot

4356

Pool Allocator - STL containers

vector only if chunk sizes match vector size

list and map are a perfect fit as the size of each node is known beforehand(though this knowledge is implementation-specific)

Similar for deque

unordered map is again complicated

4456

But what if I do need different sizes

minusrarr Multipool Allocator

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4656

Multipool Allocator

Very powerful allocator

Runtime is still deterministic if number of pools is known beforehand

Maximum amount of waste through internal fragmentation can be controlledprecisely

Difficult to set up How many pools do I need What chunk sizes What poolsizes

Solid building block for a general purpose allocator

4756

Allocator support in C++

stdvectorltT AllocatorltTgtgt v

vpush back()

This is not the class allocating the memory

4756

Allocator support in C++

stdvectorltT AllocatorltTgtgt v

vpush back()

This is not the class allocating the memory

4856

Allocator support in C++

Historically C++ used Allocators to abstract over different models of addressingmemoryAs such Allocators in C++ are ldquostatelessrdquo

In C++ an Allocator is merely a handle to a memory resource1

1Arthur OrsquoDwyer - An Allocator is a Handle to the Heap

4856

Allocator support in C++

Historically C++ used Allocators to abstract over different models of addressingmemoryAs such Allocators in C++ are ldquostatelessrdquo

In C++ an Allocator is merely a handle to a memory resource1

1Arthur OrsquoDwyer - An Allocator is a Handle to the Heap

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

5056

C++ Memory Resources2

stdpmrmemory resource - Abstract base class for all resources that can bewrapped in a stdpmrpolymorphic allocator

stdpmrnew delete resource - Global allocator

stdpmrmonotonic buffer resource - Monotonic allocator

stdpmrunsynchronized pool resource synchronized pool resource

- Multipool

stdpmrnull memory resource - Allocation always fails

2Pablo Halpern - Allocators The Good Parts

5156

Chaining

explicit monotonic_buffer_resource(

stdpmr memory_resource upstream )

Each memory resource has an upstream counterpart

If the resource runs out of memory it tries to allocate more memory from upstream

5256

Chaining

Possible uses of Chaining

Fixed-size vs dynamic storage for allocators

Combination of different allocation strategies

Injection points for special purpose allocators for debugging and profiling

5356

Therersquos no universal interface for allocators

Are size and alignment parameters passed to deallocate

Is realloc supported

How are out-of-memory errors reported

Is extended alignment supported

What is the return value for an allocation of size 0

Different memory regions for internal data structures and allocated memory

5456

Donrsquot underestimate the global allocator

Competitive performance in the general case

Security features (ASLR secure erase of freed memory)

Debugging amp Profiling (Valgrind Windows Debug Runtime)

Cache Coloring

Local allocators are no free lunch

5556

Wrapping up

No one-size-fits-all mdash Each allocator has its Achilles heel

Global allocator is a good solution for the general case

But you can do better with special allocators for special use cases in terms ofperformance3 as well as reliability

C++ has good support for local allocators but the terminology is a bit off

Different libraries have different concepts of allocators

No free lunch You need to understand your use case before you can chose theright allocator

3John Lakos - Local (Arena) Allocators

5656

Thanks for your attention

Page 68: Taming Dynamic Memory...Pool Allocator - STL containers vector only if chunk sizes match vector size list and map are a perfect t, as the size of each node is known beforehand (though

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3356

Monotonic Allocator - Extensions

Auxiliary data structure required

Runtime cost of deallocation now linear in number of allocations (amortized O(1))

Auxiliary nodes have their own alignment requirements

Where to store the auxiliary nodes

3456

Monotonic Allocator - Extensions

3456

Monotonic Allocator - Extensions

3556

Internal or External

External headers have better cache behavior when iterating the list

External headers might have stricter alignment requirements than data

Internal headers have better cache behavior when adjacent data is hot

Internal headers require managed memory to be readable (think GPUs)

Where does the storage for external headers come from Same buffer Differentbuffer How big

rArr No easy answers

3656

The Bottom Line

Even seemingly simple extensions get complicated very quickly

Donrsquot try to increase generality through clever extensionsOnly consider modifications if itrsquos a perfect fit for your use case

3756

But what if I need to reclaim memory

minusrarr Pool Allocator

3856

Pool Allocator

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4156

Pool Allocator - Diffusion

4256

Pool Allocator

Deterministic runtime cost

No external fragmentation

Easy to make thread-safe

But

Cannot serve allocations bigger than chunk size

High waste through internal fragmentation if sizes of objects vary a lot

4356

Pool Allocator - STL containers

vector only if chunk sizes match vector size

list and map are a perfect fit as the size of each node is known beforehand(though this knowledge is implementation-specific)

Similar for deque

unordered map is again complicated

4456

But what if I do need different sizes

minusrarr Multipool Allocator

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4656

Multipool Allocator

Very powerful allocator

Runtime is still deterministic if number of pools is known beforehand

Maximum amount of waste through internal fragmentation can be controlledprecisely

Difficult to set up How many pools do I need What chunk sizes What poolsizes

Solid building block for a general purpose allocator

4756

Allocator support in C++

stdvectorltT AllocatorltTgtgt v

vpush back()

This is not the class allocating the memory

4756

Allocator support in C++

stdvectorltT AllocatorltTgtgt v

vpush back()

This is not the class allocating the memory

4856

Allocator support in C++

Historically C++ used Allocators to abstract over different models of addressingmemoryAs such Allocators in C++ are ldquostatelessrdquo

In C++ an Allocator is merely a handle to a memory resource1

1Arthur OrsquoDwyer - An Allocator is a Handle to the Heap

4856

Allocator support in C++

Historically C++ used Allocators to abstract over different models of addressingmemoryAs such Allocators in C++ are ldquostatelessrdquo

In C++ an Allocator is merely a handle to a memory resource1

1Arthur OrsquoDwyer - An Allocator is a Handle to the Heap

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

5056

C++ Memory Resources2

stdpmrmemory resource - Abstract base class for all resources that can bewrapped in a stdpmrpolymorphic allocator

stdpmrnew delete resource - Global allocator

stdpmrmonotonic buffer resource - Monotonic allocator

stdpmrunsynchronized pool resource synchronized pool resource

- Multipool

stdpmrnull memory resource - Allocation always fails

2Pablo Halpern - Allocators The Good Parts

5156

Chaining

explicit monotonic_buffer_resource(

stdpmr memory_resource upstream )

Each memory resource has an upstream counterpart

If the resource runs out of memory it tries to allocate more memory from upstream

5256

Chaining

Possible uses of Chaining

Fixed-size vs dynamic storage for allocators

Combination of different allocation strategies

Injection points for special purpose allocators for debugging and profiling

5356

Therersquos no universal interface for allocators

Are size and alignment parameters passed to deallocate

Is realloc supported

How are out-of-memory errors reported

Is extended alignment supported

What is the return value for an allocation of size 0

Different memory regions for internal data structures and allocated memory

5456

Donrsquot underestimate the global allocator

Competitive performance in the general case

Security features (ASLR secure erase of freed memory)

Debugging amp Profiling (Valgrind Windows Debug Runtime)

Cache Coloring

Local allocators are no free lunch

5556

Wrapping up

No one-size-fits-all mdash Each allocator has its Achilles heel

Global allocator is a good solution for the general case

But you can do better with special allocators for special use cases in terms ofperformance3 as well as reliability

C++ has good support for local allocators but the terminology is a bit off

Different libraries have different concepts of allocators

No free lunch You need to understand your use case before you can chose theright allocator

3John Lakos - Local (Arena) Allocators

5656

Thanks for your attention

Page 69: Taming Dynamic Memory...Pool Allocator - STL containers vector only if chunk sizes match vector size list and map are a perfect t, as the size of each node is known beforehand (though

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3356

Monotonic Allocator - Extensions

Auxiliary data structure required

Runtime cost of deallocation now linear in number of allocations (amortized O(1))

Auxiliary nodes have their own alignment requirements

Where to store the auxiliary nodes

3456

Monotonic Allocator - Extensions

3456

Monotonic Allocator - Extensions

3556

Internal or External

External headers have better cache behavior when iterating the list

External headers might have stricter alignment requirements than data

Internal headers have better cache behavior when adjacent data is hot

Internal headers require managed memory to be readable (think GPUs)

Where does the storage for external headers come from Same buffer Differentbuffer How big

rArr No easy answers

3656

The Bottom Line

Even seemingly simple extensions get complicated very quickly

Donrsquot try to increase generality through clever extensionsOnly consider modifications if itrsquos a perfect fit for your use case

3756

But what if I need to reclaim memory

minusrarr Pool Allocator

3856

Pool Allocator

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4156

Pool Allocator - Diffusion

4256

Pool Allocator

Deterministic runtime cost

No external fragmentation

Easy to make thread-safe

But

Cannot serve allocations bigger than chunk size

High waste through internal fragmentation if sizes of objects vary a lot

4356

Pool Allocator - STL containers

vector only if chunk sizes match vector size

list and map are a perfect fit as the size of each node is known beforehand(though this knowledge is implementation-specific)

Similar for deque

unordered map is again complicated

4456

But what if I do need different sizes

minusrarr Multipool Allocator

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4656

Multipool Allocator

Very powerful allocator

Runtime is still deterministic if number of pools is known beforehand

Maximum amount of waste through internal fragmentation can be controlledprecisely

Difficult to set up How many pools do I need What chunk sizes What poolsizes

Solid building block for a general purpose allocator

4756

Allocator support in C++

stdvectorltT AllocatorltTgtgt v

vpush back()

This is not the class allocating the memory

4756

Allocator support in C++

stdvectorltT AllocatorltTgtgt v

vpush back()

This is not the class allocating the memory

4856

Allocator support in C++

Historically C++ used Allocators to abstract over different models of addressingmemoryAs such Allocators in C++ are ldquostatelessrdquo

In C++ an Allocator is merely a handle to a memory resource1

1Arthur OrsquoDwyer - An Allocator is a Handle to the Heap

4856

Allocator support in C++

Historically C++ used Allocators to abstract over different models of addressingmemoryAs such Allocators in C++ are ldquostatelessrdquo

In C++ an Allocator is merely a handle to a memory resource1

1Arthur OrsquoDwyer - An Allocator is a Handle to the Heap

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

5056

C++ Memory Resources2

stdpmrmemory resource - Abstract base class for all resources that can bewrapped in a stdpmrpolymorphic allocator

stdpmrnew delete resource - Global allocator

stdpmrmonotonic buffer resource - Monotonic allocator

stdpmrunsynchronized pool resource synchronized pool resource

- Multipool

stdpmrnull memory resource - Allocation always fails

2Pablo Halpern - Allocators The Good Parts

5156

Chaining

explicit monotonic_buffer_resource(

stdpmr memory_resource upstream )

Each memory resource has an upstream counterpart

If the resource runs out of memory it tries to allocate more memory from upstream

5256

Chaining

Possible uses of Chaining

Fixed-size vs dynamic storage for allocators

Combination of different allocation strategies

Injection points for special purpose allocators for debugging and profiling

5356

Therersquos no universal interface for allocators

Are size and alignment parameters passed to deallocate

Is realloc supported

How are out-of-memory errors reported

Is extended alignment supported

What is the return value for an allocation of size 0

Different memory regions for internal data structures and allocated memory

5456

Donrsquot underestimate the global allocator

Competitive performance in the general case

Security features (ASLR secure erase of freed memory)

Debugging amp Profiling (Valgrind Windows Debug Runtime)

Cache Coloring

Local allocators are no free lunch

5556

Wrapping up

No one-size-fits-all mdash Each allocator has its Achilles heel

Global allocator is a good solution for the general case

But you can do better with special allocators for special use cases in terms ofperformance3 as well as reliability

C++ has good support for local allocators but the terminology is a bit off

Different libraries have different concepts of allocators

No free lunch You need to understand your use case before you can chose theright allocator

3John Lakos - Local (Arena) Allocators

5656

Thanks for your attention

Page 70: Taming Dynamic Memory...Pool Allocator - STL containers vector only if chunk sizes match vector size list and map are a perfect t, as the size of each node is known beforehand (though

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3356

Monotonic Allocator - Extensions

Auxiliary data structure required

Runtime cost of deallocation now linear in number of allocations (amortized O(1))

Auxiliary nodes have their own alignment requirements

Where to store the auxiliary nodes

3456

Monotonic Allocator - Extensions

3456

Monotonic Allocator - Extensions

3556

Internal or External

External headers have better cache behavior when iterating the list

External headers might have stricter alignment requirements than data

Internal headers have better cache behavior when adjacent data is hot

Internal headers require managed memory to be readable (think GPUs)

Where does the storage for external headers come from Same buffer Differentbuffer How big

rArr No easy answers

3656

The Bottom Line

Even seemingly simple extensions get complicated very quickly

Donrsquot try to increase generality through clever extensionsOnly consider modifications if itrsquos a perfect fit for your use case

3756

But what if I need to reclaim memory

minusrarr Pool Allocator

3856

Pool Allocator

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4156

Pool Allocator - Diffusion

4256

Pool Allocator

Deterministic runtime cost

No external fragmentation

Easy to make thread-safe

But

Cannot serve allocations bigger than chunk size

High waste through internal fragmentation if sizes of objects vary a lot

4356

Pool Allocator - STL containers

vector only if chunk sizes match vector size

list and map are a perfect fit as the size of each node is known beforehand(though this knowledge is implementation-specific)

Similar for deque

unordered map is again complicated

4456

But what if I do need different sizes

minusrarr Multipool Allocator

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4656

Multipool Allocator

Very powerful allocator

Runtime is still deterministic if number of pools is known beforehand

Maximum amount of waste through internal fragmentation can be controlledprecisely

Difficult to set up How many pools do I need What chunk sizes What poolsizes

Solid building block for a general purpose allocator

4756

Allocator support in C++

stdvectorltT AllocatorltTgtgt v

vpush back()

This is not the class allocating the memory

4756

Allocator support in C++

stdvectorltT AllocatorltTgtgt v

vpush back()

This is not the class allocating the memory

4856

Allocator support in C++

Historically C++ used Allocators to abstract over different models of addressingmemoryAs such Allocators in C++ are ldquostatelessrdquo

In C++ an Allocator is merely a handle to a memory resource1

1Arthur OrsquoDwyer - An Allocator is a Handle to the Heap

4856

Allocator support in C++

Historically C++ used Allocators to abstract over different models of addressingmemoryAs such Allocators in C++ are ldquostatelessrdquo

In C++ an Allocator is merely a handle to a memory resource1

1Arthur OrsquoDwyer - An Allocator is a Handle to the Heap

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

5056

C++ Memory Resources2

stdpmrmemory resource - Abstract base class for all resources that can bewrapped in a stdpmrpolymorphic allocator

stdpmrnew delete resource - Global allocator

stdpmrmonotonic buffer resource - Monotonic allocator

stdpmrunsynchronized pool resource synchronized pool resource

- Multipool

stdpmrnull memory resource - Allocation always fails

2Pablo Halpern - Allocators The Good Parts

5156

Chaining

explicit monotonic_buffer_resource(

stdpmr memory_resource upstream )

Each memory resource has an upstream counterpart

If the resource runs out of memory it tries to allocate more memory from upstream

5256

Chaining

Possible uses of Chaining

Fixed-size vs dynamic storage for allocators

Combination of different allocation strategies

Injection points for special purpose allocators for debugging and profiling

5356

Therersquos no universal interface for allocators

Are size and alignment parameters passed to deallocate

Is realloc supported

How are out-of-memory errors reported

Is extended alignment supported

What is the return value for an allocation of size 0

Different memory regions for internal data structures and allocated memory

5456

Donrsquot underestimate the global allocator

Competitive performance in the general case

Security features (ASLR secure erase of freed memory)

Debugging amp Profiling (Valgrind Windows Debug Runtime)

Cache Coloring

Local allocators are no free lunch

5556

Wrapping up

No one-size-fits-all mdash Each allocator has its Achilles heel

Global allocator is a good solution for the general case

But you can do better with special allocators for special use cases in terms ofperformance3 as well as reliability

C++ has good support for local allocators but the terminology is a bit off

Different libraries have different concepts of allocators

No free lunch You need to understand your use case before you can chose theright allocator

3John Lakos - Local (Arena) Allocators

5656

Thanks for your attention

Page 71: Taming Dynamic Memory...Pool Allocator - STL containers vector only if chunk sizes match vector size list and map are a perfect t, as the size of each node is known beforehand (though

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3356

Monotonic Allocator - Extensions

Auxiliary data structure required

Runtime cost of deallocation now linear in number of allocations (amortized O(1))

Auxiliary nodes have their own alignment requirements

Where to store the auxiliary nodes

3456

Monotonic Allocator - Extensions

3456

Monotonic Allocator - Extensions

3556

Internal or External

External headers have better cache behavior when iterating the list

External headers might have stricter alignment requirements than data

Internal headers have better cache behavior when adjacent data is hot

Internal headers require managed memory to be readable (think GPUs)

Where does the storage for external headers come from Same buffer Differentbuffer How big

rArr No easy answers

3656

The Bottom Line

Even seemingly simple extensions get complicated very quickly

Donrsquot try to increase generality through clever extensionsOnly consider modifications if itrsquos a perfect fit for your use case

3756

But what if I need to reclaim memory

minusrarr Pool Allocator

3856

Pool Allocator

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4156

Pool Allocator - Diffusion

4256

Pool Allocator

Deterministic runtime cost

No external fragmentation

Easy to make thread-safe

But

Cannot serve allocations bigger than chunk size

High waste through internal fragmentation if sizes of objects vary a lot

4356

Pool Allocator - STL containers

vector only if chunk sizes match vector size

list and map are a perfect fit as the size of each node is known beforehand(though this knowledge is implementation-specific)

Similar for deque

unordered map is again complicated

4456

But what if I do need different sizes

minusrarr Multipool Allocator

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4656

Multipool Allocator

Very powerful allocator

Runtime is still deterministic if number of pools is known beforehand

Maximum amount of waste through internal fragmentation can be controlledprecisely

Difficult to set up How many pools do I need What chunk sizes What poolsizes

Solid building block for a general purpose allocator

4756

Allocator support in C++

stdvectorltT AllocatorltTgtgt v

vpush back()

This is not the class allocating the memory

4756

Allocator support in C++

stdvectorltT AllocatorltTgtgt v

vpush back()

This is not the class allocating the memory

4856

Allocator support in C++

Historically C++ used Allocators to abstract over different models of addressingmemoryAs such Allocators in C++ are ldquostatelessrdquo

In C++ an Allocator is merely a handle to a memory resource1

1Arthur OrsquoDwyer - An Allocator is a Handle to the Heap

4856

Allocator support in C++

Historically C++ used Allocators to abstract over different models of addressingmemoryAs such Allocators in C++ are ldquostatelessrdquo

In C++ an Allocator is merely a handle to a memory resource1

1Arthur OrsquoDwyer - An Allocator is a Handle to the Heap

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

5056

C++ Memory Resources2

stdpmrmemory resource - Abstract base class for all resources that can bewrapped in a stdpmrpolymorphic allocator

stdpmrnew delete resource - Global allocator

stdpmrmonotonic buffer resource - Monotonic allocator

stdpmrunsynchronized pool resource synchronized pool resource

- Multipool

stdpmrnull memory resource - Allocation always fails

2Pablo Halpern - Allocators The Good Parts

5156

Chaining

explicit monotonic_buffer_resource(

stdpmr memory_resource upstream )

Each memory resource has an upstream counterpart

If the resource runs out of memory it tries to allocate more memory from upstream

5256

Chaining

Possible uses of Chaining

Fixed-size vs dynamic storage for allocators

Combination of different allocation strategies

Injection points for special purpose allocators for debugging and profiling

5356

Therersquos no universal interface for allocators

Are size and alignment parameters passed to deallocate

Is realloc supported

How are out-of-memory errors reported

Is extended alignment supported

What is the return value for an allocation of size 0

Different memory regions for internal data structures and allocated memory

5456

Donrsquot underestimate the global allocator

Competitive performance in the general case

Security features (ASLR secure erase of freed memory)

Debugging amp Profiling (Valgrind Windows Debug Runtime)

Cache Coloring

Local allocators are no free lunch

5556

Wrapping up

No one-size-fits-all mdash Each allocator has its Achilles heel

Global allocator is a good solution for the general case

But you can do better with special allocators for special use cases in terms ofperformance3 as well as reliability

C++ has good support for local allocators but the terminology is a bit off

Different libraries have different concepts of allocators

No free lunch You need to understand your use case before you can chose theright allocator

3John Lakos - Local (Arena) Allocators

5656

Thanks for your attention

Page 72: Taming Dynamic Memory...Pool Allocator - STL containers vector only if chunk sizes match vector size list and map are a perfect t, as the size of each node is known beforehand (though

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3356

Monotonic Allocator - Extensions

Auxiliary data structure required

Runtime cost of deallocation now linear in number of allocations (amortized O(1))

Auxiliary nodes have their own alignment requirements

Where to store the auxiliary nodes

3456

Monotonic Allocator - Extensions

3456

Monotonic Allocator - Extensions

3556

Internal or External

External headers have better cache behavior when iterating the list

External headers might have stricter alignment requirements than data

Internal headers have better cache behavior when adjacent data is hot

Internal headers require managed memory to be readable (think GPUs)

Where does the storage for external headers come from Same buffer Differentbuffer How big

rArr No easy answers

3656

The Bottom Line

Even seemingly simple extensions get complicated very quickly

Donrsquot try to increase generality through clever extensionsOnly consider modifications if itrsquos a perfect fit for your use case

3756

But what if I need to reclaim memory

minusrarr Pool Allocator

3856

Pool Allocator

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4156

Pool Allocator - Diffusion

4256

Pool Allocator

Deterministic runtime cost

No external fragmentation

Easy to make thread-safe

But

Cannot serve allocations bigger than chunk size

High waste through internal fragmentation if sizes of objects vary a lot

4356

Pool Allocator - STL containers

vector only if chunk sizes match vector size

list and map are a perfect fit as the size of each node is known beforehand(though this knowledge is implementation-specific)

Similar for deque

unordered map is again complicated

4456

But what if I do need different sizes

minusrarr Multipool Allocator

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4656

Multipool Allocator

Very powerful allocator

Runtime is still deterministic if number of pools is known beforehand

Maximum amount of waste through internal fragmentation can be controlledprecisely

Difficult to set up How many pools do I need What chunk sizes What poolsizes

Solid building block for a general purpose allocator

4756

Allocator support in C++

stdvectorltT AllocatorltTgtgt v

vpush back()

This is not the class allocating the memory

4756

Allocator support in C++

stdvectorltT AllocatorltTgtgt v

vpush back()

This is not the class allocating the memory

4856

Allocator support in C++

Historically C++ used Allocators to abstract over different models of addressingmemoryAs such Allocators in C++ are ldquostatelessrdquo

In C++ an Allocator is merely a handle to a memory resource1

1Arthur OrsquoDwyer - An Allocator is a Handle to the Heap

4856

Allocator support in C++

Historically C++ used Allocators to abstract over different models of addressingmemoryAs such Allocators in C++ are ldquostatelessrdquo

In C++ an Allocator is merely a handle to a memory resource1

1Arthur OrsquoDwyer - An Allocator is a Handle to the Heap

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

5056

C++ Memory Resources2

stdpmrmemory resource - Abstract base class for all resources that can bewrapped in a stdpmrpolymorphic allocator

stdpmrnew delete resource - Global allocator

stdpmrmonotonic buffer resource - Monotonic allocator

stdpmrunsynchronized pool resource synchronized pool resource

- Multipool

stdpmrnull memory resource - Allocation always fails

2Pablo Halpern - Allocators The Good Parts

5156

Chaining

explicit monotonic_buffer_resource(

stdpmr memory_resource upstream )

Each memory resource has an upstream counterpart

If the resource runs out of memory it tries to allocate more memory from upstream

5256

Chaining

Possible uses of Chaining

Fixed-size vs dynamic storage for allocators

Combination of different allocation strategies

Injection points for special purpose allocators for debugging and profiling

5356

Therersquos no universal interface for allocators

Are size and alignment parameters passed to deallocate

Is realloc supported

How are out-of-memory errors reported

Is extended alignment supported

What is the return value for an allocation of size 0

Different memory regions for internal data structures and allocated memory

5456

Donrsquot underestimate the global allocator

Competitive performance in the general case

Security features (ASLR secure erase of freed memory)

Debugging amp Profiling (Valgrind Windows Debug Runtime)

Cache Coloring

Local allocators are no free lunch

5556

Wrapping up

No one-size-fits-all mdash Each allocator has its Achilles heel

Global allocator is a good solution for the general case

But you can do better with special allocators for special use cases in terms ofperformance3 as well as reliability

C++ has good support for local allocators but the terminology is a bit off

Different libraries have different concepts of allocators

No free lunch You need to understand your use case before you can chose theright allocator

3John Lakos - Local (Arena) Allocators

5656

Thanks for your attention

Page 73: Taming Dynamic Memory...Pool Allocator - STL containers vector only if chunk sizes match vector size list and map are a perfect t, as the size of each node is known beforehand (though

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3356

Monotonic Allocator - Extensions

Auxiliary data structure required

Runtime cost of deallocation now linear in number of allocations (amortized O(1))

Auxiliary nodes have their own alignment requirements

Where to store the auxiliary nodes

3456

Monotonic Allocator - Extensions

3456

Monotonic Allocator - Extensions

3556

Internal or External

External headers have better cache behavior when iterating the list

External headers might have stricter alignment requirements than data

Internal headers have better cache behavior when adjacent data is hot

Internal headers require managed memory to be readable (think GPUs)

Where does the storage for external headers come from Same buffer Differentbuffer How big

rArr No easy answers

3656

The Bottom Line

Even seemingly simple extensions get complicated very quickly

Donrsquot try to increase generality through clever extensionsOnly consider modifications if itrsquos a perfect fit for your use case

3756

But what if I need to reclaim memory

minusrarr Pool Allocator

3856

Pool Allocator

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4156

Pool Allocator - Diffusion

4256

Pool Allocator

Deterministic runtime cost

No external fragmentation

Easy to make thread-safe

But

Cannot serve allocations bigger than chunk size

High waste through internal fragmentation if sizes of objects vary a lot

4356

Pool Allocator - STL containers

vector only if chunk sizes match vector size

list and map are a perfect fit as the size of each node is known beforehand(though this knowledge is implementation-specific)

Similar for deque

unordered map is again complicated

4456

But what if I do need different sizes

minusrarr Multipool Allocator

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4656

Multipool Allocator

Very powerful allocator

Runtime is still deterministic if number of pools is known beforehand

Maximum amount of waste through internal fragmentation can be controlledprecisely

Difficult to set up How many pools do I need What chunk sizes What poolsizes

Solid building block for a general purpose allocator

4756

Allocator support in C++

stdvectorltT AllocatorltTgtgt v

vpush back()

This is not the class allocating the memory

4756

Allocator support in C++

stdvectorltT AllocatorltTgtgt v

vpush back()

This is not the class allocating the memory

4856

Allocator support in C++

Historically C++ used Allocators to abstract over different models of addressingmemoryAs such Allocators in C++ are ldquostatelessrdquo

In C++ an Allocator is merely a handle to a memory resource1

1Arthur OrsquoDwyer - An Allocator is a Handle to the Heap

4856

Allocator support in C++

Historically C++ used Allocators to abstract over different models of addressingmemoryAs such Allocators in C++ are ldquostatelessrdquo

In C++ an Allocator is merely a handle to a memory resource1

1Arthur OrsquoDwyer - An Allocator is a Handle to the Heap

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

5056

C++ Memory Resources2

stdpmrmemory resource - Abstract base class for all resources that can bewrapped in a stdpmrpolymorphic allocator

stdpmrnew delete resource - Global allocator

stdpmrmonotonic buffer resource - Monotonic allocator

stdpmrunsynchronized pool resource synchronized pool resource

- Multipool

stdpmrnull memory resource - Allocation always fails

2Pablo Halpern - Allocators The Good Parts

5156

Chaining

explicit monotonic_buffer_resource(

stdpmr memory_resource upstream )

Each memory resource has an upstream counterpart

If the resource runs out of memory it tries to allocate more memory from upstream

5256

Chaining

Possible uses of Chaining

Fixed-size vs dynamic storage for allocators

Combination of different allocation strategies

Injection points for special purpose allocators for debugging and profiling

5356

Therersquos no universal interface for allocators

Are size and alignment parameters passed to deallocate

Is realloc supported

How are out-of-memory errors reported

Is extended alignment supported

What is the return value for an allocation of size 0

Different memory regions for internal data structures and allocated memory

5456

Donrsquot underestimate the global allocator

Competitive performance in the general case

Security features (ASLR secure erase of freed memory)

Debugging amp Profiling (Valgrind Windows Debug Runtime)

Cache Coloring

Local allocators are no free lunch

5556

Wrapping up

No one-size-fits-all mdash Each allocator has its Achilles heel

Global allocator is a good solution for the general case

But you can do better with special allocators for special use cases in terms ofperformance3 as well as reliability

C++ has good support for local allocators but the terminology is a bit off

Different libraries have different concepts of allocators

No free lunch You need to understand your use case before you can chose theright allocator

3John Lakos - Local (Arena) Allocators

5656

Thanks for your attention

Page 74: Taming Dynamic Memory...Pool Allocator - STL containers vector only if chunk sizes match vector size list and map are a perfect t, as the size of each node is known beforehand (though

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3356

Monotonic Allocator - Extensions

Auxiliary data structure required

Runtime cost of deallocation now linear in number of allocations (amortized O(1))

Auxiliary nodes have their own alignment requirements

Where to store the auxiliary nodes

3456

Monotonic Allocator - Extensions

3456

Monotonic Allocator - Extensions

3556

Internal or External

External headers have better cache behavior when iterating the list

External headers might have stricter alignment requirements than data

Internal headers have better cache behavior when adjacent data is hot

Internal headers require managed memory to be readable (think GPUs)

Where does the storage for external headers come from Same buffer Differentbuffer How big

rArr No easy answers

3656

The Bottom Line

Even seemingly simple extensions get complicated very quickly

Donrsquot try to increase generality through clever extensionsOnly consider modifications if itrsquos a perfect fit for your use case

3756

But what if I need to reclaim memory

minusrarr Pool Allocator

3856

Pool Allocator

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4156

Pool Allocator - Diffusion

4256

Pool Allocator

Deterministic runtime cost

No external fragmentation

Easy to make thread-safe

But

Cannot serve allocations bigger than chunk size

High waste through internal fragmentation if sizes of objects vary a lot

4356

Pool Allocator - STL containers

vector only if chunk sizes match vector size

list and map are a perfect fit as the size of each node is known beforehand(though this knowledge is implementation-specific)

Similar for deque

unordered map is again complicated

4456

But what if I do need different sizes

minusrarr Multipool Allocator

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4656

Multipool Allocator

Very powerful allocator

Runtime is still deterministic if number of pools is known beforehand

Maximum amount of waste through internal fragmentation can be controlledprecisely

Difficult to set up How many pools do I need What chunk sizes What poolsizes

Solid building block for a general purpose allocator

4756

Allocator support in C++

stdvectorltT AllocatorltTgtgt v

vpush back()

This is not the class allocating the memory

4756

Allocator support in C++

stdvectorltT AllocatorltTgtgt v

vpush back()

This is not the class allocating the memory

4856

Allocator support in C++

Historically C++ used Allocators to abstract over different models of addressingmemoryAs such Allocators in C++ are ldquostatelessrdquo

In C++ an Allocator is merely a handle to a memory resource1

1Arthur OrsquoDwyer - An Allocator is a Handle to the Heap

4856

Allocator support in C++

Historically C++ used Allocators to abstract over different models of addressingmemoryAs such Allocators in C++ are ldquostatelessrdquo

In C++ an Allocator is merely a handle to a memory resource1

1Arthur OrsquoDwyer - An Allocator is a Handle to the Heap

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

5056

C++ Memory Resources2

stdpmrmemory resource - Abstract base class for all resources that can bewrapped in a stdpmrpolymorphic allocator

stdpmrnew delete resource - Global allocator

stdpmrmonotonic buffer resource - Monotonic allocator

stdpmrunsynchronized pool resource synchronized pool resource

- Multipool

stdpmrnull memory resource - Allocation always fails

2Pablo Halpern - Allocators The Good Parts

5156

Chaining

explicit monotonic_buffer_resource(

stdpmr memory_resource upstream )

Each memory resource has an upstream counterpart

If the resource runs out of memory it tries to allocate more memory from upstream

5256

Chaining

Possible uses of Chaining

Fixed-size vs dynamic storage for allocators

Combination of different allocation strategies

Injection points for special purpose allocators for debugging and profiling

5356

Therersquos no universal interface for allocators

Are size and alignment parameters passed to deallocate

Is realloc supported

How are out-of-memory errors reported

Is extended alignment supported

What is the return value for an allocation of size 0

Different memory regions for internal data structures and allocated memory

5456

Donrsquot underestimate the global allocator

Competitive performance in the general case

Security features (ASLR secure erase of freed memory)

Debugging amp Profiling (Valgrind Windows Debug Runtime)

Cache Coloring

Local allocators are no free lunch

5556

Wrapping up

No one-size-fits-all mdash Each allocator has its Achilles heel

Global allocator is a good solution for the general case

But you can do better with special allocators for special use cases in terms ofperformance3 as well as reliability

C++ has good support for local allocators but the terminology is a bit off

Different libraries have different concepts of allocators

No free lunch You need to understand your use case before you can chose theright allocator

3John Lakos - Local (Arena) Allocators

5656

Thanks for your attention

Page 75: Taming Dynamic Memory...Pool Allocator - STL containers vector only if chunk sizes match vector size list and map are a perfect t, as the size of each node is known beforehand (though

3256

Monotonic Allocator - Extensions

extpooldeallocate(p2)

extpooldeallocate(p3)

3356

Monotonic Allocator - Extensions

Auxiliary data structure required

Runtime cost of deallocation now linear in number of allocations (amortized O(1))

Auxiliary nodes have their own alignment requirements

Where to store the auxiliary nodes

3456

Monotonic Allocator - Extensions

3456

Monotonic Allocator - Extensions

3556

Internal or External

External headers have better cache behavior when iterating the list

External headers might have stricter alignment requirements than data

Internal headers have better cache behavior when adjacent data is hot

Internal headers require managed memory to be readable (think GPUs)

Where does the storage for external headers come from Same buffer Differentbuffer How big

rArr No easy answers

3656

The Bottom Line

Even seemingly simple extensions get complicated very quickly

Donrsquot try to increase generality through clever extensionsOnly consider modifications if itrsquos a perfect fit for your use case

3756

But what if I need to reclaim memory

minusrarr Pool Allocator

3856

Pool Allocator

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4156

Pool Allocator - Diffusion

4256

Pool Allocator

Deterministic runtime cost

No external fragmentation

Easy to make thread-safe

But

Cannot serve allocations bigger than chunk size

High waste through internal fragmentation if sizes of objects vary a lot

4356

Pool Allocator - STL containers

vector only if chunk sizes match vector size

list and map are a perfect fit as the size of each node is known beforehand(though this knowledge is implementation-specific)

Similar for deque

unordered map is again complicated

4456

But what if I do need different sizes

minusrarr Multipool Allocator

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4656

Multipool Allocator

Very powerful allocator

Runtime is still deterministic if number of pools is known beforehand

Maximum amount of waste through internal fragmentation can be controlledprecisely

Difficult to set up How many pools do I need What chunk sizes What poolsizes

Solid building block for a general purpose allocator

4756

Allocator support in C++

stdvectorltT AllocatorltTgtgt v

vpush back()

This is not the class allocating the memory

4756

Allocator support in C++

stdvectorltT AllocatorltTgtgt v

vpush back()

This is not the class allocating the memory

4856

Allocator support in C++

Historically C++ used Allocators to abstract over different models of addressingmemoryAs such Allocators in C++ are ldquostatelessrdquo

In C++ an Allocator is merely a handle to a memory resource1

1Arthur OrsquoDwyer - An Allocator is a Handle to the Heap

4856

Allocator support in C++

Historically C++ used Allocators to abstract over different models of addressingmemoryAs such Allocators in C++ are ldquostatelessrdquo

In C++ an Allocator is merely a handle to a memory resource1

1Arthur OrsquoDwyer - An Allocator is a Handle to the Heap

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

5056

C++ Memory Resources2

stdpmrmemory resource - Abstract base class for all resources that can bewrapped in a stdpmrpolymorphic allocator

stdpmrnew delete resource - Global allocator

stdpmrmonotonic buffer resource - Monotonic allocator

stdpmrunsynchronized pool resource synchronized pool resource

- Multipool

stdpmrnull memory resource - Allocation always fails

2Pablo Halpern - Allocators The Good Parts

5156

Chaining

explicit monotonic_buffer_resource(

stdpmr memory_resource upstream )

Each memory resource has an upstream counterpart

If the resource runs out of memory it tries to allocate more memory from upstream

5256

Chaining

Possible uses of Chaining

Fixed-size vs dynamic storage for allocators

Combination of different allocation strategies

Injection points for special purpose allocators for debugging and profiling

5356

Therersquos no universal interface for allocators

Are size and alignment parameters passed to deallocate

Is realloc supported

How are out-of-memory errors reported

Is extended alignment supported

What is the return value for an allocation of size 0

Different memory regions for internal data structures and allocated memory

5456

Donrsquot underestimate the global allocator

Competitive performance in the general case

Security features (ASLR secure erase of freed memory)

Debugging amp Profiling (Valgrind Windows Debug Runtime)

Cache Coloring

Local allocators are no free lunch

5556

Wrapping up

No one-size-fits-all mdash Each allocator has its Achilles heel

Global allocator is a good solution for the general case

But you can do better with special allocators for special use cases in terms ofperformance3 as well as reliability

C++ has good support for local allocators but the terminology is a bit off

Different libraries have different concepts of allocators

No free lunch You need to understand your use case before you can chose theright allocator

3John Lakos - Local (Arena) Allocators

5656

Thanks for your attention

Page 76: Taming Dynamic Memory...Pool Allocator - STL containers vector only if chunk sizes match vector size list and map are a perfect t, as the size of each node is known beforehand (though

3356

Monotonic Allocator - Extensions

Auxiliary data structure required

Runtime cost of deallocation now linear in number of allocations (amortized O(1))

Auxiliary nodes have their own alignment requirements

Where to store the auxiliary nodes

3456

Monotonic Allocator - Extensions

3456

Monotonic Allocator - Extensions

3556

Internal or External

External headers have better cache behavior when iterating the list

External headers might have stricter alignment requirements than data

Internal headers have better cache behavior when adjacent data is hot

Internal headers require managed memory to be readable (think GPUs)

Where does the storage for external headers come from Same buffer Differentbuffer How big

rArr No easy answers

3656

The Bottom Line

Even seemingly simple extensions get complicated very quickly

Donrsquot try to increase generality through clever extensionsOnly consider modifications if itrsquos a perfect fit for your use case

3756

But what if I need to reclaim memory

minusrarr Pool Allocator

3856

Pool Allocator

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4156

Pool Allocator - Diffusion

4256

Pool Allocator

Deterministic runtime cost

No external fragmentation

Easy to make thread-safe

But

Cannot serve allocations bigger than chunk size

High waste through internal fragmentation if sizes of objects vary a lot

4356

Pool Allocator - STL containers

vector only if chunk sizes match vector size

list and map are a perfect fit as the size of each node is known beforehand(though this knowledge is implementation-specific)

Similar for deque

unordered map is again complicated

4456

But what if I do need different sizes

minusrarr Multipool Allocator

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4656

Multipool Allocator

Very powerful allocator

Runtime is still deterministic if number of pools is known beforehand

Maximum amount of waste through internal fragmentation can be controlledprecisely

Difficult to set up How many pools do I need What chunk sizes What poolsizes

Solid building block for a general purpose allocator

4756

Allocator support in C++

stdvectorltT AllocatorltTgtgt v

vpush back()

This is not the class allocating the memory

4756

Allocator support in C++

stdvectorltT AllocatorltTgtgt v

vpush back()

This is not the class allocating the memory

4856

Allocator support in C++

Historically C++ used Allocators to abstract over different models of addressingmemoryAs such Allocators in C++ are ldquostatelessrdquo

In C++ an Allocator is merely a handle to a memory resource1

1Arthur OrsquoDwyer - An Allocator is a Handle to the Heap

4856

Allocator support in C++

Historically C++ used Allocators to abstract over different models of addressingmemoryAs such Allocators in C++ are ldquostatelessrdquo

In C++ an Allocator is merely a handle to a memory resource1

1Arthur OrsquoDwyer - An Allocator is a Handle to the Heap

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

5056

C++ Memory Resources2

stdpmrmemory resource - Abstract base class for all resources that can bewrapped in a stdpmrpolymorphic allocator

stdpmrnew delete resource - Global allocator

stdpmrmonotonic buffer resource - Monotonic allocator

stdpmrunsynchronized pool resource synchronized pool resource

- Multipool

stdpmrnull memory resource - Allocation always fails

2Pablo Halpern - Allocators The Good Parts

5156

Chaining

explicit monotonic_buffer_resource(

stdpmr memory_resource upstream )

Each memory resource has an upstream counterpart

If the resource runs out of memory it tries to allocate more memory from upstream

5256

Chaining

Possible uses of Chaining

Fixed-size vs dynamic storage for allocators

Combination of different allocation strategies

Injection points for special purpose allocators for debugging and profiling

5356

Therersquos no universal interface for allocators

Are size and alignment parameters passed to deallocate

Is realloc supported

How are out-of-memory errors reported

Is extended alignment supported

What is the return value for an allocation of size 0

Different memory regions for internal data structures and allocated memory

5456

Donrsquot underestimate the global allocator

Competitive performance in the general case

Security features (ASLR secure erase of freed memory)

Debugging amp Profiling (Valgrind Windows Debug Runtime)

Cache Coloring

Local allocators are no free lunch

5556

Wrapping up

No one-size-fits-all mdash Each allocator has its Achilles heel

Global allocator is a good solution for the general case

But you can do better with special allocators for special use cases in terms ofperformance3 as well as reliability

C++ has good support for local allocators but the terminology is a bit off

Different libraries have different concepts of allocators

No free lunch You need to understand your use case before you can chose theright allocator

3John Lakos - Local (Arena) Allocators

5656

Thanks for your attention

Page 77: Taming Dynamic Memory...Pool Allocator - STL containers vector only if chunk sizes match vector size list and map are a perfect t, as the size of each node is known beforehand (though

3456

Monotonic Allocator - Extensions

3456

Monotonic Allocator - Extensions

3556

Internal or External

External headers have better cache behavior when iterating the list

External headers might have stricter alignment requirements than data

Internal headers have better cache behavior when adjacent data is hot

Internal headers require managed memory to be readable (think GPUs)

Where does the storage for external headers come from Same buffer Differentbuffer How big

rArr No easy answers

3656

The Bottom Line

Even seemingly simple extensions get complicated very quickly

Donrsquot try to increase generality through clever extensionsOnly consider modifications if itrsquos a perfect fit for your use case

3756

But what if I need to reclaim memory

minusrarr Pool Allocator

3856

Pool Allocator

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4156

Pool Allocator - Diffusion

4256

Pool Allocator

Deterministic runtime cost

No external fragmentation

Easy to make thread-safe

But

Cannot serve allocations bigger than chunk size

High waste through internal fragmentation if sizes of objects vary a lot

4356

Pool Allocator - STL containers

vector only if chunk sizes match vector size

list and map are a perfect fit as the size of each node is known beforehand(though this knowledge is implementation-specific)

Similar for deque

unordered map is again complicated

4456

But what if I do need different sizes

minusrarr Multipool Allocator

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4656

Multipool Allocator

Very powerful allocator

Runtime is still deterministic if number of pools is known beforehand

Maximum amount of waste through internal fragmentation can be controlledprecisely

Difficult to set up How many pools do I need What chunk sizes What poolsizes

Solid building block for a general purpose allocator

4756

Allocator support in C++

stdvectorltT AllocatorltTgtgt v

vpush back()

This is not the class allocating the memory

4756

Allocator support in C++

stdvectorltT AllocatorltTgtgt v

vpush back()

This is not the class allocating the memory

4856

Allocator support in C++

Historically C++ used Allocators to abstract over different models of addressingmemoryAs such Allocators in C++ are ldquostatelessrdquo

In C++ an Allocator is merely a handle to a memory resource1

1Arthur OrsquoDwyer - An Allocator is a Handle to the Heap

4856

Allocator support in C++

Historically C++ used Allocators to abstract over different models of addressingmemoryAs such Allocators in C++ are ldquostatelessrdquo

In C++ an Allocator is merely a handle to a memory resource1

1Arthur OrsquoDwyer - An Allocator is a Handle to the Heap

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

5056

C++ Memory Resources2

stdpmrmemory resource - Abstract base class for all resources that can bewrapped in a stdpmrpolymorphic allocator

stdpmrnew delete resource - Global allocator

stdpmrmonotonic buffer resource - Monotonic allocator

stdpmrunsynchronized pool resource synchronized pool resource

- Multipool

stdpmrnull memory resource - Allocation always fails

2Pablo Halpern - Allocators The Good Parts

5156

Chaining

explicit monotonic_buffer_resource(

stdpmr memory_resource upstream )

Each memory resource has an upstream counterpart

If the resource runs out of memory it tries to allocate more memory from upstream

5256

Chaining

Possible uses of Chaining

Fixed-size vs dynamic storage for allocators

Combination of different allocation strategies

Injection points for special purpose allocators for debugging and profiling

5356

Therersquos no universal interface for allocators

Are size and alignment parameters passed to deallocate

Is realloc supported

How are out-of-memory errors reported

Is extended alignment supported

What is the return value for an allocation of size 0

Different memory regions for internal data structures and allocated memory

5456

Donrsquot underestimate the global allocator

Competitive performance in the general case

Security features (ASLR secure erase of freed memory)

Debugging amp Profiling (Valgrind Windows Debug Runtime)

Cache Coloring

Local allocators are no free lunch

5556

Wrapping up

No one-size-fits-all mdash Each allocator has its Achilles heel

Global allocator is a good solution for the general case

But you can do better with special allocators for special use cases in terms ofperformance3 as well as reliability

C++ has good support for local allocators but the terminology is a bit off

Different libraries have different concepts of allocators

No free lunch You need to understand your use case before you can chose theright allocator

3John Lakos - Local (Arena) Allocators

5656

Thanks for your attention

Page 78: Taming Dynamic Memory...Pool Allocator - STL containers vector only if chunk sizes match vector size list and map are a perfect t, as the size of each node is known beforehand (though

3456

Monotonic Allocator - Extensions

3556

Internal or External

External headers have better cache behavior when iterating the list

External headers might have stricter alignment requirements than data

Internal headers have better cache behavior when adjacent data is hot

Internal headers require managed memory to be readable (think GPUs)

Where does the storage for external headers come from Same buffer Differentbuffer How big

rArr No easy answers

3656

The Bottom Line

Even seemingly simple extensions get complicated very quickly

Donrsquot try to increase generality through clever extensionsOnly consider modifications if itrsquos a perfect fit for your use case

3756

But what if I need to reclaim memory

minusrarr Pool Allocator

3856

Pool Allocator

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4156

Pool Allocator - Diffusion

4256

Pool Allocator

Deterministic runtime cost

No external fragmentation

Easy to make thread-safe

But

Cannot serve allocations bigger than chunk size

High waste through internal fragmentation if sizes of objects vary a lot

4356

Pool Allocator - STL containers

vector only if chunk sizes match vector size

list and map are a perfect fit as the size of each node is known beforehand(though this knowledge is implementation-specific)

Similar for deque

unordered map is again complicated

4456

But what if I do need different sizes

minusrarr Multipool Allocator

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4656

Multipool Allocator

Very powerful allocator

Runtime is still deterministic if number of pools is known beforehand

Maximum amount of waste through internal fragmentation can be controlledprecisely

Difficult to set up How many pools do I need What chunk sizes What poolsizes

Solid building block for a general purpose allocator

4756

Allocator support in C++

stdvectorltT AllocatorltTgtgt v

vpush back()

This is not the class allocating the memory

4756

Allocator support in C++

stdvectorltT AllocatorltTgtgt v

vpush back()

This is not the class allocating the memory

4856

Allocator support in C++

Historically C++ used Allocators to abstract over different models of addressingmemoryAs such Allocators in C++ are ldquostatelessrdquo

In C++ an Allocator is merely a handle to a memory resource1

1Arthur OrsquoDwyer - An Allocator is a Handle to the Heap

4856

Allocator support in C++

Historically C++ used Allocators to abstract over different models of addressingmemoryAs such Allocators in C++ are ldquostatelessrdquo

In C++ an Allocator is merely a handle to a memory resource1

1Arthur OrsquoDwyer - An Allocator is a Handle to the Heap

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

5056

C++ Memory Resources2

stdpmrmemory resource - Abstract base class for all resources that can bewrapped in a stdpmrpolymorphic allocator

stdpmrnew delete resource - Global allocator

stdpmrmonotonic buffer resource - Monotonic allocator

stdpmrunsynchronized pool resource synchronized pool resource

- Multipool

stdpmrnull memory resource - Allocation always fails

2Pablo Halpern - Allocators The Good Parts

5156

Chaining

explicit monotonic_buffer_resource(

stdpmr memory_resource upstream )

Each memory resource has an upstream counterpart

If the resource runs out of memory it tries to allocate more memory from upstream

5256

Chaining

Possible uses of Chaining

Fixed-size vs dynamic storage for allocators

Combination of different allocation strategies

Injection points for special purpose allocators for debugging and profiling

5356

Therersquos no universal interface for allocators

Are size and alignment parameters passed to deallocate

Is realloc supported

How are out-of-memory errors reported

Is extended alignment supported

What is the return value for an allocation of size 0

Different memory regions for internal data structures and allocated memory

5456

Donrsquot underestimate the global allocator

Competitive performance in the general case

Security features (ASLR secure erase of freed memory)

Debugging amp Profiling (Valgrind Windows Debug Runtime)

Cache Coloring

Local allocators are no free lunch

5556

Wrapping up

No one-size-fits-all mdash Each allocator has its Achilles heel

Global allocator is a good solution for the general case

But you can do better with special allocators for special use cases in terms ofperformance3 as well as reliability

C++ has good support for local allocators but the terminology is a bit off

Different libraries have different concepts of allocators

No free lunch You need to understand your use case before you can chose theright allocator

3John Lakos - Local (Arena) Allocators

5656

Thanks for your attention

Page 79: Taming Dynamic Memory...Pool Allocator - STL containers vector only if chunk sizes match vector size list and map are a perfect t, as the size of each node is known beforehand (though

3556

Internal or External

External headers have better cache behavior when iterating the list

External headers might have stricter alignment requirements than data

Internal headers have better cache behavior when adjacent data is hot

Internal headers require managed memory to be readable (think GPUs)

Where does the storage for external headers come from Same buffer Differentbuffer How big

rArr No easy answers

3656

The Bottom Line

Even seemingly simple extensions get complicated very quickly

Donrsquot try to increase generality through clever extensionsOnly consider modifications if itrsquos a perfect fit for your use case

3756

But what if I need to reclaim memory

minusrarr Pool Allocator

3856

Pool Allocator

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4156

Pool Allocator - Diffusion

4256

Pool Allocator

Deterministic runtime cost

No external fragmentation

Easy to make thread-safe

But

Cannot serve allocations bigger than chunk size

High waste through internal fragmentation if sizes of objects vary a lot

4356

Pool Allocator - STL containers

vector only if chunk sizes match vector size

list and map are a perfect fit as the size of each node is known beforehand(though this knowledge is implementation-specific)

Similar for deque

unordered map is again complicated

4456

But what if I do need different sizes

minusrarr Multipool Allocator

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4656

Multipool Allocator

Very powerful allocator

Runtime is still deterministic if number of pools is known beforehand

Maximum amount of waste through internal fragmentation can be controlledprecisely

Difficult to set up How many pools do I need What chunk sizes What poolsizes

Solid building block for a general purpose allocator

4756

Allocator support in C++

stdvectorltT AllocatorltTgtgt v

vpush back()

This is not the class allocating the memory

4756

Allocator support in C++

stdvectorltT AllocatorltTgtgt v

vpush back()

This is not the class allocating the memory

4856

Allocator support in C++

Historically C++ used Allocators to abstract over different models of addressingmemoryAs such Allocators in C++ are ldquostatelessrdquo

In C++ an Allocator is merely a handle to a memory resource1

1Arthur OrsquoDwyer - An Allocator is a Handle to the Heap

4856

Allocator support in C++

Historically C++ used Allocators to abstract over different models of addressingmemoryAs such Allocators in C++ are ldquostatelessrdquo

In C++ an Allocator is merely a handle to a memory resource1

1Arthur OrsquoDwyer - An Allocator is a Handle to the Heap

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

5056

C++ Memory Resources2

stdpmrmemory resource - Abstract base class for all resources that can bewrapped in a stdpmrpolymorphic allocator

stdpmrnew delete resource - Global allocator

stdpmrmonotonic buffer resource - Monotonic allocator

stdpmrunsynchronized pool resource synchronized pool resource

- Multipool

stdpmrnull memory resource - Allocation always fails

2Pablo Halpern - Allocators The Good Parts

5156

Chaining

explicit monotonic_buffer_resource(

stdpmr memory_resource upstream )

Each memory resource has an upstream counterpart

If the resource runs out of memory it tries to allocate more memory from upstream

5256

Chaining

Possible uses of Chaining

Fixed-size vs dynamic storage for allocators

Combination of different allocation strategies

Injection points for special purpose allocators for debugging and profiling

5356

Therersquos no universal interface for allocators

Are size and alignment parameters passed to deallocate

Is realloc supported

How are out-of-memory errors reported

Is extended alignment supported

What is the return value for an allocation of size 0

Different memory regions for internal data structures and allocated memory

5456

Donrsquot underestimate the global allocator

Competitive performance in the general case

Security features (ASLR secure erase of freed memory)

Debugging amp Profiling (Valgrind Windows Debug Runtime)

Cache Coloring

Local allocators are no free lunch

5556

Wrapping up

No one-size-fits-all mdash Each allocator has its Achilles heel

Global allocator is a good solution for the general case

But you can do better with special allocators for special use cases in terms ofperformance3 as well as reliability

C++ has good support for local allocators but the terminology is a bit off

Different libraries have different concepts of allocators

No free lunch You need to understand your use case before you can chose theright allocator

3John Lakos - Local (Arena) Allocators

5656

Thanks for your attention

Page 80: Taming Dynamic Memory...Pool Allocator - STL containers vector only if chunk sizes match vector size list and map are a perfect t, as the size of each node is known beforehand (though

3656

The Bottom Line

Even seemingly simple extensions get complicated very quickly

Donrsquot try to increase generality through clever extensionsOnly consider modifications if itrsquos a perfect fit for your use case

3756

But what if I need to reclaim memory

minusrarr Pool Allocator

3856

Pool Allocator

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4156

Pool Allocator - Diffusion

4256

Pool Allocator

Deterministic runtime cost

No external fragmentation

Easy to make thread-safe

But

Cannot serve allocations bigger than chunk size

High waste through internal fragmentation if sizes of objects vary a lot

4356

Pool Allocator - STL containers

vector only if chunk sizes match vector size

list and map are a perfect fit as the size of each node is known beforehand(though this knowledge is implementation-specific)

Similar for deque

unordered map is again complicated

4456

But what if I do need different sizes

minusrarr Multipool Allocator

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4656

Multipool Allocator

Very powerful allocator

Runtime is still deterministic if number of pools is known beforehand

Maximum amount of waste through internal fragmentation can be controlledprecisely

Difficult to set up How many pools do I need What chunk sizes What poolsizes

Solid building block for a general purpose allocator

4756

Allocator support in C++

stdvectorltT AllocatorltTgtgt v

vpush back()

This is not the class allocating the memory

4756

Allocator support in C++

stdvectorltT AllocatorltTgtgt v

vpush back()

This is not the class allocating the memory

4856

Allocator support in C++

Historically C++ used Allocators to abstract over different models of addressingmemoryAs such Allocators in C++ are ldquostatelessrdquo

In C++ an Allocator is merely a handle to a memory resource1

1Arthur OrsquoDwyer - An Allocator is a Handle to the Heap

4856

Allocator support in C++

Historically C++ used Allocators to abstract over different models of addressingmemoryAs such Allocators in C++ are ldquostatelessrdquo

In C++ an Allocator is merely a handle to a memory resource1

1Arthur OrsquoDwyer - An Allocator is a Handle to the Heap

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

5056

C++ Memory Resources2

stdpmrmemory resource - Abstract base class for all resources that can bewrapped in a stdpmrpolymorphic allocator

stdpmrnew delete resource - Global allocator

stdpmrmonotonic buffer resource - Monotonic allocator

stdpmrunsynchronized pool resource synchronized pool resource

- Multipool

stdpmrnull memory resource - Allocation always fails

2Pablo Halpern - Allocators The Good Parts

5156

Chaining

explicit monotonic_buffer_resource(

stdpmr memory_resource upstream )

Each memory resource has an upstream counterpart

If the resource runs out of memory it tries to allocate more memory from upstream

5256

Chaining

Possible uses of Chaining

Fixed-size vs dynamic storage for allocators

Combination of different allocation strategies

Injection points for special purpose allocators for debugging and profiling

5356

Therersquos no universal interface for allocators

Are size and alignment parameters passed to deallocate

Is realloc supported

How are out-of-memory errors reported

Is extended alignment supported

What is the return value for an allocation of size 0

Different memory regions for internal data structures and allocated memory

5456

Donrsquot underestimate the global allocator

Competitive performance in the general case

Security features (ASLR secure erase of freed memory)

Debugging amp Profiling (Valgrind Windows Debug Runtime)

Cache Coloring

Local allocators are no free lunch

5556

Wrapping up

No one-size-fits-all mdash Each allocator has its Achilles heel

Global allocator is a good solution for the general case

But you can do better with special allocators for special use cases in terms ofperformance3 as well as reliability

C++ has good support for local allocators but the terminology is a bit off

Different libraries have different concepts of allocators

No free lunch You need to understand your use case before you can chose theright allocator

3John Lakos - Local (Arena) Allocators

5656

Thanks for your attention

Page 81: Taming Dynamic Memory...Pool Allocator - STL containers vector only if chunk sizes match vector size list and map are a perfect t, as the size of each node is known beforehand (though

3756

But what if I need to reclaim memory

minusrarr Pool Allocator

3856

Pool Allocator

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4156

Pool Allocator - Diffusion

4256

Pool Allocator

Deterministic runtime cost

No external fragmentation

Easy to make thread-safe

But

Cannot serve allocations bigger than chunk size

High waste through internal fragmentation if sizes of objects vary a lot

4356

Pool Allocator - STL containers

vector only if chunk sizes match vector size

list and map are a perfect fit as the size of each node is known beforehand(though this knowledge is implementation-specific)

Similar for deque

unordered map is again complicated

4456

But what if I do need different sizes

minusrarr Multipool Allocator

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4656

Multipool Allocator

Very powerful allocator

Runtime is still deterministic if number of pools is known beforehand

Maximum amount of waste through internal fragmentation can be controlledprecisely

Difficult to set up How many pools do I need What chunk sizes What poolsizes

Solid building block for a general purpose allocator

4756

Allocator support in C++

stdvectorltT AllocatorltTgtgt v

vpush back()

This is not the class allocating the memory

4756

Allocator support in C++

stdvectorltT AllocatorltTgtgt v

vpush back()

This is not the class allocating the memory

4856

Allocator support in C++

Historically C++ used Allocators to abstract over different models of addressingmemoryAs such Allocators in C++ are ldquostatelessrdquo

In C++ an Allocator is merely a handle to a memory resource1

1Arthur OrsquoDwyer - An Allocator is a Handle to the Heap

4856

Allocator support in C++

Historically C++ used Allocators to abstract over different models of addressingmemoryAs such Allocators in C++ are ldquostatelessrdquo

In C++ an Allocator is merely a handle to a memory resource1

1Arthur OrsquoDwyer - An Allocator is a Handle to the Heap

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

5056

C++ Memory Resources2

stdpmrmemory resource - Abstract base class for all resources that can bewrapped in a stdpmrpolymorphic allocator

stdpmrnew delete resource - Global allocator

stdpmrmonotonic buffer resource - Monotonic allocator

stdpmrunsynchronized pool resource synchronized pool resource

- Multipool

stdpmrnull memory resource - Allocation always fails

2Pablo Halpern - Allocators The Good Parts

5156

Chaining

explicit monotonic_buffer_resource(

stdpmr memory_resource upstream )

Each memory resource has an upstream counterpart

If the resource runs out of memory it tries to allocate more memory from upstream

5256

Chaining

Possible uses of Chaining

Fixed-size vs dynamic storage for allocators

Combination of different allocation strategies

Injection points for special purpose allocators for debugging and profiling

5356

Therersquos no universal interface for allocators

Are size and alignment parameters passed to deallocate

Is realloc supported

How are out-of-memory errors reported

Is extended alignment supported

What is the return value for an allocation of size 0

Different memory regions for internal data structures and allocated memory

5456

Donrsquot underestimate the global allocator

Competitive performance in the general case

Security features (ASLR secure erase of freed memory)

Debugging amp Profiling (Valgrind Windows Debug Runtime)

Cache Coloring

Local allocators are no free lunch

5556

Wrapping up

No one-size-fits-all mdash Each allocator has its Achilles heel

Global allocator is a good solution for the general case

But you can do better with special allocators for special use cases in terms ofperformance3 as well as reliability

C++ has good support for local allocators but the terminology is a bit off

Different libraries have different concepts of allocators

No free lunch You need to understand your use case before you can chose theright allocator

3John Lakos - Local (Arena) Allocators

5656

Thanks for your attention

Page 82: Taming Dynamic Memory...Pool Allocator - STL containers vector only if chunk sizes match vector size list and map are a perfect t, as the size of each node is known beforehand (though

3856

Pool Allocator

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4156

Pool Allocator - Diffusion

4256

Pool Allocator

Deterministic runtime cost

No external fragmentation

Easy to make thread-safe

But

Cannot serve allocations bigger than chunk size

High waste through internal fragmentation if sizes of objects vary a lot

4356

Pool Allocator - STL containers

vector only if chunk sizes match vector size

list and map are a perfect fit as the size of each node is known beforehand(though this knowledge is implementation-specific)

Similar for deque

unordered map is again complicated

4456

But what if I do need different sizes

minusrarr Multipool Allocator

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4656

Multipool Allocator

Very powerful allocator

Runtime is still deterministic if number of pools is known beforehand

Maximum amount of waste through internal fragmentation can be controlledprecisely

Difficult to set up How many pools do I need What chunk sizes What poolsizes

Solid building block for a general purpose allocator

4756

Allocator support in C++

stdvectorltT AllocatorltTgtgt v

vpush back()

This is not the class allocating the memory

4756

Allocator support in C++

stdvectorltT AllocatorltTgtgt v

vpush back()

This is not the class allocating the memory

4856

Allocator support in C++

Historically C++ used Allocators to abstract over different models of addressingmemoryAs such Allocators in C++ are ldquostatelessrdquo

In C++ an Allocator is merely a handle to a memory resource1

1Arthur OrsquoDwyer - An Allocator is a Handle to the Heap

4856

Allocator support in C++

Historically C++ used Allocators to abstract over different models of addressingmemoryAs such Allocators in C++ are ldquostatelessrdquo

In C++ an Allocator is merely a handle to a memory resource1

1Arthur OrsquoDwyer - An Allocator is a Handle to the Heap

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

5056

C++ Memory Resources2

stdpmrmemory resource - Abstract base class for all resources that can bewrapped in a stdpmrpolymorphic allocator

stdpmrnew delete resource - Global allocator

stdpmrmonotonic buffer resource - Monotonic allocator

stdpmrunsynchronized pool resource synchronized pool resource

- Multipool

stdpmrnull memory resource - Allocation always fails

2Pablo Halpern - Allocators The Good Parts

5156

Chaining

explicit monotonic_buffer_resource(

stdpmr memory_resource upstream )

Each memory resource has an upstream counterpart

If the resource runs out of memory it tries to allocate more memory from upstream

5256

Chaining

Possible uses of Chaining

Fixed-size vs dynamic storage for allocators

Combination of different allocation strategies

Injection points for special purpose allocators for debugging and profiling

5356

Therersquos no universal interface for allocators

Are size and alignment parameters passed to deallocate

Is realloc supported

How are out-of-memory errors reported

Is extended alignment supported

What is the return value for an allocation of size 0

Different memory regions for internal data structures and allocated memory

5456

Donrsquot underestimate the global allocator

Competitive performance in the general case

Security features (ASLR secure erase of freed memory)

Debugging amp Profiling (Valgrind Windows Debug Runtime)

Cache Coloring

Local allocators are no free lunch

5556

Wrapping up

No one-size-fits-all mdash Each allocator has its Achilles heel

Global allocator is a good solution for the general case

But you can do better with special allocators for special use cases in terms ofperformance3 as well as reliability

C++ has good support for local allocators but the terminology is a bit off

Different libraries have different concepts of allocators

No free lunch You need to understand your use case before you can chose theright allocator

3John Lakos - Local (Arena) Allocators

5656

Thanks for your attention

Page 83: Taming Dynamic Memory...Pool Allocator - STL containers vector only if chunk sizes match vector size list and map are a perfect t, as the size of each node is known beforehand (though

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4156

Pool Allocator - Diffusion

4256

Pool Allocator

Deterministic runtime cost

No external fragmentation

Easy to make thread-safe

But

Cannot serve allocations bigger than chunk size

High waste through internal fragmentation if sizes of objects vary a lot

4356

Pool Allocator - STL containers

vector only if chunk sizes match vector size

list and map are a perfect fit as the size of each node is known beforehand(though this knowledge is implementation-specific)

Similar for deque

unordered map is again complicated

4456

But what if I do need different sizes

minusrarr Multipool Allocator

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4656

Multipool Allocator

Very powerful allocator

Runtime is still deterministic if number of pools is known beforehand

Maximum amount of waste through internal fragmentation can be controlledprecisely

Difficult to set up How many pools do I need What chunk sizes What poolsizes

Solid building block for a general purpose allocator

4756

Allocator support in C++

stdvectorltT AllocatorltTgtgt v

vpush back()

This is not the class allocating the memory

4756

Allocator support in C++

stdvectorltT AllocatorltTgtgt v

vpush back()

This is not the class allocating the memory

4856

Allocator support in C++

Historically C++ used Allocators to abstract over different models of addressingmemoryAs such Allocators in C++ are ldquostatelessrdquo

In C++ an Allocator is merely a handle to a memory resource1

1Arthur OrsquoDwyer - An Allocator is a Handle to the Heap

4856

Allocator support in C++

Historically C++ used Allocators to abstract over different models of addressingmemoryAs such Allocators in C++ are ldquostatelessrdquo

In C++ an Allocator is merely a handle to a memory resource1

1Arthur OrsquoDwyer - An Allocator is a Handle to the Heap

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

5056

C++ Memory Resources2

stdpmrmemory resource - Abstract base class for all resources that can bewrapped in a stdpmrpolymorphic allocator

stdpmrnew delete resource - Global allocator

stdpmrmonotonic buffer resource - Monotonic allocator

stdpmrunsynchronized pool resource synchronized pool resource

- Multipool

stdpmrnull memory resource - Allocation always fails

2Pablo Halpern - Allocators The Good Parts

5156

Chaining

explicit monotonic_buffer_resource(

stdpmr memory_resource upstream )

Each memory resource has an upstream counterpart

If the resource runs out of memory it tries to allocate more memory from upstream

5256

Chaining

Possible uses of Chaining

Fixed-size vs dynamic storage for allocators

Combination of different allocation strategies

Injection points for special purpose allocators for debugging and profiling

5356

Therersquos no universal interface for allocators

Are size and alignment parameters passed to deallocate

Is realloc supported

How are out-of-memory errors reported

Is extended alignment supported

What is the return value for an allocation of size 0

Different memory regions for internal data structures and allocated memory

5456

Donrsquot underestimate the global allocator

Competitive performance in the general case

Security features (ASLR secure erase of freed memory)

Debugging amp Profiling (Valgrind Windows Debug Runtime)

Cache Coloring

Local allocators are no free lunch

5556

Wrapping up

No one-size-fits-all mdash Each allocator has its Achilles heel

Global allocator is a good solution for the general case

But you can do better with special allocators for special use cases in terms ofperformance3 as well as reliability

C++ has good support for local allocators but the terminology is a bit off

Different libraries have different concepts of allocators

No free lunch You need to understand your use case before you can chose theright allocator

3John Lakos - Local (Arena) Allocators

5656

Thanks for your attention

Page 84: Taming Dynamic Memory...Pool Allocator - STL containers vector only if chunk sizes match vector size list and map are a perfect t, as the size of each node is known beforehand (though

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4156

Pool Allocator - Diffusion

4256

Pool Allocator

Deterministic runtime cost

No external fragmentation

Easy to make thread-safe

But

Cannot serve allocations bigger than chunk size

High waste through internal fragmentation if sizes of objects vary a lot

4356

Pool Allocator - STL containers

vector only if chunk sizes match vector size

list and map are a perfect fit as the size of each node is known beforehand(though this knowledge is implementation-specific)

Similar for deque

unordered map is again complicated

4456

But what if I do need different sizes

minusrarr Multipool Allocator

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4656

Multipool Allocator

Very powerful allocator

Runtime is still deterministic if number of pools is known beforehand

Maximum amount of waste through internal fragmentation can be controlledprecisely

Difficult to set up How many pools do I need What chunk sizes What poolsizes

Solid building block for a general purpose allocator

4756

Allocator support in C++

stdvectorltT AllocatorltTgtgt v

vpush back()

This is not the class allocating the memory

4756

Allocator support in C++

stdvectorltT AllocatorltTgtgt v

vpush back()

This is not the class allocating the memory

4856

Allocator support in C++

Historically C++ used Allocators to abstract over different models of addressingmemoryAs such Allocators in C++ are ldquostatelessrdquo

In C++ an Allocator is merely a handle to a memory resource1

1Arthur OrsquoDwyer - An Allocator is a Handle to the Heap

4856

Allocator support in C++

Historically C++ used Allocators to abstract over different models of addressingmemoryAs such Allocators in C++ are ldquostatelessrdquo

In C++ an Allocator is merely a handle to a memory resource1

1Arthur OrsquoDwyer - An Allocator is a Handle to the Heap

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

5056

C++ Memory Resources2

stdpmrmemory resource - Abstract base class for all resources that can bewrapped in a stdpmrpolymorphic allocator

stdpmrnew delete resource - Global allocator

stdpmrmonotonic buffer resource - Monotonic allocator

stdpmrunsynchronized pool resource synchronized pool resource

- Multipool

stdpmrnull memory resource - Allocation always fails

2Pablo Halpern - Allocators The Good Parts

5156

Chaining

explicit monotonic_buffer_resource(

stdpmr memory_resource upstream )

Each memory resource has an upstream counterpart

If the resource runs out of memory it tries to allocate more memory from upstream

5256

Chaining

Possible uses of Chaining

Fixed-size vs dynamic storage for allocators

Combination of different allocation strategies

Injection points for special purpose allocators for debugging and profiling

5356

Therersquos no universal interface for allocators

Are size and alignment parameters passed to deallocate

Is realloc supported

How are out-of-memory errors reported

Is extended alignment supported

What is the return value for an allocation of size 0

Different memory regions for internal data structures and allocated memory

5456

Donrsquot underestimate the global allocator

Competitive performance in the general case

Security features (ASLR secure erase of freed memory)

Debugging amp Profiling (Valgrind Windows Debug Runtime)

Cache Coloring

Local allocators are no free lunch

5556

Wrapping up

No one-size-fits-all mdash Each allocator has its Achilles heel

Global allocator is a good solution for the general case

But you can do better with special allocators for special use cases in terms ofperformance3 as well as reliability

C++ has good support for local allocators but the terminology is a bit off

Different libraries have different concepts of allocators

No free lunch You need to understand your use case before you can chose theright allocator

3John Lakos - Local (Arena) Allocators

5656

Thanks for your attention

Page 85: Taming Dynamic Memory...Pool Allocator - STL containers vector only if chunk sizes match vector size list and map are a perfect t, as the size of each node is known beforehand (though

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4156

Pool Allocator - Diffusion

4256

Pool Allocator

Deterministic runtime cost

No external fragmentation

Easy to make thread-safe

But

Cannot serve allocations bigger than chunk size

High waste through internal fragmentation if sizes of objects vary a lot

4356

Pool Allocator - STL containers

vector only if chunk sizes match vector size

list and map are a perfect fit as the size of each node is known beforehand(though this knowledge is implementation-specific)

Similar for deque

unordered map is again complicated

4456

But what if I do need different sizes

minusrarr Multipool Allocator

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4656

Multipool Allocator

Very powerful allocator

Runtime is still deterministic if number of pools is known beforehand

Maximum amount of waste through internal fragmentation can be controlledprecisely

Difficult to set up How many pools do I need What chunk sizes What poolsizes

Solid building block for a general purpose allocator

4756

Allocator support in C++

stdvectorltT AllocatorltTgtgt v

vpush back()

This is not the class allocating the memory

4756

Allocator support in C++

stdvectorltT AllocatorltTgtgt v

vpush back()

This is not the class allocating the memory

4856

Allocator support in C++

Historically C++ used Allocators to abstract over different models of addressingmemoryAs such Allocators in C++ are ldquostatelessrdquo

In C++ an Allocator is merely a handle to a memory resource1

1Arthur OrsquoDwyer - An Allocator is a Handle to the Heap

4856

Allocator support in C++

Historically C++ used Allocators to abstract over different models of addressingmemoryAs such Allocators in C++ are ldquostatelessrdquo

In C++ an Allocator is merely a handle to a memory resource1

1Arthur OrsquoDwyer - An Allocator is a Handle to the Heap

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

5056

C++ Memory Resources2

stdpmrmemory resource - Abstract base class for all resources that can bewrapped in a stdpmrpolymorphic allocator

stdpmrnew delete resource - Global allocator

stdpmrmonotonic buffer resource - Monotonic allocator

stdpmrunsynchronized pool resource synchronized pool resource

- Multipool

stdpmrnull memory resource - Allocation always fails

2Pablo Halpern - Allocators The Good Parts

5156

Chaining

explicit monotonic_buffer_resource(

stdpmr memory_resource upstream )

Each memory resource has an upstream counterpart

If the resource runs out of memory it tries to allocate more memory from upstream

5256

Chaining

Possible uses of Chaining

Fixed-size vs dynamic storage for allocators

Combination of different allocation strategies

Injection points for special purpose allocators for debugging and profiling

5356

Therersquos no universal interface for allocators

Are size and alignment parameters passed to deallocate

Is realloc supported

How are out-of-memory errors reported

Is extended alignment supported

What is the return value for an allocation of size 0

Different memory regions for internal data structures and allocated memory

5456

Donrsquot underestimate the global allocator

Competitive performance in the general case

Security features (ASLR secure erase of freed memory)

Debugging amp Profiling (Valgrind Windows Debug Runtime)

Cache Coloring

Local allocators are no free lunch

5556

Wrapping up

No one-size-fits-all mdash Each allocator has its Achilles heel

Global allocator is a good solution for the general case

But you can do better with special allocators for special use cases in terms ofperformance3 as well as reliability

C++ has good support for local allocators but the terminology is a bit off

Different libraries have different concepts of allocators

No free lunch You need to understand your use case before you can chose theright allocator

3John Lakos - Local (Arena) Allocators

5656

Thanks for your attention

Page 86: Taming Dynamic Memory...Pool Allocator - STL containers vector only if chunk sizes match vector size list and map are a perfect t, as the size of each node is known beforehand (though

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4156

Pool Allocator - Diffusion

4256

Pool Allocator

Deterministic runtime cost

No external fragmentation

Easy to make thread-safe

But

Cannot serve allocations bigger than chunk size

High waste through internal fragmentation if sizes of objects vary a lot

4356

Pool Allocator - STL containers

vector only if chunk sizes match vector size

list and map are a perfect fit as the size of each node is known beforehand(though this knowledge is implementation-specific)

Similar for deque

unordered map is again complicated

4456

But what if I do need different sizes

minusrarr Multipool Allocator

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4656

Multipool Allocator

Very powerful allocator

Runtime is still deterministic if number of pools is known beforehand

Maximum amount of waste through internal fragmentation can be controlledprecisely

Difficult to set up How many pools do I need What chunk sizes What poolsizes

Solid building block for a general purpose allocator

4756

Allocator support in C++

stdvectorltT AllocatorltTgtgt v

vpush back()

This is not the class allocating the memory

4756

Allocator support in C++

stdvectorltT AllocatorltTgtgt v

vpush back()

This is not the class allocating the memory

4856

Allocator support in C++

Historically C++ used Allocators to abstract over different models of addressingmemoryAs such Allocators in C++ are ldquostatelessrdquo

In C++ an Allocator is merely a handle to a memory resource1

1Arthur OrsquoDwyer - An Allocator is a Handle to the Heap

4856

Allocator support in C++

Historically C++ used Allocators to abstract over different models of addressingmemoryAs such Allocators in C++ are ldquostatelessrdquo

In C++ an Allocator is merely a handle to a memory resource1

1Arthur OrsquoDwyer - An Allocator is a Handle to the Heap

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

5056

C++ Memory Resources2

stdpmrmemory resource - Abstract base class for all resources that can bewrapped in a stdpmrpolymorphic allocator

stdpmrnew delete resource - Global allocator

stdpmrmonotonic buffer resource - Monotonic allocator

stdpmrunsynchronized pool resource synchronized pool resource

- Multipool

stdpmrnull memory resource - Allocation always fails

2Pablo Halpern - Allocators The Good Parts

5156

Chaining

explicit monotonic_buffer_resource(

stdpmr memory_resource upstream )

Each memory resource has an upstream counterpart

If the resource runs out of memory it tries to allocate more memory from upstream

5256

Chaining

Possible uses of Chaining

Fixed-size vs dynamic storage for allocators

Combination of different allocation strategies

Injection points for special purpose allocators for debugging and profiling

5356

Therersquos no universal interface for allocators

Are size and alignment parameters passed to deallocate

Is realloc supported

How are out-of-memory errors reported

Is extended alignment supported

What is the return value for an allocation of size 0

Different memory regions for internal data structures and allocated memory

5456

Donrsquot underestimate the global allocator

Competitive performance in the general case

Security features (ASLR secure erase of freed memory)

Debugging amp Profiling (Valgrind Windows Debug Runtime)

Cache Coloring

Local allocators are no free lunch

5556

Wrapping up

No one-size-fits-all mdash Each allocator has its Achilles heel

Global allocator is a good solution for the general case

But you can do better with special allocators for special use cases in terms ofperformance3 as well as reliability

C++ has good support for local allocators but the terminology is a bit off

Different libraries have different concepts of allocators

No free lunch You need to understand your use case before you can chose theright allocator

3John Lakos - Local (Arena) Allocators

5656

Thanks for your attention

Page 87: Taming Dynamic Memory...Pool Allocator - STL containers vector only if chunk sizes match vector size list and map are a perfect t, as the size of each node is known beforehand (though

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4156

Pool Allocator - Diffusion

4256

Pool Allocator

Deterministic runtime cost

No external fragmentation

Easy to make thread-safe

But

Cannot serve allocations bigger than chunk size

High waste through internal fragmentation if sizes of objects vary a lot

4356

Pool Allocator - STL containers

vector only if chunk sizes match vector size

list and map are a perfect fit as the size of each node is known beforehand(though this knowledge is implementation-specific)

Similar for deque

unordered map is again complicated

4456

But what if I do need different sizes

minusrarr Multipool Allocator

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4656

Multipool Allocator

Very powerful allocator

Runtime is still deterministic if number of pools is known beforehand

Maximum amount of waste through internal fragmentation can be controlledprecisely

Difficult to set up How many pools do I need What chunk sizes What poolsizes

Solid building block for a general purpose allocator

4756

Allocator support in C++

stdvectorltT AllocatorltTgtgt v

vpush back()

This is not the class allocating the memory

4756

Allocator support in C++

stdvectorltT AllocatorltTgtgt v

vpush back()

This is not the class allocating the memory

4856

Allocator support in C++

Historically C++ used Allocators to abstract over different models of addressingmemoryAs such Allocators in C++ are ldquostatelessrdquo

In C++ an Allocator is merely a handle to a memory resource1

1Arthur OrsquoDwyer - An Allocator is a Handle to the Heap

4856

Allocator support in C++

Historically C++ used Allocators to abstract over different models of addressingmemoryAs such Allocators in C++ are ldquostatelessrdquo

In C++ an Allocator is merely a handle to a memory resource1

1Arthur OrsquoDwyer - An Allocator is a Handle to the Heap

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

5056

C++ Memory Resources2

stdpmrmemory resource - Abstract base class for all resources that can bewrapped in a stdpmrpolymorphic allocator

stdpmrnew delete resource - Global allocator

stdpmrmonotonic buffer resource - Monotonic allocator

stdpmrunsynchronized pool resource synchronized pool resource

- Multipool

stdpmrnull memory resource - Allocation always fails

2Pablo Halpern - Allocators The Good Parts

5156

Chaining

explicit monotonic_buffer_resource(

stdpmr memory_resource upstream )

Each memory resource has an upstream counterpart

If the resource runs out of memory it tries to allocate more memory from upstream

5256

Chaining

Possible uses of Chaining

Fixed-size vs dynamic storage for allocators

Combination of different allocation strategies

Injection points for special purpose allocators for debugging and profiling

5356

Therersquos no universal interface for allocators

Are size and alignment parameters passed to deallocate

Is realloc supported

How are out-of-memory errors reported

Is extended alignment supported

What is the return value for an allocation of size 0

Different memory regions for internal data structures and allocated memory

5456

Donrsquot underestimate the global allocator

Competitive performance in the general case

Security features (ASLR secure erase of freed memory)

Debugging amp Profiling (Valgrind Windows Debug Runtime)

Cache Coloring

Local allocators are no free lunch

5556

Wrapping up

No one-size-fits-all mdash Each allocator has its Achilles heel

Global allocator is a good solution for the general case

But you can do better with special allocators for special use cases in terms ofperformance3 as well as reliability

C++ has good support for local allocators but the terminology is a bit off

Different libraries have different concepts of allocators

No free lunch You need to understand your use case before you can chose theright allocator

3John Lakos - Local (Arena) Allocators

5656

Thanks for your attention

Page 88: Taming Dynamic Memory...Pool Allocator - STL containers vector only if chunk sizes match vector size list and map are a perfect t, as the size of each node is known beforehand (though

3956

Pool Allocator

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

auto p3 = poolallocate(3)

pooldeallocate(p1)

auto p4 = poolallocate(1)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4156

Pool Allocator - Diffusion

4256

Pool Allocator

Deterministic runtime cost

No external fragmentation

Easy to make thread-safe

But

Cannot serve allocations bigger than chunk size

High waste through internal fragmentation if sizes of objects vary a lot

4356

Pool Allocator - STL containers

vector only if chunk sizes match vector size

list and map are a perfect fit as the size of each node is known beforehand(though this knowledge is implementation-specific)

Similar for deque

unordered map is again complicated

4456

But what if I do need different sizes

minusrarr Multipool Allocator

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4656

Multipool Allocator

Very powerful allocator

Runtime is still deterministic if number of pools is known beforehand

Maximum amount of waste through internal fragmentation can be controlledprecisely

Difficult to set up How many pools do I need What chunk sizes What poolsizes

Solid building block for a general purpose allocator

4756

Allocator support in C++

stdvectorltT AllocatorltTgtgt v

vpush back()

This is not the class allocating the memory

4756

Allocator support in C++

stdvectorltT AllocatorltTgtgt v

vpush back()

This is not the class allocating the memory

4856

Allocator support in C++

Historically C++ used Allocators to abstract over different models of addressingmemoryAs such Allocators in C++ are ldquostatelessrdquo

In C++ an Allocator is merely a handle to a memory resource1

1Arthur OrsquoDwyer - An Allocator is a Handle to the Heap

4856

Allocator support in C++

Historically C++ used Allocators to abstract over different models of addressingmemoryAs such Allocators in C++ are ldquostatelessrdquo

In C++ an Allocator is merely a handle to a memory resource1

1Arthur OrsquoDwyer - An Allocator is a Handle to the Heap

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

5056

C++ Memory Resources2

stdpmrmemory resource - Abstract base class for all resources that can bewrapped in a stdpmrpolymorphic allocator

stdpmrnew delete resource - Global allocator

stdpmrmonotonic buffer resource - Monotonic allocator

stdpmrunsynchronized pool resource synchronized pool resource

- Multipool

stdpmrnull memory resource - Allocation always fails

2Pablo Halpern - Allocators The Good Parts

5156

Chaining

explicit monotonic_buffer_resource(

stdpmr memory_resource upstream )

Each memory resource has an upstream counterpart

If the resource runs out of memory it tries to allocate more memory from upstream

5256

Chaining

Possible uses of Chaining

Fixed-size vs dynamic storage for allocators

Combination of different allocation strategies

Injection points for special purpose allocators for debugging and profiling

5356

Therersquos no universal interface for allocators

Are size and alignment parameters passed to deallocate

Is realloc supported

How are out-of-memory errors reported

Is extended alignment supported

What is the return value for an allocation of size 0

Different memory regions for internal data structures and allocated memory

5456

Donrsquot underestimate the global allocator

Competitive performance in the general case

Security features (ASLR secure erase of freed memory)

Debugging amp Profiling (Valgrind Windows Debug Runtime)

Cache Coloring

Local allocators are no free lunch

5556

Wrapping up

No one-size-fits-all mdash Each allocator has its Achilles heel

Global allocator is a good solution for the general case

But you can do better with special allocators for special use cases in terms ofperformance3 as well as reliability

C++ has good support for local allocators but the terminology is a bit off

Different libraries have different concepts of allocators

No free lunch You need to understand your use case before you can chose theright allocator

3John Lakos - Local (Arena) Allocators

5656

Thanks for your attention

Page 89: Taming Dynamic Memory...Pool Allocator - STL containers vector only if chunk sizes match vector size list and map are a perfect t, as the size of each node is known beforehand (though

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4156

Pool Allocator - Diffusion

4256

Pool Allocator

Deterministic runtime cost

No external fragmentation

Easy to make thread-safe

But

Cannot serve allocations bigger than chunk size

High waste through internal fragmentation if sizes of objects vary a lot

4356

Pool Allocator - STL containers

vector only if chunk sizes match vector size

list and map are a perfect fit as the size of each node is known beforehand(though this knowledge is implementation-specific)

Similar for deque

unordered map is again complicated

4456

But what if I do need different sizes

minusrarr Multipool Allocator

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4656

Multipool Allocator

Very powerful allocator

Runtime is still deterministic if number of pools is known beforehand

Maximum amount of waste through internal fragmentation can be controlledprecisely

Difficult to set up How many pools do I need What chunk sizes What poolsizes

Solid building block for a general purpose allocator

4756

Allocator support in C++

stdvectorltT AllocatorltTgtgt v

vpush back()

This is not the class allocating the memory

4756

Allocator support in C++

stdvectorltT AllocatorltTgtgt v

vpush back()

This is not the class allocating the memory

4856

Allocator support in C++

Historically C++ used Allocators to abstract over different models of addressingmemoryAs such Allocators in C++ are ldquostatelessrdquo

In C++ an Allocator is merely a handle to a memory resource1

1Arthur OrsquoDwyer - An Allocator is a Handle to the Heap

4856

Allocator support in C++

Historically C++ used Allocators to abstract over different models of addressingmemoryAs such Allocators in C++ are ldquostatelessrdquo

In C++ an Allocator is merely a handle to a memory resource1

1Arthur OrsquoDwyer - An Allocator is a Handle to the Heap

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

5056

C++ Memory Resources2

stdpmrmemory resource - Abstract base class for all resources that can bewrapped in a stdpmrpolymorphic allocator

stdpmrnew delete resource - Global allocator

stdpmrmonotonic buffer resource - Monotonic allocator

stdpmrunsynchronized pool resource synchronized pool resource

- Multipool

stdpmrnull memory resource - Allocation always fails

2Pablo Halpern - Allocators The Good Parts

5156

Chaining

explicit monotonic_buffer_resource(

stdpmr memory_resource upstream )

Each memory resource has an upstream counterpart

If the resource runs out of memory it tries to allocate more memory from upstream

5256

Chaining

Possible uses of Chaining

Fixed-size vs dynamic storage for allocators

Combination of different allocation strategies

Injection points for special purpose allocators for debugging and profiling

5356

Therersquos no universal interface for allocators

Are size and alignment parameters passed to deallocate

Is realloc supported

How are out-of-memory errors reported

Is extended alignment supported

What is the return value for an allocation of size 0

Different memory regions for internal data structures and allocated memory

5456

Donrsquot underestimate the global allocator

Competitive performance in the general case

Security features (ASLR secure erase of freed memory)

Debugging amp Profiling (Valgrind Windows Debug Runtime)

Cache Coloring

Local allocators are no free lunch

5556

Wrapping up

No one-size-fits-all mdash Each allocator has its Achilles heel

Global allocator is a good solution for the general case

But you can do better with special allocators for special use cases in terms ofperformance3 as well as reliability

C++ has good support for local allocators but the terminology is a bit off

Different libraries have different concepts of allocators

No free lunch You need to understand your use case before you can chose theright allocator

3John Lakos - Local (Arena) Allocators

5656

Thanks for your attention

Page 90: Taming Dynamic Memory...Pool Allocator - STL containers vector only if chunk sizes match vector size list and map are a perfect t, as the size of each node is known beforehand (though

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4156

Pool Allocator - Diffusion

4256

Pool Allocator

Deterministic runtime cost

No external fragmentation

Easy to make thread-safe

But

Cannot serve allocations bigger than chunk size

High waste through internal fragmentation if sizes of objects vary a lot

4356

Pool Allocator - STL containers

vector only if chunk sizes match vector size

list and map are a perfect fit as the size of each node is known beforehand(though this knowledge is implementation-specific)

Similar for deque

unordered map is again complicated

4456

But what if I do need different sizes

minusrarr Multipool Allocator

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4656

Multipool Allocator

Very powerful allocator

Runtime is still deterministic if number of pools is known beforehand

Maximum amount of waste through internal fragmentation can be controlledprecisely

Difficult to set up How many pools do I need What chunk sizes What poolsizes

Solid building block for a general purpose allocator

4756

Allocator support in C++

stdvectorltT AllocatorltTgtgt v

vpush back()

This is not the class allocating the memory

4756

Allocator support in C++

stdvectorltT AllocatorltTgtgt v

vpush back()

This is not the class allocating the memory

4856

Allocator support in C++

Historically C++ used Allocators to abstract over different models of addressingmemoryAs such Allocators in C++ are ldquostatelessrdquo

In C++ an Allocator is merely a handle to a memory resource1

1Arthur OrsquoDwyer - An Allocator is a Handle to the Heap

4856

Allocator support in C++

Historically C++ used Allocators to abstract over different models of addressingmemoryAs such Allocators in C++ are ldquostatelessrdquo

In C++ an Allocator is merely a handle to a memory resource1

1Arthur OrsquoDwyer - An Allocator is a Handle to the Heap

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

5056

C++ Memory Resources2

stdpmrmemory resource - Abstract base class for all resources that can bewrapped in a stdpmrpolymorphic allocator

stdpmrnew delete resource - Global allocator

stdpmrmonotonic buffer resource - Monotonic allocator

stdpmrunsynchronized pool resource synchronized pool resource

- Multipool

stdpmrnull memory resource - Allocation always fails

2Pablo Halpern - Allocators The Good Parts

5156

Chaining

explicit monotonic_buffer_resource(

stdpmr memory_resource upstream )

Each memory resource has an upstream counterpart

If the resource runs out of memory it tries to allocate more memory from upstream

5256

Chaining

Possible uses of Chaining

Fixed-size vs dynamic storage for allocators

Combination of different allocation strategies

Injection points for special purpose allocators for debugging and profiling

5356

Therersquos no universal interface for allocators

Are size and alignment parameters passed to deallocate

Is realloc supported

How are out-of-memory errors reported

Is extended alignment supported

What is the return value for an allocation of size 0

Different memory regions for internal data structures and allocated memory

5456

Donrsquot underestimate the global allocator

Competitive performance in the general case

Security features (ASLR secure erase of freed memory)

Debugging amp Profiling (Valgrind Windows Debug Runtime)

Cache Coloring

Local allocators are no free lunch

5556

Wrapping up

No one-size-fits-all mdash Each allocator has its Achilles heel

Global allocator is a good solution for the general case

But you can do better with special allocators for special use cases in terms ofperformance3 as well as reliability

C++ has good support for local allocators but the terminology is a bit off

Different libraries have different concepts of allocators

No free lunch You need to understand your use case before you can chose theright allocator

3John Lakos - Local (Arena) Allocators

5656

Thanks for your attention

Page 91: Taming Dynamic Memory...Pool Allocator - STL containers vector only if chunk sizes match vector size list and map are a perfect t, as the size of each node is known beforehand (though

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4156

Pool Allocator - Diffusion

4256

Pool Allocator

Deterministic runtime cost

No external fragmentation

Easy to make thread-safe

But

Cannot serve allocations bigger than chunk size

High waste through internal fragmentation if sizes of objects vary a lot

4356

Pool Allocator - STL containers

vector only if chunk sizes match vector size

list and map are a perfect fit as the size of each node is known beforehand(though this knowledge is implementation-specific)

Similar for deque

unordered map is again complicated

4456

But what if I do need different sizes

minusrarr Multipool Allocator

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4656

Multipool Allocator

Very powerful allocator

Runtime is still deterministic if number of pools is known beforehand

Maximum amount of waste through internal fragmentation can be controlledprecisely

Difficult to set up How many pools do I need What chunk sizes What poolsizes

Solid building block for a general purpose allocator

4756

Allocator support in C++

stdvectorltT AllocatorltTgtgt v

vpush back()

This is not the class allocating the memory

4756

Allocator support in C++

stdvectorltT AllocatorltTgtgt v

vpush back()

This is not the class allocating the memory

4856

Allocator support in C++

Historically C++ used Allocators to abstract over different models of addressingmemoryAs such Allocators in C++ are ldquostatelessrdquo

In C++ an Allocator is merely a handle to a memory resource1

1Arthur OrsquoDwyer - An Allocator is a Handle to the Heap

4856

Allocator support in C++

Historically C++ used Allocators to abstract over different models of addressingmemoryAs such Allocators in C++ are ldquostatelessrdquo

In C++ an Allocator is merely a handle to a memory resource1

1Arthur OrsquoDwyer - An Allocator is a Handle to the Heap

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

5056

C++ Memory Resources2

stdpmrmemory resource - Abstract base class for all resources that can bewrapped in a stdpmrpolymorphic allocator

stdpmrnew delete resource - Global allocator

stdpmrmonotonic buffer resource - Monotonic allocator

stdpmrunsynchronized pool resource synchronized pool resource

- Multipool

stdpmrnull memory resource - Allocation always fails

2Pablo Halpern - Allocators The Good Parts

5156

Chaining

explicit monotonic_buffer_resource(

stdpmr memory_resource upstream )

Each memory resource has an upstream counterpart

If the resource runs out of memory it tries to allocate more memory from upstream

5256

Chaining

Possible uses of Chaining

Fixed-size vs dynamic storage for allocators

Combination of different allocation strategies

Injection points for special purpose allocators for debugging and profiling

5356

Therersquos no universal interface for allocators

Are size and alignment parameters passed to deallocate

Is realloc supported

How are out-of-memory errors reported

Is extended alignment supported

What is the return value for an allocation of size 0

Different memory regions for internal data structures and allocated memory

5456

Donrsquot underestimate the global allocator

Competitive performance in the general case

Security features (ASLR secure erase of freed memory)

Debugging amp Profiling (Valgrind Windows Debug Runtime)

Cache Coloring

Local allocators are no free lunch

5556

Wrapping up

No one-size-fits-all mdash Each allocator has its Achilles heel

Global allocator is a good solution for the general case

But you can do better with special allocators for special use cases in terms ofperformance3 as well as reliability

C++ has good support for local allocators but the terminology is a bit off

Different libraries have different concepts of allocators

No free lunch You need to understand your use case before you can chose theright allocator

3John Lakos - Local (Arena) Allocators

5656

Thanks for your attention

Page 92: Taming Dynamic Memory...Pool Allocator - STL containers vector only if chunk sizes match vector size list and map are a perfect t, as the size of each node is known beforehand (though

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4156

Pool Allocator - Diffusion

4256

Pool Allocator

Deterministic runtime cost

No external fragmentation

Easy to make thread-safe

But

Cannot serve allocations bigger than chunk size

High waste through internal fragmentation if sizes of objects vary a lot

4356

Pool Allocator - STL containers

vector only if chunk sizes match vector size

list and map are a perfect fit as the size of each node is known beforehand(though this knowledge is implementation-specific)

Similar for deque

unordered map is again complicated

4456

But what if I do need different sizes

minusrarr Multipool Allocator

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4656

Multipool Allocator

Very powerful allocator

Runtime is still deterministic if number of pools is known beforehand

Maximum amount of waste through internal fragmentation can be controlledprecisely

Difficult to set up How many pools do I need What chunk sizes What poolsizes

Solid building block for a general purpose allocator

4756

Allocator support in C++

stdvectorltT AllocatorltTgtgt v

vpush back()

This is not the class allocating the memory

4756

Allocator support in C++

stdvectorltT AllocatorltTgtgt v

vpush back()

This is not the class allocating the memory

4856

Allocator support in C++

Historically C++ used Allocators to abstract over different models of addressingmemoryAs such Allocators in C++ are ldquostatelessrdquo

In C++ an Allocator is merely a handle to a memory resource1

1Arthur OrsquoDwyer - An Allocator is a Handle to the Heap

4856

Allocator support in C++

Historically C++ used Allocators to abstract over different models of addressingmemoryAs such Allocators in C++ are ldquostatelessrdquo

In C++ an Allocator is merely a handle to a memory resource1

1Arthur OrsquoDwyer - An Allocator is a Handle to the Heap

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

5056

C++ Memory Resources2

stdpmrmemory resource - Abstract base class for all resources that can bewrapped in a stdpmrpolymorphic allocator

stdpmrnew delete resource - Global allocator

stdpmrmonotonic buffer resource - Monotonic allocator

stdpmrunsynchronized pool resource synchronized pool resource

- Multipool

stdpmrnull memory resource - Allocation always fails

2Pablo Halpern - Allocators The Good Parts

5156

Chaining

explicit monotonic_buffer_resource(

stdpmr memory_resource upstream )

Each memory resource has an upstream counterpart

If the resource runs out of memory it tries to allocate more memory from upstream

5256

Chaining

Possible uses of Chaining

Fixed-size vs dynamic storage for allocators

Combination of different allocation strategies

Injection points for special purpose allocators for debugging and profiling

5356

Therersquos no universal interface for allocators

Are size and alignment parameters passed to deallocate

Is realloc supported

How are out-of-memory errors reported

Is extended alignment supported

What is the return value for an allocation of size 0

Different memory regions for internal data structures and allocated memory

5456

Donrsquot underestimate the global allocator

Competitive performance in the general case

Security features (ASLR secure erase of freed memory)

Debugging amp Profiling (Valgrind Windows Debug Runtime)

Cache Coloring

Local allocators are no free lunch

5556

Wrapping up

No one-size-fits-all mdash Each allocator has its Achilles heel

Global allocator is a good solution for the general case

But you can do better with special allocators for special use cases in terms ofperformance3 as well as reliability

C++ has good support for local allocators but the terminology is a bit off

Different libraries have different concepts of allocators

No free lunch You need to understand your use case before you can chose theright allocator

3John Lakos - Local (Arena) Allocators

5656

Thanks for your attention

Page 93: Taming Dynamic Memory...Pool Allocator - STL containers vector only if chunk sizes match vector size list and map are a perfect t, as the size of each node is known beforehand (though

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4156

Pool Allocator - Diffusion

4256

Pool Allocator

Deterministic runtime cost

No external fragmentation

Easy to make thread-safe

But

Cannot serve allocations bigger than chunk size

High waste through internal fragmentation if sizes of objects vary a lot

4356

Pool Allocator - STL containers

vector only if chunk sizes match vector size

list and map are a perfect fit as the size of each node is known beforehand(though this knowledge is implementation-specific)

Similar for deque

unordered map is again complicated

4456

But what if I do need different sizes

minusrarr Multipool Allocator

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4656

Multipool Allocator

Very powerful allocator

Runtime is still deterministic if number of pools is known beforehand

Maximum amount of waste through internal fragmentation can be controlledprecisely

Difficult to set up How many pools do I need What chunk sizes What poolsizes

Solid building block for a general purpose allocator

4756

Allocator support in C++

stdvectorltT AllocatorltTgtgt v

vpush back()

This is not the class allocating the memory

4756

Allocator support in C++

stdvectorltT AllocatorltTgtgt v

vpush back()

This is not the class allocating the memory

4856

Allocator support in C++

Historically C++ used Allocators to abstract over different models of addressingmemoryAs such Allocators in C++ are ldquostatelessrdquo

In C++ an Allocator is merely a handle to a memory resource1

1Arthur OrsquoDwyer - An Allocator is a Handle to the Heap

4856

Allocator support in C++

Historically C++ used Allocators to abstract over different models of addressingmemoryAs such Allocators in C++ are ldquostatelessrdquo

In C++ an Allocator is merely a handle to a memory resource1

1Arthur OrsquoDwyer - An Allocator is a Handle to the Heap

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

5056

C++ Memory Resources2

stdpmrmemory resource - Abstract base class for all resources that can bewrapped in a stdpmrpolymorphic allocator

stdpmrnew delete resource - Global allocator

stdpmrmonotonic buffer resource - Monotonic allocator

stdpmrunsynchronized pool resource synchronized pool resource

- Multipool

stdpmrnull memory resource - Allocation always fails

2Pablo Halpern - Allocators The Good Parts

5156

Chaining

explicit monotonic_buffer_resource(

stdpmr memory_resource upstream )

Each memory resource has an upstream counterpart

If the resource runs out of memory it tries to allocate more memory from upstream

5256

Chaining

Possible uses of Chaining

Fixed-size vs dynamic storage for allocators

Combination of different allocation strategies

Injection points for special purpose allocators for debugging and profiling

5356

Therersquos no universal interface for allocators

Are size and alignment parameters passed to deallocate

Is realloc supported

How are out-of-memory errors reported

Is extended alignment supported

What is the return value for an allocation of size 0

Different memory regions for internal data structures and allocated memory

5456

Donrsquot underestimate the global allocator

Competitive performance in the general case

Security features (ASLR secure erase of freed memory)

Debugging amp Profiling (Valgrind Windows Debug Runtime)

Cache Coloring

Local allocators are no free lunch

5556

Wrapping up

No one-size-fits-all mdash Each allocator has its Achilles heel

Global allocator is a good solution for the general case

But you can do better with special allocators for special use cases in terms ofperformance3 as well as reliability

C++ has good support for local allocators but the terminology is a bit off

Different libraries have different concepts of allocators

No free lunch You need to understand your use case before you can chose theright allocator

3John Lakos - Local (Arena) Allocators

5656

Thanks for your attention

Page 94: Taming Dynamic Memory...Pool Allocator - STL containers vector only if chunk sizes match vector size list and map are a perfect t, as the size of each node is known beforehand (though

4056

Pool Allocator - Reclamation

auto p1 = poolallocate(2)

auto p2 = poolallocate(4)

pooldeallocate(p1)

pooldeallocate(p2)

4156

Pool Allocator - Diffusion

4256

Pool Allocator

Deterministic runtime cost

No external fragmentation

Easy to make thread-safe

But

Cannot serve allocations bigger than chunk size

High waste through internal fragmentation if sizes of objects vary a lot

4356

Pool Allocator - STL containers

vector only if chunk sizes match vector size

list and map are a perfect fit as the size of each node is known beforehand(though this knowledge is implementation-specific)

Similar for deque

unordered map is again complicated

4456

But what if I do need different sizes

minusrarr Multipool Allocator

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4656

Multipool Allocator

Very powerful allocator

Runtime is still deterministic if number of pools is known beforehand

Maximum amount of waste through internal fragmentation can be controlledprecisely

Difficult to set up How many pools do I need What chunk sizes What poolsizes

Solid building block for a general purpose allocator

4756

Allocator support in C++

stdvectorltT AllocatorltTgtgt v

vpush back()

This is not the class allocating the memory

4756

Allocator support in C++

stdvectorltT AllocatorltTgtgt v

vpush back()

This is not the class allocating the memory

4856

Allocator support in C++

Historically C++ used Allocators to abstract over different models of addressingmemoryAs such Allocators in C++ are ldquostatelessrdquo

In C++ an Allocator is merely a handle to a memory resource1

1Arthur OrsquoDwyer - An Allocator is a Handle to the Heap

4856

Allocator support in C++

Historically C++ used Allocators to abstract over different models of addressingmemoryAs such Allocators in C++ are ldquostatelessrdquo

In C++ an Allocator is merely a handle to a memory resource1

1Arthur OrsquoDwyer - An Allocator is a Handle to the Heap

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

5056

C++ Memory Resources2

stdpmrmemory resource - Abstract base class for all resources that can bewrapped in a stdpmrpolymorphic allocator

stdpmrnew delete resource - Global allocator

stdpmrmonotonic buffer resource - Monotonic allocator

stdpmrunsynchronized pool resource synchronized pool resource

- Multipool

stdpmrnull memory resource - Allocation always fails

2Pablo Halpern - Allocators The Good Parts

5156

Chaining

explicit monotonic_buffer_resource(

stdpmr memory_resource upstream )

Each memory resource has an upstream counterpart

If the resource runs out of memory it tries to allocate more memory from upstream

5256

Chaining

Possible uses of Chaining

Fixed-size vs dynamic storage for allocators

Combination of different allocation strategies

Injection points for special purpose allocators for debugging and profiling

5356

Therersquos no universal interface for allocators

Are size and alignment parameters passed to deallocate

Is realloc supported

How are out-of-memory errors reported

Is extended alignment supported

What is the return value for an allocation of size 0

Different memory regions for internal data structures and allocated memory

5456

Donrsquot underestimate the global allocator

Competitive performance in the general case

Security features (ASLR secure erase of freed memory)

Debugging amp Profiling (Valgrind Windows Debug Runtime)

Cache Coloring

Local allocators are no free lunch

5556

Wrapping up

No one-size-fits-all mdash Each allocator has its Achilles heel

Global allocator is a good solution for the general case

But you can do better with special allocators for special use cases in terms ofperformance3 as well as reliability

C++ has good support for local allocators but the terminology is a bit off

Different libraries have different concepts of allocators

No free lunch You need to understand your use case before you can chose theright allocator

3John Lakos - Local (Arena) Allocators

5656

Thanks for your attention

Page 95: Taming Dynamic Memory...Pool Allocator - STL containers vector only if chunk sizes match vector size list and map are a perfect t, as the size of each node is known beforehand (though

4156

Pool Allocator - Diffusion

4256

Pool Allocator

Deterministic runtime cost

No external fragmentation

Easy to make thread-safe

But

Cannot serve allocations bigger than chunk size

High waste through internal fragmentation if sizes of objects vary a lot

4356

Pool Allocator - STL containers

vector only if chunk sizes match vector size

list and map are a perfect fit as the size of each node is known beforehand(though this knowledge is implementation-specific)

Similar for deque

unordered map is again complicated

4456

But what if I do need different sizes

minusrarr Multipool Allocator

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4656

Multipool Allocator

Very powerful allocator

Runtime is still deterministic if number of pools is known beforehand

Maximum amount of waste through internal fragmentation can be controlledprecisely

Difficult to set up How many pools do I need What chunk sizes What poolsizes

Solid building block for a general purpose allocator

4756

Allocator support in C++

stdvectorltT AllocatorltTgtgt v

vpush back()

This is not the class allocating the memory

4756

Allocator support in C++

stdvectorltT AllocatorltTgtgt v

vpush back()

This is not the class allocating the memory

4856

Allocator support in C++

Historically C++ used Allocators to abstract over different models of addressingmemoryAs such Allocators in C++ are ldquostatelessrdquo

In C++ an Allocator is merely a handle to a memory resource1

1Arthur OrsquoDwyer - An Allocator is a Handle to the Heap

4856

Allocator support in C++

Historically C++ used Allocators to abstract over different models of addressingmemoryAs such Allocators in C++ are ldquostatelessrdquo

In C++ an Allocator is merely a handle to a memory resource1

1Arthur OrsquoDwyer - An Allocator is a Handle to the Heap

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

5056

C++ Memory Resources2

stdpmrmemory resource - Abstract base class for all resources that can bewrapped in a stdpmrpolymorphic allocator

stdpmrnew delete resource - Global allocator

stdpmrmonotonic buffer resource - Monotonic allocator

stdpmrunsynchronized pool resource synchronized pool resource

- Multipool

stdpmrnull memory resource - Allocation always fails

2Pablo Halpern - Allocators The Good Parts

5156

Chaining

explicit monotonic_buffer_resource(

stdpmr memory_resource upstream )

Each memory resource has an upstream counterpart

If the resource runs out of memory it tries to allocate more memory from upstream

5256

Chaining

Possible uses of Chaining

Fixed-size vs dynamic storage for allocators

Combination of different allocation strategies

Injection points for special purpose allocators for debugging and profiling

5356

Therersquos no universal interface for allocators

Are size and alignment parameters passed to deallocate

Is realloc supported

How are out-of-memory errors reported

Is extended alignment supported

What is the return value for an allocation of size 0

Different memory regions for internal data structures and allocated memory

5456

Donrsquot underestimate the global allocator

Competitive performance in the general case

Security features (ASLR secure erase of freed memory)

Debugging amp Profiling (Valgrind Windows Debug Runtime)

Cache Coloring

Local allocators are no free lunch

5556

Wrapping up

No one-size-fits-all mdash Each allocator has its Achilles heel

Global allocator is a good solution for the general case

But you can do better with special allocators for special use cases in terms ofperformance3 as well as reliability

C++ has good support for local allocators but the terminology is a bit off

Different libraries have different concepts of allocators

No free lunch You need to understand your use case before you can chose theright allocator

3John Lakos - Local (Arena) Allocators

5656

Thanks for your attention

Page 96: Taming Dynamic Memory...Pool Allocator - STL containers vector only if chunk sizes match vector size list and map are a perfect t, as the size of each node is known beforehand (though

4256

Pool Allocator

Deterministic runtime cost

No external fragmentation

Easy to make thread-safe

But

Cannot serve allocations bigger than chunk size

High waste through internal fragmentation if sizes of objects vary a lot

4356

Pool Allocator - STL containers

vector only if chunk sizes match vector size

list and map are a perfect fit as the size of each node is known beforehand(though this knowledge is implementation-specific)

Similar for deque

unordered map is again complicated

4456

But what if I do need different sizes

minusrarr Multipool Allocator

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4656

Multipool Allocator

Very powerful allocator

Runtime is still deterministic if number of pools is known beforehand

Maximum amount of waste through internal fragmentation can be controlledprecisely

Difficult to set up How many pools do I need What chunk sizes What poolsizes

Solid building block for a general purpose allocator

4756

Allocator support in C++

stdvectorltT AllocatorltTgtgt v

vpush back()

This is not the class allocating the memory

4756

Allocator support in C++

stdvectorltT AllocatorltTgtgt v

vpush back()

This is not the class allocating the memory

4856

Allocator support in C++

Historically C++ used Allocators to abstract over different models of addressingmemoryAs such Allocators in C++ are ldquostatelessrdquo

In C++ an Allocator is merely a handle to a memory resource1

1Arthur OrsquoDwyer - An Allocator is a Handle to the Heap

4856

Allocator support in C++

Historically C++ used Allocators to abstract over different models of addressingmemoryAs such Allocators in C++ are ldquostatelessrdquo

In C++ an Allocator is merely a handle to a memory resource1

1Arthur OrsquoDwyer - An Allocator is a Handle to the Heap

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

5056

C++ Memory Resources2

stdpmrmemory resource - Abstract base class for all resources that can bewrapped in a stdpmrpolymorphic allocator

stdpmrnew delete resource - Global allocator

stdpmrmonotonic buffer resource - Monotonic allocator

stdpmrunsynchronized pool resource synchronized pool resource

- Multipool

stdpmrnull memory resource - Allocation always fails

2Pablo Halpern - Allocators The Good Parts

5156

Chaining

explicit monotonic_buffer_resource(

stdpmr memory_resource upstream )

Each memory resource has an upstream counterpart

If the resource runs out of memory it tries to allocate more memory from upstream

5256

Chaining

Possible uses of Chaining

Fixed-size vs dynamic storage for allocators

Combination of different allocation strategies

Injection points for special purpose allocators for debugging and profiling

5356

Therersquos no universal interface for allocators

Are size and alignment parameters passed to deallocate

Is realloc supported

How are out-of-memory errors reported

Is extended alignment supported

What is the return value for an allocation of size 0

Different memory regions for internal data structures and allocated memory

5456

Donrsquot underestimate the global allocator

Competitive performance in the general case

Security features (ASLR secure erase of freed memory)

Debugging amp Profiling (Valgrind Windows Debug Runtime)

Cache Coloring

Local allocators are no free lunch

5556

Wrapping up

No one-size-fits-all mdash Each allocator has its Achilles heel

Global allocator is a good solution for the general case

But you can do better with special allocators for special use cases in terms ofperformance3 as well as reliability

C++ has good support for local allocators but the terminology is a bit off

Different libraries have different concepts of allocators

No free lunch You need to understand your use case before you can chose theright allocator

3John Lakos - Local (Arena) Allocators

5656

Thanks for your attention

Page 97: Taming Dynamic Memory...Pool Allocator - STL containers vector only if chunk sizes match vector size list and map are a perfect t, as the size of each node is known beforehand (though

4356

Pool Allocator - STL containers

vector only if chunk sizes match vector size

list and map are a perfect fit as the size of each node is known beforehand(though this knowledge is implementation-specific)

Similar for deque

unordered map is again complicated

4456

But what if I do need different sizes

minusrarr Multipool Allocator

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4656

Multipool Allocator

Very powerful allocator

Runtime is still deterministic if number of pools is known beforehand

Maximum amount of waste through internal fragmentation can be controlledprecisely

Difficult to set up How many pools do I need What chunk sizes What poolsizes

Solid building block for a general purpose allocator

4756

Allocator support in C++

stdvectorltT AllocatorltTgtgt v

vpush back()

This is not the class allocating the memory

4756

Allocator support in C++

stdvectorltT AllocatorltTgtgt v

vpush back()

This is not the class allocating the memory

4856

Allocator support in C++

Historically C++ used Allocators to abstract over different models of addressingmemoryAs such Allocators in C++ are ldquostatelessrdquo

In C++ an Allocator is merely a handle to a memory resource1

1Arthur OrsquoDwyer - An Allocator is a Handle to the Heap

4856

Allocator support in C++

Historically C++ used Allocators to abstract over different models of addressingmemoryAs such Allocators in C++ are ldquostatelessrdquo

In C++ an Allocator is merely a handle to a memory resource1

1Arthur OrsquoDwyer - An Allocator is a Handle to the Heap

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

5056

C++ Memory Resources2

stdpmrmemory resource - Abstract base class for all resources that can bewrapped in a stdpmrpolymorphic allocator

stdpmrnew delete resource - Global allocator

stdpmrmonotonic buffer resource - Monotonic allocator

stdpmrunsynchronized pool resource synchronized pool resource

- Multipool

stdpmrnull memory resource - Allocation always fails

2Pablo Halpern - Allocators The Good Parts

5156

Chaining

explicit monotonic_buffer_resource(

stdpmr memory_resource upstream )

Each memory resource has an upstream counterpart

If the resource runs out of memory it tries to allocate more memory from upstream

5256

Chaining

Possible uses of Chaining

Fixed-size vs dynamic storage for allocators

Combination of different allocation strategies

Injection points for special purpose allocators for debugging and profiling

5356

Therersquos no universal interface for allocators

Are size and alignment parameters passed to deallocate

Is realloc supported

How are out-of-memory errors reported

Is extended alignment supported

What is the return value for an allocation of size 0

Different memory regions for internal data structures and allocated memory

5456

Donrsquot underestimate the global allocator

Competitive performance in the general case

Security features (ASLR secure erase of freed memory)

Debugging amp Profiling (Valgrind Windows Debug Runtime)

Cache Coloring

Local allocators are no free lunch

5556

Wrapping up

No one-size-fits-all mdash Each allocator has its Achilles heel

Global allocator is a good solution for the general case

But you can do better with special allocators for special use cases in terms ofperformance3 as well as reliability

C++ has good support for local allocators but the terminology is a bit off

Different libraries have different concepts of allocators

No free lunch You need to understand your use case before you can chose theright allocator

3John Lakos - Local (Arena) Allocators

5656

Thanks for your attention

Page 98: Taming Dynamic Memory...Pool Allocator - STL containers vector only if chunk sizes match vector size list and map are a perfect t, as the size of each node is known beforehand (though

4456

But what if I do need different sizes

minusrarr Multipool Allocator

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4656

Multipool Allocator

Very powerful allocator

Runtime is still deterministic if number of pools is known beforehand

Maximum amount of waste through internal fragmentation can be controlledprecisely

Difficult to set up How many pools do I need What chunk sizes What poolsizes

Solid building block for a general purpose allocator

4756

Allocator support in C++

stdvectorltT AllocatorltTgtgt v

vpush back()

This is not the class allocating the memory

4756

Allocator support in C++

stdvectorltT AllocatorltTgtgt v

vpush back()

This is not the class allocating the memory

4856

Allocator support in C++

Historically C++ used Allocators to abstract over different models of addressingmemoryAs such Allocators in C++ are ldquostatelessrdquo

In C++ an Allocator is merely a handle to a memory resource1

1Arthur OrsquoDwyer - An Allocator is a Handle to the Heap

4856

Allocator support in C++

Historically C++ used Allocators to abstract over different models of addressingmemoryAs such Allocators in C++ are ldquostatelessrdquo

In C++ an Allocator is merely a handle to a memory resource1

1Arthur OrsquoDwyer - An Allocator is a Handle to the Heap

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

5056

C++ Memory Resources2

stdpmrmemory resource - Abstract base class for all resources that can bewrapped in a stdpmrpolymorphic allocator

stdpmrnew delete resource - Global allocator

stdpmrmonotonic buffer resource - Monotonic allocator

stdpmrunsynchronized pool resource synchronized pool resource

- Multipool

stdpmrnull memory resource - Allocation always fails

2Pablo Halpern - Allocators The Good Parts

5156

Chaining

explicit monotonic_buffer_resource(

stdpmr memory_resource upstream )

Each memory resource has an upstream counterpart

If the resource runs out of memory it tries to allocate more memory from upstream

5256

Chaining

Possible uses of Chaining

Fixed-size vs dynamic storage for allocators

Combination of different allocation strategies

Injection points for special purpose allocators for debugging and profiling

5356

Therersquos no universal interface for allocators

Are size and alignment parameters passed to deallocate

Is realloc supported

How are out-of-memory errors reported

Is extended alignment supported

What is the return value for an allocation of size 0

Different memory regions for internal data structures and allocated memory

5456

Donrsquot underestimate the global allocator

Competitive performance in the general case

Security features (ASLR secure erase of freed memory)

Debugging amp Profiling (Valgrind Windows Debug Runtime)

Cache Coloring

Local allocators are no free lunch

5556

Wrapping up

No one-size-fits-all mdash Each allocator has its Achilles heel

Global allocator is a good solution for the general case

But you can do better with special allocators for special use cases in terms ofperformance3 as well as reliability

C++ has good support for local allocators but the terminology is a bit off

Different libraries have different concepts of allocators

No free lunch You need to understand your use case before you can chose theright allocator

3John Lakos - Local (Arena) Allocators

5656

Thanks for your attention

Page 99: Taming Dynamic Memory...Pool Allocator - STL containers vector only if chunk sizes match vector size list and map are a perfect t, as the size of each node is known beforehand (though

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4656

Multipool Allocator

Very powerful allocator

Runtime is still deterministic if number of pools is known beforehand

Maximum amount of waste through internal fragmentation can be controlledprecisely

Difficult to set up How many pools do I need What chunk sizes What poolsizes

Solid building block for a general purpose allocator

4756

Allocator support in C++

stdvectorltT AllocatorltTgtgt v

vpush back()

This is not the class allocating the memory

4756

Allocator support in C++

stdvectorltT AllocatorltTgtgt v

vpush back()

This is not the class allocating the memory

4856

Allocator support in C++

Historically C++ used Allocators to abstract over different models of addressingmemoryAs such Allocators in C++ are ldquostatelessrdquo

In C++ an Allocator is merely a handle to a memory resource1

1Arthur OrsquoDwyer - An Allocator is a Handle to the Heap

4856

Allocator support in C++

Historically C++ used Allocators to abstract over different models of addressingmemoryAs such Allocators in C++ are ldquostatelessrdquo

In C++ an Allocator is merely a handle to a memory resource1

1Arthur OrsquoDwyer - An Allocator is a Handle to the Heap

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

5056

C++ Memory Resources2

stdpmrmemory resource - Abstract base class for all resources that can bewrapped in a stdpmrpolymorphic allocator

stdpmrnew delete resource - Global allocator

stdpmrmonotonic buffer resource - Monotonic allocator

stdpmrunsynchronized pool resource synchronized pool resource

- Multipool

stdpmrnull memory resource - Allocation always fails

2Pablo Halpern - Allocators The Good Parts

5156

Chaining

explicit monotonic_buffer_resource(

stdpmr memory_resource upstream )

Each memory resource has an upstream counterpart

If the resource runs out of memory it tries to allocate more memory from upstream

5256

Chaining

Possible uses of Chaining

Fixed-size vs dynamic storage for allocators

Combination of different allocation strategies

Injection points for special purpose allocators for debugging and profiling

5356

Therersquos no universal interface for allocators

Are size and alignment parameters passed to deallocate

Is realloc supported

How are out-of-memory errors reported

Is extended alignment supported

What is the return value for an allocation of size 0

Different memory regions for internal data structures and allocated memory

5456

Donrsquot underestimate the global allocator

Competitive performance in the general case

Security features (ASLR secure erase of freed memory)

Debugging amp Profiling (Valgrind Windows Debug Runtime)

Cache Coloring

Local allocators are no free lunch

5556

Wrapping up

No one-size-fits-all mdash Each allocator has its Achilles heel

Global allocator is a good solution for the general case

But you can do better with special allocators for special use cases in terms ofperformance3 as well as reliability

C++ has good support for local allocators but the terminology is a bit off

Different libraries have different concepts of allocators

No free lunch You need to understand your use case before you can chose theright allocator

3John Lakos - Local (Arena) Allocators

5656

Thanks for your attention

Page 100: Taming Dynamic Memory...Pool Allocator - STL containers vector only if chunk sizes match vector size list and map are a perfect t, as the size of each node is known beforehand (though

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4656

Multipool Allocator

Very powerful allocator

Runtime is still deterministic if number of pools is known beforehand

Maximum amount of waste through internal fragmentation can be controlledprecisely

Difficult to set up How many pools do I need What chunk sizes What poolsizes

Solid building block for a general purpose allocator

4756

Allocator support in C++

stdvectorltT AllocatorltTgtgt v

vpush back()

This is not the class allocating the memory

4756

Allocator support in C++

stdvectorltT AllocatorltTgtgt v

vpush back()

This is not the class allocating the memory

4856

Allocator support in C++

Historically C++ used Allocators to abstract over different models of addressingmemoryAs such Allocators in C++ are ldquostatelessrdquo

In C++ an Allocator is merely a handle to a memory resource1

1Arthur OrsquoDwyer - An Allocator is a Handle to the Heap

4856

Allocator support in C++

Historically C++ used Allocators to abstract over different models of addressingmemoryAs such Allocators in C++ are ldquostatelessrdquo

In C++ an Allocator is merely a handle to a memory resource1

1Arthur OrsquoDwyer - An Allocator is a Handle to the Heap

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

5056

C++ Memory Resources2

stdpmrmemory resource - Abstract base class for all resources that can bewrapped in a stdpmrpolymorphic allocator

stdpmrnew delete resource - Global allocator

stdpmrmonotonic buffer resource - Monotonic allocator

stdpmrunsynchronized pool resource synchronized pool resource

- Multipool

stdpmrnull memory resource - Allocation always fails

2Pablo Halpern - Allocators The Good Parts

5156

Chaining

explicit monotonic_buffer_resource(

stdpmr memory_resource upstream )

Each memory resource has an upstream counterpart

If the resource runs out of memory it tries to allocate more memory from upstream

5256

Chaining

Possible uses of Chaining

Fixed-size vs dynamic storage for allocators

Combination of different allocation strategies

Injection points for special purpose allocators for debugging and profiling

5356

Therersquos no universal interface for allocators

Are size and alignment parameters passed to deallocate

Is realloc supported

How are out-of-memory errors reported

Is extended alignment supported

What is the return value for an allocation of size 0

Different memory regions for internal data structures and allocated memory

5456

Donrsquot underestimate the global allocator

Competitive performance in the general case

Security features (ASLR secure erase of freed memory)

Debugging amp Profiling (Valgrind Windows Debug Runtime)

Cache Coloring

Local allocators are no free lunch

5556

Wrapping up

No one-size-fits-all mdash Each allocator has its Achilles heel

Global allocator is a good solution for the general case

But you can do better with special allocators for special use cases in terms ofperformance3 as well as reliability

C++ has good support for local allocators but the terminology is a bit off

Different libraries have different concepts of allocators

No free lunch You need to understand your use case before you can chose theright allocator

3John Lakos - Local (Arena) Allocators

5656

Thanks for your attention

Page 101: Taming Dynamic Memory...Pool Allocator - STL containers vector only if chunk sizes match vector size list and map are a perfect t, as the size of each node is known beforehand (though

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4656

Multipool Allocator

Very powerful allocator

Runtime is still deterministic if number of pools is known beforehand

Maximum amount of waste through internal fragmentation can be controlledprecisely

Difficult to set up How many pools do I need What chunk sizes What poolsizes

Solid building block for a general purpose allocator

4756

Allocator support in C++

stdvectorltT AllocatorltTgtgt v

vpush back()

This is not the class allocating the memory

4756

Allocator support in C++

stdvectorltT AllocatorltTgtgt v

vpush back()

This is not the class allocating the memory

4856

Allocator support in C++

Historically C++ used Allocators to abstract over different models of addressingmemoryAs such Allocators in C++ are ldquostatelessrdquo

In C++ an Allocator is merely a handle to a memory resource1

1Arthur OrsquoDwyer - An Allocator is a Handle to the Heap

4856

Allocator support in C++

Historically C++ used Allocators to abstract over different models of addressingmemoryAs such Allocators in C++ are ldquostatelessrdquo

In C++ an Allocator is merely a handle to a memory resource1

1Arthur OrsquoDwyer - An Allocator is a Handle to the Heap

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

5056

C++ Memory Resources2

stdpmrmemory resource - Abstract base class for all resources that can bewrapped in a stdpmrpolymorphic allocator

stdpmrnew delete resource - Global allocator

stdpmrmonotonic buffer resource - Monotonic allocator

stdpmrunsynchronized pool resource synchronized pool resource

- Multipool

stdpmrnull memory resource - Allocation always fails

2Pablo Halpern - Allocators The Good Parts

5156

Chaining

explicit monotonic_buffer_resource(

stdpmr memory_resource upstream )

Each memory resource has an upstream counterpart

If the resource runs out of memory it tries to allocate more memory from upstream

5256

Chaining

Possible uses of Chaining

Fixed-size vs dynamic storage for allocators

Combination of different allocation strategies

Injection points for special purpose allocators for debugging and profiling

5356

Therersquos no universal interface for allocators

Are size and alignment parameters passed to deallocate

Is realloc supported

How are out-of-memory errors reported

Is extended alignment supported

What is the return value for an allocation of size 0

Different memory regions for internal data structures and allocated memory

5456

Donrsquot underestimate the global allocator

Competitive performance in the general case

Security features (ASLR secure erase of freed memory)

Debugging amp Profiling (Valgrind Windows Debug Runtime)

Cache Coloring

Local allocators are no free lunch

5556

Wrapping up

No one-size-fits-all mdash Each allocator has its Achilles heel

Global allocator is a good solution for the general case

But you can do better with special allocators for special use cases in terms ofperformance3 as well as reliability

C++ has good support for local allocators but the terminology is a bit off

Different libraries have different concepts of allocators

No free lunch You need to understand your use case before you can chose theright allocator

3John Lakos - Local (Arena) Allocators

5656

Thanks for your attention

Page 102: Taming Dynamic Memory...Pool Allocator - STL containers vector only if chunk sizes match vector size list and map are a perfect t, as the size of each node is known beforehand (though

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4656

Multipool Allocator

Very powerful allocator

Runtime is still deterministic if number of pools is known beforehand

Maximum amount of waste through internal fragmentation can be controlledprecisely

Difficult to set up How many pools do I need What chunk sizes What poolsizes

Solid building block for a general purpose allocator

4756

Allocator support in C++

stdvectorltT AllocatorltTgtgt v

vpush back()

This is not the class allocating the memory

4756

Allocator support in C++

stdvectorltT AllocatorltTgtgt v

vpush back()

This is not the class allocating the memory

4856

Allocator support in C++

Historically C++ used Allocators to abstract over different models of addressingmemoryAs such Allocators in C++ are ldquostatelessrdquo

In C++ an Allocator is merely a handle to a memory resource1

1Arthur OrsquoDwyer - An Allocator is a Handle to the Heap

4856

Allocator support in C++

Historically C++ used Allocators to abstract over different models of addressingmemoryAs such Allocators in C++ are ldquostatelessrdquo

In C++ an Allocator is merely a handle to a memory resource1

1Arthur OrsquoDwyer - An Allocator is a Handle to the Heap

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

5056

C++ Memory Resources2

stdpmrmemory resource - Abstract base class for all resources that can bewrapped in a stdpmrpolymorphic allocator

stdpmrnew delete resource - Global allocator

stdpmrmonotonic buffer resource - Monotonic allocator

stdpmrunsynchronized pool resource synchronized pool resource

- Multipool

stdpmrnull memory resource - Allocation always fails

2Pablo Halpern - Allocators The Good Parts

5156

Chaining

explicit monotonic_buffer_resource(

stdpmr memory_resource upstream )

Each memory resource has an upstream counterpart

If the resource runs out of memory it tries to allocate more memory from upstream

5256

Chaining

Possible uses of Chaining

Fixed-size vs dynamic storage for allocators

Combination of different allocation strategies

Injection points for special purpose allocators for debugging and profiling

5356

Therersquos no universal interface for allocators

Are size and alignment parameters passed to deallocate

Is realloc supported

How are out-of-memory errors reported

Is extended alignment supported

What is the return value for an allocation of size 0

Different memory regions for internal data structures and allocated memory

5456

Donrsquot underestimate the global allocator

Competitive performance in the general case

Security features (ASLR secure erase of freed memory)

Debugging amp Profiling (Valgrind Windows Debug Runtime)

Cache Coloring

Local allocators are no free lunch

5556

Wrapping up

No one-size-fits-all mdash Each allocator has its Achilles heel

Global allocator is a good solution for the general case

But you can do better with special allocators for special use cases in terms ofperformance3 as well as reliability

C++ has good support for local allocators but the terminology is a bit off

Different libraries have different concepts of allocators

No free lunch You need to understand your use case before you can chose theright allocator

3John Lakos - Local (Arena) Allocators

5656

Thanks for your attention

Page 103: Taming Dynamic Memory...Pool Allocator - STL containers vector only if chunk sizes match vector size list and map are a perfect t, as the size of each node is known beforehand (though

4556

Multipool Allocator

auto p1 = multipoolallocate(6)

auto p2 = multipoolallocate(2)

4656

Multipool Allocator

Very powerful allocator

Runtime is still deterministic if number of pools is known beforehand

Maximum amount of waste through internal fragmentation can be controlledprecisely

Difficult to set up How many pools do I need What chunk sizes What poolsizes

Solid building block for a general purpose allocator

4756

Allocator support in C++

stdvectorltT AllocatorltTgtgt v

vpush back()

This is not the class allocating the memory

4756

Allocator support in C++

stdvectorltT AllocatorltTgtgt v

vpush back()

This is not the class allocating the memory

4856

Allocator support in C++

Historically C++ used Allocators to abstract over different models of addressingmemoryAs such Allocators in C++ are ldquostatelessrdquo

In C++ an Allocator is merely a handle to a memory resource1

1Arthur OrsquoDwyer - An Allocator is a Handle to the Heap

4856

Allocator support in C++

Historically C++ used Allocators to abstract over different models of addressingmemoryAs such Allocators in C++ are ldquostatelessrdquo

In C++ an Allocator is merely a handle to a memory resource1

1Arthur OrsquoDwyer - An Allocator is a Handle to the Heap

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

5056

C++ Memory Resources2

stdpmrmemory resource - Abstract base class for all resources that can bewrapped in a stdpmrpolymorphic allocator

stdpmrnew delete resource - Global allocator

stdpmrmonotonic buffer resource - Monotonic allocator

stdpmrunsynchronized pool resource synchronized pool resource

- Multipool

stdpmrnull memory resource - Allocation always fails

2Pablo Halpern - Allocators The Good Parts

5156

Chaining

explicit monotonic_buffer_resource(

stdpmr memory_resource upstream )

Each memory resource has an upstream counterpart

If the resource runs out of memory it tries to allocate more memory from upstream

5256

Chaining

Possible uses of Chaining

Fixed-size vs dynamic storage for allocators

Combination of different allocation strategies

Injection points for special purpose allocators for debugging and profiling

5356

Therersquos no universal interface for allocators

Are size and alignment parameters passed to deallocate

Is realloc supported

How are out-of-memory errors reported

Is extended alignment supported

What is the return value for an allocation of size 0

Different memory regions for internal data structures and allocated memory

5456

Donrsquot underestimate the global allocator

Competitive performance in the general case

Security features (ASLR secure erase of freed memory)

Debugging amp Profiling (Valgrind Windows Debug Runtime)

Cache Coloring

Local allocators are no free lunch

5556

Wrapping up

No one-size-fits-all mdash Each allocator has its Achilles heel

Global allocator is a good solution for the general case

But you can do better with special allocators for special use cases in terms ofperformance3 as well as reliability

C++ has good support for local allocators but the terminology is a bit off

Different libraries have different concepts of allocators

No free lunch You need to understand your use case before you can chose theright allocator

3John Lakos - Local (Arena) Allocators

5656

Thanks for your attention

Page 104: Taming Dynamic Memory...Pool Allocator - STL containers vector only if chunk sizes match vector size list and map are a perfect t, as the size of each node is known beforehand (though

4656

Multipool Allocator

Very powerful allocator

Runtime is still deterministic if number of pools is known beforehand

Maximum amount of waste through internal fragmentation can be controlledprecisely

Difficult to set up How many pools do I need What chunk sizes What poolsizes

Solid building block for a general purpose allocator

4756

Allocator support in C++

stdvectorltT AllocatorltTgtgt v

vpush back()

This is not the class allocating the memory

4756

Allocator support in C++

stdvectorltT AllocatorltTgtgt v

vpush back()

This is not the class allocating the memory

4856

Allocator support in C++

Historically C++ used Allocators to abstract over different models of addressingmemoryAs such Allocators in C++ are ldquostatelessrdquo

In C++ an Allocator is merely a handle to a memory resource1

1Arthur OrsquoDwyer - An Allocator is a Handle to the Heap

4856

Allocator support in C++

Historically C++ used Allocators to abstract over different models of addressingmemoryAs such Allocators in C++ are ldquostatelessrdquo

In C++ an Allocator is merely a handle to a memory resource1

1Arthur OrsquoDwyer - An Allocator is a Handle to the Heap

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

5056

C++ Memory Resources2

stdpmrmemory resource - Abstract base class for all resources that can bewrapped in a stdpmrpolymorphic allocator

stdpmrnew delete resource - Global allocator

stdpmrmonotonic buffer resource - Monotonic allocator

stdpmrunsynchronized pool resource synchronized pool resource

- Multipool

stdpmrnull memory resource - Allocation always fails

2Pablo Halpern - Allocators The Good Parts

5156

Chaining

explicit monotonic_buffer_resource(

stdpmr memory_resource upstream )

Each memory resource has an upstream counterpart

If the resource runs out of memory it tries to allocate more memory from upstream

5256

Chaining

Possible uses of Chaining

Fixed-size vs dynamic storage for allocators

Combination of different allocation strategies

Injection points for special purpose allocators for debugging and profiling

5356

Therersquos no universal interface for allocators

Are size and alignment parameters passed to deallocate

Is realloc supported

How are out-of-memory errors reported

Is extended alignment supported

What is the return value for an allocation of size 0

Different memory regions for internal data structures and allocated memory

5456

Donrsquot underestimate the global allocator

Competitive performance in the general case

Security features (ASLR secure erase of freed memory)

Debugging amp Profiling (Valgrind Windows Debug Runtime)

Cache Coloring

Local allocators are no free lunch

5556

Wrapping up

No one-size-fits-all mdash Each allocator has its Achilles heel

Global allocator is a good solution for the general case

But you can do better with special allocators for special use cases in terms ofperformance3 as well as reliability

C++ has good support for local allocators but the terminology is a bit off

Different libraries have different concepts of allocators

No free lunch You need to understand your use case before you can chose theright allocator

3John Lakos - Local (Arena) Allocators

5656

Thanks for your attention

Page 105: Taming Dynamic Memory...Pool Allocator - STL containers vector only if chunk sizes match vector size list and map are a perfect t, as the size of each node is known beforehand (though

4756

Allocator support in C++

stdvectorltT AllocatorltTgtgt v

vpush back()

This is not the class allocating the memory

4756

Allocator support in C++

stdvectorltT AllocatorltTgtgt v

vpush back()

This is not the class allocating the memory

4856

Allocator support in C++

Historically C++ used Allocators to abstract over different models of addressingmemoryAs such Allocators in C++ are ldquostatelessrdquo

In C++ an Allocator is merely a handle to a memory resource1

1Arthur OrsquoDwyer - An Allocator is a Handle to the Heap

4856

Allocator support in C++

Historically C++ used Allocators to abstract over different models of addressingmemoryAs such Allocators in C++ are ldquostatelessrdquo

In C++ an Allocator is merely a handle to a memory resource1

1Arthur OrsquoDwyer - An Allocator is a Handle to the Heap

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

5056

C++ Memory Resources2

stdpmrmemory resource - Abstract base class for all resources that can bewrapped in a stdpmrpolymorphic allocator

stdpmrnew delete resource - Global allocator

stdpmrmonotonic buffer resource - Monotonic allocator

stdpmrunsynchronized pool resource synchronized pool resource

- Multipool

stdpmrnull memory resource - Allocation always fails

2Pablo Halpern - Allocators The Good Parts

5156

Chaining

explicit monotonic_buffer_resource(

stdpmr memory_resource upstream )

Each memory resource has an upstream counterpart

If the resource runs out of memory it tries to allocate more memory from upstream

5256

Chaining

Possible uses of Chaining

Fixed-size vs dynamic storage for allocators

Combination of different allocation strategies

Injection points for special purpose allocators for debugging and profiling

5356

Therersquos no universal interface for allocators

Are size and alignment parameters passed to deallocate

Is realloc supported

How are out-of-memory errors reported

Is extended alignment supported

What is the return value for an allocation of size 0

Different memory regions for internal data structures and allocated memory

5456

Donrsquot underestimate the global allocator

Competitive performance in the general case

Security features (ASLR secure erase of freed memory)

Debugging amp Profiling (Valgrind Windows Debug Runtime)

Cache Coloring

Local allocators are no free lunch

5556

Wrapping up

No one-size-fits-all mdash Each allocator has its Achilles heel

Global allocator is a good solution for the general case

But you can do better with special allocators for special use cases in terms ofperformance3 as well as reliability

C++ has good support for local allocators but the terminology is a bit off

Different libraries have different concepts of allocators

No free lunch You need to understand your use case before you can chose theright allocator

3John Lakos - Local (Arena) Allocators

5656

Thanks for your attention

Page 106: Taming Dynamic Memory...Pool Allocator - STL containers vector only if chunk sizes match vector size list and map are a perfect t, as the size of each node is known beforehand (though

4756

Allocator support in C++

stdvectorltT AllocatorltTgtgt v

vpush back()

This is not the class allocating the memory

4856

Allocator support in C++

Historically C++ used Allocators to abstract over different models of addressingmemoryAs such Allocators in C++ are ldquostatelessrdquo

In C++ an Allocator is merely a handle to a memory resource1

1Arthur OrsquoDwyer - An Allocator is a Handle to the Heap

4856

Allocator support in C++

Historically C++ used Allocators to abstract over different models of addressingmemoryAs such Allocators in C++ are ldquostatelessrdquo

In C++ an Allocator is merely a handle to a memory resource1

1Arthur OrsquoDwyer - An Allocator is a Handle to the Heap

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

5056

C++ Memory Resources2

stdpmrmemory resource - Abstract base class for all resources that can bewrapped in a stdpmrpolymorphic allocator

stdpmrnew delete resource - Global allocator

stdpmrmonotonic buffer resource - Monotonic allocator

stdpmrunsynchronized pool resource synchronized pool resource

- Multipool

stdpmrnull memory resource - Allocation always fails

2Pablo Halpern - Allocators The Good Parts

5156

Chaining

explicit monotonic_buffer_resource(

stdpmr memory_resource upstream )

Each memory resource has an upstream counterpart

If the resource runs out of memory it tries to allocate more memory from upstream

5256

Chaining

Possible uses of Chaining

Fixed-size vs dynamic storage for allocators

Combination of different allocation strategies

Injection points for special purpose allocators for debugging and profiling

5356

Therersquos no universal interface for allocators

Are size and alignment parameters passed to deallocate

Is realloc supported

How are out-of-memory errors reported

Is extended alignment supported

What is the return value for an allocation of size 0

Different memory regions for internal data structures and allocated memory

5456

Donrsquot underestimate the global allocator

Competitive performance in the general case

Security features (ASLR secure erase of freed memory)

Debugging amp Profiling (Valgrind Windows Debug Runtime)

Cache Coloring

Local allocators are no free lunch

5556

Wrapping up

No one-size-fits-all mdash Each allocator has its Achilles heel

Global allocator is a good solution for the general case

But you can do better with special allocators for special use cases in terms ofperformance3 as well as reliability

C++ has good support for local allocators but the terminology is a bit off

Different libraries have different concepts of allocators

No free lunch You need to understand your use case before you can chose theright allocator

3John Lakos - Local (Arena) Allocators

5656

Thanks for your attention

Page 107: Taming Dynamic Memory...Pool Allocator - STL containers vector only if chunk sizes match vector size list and map are a perfect t, as the size of each node is known beforehand (though

4856

Allocator support in C++

Historically C++ used Allocators to abstract over different models of addressingmemoryAs such Allocators in C++ are ldquostatelessrdquo

In C++ an Allocator is merely a handle to a memory resource1

1Arthur OrsquoDwyer - An Allocator is a Handle to the Heap

4856

Allocator support in C++

Historically C++ used Allocators to abstract over different models of addressingmemoryAs such Allocators in C++ are ldquostatelessrdquo

In C++ an Allocator is merely a handle to a memory resource1

1Arthur OrsquoDwyer - An Allocator is a Handle to the Heap

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

5056

C++ Memory Resources2

stdpmrmemory resource - Abstract base class for all resources that can bewrapped in a stdpmrpolymorphic allocator

stdpmrnew delete resource - Global allocator

stdpmrmonotonic buffer resource - Monotonic allocator

stdpmrunsynchronized pool resource synchronized pool resource

- Multipool

stdpmrnull memory resource - Allocation always fails

2Pablo Halpern - Allocators The Good Parts

5156

Chaining

explicit monotonic_buffer_resource(

stdpmr memory_resource upstream )

Each memory resource has an upstream counterpart

If the resource runs out of memory it tries to allocate more memory from upstream

5256

Chaining

Possible uses of Chaining

Fixed-size vs dynamic storage for allocators

Combination of different allocation strategies

Injection points for special purpose allocators for debugging and profiling

5356

Therersquos no universal interface for allocators

Are size and alignment parameters passed to deallocate

Is realloc supported

How are out-of-memory errors reported

Is extended alignment supported

What is the return value for an allocation of size 0

Different memory regions for internal data structures and allocated memory

5456

Donrsquot underestimate the global allocator

Competitive performance in the general case

Security features (ASLR secure erase of freed memory)

Debugging amp Profiling (Valgrind Windows Debug Runtime)

Cache Coloring

Local allocators are no free lunch

5556

Wrapping up

No one-size-fits-all mdash Each allocator has its Achilles heel

Global allocator is a good solution for the general case

But you can do better with special allocators for special use cases in terms ofperformance3 as well as reliability

C++ has good support for local allocators but the terminology is a bit off

Different libraries have different concepts of allocators

No free lunch You need to understand your use case before you can chose theright allocator

3John Lakos - Local (Arena) Allocators

5656

Thanks for your attention

Page 108: Taming Dynamic Memory...Pool Allocator - STL containers vector only if chunk sizes match vector size list and map are a perfect t, as the size of each node is known beforehand (though

4856

Allocator support in C++

Historically C++ used Allocators to abstract over different models of addressingmemoryAs such Allocators in C++ are ldquostatelessrdquo

In C++ an Allocator is merely a handle to a memory resource1

1Arthur OrsquoDwyer - An Allocator is a Handle to the Heap

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

5056

C++ Memory Resources2

stdpmrmemory resource - Abstract base class for all resources that can bewrapped in a stdpmrpolymorphic allocator

stdpmrnew delete resource - Global allocator

stdpmrmonotonic buffer resource - Monotonic allocator

stdpmrunsynchronized pool resource synchronized pool resource

- Multipool

stdpmrnull memory resource - Allocation always fails

2Pablo Halpern - Allocators The Good Parts

5156

Chaining

explicit monotonic_buffer_resource(

stdpmr memory_resource upstream )

Each memory resource has an upstream counterpart

If the resource runs out of memory it tries to allocate more memory from upstream

5256

Chaining

Possible uses of Chaining

Fixed-size vs dynamic storage for allocators

Combination of different allocation strategies

Injection points for special purpose allocators for debugging and profiling

5356

Therersquos no universal interface for allocators

Are size and alignment parameters passed to deallocate

Is realloc supported

How are out-of-memory errors reported

Is extended alignment supported

What is the return value for an allocation of size 0

Different memory regions for internal data structures and allocated memory

5456

Donrsquot underestimate the global allocator

Competitive performance in the general case

Security features (ASLR secure erase of freed memory)

Debugging amp Profiling (Valgrind Windows Debug Runtime)

Cache Coloring

Local allocators are no free lunch

5556

Wrapping up

No one-size-fits-all mdash Each allocator has its Achilles heel

Global allocator is a good solution for the general case

But you can do better with special allocators for special use cases in terms ofperformance3 as well as reliability

C++ has good support for local allocators but the terminology is a bit off

Different libraries have different concepts of allocators

No free lunch You need to understand your use case before you can chose theright allocator

3John Lakos - Local (Arena) Allocators

5656

Thanks for your attention

Page 109: Taming Dynamic Memory...Pool Allocator - STL containers vector only if chunk sizes match vector size list and map are a perfect t, as the size of each node is known beforehand (though

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

5056

C++ Memory Resources2

stdpmrmemory resource - Abstract base class for all resources that can bewrapped in a stdpmrpolymorphic allocator

stdpmrnew delete resource - Global allocator

stdpmrmonotonic buffer resource - Monotonic allocator

stdpmrunsynchronized pool resource synchronized pool resource

- Multipool

stdpmrnull memory resource - Allocation always fails

2Pablo Halpern - Allocators The Good Parts

5156

Chaining

explicit monotonic_buffer_resource(

stdpmr memory_resource upstream )

Each memory resource has an upstream counterpart

If the resource runs out of memory it tries to allocate more memory from upstream

5256

Chaining

Possible uses of Chaining

Fixed-size vs dynamic storage for allocators

Combination of different allocation strategies

Injection points for special purpose allocators for debugging and profiling

5356

Therersquos no universal interface for allocators

Are size and alignment parameters passed to deallocate

Is realloc supported

How are out-of-memory errors reported

Is extended alignment supported

What is the return value for an allocation of size 0

Different memory regions for internal data structures and allocated memory

5456

Donrsquot underestimate the global allocator

Competitive performance in the general case

Security features (ASLR secure erase of freed memory)

Debugging amp Profiling (Valgrind Windows Debug Runtime)

Cache Coloring

Local allocators are no free lunch

5556

Wrapping up

No one-size-fits-all mdash Each allocator has its Achilles heel

Global allocator is a good solution for the general case

But you can do better with special allocators for special use cases in terms ofperformance3 as well as reliability

C++ has good support for local allocators but the terminology is a bit off

Different libraries have different concepts of allocators

No free lunch You need to understand your use case before you can chose theright allocator

3John Lakos - Local (Arena) Allocators

5656

Thanks for your attention

Page 110: Taming Dynamic Memory...Pool Allocator - STL containers vector only if chunk sizes match vector size list and map are a perfect t, as the size of each node is known beforehand (though

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

5056

C++ Memory Resources2

stdpmrmemory resource - Abstract base class for all resources that can bewrapped in a stdpmrpolymorphic allocator

stdpmrnew delete resource - Global allocator

stdpmrmonotonic buffer resource - Monotonic allocator

stdpmrunsynchronized pool resource synchronized pool resource

- Multipool

stdpmrnull memory resource - Allocation always fails

2Pablo Halpern - Allocators The Good Parts

5156

Chaining

explicit monotonic_buffer_resource(

stdpmr memory_resource upstream )

Each memory resource has an upstream counterpart

If the resource runs out of memory it tries to allocate more memory from upstream

5256

Chaining

Possible uses of Chaining

Fixed-size vs dynamic storage for allocators

Combination of different allocation strategies

Injection points for special purpose allocators for debugging and profiling

5356

Therersquos no universal interface for allocators

Are size and alignment parameters passed to deallocate

Is realloc supported

How are out-of-memory errors reported

Is extended alignment supported

What is the return value for an allocation of size 0

Different memory regions for internal data structures and allocated memory

5456

Donrsquot underestimate the global allocator

Competitive performance in the general case

Security features (ASLR secure erase of freed memory)

Debugging amp Profiling (Valgrind Windows Debug Runtime)

Cache Coloring

Local allocators are no free lunch

5556

Wrapping up

No one-size-fits-all mdash Each allocator has its Achilles heel

Global allocator is a good solution for the general case

But you can do better with special allocators for special use cases in terms ofperformance3 as well as reliability

C++ has good support for local allocators but the terminology is a bit off

Different libraries have different concepts of allocators

No free lunch You need to understand your use case before you can chose theright allocator

3John Lakos - Local (Arena) Allocators

5656

Thanks for your attention

Page 111: Taming Dynamic Memory...Pool Allocator - STL containers vector only if chunk sizes match vector size list and map are a perfect t, as the size of each node is known beforehand (though

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

5056

C++ Memory Resources2

stdpmrmemory resource - Abstract base class for all resources that can bewrapped in a stdpmrpolymorphic allocator

stdpmrnew delete resource - Global allocator

stdpmrmonotonic buffer resource - Monotonic allocator

stdpmrunsynchronized pool resource synchronized pool resource

- Multipool

stdpmrnull memory resource - Allocation always fails

2Pablo Halpern - Allocators The Good Parts

5156

Chaining

explicit monotonic_buffer_resource(

stdpmr memory_resource upstream )

Each memory resource has an upstream counterpart

If the resource runs out of memory it tries to allocate more memory from upstream

5256

Chaining

Possible uses of Chaining

Fixed-size vs dynamic storage for allocators

Combination of different allocation strategies

Injection points for special purpose allocators for debugging and profiling

5356

Therersquos no universal interface for allocators

Are size and alignment parameters passed to deallocate

Is realloc supported

How are out-of-memory errors reported

Is extended alignment supported

What is the return value for an allocation of size 0

Different memory regions for internal data structures and allocated memory

5456

Donrsquot underestimate the global allocator

Competitive performance in the general case

Security features (ASLR secure erase of freed memory)

Debugging amp Profiling (Valgrind Windows Debug Runtime)

Cache Coloring

Local allocators are no free lunch

5556

Wrapping up

No one-size-fits-all mdash Each allocator has its Achilles heel

Global allocator is a good solution for the general case

But you can do better with special allocators for special use cases in terms ofperformance3 as well as reliability

C++ has good support for local allocators but the terminology is a bit off

Different libraries have different concepts of allocators

No free lunch You need to understand your use case before you can chose theright allocator

3John Lakos - Local (Arena) Allocators

5656

Thanks for your attention

Page 112: Taming Dynamic Memory...Pool Allocator - STL containers vector only if chunk sizes match vector size list and map are a perfect t, as the size of each node is known beforehand (though

4956

Allocator support in C++

stdpmrmemory resourceamp mr =

stdvectorltT stdpmrpolymorphic allocatorgt v(ampmr)

stdpmrvectorltTgt v(ampmr)

vpush back()

Enable custom allocators for the object

Pass a memory resource to handle allocationdeallocation

5056

C++ Memory Resources2

stdpmrmemory resource - Abstract base class for all resources that can bewrapped in a stdpmrpolymorphic allocator

stdpmrnew delete resource - Global allocator

stdpmrmonotonic buffer resource - Monotonic allocator

stdpmrunsynchronized pool resource synchronized pool resource

- Multipool

stdpmrnull memory resource - Allocation always fails

2Pablo Halpern - Allocators The Good Parts

5156

Chaining

explicit monotonic_buffer_resource(

stdpmr memory_resource upstream )

Each memory resource has an upstream counterpart

If the resource runs out of memory it tries to allocate more memory from upstream

5256

Chaining

Possible uses of Chaining

Fixed-size vs dynamic storage for allocators

Combination of different allocation strategies

Injection points for special purpose allocators for debugging and profiling

5356

Therersquos no universal interface for allocators

Are size and alignment parameters passed to deallocate

Is realloc supported

How are out-of-memory errors reported

Is extended alignment supported

What is the return value for an allocation of size 0

Different memory regions for internal data structures and allocated memory

5456

Donrsquot underestimate the global allocator

Competitive performance in the general case

Security features (ASLR secure erase of freed memory)

Debugging amp Profiling (Valgrind Windows Debug Runtime)

Cache Coloring

Local allocators are no free lunch

5556

Wrapping up

No one-size-fits-all mdash Each allocator has its Achilles heel

Global allocator is a good solution for the general case

But you can do better with special allocators for special use cases in terms ofperformance3 as well as reliability

C++ has good support for local allocators but the terminology is a bit off

Different libraries have different concepts of allocators

No free lunch You need to understand your use case before you can chose theright allocator

3John Lakos - Local (Arena) Allocators

5656

Thanks for your attention

Page 113: Taming Dynamic Memory...Pool Allocator - STL containers vector only if chunk sizes match vector size list and map are a perfect t, as the size of each node is known beforehand (though

5056

C++ Memory Resources2

stdpmrmemory resource - Abstract base class for all resources that can bewrapped in a stdpmrpolymorphic allocator

stdpmrnew delete resource - Global allocator

stdpmrmonotonic buffer resource - Monotonic allocator

stdpmrunsynchronized pool resource synchronized pool resource

- Multipool

stdpmrnull memory resource - Allocation always fails

2Pablo Halpern - Allocators The Good Parts

5156

Chaining

explicit monotonic_buffer_resource(

stdpmr memory_resource upstream )

Each memory resource has an upstream counterpart

If the resource runs out of memory it tries to allocate more memory from upstream

5256

Chaining

Possible uses of Chaining

Fixed-size vs dynamic storage for allocators

Combination of different allocation strategies

Injection points for special purpose allocators for debugging and profiling

5356

Therersquos no universal interface for allocators

Are size and alignment parameters passed to deallocate

Is realloc supported

How are out-of-memory errors reported

Is extended alignment supported

What is the return value for an allocation of size 0

Different memory regions for internal data structures and allocated memory

5456

Donrsquot underestimate the global allocator

Competitive performance in the general case

Security features (ASLR secure erase of freed memory)

Debugging amp Profiling (Valgrind Windows Debug Runtime)

Cache Coloring

Local allocators are no free lunch

5556

Wrapping up

No one-size-fits-all mdash Each allocator has its Achilles heel

Global allocator is a good solution for the general case

But you can do better with special allocators for special use cases in terms ofperformance3 as well as reliability

C++ has good support for local allocators but the terminology is a bit off

Different libraries have different concepts of allocators

No free lunch You need to understand your use case before you can chose theright allocator

3John Lakos - Local (Arena) Allocators

5656

Thanks for your attention

Page 114: Taming Dynamic Memory...Pool Allocator - STL containers vector only if chunk sizes match vector size list and map are a perfect t, as the size of each node is known beforehand (though

5156

Chaining

explicit monotonic_buffer_resource(

stdpmr memory_resource upstream )

Each memory resource has an upstream counterpart

If the resource runs out of memory it tries to allocate more memory from upstream

5256

Chaining

Possible uses of Chaining

Fixed-size vs dynamic storage for allocators

Combination of different allocation strategies

Injection points for special purpose allocators for debugging and profiling

5356

Therersquos no universal interface for allocators

Are size and alignment parameters passed to deallocate

Is realloc supported

How are out-of-memory errors reported

Is extended alignment supported

What is the return value for an allocation of size 0

Different memory regions for internal data structures and allocated memory

5456

Donrsquot underestimate the global allocator

Competitive performance in the general case

Security features (ASLR secure erase of freed memory)

Debugging amp Profiling (Valgrind Windows Debug Runtime)

Cache Coloring

Local allocators are no free lunch

5556

Wrapping up

No one-size-fits-all mdash Each allocator has its Achilles heel

Global allocator is a good solution for the general case

But you can do better with special allocators for special use cases in terms ofperformance3 as well as reliability

C++ has good support for local allocators but the terminology is a bit off

Different libraries have different concepts of allocators

No free lunch You need to understand your use case before you can chose theright allocator

3John Lakos - Local (Arena) Allocators

5656

Thanks for your attention

Page 115: Taming Dynamic Memory...Pool Allocator - STL containers vector only if chunk sizes match vector size list and map are a perfect t, as the size of each node is known beforehand (though

5256

Chaining

Possible uses of Chaining

Fixed-size vs dynamic storage for allocators

Combination of different allocation strategies

Injection points for special purpose allocators for debugging and profiling

5356

Therersquos no universal interface for allocators

Are size and alignment parameters passed to deallocate

Is realloc supported

How are out-of-memory errors reported

Is extended alignment supported

What is the return value for an allocation of size 0

Different memory regions for internal data structures and allocated memory

5456

Donrsquot underestimate the global allocator

Competitive performance in the general case

Security features (ASLR secure erase of freed memory)

Debugging amp Profiling (Valgrind Windows Debug Runtime)

Cache Coloring

Local allocators are no free lunch

5556

Wrapping up

No one-size-fits-all mdash Each allocator has its Achilles heel

Global allocator is a good solution for the general case

But you can do better with special allocators for special use cases in terms ofperformance3 as well as reliability

C++ has good support for local allocators but the terminology is a bit off

Different libraries have different concepts of allocators

No free lunch You need to understand your use case before you can chose theright allocator

3John Lakos - Local (Arena) Allocators

5656

Thanks for your attention

Page 116: Taming Dynamic Memory...Pool Allocator - STL containers vector only if chunk sizes match vector size list and map are a perfect t, as the size of each node is known beforehand (though

5356

Therersquos no universal interface for allocators

Are size and alignment parameters passed to deallocate

Is realloc supported

How are out-of-memory errors reported

Is extended alignment supported

What is the return value for an allocation of size 0

Different memory regions for internal data structures and allocated memory

5456

Donrsquot underestimate the global allocator

Competitive performance in the general case

Security features (ASLR secure erase of freed memory)

Debugging amp Profiling (Valgrind Windows Debug Runtime)

Cache Coloring

Local allocators are no free lunch

5556

Wrapping up

No one-size-fits-all mdash Each allocator has its Achilles heel

Global allocator is a good solution for the general case

But you can do better with special allocators for special use cases in terms ofperformance3 as well as reliability

C++ has good support for local allocators but the terminology is a bit off

Different libraries have different concepts of allocators

No free lunch You need to understand your use case before you can chose theright allocator

3John Lakos - Local (Arena) Allocators

5656

Thanks for your attention

Page 117: Taming Dynamic Memory...Pool Allocator - STL containers vector only if chunk sizes match vector size list and map are a perfect t, as the size of each node is known beforehand (though

5456

Donrsquot underestimate the global allocator

Competitive performance in the general case

Security features (ASLR secure erase of freed memory)

Debugging amp Profiling (Valgrind Windows Debug Runtime)

Cache Coloring

Local allocators are no free lunch

5556

Wrapping up

No one-size-fits-all mdash Each allocator has its Achilles heel

Global allocator is a good solution for the general case

But you can do better with special allocators for special use cases in terms ofperformance3 as well as reliability

C++ has good support for local allocators but the terminology is a bit off

Different libraries have different concepts of allocators

No free lunch You need to understand your use case before you can chose theright allocator

3John Lakos - Local (Arena) Allocators

5656

Thanks for your attention

Page 118: Taming Dynamic Memory...Pool Allocator - STL containers vector only if chunk sizes match vector size list and map are a perfect t, as the size of each node is known beforehand (though

5556

Wrapping up

No one-size-fits-all mdash Each allocator has its Achilles heel

Global allocator is a good solution for the general case

But you can do better with special allocators for special use cases in terms ofperformance3 as well as reliability

C++ has good support for local allocators but the terminology is a bit off

Different libraries have different concepts of allocators

No free lunch You need to understand your use case before you can chose theright allocator

3John Lakos - Local (Arena) Allocators

5656

Thanks for your attention

Page 119: Taming Dynamic Memory...Pool Allocator - STL containers vector only if chunk sizes match vector size list and map are a perfect t, as the size of each node is known beforehand (though

5656

Thanks for your attention