Juggling with Bits and Bytes - How Apache Flink operates on binary data



TRANSCRIPT

Page 1: Juggling with Bits and Bytes - How Apache Flink operates on binary data

Fabian Hueske
[email protected]
@fhueske

Page 2: Big Data frameworks on JVMs

• Many (open source) Big Data frameworks run on JVMs
  – Hadoop, Drill, Spark, Hive, Pig, and ...
  – Flink as well

• Common challenge: How to organize data in-memory?
  – In-memory processing (sorting, joining, aggregating)
  – In-memory caching of intermediate results

• Memory management of a system influences
  – Reliability
  – Resource efficiency, performance & performance predictability
  – Ease of configuration

Page 3: The straight-forward approach

Store and process data as objects on the heap
• Put objects in an Array and sort it (see the sketch below)

A few notable drawbacks
• Predicting memory consumption is hard
  – If you fail, an OutOfMemoryError will kill you!

• High garbage collection overhead
  – Easily 50% of time spent on GC

• Objects have space overhead
  – At least 8 bytes for each (nested) object! (Depends on arch)

Page 4: FLINK’S APPROACH

Page 5: Flink adopts DBMS technology

• Allocates fixed number of memory segments upfront
• Data objects are serialized into memory segments
• DBMS-style algorithms work on binary representation

Page 6: Why is that good?

• Memory-safe execution
  – Used and available memory segments are easy to count

• Efficient out-of-core algorithms
  – Memory segments can be efficiently written to disk

• Reduced GC pressure
  – Memory segments are never deallocated
  – Data objects are short-lived or reused

• Space-efficient data representation

• Efficient operations on binary data

Page 7: What does it cost?

• Significant implementation investment
  – Using java.util.HashMap vs. implementing a spillable hash table backed by byte arrays and a custom serialization stack

• Other systems use similar techniques
  – Apache Drill, Apache Ignite, Apache Geode

• Apache Spark plans to evolve in a similar direction

Page 8: MEMORY ALLOCATION

Page 9: Memory segments

• Unit of memory distribution in Flink
  – Fixed number allocated when a worker starts

• Backed by a regular byte array (default 32 KB)

• R/W access through Java’s efficient unsafe methods

• Multiple memory segments can be concatenated to a larger chunk of memory
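A minimal sketch of the idea behind a memory segment: a fixed-size byte array with typed read/write access at explicit offsets. This is illustrative only, not Flink's actual MemorySegment API; the real implementation uses sun.misc.Unsafe for fast access, while this sketch sticks to plain array operations for clarity.

    // Illustrative stand-in for a Flink-style memory segment.
    public class SimpleMemorySegment {
        public static final int DEFAULT_SIZE = 32 * 1024; // 32 KB, like Flink's default

        private final byte[] memory;

        public SimpleMemorySegment(int size) {
            this.memory = new byte[size];
        }

        // Write a 4-byte int at the given offset (big-endian).
        public void putInt(int offset, int value) {
            memory[offset]     = (byte) (value >>> 24);
            memory[offset + 1] = (byte) (value >>> 16);
            memory[offset + 2] = (byte) (value >>> 8);
            memory[offset + 3] = (byte) value;
        }

        // Read a 4-byte int from the given offset.
        public int getInt(int offset) {
            return ((memory[offset]     & 0xFF) << 24)
                 | ((memory[offset + 1] & 0xFF) << 16)
                 | ((memory[offset + 2] & 0xFF) << 8)
                 |  (memory[offset + 3] & 0xFF);
        }

        public int size() {
            return memory.length;
        }
    }

Because a segment is nothing more than a byte array of known size, counting used and free segments gives an exact picture of memory consumption, and a segment can be written to disk as-is when an algorithm needs to spill.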

Page 10: Memory allocation

Page 11: DATA SERIALIZATION

Page 12: Custom de/serialization stack

• Many alternatives for Java object serialization
  – Kryo, Apache Avro, Apache Thrift, Protobufs, …

• But Flink has its own serialization stack
  – Operating on serialized data requires knowledge of layout
  – Control over layout can improve efficiency of operations
  – Data types are known before execution

Page 13: Rich & extensible type system

• Serialization framework requires knowledge of types

• Flink analyzes return types of functions
  – Java: Reflection-based type analyzer
  – Scala: Compiler information

• Rich type system
  – Atomics: Primitives, Writables, Generic types, …
  – Composites: Tuples, Pojos, CaseClasses
  – Extensible by custom types
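A rough illustration of how type information drives serializer creation in Flink's Java API. The class and method names below (TypeInformation.of, TypeHint, createSerializer) are taken from later public Flink releases and may differ from the version discussed in the talk; treat this as a sketch of the idea rather than a verified snippet for that version.

    import org.apache.flink.api.common.ExecutionConfig;
    import org.apache.flink.api.common.typeinfo.TypeHint;
    import org.apache.flink.api.common.typeinfo.TypeInformation;
    import org.apache.flink.api.common.typeutils.TypeSerializer;
    import org.apache.flink.api.java.tuple.Tuple2;

    public class TypeInfoExample {
        public static void main(String[] args) {
            // Flink derives a TypeInformation object that fully describes the
            // composite type, including its nested field types.
            TypeInformation<Tuple2<Integer, String>> typeInfo =
                    TypeInformation.of(new TypeHint<Tuple2<Integer, String>>() {});

            // From the type information Flink builds a dedicated serializer
            // that knows the exact binary layout of the type.
            TypeSerializer<Tuple2<Integer, String>> serializer =
                    typeInfo.createSerializer(new ExecutionConfig());

            System.out.println(typeInfo);   // describes Tuple2<Integer, String>
            System.out.println(serializer); // a tuple serializer delegating to field serializers
        }
    }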

Page 14: Serializers & comparators

• All types have dedicated de/serializers
  – Primitives are natively serialized
  – Writables use their own serialization functions
  – Generic types use Kryo
  – …

• Serialization goes automatically through Java’s unsafe methods

• Comparators compare and hash objects
  – On the binary representation if possible

• Composite serializers and comparators delegate to the serializers and comparators of their member types

Page 15: Serializing a Tuple3<Integer, Double, Person>
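The original slide shows this layout as a figure. As a simplified illustration of the idea, field-by-field serialization with the composite serializer delegating to the serializers of its members, here is a hand-written sketch. It is not Flink's actual serializer code and glosses over details such as null handling and subclass tags; the Person fields are hypothetical.

    import java.io.ByteArrayOutputStream;
    import java.io.DataOutputStream;
    import java.io.IOException;

    // Hypothetical POJO standing in for the Person type on the slide.
    class Person {
        String name;
        int age;
        Person(String name, int age) { this.name = name; this.age = age; }
    }

    public class Tuple3LayoutSketch {

        // Simplified stand-in for a composite serializer: write each field of
        // the tuple in order, delegating nested types to their own logic.
        static byte[] serialize(int f0, double f1, Person f2) throws IOException {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);

            out.writeInt(f0);      // 4 bytes: the Integer field
            out.writeDouble(f1);   // 8 bytes: the Double field

            // "Delegate" to a Person serializer: its fields, again in order.
            out.writeUTF(f2.name); // length-prefixed String
            out.writeInt(f2.age);  // 4 bytes

            return bytes.toByteArray();
        }

        public static void main(String[] args) throws IOException {
            byte[] binary = serialize(42, 3.14, new Person("Alice", 30));
            System.out.println("serialized to " + binary.length + " bytes");
        }
    }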

Page 16: OPERATING ON BINARY DATA

Page 17: Data Processing Algorithms

• Flink’s algorithms are based on RDBMS technology
  – External Merge Sort, Hybrid Hash Join, Sort Merge Join, …

• Algorithms receive a budget of memory segments

• Operate in-memory as long as data fits into budget
  – And gracefully spill to disk if data exceeds memory

Page 18: In-Memory Sort – Fill the Sort Buffer

Page 19: In-Memory Sort – Sort the Buffer

Page 20: In-Memory Sort – Read Sorted Buffer
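Slides 18-20 show the sort buffer as figures. The sketch below is a much-simplified, single-buffer illustration of the fill/sort/read idea: serialized records sit in a data area, a separate index holds fixed-length entries of sort key plus pointer, sorting touches only the index, and the sorted output is produced by following the pointers. It assumes int keys and an in-heap ByteBuffer; it is not Flink's NormalizedKeySorter.

    import java.nio.ByteBuffer;
    import java.nio.charset.StandardCharsets;
    import java.util.ArrayList;
    import java.util.List;

    public class BinarySortBufferSketch {

        private final ByteBuffer data = ByteBuffer.allocate(32 * 1024);
        private final List<long[]> index = new ArrayList<>(); // {key, offset}

        // Fill the sort buffer: serialize the record, remember key + offset.
        public void add(int key, String value) {
            int offset = data.position();
            byte[] bytes = value.getBytes(StandardCharsets.UTF_8);
            data.putInt(key);
            data.putInt(bytes.length);
            data.put(bytes);
            index.add(new long[] {key, offset});
        }

        // Sort the buffer: only the small index entries move, never the records.
        public void sortBuffer() {
            index.sort((a, b) -> Long.compare(a[0], b[0]));
        }

        // Read the sorted buffer: follow the pointers and deserialize on the fly.
        public void printSorted() {
            for (long[] entry : index) {
                int offset = (int) entry[1];
                int key = data.getInt(offset);
                int len = data.getInt(offset + 4);
                byte[] bytes = new byte[len];
                ByteBuffer view = data.duplicate();
                view.position(offset + 8);
                view.get(bytes);
                System.out.println(key + " -> " + new String(bytes, StandardCharsets.UTF_8));
            }
        }

        public static void main(String[] args) {
            BinarySortBufferSketch buffer = new BinarySortBufferSketch();
            buffer.add(42, "hello");
            buffer.add(7, "world");
            buffer.add(19, "flink");
            buffer.sortBuffer();
            buffer.printSorted();
        }
    }

Flink's real sort buffer additionally stores a fixed-length normalized key prefix next to each pointer, so most comparisons are plain byte comparisons on the index area, and both areas are built from memory segments so the whole buffer can be spilled to disk.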

Page 21: SHOW ME NUMBERS!

Page 22: Sort benchmark

• Task: Sort 10 million Tuple2<Integer, String> records
  – String length 12 chars
  – Integers uniformly distributed, Strings long-tail distributed
  – Sort on the Integer field and on the String field

• Tuple has 16 bytes of raw data, ~152 MB raw data in total

• Input provided as a mutable object iterator

• JVM with 900 MB heap size
  – Minimum size to reliably run the benchmark
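For reference, the raw-data figure follows directly from the setup: 16 bytes per record × 10,000,000 records = 160,000,000 bytes ≈ 152.6 MB (binary megabytes), matching the ~152 MB on the slide.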

Page 23: Sorting methods

1. Objects-on-Heap:
   – Put cloned data objects in an ArrayList and use Java’s collection sort.
   – The ArrayList is initialized with the right size.

2. Flink-serialized:
   – Use Flink’s custom serializers.
   – Integer with a full binary sorting key, String with an 8-byte prefix key.

3. Kryo-serialized:
   – Serialize fields with Kryo.
   – No binary sorting keys; objects are deserialized for comparison.

• All implementations use a single thread
• Average execution time of 10 runs reported
• GC triggered between runs (not included in the measured time)

Page 24: Execution time

Page 25: Garbage collection and heap usage

(charts comparing the Objects-on-heap and Flink-serialized runs)

Page 26: Memory usage

• Breakdown: Flink-serialized, Sort Integer
  – 4 bytes Integer
  – 12 bytes String
  – 4 bytes String length
  – 4 bytes pointer
  – 4 bytes Integer sorting key
  – 28 bytes * 10M records = 267 MB

                   Objects-on-heap    Flink-serialized    Kryo-serialized
  Sort Integer     approx. 700 MB     277 MB              266 MB
  Sort String      approx. 700 MB     315 MB              266 MB
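The 267 MB figure is consistent with the breakdown: 28 bytes × 10,000,000 records = 280,000,000 bytes ≈ 267 MB in binary megabytes; the measured 277 MB for the Integer sort is slightly higher, presumably due to overhead beyond the raw per-record footprint.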

Page 27: WHAT’S NEXT?

Page 28: We’re not done yet!

• Move memory segments to off-heap memory
  – Smaller JVM, lower GC pressure, easier configuration

• Table API provides full semantics for execution
  – Use code generation to operate fully on binary data

• Serialization layouts tailored towards operations
  – More efficient operations on binary data

• …

Page 29: Summary

• Active memory management avoids OutOfMemoryErrors

• Highly efficient data serialization stack
  – Facilitates operations on binary data
  – Makes more data fit into memory

• DBMS-style operators operate on binary data
  – High-performance in-memory processing
  – Graceful destaging to disk if necessary

• Read the full story:
  http://flink.apache.org/news/2015/05/11/Juggling-with-Bits-and-Bytes.html

Page 30: Apache Flink

http://flink.apache.org
@ApacheFlink