Juggling with Bits and Bytes - How Apache Flink operates on binary data



TRANSCRIPT

Page 1: Juggling with Bits and Bytes - How Apache Flink operates on binary data

Fabian Hueske
[email protected]
@fhueske

Page 2: Big Data frameworks on JVMs

• Many (open source) Big Data frameworks run on JVMs
  – Hadoop, Drill, Spark, Hive, Pig, and ...
  – Flink as well

• Common challenge: How to organize data in-memory?
  – In-memory processing (sorting, joining, aggregating)
  – In-memory caching of intermediate results

• Memory management of a system influences
  – Reliability
  – Resource efficiency, performance & performance predictability
  – Ease of configuration

Page 3: The straight-forward approach

Store and process data as objects on the heap
• Put objects in an Array and sort it (see the sketch below)

A few notable drawbacks
• Predicting memory consumption is hard
  – If you fail, an OutOfMemoryError will kill you!

• High garbage collection overhead
  – Easily 50% of time spent on GC

• Objects have space overhead
  – At least 8 bytes for each (nested) object! (Depends on arch)

Page 4: FLINK’S APPROACH

Page 5: Flink adopts DBMS technology

• Allocates fixed number of memory segments upfront
• Data objects are serialized into memory segments
• DBMS-style algorithms work on binary representation

Page 6: Why is that good?

• Memory-safe execution
  – Used and available memory segments are easy to count

• Efficient out-of-core algorithms
  – Memory segments can be efficiently written to disk

• Reduced GC pressure
  – Memory segments are never deallocated
  – Data objects are short-lived or reused

• Space-efficient data representation

• Efficient operations on binary data

Page 7: What does it cost?

• Significant implementation investment
  – Using java.util.HashMap vs. implementing a spillable hash table backed by byte arrays and a custom serialization stack

• Other systems use similar techniques
  – Apache Drill, Apache Ignite, Apache Geode

• Apache Spark plans to evolve in a similar direction

Page 8: MEMORY ALLOCATION

Page 9: Memory segments

• Unit of memory distribution in Flink
  – Fixed number allocated when a worker starts

• Backed by a regular byte array (default 32 KB)

• R/W access through Java’s efficient unsafe methods

• Multiple memory segments can be concatenated to a larger chunk of memory
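A minimal sketch of the idea behind a memory segment: a fixed-size byte array with typed read/write access at explicit offsets. This is illustrative only, not Flink's actual MemorySegment API; the real implementation uses sun.misc.Unsafe for fast access, while this sketch sticks to plain array operations for clarity.

    // Illustrative stand-in for a Flink-style memory segment.
    public class SimpleMemorySegment {
        public static final int DEFAULT_SIZE = 32 * 1024; // 32 KB, like Flink's default

        private final byte[] memory;

        public SimpleMemorySegment(int size) {
            this.memory = new byte[size];
        }

        // Write a 4-byte int at the given offset (big-endian).
        public void putInt(int offset, int value) {
            memory[offset]     = (byte) (value >>> 24);
            memory[offset + 1] = (byte) (value >>> 16);
            memory[offset + 2] = (byte) (value >>> 8);
            memory[offset + 3] = (byte) value;
        }

        // Read a 4-byte int from the given offset.
        public int getInt(int offset) {
            return ((memory[offset]     & 0xFF) << 24)
                 | ((memory[offset + 1] & 0xFF) << 16)
                 | ((memory[offset + 2] & 0xFF) << 8)
                 |  (memory[offset + 3] & 0xFF);
        }

        public int size() {
            return memory.length;
        }
    }

Because a segment is nothing more than a byte array of known size, counting used and free segments gives an exact picture of memory consumption, and a segment can be written to disk as-is when an algorithm needs to spill.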

Page 10: Memory allocation

Page 11: DATA SERIALIZATION

Page 12: Custom de/serialization stack

• Many alternatives for Java object serialization
  – Kryo, Apache Avro, Apache Thrift, Protobufs, …

• But Flink has its own serialization stack
  – Operating on serialized data requires knowledge of layout
  – Control over layout can improve efficiency of operations
  – Data types are known before execution

Page 13: Rich & extensible type system

• Serialization framework requires knowledge of types

• Flink analyzes return types of functions
  – Java: Reflection-based type analyzer
  – Scala: Compiler information

• Rich type system
  – Atomics: Primitives, Writables, Generic types, …
  – Composites: Tuples, Pojos, CaseClasses
  – Extensible by custom types
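A rough illustration of how type information drives serializer creation in Flink's Java API. The class and method names below (TypeInformation.of, TypeHint, createSerializer) are taken from later public Flink releases and may differ from the version discussed in the talk; treat this as a sketch of the idea rather than a verified snippet for that version.

    import org.apache.flink.api.common.ExecutionConfig;
    import org.apache.flink.api.common.typeinfo.TypeHint;
    import org.apache.flink.api.common.typeinfo.TypeInformation;
    import org.apache.flink.api.common.typeutils.TypeSerializer;
    import org.apache.flink.api.java.tuple.Tuple2;

    public class TypeInfoExample {
        public static void main(String[] args) {
            // Flink derives a TypeInformation object that fully describes the
            // composite type, including its nested field types.
            TypeInformation<Tuple2<Integer, String>> typeInfo =
                    TypeInformation.of(new TypeHint<Tuple2<Integer, String>>() {});

            // From the type information Flink builds a dedicated serializer
            // that knows the exact binary layout of the type.
            TypeSerializer<Tuple2<Integer, String>> serializer =
                    typeInfo.createSerializer(new ExecutionConfig());

            System.out.println(typeInfo);   // describes Tuple2<Integer, String>
            System.out.println(serializer); // a tuple serializer delegating to field serializers
        }
    }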

Page 14: Serializers & comparators

• All types have dedicated de/serializers
  – Primitives are natively serialized
  – Writables use their own serialization functions
  – Generic types use Kryo
  – …

• Serialization goes automatically through Java’s unsafe methods

• Comparators compare and hash objects
  – On the binary representation if possible

• Composite serializers and comparators delegate to the serializers and comparators of their member types

Page 15: Serializing a Tuple3<Integer, Double, Person>
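The original slide shows this layout as a figure. As a simplified illustration of the idea, field-by-field serialization with the composite serializer delegating to the serializers of its members, here is a hand-written sketch. It is not Flink's actual serializer code and glosses over details such as null handling and subclass tags; the Person fields are hypothetical.

    import java.io.ByteArrayOutputStream;
    import java.io.DataOutputStream;
    import java.io.IOException;

    // Hypothetical POJO standing in for the Person type on the slide.
    class Person {
        String name;
        int age;
        Person(String name, int age) { this.name = name; this.age = age; }
    }

    public class Tuple3LayoutSketch {

        // Simplified stand-in for a composite serializer: write each field of
        // the tuple in order, delegating nested types to their own logic.
        static byte[] serialize(int f0, double f1, Person f2) throws IOException {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);

            out.writeInt(f0);      // 4 bytes: the Integer field
            out.writeDouble(f1);   // 8 bytes: the Double field

            // "Delegate" to a Person serializer: its fields, again in order.
            out.writeUTF(f2.name); // length-prefixed String
            out.writeInt(f2.age);  // 4 bytes

            return bytes.toByteArray();
        }

        public static void main(String[] args) throws IOException {
            byte[] binary = serialize(42, 3.14, new Person("Alice", 30));
            System.out.println("serialized to " + binary.length + " bytes");
        }
    }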

Page 16: OPERATING ON BINARY DATA

Page 17: Data Processing Algorithms

• Flink’s algorithms are based on RDBMS technology
  – External Merge Sort, Hybrid Hash Join, Sort Merge Join, …

• Algorithms receive a budget of memory segments

• Operate in-memory as long as data fits into budget
  – And gracefully spill to disk if data exceeds memory

Page 18: In-Memory Sort – Fill the Sort Buffer

Page 19: In-Memory Sort – Sort the Buffer

Page 20: In-Memory Sort – Read Sorted Buffer
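Slides 18-20 show the sort buffer as figures. The sketch below is a much-simplified, single-buffer illustration of the fill/sort/read idea: serialized records sit in a data area, a separate index holds fixed-length entries of sort key plus pointer, sorting touches only the index, and the sorted output is produced by following the pointers. It assumes int keys and an in-heap ByteBuffer; it is not Flink's NormalizedKeySorter.

    import java.nio.ByteBuffer;
    import java.nio.charset.StandardCharsets;
    import java.util.ArrayList;
    import java.util.List;

    public class BinarySortBufferSketch {

        private final ByteBuffer data = ByteBuffer.allocate(32 * 1024);
        private final List<long[]> index = new ArrayList<>(); // {key, offset}

        // Fill the sort buffer: serialize the record, remember key + offset.
        public void add(int key, String value) {
            int offset = data.position();
            byte[] bytes = value.getBytes(StandardCharsets.UTF_8);
            data.putInt(key);
            data.putInt(bytes.length);
            data.put(bytes);
            index.add(new long[] {key, offset});
        }

        // Sort the buffer: only the small index entries move, never the records.
        public void sortBuffer() {
            index.sort((a, b) -> Long.compare(a[0], b[0]));
        }

        // Read the sorted buffer: follow the pointers and deserialize on the fly.
        public void printSorted() {
            for (long[] entry : index) {
                int offset = (int) entry[1];
                int key = data.getInt(offset);
                int len = data.getInt(offset + 4);
                byte[] bytes = new byte[len];
                ByteBuffer view = data.duplicate();
                view.position(offset + 8);
                view.get(bytes);
                System.out.println(key + " -> " + new String(bytes, StandardCharsets.UTF_8));
            }
        }

        public static void main(String[] args) {
            BinarySortBufferSketch buffer = new BinarySortBufferSketch();
            buffer.add(42, "hello");
            buffer.add(7, "world");
            buffer.add(19, "flink");
            buffer.sortBuffer();
            buffer.printSorted();
        }
    }

Flink's real sort buffer additionally stores a fixed-length normalized key prefix next to each pointer, so most comparisons are plain byte comparisons on the index area, and both areas are built from memory segments so the whole buffer can be spilled to disk.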

Page 21: SHOW ME NUMBERS!

Page 22: Sort benchmark

• Task: Sort 10 million Tuple2<Integer, String> records
  – String length 12 chars
  – Integers uniformly distributed, Strings long-tail distributed
  – Sort on the Integer field and on the String field

• Tuple has 16 bytes of raw data, ~152 MB raw data in total

• Input provided as a mutable object iterator

• JVM with 900 MB heap size
  – Minimum size to reliably run the benchmark
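For reference, the raw-data figure follows directly from the setup: 16 bytes per record × 10,000,000 records = 160,000,000 bytes ≈ 152.6 MB (binary megabytes), matching the ~152 MB on the slide.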

Page 23: Sorting methods

1. Objects-on-Heap:
   – Put cloned data objects in an ArrayList and use Java’s collection sort.
   – The ArrayList is initialized with the right size.

2. Flink-serialized:
   – Use Flink’s custom serializers.
   – Integer with a full binary sorting key, String with an 8-byte prefix key.

3. Kryo-serialized:
   – Serialize fields with Kryo.
   – No binary sorting keys; objects are deserialized for comparison.

• All implementations use a single thread
• Average execution time of 10 runs reported
• GC triggered between runs (not included in the measured time)

Page 24: Execution time

Page 25: Garbage collection and heap usage

(charts comparing the Objects-on-heap and Flink-serialized runs)

Page 26: Memory usage

• Breakdown: Flink-serialized, Sort Integer
  – 4 bytes Integer
  – 12 bytes String
  – 4 bytes String length
  – 4 bytes pointer
  – 4 bytes Integer sorting key
  – 28 bytes * 10M records = 267 MB

                   Objects-on-heap    Flink-serialized    Kryo-serialized
  Sort Integer     approx. 700 MB     277 MB              266 MB
  Sort String      approx. 700 MB     315 MB              266 MB
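The 267 MB figure is consistent with the breakdown: 28 bytes × 10,000,000 records = 280,000,000 bytes ≈ 267 MB in binary megabytes; the measured 277 MB for the Integer sort is slightly higher, presumably due to overhead beyond the raw per-record footprint.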

Page 27: WHAT’S NEXT?

Page 28: We’re not done yet!

• Move memory segments to off-heap memory
  – Smaller JVM, lower GC pressure, easier configuration

• Table API provides full semantics for execution
  – Use code generation to operate fully on binary data

• Serialization layouts tailored towards operations
  – More efficient operations on binary data

• …

Page 29: Summary

• Active memory management avoids OutOfMemoryErrors

• Highly efficient data serialization stack
  – Facilitates operations on binary data
  – Makes more data fit into memory

• DBMS-style operators operate on binary data
  – High-performance in-memory processing
  – Graceful destaging to disk if necessary

• Read the full story:
  http://flink.apache.org/news/2015/05/11/Juggling-with-Bits-and-Bytes.html

Page 30: Apache Flink

http://flink.apache.org
@ApacheFlink