Apache Avro

Posted on 16-Mar-2018

Overview 1

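A data serialization and data exchange system

Addresses the main drawback of Hadoop Writables: their lack of language portability

Makes it possible to share data between programs written in different languages

Language-independent schemas, written in JSON

Code generation is optional: data can be read and written using only the schema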
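
The following minimal Python sketch shows what a JSON schema plus schema-driven generic data look like in practice. The document itself contains no code. The StringPair schema, the PRIMITIVE_CHECKS table and the matches helper are hypothetical illustrations, not Avro's API. The schema is parsed with the standard json module, and a plain dict is checked against it at runtime, with no generated classes involved.

```python
import json

# Hypothetical record schema: Avro schemas are plain JSON documents.
SCHEMA_JSON = """
{
  "type": "record",
  "name": "StringPair",
  "fields": [
    {"name": "left", "type": "string"},
    {"name": "right", "type": "string"}
  ]
}
"""

def _is_int(v) -> bool:
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(v, int) and not isinstance(v, bool)

# Simplified checks for some Avro primitive types (an assumption, not Avro's API).
PRIMITIVE_CHECKS = {
    "null": lambda v: v is None,
    "boolean": lambda v: isinstance(v, bool),
    "int": lambda v: _is_int(v) and -2**31 <= v < 2**31,
    "long": lambda v: _is_int(v) and -2**63 <= v < 2**63,
    "double": lambda v: isinstance(v, float),
    "string": lambda v: isinstance(v, str),
}

def matches(schema: dict, datum: dict) -> bool:
    """Check a dict-based (generic) record against a flat record schema."""
    names = {f["name"] for f in schema["fields"]}
    if set(datum) != names:
        return False
    return all(PRIMITIVE_CHECKS[f["type"]](datum[f["name"]]) for f in schema["fields"])

schema = json.loads(SCHEMA_JSON)
print(matches(schema, {"left": "L", "right": "R"}))  # True
print(matches(schema, {"left": "L"}))                 # False
```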

Overview 2

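Supports schema evolution: data written with one schema can be read with a different but compatible schema

Data files support compression and are splittable, so they work well as MapReduce input

Rich data types and schemas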

Avro Data types and Schemas 1

null

boolean

int

long

float

double

bytes

string
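
The Avro specification defines a compact binary encoding for the primitive types. The sketch below relies only on that specification. The helper names encode_long and encode_primitive are hypothetical. Note that int and long share the same zig-zag, variable-length encoding.

```python
import struct

def zigzag(n: int) -> int:
    """Map signed to unsigned: 0, -1, 1, -2, 2 -> 0, 1, 2, 3, 4 (64-bit range)."""
    return (n << 1) ^ (n >> 63)

def encode_long(n: int) -> bytes:
    """Avro int/long: zig-zag, then 7 bits per byte, high bit = 'more bytes follow'."""
    z = zigzag(n)
    out = bytearray()
    while z > 0x7F:
        out.append((z & 0x7F) | 0x80)
        z >>= 7
    out.append(z)
    return bytes(out)

def encode_primitive(avro_type: str, value) -> bytes:
    """Binary-encode one value of an Avro primitive type."""
    if avro_type == "null":
        return b""                       # null is written as zero bytes
    if avro_type == "boolean":
        return b"\x01" if value else b"\x00"
    if avro_type in ("int", "long"):
        return encode_long(value)
    if avro_type == "float":
        return struct.pack("<f", value)  # 4 bytes, little-endian IEEE 754
    if avro_type == "double":
        return struct.pack("<d", value)  # 8 bytes, little-endian IEEE 754
    if avro_type == "bytes":
        return encode_long(len(value)) + bytes(value)
    if avro_type == "string":
        data = value.encode("utf-8")     # length prefix, then UTF-8 bytes
        return encode_long(len(data)) + data
    raise ValueError(f"not a primitive type: {avro_type}")

print(encode_long(-64).hex())              # 7f
print(encode_long(64).hex())               # 8001
print(encode_primitive("string", "foo"))   # b'\x06foo'
```

Small magnitudes, positive or negative, take a single byte. This is the reason for zig-zag encoding over plain two's complement.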

Avro Data types and Schemas 2

array

map

record

enum

fixed

union
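
Complex types are encoded by recursing over the schema. In this simplified sketch, encode is a hypothetical helper and not Avro's library. It writes each array or map as a single block. It picks a union branch by Python type, considering primitive branches only. It leaves out float, bytes and references to named types.

```python
import struct

def encode_long(n: int) -> bytes:
    """Zig-zag + variable-length encoding used for Avro int and long."""
    z = (n << 1) ^ (n >> 63)
    out = bytearray()
    while z > 0x7F:
        out.append((z & 0x7F) | 0x80)
        z >>= 7
    out.append(z)
    return bytes(out)

# Simplified union-branch selection by Python type (primitive branches only).
_PY_TYPES = {"null": type(None), "boolean": bool, "int": int,
             "long": int, "double": float, "string": str}

def _branch_matches(branch, datum) -> bool:
    name = branch if isinstance(branch, str) else branch.get("type")
    py_type = _PY_TYPES.get(name)
    if py_type is None:
        return False
    if py_type is int and isinstance(datum, bool):
        return False  # bool is a subclass of int in Python
    return isinstance(datum, py_type)

def encode(schema, datum) -> bytes:
    """Encode datum according to a parsed Avro schema (simplified sketch)."""
    if isinstance(schema, list):  # union: branch index, then the value
        for index, branch in enumerate(schema):
            if _branch_matches(branch, datum):
                return encode_long(index) + encode(branch, datum)
        raise ValueError("datum matches no union branch")
    if isinstance(schema, str):
        schema = {"type": schema}
    t = schema["type"]
    if t == "null":
        return b""
    if t == "boolean":
        return b"\x01" if datum else b"\x00"
    if t in ("int", "long"):
        return encode_long(datum)
    if t == "double":
        return struct.pack("<d", datum)
    if t == "string":
        data = datum.encode("utf-8")
        return encode_long(len(data)) + data
    if t == "record":  # field values in schema order, no names or tags
        return b"".join(encode(f["type"], datum[f["name"]]) for f in schema["fields"])
    if t == "enum":  # zero-based position of the symbol
        return encode_long(schema["symbols"].index(datum))
    if t == "fixed":  # exactly `size` raw bytes, no length prefix
        if len(datum) != schema["size"]:
            raise ValueError("wrong size for fixed")
        return bytes(datum)
    if t == "array":  # one block: item count, items; then a zero count
        body = b"".join(encode(schema["items"], item) for item in datum)
        return (encode_long(len(datum)) + body if datum else b"") + encode_long(0)
    if t == "map":  # one block of (string key, value) pairs; then a zero count
        body = b"".join(encode("string", k) + encode(schema["values"], v)
                        for k, v in datum.items())
        return (encode_long(len(datum)) + body if datum else b"") + encode_long(0)
    raise ValueError(f"unsupported type in this sketch: {t}")

print(encode({"type": "array", "items": "long"}, [3, 27]).hex())  # 04063600
print(encode(["null", "string"], "a").hex())                      # 020261
```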

Avro Data types and Schemas 3

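Generic Java mapping: records are handled as GenericRecord objects, with no code generation

Specific Java mapping: classes are generated from the schema

Reflect Java mapping: schemas are derived from existing Java classes by reflection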
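
The three mappings are Java APIs. The sketch below is a rough Python analogue of the reflect idea only. The hypothetical schema_from_dataclass helper derives a record schema from an existing class by inspecting its fields. This replaces writing the schema by hand (as with generic data) or generating classes from it (as with specific data). Mapping Python int to Avro long is an assumption of this sketch.

```python
import dataclasses
import json

# Assumed mapping from Python annotation types to Avro primitive type names.
_AVRO_NAMES = {bool: "boolean", int: "long", float: "double", str: "string", bytes: "bytes"}

def schema_from_dataclass(cls) -> dict:
    """Derive an Avro record schema from a dataclass by reflection (sketch)."""
    return {
        "type": "record",
        "name": cls.__name__,
        "fields": [{"name": f.name, "type": _AVRO_NAMES[f.type]}
                   for f in dataclasses.fields(cls)],
    }

@dataclasses.dataclass
class StringPair:  # an existing class: no schema file and no generated code
    left: str
    right: str

print(json.dumps(schema_from_dataclass(StringPair)))
```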

In-memory Serialization and Deserialization

Specific API (classes generated from the schema with avro-tools)
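
In-memory serialization writes a datum to a byte buffer and reads it back with the same schema. In Java this is roughly the job of GenericDatumWriter and GenericDatumReader, combined with a binary encoder and decoder. The Python sketch below uses hypothetical serialize and deserialize helpers and handles only flat records of int, long and string fields. It shows that the bytes carry no field names, so the schema is needed to read them back.

```python
import io

def write_long(buf: io.BytesIO, n: int) -> None:
    """Zig-zag varint encoding (Avro int/long)."""
    z = (n << 1) ^ (n >> 63)
    while z > 0x7F:
        buf.write(bytes([(z & 0x7F) | 0x80]))
        z >>= 7
    buf.write(bytes([z]))

def read_long(buf: io.BytesIO) -> int:
    """Inverse of write_long."""
    shift = z = 0
    while True:
        byte = buf.read(1)
        if not byte:
            raise EOFError("truncated varint")
        z |= (byte[0] & 0x7F) << shift
        shift += 7
        if not byte[0] & 0x80:
            return (z >> 1) ^ -(z & 1)

def serialize(schema: dict, record: dict) -> bytes:
    """Write a flat record of long/string fields to an in-memory buffer."""
    buf = io.BytesIO()
    for field in schema["fields"]:
        value = record[field["name"]]
        if field["type"] in ("int", "long"):
            write_long(buf, value)
        elif field["type"] == "string":
            data = value.encode("utf-8")
            write_long(buf, len(data))
            buf.write(data)
        else:
            raise ValueError(f"unsupported field type: {field['type']}")
    return buf.getvalue()

def deserialize(schema: dict, data: bytes) -> dict:
    """Read the record back; field names come from the schema, not the bytes."""
    buf = io.BytesIO(data)
    record = {}
    for field in schema["fields"]:
        if field["type"] in ("int", "long"):
            record[field["name"]] = read_long(buf)
        elif field["type"] == "string":
            size = read_long(buf)
            record[field["name"]] = buf.read(size).decode("utf-8")
        else:
            raise ValueError(f"unsupported field type: {field['type']}")
    return record

schema = {"type": "record", "name": "Point",
          "fields": [{"name": "label", "type": "string"}, {"name": "x", "type": "long"}]}
data = serialize(schema, {"label": "p", "x": -3})
print(data)                       # b'\x02p\x05'
print(deserialize(schema, data))  # {'label': 'p', 'x': -3}
```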

Datafiles

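Header: the schema and other metadata, plus a sync marker

Blocks of serialized Avro objects

A sync marker after each block

Everything stored in binary format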
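
Below is a minimal writer for the Avro object container file format, sketched in Python. It makes several assumptions: it uses the "null" codec (no compression), it writes all records as a single block, and it receives the records already encoded. write_container is a hypothetical helper. A real writer flushes a new block periodically and may compress each block, for example with the deflate codec.

```python
import io
import json
import os

MAGIC = b"Obj\x01"  # 'O', 'b', 'j', then format version 1

def encode_long(n: int) -> bytes:
    z = (n << 1) ^ (n >> 63)
    out = bytearray()
    while z > 0x7F:
        out.append((z & 0x7F) | 0x80)
        z >>= 7
    out.append(z)
    return bytes(out)

def encode_bytes(b: bytes) -> bytes:
    return encode_long(len(b)) + b

def write_container(out, schema: dict, encoded_records: list) -> bytes:
    """Write a header and a single data block; return the sync marker."""
    sync = os.urandom(16)  # random 16-byte marker chosen per file
    meta = {"avro.schema": json.dumps(schema).encode("utf-8"),
            "avro.codec": b"null"}  # no compression
    # Header: magic, metadata as an Avro map of bytes, sync marker.
    out.write(MAGIC)
    out.write(encode_long(len(meta)))
    for key, value in meta.items():
        out.write(encode_bytes(key.encode("utf-8")) + encode_bytes(value))
    out.write(encode_long(0))  # end of the metadata map
    out.write(sync)
    # Data block: object count, byte size, serialized objects, sync marker.
    body = b"".join(encoded_records)
    out.write(encode_long(len(encoded_records)))
    out.write(encode_long(len(body)))
    out.write(body)
    out.write(sync)
    return sync

schema = {"type": "record", "name": "StringPair",
          "fields": [{"name": "left", "type": "string"},
                     {"name": "right", "type": "string"}]}
# Records pre-encoded as two length-prefixed UTF-8 strings each.
records = [encode_bytes(b"a") + encode_bytes(b"1"), encode_bytes(b"b") + encode_bytes(b"2")]
buf = io.BytesIO()
sync = write_container(buf, schema, records)
print(buf.getvalue()[:4])  # b'Obj\x01'
```

Because the sync marker is repeated after every block, a reader that starts in the middle of a file can scan for it. This is what makes data files splittable.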

Portability

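Schema resolution: data can be read with a reader schema that differs from the writer's schema (projection reads only a subset of the fields)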
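
Schema resolution can be illustrated on records that have already been decoded. The hypothetical resolve helper below keeps fields present in both schemas. It fills fields that only the reader has from their default, and it drops writer-only fields (projection). Real Avro resolves schemas while decoding and also handles type promotion, aliases and nested types. This sketch ignores all of these.

```python
def resolve(writer_schema: dict, reader_schema: dict, datum: dict) -> dict:
    """Project a record written with writer_schema onto reader_schema."""
    writer_fields = {f["name"] for f in writer_schema["fields"]}
    result = {}
    for field in reader_schema["fields"]:
        name = field["name"]
        if name in writer_fields:
            result[name] = datum[name]         # present in both: keep value
        elif "default" in field:
            result[name] = field["default"]    # new in reader: use default
        else:
            raise ValueError(f"no value and no default for field {name!r}")
    return result  # writer-only fields are dropped (projection)

writer = {"type": "record", "name": "StringPair",
          "fields": [{"name": "left", "type": "string"},
                     {"name": "right", "type": "string"}]}
projection = {"type": "record", "name": "StringPair",
              "fields": [{"name": "right", "type": "string"}]}
evolved = {"type": "record", "name": "StringPair",
           "fields": [{"name": "left", "type": "string"},
                      {"name": "right", "type": "string"},
                      {"name": "description", "type": "string", "default": ""}]}

datum = {"left": "L", "right": "R"}
print(resolve(writer, projection, datum))  # {'right': 'R'}
print(resolve(writer, evolved, datum))     # {'left': 'L', 'right': 'R', 'description': ''}
```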

Sort Order

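Every Avro type except map has a defined sort order; records are compared field by field, and each field's "order" attribute (ascending, descending or ignore) controls its part in the comparison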

Comparisons can run directly on the binary-encoded byte streams, without deserializing the objects
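
The sketch below compares two binary-encoded records without building objects. It assumes flat records of int, long and string fields, and compare_binary is a hypothetical helper. It follows the specification's rules: records compare field by field and honour each field's order attribute. Strings are compared as raw UTF-8 bytes, which gives the same result as Unicode code-point order.

```python
def _read_long(data: bytes, pos: int):
    """Decode one zig-zag varint starting at pos; return (value, new_pos)."""
    shift = z = 0
    while True:
        byte = data[pos]
        pos += 1
        z |= (byte & 0x7F) << shift
        shift += 7
        if not byte & 0x80:
            return (z >> 1) ^ -(z & 1), pos

def _read_field(avro_type: str, data: bytes, pos: int):
    if avro_type in ("int", "long"):
        return _read_long(data, pos)
    if avro_type == "string":  # compare raw UTF-8 bytes, no decoding
        size, pos = _read_long(data, pos)
        return data[pos:pos + size], pos + size
    raise ValueError(f"unsupported type in this sketch: {avro_type}")

def compare_binary(schema: dict, a: bytes, b: bytes) -> int:
    """Compare two binary-encoded records field by field: -1, 0 or 1."""
    pos_a = pos_b = 0
    for field in schema["fields"]:
        # Every field is read, even ignored ones, to advance both positions.
        value_a, pos_a = _read_field(field["type"], a, pos_a)
        value_b, pos_b = _read_field(field["type"], b, pos_b)
        order = field.get("order", "ascending")
        if order == "ignore" or value_a == value_b:
            continue
        result = -1 if value_a < value_b else 1
        return -result if order == "descending" else result
    return 0

schema = {"type": "record", "name": "StringPair",
          "fields": [{"name": "left", "type": "string"},
                     {"name": "right", "type": "string", "order": "descending"}]}
# "\x02" is the zig-zag encoded length 1 of each one-character string.
print(compare_binary(schema, b"\x02a\x02x", b"\x02b\x02x"))  # -1 (left ascending)
print(compare_binary(schema, b"\x02a\x02x", b"\x02a\x02y"))  # 1 (right descending)
```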

Avro MapReduce

Avro provides several APIs for running MapReduce jobs on Avro data
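
The following single-process Python simulation shows the shape of such a job without Hadoop. run_mapreduce, the max-temperature mapper and reducer, and the sample records are all hypothetical made-up illustrations, not Avro's actual MapReduce classes. They only show generic records flowing through the map, group-by-key and reduce steps.

```python
from collections import defaultdict

def run_mapreduce(records, mapper, reducer):
    """Single-process simulation: map, group and sort by key, then reduce."""
    groups = defaultdict(list)
    for record in records:                     # map phase
        for key, value in mapper(record):
            groups[key].append(value)
    return [reducer(key, groups[key])          # reduce phase, keys in order
            for key in sorted(groups)]

# Hypothetical job: maximum temperature per year over generic (dict) records.
def max_temp_mapper(record):
    yield record["year"], record["temperature"]

def max_temp_reducer(year, temperatures):
    return {"year": year, "temperature": max(temperatures)}

records = [  # made-up sample values
    {"year": 1950, "temperature": 0},
    {"year": 1949, "temperature": 111},
    {"year": 1950, "temperature": 22},
    {"year": 1949, "temperature": 78},
]
print(run_mapreduce(records, max_temp_mapper, max_temp_reducer))
# [{'year': 1949, 'temperature': 111}, {'year': 1950, 'temperature': 22}]
```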

Avro Sorting MapReduce
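
A sort job can use the schema's sort order as the shuffle comparator. In this design, the map step emits each record as the key, and the reduce step writes the keys out in order. The single-process sketch below works on decoded dict records. compare_records and sort_job are hypothetical names, not Avro's API. In a real job, the sorting happens in the distributed shuffle rather than in one list.

```python
import functools

def compare_records(schema: dict, a: dict, b: dict) -> int:
    """Avro-style record order on decoded records, honouring each field's 'order'."""
    for field in schema["fields"]:
        order = field.get("order", "ascending")
        value_a, value_b = a[field["name"]], b[field["name"]]
        if order == "ignore" or value_a == value_b:
            continue
        result = -1 if value_a < value_b else 1
        return -result if order == "descending" else result
    return 0

def sort_job(schema: dict, records: list) -> list:
    """Simulated sort job: the record itself is the map output key."""
    pairs = [(record, None) for record in records]          # map: (key, unused value)
    key = functools.cmp_to_key(lambda x, y: compare_records(schema, x[0], y[0]))
    pairs.sort(key=key)                                     # shuffle: sort by key
    return [record for record, _ in pairs]                  # reduce: emit keys in order

schema = {"type": "record", "name": "StringPair",
          "fields": [{"name": "left", "type": "string"},
                     {"name": "right", "type": "string", "order": "descending"}]}
records = [{"left": "b", "right": "x"}, {"left": "a", "right": "x"}, {"left": "a", "right": "y"}]
print(sort_job(schema, records))
# [{'left': 'a', 'right': 'y'}, {'left': 'a', 'right': 'x'}, {'left': 'b', 'right': 'x'}]
```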
