apache storm concepts

Post on 15-Jul-2015

101 Views

Category:

Documents

5 Downloads

Preview:

Click to see full reader

TRANSCRIPT

Apache Storm ConceptsAndré Dias

andre.dias@summa.com.br

Highlights

● Background● Storm’s History● Concepts● Integrations● Real Cases● Q&A

Background

BackgroundBig Data V’s

BackgroundBig Data V’s

VolumePetabytes / Terabytes

VelocityReal-time / Near Real-time

VarietySensors, Blog Posts, Logs, Social Networks...

BackgroundData Streaming

BackgroundData Streaming

BackgroundData Streaming

Now Please… I’m in traffic…

● Discovery● Ingest● Process● Persist● Analyze● Expose

BackgroundData Value Chain

● Discovery● Ingest● Process● Persist● Analyze● Expose

BackgroundData Value Chain (Storm Focus)

BackgroundData Value Chain

● Discovery● Ingest● Process● Persist● Analyze● Expose

BackgroundData Value Chain (Ingest)

Stream ProcessingData in Motion

Batch Processing

● Data processing architecture

○ Generic

○ Scalable

○ Fault-tolerant (human/hardware)

● Low latency

BackgroundLambda Architecture (LA)

BackgroundLambda Architecture (LA)

Storm’s HistoryNathan Marz

● Backtype (2008), acquired by Twitter (2011)

● Lambda Architecture Creator

● September, 2011. Storm Creation

● September, 2013. Storm entered to ASF

Why Use It?● Scalable, as Hadoop

● No data loss, reliable

● Fault tolerant

● Language agnostic

● Real-time, real fast

Why Use It?

Storm is TLP !!

Concepts

Concepts

● Tuple● Streams● Spouts● Bolts● Topologies / Trident API● Stream Groupings

ConceptsSpouts

● Source of Streams

● Data Consumers (Ingestion)

● Emits Tuples

ConceptsBolts

● Units of Work to tuples

● Data streaming logic

● Can emit tuples as well

● Data store integration

ConceptsTopology

● Data Streaming Flow Representation

● DAG (Direct Acyclic Graph) of Spouts and

Bolts

● Streaming computation

● Each node as individual task (parallel

execution)

● Stateless

ConceptsTrident API

● Abstraction Layer over low-level Storm API

● More Complex Topologies

● Stateful

● Micro-batch

● High-level API (similar to Pig / Cascading -

Hadoop)

● Message processed at least once

(guaranteed)

ConceptsTrident API - Micro Batch

● Trident Batches

○ are Ordered

ConceptsTrident API - Micro Batch

● Trident Batches

○ can be Partitioned

ConceptsStreaming Groups

● Data Flow Control over

Topologies

Architecture

ArchitectureComponents - Nimbus

● Master node (similar to JobTracker)

● Monitor and distribute the processing

workload across worker nodes

● Stores all its data into Zookeeper

ArchitectureComponents - Supervisor

● Worker node (similar to TaskTracker)

● Monitor and distribute the processing

workload across worker nodes

● Stores all its data into Zookeeper

ArchitectureOverview

● Master-slave approach

● Cluster coordination

(Zookeeper)

● Nimbus HA

Integrations

Real Cases

Collector sensor information to a

Data Lake

Micro-batch user contents, content feeds

and application logs

Real-time user music

recommendations

Q&A

THANK YOU!Use Storm

top related