An Architect's Guide to Real Time Big Data Systems

An Architect's Guide to Building Real Time Big Data Systems, Raja SP, Lead Architect & Head of Products, 10 July 2014, Singapore

DESCRIPTION

Introduction to real time big data and stream computing using IBM InfoSphere Streams and Apache Storm. Presented at a Big Data conference in Singapore, July 2014.

TRANSCRIPT

Page 1: An Architect's guide to real time big data systems

An Architect's Guide to Building Real Time Big Data Systems

Raja SP

10 July 2014, Singapore

Lead Architect & Head of Products

Page 2

< Real Time > Big Data

WHY WHAT HOW

Page 3

< Real Time > Big Data

WHY WHAT HOW

Page 4

What is the right time to shoot me?

Page 5

There is a rhythm in the universe

Page 6

Telecom Marketing Scenario

Cell Utilisation is Low

In a Geo-Fence

High Balance

Frequent Visitor

High Data User in the Past
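The scenario joins live network signals (cell utilisation, location inside a geo-fence) with stored subscriber history (balance, visit frequency, past data usage). A minimal Java sketch of such a rule follows; the class and field names and every threshold are hypothetical, chosen only to make the five conditions concrete.

```java
import java.util.Map;

// Hypothetical rule for the telecom scenario: all names and thresholds are illustrative.
class OfferRule {
    // Real-time signal from the network
    static final class CellEvent {
        final String subscriberId;
        final double cellUtilisation; // fraction of cell capacity in use, 0.0 to 1.0
        final boolean insideGeoFence;

        CellEvent(String subscriberId, double cellUtilisation, boolean insideGeoFence) {
            this.subscriberId = subscriberId;
            this.cellUtilisation = cellUtilisation;
            this.insideGeoFence = insideGeoFence;
        }
    }

    // Historical data at rest about the subscriber
    static final class Profile {
        final double balance;
        final int visitsLast30Days;
        final double avgDailyDataMb;

        Profile(double balance, int visitsLast30Days, double avgDailyDataMb) {
            this.balance = balance;
            this.visitsLast30Days = visitsLast30Days;
            this.avgDailyDataMb = avgDailyDataMb;
        }
    }

    static final double LOW_UTILISATION = 0.3;
    static final double HIGH_BALANCE = 20.0;
    static final int FREQUENT_VISITS = 8;
    static final double HIGH_DATA_MB = 500.0;

    // Fire the offer only when the live event and the stored profile both qualify
    static boolean shouldOffer(CellEvent event, Map<String, Profile> profiles) {
        Profile p = profiles.get(event.subscriberId);
        if (p == null) {
            return false; // no history for this subscriber
        }
        return event.cellUtilisation < LOW_UTILISATION
                && event.insideGeoFence
                && p.balance > HIGH_BALANCE
                && p.visitsLast30Days >= FREQUENT_VISITS
                && p.avgDailyDataMb > HIGH_DATA_MB;
    }

    public static void main(String[] args) {
        Map<String, Profile> profiles = Map.of("sub-1", new Profile(50.0, 12, 800.0));
        System.out.println(shouldOffer(new CellEvent("sub-1", 0.2, true), profiles)); // true
        System.out.println(shouldOffer(new CellEvent("sub-1", 0.9, true), profiles)); // false: busy cell
    }
}
```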

Page 7

What is out there?

Square Kilometers of Arrays

Tens of Thousands of Antennae

Terabits of Data

Page 8

Security / Intelligence

Page 9

< Real Time > Big Data

WHY WHAT HOW

Page 10

Partitioned Parallel Processing

[Figure: three identical TASKs run side by side, each on its own partition of the data (DATA i, DATA j, DATA k).]
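Partitioned parallelism can be sketched in plain Java with a thread pool: the same task (here a hypothetical sum) runs on disjoint slices of the data, and the partial results are combined at the end. This illustrates the idea only; it is not code from either stream product.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Partitioned parallelism: identical tasks run concurrently on disjoint data partitions.
class PartitionedSum {
    // The task: the same code is applied to every partition
    static long sumSlice(int[] data, int from, int to) {
        long sum = 0;
        for (int i = from; i < to; i++) {
            sum += data[i];
        }
        return sum;
    }

    static long parallelSum(int[] data, int partitions) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(partitions);
        try {
            int chunk = (data.length + partitions - 1) / partitions; // ceiling division
            List<Future<Long>> partials = new ArrayList<>();
            for (int p = 0; p < partitions; p++) {
                final int from = Math.min(p * chunk, data.length);
                final int to = Math.min(from + chunk, data.length);
                partials.add(pool.submit(() -> sumSlice(data, from, to))); // one partition per task
            }
            long total = 0;
            for (Future<Long> f : partials) {
                total += f.get(); // combine the partial results
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        int[] data = new int[1000];
        for (int i = 0; i < data.length; i++) {
            data[i] = i + 1;
        }
        System.out.println(parallelSum(data, 3)); // 500500
    }
}
```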

Pipelined Parallel Processing

[Figure: DATA flows through a chain of tasks, TASK i → TASK j → TASK k.]
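Pipelined parallelism can be sketched with one thread per task and a queue between consecutive tasks, so every stage can work on a different item at the same time. The three stage functions (trim, upper-case, append "!") are placeholders for TASK i, TASK j and TASK k, and the end-of-stream marker is an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.UnaryOperator;

// Pipelined parallelism: each task runs in its own thread and hands items on through a queue.
class Pipeline {
    // End-of-stream marker, compared by reference so no real input can collide with it
    private static final String END = new String("END");

    // Start one stage: take from `in`, apply `task`, put the result on `out`
    static Thread stage(BlockingQueue<String> in, BlockingQueue<String> out, UnaryOperator<String> task) {
        Thread t = new Thread(() -> {
            try {
                while (true) {
                    String item = in.take();
                    if (item == END) {
                        out.put(END); // pass the marker downstream and stop
                        return;
                    }
                    out.put(task.apply(item));
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        t.start();
        return t;
    }

    static List<String> run(List<String> input) throws InterruptedException {
        BlockingQueue<String> q0 = new LinkedBlockingQueue<>();
        BlockingQueue<String> q1 = new LinkedBlockingQueue<>();
        BlockingQueue<String> q2 = new LinkedBlockingQueue<>();
        BlockingQueue<String> q3 = new LinkedBlockingQueue<>();
        stage(q0, q1, String::trim);        // TASK i
        stage(q1, q2, String::toUpperCase); // TASK j
        stage(q2, q3, s -> s + "!");        // TASK k
        for (String s : input) {
            q0.put(s);
        }
        q0.put(END);
        List<String> results = new ArrayList<>();
        for (String s = q3.take(); s != END; s = q3.take()) {
            results.add(s);
        }
        return results;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(run(List.of(" data-1 ", "data-2 "))); // [DATA-1!, DATA-2!]
    }
}
```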

Hybrid Parallel Processing

[Figure: DATA flows through a graph of tasks (TASK i, j, k, l, m) that mixes pipelined stages with parallel branches.]

Page 11

Should Data go to Tasks?

Or

Tasks go to Data?

Page 12

[Figure: a single DATA set with many TASKs brought to it.]

Static Data / Data at Rest

[Figure: a sequence of DATA items flowing past a single TASK.]

Streaming Data / Data in Motion
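The contrast can be made concrete with a small, hypothetical example: for data at rest the question is asked after the data is stored, so each new query scans it again; for data in motion the question is registered first and a resident task updates its answer as each item passes, without keeping the items.

```java
import java.util.List;
import java.util.function.IntConsumer;

// Hypothetical illustration of data at rest versus data in motion.
class RestVsMotion {
    // Data at rest: the data set is stored first; each question is a new task run over all of it
    static int countAbove(List<Integer> stored, int threshold) {
        int n = 0;
        for (int v : stored) {
            if (v > threshold) {
                n++;
            }
        }
        return n;
    }

    // Data in motion: the question is fixed up front and the task stays resident;
    // each item is examined once, as it arrives, and only a running result is kept
    static final class StandingQuery implements IntConsumer {
        private final int threshold;
        private int count;

        StandingQuery(int threshold) {
            this.threshold = threshold;
        }

        @Override
        public void accept(int value) {
            if (value > threshold) {
                count++;
            }
        }

        int count() {
            return count;
        }
    }

    public static void main(String[] args) {
        System.out.println(countAbove(List.of(5, 12, 7, 30), 10)); // 2
        StandingQuery query = new StandingQuery(10);
        for (int v : new int[] {5, 12, 7, 30}) {
            query.accept(v); // simulated arrivals
        }
        System.out.println(query.count()); // 2
    }
}
```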

Page 13

Streaming Data / Data in Motion Analytics

Page 14

The classic “Word Count” (Stream Computing Version)

[Figure: a Token Splitter breaks input sentences into words (Java, Python, Lisp, C++), three Counter operators keep per-word counts, and a Sink receives the results: Java 2, Python 2, Lisp 1, C++ 1.]
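A single-process Java simulation of this word-count graph is sketched below, assuming two input sentences built from the words on the slide (the grouping into sentences is an assumption). Each of the three counters owns a disjoint set of words because words are routed by hash; the sink here gathers final totals, whereas a real stream application would emit running counts as tuples arrive.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Single-process simulation of Token Splitter -> Counter (x3) -> Sink.
class StreamWordCount {
    // Token Splitter: one sentence tuple in, one word tuple per token out
    static List<String> split(String sentence) {
        List<String> words = new ArrayList<>();
        for (String w : sentence.trim().split("\\s+")) {
            if (!w.isEmpty()) {
                words.add(w);
            }
        }
        return words;
    }

    static Map<String, Integer> run(List<String> sentences, int numCounters) {
        // Each Counter keeps private state; hashing the word picks the counter,
        // so all occurrences of one word are counted in the same place
        List<Map<String, Integer>> counters = new ArrayList<>();
        for (int i = 0; i < numCounters; i++) {
            counters.add(new HashMap<>());
        }
        for (String sentence : sentences) {
            for (String word : split(sentence)) {
                int target = Math.floorMod(word.hashCode(), numCounters);
                counters.get(target).merge(word, 1, Integer::sum);
            }
        }
        // Sink: gather every counter's (word, count) pairs, sorted for display
        Map<String, Integer> totals = new TreeMap<>();
        for (Map<String, Integer> counter : counters) {
            totals.putAll(counter);
        }
        return totals;
    }

    public static void main(String[] args) {
        System.out.println(run(List.of("Java Python Lisp", "Python Java C++"), 3));
        // {C++=1, Java=2, Lisp=1, Python=2}
    }
}
```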

Page 15

Stream Computing Programming Constructs

[Figure: the Page 14 word-count graph, annotated to show a stream, a tuple, and an operator (called a bolt in Apache Storm).]

Page 16

IBM InfoSphere Streams ↔ Apache Storm

Operator ↔ Bolt

Source Operator ↔ Spout

Sink Operator ↔ (no dedicated construct; a bolt that emits nothing)

Composite ↔ Topology
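To make the mapping concrete, here is a hypothetical in-memory Java API with one interface per construct; it is not the API of either product. A composite (topology) is reduced to wiring a source, a chain of operators, and a sink.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical mini stream API mirroring the constructs in the table above.
class MiniTopology {
    interface Source { void run(Consumer<String> emit); }                     // source operator / spout
    interface Operator { void process(String tuple, Consumer<String> emit); } // operator / bolt
    interface Sink { void accept(String tuple); }                            // sink (a terminal bolt in Storm)

    // Composite / topology: wire source -> operators -> sink as a linear chain
    static void run(Source source, List<Operator> operators, Sink sink) {
        Consumer<String> downstream = sink::accept;
        // Build the chain back to front so each operator emits into the next one
        for (int i = operators.size() - 1; i >= 0; i--) {
            Operator op = operators.get(i);
            Consumer<String> next = downstream;
            downstream = tuple -> op.process(tuple, next);
        }
        source.run(downstream);
    }

    public static void main(String[] args) {
        List<String> out = new ArrayList<>();
        List<Operator> ops = List.of(
                (tuple, emit) -> { for (String w : tuple.split(" ")) emit.accept(w); }, // splitter
                (tuple, emit) -> emit.accept(tuple.toUpperCase()));                   // transformer
        run(emit -> { emit.accept("Java Python"); emit.accept("Lisp"); }, ops, out::add);
        System.out.println(out); // [JAVA, PYTHON, LISP]
    }
}
```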

Page 17

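composite WordCountApp {
  graph
    // "sentences.txt" is a placeholder input file, read one line per tuple
    stream<rstring sentence> Sentence = FileSource() {
      param file : "sentences.txt"; format : line;
    }
    // Split and Count: assumed user-defined operators (the standard SPL Split routes tuples, it does not tokenize)
    stream<rstring word> Word = Split(Sentence) {}
    stream<rstring word, int32 count> Counts = Count(Word) {}
}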

IBM InfoSphere Streams: operators Source → Split → Count, connected by the streams Sentence and Word; Count emits the Counts stream.

Page 18

Apache Storm

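import org.apache.storm.topology.TopologyBuilder; // backtype.storm.topology in 2014-era (0.9.x) releases
import org.apache.storm.tuple.Fields;
// RandomSentenceSpout, SplitSentence and WordCount are the storm-starter example classes
TopologyBuilder builder = new TopologyBuilder();
// Spout with 5 executors emitting sentences
builder.setSpout("Source", new RandomSentenceSpout(), 5);
// 8 splitter executors; shuffle grouping spreads sentences evenly across them
builder.setBolt("Split", new SplitSentence(), 8).shuffleGrouping("Source");
// 12 counter executors; fields grouping sends every tuple with the same "word" to the same task
builder.setBolt("Count", new WordCount(), 12).fieldsGrouping("Split", new Fields("word"));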

Source → Split → Count
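The two groupings in the Storm code decide which task receives each tuple. The sketch below contrasts them under simplifying assumptions (round-robin in place of Storm's randomized shuffle, `String.hashCode` modulo the task count in place of Storm's own field hashing): with shuffle routing the same word can reach several Count tasks, which would split its count, while fields routing keeps it on one task.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Simplified routing policies; Storm's actual implementations differ in detail.
class Groupings {
    // Shuffle-style: spread tuples evenly regardless of content (round-robin here)
    static int[] shuffleRoute(List<String> words, int tasks) {
        int[] target = new int[words.size()];
        for (int i = 0; i < words.size(); i++) {
            target[i] = i % tasks;
        }
        return target;
    }

    // Fields-style: pick the task from a hash of the grouping field ("word")
    static int[] fieldsRoute(List<String> words, int tasks) {
        int[] target = new int[words.size()];
        for (int i = 0; i < words.size(); i++) {
            target[i] = Math.floorMod(words.get(i).hashCode(), tasks);
        }
        return target;
    }

    // Number of distinct tasks that received at least one copy of `word`
    static int tasksSeeing(String word, List<String> words, int[] target) {
        Set<Integer> tasks = new HashSet<>();
        for (int i = 0; i < words.size(); i++) {
            if (words.get(i).equals(word)) {
                tasks.add(target[i]);
            }
        }
        return tasks.size();
    }

    public static void main(String[] args) {
        List<String> words = List.of("Java", "Python", "Java", "Lisp", "Java");
        int countTasks = 12; // parallelism of the Count bolt above
        System.out.println(tasksSeeing("Java", words, shuffleRoute(words, countTasks))); // 3
        System.out.println(tasksSeeing("Java", words, fieldsRoute(words, countTasks)));  // 1
    }
}
```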

Page 19

IBM InfoSphere Streams – Some Operators

Functor: Perform tuple-level manipulations (~250 functions)

Filter: Remove some tuples from a stream

Aggregate: Group and summarize incoming tuples

Sort: Impose an order on incoming tuples in a stream

Join: Correlate two streams

Punctor: Insert window punctuation markers into a stream
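Purely as a mental model, Filter, Functor and Aggregate correspond to familiar collection operations. The sketch below uses `java.util.stream` on a bounded list of hypothetical temperature readings; it is an analogy, not SPL, and an SPL Aggregate over an unbounded stream would additionally need a window.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Analogy only: java.util.stream over a bounded list, standing in for three SPL operators.
class OperatorAnalogues {
    static final class Reading {
        final String sensor;
        final double value;

        Reading(String sensor, double value) {
            this.sensor = sensor;
            this.value = value;
        }
    }

    // Average temperature per sensor in Fahrenheit, skipping physically impossible readings
    static Map<String, Double> averageFahrenheit(List<Reading> celsiusReadings) {
        return celsiusReadings.stream()
                .filter(r -> r.value >= -273.15)                         // like Filter: drop bad tuples
                .map(r -> new Reading(r.sensor, r.value * 9 / 5 + 32))    // like Functor: per-tuple transform
                .collect(Collectors.groupingBy(r -> r.sensor, TreeMap::new, // like Aggregate: group and summarize
                        Collectors.averagingDouble(r -> r.value)));
    }

    public static void main(String[] args) {
        System.out.println(averageFahrenheit(List.of(
                new Reading("a", 0.0), new Reading("a", 100.0),
                new Reading("b", 10.0), new Reading("a", -300.0)))); // {a=122.0, b=50.0}
    }
}
```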

Page 20

IBM InfoSphere Streams – Some Operators (continued)

Barrier: Synchronize tuples from sequence-correlated streams

Pair: Group tuples from multiple streams of the same type

Split: Forward tuples to output streams based on a predicate

ThreadedSplit: Distribute tuples over output streams by availability

Union: Construct an output tuple from each input tuple

DeDuplicate: Suppress duplicate tuples seen within a given time period
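One way DeDuplicate-style suppression can work is sketched below: remember each key's first arrival time and drop repeats until the period expires. This is a hypothetical implementation, assuming non-decreasing timestamps and that suppressed duplicates do not extend the period; it is not the operator's actual code.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.Set;

// Hypothetical time-based de-duplication, not the SPL operator's implementation.
class TimedDeduplicator {
    private final long windowMillis;
    private final Set<String> active = new HashSet<>();          // keys currently suppressed
    private final Deque<String> keysByTime = new ArrayDeque<>();  // keys in arrival order
    private final Deque<Long> arrivalTimes = new ArrayDeque<>();  // matching first-seen times

    TimedDeduplicator(long windowMillis) {
        this.windowMillis = windowMillis;
    }

    // Returns true if the tuple should be forwarded, false if it is a duplicate.
    // Assumes timestamps never decrease; duplicates do not extend the period.
    boolean accept(String key, long timestampMillis) {
        while (!arrivalTimes.isEmpty() && arrivalTimes.peekFirst() + windowMillis <= timestampMillis) {
            arrivalTimes.pollFirst();
            active.remove(keysByTime.pollFirst()); // suppression period over
        }
        if (!active.add(key)) {
            return false; // same key already seen within the period
        }
        keysByTime.addLast(key);
        arrivalTimes.addLast(timestampMillis);
        return true;
    }

    public static void main(String[] args) {
        TimedDeduplicator d = new TimedDeduplicator(1000);
        System.out.println(d.accept("a", 0));    // true
        System.out.println(d.accept("a", 500));  // false: within 1000 ms
        System.out.println(d.accept("a", 1000)); // true: period expired
    }
}
```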

Page 21

[Figure: a stream of DATA tuples with a window marked over part of it; windowed operators such as Aggregate, Sort and Join work on the tuples inside the window.]
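Because a stream never ends, operators such as Aggregate summarize a window rather than the whole input. A minimal count-based sliding window is sketched below; the window size and the averaging function are illustrative choices, and time-based or tumbling windows follow the same pattern with a different eviction rule.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Count-based sliding window: the aggregate always covers the most recent `size` tuples.
class SlidingAverage {
    private final int size;
    private final Deque<Double> window = new ArrayDeque<>();
    private double sum; // running sum of the values currently in the window

    SlidingAverage(int size) {
        if (size <= 0) {
            throw new IllegalArgumentException("window size must be positive");
        }
        this.size = size;
    }

    // Add one tuple's value and return the average over the current window
    double add(double value) {
        window.addLast(value);
        sum += value;
        if (window.size() > size) {
            sum -= window.removeFirst(); // evict the oldest tuple
        }
        return sum / window.size();
    }

    public static void main(String[] args) {
        SlidingAverage avg = new SlidingAverage(3);
        for (double v : new double[] {1, 2, 3, 4}) {
            System.out.println(avg.add(v)); // 1.0, 1.5, 2.0, 3.0
        }
    }
}
```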

Page 22

< Real Time > Big Data

WHY WHAT HOW

Page 23

Streams Application Development Method

Page 24

Runtime Components

IBM InfoSphere Streams: an instance made up of a management host and application hosts.

Apache Storm: a cluster made up of a Nimbus node, ZooKeeper, and worker nodes (Node 1, Node 2, Node 3).

Page 25

Application Deployment Units

IBM InfoSphere Streams: an instance with a management host and application hosts; Application Host 1 runs Processing Element 1 and Processing Element 2.

Apache Storm: a cluster with a management node (Nimbus), a ZooKeeper node and worker nodes; Node 1 runs Worker 1 and Worker 2, which host executors.
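In Storm, the parallelism hint in the Page 18 code sets the number of executors per component (5 + 8 + 12 = 25), and the topology configuration sets how many worker processes share them. The sketch below spreads executors over two workers round-robin; it is a simplification for illustration, not Storm's scheduler code.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Simplified even placement of executors onto worker processes.
class ExecutorPlacement {
    // parallelism: component name -> number of executors
    static List<List<String>> place(Map<String, Integer> parallelism, int workers) {
        List<List<String>> assigned = new ArrayList<>();
        for (int w = 0; w < workers; w++) {
            assigned.add(new ArrayList<>());
        }
        int next = 0;
        for (Map.Entry<String, Integer> component : parallelism.entrySet()) {
            for (int i = 1; i <= component.getValue(); i++) {
                assigned.get(next).add(component.getKey() + "-" + i); // executor id
                next = (next + 1) % workers;                          // round-robin over workers
            }
        }
        return assigned;
    }

    public static void main(String[] args) {
        Map<String, Integer> parallelism = new LinkedHashMap<>();
        parallelism.put("Source", 5);
        parallelism.put("Split", 8);
        parallelism.put("Count", 12);
        List<List<String>> workers = place(parallelism, 2);
        for (int w = 0; w < workers.size(); w++) {
            System.out.println("Worker " + (w + 1) + ": " + workers.get(w).size() + " executors");
        }
        // Worker 1: 13 executors
        // Worker 2: 12 executors
    }
}
```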

Page 26

High Availability & Adaptability

An optimizing scheduler assigns jobs to nodes and continually manages resource allocation

Apache Storm / IBM InfoSphere Streams
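A greedy least-loaded placement is one simple policy such a scheduler could use; the sketch below is hypothetical, covers only initial placement (no rebalancing), and is much simpler than either product's scheduler. Because nodes can be registered at any time, it also shows new nodes starting to receive work as soon as they join.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical greedy scheduler: each new job goes to the node with the least assigned load.
class LeastLoadedScheduler {
    private final Map<String, Integer> load = new LinkedHashMap<>(); // node -> assigned cost

    // Nodes may join at any time; a new node starts with no load
    void addNode(String node) {
        load.putIfAbsent(node, 0);
    }

    // Place a job of the given cost; ties go to the node registered first
    String assign(int cost) {
        String best = null;
        for (Map.Entry<String, Integer> e : load.entrySet()) {
            if (best == null || e.getValue() < load.get(best)) {
                best = e.getKey();
            }
        }
        if (best == null) {
            throw new IllegalStateException("no nodes available");
        }
        load.merge(best, cost, Integer::sum);
        return best;
    }

    int loadOf(String node) {
        return load.getOrDefault(node, 0);
    }

    public static void main(String[] args) {
        LeastLoadedScheduler s = new LeastLoadedScheduler();
        s.addNode("node1");
        s.addNode("node2");
        System.out.println(s.assign(5)); // node1
        System.out.println(s.assign(3)); // node2
        System.out.println(s.assign(1)); // node2 (load 3 < 5)
        s.addNode("node3");              // added while jobs are running
        System.out.println(s.assign(2)); // node3
    }
}
```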

Page 27

High Availability & Adaptability

Apache Storm / IBM InfoSphere Streams

Dynamically add Nodes and Jobs

Page 28

High Availability & Adaptability

Apache Storm / IBM InfoSphere Streams

Execution units on failed nodes can be moved automatically, with communications re-routed
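The bookkeeping side of failover can be sketched as a routing table from execution unit to node: when a node fails, its units are reassigned to surviving nodes, and senders that consult the updated table reach them at their new location. This hypothetical sketch ignores restarting processes and recovering in-flight tuples, which it does not attempt to model.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical failover bookkeeping: where each execution unit runs, and how that changes on failure.
class Failover {
    // Routing table: execution unit -> hosting node (sorted so reassignment is deterministic)
    final Map<String, String> placement = new TreeMap<>();

    // Move every unit from the failed node onto survivors in round-robin order.
    // Senders that look up this table afterwards reach the unit at its new node.
    void handleFailure(String failedNode, List<String> survivors) {
        if (survivors.isEmpty()) {
            throw new IllegalStateException("no surviving nodes");
        }
        int next = 0;
        for (Map.Entry<String, String> e : placement.entrySet()) {
            if (e.getValue().equals(failedNode)) {
                e.setValue(survivors.get(next));
                next = (next + 1) % survivors.size();
            }
        }
    }

    public static void main(String[] args) {
        Failover f = new Failover();
        f.placement.put("pe1", "node1");
        f.placement.put("pe2", "node2");
        f.placement.put("pe3", "node2");
        f.placement.put("pe4", "node3");
        f.handleFailure("node2", List.of("node1", "node3"));
        System.out.println(f.placement); // {pe1=node1, pe2=node1, pe3=node3, pe4=node3}
    }
}
```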

Page 29

Topic:

Organized by UNICOM Trainings & Seminars Pvt. Ltd.

[email protected]

DEMO

Page 30

Speaker name: Raja SP

Email ID: [email protected]

Thank You