BWB Meetup: Storm - distributed realtime computation system

Storm: overview distributed and fault-tolerant realtime computation. Backend Web Berlin

Upload: andrii-gakhov

Post on 06-May-2015


Category:

Technology



DESCRIPTION

Storm is a free and open source distributed realtime computation system. Storm makes it easy to reliably process unbounded streams of data, doing for realtime processing what Hadoop did for batch processing. Storm is simple, can be used with any programming language, and is a lot of fun to use!

TRANSCRIPT

Page 1: BWB Meetup: Storm - distributed realtime computation system

Storm: overview

distributed and fault-tolerant realtime computation.

Backend Web Berlin

Page 2: BWB Meetup: Storm - distributed realtime computation system

Storm

www.storm-project.net

Storm is a free and open source distributed realtime computation system.

September BWB Meetup

Page 3: BWB Meetup: Storm - distributed realtime computation system

Use cases

• distributed RPC

• continuous computations

• stream processing

Page 4: BWB Meetup: Storm - distributed realtime computation system

Overview

• free and open source

• integrates with any queuing and database system

• distributed and scalable

• fault-tolerant

• supports multiple languages

Page 5: BWB Meetup: Storm - distributed realtime computation system

Scalable

Storm topologies are inherently parallel and run across a cluster of machines.

Different parts of the topology can be scaled individually by tweaking their parallelism.

The "rebalance" command of the "storm" command line client can adjust the parallelism of running topologies on the fly.

Page 6: BWB Meetup: Storm - distributed realtime computation system

Fault tolerant

When workers die, Storm will automatically restart them.

If a node dies, the worker will be restarted on another node.

The Storm daemons, Nimbus and the Supervisors, are designed to be stateless and fail-fast.

Page 7: BWB Meetup: Storm - distributed realtime computation system

Guarantees data processing

Storm guarantees every tuple will be fully processed. One of Storm's core mechanisms is the ability to track the lineage of a tuple as it makes its way through the topology in an extremely efficient way.

Messages are only replayed when there are failures. Storm's basic abstractions provide an at-least-once processing guarantee, the same guarantee you get when using a queueing system.
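The lineage tracking described on this slide can be illustrated with a small sketch. Storm's acker is commonly described as keeping one 64-bit value per spout tuple and XOR-ing into it the id of every tuple that is created and later acked in that tuple's tree; when the value returns to zero, the tree is considered fully processed. The class below is plain Java and only mimics that idea: `AckTracker` and its method names are hypothetical, not part of Storm's API.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration of XOR-based tuple-tree tracking; not Storm's acker code.
class AckTracker {
    // One 64-bit checksum per spout tuple: root id -> XOR of ids still outstanding.
    private final Map<Long, Long> pending = new HashMap<>();

    // Record that a tuple with this id was emitted inside the tree rooted at rootId.
    void emitted(long rootId, long tupleId) {
        pending.merge(rootId, tupleId, (a, b) -> a ^ b);
    }

    // Record an ack; returns true once every emitted tuple of the tree has been acked.
    // Child tuples must be registered with emitted() before their parent is acked.
    boolean acked(long rootId, long tupleId) {
        long value = pending.getOrDefault(rootId, 0L) ^ tupleId;
        if (value == 0L) {
            pending.remove(rootId);
            return true;
        }
        pending.put(rootId, value);
        return false;
    }

    boolean isPending(long rootId) {
        return pending.containsKey(rootId);
    }

    public static void main(String[] args) {
        AckTracker tracker = new AckTracker();
        tracker.emitted(1L, 10L);                       // spout emits tuple 10
        tracker.emitted(1L, 11L);                       // a bolt emits child tuple 11
        System.out.println(tracker.acked(1L, 10L));     // parent acked, child outstanding
        System.out.println(tracker.acked(1L, 11L));     // child acked, tree complete
    }
}
```

The appeal of this scheme is that the memory needed per spout tuple stays constant no matter how many tuples the tree contains. A tree that never reaches zero within a timeout would be treated as failed and replayed, which is where the at-least-once guarantee comes from.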

Page 8: BWB Meetup: Storm - distributed realtime computation system

Use with many languages

Storm was designed from the ground up to be usable with any programming language.

Spouts and bolts can likewise be defined in any language. Non-JVM spouts and bolts communicate with Storm over a JSON-based protocol over stdin/stdout.

Adapters that implement this protocol exist for Ruby, Python, JavaScript, Perl, and PHP.
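To give a feel for the JSON-based protocol, here is a rough sketch of the kind of messages a non-JVM bolt writes to stdout: an "emit" command carrying the new tuple and its anchors, and an "ack" command for the input tuple, each followed by a line containing "end". The sketch is written in Java only to match the other examples in this deck; `MultilangMessages` is a hypothetical helper, the ids and values are made up, and the field set is simplified.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper that formats messages in the shape of Storm's multi-language protocol.
class MultilangMessages {
    // Wraps a string in JSON quotes; escapes only backslashes and quotes (enough for a sketch).
    static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    // "emit" message anchored to one input tuple, terminated by an "end" line.
    static String emit(String anchorId, List<String> values) {
        String tuple = values.stream()
                .map(MultilangMessages::quote)
                .collect(Collectors.joining(","));
        return "{\"command\":\"emit\",\"anchors\":[" + quote(anchorId)
                + "],\"tuple\":[" + tuple + "]}\nend";
    }

    // "ack" message for an input tuple, terminated by an "end" line.
    static String ack(String tupleId) {
        return "{\"command\":\"ack\",\"id\":" + quote(tupleId) + "}\nend";
    }

    public static void main(String[] args) {
        System.out.println(emit("1231231", Arrays.asList("HELLO")));
        System.out.println(ack("1231231"));
    }
}
```

In practice the language adapters hide this plumbing, so component code in Ruby or Python only deals with tuples, emits and acks.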

Page 9: BWB Meetup: Storm - distributed realtime computation system

How Storm works? Storm cluster

[Diagram: a Storm cluster with one Nimbus node, three ZooKeeper nodes, and five Supervisor nodes.]

Page 10: BWB Meetup: Storm - distributed realtime computation system

How Storm works? Basic concepts

Topology

Topology is a graph of computation. A topology runs forever, or until you kill it.

Stream

Stream is an unbounded sequence of tuples.

Spout

Spout is a source of streams.

Bolt

Bolt is the place where calculations are done. Bolts can do anything: run functions, filter tuples, do streaming aggregations and joins, talk to databases, etc.

Page 11: BWB Meetup: Storm - distributed realtime computation system

How Storm works? Basic concepts

Worker process

A worker process executes a subset of a topology. A worker process belongs to a specific topology and may run one or more executors for one or more components (spouts or bolts) of this topology.

Executor (thread)

Executor is a thread that is spawned by a worker process. It may run one or more tasks for the same component. It always has one thread that it uses for all of its tasks.

Task

Task performs the actual data processing – each spout or bolt that you implement in your code executes as many tasks across the cluster. The number of tasks for a component is always the same throughout the lifetime of a topology.
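The relationship between tasks and executors can be sketched in a few lines of plain Java. The number of tasks stays fixed, while the number of executors may change, for example after a rebalance. `TaskAssignment` is a hypothetical name, and the round-robin placement is an assumption made for illustration; Storm's scheduler decides the real placement.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: spread a fixed number of tasks over a changeable number of executors.
class TaskAssignment {
    // Returns, for each executor, the 1-based task numbers it runs (round-robin assumption).
    static List<List<Integer>> assign(int numTasks, int numExecutors) {
        List<List<Integer>> executors = new ArrayList<>();
        for (int e = 0; e < numExecutors; e++) {
            executors.add(new ArrayList<>());
        }
        for (int task = 0; task < numTasks; task++) {
            executors.get(task % numExecutors).add(task + 1);
        }
        return executors;
    }

    public static void main(String[] args) {
        // A component with 6 tasks and 2 executors: each thread runs 3 tasks.
        System.out.println(assign(6, 2));
        // Same 6 tasks after raising the executor count to 3: each thread runs 2 tasks.
        System.out.println(assign(6, 3));
    }
}
```

This is why a component is often configured with more tasks than executors: the spare tasks leave room to add threads later without redeploying the topology.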

Page 12: BWB Meetup: Storm - distributed realtime computation system

How Storm works? Basic concepts

[Diagram: tasks per component]

• Spout: Task1, Task2

• BoltA: Task1, Task2, Task3

• BoltB: Task1, Task2

• BoltC: Task1, Task2, Task3, Task4, Task5, Task6

• BoltD: Task1, Task2, Task3

• BoltE: Task1, Task2

• BoltF: Task1

Page 13: BWB Meetup: Storm - distributed realtime computation system

How Storm works? Topology Example

    import backtype.storm.Config;
    import backtype.storm.StormSubmitter;
    import backtype.storm.topology.TopologyBuilder;
    import backtype.storm.tuple.Fields;

    // Package names are those of pre-1.0 Storm releases; later releases use org.apache.storm.
    class DemoTopology {
        public static void main(String[] args) throws Exception {
            Config conf = new Config();
            TopologyBuilder builder = new TopologyBuilder();

            // Output streams are declared in each component's declareOutputFields(),
            // not on the builder, so the slide's stream declarations are kept as comments.
            // Spout: default stream (uid, item) and stream "item_copy" (uid, item).
            builder.setSpout("Spout", new DemoSpout(), 2).setNumTasks(2);
            builder.setBolt("BoltA", new BoltA(), 2).setNumTasks(3).shuffleGrouping("Spout", "item_copy");
            // BoltB: default stream (uid, fromB).
            builder.setBolt("BoltB", new BoltB(), 2).setNumTasks(2).shuffleGrouping("Spout");
            // BoltC: default stream (uid, fromC).
            builder.setBolt("BoltC", new BoltC(), 2).setNumTasks(6).shuffleGrouping("BoltA");
            // BoltD: streams "forE" (uid, text) and "forF" (uid, text, ne).
            builder.setBolt("BoltD", new BoltD(), 3).setNumTasks(3)
                    .shuffleGrouping("BoltC")
                    .fieldsGrouping("BoltC", new Fields("uid"))
                    .fieldsGrouping("BoltB", new Fields("uid"));
            builder.setBolt("BoltE", new BoltE(), 1).setNumTasks(2).shuffleGrouping("BoltD", "forE");
            builder.setBolt("BoltF", new BoltF(), 1).setNumTasks(1).shuffleGrouping("BoltD", "forF");

            StormSubmitter.submitTopology("demoTopology", conf, builder.createTopology());
        }
    }
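The two groupings used in the topology example differ in how a target task is chosen. A minimal plain-Java sketch, under stated assumptions: fields grouping is modelled as a hash of the grouping field modulo the task count, so equal "uid" values always reach the same task, and shuffle grouping is modelled as round-robin. `GroupingSketch` and its methods are hypothetical names, and Storm's own implementation may differ in detail.

```java
import java.util.Objects;

// Hypothetical model of how groupings pick a target task; not Storm's implementation.
class GroupingSketch {
    private int next = 0; // round-robin position for the shuffle-like grouping

    // Fields grouping: equal field values always map to the same task index.
    static int fieldsGroupingTask(Object fieldValue, int numTasks) {
        return Math.floorMod(Objects.hashCode(fieldValue), numTasks);
    }

    // Shuffle grouping, modelled as round-robin so that load is spread evenly.
    int shuffleGroupingTask(int numTasks) {
        int task = next % numTasks;
        next = (next + 1) % numTasks;
        return task;
    }

    public static void main(String[] args) {
        int first = fieldsGroupingTask("user-42", 3);
        int second = fieldsGroupingTask("user-42", 3);
        System.out.println(first == second); // same uid, same task

        GroupingSketch shuffle = new GroupingSketch();
        for (int i = 0; i < 4; i++) {
            System.out.println(shuffle.shuffleGroupingTask(3));
        }
    }
}
```

Fields grouping matters for a bolt like BoltD that combines tuples from BoltB and BoltC by "uid": both halves of a pair have to arrive at the same task.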

Page 14: BWB Meetup: Storm - distributed realtime computation system

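How Storm works? Spout Example

    import java.util.Map;
    import backtype.storm.spout.SpoutOutputCollector;
    import backtype.storm.task.TopologyContext;
    import backtype.storm.topology.OutputFieldsDeclarer;
    import backtype.storm.topology.base.BaseRichSpout;
    import backtype.storm.tuple.Fields;
    import backtype.storm.tuple.Values;

    public class DemoSpout extends BaseRichSpout {

        // …

        @Override
        public void open(Map conf, TopologyContext context, SpoutOutputCollector collector) {
            _collector = collector;
            _queue = new MyFavoritQueue<String>();
        }

        @Override
        public void nextTuple() {
            String nextItem = _queue.poll();
            // Assuming poll() returns null when the queue is empty: emit nothing in that case.
            if (nextItem != null) {
                _collector.emit(new Values(nextItem));
            }
        }

        @Override
        public void declareOutputFields(OutputFieldsDeclarer declarer) {
            declarer.declare(new Fields("item"));
        }
    }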

Page 15: BWB Meetup: Storm - distributed realtime computation system

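How Storm works? Bolt Example

    import java.util.Map;
    import backtype.storm.task.OutputCollector;
    import backtype.storm.task.TopologyContext;
    import backtype.storm.topology.OutputFieldsDeclarer;
    import backtype.storm.topology.base.BaseRichBolt;
    import backtype.storm.tuple.Fields;
    import backtype.storm.tuple.Tuple;
    import backtype.storm.tuple.Values;

    public class BoltA extends BaseRichBolt {

        private OutputCollector _collector;

        @Override
        public void prepare(Map conf, TopologyContext context, OutputCollector collector) {
            _collector = collector;
        }

        @Override
        public void execute(Tuple tuple) {
            Object obj = tuple.getValue(0);
            // capitalize() is a helper that is not shown on the slide.
            String capitalizedItem = capitalize((String) obj);
            // Anchor the new tuple to the input tuple, then ack the input.
            _collector.emit(tuple, new Values(capitalizedItem));
            _collector.ack(tuple);
        }

        @Override
        public void declareOutputFields(OutputFieldsDeclarer declarer) {
            declarer.declare(new Fields("item"));
        }
    }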

Page 16: BWB Meetup: Storm - distributed realtime computation system

Storm UI

Page 17: BWB Meetup: Storm - distributed realtime computation system

Read More about Storm

• Storm: http://storm-project.net/

• Example Storm Topologies: https://github.com/nathanmarz/storm-starter

• Implementing Real-Time Trending Topics With a Distributed Rolling Count Algorithm: http://www.michael-noll.com/blog/2013/01/18/implementing-real-time-trending-topics-in-storm/

• Understanding the Internal Message Buffers of Storm: http://www.michael-noll.com/blog/2013/06/21/understanding-storm-internal-message-buffers/

• Understanding the Parallelism of a Storm Topology: http://www.michael-noll.com/blog/2012/10/16/understanding-the-parallelism-of-a-storm-topology/

Page 18: BWB Meetup: Storm - distributed realtime computation system

Storm in our company

ferret-go.com

Page 19: BWB Meetup: Storm - distributed realtime computation system

Ferret go GmbH

Trend & Media Analytics

ferret-go.com

Page 20: BWB Meetup: Storm - distributed realtime computation system

Our data flow (simplified)

[Diagram. Sources: Twitter, Facebook, Google+, Blogs, Comments, Online media, Offline media, Reviews. Stages: processing, classification, analyzing. Three boxes labelled Elastic Search.]

Page 21: BWB Meetup: Storm - distributed realtime computation system

Problem overview

• we have a number of streams that spout items

• for every item we do different calculations

• at the end of calculations we save item into storage(s) – ElasticSearch, PostgreSQL etc.

• if processing fails because of some environment issues, we want to re-queue item easily

• some of our calculations can be done in parallel
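The re-queue requirement maps onto the ack and fail callbacks that a Storm spout receives for tuples it emitted with a message id. The sketch below shows the bookkeeping in plain Java, without the Storm classes: `ReplayingSource` and its methods are hypothetical, and an in-memory queue stands in for the real one.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;

// Hypothetical stand-in for a spout that puts failed items back on its queue.
class ReplayingSource {
    private final Queue<String> queue = new ArrayDeque<>();
    private final Map<Long, String> inFlight = new HashMap<>(); // message id -> item
    private long nextId = 0;

    void offer(String item) {
        queue.add(item);
    }

    // Takes the next item and remembers it under a new message id; returns -1 if the queue is empty.
    long nextTuple() {
        String item = queue.poll();
        if (item == null) {
            return -1;
        }
        long id = nextId++;
        inFlight.put(id, item);
        return id;
    }

    // Processing finished: forget the item.
    void ack(long id) {
        inFlight.remove(id);
    }

    // Processing failed: put the item back so that it is emitted again later.
    void fail(long id) {
        String item = inFlight.remove(id);
        if (item != null) {
            queue.add(item);
        }
    }

    int queued() { return queue.size(); }

    int inFlightCount() { return inFlight.size(); }

    public static void main(String[] args) {
        ReplayingSource source = new ReplayingSource();
        source.offer("item-1");
        long id = source.nextTuple();
        source.fail(id);                      // e.g. the storage was unreachable
        System.out.println(source.queued());  // the item is back on the queue
    }
}
```

Because a failed item is processed again from the start, downstream writes should tolerate duplicates.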


Page 22: BWB Meetup: Storm - distributed realtime computation system

Solution

• Redis-based queues for spouting

• 1-2 spouts per topology

• 1 bulk bolt for storage writing per worker

• Storm cluster with 2 nodes: 32 GB, CPU 4C-i7, Java 7, Ubuntu 12.04

• ~ 20 items per sec (could be increased)

• 3 slots per worker, 198 tasks, 68 executors
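The "bulk bolt for storage writing" idea can be sketched as a small buffer that hands items to the storage in batches. This is plain Java under stated assumptions: `BulkWriter` is a hypothetical name, and the `sink` callback stands in for a real bulk write (for example one bulk request to the search index).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical batching helper; the sink stands in for a real bulk storage call.
class BulkWriter<T> {
    private final int batchSize;
    private final Consumer<List<T>> sink;
    private final List<T> buffer = new ArrayList<>();

    BulkWriter(int batchSize, Consumer<List<T>> sink) {
        this.batchSize = batchSize;
        this.sink = sink;
    }

    // Buffers one item and writes the batch once it is full.
    void add(T item) {
        buffer.add(item);
        if (buffer.size() >= batchSize) {
            flush();
        }
    }

    // Writes whatever is buffered; useful on a timer so that slow periods do not delay items.
    void flush() {
        if (buffer.isEmpty()) {
            return;
        }
        sink.accept(new ArrayList<>(buffer)); // pass a copy so that the buffer can be reused
        buffer.clear();
    }

    public static void main(String[] args) {
        BulkWriter<String> writer = new BulkWriter<>(2, batch -> System.out.println("write " + batch));
        for (int i = 1; i <= 5; i++) {
            writer.add("item-" + i);
        }
        writer.flush(); // write the remaining partial batch
    }
}
```

Inside a bolt, the input tuples of a batch would be acked only after the write succeeds, so that a failed write leads to a replay rather than to lost items.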

Page 23: BWB Meetup: Storm - distributed realtime computation system

Thank you!

30.09.2013

September BWB Meetup

Andrii Gakhov
