Developing Java Streaming Applications with Apache Storm



Page 1

Developing Java Streaming Applications with Apache Storm

Lester Martin www.ajug.org - Nov 2017


Page 2

Connection before Content

Lester Martin – Hadoop/Spark/Storm Trainer & Consultant

[email protected]

http://lester.website (links to blog, twitter, github, LI, FB, etc)


Page 3

Agenda
• What is Storm?
• Conceptual Model
• Compile Time
• DEMO: Develop Word Count Topology
• Runtime
• DEMO: Submit Word Count Topology
• Additional Features
• DEMO: Kafka > Storm > HBase Topology in Local Cluster


Page 4

What is Storm?


Page 5

Storm is …

• Streaming – Key enabler of the Lambda Architecture
• Fast – Clocked at 1M+ messages per second per node
• Scalable – Thousands of workers per cluster
• Fault Tolerant – Failure is expected, and embraced
• Reliable – Guaranteed (at-least-once) message delivery; exactly-once semantics with Trident


Page 6

Storm in the Lambda Architecture

[Diagram: real-time data feeds flow into Storm; Hadoop persists data and performs batch processing; batch feeds update event models; pattern templates, key-performance indicators, and alerts drive dashboards and applications.]


Page 7

Conceptual Model


Page 8

TUPLE

{…}


Page 9

Tuple

• Unit of work to be processed
• Immutable ordered set of serializable values
• Fields must have an assigned name

{…}
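As a plain-Java sketch of the idea above (this is an illustration, not Storm's actual Tuple API), a tuple can be modeled as an immutable, ordered set of values addressable by field name:

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

// Minimal model of a tuple: immutable, ordered, every value addressable by field name.
class SimpleTuple {
    private final Map<String, Object> fields;

    SimpleTuple(Map<String, Object> fields) {
        // LinkedHashMap preserves field order; the unmodifiable wrapper enforces immutability.
        this.fields = Collections.unmodifiableMap(new LinkedHashMap<>(fields));
    }

    Object getValueByField(String name) {
        return fields.get(name);
    }

    int size() {
        return fields.size();
    }
}
```

Storm's real Tuple interface offers similar by-name accessors (plus positional ones), with the field names supplied by the emitting component's declared output fields.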


Page 10

Stream

• Core abstraction of Storm
• Unbounded sequence of Tuples

{…} {…} {…} {…} {…} {…} {…}


Page 11

SPOUT


Page 12

Spout

• Source of Streams
• Wraps an event source and emits Tuples


Page 13

Message Queues

Message queues are often the source of the data processed by Storm. Storm Spouts integrate with many types of message queues.

[Diagram: real-time data sources (operating systems, services and applications, sensors) write log entries, events, errors, and status messages to a message queue (Kestrel, RabbitMQ, AMQP, Kafka, JMS, others); Storm reads the data from the queue.]


Page 14

BOLT


Page 15

Bolt

• Core unit of computation
• Receives Tuples and does stuff
• Optionally, emits additional Tuples


Page 16

Bolt

• Write to a data store


Page 17

Bolt

• Read from a data store


Page 18

Bolt

• Perform arbitrary computation


Page 19

Bolt

• (Optionally) Emit additional Stream(s)


Page 20

TOPOLOGY


Page 21

Topology

• DAG of Spouts and Bolts
• Data Flow Representation
• Streaming Computation


Page 22

Topology

• Storm executes Spouts and Bolts as Tasks that run in parallel on multiple machines


Page 23

Parallel Execution of Topology Components

[Diagram: a logical topology (spout A feeding bolt A and bolt B, both feeding bolt C) and its physical implementation across machines A through G: spout A runs as two tasks, bolt A as two tasks, bolt B as two tasks, and bolt C as one task.]


Page 24

Stream Groupings

Stream Groupings determine how Storm routes Tuples between Tasks:
• Shuffle – Randomized round-robin (evenly distributes load to downstream Bolts)
• Fields – Ensures all Tuples with the same Field value(s) are always routed to the same Task
• All – Replicates the Stream across all the Bolt's Tasks (use with care)
• Other options – Including custom roll-your-own (RYO) grouping logic
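The two most common routing behaviors above can be sketched in plain Java (illustrative only, not Storm's internal implementation): a fields grouping hashes the grouping field's value modulo the task count, so equal values always land on the same task, while a shuffle grouping simply rotates through tasks:

```java
import java.util.Objects;

// Illustrative routing logic for two common groupings (not Storm internals).
class GroupingDemo {
    private int roundRobin = 0;

    // Fields grouping: same field value -> same downstream task, every time.
    int fieldsGrouping(Object fieldValue, int numTasks) {
        return Math.floorMod(Objects.hashCode(fieldValue), numTasks);
    }

    // Shuffle grouping: spread tuples evenly across the downstream tasks.
    int shuffleGrouping(int numTasks) {
        int task = roundRobin % numTasks;
        roundRobin++;
        return task;
    }
}
```

The fields-grouping property is what makes per-key aggregations (like word count) correct: every tuple for a given word is counted by the same task.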


Page 25

Compile Time

@Override
public void declareOutputFields(OutputFieldsDeclarer declarer) {
  declarer.declare(new Fields("sentence"));
}


Page 26

Example Spout Code (1 of 2)

public class RandomSentenceSpout extends BaseRichSpout {
  SpoutOutputCollector _collector;
  Random _rand;

  @Override
  public void open(Map conf, TopologyContext context, SpoutOutputCollector collector) {
    _collector = collector;
    _rand = new Random();
  }

  @Override
  public void nextTuple() {
    Utils.sleep(100);
    String[] sentences = new String[]{
      "the cow jumped over the moon",
      "an apple a day keeps the doctor away",
      "four score and seven years ago",
      "snow white and the seven dwarfs",
      "i am at two with nature" };
    String sentence = sentences[_rand.nextInt(sentences.length)];
    _collector.emit(new Values(sentence));
  }

Continued next page…

Storm uses open to open the spout and provide it with its configuration, a context object providing information about components in the topology, and an output collector used to emit tuples.

Storm uses nextTuple to request the spout emit the next tuple.

The spout uses emit to send a tuple to one or more bolts.

Name of the spout class. Storm spout class used as a “template”.


Page 27

Example Spout Code (2 of 2)

  @Override
  public void ack(Object id) {}

  @Override
  public void fail(Object id) {}

  @Override
  public void declareOutputFields(OutputFieldsDeclarer declarer) {
    declarer.declare(new Fields("sentence"));
  }
}

Storm calls the spout’s ack method to signal that a tuple has been fully processed.

Storm calls the spout’s fail method to signal that a tuple has not been fully processed.

The declareOutputFields method names the fields in a tuple.

Continued…


Page 28

Example Bolt Code

public static class ExclamationBolt extends BaseRichBolt {
  OutputCollector _collector;

  public void prepare(Map conf, TopologyContext context, OutputCollector collector) {
    _collector = collector;
  }

  public void execute(Tuple tuple) {
    _collector.emit(tuple, new Values(tuple.getString(0) + "!!!"));
    _collector.ack(tuple);
  }

  public void cleanup() {}

  public void declareOutputFields(OutputFieldsDeclarer declarer) {
    declarer.declare(new Fields("word"));
  }
}

The prepare method provides the bolt with its configuration and an OutputCollector used to emit tuples.

The execute method receives a tuple from a stream and emits a new tuple. The collector also provides an ack method that is called after successful processing.

The cleanup method releases system resources when bolt is shut down.

Names the fields in the output tuples. More detail later.

Name of the bolt class. Bolt class used as a “template.”


Page 29

Example Topology Code

public static void main(String[] args) throws Exception {
  TopologyBuilder builder = new TopologyBuilder();
  builder.setSpout("words", new TestWordSpout());
  builder.setBolt("exclaim1", new NewExclamationBolt()).shuffleGrouping("words");
  builder.setBolt("exclaim2", new NewExclamationBolt()).shuffleGrouping("exclaim1");

  Config conf = new Config();
  StormSubmitter.submitTopology("add-exclamation", conf, builder.createTopology());
}

This code…

words –shuffleGrouping→ exclaim1 –shuffleGrouping→ exclaim2

…builds this Topology.

runs code in TestWordSpout()

runs code in NewExclamationBolt()

runs code in NewExclamationBolt()


Page 30

DEMO: Develop Word Count Topology


Page 31

Runtime

[Diagram: one Nimbus node coordinating four Supervisor nodes.]


Page 32

Physical View


Page 33

Topology Deployment

Topology Submitter uploads topology:
• topology.jar
• topology.ser
• conf.ser


Page 34

Topology Deployment

Nimbus calculates assignments and sends to Zookeeper


Page 35

Topology Deployment

Supervisor nodes receive assignment information via Zookeeper watches


Page 36

Topology Deployment

Supervisor nodes download topology from Nimbus:
• topology.jar
• topology.ser
• conf.ser


Page 37

Topology Deployment

Supervisors spawn workers (JVM processes)


Page 38

DEMO: Submit Word Count Topology


Page 39

Additional Features



Page 40

Local Versus Distributed Storm Clusters

The topology program code submitted to Storm using storm jar is different when submitting to local mode versus a distributed cluster. The submitTopology method is used in both cases.
• The difference is the class that contains the submitTopology method.

Config conf = new Config();
LocalCluster cluster = new LocalCluster();
cluster.submitTopology("mytopology", conf, topology);

Config conf = new Config();
StormSubmitter.submitTopology("mytopology", conf, topology);

Instantiate a local cluster object.

Submit a topology to a local cluster.

Submit a topology to a distributed cluster.

Same method name, different classes.


Page 41

Reliable Processing

Bolts may emit Tuples Anchored to one received. Tuple "B" is a descendant of Tuple "A".


Page 42

Reliable Processing

Multiple Anchorings form a Tuple tree (bolts not shown)


Page 43

Reliable Processing

Bolts can Acknowledge that a tuple has been processed successfully.

ACK


Page 44

Reliable Processing

Bolts can also Fail a tuple to trigger a spout to replay the original.

FAIL


Page 45

Reliable Processing

Any failure in the Tuple tree will trigger a replay of the original tuple
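Storm tracks tuple trees cheaply with an XOR ledger: each anchored emit XORs a new tuple id into a per-tree checksum, and each ack XORs it back out, so the checksum returns to zero exactly when every tuple in the tree has been acked. A simplified plain-Java sketch of the idea (not Storm's actual acker code):

```java
// Simplified model of Storm's XOR-based tuple-tree tracking (not the real acker).
class TupleTreeTracker {
    private long checksum = 0L;

    // Called when a tuple (the original, or an anchored descendant) enters the tree.
    void emitted(long tupleId) {
        checksum ^= tupleId;
    }

    // Called when a bolt acks a tuple in the tree.
    void acked(long tupleId) {
        checksum ^= tupleId;
    }

    // A zero checksum means every emitted tuple has been acked: tree complete.
    boolean fullyProcessed() {
        return checksum == 0L;
    }
}
```

Because the state is a single 64-bit value per tree, this scales to very large tuple trees; a fail (or an ack timeout) short-circuits the ledger and triggers the spout replay described above.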


Page 46

More Stuff

• Topology description/deployment options
  – Flux
  – Storm SQL
• Polyglot development
• Micro-batching with Trident
• Fault tolerance & deployment isolation
• Integrations
  – Messaging: Kafka, Redis, Kestrel, Kinesis, MQTT, JMS
  – Databases: HBase, Hive, Druid, Cassandra, MongoDB, JDBC
  – Search Engines: Solr, Elasticsearch
  – HDFS
  – And more!


Page 47

DEMO: Kafka > Storm > HBase Topology in a Local Cluster


Page 48

Kafka > Storm > HBase Example

Requirements:
• Land simulated server logs into Kafka
• Configure a Kafka Spout to consume the server log messages
• Ignore all messages that are not either WARN or ERROR
• Persist WARN and ERROR messages into HBase
  – Keep the 10 most recent messages for each server
  – Maintain a running total of these concerning messages
• Publish these messages back to Kafka
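The filtering requirement can be sketched as a plain predicate (the class name and log-line format here are assumptions for illustration, not the demo's actual code):

```java
// Illustrative filter for the demo's requirement: keep only WARN and ERROR entries.
class LogLevelFilter {
    // Assumes each log line carries its level as a token,
    // e.g. "2017-11-01 ERROR disk full" (format is hypothetical).
    static boolean isConcerning(String logLine) {
        return logLine.contains("WARN") || logLine.contains("ERROR");
    }
}
```

In the topology, a filter bolt would apply this predicate in execute, emitting (and acking) only the concerning tuples and simply acking the rest.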

[Diagram: Kafka spout → Parse bolt → Filter bolt → HBase bolt, plus a Kafka bolt publishing back to Kafka.]


Page 49

Questions?

Lester Martin – Hadoop/Spark/Storm Trainer & Consultant

[email protected]

http://lester.website (links to blog, twitter, github, LI, FB, etc)

THANKS FOR YOUR TIME!!