detecting events on the web in real time with java, kafka and zookeeper - james stanier

Dr. James Stanier | Head of Analytics | Brandwatch.com | [email protected]

Upload: jaxlondon2014

Post on 08-Jul-2015


DESCRIPTION

JAX London Presentation 2014

TRANSCRIPT

Page 1: Detecting Events on the Web in Real Time with Java, Kafka and ZooKeeper - James Stanier

Dr. James Stanier | Head of Analytics | Brandwatch.com | [email protected]

Page 2: Detecting Events on the Web in Real Time with Java, Kafka and ZooKeeper - James Stanier

Coming Up/

© 2014 Brandwatch | www.brandwatch.com 2

•  Me, Brandwatch and new problems

•  Apache Kafka

•  Processing data in Java

•  Distributing work with Zookeeper

•  Managing state

Page 3: Detecting Events on the Web in Real Time with Java, Kafka and ZooKeeper - James Stanier


Who?

Page 4: Detecting Events on the Web in Real Time with Java, Kafka and ZooKeeper - James Stanier


Dr. James Stanier Head of Analytics, Brandwatch @jstanier | [email protected]

Page 5: Detecting Events on the Web in Real Time with Java, Kafka and ZooKeeper - James Stanier


Brandwatch

Page 6: Detecting Events on the Web in Real Time with Java, Kafka and ZooKeeper - James Stanier


Where we are/

•  Brighton

•  New York

•  San Francisco

•  Berlin

•  Stuttgart


Page 8: Detecting Events on the Web in Real Time with Java, Kafka and ZooKeeper - James Stanier

What Brandwatch does/

1 Crawl

2 Store and Index

3 Analyse

4 Present

•  Crawl 70M+ sites including key social networks

•  27 languages

•  Powerful search operators

•  20Bn+ indexed URLs

•  Years of historical data

•  Automated topic & sentiment analysis in all 27 languages

•  Automate common tasks including alerts

•  Advanced analytics modules

•  Automatic categorisation with rules

•  Custom dashboards

•  Reporting & alerts

Page 9: Detecting Events on the Web in Real Time with Java, Kafka and ZooKeeper - James Stanier


Brandwatch Analytics

Page 10: Detecting Events on the Web in Real Time with Java, Kafka and ZooKeeper - James Stanier


Data/ Presentation

Page 11: Detecting Events on the Web in Real Time with Java, Kafka and ZooKeeper - James Stanier


Data/ Aggregation

Page 12: Detecting Events on the Web in Real Time with Java, Kafka and ZooKeeper - James Stanier


Data/ Classification

Page 13: Detecting Events on the Web in Real Time with Java, Kafka and ZooKeeper - James Stanier


Data/ Not just top level metrics

Page 14: Detecting Events on the Web in Real Time with Java, Kafka and ZooKeeper - James Stanier


Development/ What do we use?

Page 15: Detecting Events on the Web in Real Time with Java, Kafka and ZooKeeper - James Stanier


Data/ The numbers

•  50+ Java Web Crawlers

•  10+ Historical crawlers for new queries

•  Twitter via GNIP (now Twitter)

•  70M+ query matches per day

Page 16: Detecting Events on the Web in Real Time with Java, Kafka and ZooKeeper - James Stanier


The speed of social


Page 20: Detecting Events on the Web in Real Time with Java, Kafka and ZooKeeper - James Stanier


A new challenge

Page 21: Detecting Events on the Web in Real Time with Java, Kafka and ZooKeeper - James Stanier


The challenge/ The signal from the noise

Page 22: Detecting Events on the Web in Real Time with Java, Kafka and ZooKeeper - James Stanier

The challenge/ at scale

•  100K+ user queries

•  70M+ mentions per day

•  Polling the database for mentions would take 8hrs for one pass
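To put the figures on this slide in perspective, here is a minimal arithmetic sketch. It uses only the numbers quoted above (70M mentions per day, an 8-hour polling pass); the class and method names are illustrative and not part of the talk.

```java
final class MentionRate {
    static final long SECONDS_PER_DAY = 24L * 60 * 60; // 86,400

    /** Average number of events per second for a given daily total. */
    static double perSecond(long perDay) {
        return (double) perDay / SECONDS_PER_DAY;
    }

    /** Events that arrive while a pass lasting the given number of hours is running. */
    static long arrivedDuring(long perDay, int hours) {
        return perDay * hours / 24;
    }

    public static void main(String[] args) {
        long mentionsPerDay = 70_000_000L; // "70M+ mentions per day" from the slide
        System.out.printf("average rate: %.0f mentions/second%n", perSecond(mentionsPerDay));
        System.out.printf("arrived during one 8-hour pass: %,d%n", arrivedDuring(mentionsPerDay, 8));
    }
}
```

Roughly a third of a day's mentions arrive while a single polling pass is still running, which is why polling cannot be called real time.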

Page 23: Detecting Events on the Web in Real Time with Java, Kafka and ZooKeeper - James Stanier

The Problem/ How we handled it…

[Architecture diagram. Components: Crawler 1, Crawler 2, Crawler n-1, Crawler N; Kafka Cluster; Signals Processing cluster; Signals handler JVM; DB. Data flows labelled: Mentions, Signals.]

Page 24: Detecting Events on the Web in Real Time with Java, Kafka and ZooKeeper - James Stanier


Kafka

Page 25: Detecting Events on the Web in Real Time with Java, Kafka and ZooKeeper - James Stanier

Step 1/ Kafka

[Diagram. Components: Crawler 1, Crawler 2, Crawler n-1, Crawler N; Kafka Cluster. Data flow labelled: Mentions.]

Page 26: Detecting Events on the Web in Real Time with Java, Kafka and ZooKeeper - James Stanier

Kafka/ What is it?

•  Apache Kafka is a publish-subscribe messaging system rethought as a distributed commit log

•  Apache top level project November 2013

•  Started at LinkedIn

Page 27: Detecting Events on the Web in Real Time with Java, Kafka and ZooKeeper - James Stanier

Kafka/ is…

•  Fast: hundreds of MBs read/write per second from thousands of clients

•  Scalable: clustered, partitioned over many machines, expanded without downtime

•  Durable: messages persisted to disk and replicated in cluster

Page 28: Detecting Events on the Web in Real Time with Java, Kafka and ZooKeeper - James Stanier

Kafka/ Terminology

•  Kafka maintains feeds of messages called topics

•  Programs that publish messages are called producers

•  Programs that subscribe to messages are called consumers

•  Kafka is a cluster of servers called brokers

Page 29: Detecting Events on the Web in Real Time with Java, Kafka and ZooKeeper - James Stanier

Kafka/ How it’s used...

[Diagram: three Producers, the Kafka Cluster, three Consumers.]

Page 30: Detecting Events on the Web in Real Time with Java, Kafka and ZooKeeper - James Stanier

Kafka/ Written to disk?


http://q.acm.org/detail.cfm?id=1563874

Page 31: Detecting Events on the Web in Real Time with Java, Kafka and ZooKeeper - James Stanier

Kafka/ Bending, not breaking


http://engineering.gnip.com/tag/kafka/

Page 32: Detecting Events on the Web in Real Time with Java, Kafka and ZooKeeper - James Stanier

Kafka/ The anatomy of a topic

Partition 0:  0 1 2 3 4 5 6

Partition 1:  0 1 2 3 4

Partition 2:  0 1 2 3 4 5

Old → New (writes go to the new end)

Page 33: Detecting Events on the Web in Real Time with Java, Kafka and ZooKeeper - James Stanier

Kafka/ Warning: ordering

Kafka guarantees a total ordering within each partition, not across the whole topic
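The ordering guarantee can be illustrated with a toy model. The sketch below is not the Kafka API: `TopicModel` and its methods are hypothetical names for an in-memory stand-in in which each partition is an append-only list with its own offsets. Messages that share a key land in the same partition and keep their relative order; nothing orders messages across partitions.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Toy in-memory model of a partitioned topic; this is not the Kafka API. */
final class TopicModel {
    private final List<List<String>> partitions = new ArrayList<>();

    TopicModel(int numPartitions) {
        for (int i = 0; i < numPartitions; i++) {
            partitions.add(new ArrayList<>());
        }
    }

    /** Maps a key to a partition index in [0, numPartitions). */
    int partitionFor(String key) {
        return Math.floorMod(key.hashCode(), partitions.size());
    }

    /** Appends to the key's partition and returns the offset within that partition. */
    int append(String key, String message) {
        List<String> log = partitions.get(partitionFor(key));
        log.add(message);
        return log.size() - 1;
    }

    /** Read-only view of one partition, oldest message first. */
    List<String> read(int partition) {
        return Collections.unmodifiableList(partitions.get(partition));
    }

    public static void main(String[] args) {
        TopicModel topic = new TopicModel(3);
        topic.append("query-1", "first");
        topic.append("query-2", "other");
        topic.append("query-1", "second");
        System.out.println(topic.read(topic.partitionFor("query-1")));
    }
}
```

If per-query ordering matters, keying every message by query ID is one way to keep a query's mentions in a single partition.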

Page 34: Detecting Events on the Web in Real Time with Java, Kafka and ZooKeeper - James Stanier

Kafka/ Try it out!

> tar -xzf kafka_2.9.2-0.8.1.tgz
> cd kafka_2.9.2-0.8.1
> bin/zookeeper-server-start.sh config/zookeeper.properties
> bin/kafka-server-start.sh config/server.properties

Page 35: Detecting Events on the Web in Real Time with Java, Kafka and ZooKeeper - James Stanier

Kafka/ Try it out!

> bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic test

> bin/kafka-console-producer.sh --broker-list localhost:9092 --topic test
Hello JAX!

> bin/kafka-console-consumer.sh --zookeeper localhost:2181 --topic test --from-beginning
Hello JAX!

Page 36: Detecting Events on the Web in Real Time with Java, Kafka and ZooKeeper - James Stanier

Kafka/ With Java

<dependency>
    <groupId>org.apache.kafka</groupId>
    <artifactId>kafka_2.10</artifactId>
    <version>0.8.1</version>
</dependency>

Page 37: Detecting Events on the Web in Real Time with Java, Kafka and ZooKeeper - James Stanier

Kafka/ With Java

import java.util.Properties;
import kafka.javaapi.producer.Producer;
import kafka.producer.ProducerConfig;

Properties props = new Properties();
// Brokers the producer contacts first to discover the cluster
props.put("metadata.broker.list", "broker1:9092,broker2:9092");
props.put("serializer.class", "kafka.serializer.StringEncoder");
props.put("partitioner.class", "example.producer.SimplePartitioner");
// Wait for the partition leader to acknowledge each write
props.put("request.required.acks", "1");

ProducerConfig config = new ProducerConfig(props);
Producer<String, String> producer = new Producer<String, String>(config);

Page 38: Detecting Events on the Web in Real Time with Java, Kafka and ZooKeeper - James Stanier

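Kafka/ Partitioning

public class SimplePartitioner implements Partitioner<String> {

    // The constructor receives the producer's properties; nothing to configure here
    public SimplePartitioner(VerifiableProperties props) {
    }

    public int partition(String key, int numberOfPartitions) {
        // md5hash is a helper that is not shown on the slide. floorMod keeps the
        // result in [0, numberOfPartitions) even if the hash is negative.
        return Math.floorMod(md5hash(key), numberOfPartitions);
    }
}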
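The slide does not show `md5hash`. The following is one possible sketch of such a helper using only the JDK; folding the first four digest bytes into an `int` is an assumption made for illustration, not necessarily what the speaker's code does.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

final class Md5Partitioning {
    /** Folds the first four bytes of the key's MD5 digest into an int (may be negative). */
    static int md5hash(String key) {
        try {
            byte[] d = MessageDigest.getInstance("MD5")
                    .digest(key.getBytes(StandardCharsets.UTF_8));
            return ((d[0] & 0xFF) << 24) | ((d[1] & 0xFF) << 16)
                    | ((d[2] & 0xFF) << 8) | (d[3] & 0xFF);
        } catch (NoSuchAlgorithmException e) {
            // MD5 is one of the digests every Java platform must provide
            throw new IllegalStateException(e);
        }
    }

    /** Same key, same partition; the result is always in [0, numberOfPartitions). */
    static int partition(String key, int numberOfPartitions) {
        return Math.floorMod(md5hash(key), numberOfPartitions);
    }

    public static void main(String[] args) {
        // Query IDs taken from the ZooKeeper slides later in the deck
        System.out.println(partition("1268589", 8));
        System.out.println(partition("15846", 8));
    }
}
```

Hashing the key rather than using it directly spreads sequential query IDs more evenly across partitions.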

Page 39: Detecting Events on the Web in Real Time with Java, Kafka and ZooKeeper - James Stanier

Kafka/ Sending from the crawlers

// The JSON payload needs its own name: the slide declared "message" twice
String json = toJson(...);
// Keyed by query ID, so the partitioner routes each query to one partition
KeyedMessage<String, String> message =
    new KeyedMessage<String, String>("query.mentions", queryId, json);
producer.send(message);

Page 40: Detecting Events on the Web in Real Time with Java, Kafka and ZooKeeper - James Stanier

Step 1/ Done

[Diagram. Components: Crawler 1, Crawler 2, Crawler n-1, Crawler N; Kafka Cluster. Data flow labelled: Mentions.]

Page 41: Detecting Events on the Web in Real Time with Java, Kafka and ZooKeeper - James Stanier


Processing

Page 42: Detecting Events on the Web in Real Time with Java, Kafka and ZooKeeper - James Stanier


Processing/ What’s happening now?

Page 43: Detecting Events on the Web in Real Time with Java, Kafka and ZooKeeper - James Stanier

Step 2.1/ One processing JVM

[Diagram. Components: Crawler 1, Crawler 2, Crawler n-1, Crawler N; Kafka Cluster; Signals processor. Data flow labelled: Mentions.]

Page 44: Detecting Events on the Web in Real Time with Java, Kafka and ZooKeeper - James Stanier

Processing/ A wild tweet appears!

Mention
date: 14/10/2014 6:10PM
pageType: twitter
author: @javadude
hashtags: [#jaxlondon, #amazingtalk, #greatshoes]
mentionedTweeters: [@jstanier]
text: “@jstanier is at #jaxlondon tonight #amazingtalk #greatshoes”

Page 45: Detecting Events on the Web in Real Time with Java, Kafka and ZooKeeper - James Stanier

Processing/ Storing hashtags

Map<Date, Multiset<String>>

Initialise with the last 24 hours

Page 46: Detecting Events on the Web in Real Time with Java, Kafka and ZooKeeper - James Stanier

Processing/ Storing hashtags

Map<Date, Multiset<String>>

Mention
date: 14/10/2014 6:10PM
hashtags: [#jaxlondon, #amazingtalk, #greatshoes]

Page 47: Detecting Events on the Web in Real Time with Java, Kafka and ZooKeeper - James Stanier

Processing/ Storing hashtags

Map<Date, Multiset<String>>

Mention
date: 14/10/2014 6:10PM
hashtags: [#jaxlondon, #amazingtalk, #greatshoes]

add("#jaxlondon")
add("#amazingtalk")
add("#greatshoes")
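The bucketed counter on these slides can be sketched with the JDK alone. The slides use `Map<Date, Multiset<String>>` (`Multiset` is a Guava type); the sketch below swaps in a `TreeMap<Instant, Map<String, Integer>>` so that it runs without dependencies. The class name, the hourly bucket size and the method names are assumptions made for illustration.

```java
import java.time.Instant;
import java.time.temporal.ChronoUnit;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class HashtagBuckets {
    // Start of hour -> (hashtag -> number of mentions seen in that hour)
    private final TreeMap<Instant, Map<String, Integer>> buckets = new TreeMap<>();

    /** Creates one empty bucket per hour, covering the given number of hours up to now. */
    HashtagBuckets(Instant now, int hours) {
        Instant newest = now.truncatedTo(ChronoUnit.HOURS);
        for (int i = 0; i < hours; i++) {
            buckets.put(newest.minus(i, ChronoUnit.HOURS), new HashMap<>());
        }
    }

    /** Counts each hashtag in the mention's hourly bucket; false if outside the window. */
    boolean add(Instant mentionDate, List<String> hashtags) {
        Map<String, Integer> bucket = buckets.get(mentionDate.truncatedTo(ChronoUnit.HOURS));
        if (bucket == null) {
            return false;
        }
        for (String tag : hashtags) {
            bucket.merge(tag, 1, Integer::sum);
        }
        return true;
    }

    /** Number of times the hashtag was seen in the hour containing the given time. */
    int count(Instant time, String hashtag) {
        Map<String, Integer> bucket = buckets.get(time.truncatedTo(ChronoUnit.HOURS));
        return bucket == null ? 0 : bucket.getOrDefault(hashtag, 0);
    }

    public static void main(String[] args) {
        Instant mentionDate = Instant.parse("2014-10-14T18:10:00Z");
        HashtagBuckets buckets = new HashtagBuckets(mentionDate, 24);
        buckets.add(mentionDate, Arrays.asList("#jaxlondon", "#amazingtalk", "#greatshoes"));
        System.out.println(buckets.count(mentionDate, "#jaxlondon"));
    }
}
```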

Page 48: Detecting Events on the Web in Real Time with Java, Kafka and ZooKeeper - James Stanier

Processing/ Cycling the buckets

// Spring cron fields: second minute hour day month weekday, so this runs on the hour
@Scheduled(cron = "0 0 * * * *")
public void cycleBuckets() {
    // Assumes buckets is sorted newest-first, so lastKey() is the oldest hour
    Date oldest = buckets.lastKey();
    removeBucket(oldest);
    DateTime newest = new DateTime(buckets.firstKey());
    addBucket(newest.plusHours(1).toDate());
}
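For comparison, here is a dependency-free sketch of the same hourly rotation. It assumes a naturally ordered `TreeMap` keyed by `java.time.Instant` (oldest first, the opposite of the ordering the slide's `lastKey()`/`firstKey()` calls imply) and omits the scheduling; `BucketCycler` is a hypothetical name.

```java
import java.time.Instant;
import java.time.temporal.ChronoUnit;
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

final class BucketCycler {
    /** Drops the oldest hourly bucket and opens a new one after the newest. */
    static void cycle(TreeMap<Instant, Map<String, Integer>> buckets) {
        if (buckets.isEmpty()) {
            return; // nothing to rotate
        }
        Instant newest = buckets.lastKey();  // natural order: last key is the newest hour
        buckets.remove(buckets.firstKey());  // first key is the oldest hour
        buckets.put(newest.plus(1, ChronoUnit.HOURS), new HashMap<>());
    }

    public static void main(String[] args) {
        TreeMap<Instant, Map<String, Integer>> buckets = new TreeMap<>();
        Instant start = Instant.parse("2014-10-14T16:00:00Z");
        for (int i = 0; i < 3; i++) {
            buckets.put(start.plus(i, ChronoUnit.HOURS), new HashMap<>());
        }
        cycle(buckets);
        System.out.println(buckets.keySet());
    }
}
```

The window keeps a constant number of buckets, so memory per hashtag stays bounded however long the worker runs.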

Page 49: Detecting Events on the Web in Real Time with Java, Kafka and ZooKeeper - James Stanier

Processing/ Detecting spikes

•  At regular intervals

•  For each hashtag

•  Convert to a timeseries [5, …, 1002, 5499]

•  Use our super secret detection algorithm

•  Give a score to it

•  If score > threshold, it’s interesting

•  Send it on a new Kafka topic
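The detection algorithm itself is not disclosed in the talk. As a stand-in, the sketch below scores the latest point of a time series by how many standard deviations it sits above the mean of the earlier points (a plain z-score). The technique, the class name and the threshold of 3.0 are assumptions chosen only to make the "score, then compare with a threshold" step concrete.

```java
final class SpikeScore {
    /** Z-score of the last value against the mean and deviation of the earlier values. */
    static double score(int[] series) {
        if (series.length < 3) {
            return 0.0; // too little history to judge
        }
        int n = series.length - 1; // number of history points
        double mean = 0.0;
        for (int i = 0; i < n; i++) {
            mean += series[i];
        }
        mean /= n;
        double variance = 0.0;
        for (int i = 0; i < n; i++) {
            variance += (series[i] - mean) * (series[i] - mean);
        }
        double std = Math.sqrt(variance / n);
        double latest = series[n];
        if (std == 0.0) {
            // Flat history: any rise is infinitely surprising, no rise is not
            return latest > mean ? Double.POSITIVE_INFINITY : 0.0;
        }
        return (latest - mean) / std;
    }

    /** "If score > threshold, it's interesting". */
    static boolean isInteresting(int[] series, double threshold) {
        return score(series) > threshold;
    }

    public static void main(String[] args) {
        int[] hourlyCounts = {5, 6, 4, 5, 7, 5, 6, 1002};
        System.out.println(isInteresting(hourlyCounts, 3.0));
    }
}
```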

Page 50: Detecting Events on the Web in Real Time with Java, Kafka and ZooKeeper - James Stanier

Processing/ What we just did

[Diagram: #hashtag data, model.]

Page 51: Detecting Events on the Web in Real Time with Java, Kafka and ZooKeeper - James Stanier

Processing/ But we also track…

[Diagram: a data store and a model for each of: author, sentiment, page type, link share, #hashtag, country, volume.]

Page 52: Detecting Events on the Web in Real Time with Java, Kafka and ZooKeeper - James Stanier

Processing/ …for one query

“JAX London” query

[Diagram: a data store and a model for each of: author, sentiment, page type, link share, #hashtag, country, volume.]

Page 53: Detecting Events on the Web in Real Time with Java, Kafka and ZooKeeper - James Stanier

Processing/ 100K+ queries and rising


Page 54: Detecting Events on the Web in Real Time with Java, Kafka and ZooKeeper - James Stanier


Processing/ We need more JVMs

But how do we share the workload?

Page 55: Detecting Events on the Web in Real Time with Java, Kafka and ZooKeeper - James Stanier


Distribution of work

Page 56: Detecting Events on the Web in Real Time with Java, Kafka and ZooKeeper - James Stanier

Step 2.2/ A cluster of processing JVMs

[Diagram. Components: Crawler 1, Crawler 2, Crawler n-1, Crawler N; Kafka Cluster; Signals Processing cluster. Data flows labelled: Mentions, Signals.]

Page 57: Detecting Events on the Web in Real Time with Java, Kafka and ZooKeeper - James Stanier

Distribution/ An atomic unit of work

[Diagram: Signals Processing cluster, marked with a question mark.]

Page 58: Detecting Events on the Web in Real Time with Java, Kafka and ZooKeeper - James Stanier


Distribution/ Leader election

A way of deciding who is the leader for a task in a group of distributed nodes

Page 59: Detecting Events on the Web in Real Time with Java, Kafka and ZooKeeper - James Stanier


Distribution/ Zookeeper

A way of coordinating and managing distributed applications

Page 60: Detecting Events on the Web in Real Time with Java, Kafka and ZooKeeper - James Stanier

Zookeeper/ It’s like a file system

/brandwatch
    /feature_1
    /feature_2

Page 61: Detecting Events on the Web in Real Time with Java, Kafka and ZooKeeper - James Stanier


Zookeeper/ At the command line

Page 62: Detecting Events on the Web in Real Time with Java, Kafka and ZooKeeper - James Stanier


Distribution/ In Java

http://curator.apache.org/curator-framework

Page 63: Detecting Events on the Web in Real Time with Java, Kafka and ZooKeeper - James Stanier


Distribution/ Recipes

Page 64: Detecting Events on the Web in Real Time with Java, Kafka and ZooKeeper - James Stanier

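Distribution/ Instantiating Curator

CuratorFramework client = CuratorFrameworkFactory
        .builder()
        .connectString(zkQuorum)
        .namespace(namespace)
        // Curator expects a retry policy; these values are illustrative
        .retryPolicy(new ExponentialBackoffRetry(1000, 3))
        .build();
// start() returns void, so call it separately to keep the client reference
client.start();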

Page 65: Detecting Events on the Web in Real Time with Java, Kafka and ZooKeeper - James Stanier

Distribution/ Offering jobs

[Diagram. ZooKeeper znodes: /brandwatch, /signals, /queries, /1268589, /15846. Other components: Manager JVM, DB.]

Page 66: Detecting Events on the Web in Real Time with Java, Kafka and ZooKeeper - James Stanier

Distribution/ Adding nodes

// forPath declares "throws Exception", so the method has to propagate it
public void createZNode(String queryId) throws Exception {
    try {
        client.create().forPath(ZK_NODE_PREFIX + queryId);
    } catch (NodeExistsException e) {
        log.debug("Node {} was already created.", queryId);
    }
}

Page 67: Detecting Events on the Web in Real Time with Java, Kafka and ZooKeeper - James Stanier

Distribution/ Deleting nodes

// forPath declares "throws Exception", so the method has to propagate it
public void removeZNode(String queryId) throws Exception {
    try {
        client.delete().forPath(ZK_NODE_PREFIX + queryId);
    } catch (NoNodeException e) {
        log.debug("Node {} was already deleted.", queryId);
    }
}

Page 68: Detecting Events on the Web in Real Time with Java, Kafka and ZooKeeper - James Stanier

Distribution/ Leader election 101

[Diagram. ZooKeeper znodes: /brandwatch, /signals, /queries, /1268589, /15846. Candidates: Processing JVM 1, Processing JVM 2, Processing JVM 3.]

Page 69: Detecting Events on the Web in Real Time with Java, Kafka and ZooKeeper - James Stanier

Distribution/ Leader election 101

[Diagram. ZooKeeper znodes: /brandwatch, /signals, /queries, /1268589, /15846. Processing JVM 1, 2 and 3 are numbered 1, 2 and 3.]

Page 70: Detecting Events on the Web in Real Time with Java, Kafka and ZooKeeper - James Stanier

Distribution/ The leader dies

[Diagram. ZooKeeper znodes: /brandwatch, /signals, /queries, /1268589, /15846. Two Processing JVMs remain, numbered 2 and 3.]

Page 71: Detecting Events on the Web in Real Time with Java, Kafka and ZooKeeper - James Stanier

Distribution/ The dead rises again

[Diagram. ZooKeeper znodes: /brandwatch, /signals, /queries, /1268589, /15846. Three Processing JVMs, numbered 2, 3 and 4.]
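The numbering on the last four slides (1 2 3, then 2 3, then 2 3 4) can be modelled in a few lines. This is a plain in-memory sketch, not ZooKeeper or Curator code: it assumes the usual scheme in which each candidate receives the next sequence number on joining and the lowest surviving number is the leader. `ElectionModel` and its methods are hypothetical names.

```java
import java.util.TreeMap;

/** In-memory model of lowest-sequence-number leader election; not ZooKeeper code. */
final class ElectionModel {
    private final TreeMap<Integer, String> candidates = new TreeMap<>();
    private int nextSequence = 1;

    /** Registers a candidate and returns the sequence number it was given. */
    int join(String worker) {
        int sequence = nextSequence++;
        candidates.put(sequence, worker);
        return sequence;
    }

    /** Removes a candidate, for example because its JVM died. */
    void leave(int sequence) {
        candidates.remove(sequence);
    }

    /** The candidate holding the lowest sequence number, or null if there is none. */
    String leader() {
        return candidates.isEmpty() ? null : candidates.firstEntry().getValue();
    }

    public static void main(String[] args) {
        ElectionModel election = new ElectionModel();
        int first = election.join("Processing JVM 1");
        election.join("Processing JVM 2");
        election.join("Processing JVM 3");
        System.out.println(election.leader());
        election.leave(first);                // the leader dies
        System.out.println(election.leader());
        election.join("Processing JVM 1");    // it comes back with a new, higher number
        System.out.println(election.leader());
    }
}
```

A returning JVM joins at the back of the queue, so its restart does not take leadership away from the node that replaced it.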

Page 72: Detecting Events on the Web in Real Time with Java, Kafka and ZooKeeper - James Stanier

Distribution/ Curator: LeaderLatch

public LeaderLatch(CuratorFramework client, String latchPath)

Parameters:
client - the client
latchPath - the path for this leadership group

Page 73: Detecting Events on the Web in Real Time with Java, Kafka and ZooKeeper - James Stanier

Distribution/ LeaderLatch recipe

public class WorkerManager implements PathChildrenCacheListener {

    private Map<Integer, LeaderLatch> leaderLatches = newHashMap();

    @Override
    public void childEvent(CuratorFramework client,
                           PathChildrenCacheEvent event) {
        // Handle adds and removes here!
    }
}

Page 74: Detecting Events on the Web in Real Time with Java, Kafka and ZooKeeper - James Stanier

Distribution/ Curator: Starting up

@PostConstruct
public void initialise() throws Exception {
    List<ChildData> currentData = newArrayList(initialisePathChildrenCache());
    log.info("Pre creating workers for {} existing queries", currentData.size());
    for (ChildData childData : currentData) {
        int queryId = parseQueryIdFromPath(childData.getPath());
        startLeaderElection(queryId);
    }
}

Page 75: Detecting Events on the Web in Real Time with Java, Kafka and ZooKeeper - James Stanier

Distribution/ Curator: PathChildrenCache

private List<ChildData> initialisePathChildrenCache() throws Exception {
    // BUILD_INITIAL_CACHE fills the cache before start() returns
    pathChildrenCache.start(StartMode.BUILD_INITIAL_CACHE);
    pathChildrenCache.getListenable().addListener(this);
    List<ChildData> currentData = pathChildrenCache.getCurrentData();
    return currentData;
}

Page 76: Detecting Events on the Web in Real Time with Java, Kafka and ZooKeeper - James Stanier

Distribution/ Curator: Adding a node

@Override
public void childEvent(CuratorFramework client, PathChildrenCacheEvent event) {
    ChildData childData = event.getData();
    int queryId; // declared here because the slide's code never declares it
    switch (event.getType()) {
        case CHILD_ADDED:
            queryId = parseQueryIdFromPath(childData.getPath());
            if (!haveLeaderLatchForQuery(queryId)) {
                startLeaderElection(queryId);
            }
            break;

Page 77: Detecting Events on the Web in Real Time with Java, Kafka and ZooKeeper - James Stanier

Distribution/ Curator: Deleting a node

        // Continued...
        case CHILD_REMOVED:
            queryId = parseQueryIdFromPath(childData.getPath());
            removeLeaderLatchForQuery(queryId);
            break;
        default:
            break;
    }
}

Page 78: Detecting Events on the Web in Real Time with Java, Kafka and ZooKeeper - James Stanier

Distribution/ Almost there?

We are processing long-running jobs

What about workers getting overloaded?

Page 79: Detecting Events on the Web in Real Time with Java, Kafka and ZooKeeper - James Stanier

Distribution/ After leader election

1.  Take leadership

2.  Hit max queries?

    a.  No – go to 3

    b.  Yes – give up leadership, try again

3.  Start working
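The three steps above can be sketched as a small capacity check. This is an illustration only: `CapacityAwareWorker`, `onElected` and `maxQueries` are hypothetical names, and giving up leadership is represented by a `false` return value rather than by a real call to close a latch.

```java
import java.util.HashSet;
import java.util.Set;

final class CapacityAwareWorker {
    private final int maxQueries;
    private final Set<Integer> activeQueries = new HashSet<>();

    CapacityAwareWorker(int maxQueries) {
        this.maxQueries = maxQueries;
    }

    /**
     * Called when this worker has just taken leadership for a query (step 1).
     * Returns true if it keeps leadership and starts working (step 3),
     * false if it is at capacity and should give leadership up (step 2b).
     */
    boolean onElected(int queryId) {
        if (activeQueries.contains(queryId)) {
            return true; // already working on this query
        }
        if (activeQueries.size() >= maxQueries) {
            return false; // hit max queries
        }
        activeQueries.add(queryId);
        return true;
    }

    /** Number of queries this worker is currently processing. */
    int load() {
        return activeQueries.size();
    }

    public static void main(String[] args) {
        CapacityAwareWorker worker = new CapacityAwareWorker(2);
        System.out.println(worker.onElected(1268589));
        System.out.println(worker.onElected(15846));
        System.out.println(worker.onElected(1328));
    }
}
```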

Page 80: Detecting Events on the Web in Real Time with Java, Kafka and ZooKeeper - James Stanier

Distribution/ Now we’re almost there?

Actually, no…

Page 81: Detecting Events on the Web in Real Time with Java, Kafka and ZooKeeper - James Stanier

Distribution/ Infinite election

[Diagram: three Processing JVMs, each marked "At capacity!" and each marked "Elected for 1328".]

Page 82: Detecting Events on the Web in Real Time with Java, Kafka and ZooKeeper - James Stanier

Distribution/ Solution

[Diagram: three Processing JVMs, one marked "At capacity!". Labels: "Elected for 1328" (twice), "Refused 1328" (once).]

Page 83: Detecting Events on the Web in Real Time with Java, Kafka and ZooKeeper - James Stanier

Distribution/ Solution

[Diagram: three Processing JVMs, each marked "At capacity!". Labels: "Elected for 1328" (three times), "Refused 1328" (three times).]

Page 84: Detecting Events on the Web in Real Time with Java, Kafka and ZooKeeper - James Stanier


State

Page 85: Detecting Events on the Web in Real Time with Java, Kafka and ZooKeeper - James Stanier

State/ CAP theorem

[Diagram: CAP theorem. Surviving labels: Availability, CA, AP, CP.]

Page 86: Detecting Events on the Web in Real Time with Java, Kafka and ZooKeeper - James Stanier

State/ Snapshotting of worker data

If one worker dies, we want another to pick up where it left off

Regular snapshotting to HBase

Page 87: Detecting Events on the Web in Real Time with Java, Kafka and ZooKeeper - James Stanier

State/ Serialisation and compression

Serialise and compress using Kryo

~0.5MB per query, but a lot are very small
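The talk uses Kryo and HBase, neither of which ships with the JDK. To show the shape of a snapshot round trip without extra dependencies, the sketch below swaps in standard Java serialisation with GZIP compression; the state type, class name and method names are assumptions, and the resulting byte array stands in for whatever would be written to HBase.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

/** Stand-in for the Kryo step: JDK serialisation plus GZIP. */
final class SnapshotCodec {
    /** Serialises and compresses a worker's counters into one byte array. */
    static byte[] snapshot(HashMap<String, Integer> state) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(new GZIPOutputStream(bytes))) {
            out.writeObject(state);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /** Restores the counters so another worker can pick up where the first left off. */
    @SuppressWarnings("unchecked")
    static HashMap<String, Integer> restore(byte[] snapshot) {
        try (ObjectInputStream in = new ObjectInputStream(
                new GZIPInputStream(new ByteArrayInputStream(snapshot)))) {
            return (HashMap<String, Integer>) in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        HashMap<String, Integer> state = new HashMap<>();
        state.put("#jaxlondon", 5499);
        byte[] bytes = snapshot(state);
        System.out.println(restore(bytes).equals(state));
    }
}
```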

Page 88: Detecting Events on the Web in Real Time with Java, Kafka and ZooKeeper - James Stanier

Step 2.2/ Done!

[Architecture diagram. Components: Crawler 1, Crawler 2, Crawler n-1, Crawler N; Kafka Cluster; Signals Processing cluster; Signals handler JVM; DB. Data flows labelled: Mentions, Signals.]

Page 89: Detecting Events on the Web in Real Time with Java, Kafka and ZooKeeper - James Stanier


Monitoring

Page 90: Detecting Events on the Web in Real Time with Java, Kafka and ZooKeeper - James Stanier


Monitoring/ Statsd and Graphite

Page 91: Detecting Events on the Web in Real Time with Java, Kafka and ZooKeeper - James Stanier


Closing remarks


Page 93: Detecting Events on the Web in Real Time with Java, Kafka and ZooKeeper - James Stanier

Summary/ Using this architecture

Now

•  Smarter alerts (email, push, in-browser)

•  Monitoring crises/events as they happen

Underway

•  Automatic clustering of spikes into events

•  Historical analysis of trends

Page 94: Detecting Events on the Web in Real Time with Java, Kafka and ZooKeeper - James Stanier

Say hello/


@brandwatch | @jstanier

www.brandwatch.com

UK: +44 (0)1273 358 635

[email protected]

Page 95: Detecting Events on the Web in Real Time with Java, Kafka and ZooKeeper - James Stanier


Q&A