MyHeritage Kafka use cases - Feb 2014 Meetup

MyHeritage and Kafka. Author: Ran Levy, Feb 2014

Upload: ran-levy

Post on 16-Apr-2017


TRANSCRIPT

Page 1: MyHeritage Kafka use cases - Feb 2014 Meetup

MyHeritage and Kafka

Author: Ran Levy, Feb 2014

Page 2: MyHeritage Kafka use cases - Feb 2014 Meetup

Agenda

• MyHeritage use cases

• Possible solutions

• Kafka overview

• Actual implementation @MyHeritage

• Summary

Page 3: MyHeritage Kafka use cases - Feb 2014 Meetup

Use cases

• Two major use cases:

– Indexing to SuperSearch and Record Matching.

– Stats reporting to BI.

Page 4: MyHeritage Kafka use cases - Feb 2014 Meetup

Use case 1

• Indexing to SuperSearch and Record Matching

Page 5: MyHeritage Kafka use cases - Feb 2014 Meetup

Use case 1 (cont'd)

• Custom, non-scalable solution that processed changes and updated SuperSearch (SOLR over Lucene).

• Required solution should support:

– Continuous mode.

– High throughput.

– Scaling up.

– Repeating the process from some point.

– Guaranteed order of processed items.

– Reliability.

– Multiple consumers.

Page 6: MyHeritage Kafka use cases - Feb 2014 Meetup

Use case 2

• Statistics reporting to BI system

Page 7: MyHeritage Kafka use cases - Feb 2014 Meetup

Use case 2 (cont'd)

• Required solution should support:

– High scale (~500GB of data / day).

– Scaling up (a few hundred million per day).

– Repeating the process from some point.

– Multiple consumers.
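To give a feel for the scale quoted on this slide, the daily figures can be turned into average per-second rates. This is only back-of-the-envelope arithmetic: it assumes decimal gigabytes, and the count of 300 million is a hypothetical stand-in for "a few hundred million", which the slide does not pin down.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def average_rate(per_day: float) -> float:
    """Convert a per-day total into an average per-second rate."""
    return per_day / SECONDS_PER_DAY


# ~500GB/day comes from the slide; decimal gigabytes are an assumption.
bytes_per_day = 500 * 10**9
# Hypothetical stand-in for "a few hundred million per day".
items_per_day = 300 * 10**6

mb_per_second = average_rate(bytes_per_day) / 10**6
items_per_second = average_rate(items_per_day)

print(f"average throughput: {mb_per_second:.1f} MB/s")
print(f"average item rate:  {items_per_second:.0f} items/s")
```

These are averages over a full day; peak load would be higher, which is part of why the ability to scale up appears in the requirements.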

Page 8: MyHeritage Kafka use cases - Feb 2014 Meetup

Agenda

MyHeritage use cases

• Possible solutions

• Kafka overview

• Actual implementation @MyHeritage

• Summary

Page 9: MyHeritage Kafka use cases - Feb 2014 Meetup

Possible Solutions

• So what have we considered…

– DB

– Queues

Page 10: MyHeritage Kafka use cases - Feb 2014 Meetup

Possible Solutions

• Key points about queues:

– Messages are deleted after they are consumed.

– Messages are duplicated to support multiple readers.
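The two queue properties on this slide can be illustrated with a toy model. This is a minimal sketch, not any particular queue product: the `FanOutQueue` class and the reader names are hypothetical, chosen only to show why storage grows with the number of readers and why a consumed message cannot be read again.

```python
from collections import deque


class FanOutQueue:
    """Toy queue broker: each reader gets its own private queue (hypothetical)."""

    def __init__(self, readers):
        self.queues = {name: deque() for name in readers}

    def publish(self, message):
        # The message is duplicated once per reader.
        for queue in self.queues.values():
            queue.append(message)

    def consume(self, reader):
        # Consuming removes the message; it cannot be read again later.
        queue = self.queues[reader]
        return queue.popleft() if queue else None

    def stored_copies(self):
        return sum(len(queue) for queue in self.queues.values())


broker = FanOutQueue(["indexing", "bi"])
broker.publish("event-1")
print(broker.stored_copies())      # two copies for two readers
print(broker.consume("indexing"))  # event-1
print(broker.consume("indexing"))  # None: already consumed
```

With this model, replaying from an earlier point is not possible, which conflicts with the "repeating the process from some point" requirement of both use cases.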

Page 11: MyHeritage Kafka use cases - Feb 2014 Meetup

Agenda

MyHeritage use cases

Possible solutions

• Kafka overview

• Actual implementation @MyHeritage

• Summary

Page 12: MyHeritage Kafka use cases - Feb 2014 Meetup

Kafka Overview

• A high-throughput distributed messaging system

– Fast

– Scalable

– Durable

– Distributed by design

– Simplicity (over functionality)

Page 13: MyHeritage Kafka use cases - Feb 2014 Meetup

Kafka Overview

• Fast (very fast) – both for producer and consumer

Reference: http://research.microsoft.com/en-us/um/people/srikanth/netdb11/netdb11papers/netdb11-final12.pdf

Page 14: MyHeritage Kafka use cases - Feb 2014 Meetup

Kafka Overview

• Main entities:

– Producer: pushes data.

– Consumer: pulls data.

– Brokers: load balance producers by partition.

– Topic: a feed of messages belonging to the same logical category.
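Spreading producers across partitions usually comes down to mapping each message key to a partition number. The sketch below shows one common way to do that, hashing the key modulo the partition count. The function name, the CRC32 hash and the example key are assumptions for illustration; they are not taken from the talk and are not Kafka's own partitioner.

```python
import zlib


def choose_partition(key: str, num_partitions: int) -> int:
    """Map a message key to a partition index (illustrative hash, not Kafka's)."""
    # CRC32 is deterministic across runs, unlike Python's built-in hash() for str.
    return zlib.crc32(key.encode("utf-8")) % num_partitions


# "tree-42" is a hypothetical key; 32 matches the partition count in the deck's diagram.
partition = choose_partition("tree-42", 32)
print(partition)
```

Because the same key always maps to the same partition, and Kafka preserves order within a partition, keyed partitioning is one way to meet a "guaranteed order of processed items" requirement per key.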

Page 15: MyHeritage Kafka use cases - Feb 2014 Meetup

Kafka Overview – some internals

• Communication between the clients and the servers is done with a simple, high-performance TCP protocol.

• For each topic, the Kafka cluster maintains a partitioned log, which is an append-only commit log.
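The partitioned, append-only log can be pictured as a set of lists where records are only ever added at the end and are addressed by their position (offset). The following is a minimal in-memory sketch under that assumption; the class names `PartitionLog` and `Topic` are hypothetical and nothing here touches disk or the network.

```python
class PartitionLog:
    """Append-only list of records; the list index plays the role of the offset."""

    def __init__(self):
        self._records = []

    def append(self, record) -> int:
        # Records are only ever added at the end; the new offset is returned.
        self._records.append(record)
        return len(self._records) - 1

    def read(self, offset: int, max_records: int = 10):
        # Reading does not modify the log, so any number of readers can share it.
        return self._records[offset:offset + max_records]


class Topic:
    """A topic is modelled as a fixed number of independent partition logs."""

    def __init__(self, num_partitions: int):
        self.partitions = [PartitionLog() for _ in range(num_partitions)]


topic = Topic(num_partitions=2)
topic.partitions[0].append("change-a")
topic.partitions[0].append("change-b")
print(topic.partitions[0].read(0))  # both records
print(topic.partitions[0].read(1))  # replay from offset 1
```

Each reader only needs to remember its own offset, so several consumers can read the same data independently and any of them can go back to an earlier point.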

Page 16: MyHeritage Kafka use cases - Feb 2014 Meetup

Kafka Overview – some internals

• Messages stay on disk after they are consumed and are deleted after a defined TTL.

• The partitions of the log are distributed over the servers in the Kafka cluster, with each server handling data and requests for a share of the partitions.

• Each partition is replicated across a configurable number of servers for fault tolerance.
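Time-based retention, as opposed to delete-on-consume, can be sketched as follows. This is a simplified in-memory model with hypothetical names (`RetainedLog`, `expire`); it expires individual records, whereas Kafka removes whole log segments, and timestamps are passed in explicitly so the behaviour is easy to follow.

```python
import time


class RetainedLog:
    """Log whose records expire by age, not by being read (simplified model)."""

    def __init__(self, ttl_seconds: float):
        self.ttl_seconds = ttl_seconds
        self._records = []     # list of (timestamp, message)
        self._base_offset = 0  # offset of the oldest record still retained

    def append(self, message, now=None) -> int:
        timestamp = time.time() if now is None else now
        self._records.append((timestamp, message))
        return self._base_offset + len(self._records) - 1

    def read(self, offset: int):
        index = offset - self._base_offset
        if index < 0:
            raise LookupError("offset has already expired")
        return [message for _, message in self._records[index:]]

    def expire(self, now=None) -> int:
        """Drop records older than the TTL; return how many were dropped."""
        now = time.time() if now is None else now
        dropped = 0
        while self._records and now - self._records[0][0] > self.ttl_seconds:
            self._records.pop(0)
            dropped += 1
        self._base_offset += dropped
        return dropped


log = RetainedLog(ttl_seconds=60)
log.append("old", now=0)
log.append("new", now=50)
print(log.read(0))          # reading leaves both records in place
print(log.expire(now=100))  # only the record older than 60 seconds is dropped
print(log.read(1))
```

Offsets stay stable after expiry, so a reader that stored offset 1 still gets the same record afterwards.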

Page 17: MyHeritage Kafka use cases - Feb 2014 Meetup

Agenda

MyHeritage use cases

Possible solutions

Kafka overview

• Actual implementation @MyHeritage

• Summary

Page 18: MyHeritage Kafka use cases - Feb 2014 Meetup

High Level Overview

[Diagram. Elements shown:]

• Producers: Web, Daemons, Face recog.

• Broker 1: Family Tree changes Topic (part 1, part 2, … part 32); Activity Topic (part 1, part 2, … part 32); DRBD replica of Broker 2.

• Broker 2: Family Tree changes Topic (part 1, part 2, … part 32); Activity Topic (part 1, part 2, … part 32); DRBD replica of Broker 1.

• Consumers: Indexing, RecordMatching, Logstash reader.

Page 19: MyHeritage Kafka use cases - Feb 2014 Meetup

Kafka @MyHeritage - producers

[Diagram. Elements shown: App Module (x3), Events System, Subscriber (x2), EventLoggerSubscriber, ILogWrite, ActivityManager. Arrow labels: Dispatch event (x2), Notify (x3).]
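The element names on this slide suggest a publish/subscribe arrangement: application modules dispatch events to an events system, which notifies its subscribers, and one subscriber forwards events to a log writer. The sketch below is one reading of the diagram, not MyHeritage's code. The class names mirror the slide labels, but every method name and the in-memory writer are assumptions.

```python
class EventsSystem:
    """Fans each dispatched event out to every registered subscriber."""

    def __init__(self):
        self._subscribers = []

    def subscribe(self, subscriber):
        self._subscribers.append(subscriber)

    def dispatch(self, event):
        for subscriber in self._subscribers:
            subscriber.notify(event)


class EventLoggerSubscriber:
    """Subscriber that forwards every event to a log writer (assumed role)."""

    def __init__(self, log_writer):
        self._log_writer = log_writer

    def notify(self, event):
        self._log_writer.write(event)


class InMemoryLogWriter:
    """Hypothetical stand-in for whatever sits behind the slide's ILogWrite."""

    def __init__(self):
        self.written = []

    def write(self, event):
        self.written.append(event)


writer = InMemoryLogWriter()
events = EventsSystem()
events.subscribe(EventLoggerSubscriber(writer))
events.dispatch({"type": "tree_changed", "tree_id": 42})  # hypothetical event
print(writer.written)
```

The point of such a design is that application modules never reference the log writer directly, so the destination of the events can be swapped without touching the modules.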

Page 20: MyHeritage Kafka use cases - Feb 2014 Meetup

Kafka @MyHeritage - producers

[Diagram. KafkaWriter is shown together with: Topic, BrokersConfig, ISelector, ISerializer, ILogger, IStats.]

Page 21: MyHeritage Kafka use cases - Feb 2014 Meetup

Kafka @MyHeritage - producers

[Diagram. Elements shown: App Module (x3), Events System, Subscriber (x2), EventLoggerSubscriber, KafkaWriter, Broker (x2). Arrow labels: Dispatch event, Notify (x3).]

KafkaWriter: attempt 1st broker; (if failed) attempt 2nd broker.
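The "attempt 1st broker, if failed attempt 2nd broker" behaviour can be sketched as a loop over an ordered list of brokers. This is an illustration only: brokers are modelled as plain callables, the function and exception names are hypothetical, JSON is an assumed serialization format, and no real Kafka client is involved.

```python
import json


class AllBrokersFailed(Exception):
    """Raised when no broker accepted the message."""


def write_with_failover(event, brokers, serialize=json.dumps) -> int:
    """Try each broker in order; return the index of the one that accepted."""
    payload = serialize(event)
    errors = []
    for index, send in enumerate(brokers):
        try:
            send(payload)
            return index
        except ConnectionError as exc:
            # Remember the failure and fall through to the next broker.
            errors.append(exc)
    raise AllBrokersFailed(errors)


received = []


def broker_down(payload):
    raise ConnectionError("broker 1 unreachable")  # simulated failure


def broker_up(payload):
    received.append(payload)


print(write_with_failover({"id": 1}, [broker_down, broker_up]))
print(received)
```

Only connection errors trigger the fallback in this sketch; which failures should count as "failed" in a real producer is a design decision the slide does not specify.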

Page 22: MyHeritage Kafka use cases - Feb 2014 Meetup

Kafka @MyHeritage – Consumers (Indexing)

[Diagram. Elements shown: Broker 1, Broker 2, EventProcessor (x3; 1 per consumer type, reader per partition), KafkaWatermark, IndexingQueue, IndexingWorkers (x3), SOLR. Arrow labels: Fetch event from part<x>, offset <z> (x2); Get/update watermark; Add event to queue; Fetch work; Update item.]
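The arrow labels on this slide (get/update watermark, fetch event from a partition at an offset, add event to queue) suggest a consumer loop that tracks its own position per partition. The sketch below is an interpretation under that assumption: a Python list stands in for a partition, and `InMemoryWatermark` and `process_partition` are hypothetical names, not MyHeritage's classes.

```python
from queue import Queue


class InMemoryWatermark:
    """Remembers the next offset to read for each partition (hypothetical store)."""

    def __init__(self):
        self._offsets = {}

    def get(self, partition: int) -> int:
        return self._offsets.get(partition, 0)

    def update(self, partition: int, offset: int) -> None:
        self._offsets[partition] = offset


def process_partition(log, partition, watermark, indexing_queue, batch_size=100):
    """Fetch events past the watermark, enqueue them, then advance the watermark."""
    offset = watermark.get(partition)
    events = log[offset:offset + batch_size]  # "fetch event from part<x>, offset <z>"
    for event in events:
        indexing_queue.put(event)             # "add event to queue"
    watermark.update(partition, offset + len(events))
    return len(events)


partition_log = ["e0", "e1", "e2"]  # stand-in for one partition's events
watermark = InMemoryWatermark()
work = Queue()

print(process_partition(partition_log, 0, watermark, work, batch_size=2))
print(process_partition(partition_log, 0, watermark, work))
print(watermark.get(0))
```

Because the position lives in the consumer's own store, resetting the watermark to an earlier offset replays events from that point, which corresponds to the "repeating the process from some point" requirement.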

Page 23: MyHeritage Kafka use cases - Feb 2014 Meetup

Agenda

MyHeritage use cases

Possible solutions

Kafka overview

Actual implementation @MyHeritage

• Summary

Page 24: MyHeritage Kafka use cases - Feb 2014 Meetup

Summary

Kafka is a very fast and scalable system that is used extensively at MyHeritage, and you may want to consider it for your own high-scale systems.

Page 25: MyHeritage Kafka use cases - Feb 2014 Meetup

Thank you and questions

[email protected]