Upload: confluent

Post on 23-Jan-2017


Page 1: Kafka, the "DialTone for Data": Building a self-service, scalable, streaming analytics system @ HomeAway, Rene Parra

© Copyright 2016 HomeAway, Inc.

Kafka: The “Dial Tone” for Data

Page 2:

HomeAway

The world leader for vacation rentals

> 1 million listings (and growing!)

Page 3:

Agenda

• Overview
• The Problem
• The Experiment
• Results: Use Cases
• Lessons Learned
• Next Steps

Page 4:

Overview

Page 5:

Difference between Dinosaurs and Unicorns

Page 6:

In the old days: “Dial Tone” looked like this

ATDT

Page 7:

Today: Kafka is the modern “Dial Tone” for Data

Producer

Consumer

Page 8:

The Problem

Page 9:

The Problem

Page 10:

Our original problem/motivation

search head

indexer

indexer

app server forwarder

app server forwarder

1 TB/day ingress and growing!

40,000 calls/sec

Page 11:

Also… Historical Analytic Pipeline was slow/expensive

app server

OLTP

OLAP

analytics

ETL

Page 12:

Fill the Lake! Alternatives

?

Problem: Fill Hadoop!

Problem Data Lake

Page 13:

What we wanted… the Big Idea

If you can log it… … you can analyze it!

Page 14:

How to build self-service?

Page 15:

Hypothesis: Use Kafka!

2 ms median latency

http://bit.ly/jay_on_logs (the log)

2 Million Events / Sec! (3 cheap machines)

http://goo.gl/pv5GoL “Benchmarking Apache Kafka”

Page 16:

The Experiment

Page 17:

Experiment: Schema-on-Read, Schema-on-Write

HACommonsLogging
• KafkaAppender: Schema-on-read
• KafkaAvroLogger: Schema-on-write

Data Lake

SchemaRegistry
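
The difference between the two logging paths on this slide can be sketched in a few lines. This is only an illustration: the deck does not show the APIs of KafkaAppender or KafkaAvroLogger, so the function names, the `PAGE_VIEW_SCHEMA` dict, and the use of JSON instead of Avro are all assumptions, and no Kafka client is involved.

```python
import json

# Hypothetical schema for illustration: field name -> required Python type.
PAGE_VIEW_SCHEMA = {"user_id": str, "url": str, "latency_ms": int}


def encode_schema_on_write(record, schema):
    """Validate BEFORE producing, so malformed records never reach the topic."""
    missing = [name for name in schema if name not in record]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    for name, expected in schema.items():
        if not isinstance(record[name], expected):
            raise TypeError(f"{name} must be {expected.__name__}")
    return json.dumps(record, sort_keys=True).encode("utf-8")


def encode_schema_on_read(message):
    """Ship the raw log line as-is; nothing is enforced at produce time."""
    return message.encode("utf-8")


def decode_schema_on_read(payload, schema):
    """The consumer imposes structure at read time and must tolerate bad lines."""
    try:
        record = json.loads(payload.decode("utf-8"))
    except ValueError:  # covers both invalid UTF-8 and invalid JSON
        return None
    if not isinstance(record, dict):
        return None
    if any(name not in record for name in schema):
        return None
    return record
```

With schema-on-write the producer pays the validation cost once and every consumer can trust the payload; with schema-on-read producers stay trivial, but each consumer has to handle unparseable lines itself.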

Page 18:

Architecture: Kafka + Camus = BigData Ingress

Camus

Page 19:

The Results

Page 20:

Use Cases: ITOA / SLA Reporting

Page 21:

Use Cases: ITOA / SLA Reporting

Page 22:

Use Cases: Fraud

Page 23:

Use Cases: Search + ClickStream

User Behavior

Search Requests

A/B Test Readouts

Proctor

EDAP

Page 24:

Use Cases: Search + ClickStream

Page 25:

Use Cases: Traveler Segmentation

EDAP

Data Model

Page 26:

Lessons Learned

Page 27:

Lesson #1: The Schema [registry] is Everything!

Data Lake

SchemaRegistry

• Decouples producers from consumers

• Enforces backwards compatibility

• Enables self-service / democratization

• Source of truth (SOT) for schemas in the pipe
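
To make the "enforces backwards compatibility" bullet concrete, here is a simplified check in the spirit of Avro schema resolution: data written with the old schema stays readable under the new one only if every field the new schema adds carries a default. This is a sketch, not the Schema Registry's implementation. `is_backward_compatible` is a hypothetical helper that handles only flat record schemas and ignores type promotion, aliases, and unions.

```python
def is_backward_compatible(old_schema, new_schema):
    """Return True if data written with old_schema can be read with new_schema.

    Simplified rule for flat Avro-style records:
      - a field present in both schemas must keep the same type
      - a field added in new_schema must declare a default
      - a field dropped from new_schema is fine (the reader ignores it)
    """
    old_fields = {f["name"]: f for f in old_schema["fields"]}
    for field in new_schema["fields"]:
        old = old_fields.get(field["name"])
        if old is None:
            if "default" not in field:
                return False
        elif old["type"] != field["type"]:
            return False
    return True


# Hypothetical example schemas.
v1 = {"type": "record", "name": "Search", "fields": [
    {"name": "query", "type": "string"},
]}
v2_ok = {"type": "record", "name": "Search", "fields": [
    {"name": "query", "type": "string"},
    {"name": "locale", "type": "string", "default": "en_US"},
]}
v2_bad = {"type": "record", "name": "Search", "fields": [
    {"name": "query", "type": "string"},
    {"name": "locale", "type": "string"},  # no default: old data cannot fill it
]}
```

A registry that runs a check like this at registration time would accept `v2_ok` and reject `v2_bad`, which is what lets producers evolve schemas without coordinating with every consumer.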

Page 28:

Lesson #2: A Kafka/SR governance module is helpful

Data Lake

• TURN OFF Auto Topic Creation!

• Need a place for developers to request topics
  • Retention Policy
  • Expected Load
  • Compaction
  • Partition Size / Partition Key
  • Owner
  • LTS Date
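
One way to picture such a request form is as a small validated record. Automatic topic creation is controlled by the Kafka broker setting `auto.create.topics.enable`; once it is set to `false`, every topic has to come through some explicit request like the one sketched here. The `TopicRequest` class, its field names and units, and the validation rules are assumptions for illustration; the deck lists only the field headings and does not expand "LTS".

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class TopicRequest:
    """Hypothetical topic request mirroring the fields listed on the slide."""
    name: str
    owner: str
    retention_hours: int          # Retention Policy
    expected_events_per_sec: int  # Expected Load
    compaction: bool              # Compaction
    partitions: int               # Partition Size
    partition_key: Optional[str]  # Partition Key
    lts_date: date                # LTS Date (not expanded in the deck)

    def problems(self) -> List[str]:
        """Return human-readable reasons to reject the request, if any."""
        found = []
        if not self.owner.strip():
            found.append("owner is required")
        if self.retention_hours <= 0:
            found.append("retention must be positive")
        if self.expected_events_per_sec < 0:
            found.append("expected load cannot be negative")
        if self.partitions <= 0:
            found.append("at least one partition is required")
        # Log compaction keeps the latest record per key, so it needs a key.
        if self.compaction and not self.partition_key:
            found.append("compacted topics need a key")
        return found
```

An empty list from `problems()` means the request can go ahead; anything else can be shown back to the requesting developer before a topic is created.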

Page 29:

Lesson #3: Make it easy to do stream processing

SchemaRegistry

• samza-archetype
• samza-job-deployer

• Will evaluate k-streams!!!!

http://www.confluent.io/blog/introducing-kafka-streams-stream-processing-made-simple

Page 30:

Next Steps

Page 31:

Consistency: 3 types of Data

Event

Document

Transactional

Page 32:

Kafka Producer Spooling

Page 33:

Conclusion

Page 34:

Yesterday

Systems of Record

Page 35:

Today

Systems of Engagement

Page 36:

Tomorrow

Systems of Intelligence

Page 37:

Don’t be a dinosaur…

ATDT

Page 38:

Thank you

Page 39:

End of Presentation
