Kafka, the "Dial Tone" for Data: Building a Self-Service, Scalable, Streaming Analytics...

Post on 23-Jan-2017

© Copyright 2016 HomeAway, Inc.

Kafka: The “Dial Tone” for Data

HomeAway

The world leader for vacation rentals

> 1 million listings (and growing!)

Agenda

• Overview

• The Problem

• The Experiment

• Results: Use Cases

• Lessons Learned

• Next Steps

Overview

Difference between Dinosaurs and Unicorns

In the old days: “Dial Tone” looked like this

ATDT

Today: Kafka is the modern “Dial Tone” for Data

Producer

Consumer

The Problem

Our original problem/motivation

[Diagram labels: search head, indexer, app server, forwarder]

1 TB/day ingress and growing!

40,000 calls/sec

Also… Historical Analytic Pipeline was slow/expensive

[Diagram labels: app server, OLTP, ETL, OLAP, analytics]

Fill the Lake! Alternatives?

Problem: Fill Hadoop!

[Diagram labels: Problem, Data Lake]

What we wanted… the Big Idea

If you can log it… you can analyze it!

How to build self-service?

Hypothesis: Use Kafka!

2 ms median latency

“The Log”: http://bit.ly/jay_on_logs

2 million events/sec (3 cheap machines)

“Benchmarking Apache Kafka”: http://goo.gl/pv5GoL

The Experiment

Experiment: Schema-on-Read, Schema-on-Write

HACommonsLogging

• KafkaAppender: schema-on-read

• KafkaAvroLogger: schema-on-write

[Diagram labels: Schema Registry, Data Lake]
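To make the two approaches concrete, here is a minimal sketch of the difference. It is not the HACommonsLogging code: it assumes JSON and a plain dictionary as stand-ins for Avro and the Schema Registry, and the event fields, `SEARCH_EVENT_SCHEMA`, and all function names are hypothetical.

```python
import json

# Hypothetical schema (stand-in for an Avro schema): field name -> required type.
SEARCH_EVENT_SCHEMA = {"listing_id": int, "event": str, "latency_ms": float}


def write_with_schema(record: dict, schema: dict) -> bytes:
    """Schema-on-write: reject a malformed record before it is produced."""
    for field, expected_type in schema.items():
        if field not in record:
            raise ValueError(f"missing field: {field}")
        if not isinstance(record[field], expected_type):
            raise TypeError(f"{field} must be {expected_type.__name__}")
    return json.dumps(record).encode("utf-8")


def write_raw(line: str) -> bytes:
    """Schema-on-read: the producer ships the log line untouched."""
    return line.encode("utf-8")


def read_raw(payload: bytes) -> dict:
    """Schema-on-read: the consumer imposes structure when it reads."""
    listing_id, event, latency_ms = payload.decode("utf-8").split(" ")
    return {
        "listing_id": int(listing_id),
        "event": event,
        "latency_ms": float(latency_ms),
    }
```

With schema-on-write, a bad record fails at the producer; with schema-on-read, the same mistake only surfaces when a consumer tries to parse the line.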

Architecture: Kafka + Camus = Big Data Ingress

[Diagram label: Camus]

The Results

Use Cases: ITOA / SLA Reporting

Use Cases: Fraud

Use Cases: Search + ClickStream

[Diagram labels: User Behavior, Search Requests, A/B Test Readouts, Proctor, EDAP]

Use Cases: Traveler Segmentation

[Diagram labels: EDAP, Data Model]

Lessons Learned

Lesson #1: The Schema [registry] is Everything!

[Diagram labels: Schema Registry, Data Lake]

• Decouples producers from consumers

• Enforces backwards compatibility

• Enables self-service / democratization

• Source of truth (SOT) for schemas in the pipeline
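The backward-compatibility point can be illustrated with a small check. This is a simplified sketch, not the registry's actual algorithm: it assumes schemas are plain dictionaries of fields, ignores Avro type promotion, and uses a hypothetical function name. The rule it applies is that a reader on the new schema can still decode old data if no field changed type and every added field has a default.

```python
def is_backward_compatible(old_fields: dict, new_fields: dict) -> bool:
    """Return True if a reader using new_fields can decode data written with old_fields.

    Each argument maps field name -> {"type": str, "default": ...}; the
    "default" key is optional. Simplified rule: no type changes, and any
    field added in the new schema must carry a default.
    """
    for name, spec in new_fields.items():
        if name in old_fields:
            # Existing field: the type must not change.
            if old_fields[name]["type"] != spec["type"]:
                return False
        elif "default" not in spec:
            # New field without a default: old records cannot fill it in.
            return False
    # Fields dropped from the new schema are fine: the reader ignores them.
    return True
```

For example, adding a `channel` field with a default passes the check, while adding it without a default, or changing the type of an existing field, fails it.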

Lesson #2: A Kafka/SR governance module is helpful

[Diagram label: Data Lake]

• TURN OFF Auto Topic Creation!

• Need a place for developers to request topics:
    • Retention Policy
    • Expected Load
    • Compaction
    • Partition Size / Partition Key
    • Owner
    • LTS Date
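One way to picture such a governance module is a topic request record that must pass validation before anyone creates the topic. The sketch below is an assumption, not HomeAway's implementation: the class, field names, units, and validation rules are all hypothetical, chosen to mirror the items on the slide. On the broker side, auto topic creation is controlled by Kafka's `auto.create.topics.enable` setting, which would be set to `false`.

```python
from dataclasses import dataclass
from datetime import date
from typing import List


@dataclass(frozen=True)
class TopicRequest:
    """Hypothetical request form mirroring the fields listed on the slide."""
    name: str
    owner: str
    retention_hours: int          # Retention Policy (unit is an assumption)
    expected_events_per_sec: int  # Expected Load (unit is an assumption)
    compacted: bool               # Compaction
    partition_size: int           # Partition Size, as named on the slide
    partition_key: str            # Partition Key
    lts_date: date                # LTS Date, as named on the slide


def validate(request: TopicRequest) -> List[str]:
    """Return a list of problems; an empty list means the request can proceed."""
    problems = []
    if not request.name:
        problems.append("topic name is required")
    if not request.owner:
        problems.append("every topic needs an owner")
    if request.retention_hours <= 0:
        problems.append("retention must be positive")
    if request.expected_events_per_sec < 0:
        problems.append("expected load cannot be negative")
    if request.partition_size <= 0:
        problems.append("partition size must be positive")
    if not request.partition_key:
        problems.append("partition key is required")
    return problems
```

Keeping the request as data means the same record can drive review, topic creation, and later audits of who owns what.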

Lesson #3: Make it easy to do stream processing

[Diagram label: Schema Registry]

• samza-archetype

• samza-job-deployer

• Will evaluate Kafka Streams (k-streams)!

http://www.confluent.io/blog/introducing-kafka-streams-stream-processing-made-simple

Next Steps

Consistency: 3 Types of Data

• Event

• Document

• Transactional

Kafka Producer Spooling
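The slide gives no detail on how spooling would work, so the following is only a sketch of one common reading: when a send fails, the producer appends the event to a local file and replays it later. The class name, the `send` callable (a stand-in for a real Kafka producer call), and the file format are all assumptions.

```python
import json
import os
from typing import Callable


class SpoolingProducer:
    """Hypothetical wrapper: spool events to local disk when sending fails."""

    def __init__(self, send: Callable[[bytes], None], spool_path: str):
        self._send = send              # stand-in for a real producer's send call
        self._spool_path = spool_path  # newline-delimited JSON spool file

    def produce(self, event: dict) -> bool:
        """Try to send; on a connection failure, append to the spool.

        Returns True if the event was sent, False if it was spooled.
        """
        payload = json.dumps(event).encode("utf-8")
        try:
            self._send(payload)
            return True
        except ConnectionError:
            with open(self._spool_path, "ab") as spool:
                spool.write(payload + b"\n")
            return False

    def replay(self) -> int:
        """Resend spooled events in order and return how many were sent.

        If a send fails midway, the exception propagates and the spool file
        is left intact, so a later replay may resend events (at-least-once).
        """
        if not os.path.exists(self._spool_path):
            return 0
        with open(self._spool_path, "rb") as spool:
            payloads = [line.rstrip(b"\n") for line in spool if line.strip()]
        for payload in payloads:
            self._send(payload)
        os.remove(self._spool_path)
        return len(payloads)
```

The trade-off in this sketch is simplicity over exactness: consumers may see duplicates after a partial replay, so downstream processing would need to tolerate them.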

Conclusion

Yesterday

Systems of Record

Today

Systems of Engagement

Tomorrow

Systems of Intelligence

Don’t be a dinosaur…

ATDT

Thank you

End of Presentation
