Scaling Through Simplicity: How a 300 million user chat app reduced data engineering efforts by 70%

Joel Cumming, Kik Interactive

Uploaded by spark-summit on 21-Feb-2017


TRANSCRIPT

Page 1: Scaling Through Simplicity—How a 300 million User Chat App Reduced Data Engineering Efforts by 70%: Spark Summit East talk by Joel Cumming



At Kik, we believe that everyone has the right to be curious.


Data should be available to everyone and should be super easy to use.


We have dashboards to glance at, reports to analyze, and a data lake for exploration.


However, Kik is a startup, and we have to move very quickly.


Moving quickly often comes at the expense of scalable data engineering.


How can we compete with Facebook and Google (and their data teams) with a tiny team and very little time to master new tools?


Data v1 @ Kik

Data Lake & Transformations

Exploration & Analysis

KPIs


We decided to make 8 changes


1. Streamline Data Collection via Kinesis Firehose

[Diagram: Old vs. New (label: script)]
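The slide names only the service. As background, Kinesis Firehose buffers incoming records and delivers them to a destination such as S3 in batches, flushing when either a size threshold or a time interval is reached, whichever comes first. The following is a minimal, stdlib-only sketch of that buffering idea under stated assumptions: the `BufferedDelivery` class, its `sink` callback, and the 5 MB / 300 s defaults are illustrative, not Kik's collection code and not the AWS API.

```python
import time
from typing import Callable, List, Optional


class BufferedDelivery:
    """Hypothetical stand-in for Firehose-style size/time buffering.

    Records accumulate until either `max_bytes` or `max_seconds` is reached,
    whichever comes first; the batch is then handed to `sink`.
    """

    def __init__(self, sink: Callable[[List[bytes]], None],
                 max_bytes: int = 5 * 1024 * 1024,
                 max_seconds: float = 300.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.sink = sink
        self.max_bytes = max_bytes
        self.max_seconds = max_seconds
        self.clock = clock
        self._batch: List[bytes] = []
        self._size = 0
        self._opened_at: Optional[float] = None

    def put(self, record: bytes) -> None:
        # Start the interval timer when the first record of a batch arrives.
        if self._opened_at is None:
            self._opened_at = self.clock()
        self._batch.append(record)
        self._size += len(record)
        # Simplification: the interval is only checked when a record arrives;
        # a real service also flushes on a background timer.
        too_big = self._size >= self.max_bytes
        too_old = self.clock() - self._opened_at >= self.max_seconds
        if too_big or too_old:
            self.flush()

    def flush(self) -> None:
        # Deliver whatever is buffered, then reset for the next batch.
        if self._batch:
            self.sink(self._batch)
        self._batch = []
        self._size = 0
        self._opened_at = None
```

In a real pipeline the producer would call the Firehose API (for example `PutRecordBatch` through an AWS SDK) and let the service do this buffering; the sketch only shows that delivery is batched by size or time rather than record by record.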


2. Standardize Transformations with Spark SQL

[Diagram: Old vs. New]
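A minimal sketch of what standardizing transformations on SQL can look like, assuming each transformation is a named SQL statement and a single runner materializes it as a table. The table names, columns, and sample rows are hypothetical, and Python's built-in `sqlite3` stands in for Spark SQL so the example runs without a cluster; in a Spark job the same SQL strings could be passed to `spark.sql(...)`.

```python
import sqlite3
from typing import Dict

# Hypothetical transformations, each expressed purely as SQL.
TRANSFORMS: Dict[str, str] = {
    "daily_active_users": """
        SELECT event_date, COUNT(DISTINCT user_id) AS dau
        FROM events
        GROUP BY event_date
        ORDER BY event_date
    """,
}


def run_transform(conn: sqlite3.Connection, name: str, sql: str) -> None:
    """Materialize one SQL transformation as a table named after it.

    `name` is interpolated directly, so it must come from trusted config.
    """
    conn.execute(f"DROP TABLE IF EXISTS {name}")
    conn.execute(f"CREATE TABLE {name} AS {sql}")


def build_example_db() -> sqlite3.Connection:
    """Load a few illustrative raw events, then run every transformation."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (event_date TEXT, user_id TEXT)")
    conn.executemany(
        "INSERT INTO events VALUES (?, ?)",
        [("2017-02-20", "a"), ("2017-02-20", "b"),
         ("2017-02-20", "a"), ("2017-02-21", "b")],
    )
    for name, sql in TRANSFORMS.items():
        run_transform(conn, name, sql)
    return conn
```

Keeping the logic in SQL strings means anyone who can write a query can add a transformation without changing the runner.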


3. Build a Data Lake (Caspian) in S3

[Diagram: Old vs. New]
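One common way to lay out a data lake in S3 is Hive-style partitioning, where partition values appear in the object key (for example `dt=2017-02-21/hr=13`) so that engines such as Spark can skip irrelevant prefixes. The talk does not describe Caspian's actual layout; the `caspian/` root prefix, the `dt`/`hr` partition names, hourly granularity, and the Parquet file suffix below are all assumptions for illustration.

```python
from datetime import datetime, timezone


def partition_key(table: str, event_time: datetime, part: int,
                  root: str = "caspian") -> str:
    """Build a Hive-style S3 object key for one output file (hypothetical layout).

    Partition columns are encoded as `name=value` path segments.
    """
    if event_time.tzinfo is None:
        raise ValueError("event_time must be timezone-aware")
    ts = event_time.astimezone(timezone.utc)  # bucket every event by UTC hour
    return (f"{root}/{table}/dt={ts:%Y-%m-%d}/hr={ts:%H}/"
            f"part-{part:05d}.parquet")
```

Requiring timezone-aware timestamps avoids silently bucketing events by the local clock of whichever machine wrote them.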


4. Move from EMR to Managed Spark

[Diagram: Old vs. New]


5. Collaborate via Notebooks

[Diagram: Old vs. New]


6. Get Serious About Committing Code

[Diagram: Old vs. New]


7. Move to Airflow for Orchestration Flexibility

[Diagram: Old vs. New]
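Airflow represents a pipeline as a directed acyclic graph (DAG) of tasks and starts each task only after its upstream tasks have finished. The sketch below illustrates that dependency-ordering idea with Python's standard `graphlib` module (Python 3.9+) rather than Airflow itself; the task names are hypothetical, and in Airflow the same edges would be declared between operators, for example `load_raw_events >> build_sessions`.

```python
from graphlib import TopologicalSorter
from typing import Callable, Dict, List, Set

# Hypothetical pipeline: each task maps to the set of tasks it depends on.
PIPELINE: Dict[str, Set[str]] = {
    "load_raw_events": set(),
    "build_sessions": {"load_raw_events"},
    "build_kpis": {"build_sessions"},
    "refresh_dashboards": {"build_kpis"},
    "export_to_lake": {"load_raw_events"},
}


def run_pipeline(deps: Dict[str, Set[str]],
                 run_task: Callable[[str], None]) -> List[str]:
    """Run every task after all of its upstream tasks; return execution order."""
    order: List[str] = []
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    while sorter.is_active():
        # Every task in this batch has all of its dependencies completed.
        for task in sorter.get_ready():
            run_task(task)
            order.append(task)
            sorter.done(task)
    return order
```

Tasks returned in the same `get_ready()` batch do not depend on each other, which is where a scheduler such as Airflow can run them in parallel; this sketch runs them one at a time.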


8. Standardize Reporting on re:dash

[Diagram: Old vs. New]


Data v2 @ Kik


Recall: Data v1 @ Kik

Data Lake & Transformations

Exploration & Analysis

KPIs


Data v2 @ Kik: Scaling through Simplicity

Data Lake & Transformations

Exploration & Analysis

KPIs

SQL


New data is available within an hour in a query-optimized format. Transformations can be built and scheduled in minutes. Reports can be developed just as quickly.


We estimate that we save about 70% of our prior effort.

[Chart: % effort savings per change (based on hours invested in related activities, v1 vs. v2), axis 0 to 20%. Categories: Data Collection, Spark SQL, Data Lake, Managed Spark, Notebooks, Committing Code, Better Orchestration, Standardize Reporting]
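The slide does not say exactly how the per-change percentages were computed beyond "hours invested in related activities, v1 vs. v2". One reading, stated here purely as an assumption, is that each bar measures the hours saved on one change's activities as a share of total v1 hours, so that the eight bars add up to the overall figure. With H_i^v1 and H_i^v2 as the (hypothetical) hours spent on the activities related to change i before and after:

```latex
s_i = \frac{H_i^{\mathrm{v1}} - H_i^{\mathrm{v2}}}{\sum_{j=1}^{8} H_j^{\mathrm{v1}}},
\qquad
S = \sum_{i=1}^{8} s_i
  = 1 - \frac{\sum_{i=1}^{8} H_i^{\mathrm{v2}}}{\sum_{j=1}^{8} H_j^{\mathrm{v1}}}
  \approx 0.70
```

Under this reading, no single change needs to save more than 20% on its own for the total to reach roughly 70%, which is consistent with the chart's 0 to 20% axis.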


What’s Next?


1. Spark as a data warehouse (DW)?
2. Structured Streaming
3. Data Lake Cataloging


Thank you!