rapid data analytics @ netflix

Rapid Data Analytics @ Netflix Jason Flittner Senior BI Engineer Chris Stephens Senior Data Engineer Monisha Kanoth Senior Data Architect

Upload: monisha-kanoth

Post on 20-Jan-2017


Page 1: Rapid Data Analytics @ Netflix

Rapid Data Analytics @ Netflix

Jason Flittner, Senior BI Engineer

Chris Stephens, Senior Data Engineer

Monisha Kanoth, Senior Data Architect

Page 2: Rapid Data Analytics @ Netflix

What We Do

Page 3: Rapid Data Analytics @ Netflix

DEA @ Netflix
Content Analytics

Page 4: Rapid Data Analytics @ Netflix

Global Expansion & Content Spend

Page 5: Rapid Data Analytics @ Netflix

Freedom & Responsibility
Highly Aligned, Loosely Coupled
Context, not Control

Culture + Technology

Courage, Judgement, Honesty, Communication, Curiosity, Passion, Innovation, Impact, Selflessness

Page 6: Rapid Data Analytics @ Netflix

Parquet FF

Storage Compute Tools BI

AWS S3

(Hadoop clusters)

Page 7: Rapid Data Analytics @ Netflix

Deploy Fast, Fix Faster

● Improve & Iterate vs Perfect
● Have a Rollback Plan Ready

Page 8: Rapid Data Analytics @ Netflix

Develop Business Logic not ETL

● Think in Patterns

Page 9: Rapid Data Analytics @ Netflix

The Path of Least Resistance is the Right Path

● Make Smart Engineering Tradeoffs

Page 10: Rapid Data Analytics @ Netflix

The Clock starts Ticking when you Deploy

● Every Data Pipeline comes with an Expiration Date

● Deprecate and Prune

Page 11: Rapid Data Analytics @ Netflix

No Man’s Land is Expensive

● Ownership

Page 12: Rapid Data Analytics @ Netflix

Be a Noob

● User Groups

Page 13: Rapid Data Analytics @ Netflix
Page 14: Rapid Data Analytics @ Netflix

What You Could Do in your Data Warehouse

Page 15: Rapid Data Analytics @ Netflix

Let everyone drop tables in production

Page 16: Rapid Data Analytics @ Netflix

Cost / Benefit

Conscientious people make mistakes, but not very often

Data warehouse is not an operational system

What happens if a table is accidentally dropped?
● Do you have backups?
● How quickly can you restore a table?

Is the benefit worth the tax on every data / analytical product your team produces?

Page 17: Rapid Data Analytics @ Netflix

We have some protection

Page 18: Rapid Data Analytics @ Netflix

In Hive, all tables are external tables pointing to S3 locations.

ETL writes a new “batch” of data then updates the metastore.

s3://[bucket]/hive/schema.db/table/batchid=1459364911

ALTER TABLE table SET LOCATION [path to new batch ID];

DROP TABLE does not delete any data.
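The batch pattern above can be sketched in Python as a small helper that builds the batch path and the matching DDL. This is a minimal illustration, not the team's actual tooling: the function names, the bucket name, and the in-memory `history` list are assumptions made for the example. Only the path layout and the `ALTER TABLE ... SET LOCATION` statement come from the slide.

```python
import time


def batch_location(bucket, schema, table, batch_id):
    """Build the S3 path for one batch, following the layout on the slide."""
    return f"s3://{bucket}/hive/{schema}.db/{table}/batchid={batch_id}"


def swap_statement(schema, table, location):
    """Return the Hive DDL that repoints the external table at a batch."""
    return f"ALTER TABLE {schema}.{table} SET LOCATION '{location}';"


def publish_batch(bucket, schema, table, history, now=None):
    """Record a new batch ID (epoch seconds) and return the DDL that makes it live.

    `history` is a hypothetical list of batch IDs kept by the caller.
    """
    batch_id = int(now if now is not None else time.time())
    history.append(batch_id)
    location = batch_location(bucket, schema, table, batch_id)
    return swap_statement(schema, table, location)


def rollback(bucket, schema, table, history):
    """Forget the newest batch ID and return DDL pointing at the previous one."""
    if len(history) < 2:
        raise ValueError("no earlier batch to roll back to")
    history.pop()
    location = batch_location(bucket, schema, table, history[-1])
    return swap_statement(schema, table, location)
```

Because each batch lands in its own directory and only the metastore pointer moves, a rollback in this sketch is a metadata change: the older batch's files are still in S3.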

Page 19: Rapid Data Analytics @ Netflix

In our MPP databases, we have a procedure for upgrading and downgrading our privileges.

CALL admin.UpgradePrivileges('me')

Lasts for several hours. Usage is logged.

Accidents? Restore from backups. Or reload from Hive.
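The idea of a temporary, logged privilege upgrade can be modeled with a toy in-memory class. This is only a sketch of the concept, not the `admin.UpgradePrivileges` procedure itself. The class name, the 4-hour default (the slide only says "several hours"), and the log format are all assumptions.

```python
from datetime import datetime, timedelta


class PrivilegeManager:
    """Toy in-memory model of time-limited privilege upgrades (hypothetical)."""

    def __init__(self, ttl_hours=4):
        # The slide says "several hours"; 4 is an assumed default.
        self.ttl = timedelta(hours=ttl_hours)
        self.expires = {}      # user -> time at which the upgrade lapses
        self.audit_log = []    # (timestamp, user, action) tuples

    def upgrade(self, user, now):
        """Grant elevated privileges until now + ttl and log the request."""
        self.expires[user] = now + self.ttl
        self.audit_log.append((now, user, "upgrade"))

    def downgrade(self, user, now):
        """Drop elevated privileges early and log it."""
        self.expires.pop(user, None)
        self.audit_log.append((now, user, "downgrade"))

    def is_elevated(self, user, now):
        """True only while an unexpired upgrade exists for the user."""
        expiry = self.expires.get(user)
        return expiry is not None and now < expiry
```

The design point is that elevation expires on its own, so nobody keeps destructive rights by default, while the log records who asked for them and when.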

Page 20: Rapid Data Analytics @ Netflix

When other teams are ready to move to production ...

We’re done. And moving on to the next thing.

You can trust your people to work the same way.

Page 21: Rapid Data Analytics @ Netflix

Don’t have an “on call”
(Use a “first responder” instead)

Page 22: Rapid Data Analytics @ Netflix

Everyone on the team takes a shift: both BI and data engineers (even managers every once in a while!)

First Responder = the first one to respond

● handles most common failures (restarting jobs)
● reaches out directly to ETL owner if escalation is required
● handles communication surrounding ETL delays

Page 23: Rapid Data Analytics @ Netflix

Goal is to protect the team’s time and focus

Page 24: Rapid Data Analytics @ Netflix

How we do this

● visually define what needs attention and what doesn’t
  ○ “above the line” vs “below the line”
● email alerts for “above the line” jobs that take longer than normal
● playbook for fixing common stuff
  ○ the more complete your entries are, the less you get called!
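One way to decide that an "above the line" job is taking longer than normal is to compare its current runtime with its own history. The sketch below is an assumption-heavy illustration: the job dictionary fields, the mean-plus-three-standard-deviations rule, and the function name are invented for the example, and the deck does not say how the team defines "normal".

```python
from statistics import mean, stdev


def jobs_needing_attention(jobs, sigmas=3.0):
    """Return names of above-the-line jobs running longer than normal.

    Each job is a dict with hypothetical keys:
      name            job identifier
      above_the_line  True if the job deserves an alert at all
      history         past runtimes in minutes
      current         minutes the current run has taken so far
    """
    flagged = []
    for job in jobs:
        if not job["above_the_line"]:
            continue  # below the line: never alert
        history = job["history"]
        if len(history) < 2:
            continue  # too few runs to define "normal"
        threshold = mean(history) + sigmas * stdev(history)
        if job["current"] > threshold:
            flagged.append(job["name"])
    return flagged
```

The returned names would feed whatever sends the email alert. Jobs below the line are skipped on purpose, however late they are.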

Page 25: Rapid Data Analytics @ Netflix

Have a very clear sense of what is urgent, and what isn’t

Page 26: Rapid Data Analytics @ Netflix

Treating every failure like it’s urgent bleeds your team of the time they need to do work

Build your processes so they can be ignored for 3 days

● don’t load data if it’s incomplete
● reprocess fact data for several days instead of picking up the latest

Gives you the freedom to judge whether a failure is worth an interruption
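The two rules on this slide can be sketched as a pair of small helpers. This is a minimal sketch under stated assumptions: the 3-day lookback mirrors the "ignored for 3 days" goal, but the row-count completeness check, the 0.99 ratio, and both function names are hypothetical.

```python
from datetime import date, timedelta


def reprocess_window(run_date, lookback_days=3):
    """Dates to rebuild on every run: the run date plus the days before it.

    Rebuilding a trailing window means a job that was ignored for a few
    days repairs its own gap on the next successful run.
    """
    return [run_date - timedelta(days=n) for n in range(lookback_days, -1, -1)]


def should_load(received_rows, expected_rows, min_ratio=0.99):
    """Skip the load entirely when the source looks incomplete."""
    if expected_rows <= 0:
        return False
    return received_rows / expected_rows >= min_ratio
```

With both rules in place, a failed or skipped run leaves yesterday's correct data untouched rather than publishing a partial load, and the next run covers the missed days.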

Page 27: Rapid Data Analytics @ Netflix

Everybody owns ETL
(when they need to)

Page 28: Rapid Data Analytics @ Netflix

BI engineer needs data structured a certain way for a report

Many environments:

● Ask a data engineer to build them a table

Our environment:

● Let them schedule a Hive script and adjust as necessary

Page 29: Rapid Data Analytics @ Netflix

We focus on centers of excellence, not role boundaries

Page 30: Rapid Data Analytics @ Netflix

More Examples:

● our BI engineers use Python to automate tasks

● our data engineers have Tableau licenses, and use them for quick visualizations and report deployments

For small tasks, this helps us avoid the overhead of interruption and knowledge transfer

Page 31: Rapid Data Analytics @ Netflix

What You Could Do on the Front-end

Page 32: Rapid Data Analytics @ Netflix

Parquet FF (Hadoop clusters)

Storage Compute Data Interface Data Access, Analytics and Visualization

AWS S3

Page 33: Rapid Data Analytics @ Netflix

Do Not Limit Yourself to Conventional Tools

○ Tableau - Data Visualization and Dashboards
○ MicroStrategy - Dynamic SQL and Metadata
○ Python or Custom Reporting - Emails
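As an illustration of the "Python or Custom Reporting - Emails" option, a plain-text report email can be assembled with the standard library alone. The function name, the addresses, and the two-column layout are assumptions for the example, and the deck does not describe the team's actual reporting code.

```python
from email.message import EmailMessage


def build_report_email(sender, recipients, subject, rows):
    """Build a plain-text report email from (label, value) rows."""
    # Pad labels to a common width so the values line up in a column.
    width = max((len(label) for label, _ in rows), default=0)
    body = "\n".join(f"{label.ljust(width)}  {value}" for label, value in rows)

    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = ", ".join(recipients)
    msg["Subject"] = subject
    msg.set_content(body)
    return msg
```

The returned message could then be handed to `smtplib.SMTP(...).send_message(msg)` against whatever mail relay is available.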

Page 34: Rapid Data Analytics @ Netflix

Give your BI Engineers Superpowers (like this guy)

○ Provide a data platform
○ BI + Data Engineering
○ Context not Requirements
○ Be early adopters

Page 35: Rapid Data Analytics @ Netflix

Simple is Often Best

Page 36: Rapid Data Analytics @ Netflix

Dismantle your Data Warehouse Team

○ Integrate with the business
○ Data Engineering and Data Science teams
○ Open and honest communication

Page 37: Rapid Data Analytics @ Netflix

Fast is better than perfect

○ Build, iterate… repeat
○ How to handle adhocs
○ Freedom - make the right call
○ Responsibility - Ownership

Page 38: Rapid Data Analytics @ Netflix

Encourage Hacking

Page 39: Rapid Data Analytics @ Netflix

Questions?

Want to chill with us!?
jobs.netflix.com