open data science conference agile data

AGILE DATA Christopher Bergh, Head Chef, DataKitchen. OPEN DATA SCIENCE CONFERENCE, BOSTON 2015 @opendatasci

Upload: datakitchen

Post on 28-Jul-2015


TRANSCRIPT

Page 1: Open Data Science Conference Agile Data

AGILE DATA

Christopher Bergh

Head Chef, DataKitchen

OPEN DATA SCIENCE CONFERENCE

BOSTON 2015

@opendatasci

Page 2: Open Data Science Conference Agile Data

AGENDA

Who Am I?

What Is The Problem?

A Look At Agile Through A Data Lens

How To Do Agile Data In Five Shocking Steps

Page 3: Open Data Science Conference Agile Data

WHO AM I

Algorithm Nerd

Columbia, MIT, NASA-Ames; ATC Automation

Into In 1990: Fuzzy Logic, Neural Networks, Constraint Satisfaction; Unix/C

Software Nerd

CTO, Dir Engineering, VP Product Management

Into In 2000: Management of Software Teams & Startups; PowerPoint

Data Nerd

COO: ETL Engineers, Analysts & Analytic Tool

Into In 2010: W. Edwards Deming, Data, Bootstrapping; Excel Hacking

Page 4: Open Data Science Conference Agile Data

AGENDA

Who Am I?

What Is The Problem?

A Look At Agile Through A Data Lens

How To Do Agile Data In Five Shocking Steps

Page 5: Open Data Science Conference Agile Data

SO WHAT IS THE PROBLEM?

In one word ….

Page 6: Open Data Science Conference Agile Data

LOTSA Technologies in Analytics

Page 7: Open Data Science Conference Agile Data

LOTSA People In Analytic Teams

DATA SCIENTIST

REPORTING ANALYST

ETL ENGINEER

DATABASE ARCHITECT

DEV OPS ENGINEER

DATA GOVERNANCE

Page 8: Open Data Science Conference Agile Data

LOTSA Data & Analysis

ONE OFF

RE-USE

Page 9: Open Data Science Conference Agile Data

LOTSA Missed Expectations

[Figure: Business Customer Expectation vs. Analyst Reality, comparing Prepare Data, Analyze, and Communicate]

The business does not think that analysts are preparing data

Analysts don’t want to prepare data

Page 10: Open Data Science Conference Agile Data

Complexity

Another Field, Software Development, Ran into the Same Problems With Complexity ...

… They Used Something Called ‘Agile’ To Solve The Problem

Page 11: Open Data Science Conference Agile Data

AGENDA

Who Am I?

What Is The Problem?

A Look At Agile Through A Data Lens

How To Do Agile Data In Five Shocking Steps

Page 12: Open Data Science Conference Agile Data

AGILEMANIFESTO.ORG

5/31/2015 12


Page 13: Open Data Science Conference Agile Data

AGILEMANIFESTO.ORG


analytics

Page 14: Open Data Science Conference Agile Data

s/software/analytics/
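The substitution on this slide can be tried out directly. The sketch below is only an illustration: the two sample lines are quoted from agilemanifesto.org, and the helper name `to_analytics` is made up for this example.

```python
import re

# Two lines quoted from agilemanifesto.org, used here as sample input.
MANIFESTO_LINES = [
    "Working software over comprehensive documentation",
    "Our highest priority is to satisfy the customer through early "
    "and continuous delivery of valuable software.",
]


def to_analytics(text: str) -> str:
    """Apply s/software/analytics/ to one line.

    Like sed without the g flag, only the first match is replaced.
    """
    return re.sub(r"software", "analytics", text, count=1)


for line in MANIFESTO_LINES:
    print(to_analytics(line))
```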

Page 15: Open Data Science Conference Agile Data

PRACTICES THAT ARE EASY TO APPLY

Development Sprints

User Stories

Daily Meetings

Defined Roles

Retrospectives

Pair Programming

Burn Down Charts

Page 16: Open Data Science Conference Agile Data

SOME PRACTICES HAVE BEEN DIFFICULT TO APPLY

Test Driven Development

Branching And Merging

Refactoring

Small Releases

Frequent Or Continuous Integration

Experimentation For Learning

Individual Development Environments

Page 17: Open Data Science Conference Agile Data

AGILE – WHAT IS UNIQUE TO ANALYTICS?


PUT THE ANALYST AT THE CENTER

Page 18: Open Data Science Conference Agile Data

AGILE – WHAT IS UNIQUE TO ANALYTICS?

ANALYTICS PERCEIVED VALUE DECAY CURVE

Page 19: Open Data Science Conference Agile Data

AGENDA

Who Am I?

What Is The Problem?

A Look At Agile Through A Data Lens

How To Do Agile Data In Five Shocking Steps

Page 20: Open Data Science Conference Agile Data

1. MANAGE YOUR WORK LIKE CODE

Why? Your work is just code: models, transforms, etc.

Use a source code control system (like Git) to enable:

Branching

Merging

Diff
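To make the "Diff" point concrete, here is a minimal sketch of what a diff of an analytic transform looks like once it is stored as text. It uses Python's standard `difflib` as a stand-in for what Git shows between two branches. The SQL, the file names, and the helper `show_diff` are hypothetical and not from the talk.

```python
import difflib

# Hypothetical versions of the same SQL transform on two branches.
main_version = """\
SELECT customer_id, SUM(amount) AS revenue
FROM orders
GROUP BY customer_id
""".splitlines(keepends=True)

branch_version = """\
SELECT customer_id, SUM(amount) AS revenue
FROM orders
WHERE status = 'complete'
GROUP BY customer_id
""".splitlines(keepends=True)


def show_diff(old, new):
    """Return a unified diff of two versions, similar in shape to `git diff`."""
    return "".join(
        difflib.unified_diff(
            old, new,
            fromfile="main/revenue.sql",      # hypothetical path
            tofile="feature/revenue.sql",     # hypothetical path
        )
    )


diff_text = show_diff(main_version, branch_version)
print(diff_text)
```

Because the transform is plain text, the one-line change is visible and reviewable before it is merged.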

Page 21: Open Data Science Conference Agile Data

2. TEST AND CONTAIN

Test: Unit Tests & System Tests … Keep Adding & Improving

1. Create and monitor tests

2. Test on separate data from production

3. Run tests early and often

4. Target 20% of code for tests

Contain:

1. Break up your work into components

2. Manage the environment for each component (e.g. Docker, AMI)

3. Practice Environment Version Control
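As an illustration of the testing advice on this slide, the sketch below unit-tests a small transform against a hand-made fixture rather than production data. Everything in it (the `revenue_by_customer` transform, the fixture rows, the tiny `run_tests` runner) is a hypothetical example, not code from the talk. A real project would more likely use a test framework.

```python
def revenue_by_customer(orders):
    """Transform under test: total amount per customer, skipping cancelled orders."""
    totals = {}
    for order in orders:
        if order["status"] == "cancelled":
            continue
        key = order["customer_id"]
        totals[key] = totals.get(key, 0) + order["amount"]
    return totals


# Small hand-made fixture, kept separate from production data.
TEST_ORDERS = [
    {"customer_id": "a", "amount": 10, "status": "complete"},
    {"customer_id": "a", "amount": 5, "status": "complete"},
    {"customer_id": "b", "amount": 7, "status": "cancelled"},
]


def test_sums_per_customer():
    assert revenue_by_customer(TEST_ORDERS)["a"] == 15


def test_cancelled_orders_excluded():
    assert "b" not in revenue_by_customer(TEST_ORDERS)


def run_tests():
    """Run every test and return how many passed (an AssertionError stops the run)."""
    tests = [test_sums_per_customer, test_cancelled_orders_excluded]
    for test in tests:
        test()
    return len(tests)


print(run_tests(), "tests passed")
```

Tests this small are cheap to run on every change, which is what makes "early and often" practical.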

Page 22: Open Data Science Conference Agile Data

3. PROVIDE SEPARATE ENVIRONMENTS FOR ANALYSTS

Why?

Analysts need their own copy of the data to iterate, develop & explore.
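One possible way to give an analyst a separate environment is to copy the production database into a private sandbox. The sketch below does this with SQLite's backup API from Python's standard library. The table, the rows, and the helper `make_sandbox` are assumptions made for illustration, and a real warehouse would need its own cloning mechanism.

```python
import sqlite3


def make_sandbox(production: sqlite3.Connection) -> sqlite3.Connection:
    """Copy the whole production database into a private in-memory sandbox."""
    sandbox = sqlite3.connect(":memory:")
    production.backup(sandbox)  # sqlite3.Connection.backup, Python 3.7+
    return sandbox


# Stand-in for a production database (hypothetical schema and rows).
production = sqlite3.connect(":memory:")
production.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL)")
production.executemany("INSERT INTO orders (amount) VALUES (?)", [(10.0,), (20.0,)])
production.commit()

# The analyst experiments freely in the sandbox.
sandbox = make_sandbox(production)
sandbox.execute("DELETE FROM orders WHERE amount < 15")
sandbox.commit()

prod_rows = production.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
sandbox_rows = sandbox.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
print("production rows:", prod_rows, "sandbox rows:", sandbox_rows)
```

The destructive edit only affects the sandbox copy, so production is untouched while the analyst explores.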


Page 23: Open Data Science Conference Agile Data

4. SUPPORT THREE TYPES OF WORKFLOWS

Small Team: Work directly on production

Feature Branch: Merge back to production branch

Data Governance: 3rd party verification before production merge

Review, Test, Approve
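The three workflows can be written down as data so that tooling can look up which steps a change must pass through. This is only a sketch: the step names and the assumption that Review, Test, Approve belong to the Data Governance workflow are a reading of the slide, not something the talk states, and `Workflow` and `steps_for` are hypothetical names.

```python
from enum import Enum


class Workflow(Enum):
    SMALL_TEAM = "small team"
    FEATURE_BRANCH = "feature branch"
    DATA_GOVERNANCE = "data governance"


# Assumption: review/test/approve is attached to the governance workflow only.
STEPS = {
    Workflow.SMALL_TEAM: ["work directly on production"],
    Workflow.FEATURE_BRANCH: [
        "work on feature branch",
        "merge back to production branch",
    ],
    Workflow.DATA_GOVERNANCE: [
        "work on feature branch",
        "review",
        "test",
        "approve",
        "merge back to production branch",
    ],
}


def steps_for(workflow: Workflow) -> list:
    """Return a copy of the ordered steps a change goes through."""
    return list(STEPS[workflow])


for wf in Workflow:
    print(wf.value, "->", ", ".join(steps_for(wf)))
```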

Page 24: Open Data Science Conference Agile Data

5. GIVE ANALYSTS ABILITY TO EDIT DATABASE SAFELY


Best-in-class companies take 12 days to integrate new data sources into their analytical systems; industry average companies take 60 days; and, laggards average 143 days

Source: Aberdeen Group: Data Management for BI: Fueling the analytical engine with high-octane information

Figure out how to do this in minutes
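One way to let an analyst edit a database safely is to wrap each edit in a transaction and keep it only if a check passes. The sketch below shows the idea with SQLite. The helper `safe_edit`, the `customers` table, and the check `no_rows_lost` are hypothetical, and the talk does not say which mechanism it has in mind.

```python
import sqlite3


def safe_edit(conn, statements, check):
    """Run statements in one transaction; commit only if check(conn) is true."""
    ok = False
    conn.execute("BEGIN")
    try:
        for sql in statements:
            conn.execute(sql)
        ok = bool(check(conn))
    except sqlite3.Error:
        ok = False
    finally:
        # Only finish the transaction if SQLite has not already ended it.
        if conn.in_transaction:
            conn.execute("COMMIT" if ok else "ROLLBACK")
    return ok


# isolation_level=None puts the connection in autocommit mode,
# so the explicit BEGIN/COMMIT/ROLLBACK above control the transaction.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO customers (name) VALUES ('Ada')")


def no_rows_lost(c):
    """Check: the edit must not change the number of customers."""
    return c.execute("SELECT COUNT(*) FROM customers").fetchone()[0] == 1


good = safe_edit(conn, ["ALTER TABLE customers ADD COLUMN region TEXT"], no_rows_lost)
bad = safe_edit(conn, ["DELETE FROM customers"], no_rows_lost)
remaining = conn.execute("SELECT COUNT(*) FROM customers").fetchone()[0]
print("schema change kept:", good, "| destructive edit kept:", bad, "| rows:", remaining)
```

The schema change is committed, while the destructive edit is rolled back because its check fails.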

Page 25: Open Data Science Conference Agile Data

CONCLUSION

Page 26: Open Data Science Conference Agile Data

CONCLUSION

Page 27: Open Data Science Conference Agile Data

AGILE DATA Christopher Bergh

[email protected]

Questions?

Comments?

BOSTON 2015

@opendatasci