open data science conference agile data

Post on 28-Jul-2015


TRANSCRIPT

AGILE DATA

Christopher Bergh

Head Chef, DataKitchen

OPEN DATA SCIENCE CONFERENCE, BOSTON 2015

@opendatasci

AGENDA

Who Am I?

What Is The Problem?

A Look At Agile Through Data Lens

How To Do Agile Data In Five Shocking Steps


WHO AM I

Algorithm Nerd

Columbia, MIT, NASA-Ames; ATC Automation

Into In 1990

Fuzzy Logic, Neural Networks, Constraint Satisfaction; Unix/C

Software Nerd

CTO, Dir Engineering, VP Product Management

Into In 2000

Management of Software Teams & Startups; PowerPoint

Data Nerd

COO: ETL Engineers, Analysts & Analytic Tool

Into In 2010

W. Edwards Deming, Data, Bootstrapping; Excel Hacking

SO WHAT IS THE PROBLEM?

In one word ...

LOTSA Technologies in Analytics

LOTSA People In Analytic Teams

DATA SCIENTIST

REPORTING ANALYST

ETL ENGINEER

DATABASE ARCHITECT

DEV OPS ENGINEER

DATA GOVERNANCE

LOTSA Data & Analysis

ONE-OFF

RE-USE

LOTSA Missed Expectations

[Figure: Business Customer Expectation vs. Analyst Reality; segments: Prepare Data, Analyze, Communicate]

The business does not think that Analysts are preparing data

Analysts don’t want to prepare data

Complexity

Another Field, Software Development, Ran into the Same Problems With Complexity ...

… They Used Something Called ‘Agile’ To Solve The Problem

AGILEMANIFESTO.ORG


s/software/analytics/

PRACTICES THAT ARE EASY TO APPLY

Development Sprints

User Stories

Daily Meetings

Defined Roles

Retrospectives

Pair Programming

Burn Down Charts

SOME PRACTICES HAVE BEEN DIFFICULT TO APPLY

Test Driven Development

Branching And Merging

Refactoring

Small Releases

Frequent Or Continuous Integration

Experimentation For Learning

Individual Development Environments

AGILE – WHAT IS UNIQUE TO ANALYTICS?

PUT THE ANALYST AT THE CENTER

AGILE – WHAT IS UNIQUE TO ANALYTICS?

ANALYTICS PERCEIVED VALUE DECAY CURVE

1. MANAGE YOUR WORK LIKE CODE

Why? Your work is just code: models, transforms, etc.

Use a source code control system (like Git) to enable:

Branching

Merging

Diff
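To make the "Diff" point concrete, here is a minimal sketch of what a diff of analytic work looks like once it is stored as text. It uses Python's standard `difflib` rather than Git itself, and the two SQL transforms and the file labels are hypothetical examples, not taken from the talk.

```python
import difflib
from typing import List

# Two hypothetical versions of the same SQL transform (illustrative only).
OLD_TRANSFORM = """\
SELECT customer_id, SUM(amount) AS revenue
FROM orders
GROUP BY customer_id
"""

NEW_TRANSFORM = """\
SELECT customer_id, SUM(amount) AS revenue
FROM orders
WHERE status = 'complete'
GROUP BY customer_id
"""


def diff_transform(old: str, new: str) -> List[str]:
    """Return a unified diff between two versions of a transform."""
    return list(
        difflib.unified_diff(
            old.splitlines(),
            new.splitlines(),
            fromfile="transform.sql (production)",      # assumed label
            tofile="transform.sql (feature branch)",    # assumed label
            lineterm="",
        )
    )


if __name__ == "__main__":
    # Print the line-by-line change, the same kind of view `git diff` gives.
    print("\n".join(diff_transform(OLD_TRANSFORM, NEW_TRANSFORM)))
```

With the work under source control, the same comparison is available between any two branches or commits, which is what makes branching and merging reviewable.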

2. TEST AND CONTAIN

1. Create and monitor tests

2. Test on separate data from production

3. Run tests early and often

4. Target 20% of code for tests

Unit Tests & System Tests ... Keep Adding & Improving
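As a rough sketch of "test on separate data from production", the example below runs two small tests against a hand-made fixture instead of live data. The `dedupe_orders` transform, the fixture rows, and the test names are all hypothetical; they only illustrate the shape of such tests.

```python
def dedupe_orders(rows):
    """Keep the first row seen for each order_id (hypothetical transform)."""
    seen = set()
    result = []
    for row in rows:
        if row["order_id"] not in seen:
            seen.add(row["order_id"])
            result.append(row)
    return result


# Small hand-made fixture that stands in for production data.
TEST_ROWS = [
    {"order_id": 1, "amount": 10.0},
    {"order_id": 1, "amount": 10.0},  # deliberate duplicate
    {"order_id": 2, "amount": 5.5},
]


def test_no_duplicate_order_ids():
    ids = [r["order_id"] for r in dedupe_orders(TEST_ROWS)]
    assert len(ids) == len(set(ids))


def test_total_amount_matches_expected():
    assert sum(r["amount"] for r in dedupe_orders(TEST_ROWS)) == 15.5


def run_all_tests():
    """Run every test and return how many ran; raises on the first failure."""
    tests = [test_no_duplicate_order_ids, test_total_amount_matches_expected]
    for test in tests:
        test()
    return len(tests)


if __name__ == "__main__":
    print(f"{run_all_tests()} tests passed")
```

Because the tests are cheap and need no production access, they can run on every change, which is the "early and often" part.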

3. PROVIDE SEPARATE ENVIRONMENTS FOR ANALYSTS

Why?

Analysts need their own data to iterate, develop & explore.

1. Break up your work into components

2. Manage the environment for each component (e.g. Docker, AMI)

3. Practice Environment Version Control
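One possible reading of "Environment Version Control" is that each component's environment is described by a spec kept alongside the code, so any change to it is visible. The sketch below fingerprints such a spec. The spec layout, image tag, and package versions are assumptions made for illustration, not something the talk prescribes.

```python
import hashlib
import json


def environment_fingerprint(spec: dict) -> str:
    """Return a short, stable hash of an environment specification."""
    # Sorting keys makes the hash independent of dictionary ordering.
    canonical = json.dumps(spec, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:12]


# Hypothetical environment specs for one component (illustrative values).
ETL_ENV = {
    "base_image": "python:3.11-slim",
    "packages": {"pandas": "2.2.2"},
}
ETL_ENV_UPGRADED = {
    "base_image": "python:3.11-slim",
    "packages": {"pandas": "2.2.3"},
}

if __name__ == "__main__":
    # A changed fingerprint signals that the environment itself changed.
    print(environment_fingerprint(ETL_ENV))
    print(environment_fingerprint(ETL_ENV_UPGRADED))
```

Storing the fingerprint with each result makes it possible to tell which environment produced it.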

4. SUPPORT THREE TYPES OF WORKFLOWS

Small Team

Work directly on production

Feature Branch

Merge back to production branch

Data Governance

3rd party verification before production merge

Review

Test

Approve
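The three workflows can be thought of as three levels of gating before a change reaches production. The sketch below encodes that idea. The gate names come from the Review / Test / Approve labels on the slide, but which gates each workflow requires is an assumption made here for illustration.

```python
from enum import Enum


class Workflow(Enum):
    SMALL_TEAM = "small team"
    FEATURE_BRANCH = "feature branch"
    DATA_GOVERNANCE = "data governance"


# Assumed mapping of workflow to required gates (not stated in the talk).
REQUIRED_GATES = {
    Workflow.SMALL_TEAM: set(),                    # work directly on production
    Workflow.FEATURE_BRANCH: {"test"},             # merge back after tests pass
    Workflow.DATA_GOVERNANCE: {"review", "test", "approve"},  # 3rd party sign-off
}


def can_release(workflow: Workflow, completed_gates) -> bool:
    """Return True when every gate the workflow requires has been completed."""
    return REQUIRED_GATES[workflow] <= set(completed_gates)


if __name__ == "__main__":
    print(can_release(Workflow.FEATURE_BRANCH, ["test"]))
    print(can_release(Workflow.DATA_GOVERNANCE, ["review", "test"]))
```

Keeping the policy in one table lets a team move a project from one workflow to another without changing the tooling around it.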

5. GIVE ANALYSTS ABILITY TO EDIT DATABASE SAFELY
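The slide does not say how to make database edits safe. One common approach, sketched here as an assumption, is to apply the change inside a transaction, run a check query, and roll back if the check finds problems. The example uses Python's standard `sqlite3` module with an in-memory database; the `customers` table and the helper names are hypothetical.

```python
import sqlite3


def make_demo_db() -> sqlite3.Connection:
    """Create a small in-memory database (hypothetical schema)."""
    # isolation_level=None lets us issue BEGIN/COMMIT/ROLLBACK explicitly.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, region TEXT)")
    conn.executemany(
        "INSERT INTO customers (id, region) VALUES (?, ?)",
        [(1, "east"), (2, "west")],
    )
    return conn


def apply_change_safely(conn, change_sql: str, check_sql: str) -> bool:
    """Apply change_sql; keep it only if check_sql counts zero bad rows."""
    try:
        conn.execute("BEGIN")
        conn.execute(change_sql)
        bad_rows = conn.execute(check_sql).fetchone()[0]
        if bad_rows:
            conn.execute("ROLLBACK")  # the check failed: undo the edit
            return False
        conn.execute("COMMIT")
        return True
    except sqlite3.Error:
        if conn.in_transaction:
            conn.execute("ROLLBACK")
        raise


if __name__ == "__main__":
    db = make_demo_db()
    check = "SELECT COUNT(*) FROM customers WHERE region IS NULL"
    print(apply_change_safely(db, "UPDATE customers SET region = NULL WHERE id = 1", check))
    print(apply_change_safely(db, "UPDATE customers SET region = 'north' WHERE id = 1", check))
```

The check query plays the same role as the tests in step 2: the edit only lands if the data still passes.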

Best-in-class companies take 12 days to integrate new data sources into their analytical systems; industry average companies take 60 days; and laggards average 143 days

Source: Aberdeen Group: Data Management for BI: Fueling the analytical engine with high-octane information

Figure out how to do this in minutes

CONCLUSION

AGILE DATA Christopher Bergh

cbergh@datakitchen.io

Questions?

Comments?

