Building Better Analytics Workflows (Strata-Hadoop World 2013)


Upload: wes-mckinney

Post on 12-Jun-2015


DESCRIPTION

Wes McKinney (twitter.com/wesmckinn, http://datapad.io) talk from Strata 2013 NYC

TRANSCRIPT

Page 1: Building Better Analytics Workflows (Strata-Hadoop World 2013)

Strata-Hadoop World 2013

Building better analytics workflows

Page 2: Building Better Analytics Workflows (Strata-Hadoop World 2013)

www.datapad.io

Wes McKinney


• Former quant @ AQR (a hedge fund)

• Creator of the pandas project for Python

• Author of Python for Data Analysis — O’Reilly

• Founder and CEO of DataPad

@wesmckinn

Page 3: Building Better Analytics Workflows (Strata-Hadoop World 2013)


• > 20k copies since Oct 2012

• Bringing many new people to Python and data analysis with code


Page 4: Building Better Analytics Workflows (Strata-Hadoop World 2013)

Why are so many learning to program?

• Increasing data scale

• More and more data munging/integration

• Need for statistics and predictive analytics

• Building complex data visualizations

• Inadequacy of Excel or other UI-driven data tools

Page 5: Building Better Analytics Workflows (Strata-Hadoop World 2013)

The Analytics Workflow

• Acquisition

• Preparation

• Visualization

• Analysis

• Sharing

Page 6: Building Better Analytics Workflows (Strata-Hadoop World 2013)


The Analytics Workflow

Page 7: Building Better Analytics Workflows (Strata-Hadoop World 2013)


What do we care about?

•Minimize time to answer

•Ask more questions

•Reduce friction between tools and processes

•Team productivity

Page 8: Building Better Analytics Workflows (Strata-Hadoop World 2013)


Data Tools for Humans (TM?)


Page 9: Building Better Analytics Workflows (Strata-Hadoop World 2013)


What can go wrong?

•Inefficient workflows lead to lower-quality analysis

•Results may not be actionable in a reasonable time frame

Page 11: Building Better Analytics Workflows (Strata-Hadoop World 2013)


Three types of problems

•Tooling

•Workflow management

•Collaboration

Page 12: Building Better Analytics Workflows (Strata-Hadoop World 2013)


Big Notable Data Trends


Page 13: Building Better Analytics Workflows (Strata-Hadoop World 2013)


Data Preparation: an ongoing problem


Page 15: Building Better Analytics Workflows (Strata-Hadoop World 2013)


For programmers, luckily it’s not 2005 anymore

•R: Hadley Wickham’s packages

•Python: pandas

•Hadoop: Pig

Page 16: Building Better Analytics Workflows (Strata-Hadoop World 2013)


Data preparation with visual tools

•OpenRefine (formerly Google Refine)

•Google Fusion Tables

•Microsoft Excel

•Data Wrangler

Page 17: Building Better Analytics Workflows (Strata-Hadoop World 2013)


Some new startups building data preparation tools

Page 18: Building Better Analytics Workflows (Strata-Hadoop World 2013)


Business Intelligence: essential for doing business

Page 19: Building Better Analytics Workflows (Strata-Hadoop World 2013)


BI macro-trends

•Self Service BI

•Visual Discovery

•SQL on Hadoop

Page 20: Building Better Analytics Workflows (Strata-Hadoop World 2013)


It’s the heyday for BI startups

Page 21: Building Better Analytics Workflows (Strata-Hadoop World 2013)


Predictive Analytics is getting easier

Page 22: Building Better Analytics Workflows (Strata-Hadoop World 2013)


Some predictive analytics startups

Page 23: Building Better Analytics Workflows (Strata-Hadoop World 2013)


Perils of “data science in a box”

Page 24: Building Better Analytics Workflows (Strata-Hadoop World 2013)


Predictive analytics pitfalls

•Signal vs. Noise

• Identify the right patterns

•Uncertain ROI

Page 25: Building Better Analytics Workflows (Strata-Hadoop World 2013)


Some analytics workflow problems still need work

Page 26: Building Better Analytics Workflows (Strata-Hadoop World 2013)


Friction between tools

Page 27: Building Better Analytics Workflows (Strata-Hadoop World 2013)


Friction between tools: a typical scenario

•Excel and SQL for data wrangling

•Tableau for visualization

•SPSS/R for modeling

Page 28: Building Better Analytics Workflows (Strata-Hadoop World 2013)


Time series analytics

Page 29: Building Better Analytics Workflows (Strata-Hadoop World 2013)


Large scale visualization

Page 30: Building Better Analytics Workflows (Strata-Hadoop World 2013)

Data workflows as dependency graphs?

[Figure: dependency graph with nodes A, B, C, D, E, F]
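
One way to read this question is that each workflow step is a node, and a step may run only once everything it depends on has finished. The sketch below illustrates that idea with Python's standard-library `graphlib` (Python 3.9+). The node names A to F come from the slide, but the edges, the `TASKS` mapping, and the helper names are hypothetical, since the transcript does not preserve the arrows in the diagram.

```python
from graphlib import TopologicalSorter

# Hypothetical edges (not recoverable from the slide):
# each task maps to the set of tasks it depends on.
DEPENDENCIES = {
    "A": set(),
    "B": {"A"},
    "C": {"A"},
    "D": {"B"},
    "E": {"C", "D"},
    "F": {"E"},
}


def execution_order(dependencies):
    """Return one valid order in which every task follows its dependencies."""
    # static_order() raises graphlib.CycleError if the graph has a cycle.
    return list(TopologicalSorter(dependencies).static_order())


def run_workflow(dependencies, tasks):
    """Run each task after its dependencies, passing upstream results in."""
    results = {}
    for name in execution_order(dependencies):
        inputs = {dep: results[dep] for dep in dependencies[name]}
        results[name] = tasks[name](inputs)
    return results


def count_upstream(inputs):
    """Toy task: one unit of work plus whatever the upstream tasks produced."""
    return 1 + sum(inputs.values())


# Illustrative only: every node runs the same toy task.
TASKS = {name: count_upstream for name in DEPENDENCIES}

results = run_workflow(DEPENDENCIES, TASKS)
```

Several orders can be valid for the same graph (B and C are interchangeable here), so a scheduler built on this idea is free to run independent steps in parallel.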

Page 31: Building Better Analytics Workflows (Strata-Hadoop World 2013)


Data workflows as dependency graphs?

CHRONOS

Page 32: Building Better Analytics Workflows (Strata-Hadoop World 2013)


Iterating on analysis

Page 33: Building Better Analytics Workflows (Strata-Hadoop World 2013)


Versioning and provenance
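
The slide gives only the heading, so the following is an assumption about what a minimal provenance record could look like, not a description of any tool mentioned in the talk. It fingerprints the inputs and output of a step with SHA-256 so that a result can later be traced back to the exact data and parameters that produced it. The function names, the record layout, and the sample data are all hypothetical.

```python
import hashlib
import json


def fingerprint(data: bytes) -> str:
    """Content hash: identical bytes always yield the same identifier."""
    return hashlib.sha256(data).hexdigest()


def provenance_record(step, inputs, output, params):
    """Describe one workflow step by the hashes of what went in and out.

    `inputs` maps a (hypothetical) dataset name to its raw bytes.
    """
    return {
        "step": step,
        "params": params,
        "inputs": {name: fingerprint(blob) for name, blob in inputs.items()},
        "output": fingerprint(output),
    }


# Made-up sample data, for illustration only.
raw = b"city,sales\nNYC,10\nSF,7\n"
cleaned = raw.lower()

record = provenance_record(
    step="lowercase",
    inputs={"raw.csv": raw},
    output=cleaned,
    params={"encoding": "utf-8"},
)

# Serialize with sorted keys so the record itself can be diffed or versioned.
record_json = json.dumps(record, sort_keys=True, indent=2)
```

Because the record stores hashes rather than the data, it stays small enough to commit alongside code, while any change to an input shows up as a changed fingerprint.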

Page 34: Building Better Analytics Workflows (Strata-Hadoop World 2013)


Leveraging diverse skill sets

•Within teams, different competencies

•Work together on a data project: sharing code and data, tracking changes

Page 35: Building Better Analytics Workflows (Strata-Hadoop World 2013)


The elusive GitHub for Data Analysis?

Page 36: Building Better Analytics Workflows (Strata-Hadoop World 2013)


...Google Docs for Data Analysis?

Page 37: Building Better Analytics Workflows (Strata-Hadoop World 2013)


Make an impact

•Getting results into the hands of people who need them

•Getting models "into production"

Page 38: Building Better Analytics Workflows (Strata-Hadoop World 2013)


Some possible solutions

Page 39: Building Better Analytics Workflows (Strata-Hadoop World 2013)


Build more integrated tool environments

Page 41: Building Better Analytics Workflows (Strata-Hadoop World 2013)


Enhance collaboration

Page 42: Building Better Analytics Workflows (Strata-Hadoop World 2013)


Accessible data science...with training wheels

Page 43: Building Better Analytics Workflows (Strata-Hadoop World 2013)


One more thing

Page 44: Building Better Analytics Workflows (Strata-Hadoop World 2013)


•http://datapad.io

•Founded in 2013, located in SF

• In private beta, join us!

•Hiring for engineering

Page 45: Building Better Analytics Workflows (Strata-Hadoop World 2013)


Q&A time

Page 46: Building Better Analytics Workflows (Strata-Hadoop World 2013)


Thank you!
