The 3 Key Barriers Keeping Companies from Acting Upon the Possibilities That Big Data has to Offer

Upload: dataiku

Post on 21-Jan-2017

TRANSCRIPT

Page 1: The 3 Key Barriers Keeping Companies from Deploying Data Products

The 3 Key Barriers Keeping Companies from Acting Upon the Possibilities That Big Data has to Offer

Page 2: The 3 Key Barriers Keeping Companies from Deploying Data Products

A little bit about me…

•  Born & raised in Palo Alto, California
•  BA in European History from Columbia University
•  Masters in Marketing & Communication from Sciences Po Paris
•  Director of Marketing at Dataiku
•  Currently living in Paris

Page 3: The 3 Key Barriers Keeping Companies from Deploying Data Products

Shift from Uomo Universale

« A man can do all things if he will. » -Leon Battista Alberti (1404-72)

Excel at all things: •  Intellect •  Mathematics •  Science •  Art •  Social •  Physical

Page 4: The 3 Key Barriers Keeping Companies from Deploying Data Products

Shift from Uomo Universale To Expert

Page 5: The 3 Key Barriers Keeping Companies from Deploying Data Products

Required Assets:
•  Hacker mindset
•  Logic
•  Statistics
•  Polyglot programmer
•  Mathematics
•  Algorithmics
•  Engineering
•  Databases
•  Machine Learning
•  Strong creativity
•  Strategic thinker
•  Business understanding
•  Strong communication skills
•  Project management

Data Science Superstar

Page 6: The 3 Key Barriers Keeping Companies from Deploying Data Products
Page 7: The 3 Key Barriers Keeping Companies from Deploying Data Products

Are we in some sort of Renaissance Era of Big Data?

Page 8: The 3 Key Barriers Keeping Companies from Deploying Data Products

If so, what’s next?

Page 9: The 3 Key Barriers Keeping Companies from Deploying Data Products

Investigation Part 1: What is a Data Product?

Page 10: The 3 Key Barriers Keeping Companies from Deploying Data Products

Data Products

Page 21: The 3 Key Barriers Keeping Companies from Deploying Data Products

Data Products = Data + Technology + Data Scientist? + End User

Page 22: The 3 Key Barriers Keeping Companies from Deploying Data Products

Investigation Part 2: What goes on behind the scenes?

Page 23: The 3 Key Barriers Keeping Companies from Deploying Data Products

Building a Data Product

User  Interface  Stream / Real-time Query Data Preprocessing

Page 24: The 3 Key Barriers Keeping Companies from Deploying Data Products

Building a (Predictive) Data Product

Predicted Data, Historical Data, Machine Learning Model, Preprocessing

Pre-processing & cleaning data alone can take up to 80% of the time spent on a data project
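
To make the split between preprocessing and modelling concrete, here is a minimal sketch in plain Python. The records, the field names (`visits`, `spend`), and the one-variable least-squares model are all invented for illustration and do not come from the talk. The only point is that most of the code deals with cleaning, while the model itself takes a few lines.

```python
from statistics import mean

# Hypothetical raw historical records: strings, missing values, stray spaces.
RAW_HISTORY = [
    {"visits": " 3", "spend": "30.0"},
    {"visits": "5", "spend": "50.0"},
    {"visits": "", "spend": "12.0"},      # missing feature: dropped
    {"visits": "8", "spend": "n/a"},      # unparsable target: dropped
    {"visits": "10 ", "spend": "100.0"},
]


def preprocess(records):
    """Clean raw records into (visits, spend) float pairs, skipping bad rows."""
    cleaned = []
    for row in records:
        try:
            visits = float(row["visits"].strip())
            spend = float(row["spend"].strip())
        except (KeyError, ValueError):
            continue  # incomplete or malformed row
        cleaned.append((visits, spend))
    return cleaned


def fit_line(pairs):
    """Ordinary least squares for y = slope * x + intercept."""
    xs = [x for x, _ in pairs]
    ys = [y for _, y in pairs]
    x_bar, y_bar = mean(xs), mean(ys)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in pairs) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    return slope, y_bar - slope * x_bar


def predict(model, visits):
    """Apply the fitted line to a new, already cleaned feature value."""
    slope, intercept = model
    return slope * visits + intercept


history = preprocess(RAW_HISTORY)   # historical data, cleaned
model = fit_line(history)           # machine learning model (toy version)
```

Calling `predict(model, 7)` then yields the predicted spend for a customer with seven visits, under the toy assumptions above.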

Page 28: The 3 Key Barriers Keeping Companies from Deploying Data Products

Running a (Predictive) Data Product

Deployment

Real-time / Stream

Model

Preprocessing  

Predicted Data

Page 29: The 3 Key Barriers Keeping Companies from Deploying Data Products

Investigation Part 3: Who does what?

Page 30: The 3 Key Barriers Keeping Companies from Deploying Data Products

•  Customer Data
•  Machine Data
•  System Data
•  Graph Data
•  Structured Data
•  Unstructured Data
•  Transactional Data
•  Catalogue Data
•  Web Log Data

RAW DATA

Page 31: The 3 Key Barriers Keeping Companies from Deploying Data Products

System Architect / IT Team / Data Engineer

RAW DATA

Data Product = Business Incentive

Page 32: The 3 Key Barriers Keeping Companies from Deploying Data Products

Mathematics / statistics

Data Product = Business Incentive

Page 33: The 3 Key Barriers Keeping Companies from Deploying Data Products

Data Product = Business Incentive

Mathematics / statistics / Business

Page 34: The 3 Key Barriers Keeping Companies from Deploying Data Products

« Data Scientist »

Data Product = Business Incentive

Page 35: The 3 Key Barriers Keeping Companies from Deploying Data Products

« Data Scientist »?

Page 36: The 3 Key Barriers Keeping Companies from Deploying Data Products

Data Engineers « Data Scientist »

Page 37: The 3 Key Barriers Keeping Companies from Deploying Data Products

What I’ve Learned

Page 38: The 3 Key Barriers Keeping Companies from Deploying Data Products

Fact 1: The Skill Sets Exist

Business Statistics Math Data Engineering

Build Maintain

Page 39: The 3 Key Barriers Keeping Companies from Deploying Data Products

Fact 1: The Skill Sets Exist (& your company probably already has them)

Business Analyst Data Engineer

Build Maintain

Mathematician / Statistician

Page 40: The 3 Key Barriers Keeping Companies from Deploying Data Products

Fact 2: The Technologies Exist (and some are free!)

Page 41: The 3 Key Barriers Keeping Companies from Deploying Data Products

Fact 3: The Data Exists

Page 42: The 3 Key Barriers Keeping Companies from Deploying Data Products

Why is Production & Industrialisation of (Predictive) Data Products Important?

Page 43: The 3 Key Barriers Keeping Companies from Deploying Data Products

Those who win are those who deliver new data products continuously

Page 44: The 3 Key Barriers Keeping Companies from Deploying Data Products

Those who win are those who deliver data products

Data products are supposed to deliver business value… if you don’t deploy them, where’s the long-term value?

Page 46: The 3 Key Barriers Keeping Companies from Deploying Data Products

No Industrialisation = Limited ROI

Those who win are those who deliver data products

It’s like building your dream house but never moving in. Absurd!

Page 47: The 3 Key Barriers Keeping Companies from Deploying Data Products

So Why Aren’t More Companies Deploying (Predictive) Data Products?

Page 48: The 3 Key Barriers Keeping Companies from Deploying Data Products

A Data Product must be business focused (ROI) & mathematically accurate (RELIABLE)

Page 49: The 3 Key Barriers Keeping Companies from Deploying Data Products

1° Business Analytic & Algorithmic Minds Are Different…

Page 50: The 3 Key Barriers Keeping Companies from Deploying Data Products

1° Business Analytic & Algorithmic Minds Are Different…

The Business Analyst’s Brain

Patterns. Patterns. Patterns.

Page 51: The 3 Key Barriers Keeping Companies from Deploying Data Products

The Algorithmic Brain

Performance. Truth. Anomaly.

1° Business Analytic & Algorithmic Minds Are Different…

Page 52: The 3 Key Barriers Keeping Companies from Deploying Data Products

…But Your Data Product Needs Both

BUSINESS KNOWLEDGE: Patterns, patterns, patterns

MATHEMATICAL ACCURACY: Performance, Truth, Anomaly

1° Business Analytic & Algorithmic Minds Are Different…

Page 53: The 3 Key Barriers Keeping Companies from Deploying Data Products

MINDSET
•  Project alignment from conception to execution: install a team mindset with a common goal, even if the paths to get there are different

FRAMEWORK
•  One common platform with enough flexibility for both mindsets to fully exercise their individual skills and expertise on a common project

Resolving the Skill Gap

Page 54: The 3 Key Barriers Keeping Companies from Deploying Data Products

2° So Many Technologies, Languages, and Needs

Page 58: The 3 Key Barriers Keeping Companies from Deploying Data Products

2° So Many Technologies, Languages, and Needs

•  Code-free
•  R / Python
•  Hadoop
•  SQL

Page 59: The 3 Key Barriers Keeping Companies from Deploying Data Products

…Or As My Boss Calls It: Technoslavia

Florian Douetteau, Dataiku CEO

Page 63: The 3 Key Barriers Keeping Companies from Deploying Data Products

OPTION 1: Enterprise dictatorship

Living in Harmony with Technoslavia

Page 64: The 3 Key Barriers Keeping Companies from Deploying Data Products

OPTION 2 (my personal favorite): Accept a polyglot approach

Living in Harmony with Technoslavia

Page 65: The 3 Key Barriers Keeping Companies from Deploying Data Products

3° Production and Industrialisation is Complex

Page 66: The 3 Key Barriers Keeping Companies from Deploying Data Products

Technological Complexity

Data reliability is hard to guarantee

3° Production and Industrialisation is Complex

Page 67: The 3 Key Barriers Keeping Companies from Deploying Data Products

Technological Complexity

The assumption that production will identically reproduce the analysis phase is a hard promise to make

3° Production and Industrialisation is Complex
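
One way to narrow this gap is to ship the preprocessing parameters together with the model, so that production cannot silently apply a different transformation. The sketch below is illustrative only: the `Bundle` class, its fields, the numbers, and the JSON round trip are assumptions made for this example, not a description of any particular platform.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class Bundle:
    """Hypothetical artifact: preprocessing parameters plus model weights."""
    feature_mean: float   # learned during analysis, reused in production
    feature_scale: float
    slope: float
    intercept: float

    def preprocess(self, raw_value: str) -> float:
        # The same cleaning and scaling runs in analysis and in production.
        return (float(raw_value.strip()) - self.feature_mean) / self.feature_scale

    def predict(self, raw_value: str) -> float:
        return self.slope * self.preprocess(raw_value) + self.intercept


# Analysis phase: parameters are fixed once, then serialised.
dev_bundle = Bundle(feature_mean=6.0, feature_scale=2.0, slope=20.0, intercept=60.0)
artifact = json.dumps(asdict(dev_bundle))

# Production phase: rebuild the bundle from the artifact alone.
prod_bundle = Bundle(**json.loads(artifact))

dev_prediction = dev_bundle.predict(" 7 ")
prod_prediction = prod_bundle.predict(" 7 ")
```

Because production only ever sees `artifact`, any change to the cleaning or scaling logic has to travel with the model rather than being re-implemented by hand.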

Page 68: The 3 Key Barriers Keeping Companies from Deploying Data Products

Technological Complexity

Monitoring a predictive model’s life cycle is a tedious and continuous task

3° Production and Industrialisation is Complex

Page 72: The 3 Key Barriers Keeping Companies from Deploying Data Products

Human & Organisational Complexity

BUILDING: Business Analyst / Algorithmic
•  patterns
•  performance
•  truth

MAINTAINING: Data engineers
•  stability
•  reliability
•  cost of ownership

3° Production and Industrialisation is Complex

Page 74: The 3 Key Barriers Keeping Companies from Deploying Data Products

TIP #1: Invest in a platform where development and production are the same

DEVELOPMENT TEST PRODUCTION

Making Complexity Work for You

Page 75: The 3 Key Barriers Keeping Companies from Deploying Data Products

TIP #2: Invest in monitoring capabilities & strategies

Making Complexity Work for You
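
As a rough illustration of what a monitoring capability can look like, the sketch below tracks a rolling prediction error and raises a flag when it drifts well above a baseline. The `ErrorMonitor` class, the window size, the tolerance factor, and the sample observations are all hypothetical choices for this example; a real team would tune them against its own business metrics.

```python
from collections import deque


class ErrorMonitor:
    """Flags a model as diverging when recent error exceeds a baseline.

    The window size and tolerance factor are hypothetical settings.
    """

    def __init__(self, baseline_error: float, window: int = 3, tolerance: float = 2.0):
        self.baseline_error = baseline_error
        self.tolerance = tolerance
        self.errors = deque(maxlen=window)  # keeps only the latest errors

    def record(self, predicted: float, actual: float) -> None:
        self.errors.append(abs(predicted - actual))

    def recent_error(self) -> float:
        return sum(self.errors) / len(self.errors) if self.errors else 0.0

    def is_diverging(self) -> bool:
        # Only judge once the window is full, to avoid noisy early alerts.
        window_full = len(self.errors) == self.errors.maxlen
        return window_full and self.recent_error() > self.tolerance * self.baseline_error


monitor = ErrorMonitor(baseline_error=1.0)
alerts = []
# Invented (predicted, actual) pairs: accurate at first, then drifting.
observations = [(10, 10.5), (12, 11.0), (9, 9.5), (10, 14.0), (11, 16.0), (12, 18.0)]
for predicted, actual in observations:
    monitor.record(predicted, actual)
    alerts.append(monitor.is_diverging())
```

With these sample values the flag stays off while the model tracks reality and turns on once the last three errors average more than twice the baseline.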

Page 76: The 3 Key Barriers Keeping Companies from Deploying Data Products

Data Engineers must have visibility and understanding of the key business metrics

TIP #3: Name your Data Engineer(s) Wisely & Define Responsibilities

Making Complexity Work for You

Page 77: The 3 Key Barriers Keeping Companies from Deploying Data Products

Data Engineers must know if (and when) a model is diverging

TIP #3: Name your Data Engineer(s) Wisely & Define Responsibilities

Making Complexity Work for You

Page 78: The 3 Key Barriers Keeping Companies from Deploying Data Products

Data Engineer must be responsible for quality of service

TIP #3: Name your Data Engineer(s) Wisely & Define Responsibilities

Making Complexity Work for You

Page 79: The 3 Key Barriers Keeping Companies from Deploying Data Products

Differentiate builders of new data products from those that maintain them.

TIP #4: It’s Not a One Man Show

Making Complexity Work for You

Page 80: The 3 Key Barriers Keeping Companies from Deploying Data Products

What To Expect?

Page 81: The 3 Key Barriers Keeping Companies from Deploying Data Products

From the Renaissance of Big Data…

Where the Data Science Superstar is one person who excels at all skill sets…

…and where actual data products are rarely deployed and maintained

Page 82: The 3 Key Barriers Keeping Companies from Deploying Data Products

To the Enlightenment of (Big) Data

Where the Data Science Superstar is a team of complementary skill sets…

…and where data products are designed, built, tested, and deployed by a team of skilled individuals who each have a distinct role.

TEAM  

Page 83: The 3 Key Barriers Keeping Companies from Deploying Data Products

SPOTLIGHT on the Data Science Team Manager

Page 84: The 3 Key Barriers Keeping Companies from Deploying Data Products

The Rise of the Data Science Team Manager

The Data Science Team Manager must understand the stakeholders’ needs and translate them into a business need that can be answered with a data product…

Page 85: The 3 Key Barriers Keeping Companies from Deploying Data Products

The Rise of the Data Science Team Manager

The Data Science Team Manager must permit and enable collaboration between business analysts, statisticians, & engineers…

Collaboratively design, build, & deploy Data Products

Page 86: The 3 Key Barriers Keeping Companies from Deploying Data Products

The Rise of the Collaborative Data Science Team

…all the while maintaining the distinction between each individual role and each individual skill set.

Business, Mathematics, Data Engineering

Page 87: The 3 Key Barriers Keeping Companies from Deploying Data Products

The Secret to Building and Industrializing Data Products is Collaboration.

Today, collaboration between different skill sets, technologies, and data is finally possible.

Page 88: The 3 Key Barriers Keeping Companies from Deploying Data Products

Data Science Studio: One Platform for Development and Industrialization

Page 89: The 3 Key Barriers Keeping Companies from Deploying Data Products

Thank You!

Pauline Brown
Director of Marketing, Dataiku
@pauline8brown
www.dataiku.com

Page 90: The 3 Key Barriers Keeping Companies from Deploying Data Products