building & scaling data teams
TRANSCRIPT
Meet Hal’s Boss, DIM
“Hey Hal! We need a big data platform like the big guys. Just do what they’re doing!”
Big Data Copy Cat Project
Outreachdigital.org @outreachdigit
TOY PLATFORM ANTI-PATTERN
Test and Invest in Infrastructure == Skilled People
or
Go For Cloud / Packaged Infrastructure
Your brand new Hadoop cluster is perceived as slow, little used and unreliable
TECHNO MISMATCH ANTI-PATTERN
Assume Being Polyglot
or
Be a Dictator
The Python Clan VS The R Tribe
The Old Elephant Fraternity VS The New Elephant Club
PREDICTIVE ANALYTICS DEPLOYMENT STRATEGY
Website: 2000s winners
Companies that were able to release fast
"Artificial Intelligence with Data for Internet of Things": 2010s winners
Companies able to put intelligence in production
?
Design a way to put “PREDICTIVE MODELS” IN PRODUCTION
Classic Business Intelligence Team Organization
Business Leader Data Consumer
Line-of-business Data Consumer
Business Project Sponsor
BI Solution Architect
Model Designer
ETL Developer
Dashboard / Report Designer
Specs
Dim, Big Boss
Data Science Team Organization
Business Leader Data Consumer
Line-of-business Data Consumer
Business Project Sponsor
Data Engineer
Data Analyst
System Engineer / Data Architect
Business Needs
Data Scientist
IT Constraints
I.T.
Manage Expectations
Data Plumber
Data Engineer
Data Scientist
Data Waiter
Data Cleaner
Data Analyst
REAL JOB
DREAM JOB
Managing Extreme Personalities
Data Scientist
Highly Creative
Passionate
Hard to hire?
Hard to manage?
Ambitious
Want to take Hal’s job?
Hard to retain?
Paired for Data
Data Analyst: Discover Patterns
Data Engineer: Make things work
Fight data entropy
Fight tech entropy
What do you prefer?
One Analyst, One Engineer, One Data Scientist
OR
Four data scientists
What is the main reason data projects fail?
> DATA NOT AVAILABLE
BUT FOR ONLY INCREMENTAL GAIN
Contribution to the overall project performance
Shares shown: 20%, 30%, 50%
Components: Business Goal Definition and Data; Feature Engineering; Algorithm
How to Get Data if you don’t have it
THE GRASSHOPPER, THE SPIDER, THE FOX
The Cicada : Optimistic and Opportunistic Data
THE CICADA
As a startup
- Build a new product using open data
As a group inside a company
- Benefit from the data sharing initiative within your company
- Wait for data to be available in your data lake
The Spider: Power of the Network
THE SPIDER
As a startup
- Create a network of (web trackers | sensors)
- Make it available for free
- Build your service on people’s collected data
As a group inside a company
- Make a web service available to collect data
- Promote it internally so that people use it
The Fox: Hunt for the Big Money first
THE FOX
As a startup
- Hunt for a Business Group within a large company with a problem
- Build a SaaS solution using their data
- Replicate to competitors
As a group inside a company
- Take charge of a critical problem at the CEO’s request
- Build your own integrated tech team to solve it
- Use those resources to reset data services internally
The Age Of Distributed Intelligence
Global, Personalised and Real Time Data Driven Services
Data to Visualize or Data to Automate ?
2013 2014 2015 2016 2017 2018
Moving to a world of automated decision making
DATA FOR MORE INSIGHTS
DATA FOR AUTOMATED DECISIONS
Involve Product Team
Product Feature: Personalised Item Ranking
Product Feature: Notify User Only when Needed
Product Feature: Historical Data For Path Optimisation
Have Product Management Deeply Involved In the Data Team
Focus on your added value
Built by the Data Team
Is the problem at the Core of my Business Process?
Is it a common problem / with shared data?
Can I solve it on my own?
Really?
Hire a Consultant and Learn
Built by the Data Team
Go for Best of Breed SaaS Solution
Built by the Data Team?
[Decision flowchart: each question branches on Yes / No]
Create an API culture
Do not share
o Random Piece of Code
o Flat File
o Email
Do share
✓ Reproducible documented workflows
✓ Clean, documented APIs
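As an illustration only, here is a minimal Python sketch of what a "clean, documented API" could look like when shared in place of a flat file or an emailed script. Everything in it is hypothetical and not from the talk: the names `ChurnScore` and `score_customers`, the `max_visits` parameter, and the scoring rule, which is a placeholder rather than a trained model.

```python
"""Hypothetical example: a small, documented scoring API (names are illustrative)."""
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class ChurnScore:
    """Result returned to callers."""
    customer_id: str
    score: float  # between 0.0 and 1.0; higher means more likely to churn


def score_customers(visits_last_30d: Dict[str, int], max_visits: int = 20) -> List[ChurnScore]:
    """Score customers from their visit counts over the last 30 days.

    Placeholder rule (an assumption, not a real model): fewer visits
    gives a higher score. Results are sorted by customer id so that
    repeated runs on the same input are reproducible.
    """
    if max_visits <= 0:
        raise ValueError("max_visits must be positive")
    results = []
    for customer_id, visits in sorted(visits_last_30d.items()):
        # Clamp the visit count into [0, max_visits] before scaling.
        capped = min(max(visits, 0), max_visits)
        results.append(ChurnScore(customer_id, 1.0 - capped / max_visits))
    return results
```

Callers depend on the documented signature and return type, so the scoring rule behind it can later be replaced without breaking them.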
Did Hal find his solutions?
Technology
Data
People
Product
Polyglot on top of open source
Find a way to make clickers and coders work together
Create an API culture and involve the product teams
Hunt for Big Problems and Convince the CEO
“Is this the end?”
Hal Alowne, BI Manager, Dim’s Private Showroom
Objective Alignment
Autonomous Vehicles Need Experimental Ethics: Are We Ready for Utilitarian Cars? http://arxiv.org/abs/1510.03346
Data-Driven Artificial Sales & Marketing?
ARTIFICIAL SUPERVISOR
Please Call The customers
Please Call again
Could you add a JOKE at the end of this email
I need you to ATTEND A physical meeting
Here is the BRIEF
Continuously analyzing prospect behavior on social networks, applications and websites
I don’t know the answer, but here is free software for the data addicts in your company
data scientists and engineers
25
by the numbers
for clickers and coders
3000 lovely users
80 customers
by the customers
Food for thought www.dataiku.com/blog
THANK YOU !
FREE (as in Beer) Software www.dataiku.com/dss
Car Sharing Worldwide Leader
Flash Sales Worldwide Leader
One Mission : Never leave Hal ALONE
3,700 Hotels Worldwide
2500 lovely users
by the numbers
70 customers
My nerdy background
Type Systems
Automated Proving
Abstract Program Interpretation
Functional Programming
Garbage Collection and VMs
Graph Analytics
Chess AI
Natural Language Processing
80% Emacs / 20% VIM