The Rise of the DataOps - Dataiku - J On the Beach 2016
TRANSCRIPT
![Page 1: The Rise of the DataOps - Dataiku - J On the Beach 2016](https://reader034.vdocuments.net/reader034/viewer/2022051503/586fdeaf1a28ab18428b6cd9/html5/thumbnails/1.jpg)
The Role of the DevOps in the Data Analytics Teams
J ON THE BEACH 05/21/16
MORPHED WITH DEEP LEARNING™
TYPICAL OPS GUY (source: Reddit)
TYPICAL YOUNG DATA SCIENTIST (source: Common Sense)
![Page 2: The Rise of the DataOps - Dataiku - J On the Beach 2016](https://reader034.vdocuments.net/reader034/viewer/2022051503/586fdeaf1a28ab18428b6cd9/html5/thumbnails/2.jpg)
My initial interests
Type Systems, Automated Proving, Abstract Program Interpretation, Functional Programming, Garbage Collection and VMs
Graph Analytics, Chess AI, Natural Language Processing, 80% Emacs / 20% Vim
![Page 3: The Rise of the DataOps - Dataiku - J On the Beach 2016](https://reader034.vdocuments.net/reader034/viewer/2022051503/586fdeaf1a28ab18428b6cd9/html5/thumbnails/3.jpg)
So to sum it up …
I (USED TO?) BE A BIG NERD
![Page 4: The Rise of the DataOps - Dataiku - J On the Beach 2016](https://reader034.vdocuments.net/reader034/viewer/2022051503/586fdeaf1a28ab18428b6cd9/html5/thumbnails/4.jpg)
Collaboration
CLICKERS CODERS
Software is a Human Problem
I ended up building collaborative software
for data science...
![Page 5: The Rise of the DataOps - Dataiku - J On the Beach 2016](https://reader034.vdocuments.net/reader034/viewer/2022051503/586fdeaf1a28ab18428b6cd9/html5/thumbnails/5.jpg)
DEV OPS && DATA
![Page 6: The Rise of the DataOps - Dataiku - J On the Beach 2016](https://reader034.vdocuments.net/reader034/viewer/2022051503/586fdeaf1a28ab18428b6cd9/html5/thumbnails/6.jpg)
Let’s get back to the (brief) history of DevOps
Agile Conference, 2008
Scrum, and Agile in an operational context
Hey! We should have our own Velocity in Belgium
10+ deploys per day: Dev and Ops Cooperation at Flickr
O’Reilly Velocity, June 2009
Patrick Debois
2007
Dev
Ops
QA
DevOpsDays
Ghent, October 2009
![Page 7: The Rise of the DataOps - Dataiku - J On the Beach 2016](https://reader034.vdocuments.net/reader034/viewer/2022051503/586fdeaf1a28ab18428b6cd9/html5/thumbnails/7.jpg)
DevOps
DevOps is the practice of operations and development engineers participating together in the entire service lifecycle, from design through the development process to production support.
DevOps is also characterized by operations staff making use of many of the same techniques as developers for their systems work.
Invite Ops to the Dev Meeting. Oh. And let them SPEAK
Ops should know how to code
![Page 8: The Rise of the DataOps - Dataiku - J On the Beach 2016](https://reader034.vdocuments.net/reader034/viewer/2022051503/586fdeaf1a28ab18428b6cd9/html5/thumbnails/8.jpg)
Let’s take an example: John, a DevOps from 2009
Learnt Python the Hard Way
Started with Puppet 1.0
Used EC2 before ELB and EBS!
![Page 9: The Rise of the DataOps - Dataiku - J On the Beach 2016](https://reader034.vdocuments.net/reader034/viewer/2022051503/586fdeaf1a28ab18428b6cd9/html5/thumbnails/9.jpg)
Hegelian perspective
Conflict and Frustration, Concept Combination, Catharsis
Create Culture, Share
Create Tools
Dev + Ops
![Page 10: The Rise of the DataOps - Dataiku - J On the Beach 2016](https://reader034.vdocuments.net/reader034/viewer/2022051503/586fdeaf1a28ab18428b6cd9/html5/thumbnails/10.jpg)
There’s been ops associated with data for a while?
It’s called Business Intelligence!
![Page 11: The Rise of the DataOps - Dataiku - J On the Beach 2016](https://reader034.vdocuments.net/reader034/viewer/2022051503/586fdeaf1a28ab18428b6cd9/html5/thumbnails/11.jpg)
History of Data Analytics (Oversimplified)
Timeline 2013 to 2018: Moving to a world of automated decision making
From DATA FOR MORE INSIGHTS to DATA FOR AUTOMATED DECISIONS
![Page 12: The Rise of the DataOps - Dataiku - J On the Beach 2016](https://reader034.vdocuments.net/reader034/viewer/2022051503/586fdeaf1a28ab18428b6cd9/html5/thumbnails/12.jpg)
The Age Of Distributed Intelligence
Global, Personalised and Real-Time Data-Driven Services
![Page 13: The Rise of the DataOps - Dataiku - J On the Beach 2016](https://reader034.vdocuments.net/reader034/viewer/2022051503/586fdeaf1a28ab18428b6cd9/html5/thumbnails/13.jpg)
Data, Analytics and Data Science
Conflict and Frustration, Concept Combination, Catharsis
Create Culture, Share
Create Tools
Data + Science
![Page 14: The Rise of the DataOps - Dataiku - J On the Beach 2016](https://reader034.vdocuments.net/reader034/viewer/2022051503/586fdeaf1a28ab18428b6cd9/html5/thumbnails/14.jpg)
Welcome to Technoslavia!
![Page 15: The Rise of the DataOps - Dataiku - J On the Beach 2016](https://reader034.vdocuments.net/reader034/viewer/2022051503/586fdeaf1a28ab18428b6cd9/html5/thumbnails/15.jpg)
Classic Business Intelligence Team Organization
Business Leader Data Consumer
Line-of-business Data Consumer
Business Project Sponsor
BI Solution Architect
Model Designer
ETL Developer
Dashboard / Report Designer
Specs / Dim
Big Boss
![Page 16: The Rise of the DataOps - Dataiku - J On the Beach 2016](https://reader034.vdocuments.net/reader034/viewer/2022051503/586fdeaf1a28ab18428b6cd9/html5/thumbnails/16.jpg)
Data Science Team Organization
Business Leader Data Consumer
Line-of-business Data Consumer
Business Project Sponsor
Data Engineer
Data Analyst
System Engineer / Data Architect
Business Needs
Data Scientist
IT Constraints
I.T.
![Page 17: The Rise of the DataOps - Dataiku - J On the Beach 2016](https://reader034.vdocuments.net/reader034/viewer/2022051503/586fdeaf1a28ab18428b6cd9/html5/thumbnails/17.jpg)
Is there room for a new role?
Data Plumber
Data Engineer
Data Scientist
Data Waiter
Data Cleaner
Data Analyst
REAL JOB
DREAM JOB
DevOps For Data?
![Page 18: The Rise of the DataOps - Dataiku - J On the Beach 2016](https://reader034.vdocuments.net/reader034/viewer/2022051503/586fdeaf1a28ab18428b6cd9/html5/thumbnails/18.jpg)
Imagine a company building
a new “smart car” app: AutoFine™
“Revolutionary collaborative network that checks the quality of your driving and punishes you with virtual fines if you’re a bad driver”
![Page 19: The Rise of the DataOps - Dataiku - J On the Beach 2016](https://reader034.vdocuments.net/reader034/viewer/2022051503/586fdeaf1a28ab18428b6cd9/html5/thumbnails/19.jpg)
Imagine a company building
a new “smart car” service: AutoFine™
10 TB of Data Every Month
Hive / Spark / Python
10 Different Predictive Models
Real-Time API / Workflow
![Page 20: The Rise of the DataOps - Dataiku - J On the Beach 2016](https://reader034.vdocuments.net/reader034/viewer/2022051503/586fdeaf1a28ab18428b6cd9/html5/thumbnails/20.jpg)
????
????
OPERATIONS: Who is responsible for …
Check that the newly trained model performs as expected
Check that the product catalog and the website tags remain consistent
Check that the Hadoop cluster scales as expected and has enough bandwidth to handle the workload
Test the performance of the real-time API
Monitor the performance of the model and decide to roll back / maintain / roll out
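The last item, deciding whether to roll back, maintain or roll out a model, can be made explicit as a rule. Below is a minimal sketch, assuming the previously deployed model and the newly deployed one are scored on the same metric (higher is better) over the same period; `decide_rollout` and its `min_gain` / `max_loss` thresholds are hypothetical names and values, not taken from the talk.

```python
from enum import Enum


class Decision(Enum):
    ROLL_OUT = "roll out"    # keep expanding the new model to more traffic
    MAINTAIN = "maintain"    # difference is within noise: change nothing
    ROLL_BACK = "roll back"  # new model is clearly worse: revert to the previous one


def decide_rollout(previous_score: float, new_score: float,
                   min_gain: float = 0.01, max_loss: float = 0.02) -> Decision:
    """Compare a newly deployed model with the version it replaced.

    Both scores are assumed to be the same metric (e.g. AUC) measured on the
    same data, where higher is better. The default thresholds are illustrative.
    """
    delta = new_score - previous_score
    if delta >= min_gain:
        return Decision.ROLL_OUT
    if delta <= -max_loss:
        return Decision.ROLL_BACK
    return Decision.MAINTAIN
```

The band between `-max_loss` and `min_gain` is a deliberate design choice: small metric fluctuations should not trigger a deployment change on their own.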
![Page 21: The Rise of the DataOps - Dataiku - J On the Beach 2016](https://reader034.vdocuments.net/reader034/viewer/2022051503/586fdeaf1a28ab18428b6cd9/html5/thumbnails/21.jpg)
![Page 22: The Rise of the DataOps - Dataiku - J On the Beach 2016](https://reader034.vdocuments.net/reader034/viewer/2022051503/586fdeaf1a28ab18428b6cd9/html5/thumbnails/22.jpg)
DATA OPS: As a Philosophy
![Page 23: The Rise of the DataOps - Dataiku - J On the Beach 2016](https://reader034.vdocuments.net/reader034/viewer/2022051503/586fdeaf1a28ab18428b6cd9/html5/thumbnails/23.jpg)
X OPS PHILOSOPHY
Highly consensual
Highly controversial
![Page 24: The Rise of the DataOps - Dataiku - J On the Beach 2016](https://reader034.vdocuments.net/reader034/viewer/2022051503/586fdeaf1a28ab18428b6cd9/html5/thumbnails/24.jpg)
Create an API culture
Do not share
• Random pieces of code
• Flat files
• Email
Do share
• Reproducible, documented workflows
• Clean, documented APIs
![Page 25: The Rise of the DataOps - Dataiku - J On the Beach 2016](https://reader034.vdocuments.net/reader034/viewer/2022051503/586fdeaf1a28ab18428b6cd9/html5/thumbnails/25.jpg)
Defensive Data Programming
• Software has errors.
• You are not your software, yet you are responsible for the errors.
• You can never remove the errors, only reduce their probability.
![Page 26: The Rise of the DataOps - Dataiku - J On the Beach 2016](https://reader034.vdocuments.net/reader034/viewer/2022051503/586fdeaf1a28ab18428b6cd9/html5/thumbnails/26.jpg)
Defensive Data Programming
• Handle the case when one of the input files is empty
• Handle the case when a new value appears
• Handle the case when two columns become completely correlated
• Handle the case when a column is 16k long
• Etc., etc.
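A minimal sketch of such checks in plain Python, assuming each input batch arrives as a list of dictionaries (one per row, values as strings). The function names, the `known_values` reference sets, the 0.999 correlation cut-off and the 16,000-character limit are illustrative assumptions.

```python
import math
from typing import Dict, List, Optional, Sequence, Set, Tuple


def pearson(xs: Sequence[float], ys: Sequence[float]) -> Optional[float]:
    """Pearson correlation, or None when undefined (too few rows, constant column)."""
    n = len(xs)
    if n < 2:
        return None
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return None
    return sxy / math.sqrt(sxx * syy)


def check_batch(rows: List[Dict[str, str]],
                known_values: Dict[str, Set[str]],
                numeric_pairs: Sequence[Tuple[str, str]] = (),
                max_length: int = 16_000) -> List[str]:
    """Return human-readable problems found in one input batch."""
    if not rows:
        return ["input is empty"]  # nothing else can be checked on an empty file
    problems = []
    # A new categorical value (e.g. a city the model has never seen).
    for column, allowed in known_values.items():
        seen = {row.get(column) for row in rows}
        new = sorted(v for v in seen - allowed if v is not None)
        if new:
            problems.append(f"new values in {column!r}: {new}")
    # Abnormally long values (e.g. a 16k-character field).
    for i, row in enumerate(rows):
        for column, value in row.items():
            if value is not None and len(value) > max_length:
                problems.append(f"row {i}: {column!r} is {len(value)} characters long")
    # Two numeric columns that have become (almost) perfectly correlated.
    for a, b in numeric_pairs:
        corr = pearson([float(r[a]) for r in rows], [float(r[b]) for r in rows])
        if corr is not None and abs(corr) > 0.999:
            problems.append(f"{a!r} and {b!r} are almost perfectly correlated (r={corr:.4f})")
    return problems
```

Returning a list of problems, rather than raising on the first one, lets the workflow decide whether to stop, quarantine the batch or just alert.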
![Page 27: The Rise of the DataOps - Dataiku - J On the Beach 2016](https://reader034.vdocuments.net/reader034/viewer/2022051503/586fdeaf1a28ab18428b6cd9/html5/thumbnails/27.jpg)
Monitoring: the alerts for people who love it
• Performance …
• Time Spent …
• Number of Errors …
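A sketch of threshold alerts on those three signals, assuming a job or API reports its 95th-percentile latency, its run time and its error count after each run; the field names and default limits are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RunStats:
    """Technical measurements reported after one run (hypothetical fields)."""
    p95_latency_ms: float  # performance: 95th-percentile API latency
    duration_s: float      # time spent by the batch job
    error_count: int       # number of errors logged during the run


@dataclass
class Thresholds:
    max_p95_latency_ms: float = 200.0
    max_duration_s: float = 3600.0
    max_errors: int = 0


def technical_alerts(stats: RunStats, limits: Thresholds) -> List[str]:
    """Return one alert message per breached threshold (empty list means all good)."""
    alerts = []
    if stats.p95_latency_ms > limits.max_p95_latency_ms:
        alerts.append(f"p95 latency {stats.p95_latency_ms:.0f} ms > {limits.max_p95_latency_ms:.0f} ms")
    if stats.duration_s > limits.max_duration_s:
        alerts.append(f"run took {stats.duration_s:.0f} s > {limits.max_duration_s:.0f} s")
    if stats.error_count > limits.max_errors:
        alerts.append(f"{stats.error_count} errors > {limits.max_errors} allowed")
    return alerts
```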
![Page 28: The Rise of the DataOps - Dataiku - J On the Beach 2016](https://reader034.vdocuments.net/reader034/viewer/2022051503/586fdeaf1a28ab18428b6cd9/html5/thumbnails/28.jpg)
Monitoring: Business Informal Monitoring
• % Opening
• Market Spent
• Exception User Events …
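Business metrics rarely have fixed thresholds, so one simple option (among many) is to compare today's value with its recent history. This sketch flags a KPI such as the "% Opening" rate when it lies more than three standard deviations from the mean of previous days; the function name and the 3.0 limit are assumptions, and posting the message to a dashboard or chat channel is left out.

```python
import statistics
from typing import Optional, Sequence


def kpi_alert(name: str, history: Sequence[float], today: float,
              z_limit: float = 3.0) -> Optional[str]:
    """Return a message if today's KPI value is unusual compared to its history."""
    if len(history) < 2:
        return None  # not enough history to judge
    mean = statistics.mean(history)
    stdev = statistics.stdev(history)  # sample standard deviation
    if stdev == 0:
        # Constant history: any change at all is worth a look.
        return None if today == mean else f"{name}: {today} differs from constant history {mean}"
    z = (today - mean) / stdev
    if abs(z) > z_limit:
        return f"{name}: {today:.3g} is {z:+.1f} standard deviations from its recent mean {mean:.3g}"
    return None
```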
![Page 29: The Rise of the DataOps - Dataiku - J On the Beach 2016](https://reader034.vdocuments.net/reader034/viewer/2022051503/586fdeaf1a28ab18428b6cd9/html5/thumbnails/29.jpg)
Resource Allocation
I’ve got this strange error “OutOfMemory”. Do you know what it is?
Why is the Hadoop cluster going slower than my laptop?
![Page 30: The Rise of the DataOps - Dataiku - J On the Beach 2016](https://reader034.vdocuments.net/reader034/viewer/2022051503/586fdeaf1a28ab18428b6cd9/html5/thumbnails/30.jpg)
The Philosophy of pre-allocating more resources than necessary
![Page 31: The Rise of the DataOps - Dataiku - J On the Beach 2016](https://reader034.vdocuments.net/reader034/viewer/2022051503/586fdeaf1a28ab18428b6cd9/html5/thumbnails/31.jpg)
Get to the latest package culture …
Data Scientist
I need the latest version of scikit and NetworkX …
And could you repackage that to enable TensorFlow optimizations?
System Administrator
…..
![Page 32: The Rise of the DataOps - Dataiku - J On the Beach 2016](https://reader034.vdocuments.net/reader034/viewer/2022051503/586fdeaf1a28ab18428b6cd9/html5/thumbnails/32.jpg)
The culture of containers
Developers’ Sandbox
![Page 33: The Rise of the DataOps - Dataiku - J On the Beach 2016](https://reader034.vdocuments.net/reader034/viewer/2022051503/586fdeaf1a28ab18428b6cd9/html5/thumbnails/33.jpg)
DATA OPS: As a Job Title
![Page 34: The Rise of the DataOps - Dataiku - J On the Beach 2016](https://reader034.vdocuments.net/reader034/viewer/2022051503/586fdeaf1a28ab18428b6cd9/html5/thumbnails/34.jpg)
Job Title: a matter of name, $$ and social ladder
Data scientist Data Ops
Developer
Statistician
Full Stack Developer
Sys Admin
DevOps
![Page 35: The Rise of the DataOps - Dataiku - J On the Beach 2016](https://reader034.vdocuments.net/reader034/viewer/2022051503/586fdeaf1a28ab18428b6cd9/html5/thumbnails/35.jpg)
Job Role: A matter of Do or Don’t
DO: Things you really want to do
DON’T: Things you really don’t want to get into
![Page 36: The Rise of the DataOps - Dataiku - J On the Beach 2016](https://reader034.vdocuments.net/reader034/viewer/2022051503/586fdeaf1a28ab18428b6cd9/html5/thumbnails/36.jpg)
FIGHT THE TOY PLATFORM ANTI-PATTERN
Test and Invest in Infrastructure == Skilled People, or
Go For Cloud / Packaged Infrastructure
Your brand new Hadoop cluster is perceived as slow, rarely used and unreliable
![Page 37: The Rise of the DataOps - Dataiku - J On the Beach 2016](https://reader034.vdocuments.net/reader034/viewer/2022051503/586fdeaf1a28ab18428b6cd9/html5/thumbnails/37.jpg)
FIGHT THE TECHNO MISMATCH ANTI-PATTERN
Assume Being Polyglot, or
Be a Dictator
The Python Clan VS The R Tribe
The Old Elephant Fraternity VS The New Elephant Club
![Page 38: The Rise of the DataOps - Dataiku - J On the Beach 2016](https://reader034.vdocuments.net/reader034/viewer/2022051503/586fdeaf1a28ab18428b6cd9/html5/thumbnails/38.jpg)
GETTING DATA POLITICS
> DATA NOT AVAILABLE
![Page 39: The Rise of the DataOps - Dataiku - J On the Beach 2016](https://reader034.vdocuments.net/reader034/viewer/2022051503/586fdeaf1a28ab18428b6cd9/html5/thumbnails/39.jpg)
GETTING DATA POLITICS: THE FOX
Hunt for a Big Problem!
Convince the CEO that you can solve a business-critical problem, and use it as an excuse to get all the data you want!
THE SPIDER
Create a Network!
Create a set of trackers or addictive internal data collection to get data on your side!
![Page 40: The Rise of the DataOps - Dataiku - J On the Beach 2016](https://reader034.vdocuments.net/reader034/viewer/2022051503/586fdeaf1a28ab18428b6cd9/html5/thumbnails/40.jpg)
PREDICTIVE ANALYTICS DEPLOYMENT STRATEGY
Website: 2000s winners
Companies that were able to release fast
“Artificial Intelligence with Data for Internet of Things”: 2010s winners
Companies able to put intelligence in production
?
Design a way to put “PREDICTIVE MODELS” IN PRODUCTION
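One way to design this is to keep every deployed model version addressable, so that putting a new model in production and rolling it back are both one step. The in-memory `ModelRegistry` below is a hypothetical sketch, assuming a model is any callable from a feature dictionary to a score; a real setup would persist versions and artifacts rather than hold them in memory.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List

Model = Callable[[Dict[str, Any]], float]  # assumption: features in, score out


@dataclass
class ModelRegistry:
    """Keeps every registered version so that rollback is a single step."""
    versions: Dict[str, Model] = field(default_factory=dict)
    history: List[str] = field(default_factory=list)  # deployment order, last one is live

    def register(self, version: str, model: Model) -> None:
        if version in self.versions:
            raise ValueError(f"version {version!r} already registered")
        self.versions[version] = model

    def deploy(self, version: str) -> None:
        if version not in self.versions:
            raise KeyError(version)
        self.history.append(version)

    def rollback(self) -> str:
        """Undo the latest deployment and return the version that is live again."""
        if len(self.history) < 2:
            raise RuntimeError("no previous version to roll back to")
        self.history.pop()
        return self.history[-1]

    @property
    def live(self) -> str:
        if not self.history:
            raise RuntimeError("nothing deployed yet")
        return self.history[-1]

    def predict(self, features: Dict[str, Any]) -> float:
        return self.versions[self.live](features)
```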
![Page 41: The Rise of the DataOps - Dataiku - J On the Beach 2016](https://reader034.vdocuments.net/reader034/viewer/2022051503/586fdeaf1a28ab18428b6cd9/html5/thumbnails/41.jpg)
OWN ANONYMISATION / PRIVACY / DATA SECURITY ISSUES WITH PARTNERS
Technical Feasibility? What can or cannot be done?
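On the technical-feasibility side, one thing that can usually be done before sharing data with a partner is replacing direct identifiers with keyed hashes (pseudonymisation). A minimal sketch using Python's standard `hmac` and `hashlib` modules; the `driver_id` and `name` fields are hypothetical. Note that this is pseudonymisation, not anonymisation: GPS traces and timestamps can still re-identify a driver.

```python
import hashlib
import hmac
from typing import Dict


def pseudonymise(identifier: str, secret_key: bytes) -> str:
    """Replace a direct identifier (e.g. a licence plate) with a keyed SHA-256 hash.

    The same input always gives the same token, so joins on the shared data
    still work, but without the secret key the partner cannot recompute the
    mapping by hashing candidate identifiers.
    """
    return hmac.new(secret_key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()


def share_with_partner(record: Dict[str, str], secret_key: bytes) -> Dict[str, str]:
    """Return a copy of the record that is safer to hand over (hypothetical fields)."""
    shared = dict(record)
    shared["driver_id"] = pseudonymise(record["driver_id"], secret_key)
    shared.pop("name", None)  # drop fields the partner does not need at all
    return shared
```

The key stays on your side; rotating it per partner also prevents two partners from joining their datasets on the same tokens.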
![Page 42: The Rise of the DataOps - Dataiku - J On the Beach 2016](https://reader034.vdocuments.net/reader034/viewer/2022051503/586fdeaf1a28ab18428b6cd9/html5/thumbnails/42.jpg)
Let’s Wrap IT Up! A Company Building a GPS-Powered Automated Car Fine System
10 TB of Data Every Month
Hive / Spark / Python
10 Different Predictive Models
Real-Time API / Workflow
Robust Workflow with Data Quality Checks
Functional Monitoring by Business People through Slack and Dashboards
Monitoring for the API
Feature Engineering Pipeline in Python
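To make the feature engineering pipeline concrete for the imaginary AutoFine™ service, here is a hypothetical sketch that turns one trip's GPS speed trace into a few per-trip features. The `GpsPoint` fields (including a known speed limit at each point) and the 3 m/s² harsh-braking threshold are assumptions, not part of the talk.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class GpsPoint:
    t: float          # seconds since trip start
    speed_kmh: float  # speed reported by the GPS
    limit_kmh: float  # speed limit at that position (assumed to be available)


def trip_features(points: List[GpsPoint], harsh_brake_ms2: float = 3.0) -> Dict[str, float]:
    """Compute illustrative driving-quality features for one trip."""
    if len(points) < 2:
        raise ValueError("need at least two GPS points per trip")
    harsh_brakes = 0
    time_over_limit = 0.0
    for prev, cur in zip(points, points[1:]):
        dt = cur.t - prev.t
        if dt <= 0:
            raise ValueError("timestamps must be strictly increasing")
        accel = (cur.speed_kmh - prev.speed_kmh) / 3.6 / dt  # km/h per s -> m/s^2
        if accel <= -harsh_brake_ms2:
            harsh_brakes += 1
        if prev.speed_kmh > prev.limit_kmh:
            time_over_limit += dt  # the whole interval is counted as speeding
    duration = points[-1].t - points[0].t
    return {
        "max_speed_kmh": max(p.speed_kmh for p in points),
        "harsh_brakes": float(harsh_brakes),
        "share_over_limit": time_over_limit / duration,
    }
```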
![Page 43: The Rise of the DataOps - Dataiku - J On the Beach 2016](https://reader034.vdocuments.net/reader034/viewer/2022051503/586fdeaf1a28ab18428b6cd9/html5/thumbnails/43.jpg)
But you, where do you stand?
???? ???? ???? ?????
What's your roll-back strategy like?
What kind of multivariate testing strategies do you have in place for predictive models?
How do you manage the robustness of your data flow production scripts?
How can business people monitor the performance of the application?
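For the multivariate-testing question, a common building block is deterministic traffic splitting: hash each user ID so that the same user always hits the same model variant. A minimal sketch, assuming variants are identified by arbitrary strings and weighted by their share of traffic; the variant names, weights and `salt` are hypothetical.

```python
import hashlib
from typing import Dict


def assign_variant(user_id: str, weights: Dict[str, float], salt: str = "exp-1") -> str:
    """Deterministically map a user to a model variant according to the weights."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    digest = hashlib.sha256(f"{salt}:{user_id}".encode("utf-8")).digest()
    # Map the first 8 bytes of the hash to a number in [0, 1].
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    cumulative = 0.0
    for variant, weight in sorted(weights.items()):  # sorted for a stable order
        cumulative += weight / total
        if bucket < cumulative:
            return variant
    return variant  # guard against floating-point rounding on the last bucket
```

Changing the `salt` reshuffles users between variants, which is useful when starting a new experiment on the same population.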
![Page 44: The Rise of the DataOps - Dataiku - J On the Beach 2016](https://reader034.vdocuments.net/reader034/viewer/2022051503/586fdeaf1a28ab18428b6cd9/html5/thumbnails/44.jpg)
http://bit.ly/production-survey
Food for thought: www.dataiku.com/blog
THANK YOU!