the future of mlops - bristol data scientists...the future of mlops … and how did we get here?...
TRANSCRIPT
The Future of MLOps… and how did we get here?
Luke Marsden, CEO & Founder
@lmarsden #mlops
Software Engineering
DevOps
Machine Learning
The state of...
@lmarsden #mlops
Software EngineeringHow we write software
From punch cards… to multi-user UNIX systems...
To personal computers…
To networked computers…
Emailing patch files…
Centralised version control…
Distributed version control… Git! GitHub!
@lmarsden #mlops
DevOpsHow we deploy software
From editing code live on the server…
To building binaries and emailing them around…
To continuous integration…
To public cloud...
To immutable infrastructure… Docker! Kubernetes! GitOps!
@lmarsden #mlops
Machine Learning (AI)How we use data + maths to train models
From curve fitting, linear regressions…
To Markov chains… to neural networks…
To the first 'AI winter'...
Rediscovery of backprop, move to data driven approach…
Deep learning becomes computationally feasible...
@lmarsden #mlops
As ML emerges from research, these disciplines need to converge…
The convergence of software engineering,
DevOps and ML will be called MLOps
@lmarsden #mlops
AI/ML has the potential to reinvent the global economy
But as a discipline it's immature —it's the Wild West out there
We've seen damaging levels of chaos and pain operationalizing AI/ML
@lmarsden #mlops
Key problems affecting AI efforts today
Many teams have problems they don't realize exist, because it's just "how it's done"
1. Can’t deploy – models blocked2. Wasting time3. Inefficient collaboration4. Manual tracking5. No reproducibility or provenance6. Unmonitored models
and more...@lmarsden #mlops
We’ve been here before!
In the 90s, software engineering was siloed.
Not everything was version controlled, there was no continuous delivery.
Software took months to ship; now it ships in minutes.
The same is possible for AI...
@lmarsden #mlops
Requirements to achieveMLOps
2. AccountableMust be able to trace back from model in production to its provenance
4. ContinuousMust be able to deploy automatically & monitor statistically
MANIFESTO
1. ReproducibleMust be able to retrain a 9 month old model to within few %
3. CollaborativeMust be able to do asynchronous collaboration
@lmarsden #mlops
Code Test Deploy
Software Development
Machine Learning
Data runs Code Parameters
Model runs
Models / metrics Deploy
Monitor
Monitor
Data engineering & ML are different from software development
There are unique ways that working with data, code & models is different to just working with code.
This makes existing software tools insufficient...
@lmarsden #mlops
Code Test Deploy
Software Development
Machine Learning
Data runs Code Parameters
Model runs
Models / metrics Deploy
Monitor
Monitor
By tracking runs, you can achieve MLOps
Normal software development tools focus on commits of code. But ML has many more moving parts: data, models & metrics.
You need to track runs, bundling data, code & param versions that went into creating an intermediate dataset or a model. This provides full context for reproducibility, and provenance to connect data engineering with model training to track back for accountability.
@lmarsden #mlops
The MLOps model lifecycle.
@lmarsden #mlops
DEPLOYMODELS
STATISTICALMONITORING
Model runs
Metrics Realtime production traffic
Real-time decisionsTraining
V2V3
DEVELOPMENT PRODUCTIONV1FEB
MAR
Data
Labels
Features
Data runsRaw
DATA ENGINEERING
Data EngineeringIn data engineering pipelines, you need to track data runs.
As raw data is processed, features engineered & samples annotated with labels, every data version needs to be recorded and made available for model development with full provenance. Need to avoid questions about which data was used to train a model.
@lmarsden #mlops
DEPLOYMODELS
STATISTICALMONITORING
Model runs
Metrics Realtime production traffic
Real-time decisionsTraining
V2V3
DEVELOPMENT PRODUCTIONV1FEB
MAR
Data
Labels
Features
Data runsRaw
DATA ENGINEERING
Model DevelopmentOnce your data is annotated and ready to start building models, need to track model runs.
You can increase team productivity by building a 'runnable ML knowledge base' to eliminate silos. Need to reduce key person risk by making it easy for anyone to pick up where another left off.
@lmarsden #mlops
DEPLOYMODELS
STATISTICALMONITORING
Model runs
Metrics Realtime production traffic
Real-time decisionsTraining
V2V3
DEVELOPMENT PRODUCTIONV1FEB
MAR
Data
Labels
Features
Data runsRaw
DATA ENGINEERING
Models in ProductionNeed to be able to push your models to production through a CI/CD system.
Once models are in production, need to keep them performing reliably. Need alerting on issues with statistical monitoring & to be confident changes to models to fix issues are working need ability to do forensic provenance tracking.
@lmarsden #mlops
Live demo!
@lmarsden #mlops@lmarsden #mlops
Thank you! Questions?
Please help🙏 [email protected]
Come talk to me & see a demo at dotscience.com😉
@lmarsden #mlops