![Page 1: ML and Data Science at Uber](https://reader031.vdocuments.net/reader031/viewer/2022030318/5a64acb37f8b9a94568b5133/html5/thumbnails/1.jpg)
ML and Data Science at Uber
Sudhir Tonse, Engineering Lead, Marketplace, Uber
Feb 18, 2017
![Page 2: ML and Data Science at Uber](https://reader031.vdocuments.net/reader031/viewer/2022030318/5a64acb37f8b9a94568b5133/html5/thumbnails/2.jpg)
Where do we want to go today?
Agenda
![Page 3: ML and Data Science at Uber](https://reader031.vdocuments.net/reader031/viewer/2022030318/5a64acb37f8b9a94568b5133/html5/thumbnails/3.jpg)
Agenda: Hop on the Uber ML Ride … destination please?
• Introduction: Who am I and what are we talking about today?
• Problem Space: Why does Uber need ML and what are some of the problems we tackle?
• Tools of the Trade: What does Uber’s tech stack look like?
• Challenges & Opportunities: Challenges likely unique to Uber .. interesting opportunities
![Page 4: ML and Data Science at Uber](https://reader031.vdocuments.net/reader031/viewer/2022030318/5a64acb37f8b9a94568b5133/html5/thumbnails/4.jpg)
Uber, this talk and me the speaker
Introduction
![Page 5: ML and Data Science at Uber](https://reader031.vdocuments.net/reader031/viewer/2022030318/5a64acb37f8b9a94568b5133/html5/thumbnails/5.jpg)
Who am I?
• Engineering Leader @ Uber
• Marketplace Data
• Realtime Data Processing
• Analytics
• Forecasting
• Previous -> MicroServices/Cloud Platform at Netflix
• Twitter @stonse
![Page 6: ML and Data Science at Uber](https://reader031.vdocuments.net/reader031/viewer/2022030318/5a64acb37f8b9a94568b5133/html5/thumbnails/6.jpg)
Introduction: Uber
Uber’s logistic platform: Marketplace
• Driver Partner: Our partner in the ride sharing business
• Riders: Folks like you and me who request a ride on any of Uber’s transportation products, e.g. UberX, uberPool
• Merchants: Restaurants or shops that have signed on to the Uber platform.
![Page 7: ML and Data Science at Uber](https://reader031.vdocuments.net/reader031/viewer/2022030318/5a64acb37f8b9a94568b5133/html5/thumbnails/7.jpg)
“Transportation as reliable as
running water, everywhere, for
everyone”
Uber Mission
![Page 8: ML and Data Science at Uber](https://reader031.vdocuments.net/reader031/viewer/2022030318/5a64acb37f8b9a94568b5133/html5/thumbnails/8.jpg)
• Mapping (Routes, ETAs, …)
• Fraud and Security
• uberEATS Recommendations
• Marketplace Optimizations
• Forecasting
• Driver Positioning
• Health, Trends, Issues, ...
• And more …
ML Problems: Why do we need Machine Learning?
ETA, Route Optimization, Pickup Points, Pool rider matches
![Page 9: ML and Data Science at Uber](https://reader031.vdocuments.net/reader031/viewer/2022030318/5a64acb37f8b9a94568b5133/html5/thumbnails/9.jpg)
Uber | Marketplace: Mission
Build the platform, products, and algorithms responsible for the real time execution and online optimization of Uber's marketplace.
We are building the brain of Uber, solving NP-hard algorithmic and economic optimization problems at scale.
![Page 10: ML and Data Science at Uber](https://reader031.vdocuments.net/reader031/viewer/2022030318/5a64acb37f8b9a94568b5133/html5/thumbnails/10.jpg)
Overall Flow
Request Event
Driver Accept Event
Trip Started Event
more events …
Match Services
![Page 11: ML and Data Science at Uber](https://reader031.vdocuments.net/reader031/viewer/2022030318/5a64acb37f8b9a94568b5133/html5/thumbnails/11.jpg)
Trip States
Events - for each action/state
Rider States Driver States
![Page 12: ML and Data Science at Uber](https://reader031.vdocuments.net/reader031/viewer/2022030318/5a64acb37f8b9a94568b5133/html5/thumbnails/12.jpg)
Scale
~400 Cities
Many Billion Events per Day
![Page 13: ML and Data Science at Uber](https://reader031.vdocuments.net/reader031/viewer/2022030318/5a64acb37f8b9a94568b5133/html5/thumbnails/13.jpg)
Scale
Geo Space
Vehicle Types
Time
![Page 14: ML and Data Science at Uber](https://reader031.vdocuments.net/reader031/viewer/2022030318/5a64acb37f8b9a94568b5133/html5/thumbnails/14.jpg)
Space -> Hexagons
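The idea of bucketing space into hexagons can be sketched as follows. This is a minimal illustration on a flat plane with pointy-top hexagons and axial `(q, r)` coordinates; the function names, the planar approximation, and the sample events are assumptions made for the example, not the grid system described in the talk.

```python
import math
from collections import Counter


def point_to_hex(x: float, y: float, size: float) -> tuple[int, int]:
    """Return the axial (q, r) index of the pointy-top hexagon containing (x, y).

    `size` is the distance from a hexagon's center to any of its corners.
    """
    q = (math.sqrt(3) / 3 * x - y / 3) / size
    r = (2 / 3 * y) / size
    return _round_axial(q, r)


def _round_axial(q: float, r: float) -> tuple[int, int]:
    """Round fractional axial coordinates to the nearest hexagon (cube rounding)."""
    s = -q - r  # third cube coordinate; q + r + s is always 0
    rq, rr, rs = round(q), round(r), round(s)
    dq, dr, ds = abs(rq - q), abs(rr - r), abs(rs - s)
    # Recompute the coordinate with the largest rounding error from the other two.
    if dq > dr and dq > ds:
        rq = -rr - rs
    elif dr > ds:
        rr = -rq - rs
    return int(rq), int(rr)


# Hypothetical event locations (planar units), counted per hexagon.
events = [(0.1, 0.1), (0.2, -0.1), (1.7, 0.1)]
counts = Counter(point_to_hex(x, y, size=1.0) for x, y in events)
```

Once every event carries a hexagon index, per-hexagon counts become a simple group-by, which is what makes the cell a convenient unit for aggregation.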
![Page 15: ML and Data Science at Uber](https://reader031.vdocuments.net/reader031/viewer/2022030318/5a64acb37f8b9a94568b5133/html5/thumbnails/15.jpg)
Granular Data
![Page 16: ML and Data Science at Uber](https://reader031.vdocuments.net/reader031/viewer/2022030318/5a64acb37f8b9a94568b5133/html5/thumbnails/16.jpg)
Scale .. For a fine grained OLAP system
1 day of data:
~400 (cities) x 10,000 (avg number of hexagons per city) x 7 (Vehicle types) x 1440 (minutes per day) x 13 (Trip States)
= ~524 billion possible combinations
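The arithmetic behind that figure can be checked with a few lines. The dictionary keys are illustrative labels; the values are the approximate numbers quoted on the slide.

```python
from math import prod

# Approximate dimension sizes as quoted on the slide.
dimensions = {
    "cities": 400,
    "hexagons_per_city": 10_000,
    "vehicle_types": 7,
    "minutes_per_day": 1_440,
    "trip_states": 13,
}

# Number of distinct (city, hexagon, vehicle type, minute, trip state) cells per day.
combinations_per_day = prod(dimensions.values())
print(f"{combinations_per_day:,}")  # 524,160,000,000
```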
![Page 17: ML and Data Science at Uber](https://reader031.vdocuments.net/reader031/viewer/2022030318/5a64acb37f8b9a94568b5133/html5/thumbnails/17.jpg)
OLAP Queries on Big Data
Realtime + Batch processing
![Page 18: ML and Data Science at Uber](https://reader031.vdocuments.net/reader031/viewer/2022030318/5a64acb37f8b9a94568b5133/html5/thumbnails/18.jpg)
Data Processing
HDFS
![Page 19: ML and Data Science at Uber](https://reader031.vdocuments.net/reader031/viewer/2022030318/5a64acb37f8b9a94568b5133/html5/thumbnails/19.jpg)
Multi-resolution Realtime Forecasting, Airport ETR
ML Examples
![Page 20: ML and Data Science at Uber](https://reader031.vdocuments.net/reader031/viewer/2022030318/5a64acb37f8b9a94568b5133/html5/thumbnails/20.jpg)
Example 1
Real-time spatiotemporal forecasting at a variable resolution of time and space
![Page 21: ML and Data Science at Uber](https://reader031.vdocuments.net/reader031/viewer/2022030318/5a64acb37f8b9a94568b5133/html5/thumbnails/21.jpg)
Rider Demand Forecasting
Predict # of Riders per hexagon for various time horizons
![Page 22: ML and Data Science at Uber](https://reader031.vdocuments.net/reader031/viewer/2022030318/5a64acb37f8b9a94568b5133/html5/thumbnails/22.jpg)
Spatial granularity & Multiresolution Forecasting
Some small challenges
• Sparsity at hexagon level: many hexagons have little signal
• The more you aggregate or zoom out, the more clearly trends emerge
![Page 23: ML and Data Science at Uber](https://reader031.vdocuments.net/reader031/viewer/2022030318/5a64acb37f8b9a94568b5133/html5/thumbnails/23.jpg)
Multiresolution Forecasting: Forecasting at different spatial granularity
1. Forecast at the hex-cluster level
2. Using past activity for a similar time window, apportion out total activity from the hex-cluster to its component hexagons
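Step 2 can be sketched as a proportional split. This is a minimal illustration that assumes each hexagon's share is simply its fraction of past activity in a comparable time window; the function name, the hexagon ids, and the even-split fallback for clusters with no history are all assumptions made for the example.

```python
def apportion_forecast(cluster_forecast: float,
                       past_activity: dict[str, float]) -> dict[str, float]:
    """Split a cluster-level forecast across hexagons by historical share."""
    if not past_activity:
        return {}
    total = sum(past_activity.values())
    if total == 0:
        # Assumption: with no past signal at all, fall back to an even split.
        even_share = cluster_forecast / len(past_activity)
        return {hex_id: even_share for hex_id in past_activity}
    return {hex_id: cluster_forecast * count / total
            for hex_id, count in past_activity.items()}


# Hypothetical past rider counts for the same time window on earlier days.
history = {"hex_a": 6, "hex_b": 3, "hex_c": 1}
per_hexagon = apportion_forecast(100.0, history)
```

The forecast itself is made where the signal is dense (the cluster), and the sparse hexagons only contribute their relative weights, so the per-hexagon values always sum back to the cluster total.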
![Page 24: ML and Data Science at Uber](https://reader031.vdocuments.net/reader031/viewer/2022030318/5a64acb37f8b9a94568b5133/html5/thumbnails/24.jpg)
ML Example No. 2
Airport ETR
Airport Taxi Line
Uber Airport Lot
![Page 25: ML and Data Science at Uber](https://reader031.vdocuments.net/reader031/viewer/2022030318/5a64acb37f8b9a94568b5133/html5/thumbnails/25.jpg)
Airport Demand (ETR)
Flight Arrival (t1) -> Client Eyeball (t2) -> Pickup Request (t3)
Mean Delay: ~30 minutes
Half Life: ~1.0 minute
![Page 26: ML and Data Science at Uber](https://reader031.vdocuments.net/reader031/viewer/2022030318/5a64acb37f8b9a94568b5133/html5/thumbnails/26.jpg)
Solution: Time Meter Banner
“ETR too much. I bail out ..”
“Only about 20 minutes. I would wait!”
20 minutes wait to get a $40 trip, oh yeah!
![Page 27: ML and Data Science at Uber](https://reader031.vdocuments.net/reader031/viewer/2022030318/5a64acb37f8b9a94568b5133/html5/thumbnails/27.jpg)
Data Science Flow: A Typical Data Scientist Workflow
• Analyze/Prepare: Data exploration, cleansing, transformations etc.
• Feature Selection: Evaluate strength of various signals
• Model Fitting: Use Python/R etc. to fit Model.
• Evaluation: Evaluate Model Performance
• Storage: Store Model with versioning
• Serving/Dissemination: Apply Model and serve predictions
• Monitoring: Evaluate Runtime Performance
![Page 28: ML and Data Science at Uber](https://reader031.vdocuments.net/reader031/viewer/2022030318/5a64acb37f8b9a94568b5133/html5/thumbnails/28.jpg)
Data Preparation: A Typical Data Scientist Workflow
• Analyze/Prepare: Data exploration, cleansing, transformations etc.
![Page 29: ML and Data Science at Uber](https://reader031.vdocuments.net/reader031/viewer/2022030318/5a64acb37f8b9a94568b5133/html5/thumbnails/29.jpg)
Data Science Flow: A Typical Data Scientist Workflow
• Feature Selection: Evaluate strength of various signals
• Model Fitting: Use Python/R etc. to fit Model.
• Evaluation: Evaluate Model Performance
• Storage: Store Model with versioning
![Page 30: ML and Data Science at Uber](https://reader031.vdocuments.net/reader031/viewer/2022030318/5a64acb37f8b9a94568b5133/html5/thumbnails/30.jpg)
Data Scientists (Analytics)
![Page 31: ML and Data Science at Uber](https://reader031.vdocuments.net/reader031/viewer/2022030318/5a64acb37f8b9a94568b5133/html5/thumbnails/31.jpg)
![Page 32: ML and Data Science at Uber](https://reader031.vdocuments.net/reader031/viewer/2022030318/5a64acb37f8b9a94568b5133/html5/thumbnails/32.jpg)
Overview
Streamline the forecasting process from conception to production
• Streams w/ flexible geo-temporal resolution
• Valuable external data feeds
• Modular, reusable components at each stage
• Same code for offline model fitting and production to enable fast model iteration
Operators & Computation DAGs
Feature Generation
Online Models
Offline Model Fitting
Predictions, Metrics & Visualizations
External Data Streams
• Airport feed
• Weather feed
• Concerts feed
![Page 33: ML and Data Science at Uber](https://reader031.vdocuments.net/reader031/viewer/2022030318/5a64acb37f8b9a94568b5133/html5/thumbnails/33.jpg)
Realtime Models
Real-time spatiotemporal forecasting at a variable resolution of time and space
- Something happened at a time and a place. Now we will evaluate the DAG
- DAG evaluated for a single instant in time
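Evaluating a computation DAG for a single instant can be sketched as follows. This is a minimal illustration, assuming operators are plain functions wired together by name; `evaluate_dag`, the operator names, and the input values are hypothetical and are not the framework described in the talk.

```python
from graphlib import TopologicalSorter


def evaluate_dag(operators, inputs):
    """Evaluate every operator once, for one snapshot of input values.

    operators: name -> (function, list of dependency names)
    inputs:    name -> observed value at this instant
    """
    graph = {name: deps for name, (_, deps) in operators.items()}
    values = dict(inputs)
    # static_order() yields each node only after all of its dependencies.
    for name in TopologicalSorter(graph).static_order():
        if name in values:
            continue  # raw input, nothing to compute
        fn, deps = operators[name]
        values[name] = fn(*(values[dep] for dep in deps))
    return values


# Hypothetical operators for one hexagon at one instant.
operators = {
    "demand_supply_ratio": (
        lambda requests, cars: requests / max(cars, 1),
        ["requests_last_5min", "open_cars"],
    ),
    "forecast": (
        lambda ratio, raining: ratio * (1.2 if raining else 1.0),
        ["demand_supply_ratio", "is_raining"],
    ),
}

snapshot = {"requests_last_5min": 12, "open_cars": 4, "is_raining": True}
values = evaluate_dag(operators, snapshot)
```

Because the DAG is only a description of operators and their dependencies, the same `evaluate_dag` call can be replayed over historical snapshots for offline fitting and run on live events in production.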
![Page 34: ML and Data Science at Uber](https://reader031.vdocuments.net/reader031/viewer/2022030318/5a64acb37f8b9a94568b5133/html5/thumbnails/34.jpg)
Under the hood ..
Tools & Framework
![Page 35: ML and Data Science at Uber](https://reader031.vdocuments.net/reader031/viewer/2022030318/5a64acb37f8b9a94568b5133/html5/thumbnails/35.jpg)
Machine Learning as a Service: ML workflow at Uber
• Curated set of algorithms
• Model Versioning
• Model Performance & Visualizations
• Automated Deployment Workflow
• …
![Page 36: ML and Data Science at Uber](https://reader031.vdocuments.net/reader031/viewer/2022030318/5a64acb37f8b9a94568b5133/html5/thumbnails/36.jpg)
Open Source Technologies

Spark Streaming
• Micro Batch based processing
• Good integration with HDFS & S3
• Exactly once semantics

Samza
• Well integrated with Kafka
• Built in State Management
• Built in Checkpointing

Distributed Indexes & Queries
Versatile aggregations

Jupyter/IPython
• Great community support
• Data Scientists familiar with Python
![Page 37: ML and Data Science at Uber](https://reader031.vdocuments.net/reader031/viewer/2022030318/5a64acb37f8b9a94568b5133/html5/thumbnails/37.jpg)
..
Challenges & Opportunities
![Page 38: ML and Data Science at Uber](https://reader031.vdocuments.net/reader031/viewer/2022030318/5a64acb37f8b9a94568b5133/html5/thumbnails/38.jpg)
ML Problems: Challenges
• What’s the best model for integrating vast amounts of disparate kinds of information over space and time?
• What’s the best way of building spatiotemporal models in a fashion that is effective, elegant, and debuggable?
• About 100 or so more … :-)
![Page 39: ML and Data Science at Uber](https://reader031.vdocuments.net/reader031/viewer/2022030318/5a64acb37f8b9a94568b5133/html5/thumbnails/39.jpg)
Links
Thank you!
• Realtime Streaming at Uber (https://www.infoq.com/presentations/real-time-streaming-uber)
• Spark at Uber (http://www.slideshare.net/databricks/spark-meetup-at-uber)
• Career at Uber (https://www.uber.com/careers/)
• https://join.uber.com/marketplace
![Page 40: ML and Data Science at Uber](https://reader031.vdocuments.net/reader031/viewer/2022030318/5a64acb37f8b9a94568b5133/html5/thumbnails/40.jpg)
Happy to discuss design/architecture
Q & A
No product/business questions please :-)
@stonse
![Page 41: ML and Data Science at Uber](https://reader031.vdocuments.net/reader031/viewer/2022030318/5a64acb37f8b9a94568b5133/html5/thumbnails/41.jpg)
Sudhir Tonse
@stonse
Thank you