ML and Data Science at Uber - GITPRO Talk 2017
TRANSCRIPT
Agenda: Hop on the Uber ML Ride … destination please?
• Introduction: Who am I and what are we talking about today?
• Problem Space: Why does Uber need ML and what are some of the problems we tackle?
• Tools of the Trade: What does Uber’s tech stack look like?
• Challenges & Opportunities: Challenges likely unique to Uber .. interesting opportunities
Who am I?
• Engineering Leader @ Uber
• Marketplace Data
• Realtime Data Processing
• Analytics
• Forecasting
• Previous -> MicroServices/Cloud Platform at Netflix
• Twitter @stonse
Introduction: Uber
Uber’s logistics platform
Marketplace
• Driver Partner: Our partner in the ride sharing business
• Riders: Folks like you and me who request a ride on any of Uber’s transportation products, e.g. UberX, uberPool
• Merchants: Restaurants or shops that have signed on to the Uber platform
• Mapping (Routes, ETAs, …)
• Fraud and Security
• uberEATS Recommendations
• Marketplace Optimizations
• Forecasting
• Driver Positioning
• Health, Trends, Issues, ...
• And more …
ML Problems: Why do we need Machine Learning?
ETA, Route Optimization, Pickup Points, Pool rider matches
Marketplace
Build the platform, products, and algorithms responsible for the real time execution and online optimization of Uber's marketplace.
We are building the brain of Uber, solving NP-hard algorithmic problems and economic optimization problems at scale.
Uber | Marketplace: Mission
Request Event
Driver Accept Event
Trip Started Event
more events …
Overall Flow
Match Services
• Indexing, Lookup, Rendering
• Symmetric Neighbors
• Convex & Compact Regions
• Equal Areas
• Equal Shape
Space -> Hexagons
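The "Symmetric Neighbors" property can be illustrated with a minimal sketch. It assumes an axial (q, r) coordinate scheme on a pointy-top hexagonal grid, which is a common textbook convention and not necessarily the indexing Uber uses; the names hex_neighbors and hex_center are hypothetical. Every hexagon has six neighbors whose centers are all the same distance away, whereas a square cell has eight neighbors at two different distances.

```python
import math

# Assumption: axial (q, r) coordinates on a pointy-top hex grid.
# This is an illustrative scheme, not Uber's actual spatial index.
AXIAL_DIRECTIONS = [(1, 0), (1, -1), (0, -1), (-1, 0), (-1, 1), (0, 1)]


def hex_neighbors(q, r):
    """Return the six cells adjacent to hexagon (q, r)."""
    return [(q + dq, r + dr) for dq, dr in AXIAL_DIRECTIONS]


def hex_center(q, r):
    """Cartesian center of hexagon (q, r), for hexagons with circumradius 1."""
    return (math.sqrt(3) * (q + r / 2), 1.5 * r)


# Center-to-center distances from a hexagon to each of its neighbors.
cx, cy = hex_center(0, 0)
hex_dists = sorted({
    round(math.hypot(x - cx, y - cy), 3)
    for x, y in (hex_center(q, r) for q, r in hex_neighbors(0, 0))
})

# The same measurement for the eight neighbors of a unit square cell.
square_dists = sorted({
    round(math.hypot(dx, dy), 3)
    for dx in (-1, 0, 1)
    for dy in (-1, 0, 1)
    if (dx, dy) != (0, 0)
})

print("hexagon neighbor distances:", hex_dists)
print("square neighbor distances:", square_dists)
```

A single neighbor distance means that activity in adjacent cells can be treated uniformly, without a separate case for diagonal neighbors.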
Spatial granularity & Multiresolution Forecasting
The more you aggregate or zoom out, the more clearly trends emerge
Sparsity at hexagon level: many hexagons have little signal
1. Forecast at the hex-cluster level
2. Using past activity for a similar time window, apportion out total activity from the hex-cluster to its component hexagons
Multiresolution Forecasting: Forecasting at different spatial granularity
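The two steps above can be sketched as follows. This is a minimal illustration, not the production method: the cluster-level forecast is taken as a given number, the hexagon names and counts are made up, and the function name apportion_cluster_forecast is hypothetical. The even split for a cluster with no past activity is also an assumption.

```python
def apportion_cluster_forecast(cluster_forecast, past_activity):
    """Split a hex-cluster forecast across its hexagons.

    past_activity maps each hexagon in the cluster to its observed
    activity in a similar past time window. Each hexagon receives a share
    of the cluster forecast proportional to that past activity.
    """
    total = sum(past_activity.values())
    if total == 0:
        # Assumption: with no past signal at all, split evenly.
        share = cluster_forecast / len(past_activity)
        return {hexagon: share for hexagon in past_activity}
    return {
        hexagon: cluster_forecast * count / total
        for hexagon, count in past_activity.items()
    }


# Step 1 (assumed done elsewhere): forecast at the hex-cluster level.
cluster_forecast = 50.0

# Step 2: apportion using past activity for a similar time window.
# Hypothetical counts; hex_c is sparse and carries little signal on its own.
past_activity = {"hex_a": 6, "hex_b": 3, "hex_c": 1}
per_hexagon = apportion_cluster_forecast(cluster_forecast, past_activity)

for hexagon, value in per_hexagon.items():
    print(hexagon, value)
```

Forecasting at the cluster level sidesteps the sparsity of individual hexagons, while the per-hexagon shares always add back up to the cluster total.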
Airport Demand (ETR)
Flight Arrival (t1) -> Client Eyeball (t2) -> Pickup Request (t3)
Mean Delay ~30 minutes
Half Life ~1.0 minute
“ETR too much. I bail out ..”
Solution: Time Meter Banner
“Only about 20 minutes. I would wait!”
20 minutes wait to get a $40 trip, oh yeah!
Data Science Flow: A Typical Data Scientist Workflow
• Analyze/Prepare: Data exploration, cleansing, transformations etc.
• Feature Selection: Evaluate strength of various signals
• Model Fitting: Use Python/R etc. to fit Model.
• Evaluation: Evaluate Model Performance
• Storage: Store Model with versioning
• Serving/Dissemination: Apply Model and serve predictions
• Monitoring: Evaluate Runtime Performance
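The workflow stages can be walked through end to end in a toy sketch. Everything here is an assumption made for illustration: the data is invented, the model is a simple least-squares line, and ModelStore is a hypothetical in-memory stand-in for whatever versioned model storage is actually used.

```python
from statistics import mean


def fit_line(xs, ys):
    """Model Fitting: ordinary least squares for y = slope * x + intercept."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return {"slope": slope, "intercept": my - slope * mx}


def predict(model, x):
    """Serving: apply a stored model to a new input."""
    return model["slope"] * x + model["intercept"]


def mean_abs_error(model, xs, ys):
    """Evaluation: mean absolute error of the model on (xs, ys)."""
    return mean(abs(predict(model, x) - y) for x, y in zip(xs, ys))


class ModelStore:
    """Storage: hypothetical in-memory registry with 1-based versions."""

    def __init__(self):
        self._versions = []

    def save(self, model, metrics):
        self._versions.append({"model": model, "metrics": metrics})
        return len(self._versions)

    def load(self, version):
        return self._versions[version - 1]["model"]


# Analyze/Prepare: drop rows with missing targets (invented data).
raw = [(1, 2.0), (2, 4.1), (3, None), (4, 8.0), (5, 9.9)]
clean = [(x, y) for x, y in raw if y is not None]
xs = [x for x, _ in clean]
ys = [y for _, y in clean]

model = fit_line(xs, ys)
mae = mean_abs_error(model, xs, ys)

store = ModelStore()
version = store.save(model, {"mae": mae})

# Serving/Dissemination: load a specific version and serve a prediction.
served = predict(store.load(version), 6)
print("version", version, "mae", round(mae, 3), "prediction", round(served, 2))
```

Keeping the evaluation metrics next to each stored version makes it possible to compare a newly fitted model against the one currently serving before promoting it.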
Overview
Streamline the forecasting process from conception to production
• Streams w/ flexible geo-temporal resolution
• Valuable external data feeds
• Modular, reusable components at each stage
• Same code for offline model fitting and production to enable fast model iteration
Operators & Computation DAGs
Feature Generation
Online Models
Offline Model Fitting
Predictions, Metrics & Visualizations
External Data Streams
Airport feed
Weather feed
Concerts feed
Realtime Models
- Something happened at a time and a place. Now we will evaluate the DAG
- DAG evaluated for a single instant in time
Real-time spatiotemporal forecasting at a variable resolution of time and space
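Evaluating a computation DAG for one event can be sketched as below. This is a minimal illustration under stated assumptions, not the actual framework: the operator names, the event fields, and the function evaluate_dag are hypothetical, the weather multiplier is invented, and the graph is assumed to be acyclic.

```python
def evaluate_dag(dag, event):
    """Evaluate every node of an operator DAG for a single event.

    dag maps a node name to (operator, input_node_names). Each operator
    receives the event followed by the values of its inputs. Results are
    cached so that a shared upstream node is computed only once.
    Assumption: the graph has no cycles.
    """
    cache = {}

    def value_of(name):
        if name not in cache:
            operator, inputs = dag[name]
            cache[name] = operator(event, *[value_of(i) for i in inputs])
        return cache[name]

    return {name: value_of(name) for name in dag}


# Hypothetical operators: two feature nodes feeding one forecast node.
dag = {
    "recent_requests": (lambda e: e["requests_last_15_min"], []),
    "weather_factor": (lambda e: 1.2 if e["raining"] else 1.0, []),
    "demand_forecast": (
        lambda e, requests, weather: requests * weather,
        ["recent_requests", "weather_factor"],
    ),
}

# "Something happened at a time and a place": one event, one instant.
event = {
    "hexagon": "hex_a",
    "timestamp": "2017-01-01T08:00:00",
    "requests_last_15_min": 10,
    "raining": True,
}

result = evaluate_dag(dag, event)
print(result["demand_forecast"])
```

Because the DAG is plain data plus functions, the same definition could in principle be replayed over historical events for offline fitting and run on live events in production.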
Machine Learning as a Service: ML workflow at Uber
• Curated set of algorithms
• Model Versioning
• Model Performance & Visualizations
• Automated Deployment Workflow
• …
Open Source Technologies
Samza: Well integrated with Kafka; built-in state management; built-in checkpointing
Spark Streaming: Micro-batch based processing; good integration with HDFS & S3; exactly-once semantics
Distributed indexes & queries; versatile aggregations
Jupyter/IPython: Great community support; Data Scientists familiar with Python
• What’s the best model for integrating vast amounts of disparate kinds of information over space and time?
• What’s the best way of building spatiotemporal models in a fashion that is effective, elegant, and debuggable?
• About 100 or so more … :-)
ML Problems: Challenges
Thank you!
Links
• Realtime Streaming at Uber (https://www.infoq.com/presentations/real-time-streaming-uber)
• Spark at Uber (http://www.slideshare.net/databricks/spark-meetup-at-uber)
• Career at Uber (https://www.uber.com/careers/)
• https://join.uber.com/marketplace
Proprietary and confidential © 2016 Uber Technologies, Inc. All rights reserved. No part of this document may be
reproduced or utilized in any form or by any means, electronic or mechanical, including photocopying, recording, or by any
information storage or retrieval systems, without permission in writing from Uber. This document is intended only for the
use of the individual or entity to whom it is addressed and contains information that is privileged, confidential or otherwise
exempt from disclosure under applicable law. All recipients of this document are notified that the information contained
herein includes proprietary and confidential information of Uber, and recipient may not make use of, disseminate, or in any
way disclose this document or any of the enclosed information to any person other than employees of addressee to the
extent necessary for consultations with authorized personnel of Uber.
Sudhir Tonse
@stonse
Thank you