ML and Data Science at Uber Sudhir Tonse, Engineering Lead Marketplace, Uber FEB 18, 2017

TRANSCRIPT

Page 1: ML and Data Science at Uber

ML and Data Science at Uber

Sudhir Tonse, Engineering Lead, Marketplace, Uber

Feb 18, 2017

Page 2: ML and Data Science at Uber

Where do we want to go today?

Agenda

Page 3: ML and Data Science at Uber

Agenda: Hop on the Uber ML Ride … destination please?

• Introduction: Who am I and what are we talking about today?

• Problem Space: Why does Uber need ML and what are some of the problems we tackle?

• Tools of the Trade: What does Uber’s tech stack look like?

• Challenges & Opportunities: Challenges likely unique to Uber .. interesting opportunities

Page 4: ML and Data Science at Uber

Uber, this talk and me the speaker

Introduction

Page 5: ML and Data Science at Uber

Who am I?

• Engineering Leader @ Uber

• Marketplace Data

• Realtime Data Processing

• Analytics

• Forecasting

• Previous -> MicroServices/Cloud Platform at Netflix

• Twitter @stonse

Page 6: ML and Data Science at Uber

Introduction: Uber

• Driver Partner: Our partner in the ride sharing business

• Riders: Folks like you and me who request a ride on any of Uber’s transportation products, e.g. UberX, uberPool

• Merchants: Restaurants or shops that have signed on to the Uber platform

• Marketplace: Uber’s logistic platform

Page 7: ML and Data Science at Uber

Uber Mission

“Transportation as reliable as running water, everywhere, for everyone”

Page 8: ML and Data Science at Uber

ML Problems: Why do we need Machine Learning?

• Mapping (Routes, ETAs, …)

• Fraud and Security

• uberEATS Recommendations

• Marketplace Optimizations

• Forecasting

• Driver Positioning

• Health, Trends, Issues, ...

• And more …

Examples: ETA, Route Optimization, Pickup Points, Pool rider matches

Page 9: ML and Data Science at Uber

Uber | Marketplace Mission

Build the platform, products, and algorithms responsible for the real time execution and online optimization of Uber's marketplace.

We are building the brain of Uber, solving NP-hard algorithmic and economic optimization problems at scale.

Page 10: ML and Data Science at Uber

Overall Flow

• Request Event

• Driver Accept Event

• Trip Started Event

• more events

Match Services

Page 11: ML and Data Science at Uber

Trip States

Events - for each action/state

Rider States, Driver States

Page 12: ML and Data Science at Uber

Scale

~400 Cities

Many Billion Events per Day

Page 13: ML and Data Science at Uber

Scale

Dimensions: Geo Space, Vehicle Types, Time

Page 14: ML and Data Science at Uber

Space -> Hexagons
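
The slide gives no detail on how locations are mapped to hexagons, so the following is only a minimal planar sketch, not Uber's actual indexing scheme. It assumes pointy-top hexagons addressed by axial coordinates (q, r), positions already projected to x/y in meters, and a hypothetical `size` parameter (the hexagon's center-to-corner distance).

```python
import math


def hex_center(q, r, size):
    """Return the planar (x, y) center of the hexagon at axial coordinates (q, r)."""
    x = size * math.sqrt(3) * (q + r / 2)
    y = size * 1.5 * r
    return x, y


def _round_axial(q, r):
    """Round fractional axial coordinates to the nearest hexagon (cube rounding)."""
    s = -q - r
    rq, rr, rs = round(q), round(r), round(s)
    dq, dr, ds = abs(rq - q), abs(rr - r), abs(rs - s)
    # Recompute the coordinate with the largest rounding error so q + r + s == 0 holds.
    if dq > dr and dq > ds:
        rq = -rr - rs
    elif dr > ds:
        rr = -rq - rs
    return rq, rr


def point_to_hex(x, y, size):
    """Map a planar point to the axial coordinates of the hexagon containing it."""
    q = (math.sqrt(3) / 3 * x - y / 3) / size
    r = (2 / 3 * y) / size
    return _round_axial(q, r)


# Hypothetical example: 250 m hexagons, a point slightly off the center of cell (2, -1).
cx, cy = hex_center(2, -1, size=250)
cell = point_to_hex(cx + 75, cy + 75, size=250)
```

Every event with a location can then be bucketed by its `(q, r)` cell, which is what makes per-hexagon aggregation possible.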

Page 15: ML and Data Science at Uber

Granular Data

Page 16: ML and Data Science at Uber

Scale .. For a fine grained OLAP system

1 day of data:

~400 (cities) x 10,000 (avg number of hexagons per city) x 7 (Vehicle types) x 1440 (minutes per day) x 13 (Trip States)

= ~524 billion possible combinations
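
The figure above can be checked with a few lines of arithmetic. The dictionary keys are just illustrative labels for the dimensions quoted on the slide.

```python
import math

# Dimension sizes quoted on the slide for one day of data.
dimensions = {
    "cities": 400,
    "hexagons_per_city": 10_000,
    "vehicle_types": 7,
    "minutes_per_day": 24 * 60,
    "trip_states": 13,
}

# Upper bound on distinct (city, hexagon, vehicle type, minute, trip state) cells.
combinations = math.prod(dimensions.values())
print(f"{combinations:,} possible combinations")  # 524,160,000,000
```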

Page 17: ML and Data Science at Uber

OLAP Queries on Big Data

Realtime + Batch processing

Page 18: ML and Data Science at Uber

Data Processing

HDFS

Page 19: ML and Data Science at Uber

Multi-resolution Realtime Forecasting, Airport ETR

ML Examples

Page 20: ML and Data Science at Uber

Example 1

Real-time spatiotemporal forecasting at a variable resolution of time and space

Page 21: ML and Data Science at Uber

Rider Demand Forecasting

Predict # of Riders per hexagon for various time horizons

Page 22: ML and Data Science at Uber

Spatial granularity & Multiresolution Forecasting: Some small challenges

• The more you aggregate or zoom out, the more clearly trends emerge

• Sparsity at hexagon level: many hexagons have little signal

Page 23: ML and Data Science at Uber

Multiresolution Forecasting: Forecasting at different spatial granularity

1. Forecast at the hex-cluster level

2. Using past activity for a similar time window, apportion out total activity from the hex-cluster to its component hexagons
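
Step 2 can be sketched as a proportional split. This is a minimal illustration under assumptions, not the production method: the function name `apportion`, the hexagon ids, and the even-split fallback for clusters with no history are all hypothetical.

```python
def apportion(cluster_forecast, past_activity):
    """Split a cluster-level forecast across hexagons in proportion to past activity.

    past_activity maps hexagon id -> activity count observed in a similar
    past time window (for example, the same hour one week earlier).
    """
    if not past_activity:
        return {}
    total = sum(past_activity.values())
    if total == 0:
        # Assumption: with no history at all, fall back to an even split.
        share = cluster_forecast / len(past_activity)
        return {hex_id: share for hex_id in past_activity}
    return {
        hex_id: cluster_forecast * count / total
        for hex_id, count in past_activity.items()
    }


# Hypothetical cluster of four hexagons with sparse history.
past = {"hex_a": 6, "hex_b": 3, "hex_c": 1, "hex_d": 0}
hex_forecast = apportion(50.0, past)
```

The per-hexagon values always sum back to the cluster forecast, so the fine-grained view stays consistent with the coarse one.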

Page 24: ML and Data Science at Uber

ML Example No. 2: Airport ETR

[Images: Airport Taxi Line; Uber Airport Lot]

Page 25: ML and Data Science at Uber

Airport Demand (ETR)

Timeline: Flight Arrival (t1) -> Client Eyeball (t2) -> Pickup Request (t3)

Mean Delay: ~30 minutes

Half Life: ~1.0 minute

Page 26: ML and Data Science at Uber

Solution: Time Meter Banner

“ETR too much. I bail out ..”

“Only about 20 minutes. I would wait!”

20 minutes wait to get a $40 trip, oh yeah!

Page 27: ML and Data Science at Uber

Data Science Flow: A Typical Data Scientist Workflow

• Analyze/Prepare: Data exploration, cleansing, transformations etc.

• Feature Selection: Evaluate strength of various signals

• Model Fitting: Use Python/R etc. to fit Model

• Evaluation: Evaluate Model Performance

• Storage: Store Model with versioning

• Serving/Dissemination: Apply Model and serve predictions

• Monitoring: Evaluate Runtime Performance
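
The first stages of this workflow can be illustrated with a toy example. This is only a sketch using the standard library: the data, the single-feature least-squares model, and the `model_v<N>.json` naming scheme are assumptions made for illustration, and feature selection, serving, and monitoring are left out.

```python
import json
import tempfile
from pathlib import Path


def prepare(rows):
    """Analyze/Prepare: drop rows with missing values."""
    return [(x, y) for x, y in rows if x is not None and y is not None]


def fit(rows):
    """Model Fitting: ordinary least squares for y = slope * x + intercept."""
    n = len(rows)
    mean_x = sum(x for x, _ in rows) / n
    mean_y = sum(y for _, y in rows) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in rows)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in rows)
    slope = sxy / sxx
    return {"slope": slope, "intercept": mean_y - slope * mean_x}


def predict(model, x):
    """Apply the fitted line to a single input."""
    return model["slope"] * x + model["intercept"]


def evaluate(model, rows):
    """Evaluation: mean absolute error of the model on the given rows."""
    return sum(abs(predict(model, x) - y) for x, y in rows) / len(rows)


def store(model, directory):
    """Storage: write the model under the next free version number."""
    version = len(list(Path(directory).glob("model_v*.json"))) + 1
    path = Path(directory) / f"model_v{version}.json"
    path.write_text(json.dumps(model))
    return path


# Hypothetical data: one row has a missing feature and is dropped.
raw = [(1, 3.0), (2, 5.0), (None, 4.0), (3, 7.0), (4, 9.0)]
train = prepare(raw)
model = fit(train)
error = evaluate(model, train)

with tempfile.TemporaryDirectory() as model_dir:
    first = store(model, model_dir)
    second = store(model, model_dir)  # a later fit gets the next version number
    stored_names = [first.name, second.name]
```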

Page 28: ML and Data Science at Uber

Data Preparation: A Typical Data Scientist Workflow

• Analyze/Prepare: Data exploration, cleansing, transformations etc.

• Feature Selection: Evaluate strength of various signals

• Model Fitting: Use Python/R etc. to fit Model

• Evaluation: Evaluate Model Performance

• Storage: Store Model with versioning

• Serving/Dissemination: Apply Model and serve predictions

• Monitoring: Evaluate Runtime Performance

Page 29: ML and Data Science at Uber

Data Science Flow: A Typical Data Scientist Workflow

• Feature Selection: Evaluate strength of various signals

• Model Fitting: Use Python/R etc. to fit Model

• Evaluation: Evaluate Model Performance

• Storage: Store Model with versioning

Page 30: ML and Data Science at Uber

Data Scientists (Analytics)

Page 31: ML and Data Science at Uber

Data Science Flow: A Typical Data Scientist Workflow

• Analyze/Prepare: Data exploration, cleansing, transformations etc.

• Feature Selection: Evaluate strength of various signals

• Model Fitting: Use Python/R etc. to fit Model

• Evaluation: Evaluate Model Performance

• Storage: Store Model with versioning

• Serving/Dissemination: Apply Model and serve predictions

• Monitoring: Evaluate Runtime Performance

Page 32: ML and Data Science at Uber

Overview

Streamline the forecasting process from conception to production

• Streams w/ flexible geo-temporal resolution

• Valuable external data feeds

• Modular, reusable components at each stage

• Same code for offline model fitting and production to enable fast model iteration

Components: Operators & Computation DAGs; Feature Generation; Online Models; Offline Model Fitting; Predictions, Metrics & Visualizations

External Data Streams: Airport feed, Weather feed, Concerts feed

Page 33: ML and Data Science at Uber

Realtime Models: real-time spatiotemporal forecasting at a variable resolution of time and space

- Something happened at a time and a place. Now we will evaluate the DAG

- DAG evaluated for a single instant in time
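
The idea of evaluating a computation DAG for a single instant can be sketched as follows. This is an assumption-laden illustration, not the framework described in the talk: the operator names, the input streams (`demand`, `is_raining`), and the rain multiplier are hypothetical. It relies on `graphlib.TopologicalSorter` from the Python standard library (3.9+) to order the operators.

```python
from graphlib import TopologicalSorter

# Each operator is (names of upstream nodes, function of their values).
# "demand" and "is_raining" are hypothetical raw inputs supplied per instant.
OPERATORS = {
    "baseline": (("demand",), lambda recent: sum(recent) / len(recent)),
    "rain_boost": (("is_raining",), lambda raining: 1.2 if raining else 1.0),
    "forecast": (("baseline", "rain_boost"), lambda base, boost: base * boost),
}


def evaluate_dag(operators, inputs):
    """Evaluate every operator once, in dependency order, for one instant in time."""
    graph = {name: set(deps) for name, (deps, _) in operators.items()}
    values = dict(inputs)
    for name in TopologicalSorter(graph).static_order():
        if name in values:
            continue  # raw input, already provided
        deps, fn = operators[name]
        values[name] = fn(*(values[dep] for dep in deps))
    return values


# One snapshot: recent demand counts for a hexagon and the current weather flag.
snapshot = {"demand": [10, 12, 14], "is_raining": True}
result = evaluate_dag(OPERATORS, snapshot)
```

Because `evaluate_dag` only needs a snapshot of inputs, the same operator definitions could in principle be replayed over historical snapshots for offline fitting and over live events in production.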

Page 34: ML and Data Science at Uber

Under the hood ..

Tools & Framework

Page 35: ML and Data Science at Uber

• Curated set of algorithms

• Model Versioning

• Model Performance & Visualizations

• Automated Deployment Workflow

• …

Machine Learning as a Service: ML workflow at Uber

Page 36: ML and Data Science at Uber

Open Source Technologies

• Spark Streaming: Micro Batch based processing; Good integration with HDFS & S3; Exactly once semantics

• Samza: Well integrated with Kafka; Built in State Management; Built in Checkpointing

• Distributed Indexes & Queries; Versatile aggregations

• Jupyter/IPython: Great community support; Data Scientists familiar with Python

Page 37: ML and Data Science at Uber

Challenges & Opportunities

Page 38: ML and Data Science at Uber

ML Problems: Challenges

• What’s the best model for integrating vast amounts of disparate kinds of information over space and time?

• What’s the best way of building spatiotemporal models in a fashion that is effective, elegant, and debuggable?

• About 100 or so more … :-)

Page 39: ML and Data Science at Uber

Links

• Realtime Streaming at Uber (https://www.infoq.com/presentations/real-time-streaming-uber)

• Spark at Uber (http://www.slideshare.net/databricks/spark-meetup-at-uber)

• Career at Uber (https://www.uber.com/careers/)

• https://join.uber.com/marketplace

Thank you!

Page 40: ML and Data Science at Uber

Happy to discuss design/architecture

Q & A

No product/business questions please :-)

@stonse

Page 41: ML and Data Science at Uber

Proprietary and confidential © 2016 Uber Technologies, Inc. All rights reserved. No part of this document may be reproduced or utilized in any form or by any means, electronic or mechanical, including photocopying, recording, or by any information storage or retrieval systems, without permission in writing from Uber. This document is intended only for the use of the individual or entity to whom it is addressed and contains information that is privileged, confidential or otherwise exempt from disclosure under applicable law. All recipients of this document are notified that the information contained herein includes proprietary and confidential information of Uber, and recipient may not make use of, disseminate, or in any way disclose this document or any of the enclosed information to any person other than employees of addressee to the extent necessary for consultations with authorized personnel of Uber.

Sudhir Tonse

@stonse

Thank you