riselab:enabling intelligent real-time decisions

33
RISELab: Enabling Intelligent Real-time Decisions Ion Stoica February 8, 2017

Upload: jen-aman

Post on 12-Apr-2017

64 views

Category:

Data & Analytics


2 download

TRANSCRIPT

Page 1: RISELab:Enabling Intelligent Real-Time Decisions

RISELab: Enabling Intelligent Real-time Decisions

Ion StoicaFebruary 8, 2017

Page 2: RISELab:Enabling Intelligent Real-Time Decisions

Berkeley’s AMPLab (2011-2016)

2

Algorithms

Machines People

Goal: Next generation of open sourcedata analytics stack for industry & academia

Berkeley Data Analytics Stack (BDAS)

Page 3: RISELab:Enabling Intelligent Real-Time Decisions

Berkeley’s AMPLab (2011-2016)

3

Algorithms

Machines People

Goal: Next generation of open sourcedata analytics stack for industry & academia

Berkeley Data Analytics Stack (BDAS)

Page 4: RISELab:Enabling Intelligent Real-Time Decisions

RISE: Real-time Intelligent Secure Execution

Page 5: RISELab:Enabling Intelligent Real-Time Decisions

From batch data to advanced analytics

AMPLab

5

From live data to real-time decisions

RISELab

Page 6: RISELab:Enabling Intelligent Real-Time Decisions

RISE Lab (2017-2022)

12 faculty across AI, systems, security, and architectures

11 Founding sponsors

6

Page 7: RISELab:Enabling Intelligent Real-Time Decisions

Why?Data only as valuable as the decisions it enables

7

Page 8: RISELab:Enabling Intelligent Real-Time Decisions

Why?

What does this mean?• Faster decisions better than slower decisions• Decisions on fresh data better than decisions on stale data• Decisions on personalized data better than on aggregate data

8

Data only as valuable as the decisions it enables

Page 9: RISELab:Enabling Intelligent Real-Time Decisions

Goal

Real-time decisions

on live data

with strong security

9

decide in ms

the current state of the environment

privacy, confidentiality, integrity

Page 10: RISELab:Enabling Intelligent Real-Time Decisions

Typical decision system

Decision SystemQuery

Decision

Environment+

sensors & actuators

Observations, Feedback

Preprocess Intermediatedata

DecisionEngine

Page 11: RISELab:Enabling Intelligent Real-Time Decisions

Decision

QueryDecisionEnginePreprocess Intermediate

data

Environment+

sensors & actuators

Typical decision system

Decision System

Observations, Feedback

LiveUpdate latency(e.g., ~1 seconds)

Page 12: RISELab:Enabling Intelligent Real-Time Decisions

Decision

QueryDecisionEnginePreprocess Intermediate

data

Environment+

sensors & actuators

Typical decision system

Decision System

Observations, Feedback

LiveUpdate latency(e.g., ~1 seconds)

Secure

Real-timedecision latency

(e.g., ~10 ms)

Page 13: RISELab:Enabling Intelligent Real-Time Decisions

Example of decision systems

Decision SystemQuery

DecisionTraining

Models(diff. tradeoffs

complexity/accuracy)

ModelServing

FeedbackObservations, Feedback

ML Pipeline(e.g., Clipper +

Spark/Tensorflow)

Decision SystemObs.

Action

Update Policy

Policyobs àaction

QueryPolicy

Observations, Rewards

ReinforcementLearning Systems

(e.g., Ray)

Pre-process

Interm-ediateData

DecisionEngine

Page 14: RISELab:Enabling Intelligent Real-Time Decisions

What else do we want from decisions?

Intelligent: complex decisions in uncertain environments

Robust: handle complex noise, unforeseen inputs, failures

Explainable: ability to explain non-obvious decisions

Page 15: RISELab:Enabling Intelligent Real-Time Decisions

Goal

Develop open source platforms, tools, and algorithms for

intelligent real-time decisions on live-data

Page 16: RISELab:Enabling Intelligent Real-Time Decisions

Some Proposed Research

Secure Real-time Decisions Stack (SRDS) • Open source platform to develop of RISE apps• Secure from ground up• Reinforcement Learning (RL) as one of key app patterns

Learning control hierarchies: speedup learning, training

Shared learning: learn over confidential data

16

Page 17: RISELab:Enabling Intelligent Real-Time Decisions

Secure Real-time Decisions Stack (SRDS) • Open source platform to develop of RISE apps• Secure from ground up• Reinforcement Learning (RL) as one of key app patterns

Learning control hierarchies: speedup learning, training

Shared learning: learn over confidential data

17

Some Proposed Research

Page 18: RISELab:Enabling Intelligent Real-Time Decisions

Secure Real-time Decision Stack (SRDS)

scheduler object store

RISE μkernel

Ray Clipper …

Ground (data context service)

Tim

e M

achi

ne

Page 19: RISELab:Enabling Intelligent Real-Time Decisions

scheduler object store

RISE μkernel

Ray Clipper …

Ground (data context service)

Tim

e M

achi

ne

Minimalist execution engine:• Support both data flow and task-parallel execution models• High-throughput, low-latency: ~ 1M tasks/sec @ ms latency

Secure Real-time Decision Stack (SRDS)

Page 20: RISELab:Enabling Intelligent Real-Time Decisions

scheduler object store

RISE μkernel

Ray Clipper …

Ground (data context service)

Tim

e M

achi

ne

Central repository for models, APIs to capture the context in which data gets used and producedStatus: ongoing project with industry partners

Secure Real-time Decision Stack (SRDS)

Page 21: RISELab:Enabling Intelligent Real-Time Decisions

scheduler object store

RISE μkernel

Ray Clipper …

Ground (data context service)

Tim

e M

achi

ne

Replaying of apps at fine granularity• Simplify development, debugging• Robustness: replay against perturbed inputs• Explainability: identify inputs causing decision• Security: confirm vulnerabilities, test security

patches, compliance auditing

Secure Real-time Decision Stack (SRDS)

Page 22: RISELab:Enabling Intelligent Real-Time Decisions

scheduler object store

RISE μkernel

Ray Clipper …

Ground (data context service)

Tim

e M

achi

ne

Dramatically simplify development of RISE applications• Apache Spark: improve latency and security• Clipper: model serving for Apache Spark, Scikit learn, etc• Ray: framework for RL applications

Secure Real-time Decision Stack (SRDS)

Page 23: RISELab:Enabling Intelligent Real-Time Decisions

Improving Apache SparkDrizzle• Decrease latency of Structured Streaming and ML algorithms by ~10x• Techniques: group scheduling, shared variables

23

Page 24: RISELab:Enabling Intelligent Real-Time Decisions

Streaming Latency: YCSB benchmark

24

0

500

1000

1500

2000

1 2 4 8 12 16 20 24

Med

ian

Even

t Lat

ency

(m

s)

Throughput (Million events/s)

Spark Drizzle Flink Drizzle-Opt

Drizzle-Opt: Reduce-by on mapper side

Page 25: RISELab:Enabling Intelligent Real-Time Decisions

Streaming Latency: YCSB benchmark

25

0

500

1000

1500

2000

1 2 4 8 12 16 20 24

Med

ian

Even

t Lat

ency

(m

s)

Throughput (Million events/s)

Spark Drizzle Flink Drizzle-Opt

Drizzle-Opt: Reduce-by on mapper side

Page 26: RISELab:Enabling Intelligent Real-Time Decisions

Streaming Latency: YCSB benchmark

26

0

500

1000

1500

2000

1 2 4 8 12 16 20 24

Med

ian

Even

t Lat

ency

(m

s)

Throughput (Million events/s)

Spark Drizzle Flink Drizzle-Opt

Drizzle-Opt: Reduce-by on mapper side

Page 27: RISELab:Enabling Intelligent Real-Time Decisions

Streaming Latency: YCSB benchmark

27

0

500

1000

1500

2000

1 2 4 8 12 16 20 24

Med

ian

Even

t Lat

ency

(m

s)

Throughput (Million events/s)

Spark Drizzle Flink Drizzle-Opt

Drizzle-Opt: Reduce-by on mapper side

Page 28: RISELab:Enabling Intelligent Real-Time Decisions

Streaming Latency: YCSB benchmark

28

0

500

1000

1500

2000

1 2 4 8 12 16 20 24

Med

ian

Even

t Lat

ency

(m

s)

Throughput (Million events/s)

Spark Drizzle Flink Drizzle-Opt

Drizzle-Opt: Reduce-by on mapper side

15x

Page 29: RISELab:Enabling Intelligent Real-Time Decisions

MLlib: SGD Performance

29

0

20

40

60

4 8 16 32 64 128

Tim

e / i

ter(

ms)

Machines

Spark Drizzle

Page 30: RISELab:Enabling Intelligent Real-Time Decisions

MLlib: SGD Performance

30

0

20

40

60

4 8 16 32 64 128

Tim

e / i

ter (

ms)

Machines

Spark Drizzle

6x

Page 31: RISELab:Enabling Intelligent Real-Time Decisions

Improving Apache SparkDrizzle• Decrease latency of Structured Streaming and ML algorithms by ~10x• Techniques: group scheduling, shared variables• Some of these techniques will make their way to Apache Spark

Opaque• Full data encryption, authentication, and verification (Intel’s SGX)• Oblivious mode: hide data access pattern• Support most SparkSQL functionality• See Wenting’s talk later

31

Page 32: RISELab:Enabling Intelligent Real-Time Decisions

RISELab

Already promising results

Expect much more over the next five years!

32

Goal: Develop open source platforms, tools, and algorithms for intelligent real-time decisions on live-

data

Page 33: RISELab:Enabling Intelligent Real-Time Decisions

Thank you