reef: the retainable evaluator execution framework•recursion is built into the language •...

58

Upload: others

Post on 22-Jun-2020

0 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: REEF: The Retainable Evaluator Execution Framework•Recursion is built into the language • Amenable to optimizations • Lots of existing work that we can leverage J. Eisner and
Page 2: REEF: The Retainable Evaluator Execution Framework•Recursion is built into the language • Amenable to optimizations • Lots of existing work that we can leverage J. Eisner and

“ ”

2

Page 3: REEF: The Retainable Evaluator Execution Framework•Recursion is built into the language • Amenable to optimizations • Lots of existing work that we can leverage J. Eisner and

3

Learning

Supervised • Classification • Regression • Recommender Unsupervised • Clustering • Dimensionality

reduction • Topic modeling

Data

Big: TiB - PiB

Model

Small: MiB - GiB

Page 4: REEF: The Retainable Evaluator Execution Framework•Recursion is built into the language • Amenable to optimizations • Lots of existing work that we can leverage J. Eisner and

Modeling Evaluation Example Formation Examples Model

Page 5: REEF: The Retainable Evaluator Execution Framework•Recursion is built into the language • Amenable to optimizations • Lots of existing work that we can leverage J. Eisner and

Example

EMail

Click Log

Bag of Words ID

Label ID

Bag of Words Label ID

Feature Extraction

Label Extraction

Data Parallel Functions

Large Scale Join

(Large Scale) Join

Page 6: REEF: The Retainable Evaluator Execution Framework•Recursion is built into the language • Amenable to optimizations • Lots of existing work that we can leverage J. Eisner and

Step II: Modeling

Step III: Evaluation

Modeling Evaluation Example Formation

Page 7: REEF: The Retainable Evaluator Execution Framework•Recursion is built into the language • Amenable to optimizations • Lots of existing work that we can leverage J. Eisner and

Step I: Example Formation Feature and Label Extraction

Step III: Evaluation

Modeling Evaluation Example Formation

Page 8: REEF: The Retainable Evaluator Execution Framework•Recursion is built into the language • Amenable to optimizations • Lots of existing work that we can leverage J. Eisner and

Apply Model to Data

Observe Errors

Update Model

Page 9: REEF: The Retainable Evaluator Execution Framework•Recursion is built into the language • Amenable to optimizations • Lots of existing work that we can leverage J. Eisner and

Sample Features

Copy Model

Modeling Evaluation Example Formation

Page 10: REEF: The Retainable Evaluator Execution Framework•Recursion is built into the language • Amenable to optimizations • Lots of existing work that we can leverage J. Eisner and

+ MapReduce model fits statistical query model

learning - Hadoop MR does not support iterations (30x

slowdown compared to others) - Hadoop MR does not match other forms of

algorithms

Hadoop Abuse

Page 11: REEF: The Retainable Evaluator Execution Framework•Recursion is built into the language • Amenable to optimizations • Lots of existing work that we can leverage J. Eisner and

Statistics

Model Updates

Page 12: REEF: The Retainable Evaluator Execution Framework•Recursion is built into the language • Amenable to optimizations • Lots of existing work that we can leverage J. Eisner and

Statistics / Updates

Page 13: REEF: The Retainable Evaluator Execution Framework•Recursion is built into the language • Amenable to optimizations • Lots of existing work that we can leverage J. Eisner and
Page 14: REEF: The Retainable Evaluator Execution Framework•Recursion is built into the language • Amenable to optimizations • Lots of existing work that we can leverage J. Eisner and

Rise of the Resource Managers

Page 15: REEF: The Retainable Evaluator Execution Framework•Recursion is built into the language • Amenable to optimizations • Lots of existing work that we can leverage J. Eisner and

Map Task

Reduce Task

Map Task

Reduce Task

Map Task

Page 16: REEF: The Retainable Evaluator Execution Framework•Recursion is built into the language • Amenable to optimizations • Lots of existing work that we can leverage J. Eisner and
Page 17: REEF: The Retainable Evaluator Execution Framework•Recursion is built into the language • Amenable to optimizations • Lots of existing work that we can leverage J. Eisner and

App Master

Resource Allocation = list of (node type, count, resource) E.g. { (node1, 1, 1GB), (rack-1, 2, 1GB),(*, 1, 2GB) }

Page 18: REEF: The Retainable Evaluator Execution Framework•Recursion is built into the language • Amenable to optimizations • Lots of existing work that we can leverage J. Eisner and

App Master Container

Container Container

Container

Page 19: REEF: The Retainable Evaluator Execution Framework•Recursion is built into the language • Amenable to optimizations • Lots of existing work that we can leverage J. Eisner and

App Master Container

Container Container

Container

Page 20: REEF: The Retainable Evaluator Execution Framework•Recursion is built into the language • Amenable to optimizations • Lots of existing work that we can leverage J. Eisner and

App Master Container

Container Container

Container

Page 21: REEF: The Retainable Evaluator Execution Framework•Recursion is built into the language • Amenable to optimizations • Lots of existing work that we can leverage J. Eisner and
Page 22: REEF: The Retainable Evaluator Execution Framework•Recursion is built into the language • Amenable to optimizations • Lots of existing work that we can leverage J. Eisner and
Page 23: REEF: The Retainable Evaluator Execution Framework•Recursion is built into the language • Amenable to optimizations • Lots of existing work that we can leverage J. Eisner and

REEF: Retainable Evaluator Execution Framework

Page 24: REEF: The Retainable Evaluator Execution Framework•Recursion is built into the language • Amenable to optimizations • Lots of existing work that we can leverage J. Eisner and
Page 25: REEF: The Retainable Evaluator Execution Framework•Recursion is built into the language • Amenable to optimizations • Lots of existing work that we can leverage J. Eisner and

YARN / HDFS

SQL / Hive … … Machine Learning

Page 26: REEF: The Retainable Evaluator Execution Framework•Recursion is built into the language • Amenable to optimizations • Lots of existing work that we can leverage J. Eisner and

SQL / Hive

YARN / HDFS

… … Machine Learning

REEF

Page 27: REEF: The Retainable Evaluator Execution Framework•Recursion is built into the language • Amenable to optimizations • Lots of existing work that we can leverage J. Eisner and

YARN / HDFS

SQL / Hive … … Machine Learning

REEF

Physical Data Parallel Operators

Logical Abstraction

Page 28: REEF: The Retainable Evaluator Execution Framework•Recursion is built into the language • Amenable to optimizations • Lots of existing work that we can leverage J. Eisner and

Storage

Network

State Management

Job Driver Control plane implementation. User code executed on YARN’s Application Master

Activity User code executed within an Evaluator.

Evaluator Execution Environment for Activities. One Evaluator is bound to one YARN Container.

Page 29: REEF: The Retainable Evaluator Execution Framework•Recursion is built into the language • Amenable to optimizations • Lots of existing work that we can leverage J. Eisner and
Page 30: REEF: The Retainable Evaluator Execution Framework•Recursion is built into the language • Amenable to optimizations • Lots of existing work that we can leverage J. Eisner and
Page 31: REEF: The Retainable Evaluator Execution Framework•Recursion is built into the language • Amenable to optimizations • Lots of existing work that we can leverage J. Eisner and

Client

public class DistributedShell { ... public static void main(String[] args){ ... Injector i = new Injector(yarnConfiguration); ... REEF reef = i.getInstance(REEF.class); ... reef.submit(driverConf); } }

Page 32: REEF: The Retainable Evaluator Execution Framework•Recursion is built into the language • Amenable to optimizations • Lots of existing work that we can leverage J. Eisner and

public class DistributedShell { ... public static void main(String[] args){ ... Injector i = new Injector(yarnConfiguration); ... REEF reef = i.getInstance(REEF.class); ... reef.submit(driverConf); } }

Client

Page 33: REEF: The Retainable Evaluator Execution Framework•Recursion is built into the language • Amenable to optimizations • Lots of existing work that we can leverage J. Eisner and

public class DistributedShellJobDriver { private final EvaluatorRequestor requestor; ... public void onNext(StartTime time) { requestor.submit(EvaluatorRequest.Builder() .setSize(SMALL).setNumber(2) .build() ); } ... }

Client

Page 34: REEF: The Retainable Evaluator Execution Framework•Recursion is built into the language • Amenable to optimizations • Lots of existing work that we can leverage J. Eisner and

Client

public class DistributedShellJobDriver { private final EvaluatorRequestor requestor; ... public void onNext(AllocatedEvaluator eval) { Configuration contextConf = ...; eval.submitContext(contextConf) } ... }

Page 35: REEF: The Retainable Evaluator Execution Framework•Recursion is built into the language • Amenable to optimizations • Lots of existing work that we can leverage J. Eisner and

context config +

Client

Page 36: REEF: The Retainable Evaluator Execution Framework•Recursion is built into the language • Amenable to optimizations • Lots of existing work that we can leverage J. Eisner and

Client

Page 37: REEF: The Retainable Evaluator Execution Framework•Recursion is built into the language • Amenable to optimizations • Lots of existing work that we can leverage J. Eisner and

public class DistributedShellJobDriver { private final String cmd = “ls”; [...] public void onNext(ActiveContext ctx) { final String activityId = [...]; Configuration activityConf = Activity.CONF .set(IDENTIFIER, "ShellActivity") .set(ACTIVITY, ShellActivity.class) .set(COMMAND, this.cmd) .build(); ctx.submitActivity(activityConf); } [...] }

activity config

Client

Page 38: REEF: The Retainable Evaluator Execution Framework•Recursion is built into the language • Amenable to optimizations • Lots of existing work that we can leverage J. Eisner and

class ShellActivity implements Activity { private final String command; @Inject ShellActivity(@Parameter(Command.class) String c) { this.command = c; } private String exec(final String command){ ... } @Override public byte[] call(byte[] memento) { String s = exec(this.cmd); return s.getBytes(); } }

Client

Page 39: REEF: The Retainable Evaluator Execution Framework•Recursion is built into the language • Amenable to optimizations • Lots of existing work that we can leverage J. Eisner and

Client

Page 40: REEF: The Retainable Evaluator Execution Framework•Recursion is built into the language • Amenable to optimizations • Lots of existing work that we can leverage J. Eisner and

Retains State!

Client

Page 41: REEF: The Retainable Evaluator Execution Framework•Recursion is built into the language • Amenable to optimizations • Lots of existing work that we can leverage J. Eisner and

activity config

Client

Page 42: REEF: The Retainable Evaluator Execution Framework•Recursion is built into the language • Amenable to optimizations • Lots of existing work that we can leverage J. Eisner and

Client

Page 43: REEF: The Retainable Evaluator Execution Framework•Recursion is built into the language • Amenable to optimizations • Lots of existing work that we can leverage J. Eisner and

Client

Page 44: REEF: The Retainable Evaluator Execution Framework•Recursion is built into the language • Amenable to optimizations • Lots of existing work that we can leverage J. Eisner and
Page 45: REEF: The Retainable Evaluator Execution Framework•Recursion is built into the language • Amenable to optimizations • Lots of existing work that we can leverage J. Eisner and
Page 46: REEF: The Retainable Evaluator Execution Framework•Recursion is built into the language • Amenable to optimizations • Lots of existing work that we can leverage J. Eisner and

Name Node

Yarn RM

HDFS NM

REEF

HDFS NM

HDFS NM

Job Driver

Activity

Client

node1

node2

node3

node4

Page 47: REEF: The Retainable Evaluator Execution Framework•Recursion is built into the language • Amenable to optimizations • Lots of existing work that we can leverage J. Eisner and

Name Node

Yarn RM

HDFS NM

REEF

HDFS NM

HDFS NM

Job Driver

Activity

Client

node1

node2

node3

node4

Page 48: REEF: The Retainable Evaluator Execution Framework•Recursion is built into the language • Amenable to optimizations • Lots of existing work that we can leverage J. Eisner and

Name Node

Yarn RM

HDFS NM

REEF

HDFS NM

HDFS NM

Job Driver

Client

node1

node2

node3

node4

Activity

activity config +

Page 49: REEF: The Retainable Evaluator Execution Framework•Recursion is built into the language • Amenable to optimizations • Lots of existing work that we can leverage J. Eisner and
Page 50: REEF: The Retainable Evaluator Execution Framework•Recursion is built into the language • Amenable to optimizations • Lots of existing work that we can leverage J. Eisner and
Page 51: REEF: The Retainable Evaluator Execution Framework•Recursion is built into the language • Amenable to optimizations • Lots of existing work that we can leverage J. Eisner and
Page 52: REEF: The Retainable Evaluator Execution Framework•Recursion is built into the language • Amenable to optimizations • Lots of existing work that we can leverage J. Eisner and

52

Select, Project, Join, Group MapReduce MPI

Logical Layer

Physical Layer

ML algorithm

Graph Analysis SQM

Page 53: REEF: The Retainable Evaluator Execution Framework•Recursion is built into the language • Amenable to optimizations • Lots of existing work that we can leverage J. Eisner and

53

MapReduce MPI

Logical Layer

Physical Layer

ML algorithm

Graph Analysis SQM

Select, Project, Join, Group

Page 54: REEF: The Retainable Evaluator Execution Framework•Recursion is built into the language • Amenable to optimizations • Lots of existing work that we can leverage J. Eisner and

REEF

Parallel Recursive Dataflow

Query optimizer

54

Logical query over training data

ML algorithm Graph

Analysis SQM

Page 55: REEF: The Retainable Evaluator Execution Framework•Recursion is built into the language • Amenable to optimizations • Lots of existing work that we can leverage J. Eisner and

• Recursion is built into the language • Amenable to optimizations • Lots of existing work that we can leverage J. Eisner and N. Filardo. Dyna: Extending datalog for

modern AI. In Datalog ’10 S. Funiak et al. Distributed inference with declarative

overlay networks. EECS Tech Report 2008 D. Deutch, C. Koch, T. Milo. On Probabilistic Fixpoint and

Markov Chain Query Languages. In PODS ’10 Y. Bu et al. Scaling Datalog for Machine Learning on Big

Data. Tech Report http://arxiv.org/abs/1203.0160 2012 55

REEF

Parallel Recursive Dataflow

Query optimizer

Datalog query over training data

ML algorithm Graph

Analysis SQM

Page 56: REEF: The Retainable Evaluator Execution Framework•Recursion is built into the language • Amenable to optimizations • Lots of existing work that we can leverage J. Eisner and

• Implementation over Hyracks • Supports both Iterative-MRU and Pregel • Standard optimizations + some new tricks

56

Hyracks

Hardcoded optimizations

Datalog queries

Programming Models for ML

algorithms

Pregel Iterative-MRU

REEF

Page 57: REEF: The Retainable Evaluator Execution Framework•Recursion is built into the language • Amenable to optimizations • Lots of existing work that we can leverage J. Eisner and

Parallel Recursive Dataflow

57

REEF

Query optimizer

Datalog query over training data

ML algorithm Graph

Analysis SQM

• Storage/Networking services

• State Management • Caching policies

• Dynamic resources • Elastic operators

• Cost estimation for recursive computation

• Cost models (time vs money)

• Interactive Query Processing

• Provenance for triage • “My model misbehaves

- why?” • Fault-awareness policies • Incremental learning