1 cs 294-42: project suggestions ion stoica (istoica/classes/cs294/11/) september 14, 2011

CS 294-42: Project Suggestions

Ion Stoica (http://www.cs.berkeley.edu/~istoica/classes/cs294/11/)

September 14, 2011

Projects

- This is a project-oriented class: reading papers should be a means to a great project, not a goal in itself!
- Strongly prefer groups of two
- Perfectly fine to have the same project in cs262
- Today, I’ll present some suggestions
- But you are free to come up with your own proposal
- Main goal: just do a great project

Where I’m Coming From

Key challenge: maximize the economic value of data, i.e., extract value from data while reducing costs (e.g., storage, computation)

Where I’m Coming From: Tools to Extract Value from Big Data
- Scalability, response time, accuracy
- Provide high cluster utilization for heterogeneous workloads
- Support diverse SLAs
- Predictable performance
- Isolation
- Consistency

Caveats
- Cloud computing is HOT, but there is a lot of NOISE!
- It is not easy to differentiate between narrow engineering solutions and fundamental tradeoffs, or to predict the importance of the problem you solve
- Cloud computing is akin to a Gold Rush!

Background: Mesos
- Rapid innovation in cloud computing
- No single framework is optimal for all applications
- Running each framework on its own dedicated cluster is:
  - Expensive
  - Hard to share data
- Need to run multiple frameworks on the same cluster

[Figure: frameworks such as Dryad, Pregel, Cassandra, and Hypertable]

Background: Mesos – Where We Want to Go

[Figure: Hadoop, Pregel, and MPI running on a shared cluster]

Today: static partitioning (uniprogramming)
Mesos: dynamic sharing (multiprogramming)

Background: Mesos – Solution

Mesos is a common resource sharing layer over which diverse frameworks can run

[Figure: Hadoop, MPI, … running on top of Mesos, which runs across the cluster nodes]
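
The two-level scheduling idea behind Mesos can be illustrated with a toy resource-offer loop: the master offers each node’s free resources, and each framework’s own scheduler decides how many of its tasks to launch in that offer. This is a minimal Python sketch, not the Mesos API: the `Framework` class, `offer_round`, the fixed offer order, and the per-task demands are all hypothetical (real Mesos chooses which framework receives an offer using an allocation policy such as DRF).

```python
from dataclasses import dataclass, field


@dataclass
class Framework:
    name: str
    task_cpus: float  # hypothetical per-task CPU demand
    task_mem: float   # hypothetical per-task memory demand (GB)
    pending: int      # tasks still waiting to run
    launched: list = field(default_factory=list)  # node of each launched task

    def respond_to_offer(self, node, cpus, mem):
        """Framework-side scheduler: launch as many pending tasks as fit in the offer."""
        accepted = 0
        while self.pending > 0 and cpus >= self.task_cpus and mem >= self.task_mem:
            cpus -= self.task_cpus
            mem -= self.task_mem
            self.pending -= 1
            self.launched.append(node)
            accepted += 1
        return accepted


def offer_round(nodes, frameworks):
    """Master-side loop: offer each node's free resources to the frameworks in turn."""
    for node, (cpus, mem) in nodes.items():
        for fw in frameworks:
            n = fw.respond_to_offer(node, cpus, mem)
            cpus -= n * fw.task_cpus
            mem -= n * fw.task_mem
        nodes[node] = (cpus, mem)  # whatever no framework accepted stays free


nodes = {"node1": (4.0, 8.0), "node2": (4.0, 8.0)}
hadoop = Framework("hadoop", task_cpus=1.0, task_mem=2.0, pending=5)
mpi = Framework("mpi", task_cpus=2.0, task_mem=1.0, pending=2)
offer_round(nodes, [hadoop, mpi])
print(hadoop.launched, mpi.launched, nodes)
```

Because the master only makes offers and each framework decides placement, the master never needs to understand framework-specific constraints; the cost is that a framework can only choose among what it is offered.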

Background: Workload in Datacenters

- Frontend (web servers, databases)
- Decision-driven processes
- Exploratory queries (e.g., Dremel)
- Production jobs (e.g., compute summaries)
- Analytics jobs

[Figure: these workloads arranged along two axes, priority (high to low) and response (interactive/low-latency to batch)]

Datacenter OS: Resource Management, Scheduling

Hierarchical Scheduler (for Mesos)

- Allow administrators to organize users into groups
- Provide resource guarantees per group
- Share available resources (fairly) across groups

Research questions:
- Abstraction (when using multiple resources)?
- How to implement using resource offers?
- What policies are compatible at different levels in the hierarchy?
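
One possible policy for such a hierarchical scheduler, reduced to a single resource type: give each group its guarantee (capped at its demand), then split the leftover equally among groups that still want more, repeating when a group is satisfied. The function name, the (guarantee, demand) input format, and the group names are hypothetical; multi-resource guarantees and the resource-offer implementation are exactly the open questions listed above.

```python
def hierarchical_allocate(capacity, groups):
    """groups maps a group name to (guarantee, demand), for a single resource type.

    Step 1: every group receives its guarantee, capped at what it actually demands.
    Step 2: leftover capacity is split equally among groups with unmet demand,
    repeating (water-filling) when some group's demand is satisfied first.
    """
    alloc = {g: min(guarantee, demand) for g, (guarantee, demand) in groups.items()}
    remaining = capacity - sum(alloc.values())
    if remaining < 0:
        raise ValueError("guarantees exceed cluster capacity")
    hungry = {g for g, (_, demand) in groups.items() if demand > alloc[g]}
    while remaining > 1e-9 and hungry:
        share = remaining / len(hungry)
        for g in sorted(hungry):
            extra = min(share, groups[g][1] - alloc[g])
            alloc[g] += extra
            remaining -= extra
        hungry = {g for g in hungry if groups[g][1] - alloc[g] > 1e-9}
    return alloc


alloc = hierarchical_allocate(100, {"prod": (40, 90), "research": (30, 20), "adhoc": (10, 60)})
print(alloc)  # {'prod': 55.0, 'research': 20, 'adhoc': 25.0}
```

Here "research" only demands 20 of its 30 guaranteed units, so the unused guarantee flows into the shared pool and is split between the other two groups.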

Cross Application Resource Management

- An app uses many services (e.g., file systems, key-value stores, databases, etc.)
- If an app has high priority but a service it uses does not, the app’s SLA (Service Level Agreement) might be violated

Research questions:
- Abstraction, e.g., resource delegation, priority propagation?
- Clean-slate mechanisms vs. incremental deployability
- This is also highly challenging in single-node OSes!
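
A minimal sketch of priority propagation, assuming each request carries its caller’s priority and the shared service orders its queue by that priority rather than by its own. `Service`, `submit`, and the numeric priority scale (lower number is more urgent) are hypothetical.

```python
import heapq
import itertools


class Service:
    """A shared service (e.g., a key-value store) whose request queue is ordered by
    the priority propagated from the calling app, not by the service's own priority."""

    def __init__(self, name):
        self.name = name
        self._queue = []
        self._seq = itertools.count()  # tie-breaker: FIFO among equal priorities

    def submit(self, request, caller_priority):
        # Lower number means higher priority (heapq is a min-heap).
        heapq.heappush(self._queue, (caller_priority, next(self._seq), request))

    def serve_all(self):
        served = []
        while self._queue:
            _, _, request = heapq.heappop(self._queue)
            served.append(request)
        return served


kv = Service("kv-store")
kv.submit("analytics scan", caller_priority=5)
kv.submit("web page lookup", caller_priority=0)
kv.submit("batch export", caller_priority=5)
served = kv.serve_all()
print(served)  # ['web page lookup', 'analytics scan', 'batch export']
```

Even this toy version shows the deployability question: every service in the call chain must accept and honor the propagated priority.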

Resource Management using VMs

- Most cluster resource managers (e.g., Mesos) use Linux containers
- Thus, schedulers assume no task migration

Research questions:
- Develop a scheduler for VM environments (e.g., extend DRF)
- Tradeoffs between migration, delay, and preemption

Task Granularity Selection (Yanpei Chen)

Problem: the number of tasks per stage in today’s MapReduce apps is (highly) suboptimal

Research question: derive algorithms to pick the number of tasks that
- optimize various performance metrics, e.g., utilization, response time, network traffic
- subject to various constraints, e.g., capacity, network
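
To make the optimization concrete, here is a deliberately crude cost model (an assumption, not from the slide): n equal tasks run in ceil(n / slots) waves, each task pays a fixed startup overhead, and a capacity constraint caps the input per task. `completion_time` and `pick_num_tasks` are hypothetical names, and the model ignores network traffic, data skew, and stragglers, which is where more sophisticated algorithms would come in.

```python
import math


def completion_time(n, work, slots, overhead):
    """Toy model: n equal tasks run in ceil(n / slots) waves; each task costs
    work / n plus a fixed per-task startup overhead."""
    waves = math.ceil(n / slots)
    return waves * (work / n + overhead)


def pick_num_tasks(work, slots, overhead, max_task_input, max_tasks=1000):
    """Search n = 1..max_tasks for the lowest completion time, skipping any n whose
    per-task input (work / n) would exceed the capacity limit max_task_input."""
    best = None
    for n in range(1, max_tasks + 1):
        if work / n > max_task_input:
            continue
        t = completion_time(n, work, slots, overhead)
        if best is None or t < best[1]:
            best = (n, t)
    return best


n, t = pick_num_tasks(work=1000, slots=10, overhead=2, max_task_input=40)
print(n, round(t, 1))  # 30 106.0
```

Without the capacity limit this toy model always prefers a single wave (n = slots); the 40-unit per-task cap forces three waves, and the best choice is then the largest n that still fits in three waves.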

Resource Revocation

Which task should we revoke/preempt? Two questions:
- Which slot has the least impact on the giving framework?
- Is the slot acceptable to the receiving framework?

Research questions:
- Identify a feasible slot for the receiving framework with the least impact on the giving framework
- Lightweight protocol design
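
A sketch of the two revocation questions as one selection rule, assuming a slot is acceptable if it has the resources (and optionally the node locality) the receiving framework asks for, and that the impact on the giving framework is the work its running task would lose, approximated by the task’s elapsed time with no checkpointing. `Slot` and `pick_slot_to_revoke` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Slot:
    node: str
    cpus: int
    mem_gb: int
    task_elapsed_s: float  # work lost by the giving framework if this task is killed


def pick_slot_to_revoke(slots, need_cpus, need_mem_gb, preferred_nodes=None):
    """Among slots that satisfy the receiving framework (resources and, optionally,
    node preference), return the one whose preemption wastes the least work."""
    feasible = [
        s for s in slots
        if s.cpus >= need_cpus and s.mem_gb >= need_mem_gb
        and (preferred_nodes is None or s.node in preferred_nodes)
    ]
    return min(feasible, key=lambda s: s.task_elapsed_s, default=None)


slots = [
    Slot("n1", cpus=2, mem_gb=4, task_elapsed_s=30.0),
    Slot("n2", cpus=4, mem_gb=8, task_elapsed_s=300.0),
    Slot("n3", cpus=4, mem_gb=16, task_elapsed_s=45.0),
]
print(pick_slot_to_revoke(slots, need_cpus=4, need_mem_gb=8).node)  # n3
```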

Control Plane Consistency Model

What type of consistency is “good enough” for various control plane functions?
- File system metadata (Hadoop)
- Routing (Nicira)
- Scheduling
- Coordinated caching
- …

Research questions:
- What are the trade-offs between performance and consistency?
- Develop a generic framework for the control plane

Decentralized vs. Centralized Scheduling

Decentralized schedulers
- E.g., Mesos, Hadoop 2.0
- Delegate decisions to apps (i.e., frameworks, jobs)
- Advantages: scale and separation of concerns (i.e., apps know best where and which tasks to run)

Centralized schedulers
- Know all app requirements
- Advantage: can make optimal decisions

Research challenges:
- Evaluate centralized vs. decentralized schedulers
- Characterize the class of workloads for which a decentralized scheduler is good enough

Opportunistic Scheduling

Goal: schedule interactive jobs (e.g., <100 ms latency)

Existing schedulers have high overhead (e.g., Mesos needs to decide on every offer)

Research challenges:
- Tradeoff between utilization and response time
- Evaluate a hybrid approach

Background: Dominant Resource Fairness

Implement fair (proportional) allocation for multiple types of resources

Key properties:
- Strategy-proof: users cannot gain an advantage by lying about their demands
- Sharing incentive: users are incentivized to share a cluster rather than partition it
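
DRF can be computed by progressive filling: repeatedly launch a task for the user with the lowest dominant share (the largest fraction of any single resource that user holds) until no user’s next task fits. Below is a minimal Python sketch; the function name and input format are hypothetical, and the example numbers (9 CPUs and 18 GB shared by user A with <1 CPU, 4 GB> tasks and user B with <3 CPUs, 1 GB> tasks) are illustrative.

```python
def drf_allocate(capacity, demands):
    """capacity: total amount of each resource, e.g. [9, 18] for <CPUs, GB>.
    demands: {user: per-task demand vector}. Returns the number of tasks per user."""
    used = [0.0] * len(capacity)
    tasks = {u: 0 for u in demands}
    dominant = {u: 0.0 for u in demands}  # current dominant share of each user
    active = set(demands)                 # users whose next task may still fit
    while active:
        # Pick the user with the lowest dominant share (ties broken by name).
        user = min(active, key=lambda u: (dominant[u], u))
        d = demands[user]
        if all(used[i] + d[i] <= capacity[i] for i in range(len(capacity))):
            for i in range(len(capacity)):
                used[i] += d[i]
            tasks[user] += 1
            dominant[user] = max(tasks[user] * d[i] / capacity[i] for i in range(len(capacity)))
        else:
            active.remove(user)  # this user's next task no longer fits
    return tasks


print(drf_allocate([9, 18], {"A": [1, 4], "B": [3, 1]}))  # {'A': 3, 'B': 2}
```

A’s dominant resource is memory and B’s is CPU; the loop ends with A holding 3 tasks (3 CPUs, 12 GB) and B holding 2 tasks (6 CPUs, 2 GB), so both have a dominant share of 2/3.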

DRF for Non-linear Resources/Demands

- DRF assumes resources & demands are additive
  - E.g., if task 1 needs (1 CPU, 1 GB) and task 2 needs (1 CPU, 3 GB), both tasks together need (2 CPU, 4 GB)
- Sometimes demands are non-linear
  - E.g., shared memory
- Sometimes resources are non-linear
  - E.g., disk throughput, caches

Research challenge:
- A DRF-like scheduler for non-linear resources & demands (could be two projects here!)
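
The additivity assumption, and one way it breaks, in a few lines of Python. The shared-memory case is a hypothetical illustration: if both tasks also map the same 2 GB read-only region, summing per-task footprints counts that region twice.

```python
def additive_total(tasks):
    """DRF's assumption: the demand of a set of tasks is the element-wise sum."""
    return tuple(sum(t[i] for t in tasks) for i in range(len(tasks[0])))


def total_with_shared_memory(tasks, shared_gb):
    """Hypothetical non-linear demand: tasks also map a shared region of shared_gb,
    which occupies memory once no matter how many tasks use it."""
    cpus, private_gb = additive_total(tasks)
    return (cpus, private_gb + shared_gb)


task1, task2 = (1, 1), (1, 3)                   # (CPUs, GB) from the slide
print(additive_total([task1, task2]))            # (2, 4)
print(additive_total([(1, 1 + 2), (1, 3 + 2)]))  # (2, 8): shared region summed twice
print(total_with_shared_memory([task1, task2], shared_gb=2))  # (2, 6)
```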

DRF for OSes

DRF was designed for clusters using the resource offer mechanism

Redesign DRF to support multi-core OSes

Research questions:
- Is the resource offer the best abstraction?
- How to best leverage preemption? (In Mesos, tasks are not preempted by default)
- How to support gang scheduling?

Storage & Data Processing

Resource Isolation for Storage Services

Share storage (e.g., a key-value store) between:
- Frontend, e.g., web services
- Backend, e.g., analytics on the freshest data

Research challenge:
- Isolation mechanism: protect front-end performance from the back-end workload
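
One candidate isolation mechanism, swapped in here as an assumption rather than taken from the slide, is to rate-limit back-end requests with a token bucket while front-end requests bypass it. The sketch only bounds the back-end request rate; it does not model disk or cache contention. `TokenBucket`, `admit`, and the rate and burst values are hypothetical.

```python
class TokenBucket:
    """Admission control for back-end requests: each admitted request spends one
    token; tokens refill at `rate` per second, up to `burst`."""

    def __init__(self, rate, burst):
        self.rate = rate
        self.burst = burst
        self.tokens = float(burst)
        self.last = 0.0

    def allow(self, now):
        # Refill for the time elapsed since the previous call, capped at the burst size.
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


def admit(request_class, backend_bucket, now):
    """Front-end requests are always admitted; back-end ones must pass the bucket."""
    return request_class == "frontend" or backend_bucket.allow(now)


bucket = TokenBucket(rate=100, burst=5)
decisions = [admit("backend", bucket, now=0.0) for _ in range(7)]
print(decisions)  # [True, True, True, True, True, False, False]
```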

“Quicksilver” DB

Goal: interactive queries with bounded error on “unbounded” data
- Trade between efficiency and accuracy
- Query response time target: < 100 ms

Approach: random pre-sampling across different dimensions (columns)

Research question: given a query and an error bound, find the
- Smallest sample to compute the result
- Sample minimizing disk (or memory) access times

(Talk with Sameer, if interested)
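
For AVG-style aggregates over a uniform random sample, the central limit theorem gives a simple answer to the “smallest sample” question: n ≥ (z·σ/ε)², where σ is the column’s standard deviation, ε the error bound, and z the normal critical value for the chosen confidence. The sketch assumes σ is known (e.g., estimated from a pilot sample) and ignores the per-column samples the slide proposes; the function names are hypothetical.

```python
import math
from statistics import NormalDist


def z_value(confidence):
    """Two-sided critical value of the standard normal, e.g. about 1.96 for 95%."""
    return NormalDist().inv_cdf(0.5 + confidence / 2)


def min_sample_size(sigma, error_bound, confidence=0.95):
    """Smallest n such that the CLT confidence interval for a mean has half-width <= error_bound."""
    return math.ceil((z_value(confidence) * sigma / error_bound) ** 2)


def estimate_avg(sample, confidence=0.95):
    """AVG over a uniform random sample: returns (estimate, error bound)."""
    n = len(sample)
    mean = sum(sample) / n
    var = sum((x - mean) ** 2 for x in sample) / (n - 1)
    return mean, z_value(confidence) * math.sqrt(var / n)


print(min_sample_size(sigma=20, error_bound=1))  # 1537
```

Note that n does not depend on the table size (ignoring the finite-population correction), which is what makes bounded-error answers on “unbounded” data plausible.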

Split-Privacy DB (1/2)

Partition data & computation:
- Private
- Public (stored on cloud)

Goal: use the cloud without revealing the computation result

Example: operation f(x, y) = x + y, where x is private and y is public
- Pick a random number a, and compute x' = x + a
- Compute f(x', y) = r' = x' + y
- Recover the result: r = r' - a = (x' - a) + y = x + y

[Figure: Private DB with f_private and Public DB with f_public, combining into the result]
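
The masking example above, as runnable Python. Doing the arithmetic modulo 2^64 with a fresh uniform mask per value is an assumption added here so that x' = x + a is itself uniformly distributed and reveals nothing about x; the function names are hypothetical, and the scheme covers only this additive example.

```python
import secrets

MOD = 2 ** 64  # assumption: values are integers, and all arithmetic is done modulo 2**64


def private_mask(x):
    """Private side: hide x behind a fresh uniformly random mask a."""
    a = secrets.randbelow(MOD)
    return (x + a) % MOD, a


def public_compute(x_masked, y):
    """Public (cloud) side: computes r' = x' + y without ever seeing x or a."""
    return (x_masked + y) % MOD


def private_unmask(r_masked, a):
    """Private side: r = r' - a = x + y."""
    return (r_masked - a) % MOD


x, y = 42, 1000  # x is private, y is public
x_masked, a = private_mask(x)
r = private_unmask(public_compute(x_masked, y), a)
print(r)  # 1042
```

The recovered value equals x + y as long as the true sum is below 2^64.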

Split-Privacy DB (2/2)

Partition data & computation:
- Private
- Public (stored on cloud)

Example: patient data (private), public clinical and genomics data sets

Goal: use the cloud without revealing the computation result

Research questions:
- What types of computation can be implemented?
- Is this any more powerful than privacy-preserving computation / data mining?

[Figure: Private DB with f_private and Public DB with f_public, combining into the result]

RDDs as an OS Abstraction

Resilient Distributed Datasets (RDDs)
- Fault-tolerant (in-memory) parallel data structures
- Allow Spark apps to efficiently reuse data

Design cross-application RDDs. Research questions:
- RDD reconstruction (track software and platform changes)
- Enable users to share intermediate results of queries (identify when two apps compute the same RDD)
- RDD cluster-wide caching
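
A toy illustration of lineage-based reconstruction: each derived dataset records its parent and the per-partition transformation, so a lost cached partition is recomputed from its parent instead of being restored from a replica. `LineageRDD` is a hypothetical stand-in, not Spark’s RDD API.

```python
class LineageRDD:
    """Toy dataset: partitions are read from stable storage (base datasets) or
    computed from the parent's partition via the recorded transformation (lineage)."""

    def __init__(self, parent=None, fn=None, source_partitions=None):
        self.parent = parent
        self.fn = fn                     # per-partition transformation
        self.source = source_partitions  # only set for base (input) datasets
        self.cache = {}                  # in-memory partitions; may be lost

    def map(self, f):
        return LineageRDD(parent=self, fn=lambda part: [f(x) for x in part])

    def filter(self, pred):
        return LineageRDD(parent=self, fn=lambda part: [x for x in part if pred(x)])

    def num_partitions(self):
        if self.source is not None:
            return len(self.source)
        return self.parent.num_partitions()

    def partition(self, i):
        if self.source is not None:      # base dataset: read from stable storage
            return self.source[i]
        if i not in self.cache:          # lost or never computed: rebuild via lineage
            self.cache[i] = self.fn(self.parent.partition(i))
        return self.cache[i]

    def collect(self):
        return [x for i in range(self.num_partitions()) for x in self.partition(i)]


base = LineageRDD(source_partitions=[[1, 2, 3], [4, 5, 6]])
squares_of_evens = base.filter(lambda x: x % 2 == 0).map(lambda x: x * x)
first = squares_of_evens.collect()
squares_of_evens.cache.clear()           # simulate losing the in-memory partitions
rebuilt = squares_of_evens.collect()     # recomputed from the lineage
print(first, rebuilt)  # [4, 16, 36] [4, 16, 36]
```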

Provenance-based Efficient Storage (Peter B and Patrick W)

Reduce storage by deleting data that can be recreated
- Generalization of the previous project

Research challenges:
- Identify data that can be deterministically recreated, and the code to do so
  - Use hints?
- Tradeoff between re-creation and storage
  - May take into account access patterns, frequency, performance
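
A deliberately simple version of the re-creation vs. storage tradeoff, assuming costs can be expressed in one unit: keep anything that cannot be deterministically recreated, and keep reproducible data only if a month of storage costs no more than re-creating it on each expected access. The `Dataset` fields, prices, and dataset names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Dataset:
    name: str
    size_gb: float
    reproducible: bool         # can it be recreated deterministically from inputs + code?
    recompute_cost: float      # hypothetical cost of one re-creation (same units as storage cost)
    accesses_per_month: float  # expected access frequency


def should_store(ds, storage_cost_per_gb_month):
    """Keep data that cannot be recreated; otherwise keep it only if storing it for a
    month is no more expensive than re-creating it on every expected access."""
    if not ds.reproducible:
        return True
    storage = ds.size_gb * storage_cost_per_gb_month
    recreate = ds.accesses_per_month * ds.recompute_cost
    return storage <= recreate


datasets = [
    Dataset("raw-logs", 500, reproducible=False, recompute_cost=0.0, accesses_per_month=4),
    Dataset("daily-summary", 2, reproducible=True, recompute_cost=5.0, accesses_per_month=30),
    Dataset("stale-intermediate", 800, reproducible=True, recompute_cost=3.0, accesses_per_month=0.5),
]
decisions = {d.name: should_store(d, storage_cost_per_gb_month=0.10) for d in datasets}
print(decisions)  # {'raw-logs': True, 'daily-summary': True, 'stale-intermediate': False}
```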

Very-low Latency Streaming

Challenges: stragglers, failures

Approaches to reduce latency:
- Redundant computations
- Speculative execution

Research questions:
- Theoretical trade-off between response time and accuracy?
- Achieve target latency and accuracy while minimizing the overhead
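
A small simulation of the redundant-computation approach under an assumed latency model (exponential with a 10 ms mean, plus a 100 ms stall on 5% of runs): launching k copies of a task and keeping the first result cuts tail latency, at the cost of k - 1 extra copies of the work. The model and all parameters are hypothetical, and the sketch does not capture the accuracy side of the tradeoff.

```python
import random
import statistics


def task_latency(rng):
    """Hypothetical latency model in ms: exponential with mean 10, plus a 100 ms
    stall on 5% of runs (a straggler)."""
    latency = rng.expovariate(1 / 10)
    if rng.random() < 0.05:
        latency += 100
    return latency


def latency_with_replicas(rng, k):
    """Run k redundant copies of the same task and keep the first to finish."""
    return min(task_latency(rng) for _ in range(k))


def p99(samples):
    return statistics.quantiles(samples, n=100)[98]


rng = random.Random(0)
results = {}
for k in (1, 2, 3):
    samples = [latency_with_replicas(rng, k) for _ in range(20_000)]
    results[k] = p99(samples)
    print(f"replicas={k}  p99={results[k]:.1f} ms  extra work={k - 1}x")
```

With one copy the 99th percentile is dominated by stragglers; with two, both copies must straggle, which in this model happens for only 0.25% of tasks.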