
Page 1: AMP Camp 5 Intro

Welcome and AMPLab Overview

UC BERKELEY

Michael Franklin

November 20, 2014

Page 2

Page 3

Page 4

AMPLab Overview

Project Launched Jan 2011, 6 Yr Planned Duration

Personnel: ~65 Students, Postdocs, Faculty and Staff

Funding: Government/Industry Partnership (NSF Expeditions Award, DARPA XData, DoE, 20+ Companies)

Key Outputs:
• BDAS Open Source Stack & Apps (including Apache Spark)
• Publications: Top Venues in ML, Systems, Databases and Others
• Graduates in High Demand in Academia and Industry

“… the University of California, Berkeley’s AMPLab has already left an indelible mark on [the] world of information technology, and even the web. But we haven’t yet experienced the full impact of the group, … Not even close.”

-- Derrick Harris, GigaOm, August 2014

Page 5

The AMPLab Faculty

Michael Franklin (Databases)

Michael Jordan (Machine Learning)

Ion Stoica (Systems)

Dave Patterson (Systems)

Scott Shenker (Networks)

Alex Bayen (Mobile Sensing)

David Culler (Systems/Sensing)

Ken Goldberg (Crowdsourcing)

Anthony Joseph (Security)

Randy Katz (Systems)

Michael Mahoney (ML)

Ben Recht (Machine Learning)

Raluca Popa (Systems/Security), joining in Summer 2015

Page 6

Industrial Engagement

• Industrial-Strength Open Source Software
• Used by Sponsors, Start-ups and Many Others
• Regular interactions with top industry technologists: twice-yearly 3-day offsite retreats, AMPCamp training, and some site visits

Page 7

AMP: Integrating 3 Key Resources

Algorithms

• Machine Learning, Statistical Methods

• Prediction, Business Intelligence

Machines

• Clusters and Clouds

• Warehouse Scale Computing

People

• Crowdsourcing, Human Computation

• Data Scientists, Analysts

Page 8

Our View of the Big Data Challenge

[Figure: massive, diverse and growing data, with tradeoffs among time, money and answer quality]

Step I: Improve efficiency (e.g., Spark, Tachyon)

Step II: Enable intelligent tradeoffs (e.g., BlinkDB, SampleClean)

Page 9

The Research Challenge

[…] + […] + Integration + Extreme Elasticity + Tradeoffs + More Sophisticated Analytics = Extreme Complexity

Page 10

Arc of Our Research Program

Early Work on Foundations (Yrs 1-2):

Algorithms – Bag of Little Bootstraps

Machines – Mesos and Spark

People – CrowdDB Prototype

Filling out the Analytics Stack (Yrs 3-4): <you are here>

Algorithms – ML Pipelines, Async Algorithms, Concurrency Ctl

Machines – Tachyon, SQL, Graphs, Streams, R, Performance

People – Hybrid Human/Machine Data Cleaning/Integration

Moving Up the Stack/Expanding the Footprint (Yrs 5-6):

Algorithms – MLlib build out, Declarative ML (MLBase)

Machines – New Storage/Processing Archs, Data/Model Serving

Page 11

Big Data Ecosystem Evolution

General batch processing: MapReduce

Specialized systems (iterative, interactive and streaming apps): Pregel, Dremel, GraphLab, Storm, Giraph, Drill, Tez, Impala, S4, …

Page 12

AMPLab Unification Philosophy

Don’t specialize MapReduce – Generalize it!

Two additions to Hadoop MR can enable all the models shown earlier!

1. General Task DAGs
2. Data Sharing

For Users:
• Fewer Systems to Use
• Less Data Movement

[Figure: Spark Streaming, GraphX, Spark SQL, MLbase, … built on top of Spark]
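The two additions named on this slide can be illustrated with a toy scheduler. The sketch below is a hypothetical Python illustration under stated assumptions, not Spark's API or the AMPLab implementation: `Task` and `DagRunner` are invented names, a task may depend on any number of upstream tasks (a general DAG rather than a fixed map-then-reduce pair), and a task marked `cache=True` keeps its output in memory so later jobs reuse it instead of recomputing it (data sharing).

```python
from typing import Any, Callable, Dict, List, Sequence


class Task:
    """One node in a general task DAG (hypothetical toy class, not Spark's API)."""

    def __init__(self, name: str, fn: Callable[..., Any],
                 deps: Sequence["Task"] = (), cache: bool = False) -> None:
        self.name = name
        self.fn = fn              # computes this task's output from its deps' outputs
        self.deps = list(deps)    # any number of upstream tasks, not just map -> reduce
        self.cache = cache        # keep the output in memory for later jobs


class DagRunner:
    """Runs a task by recursively evaluating its dependencies first."""

    def __init__(self) -> None:
        self.shared: Dict[str, Any] = {}         # in-memory store for cached outputs
        self.compute_count: Dict[str, int] = {}  # how often each task actually ran

    def run(self, task: Task) -> Any:
        # Data sharing: reuse a cached result instead of recomputing it.
        if task.cache and task.name in self.shared:
            return self.shared[task.name]
        inputs: List[Any] = [self.run(dep) for dep in task.deps]
        self.compute_count[task.name] = self.compute_count.get(task.name, 0) + 1
        result = task.fn(*inputs)
        if task.cache:
            self.shared[task.name] = result
        return result


# Two separate jobs share one parsed, cached input.
load = Task("load", lambda: ["1", "2", "3", "4"])
parse = Task("parse", lambda lines: [int(x) for x in lines], deps=[load], cache=True)
total = Task("total", sum, deps=[parse])
evens = Task("evens", lambda xs: [x for x in xs if x % 2 == 0], deps=[parse])

runner = DagRunner()
print(runner.run(total))     # 10
print(runner.run(evens))     # [2, 4]
print(runner.compute_count)  # {'load': 1, 'parse': 1, 'total': 1, 'evens': 1}
```

Without `cache=True` on `parse`, the second job would re-run `load` and `parse`; avoiding that repeated work across jobs is what data sharing buys in this toy model.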

Page 13

Berkeley Data Analytics Stack (open source software)

In-house Apps: Cancer Genomics, Energy Debugging, Smart Buildings
Access and Interfaces: Velox Model Serving, Spark Streaming, Shark, BlinkDB, GraphX, MLlib, MLBase, SampleClean, SparkR
Processing Engine: Spark
Storage: Tachyon; HDFS, S3, …
Resource Virtualization: Mesos, Yarn

Page 14

[Figure: the Berkeley Data Analytics Stack repeated, with Spark SQL shown next to Shark and two components labeled "Apache"]

Page 15

Some Academic Accolades

• Ph.D. and postdoc alumni from 2013/14 (pictured) have accepted faculty jobs at: Brown, Harvey Mudd, MIT (3), Stanford, UCLA, UT Austin
• Best Paper Awards: BPOE 14, EuroSys 13, ICDE 13, NSDI 12, SIGCOMM 12
• Best Demo Awards: SIGMOD 12, VLDB 11
• CACM “Research Highlight” Selections 2014 and 2015

Page 16

About AMPCamp

Today
• BDAS and Stack Component Overviews
• Hands-On Exercises
• Use Cases
• Reception and Networking

Tomorrow
• Research and ML Overviews
• Advanced Hands-On Exercises (including genomics)

History
AMPCamp I @ Berkeley, August 2012
AMPCamp II @ Strata NYC, Feb 2013
AMPCamp III @ Berkeley, August 2013
AMPCamp IV @ Strata Santa Clara, Feb 2014
AMPCamp V @ Berkeley, Nov 2014

Also “Spark Camp”: AMPCamp Spinoff

Page 17

AMPCamp Made Possible By

Rachit Agarwal

Elaine Angelino

Peter Bailis

Dan Crankshaw

Ankur Dave

Joseph Gonzalez

Daniel Haas

Sanjay Krishnan

Haoyuan Li

Frank Austin Nothaft

Xinghao Pan

Pedro Rodriguez

Ginger Smith

Evan Sparks

Shivaram Venkataraman

Jiannan Wang

Zongheng Yang

Ameet Talwalkar

Jey Kottalam

Kattt Atchley

Carlyn Chinen

Boban Zarkovich

Jon Kuroda

Page 18

To find out more or get involved:

amplab.berkeley.edu

[email protected]


Thanks to NSF CISE Expeditions in Computing, DARPA XData, Founding Sponsors Amazon Web Services, Google, and SAP, the Thomas and Stacy Siebel Foundation, and all our industrial sponsors and partners.