A High-Assurance Cloud Computing Agenda
Ken Birman
Professor, Dept. of Computer Science


Context

Today’s cloud computing platforms are best for building “apps” like YouTube and web search
  Highly elastic, pipelined (“asynchronous”) services
  But very weak guarantees and limited security

The cloud comes with its own mantra:
  Don’t use ACID!
  BASE is better…
  CAP theorem proves it (or does it?)

Cornell Dept of Computer Science Colloquium 3

The Wisdom of the Sages

Sept 24, 2009


eBay’s Five Commandments

As described by Randy Shoup at LADIS 2008

Thou shalt…
1. Partition Everything
2. Use Asynchrony Everywhere
3. Automate Everything
4. Remember: Everything Fails
5. Embrace Inconsistency


Vogels at the Helm

Werner Vogels is CTO at Amazon.com…
His first act?
  Introduced a series of weak consistency options
  Replaced the older strongly consistent “pub/sub” infrastructure with a slower but more scalable one

In small systems, raw speed wins
In the cloud, weaker forms of guarantees often scale far better than strong ones


Our Perspective?

We’re being too quick to give up on consistency and other assurance properties
  CAP, BASE are really about database consistency
  Other very strong forms of consistency can be the foundation for a new science of highly assured, high speed, scalable cloud computing

We have the science to back our vision
  The new Isis2 system makes it real

Highly Assured Cloud: Isis2

Named for an old Cornell story
  In 1990 our first Isis Toolkit became the core of the NYSE, French Air Traffic Control System and US Navy AEGIS

Isis2: A completely new system but same idea
  Makes it easy to create high-assurance cloud apps
  Offers consistency, fault-tolerance, security

FreeBSD code release later this spring


Virtual synchrony meets Paxos (and they live happily ever after…)

Virtual synchrony is a “consistency” model:
  Synchronous runs: indistinguishable from a non-replicated object that saw the same updates (like Paxos)
  Virtually synchronous runs are indistinguishable from synchronous runs

[Figure: timelines for processes p, q, r, s, t over Time 0 to 70, shown side by side as "Synchronous execution" and "Virtually synchronous execution", next to a "Non-replicated reference execution" with the updates A=3, B=7, B = B-A, A=A+1]
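
To make the model concrete, here is a minimal Python sketch (not Isis2 code) of the idea behind the figure: if every replica applies the same updates in the same order, each replica ends in the same state as a single non-replicated object. The update sequence A=3, B=7, B = B-A, A=A+1 comes from the slide; the function names and the dictionary representation of state are assumptions made for illustration.

```python
# Illustrative simulation only; names such as apply_updates are hypothetical.

def apply_updates(updates):
    """Apply a sequence of updates to a fresh state and return the final state."""
    state = {}
    for update in updates:
        update(state)
    return state

def set_a(s): s["A"] = 3
def set_b(s): s["B"] = 7
def sub_b(s): s["B"] = s["B"] - s["A"]
def inc_a(s): s["A"] = s["A"] + 1

# The updates from the slide, in delivery order.
UPDATES = [set_a, set_b, sub_b, inc_a]

# Reference execution: one non-replicated object.
reference = apply_updates(UPDATES)

# Replicated execution: five replicas (p, q, r, s, t) each receive the
# same updates in the same order, though possibly at different real times.
replicas = {name: apply_updates(UPDATES) for name in "pqrst"}

print(reference)
print(all(state == reference for state in replicas.values()))
```

Order matters here: swapping the last two updates would leave B at 3 instead of 4, which is why the model insists that all replicas see the same update order.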

Example: Parallel search

Group g = new Group("/amazon/something");
g.register(LOOKUP, myLookup);

Replies = g.query(LOOKUP, "Name=*Smith");
g.callback(myReplyHndlr, Replies, typeof(double));

public void myReplyHndlr(double[] fnd) {
    foreach (double d in fnd)
        avg += d;
    …
}

public void myLookup(string who) {
    // divide work into viewSize() chunks
    // this replica will search chunk # getMyRank()
    …..
    reply(myAnswer);
}
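
The work-division step in myLookup can be illustrated with a small Python simulation. This is a sketch under stated assumptions, not the Isis2 API: a loop over ranks stands in for the multicast, `chunk_bounds`, `my_lookup` and `query` are hypothetical names, the sample database is invented, and the pattern "Name=*Smith" is modeled as a simple suffix match. The slide's handler receives doubles; names are returned here only to keep the example readable.

```python
# Illustrative simulation only: a loop over ranks stands in for the group.

def chunk_bounds(n_items, view_size, rank):
    """Return the [start, end) slice that the replica with this rank searches."""
    start = rank * n_items // view_size
    end = (rank + 1) * n_items // view_size
    return start, end

def my_lookup(database, view_size, rank, suffix):
    """One replica's share of the query: scan only its own chunk."""
    start, end = chunk_bounds(len(database), view_size, rank)
    return [name for name in database[start:end] if name.endswith(suffix)]

def query(database, view_size, suffix):
    """Simulate the multicast: every rank runs my_lookup; replies are collected."""
    return [my_lookup(database, view_size, rank, suffix)
            for rank in range(view_size)]

# Invented sample data for the sketch.
database = ["Ann Smith", "Bob Jones", "Cy Smith", "Di Lee",
            "Ed Park", "Flo Smith", "Gus Ray"]

replies = query(database, view_size=3, suffix="Smith")
matches = [name for reply in replies for name in reply]
print(replies)
print(matches)
```

Because the chunk boundaries depend only on the view size and the replica's rank, every replica can compute its own share without coordination, provided all replicas agree on the current view.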

Scalable Aggregation

Used if group is really big
  Request, updates: still via multicast
  Response is aggregated within a tree

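[Figure: aggregation tree with Level 0, Level 1 and Level 2. Nodes a, b, c, d contribute values va, vb, vc, vd; the partial aggregates Agg(va vb) and Agg(vc vd) are combined into Agg(va vb vc vd). Arrows are labeled "query" and "reply".]

Example: nodes {a,b,c,d} collaborate to perform a query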
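
A minimal Python sketch of tree aggregation for the four-node example follows. It assumes a binary tree, numeric values and a sum as the aggregation function; none of these choices is stated in the slides, and `aggregate_tree` is a hypothetical helper, not an Isis2 call.

```python
def aggregate_tree(values, agg):
    """Combine values pairwise, level by level, until one result remains.

    Returns the list of levels so the intermediate aggregates can be inspected.
    """
    levels = [list(values)]
    while len(levels[-1]) > 1:
        current = levels[-1]
        nxt = []
        for i in range(0, len(current), 2):
            pair = current[i:i + 2]
            # An unpaired node at the end of a level passes its value up unchanged.
            nxt.append(agg(pair[0], pair[1]) if len(pair) == 2 else pair[0])
        levels.append(nxt)
    return levels

# Invented values for va, vb, vc, vd.
values = {"a": 4, "b": 9, "c": 1, "d": 6}
levels = aggregate_tree(values.values(), lambda x, y: x + y)
for depth, level in enumerate(levels):
    print(f"Level {depth}: {level}")
```

With n members the querier receives one combined reply instead of n, and the number of aggregation levels grows only logarithmically with group size. The aggregation function should be associative so that the grouping imposed by the tree does not change the result.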

Aggregated Parallel search

Group g = new Group("/amazon/something");
g.register(LOOKUP, myLookup);

Replies = g.query(LOOKUP, 27, "Name=*Smith");
g.callback(myReplyHndlr, Replies, typeof(double));

public void myReplyHndlr(double[] fnd) {
    // The answer is in fnd[0]
    ….
}

public void myLookup(int rid, string who) {
    // divide work into viewSize() chunks
    // this replica will search chunk # getMyRank()
    …..
    SetAggregateValue(myAnswer);
}

Rval = GetAggregateResult(27);
Reply(Rval/DatabaseSize);

Our Early Users?

Partnering with Cisco to apply these ideas in core Internet routers (NEBULA/R3 projects)
  Creating a continuously available CRS-1 story

Close dialogs with Microsoft, IBM, Intel

Funding from National Science Foundation, Air Force; talking to DARPA and ARPA-E
  Government, military and smart power grid will all need highly assured cloud options

Challenge of the week

Debugging a system that targets thousands of nodes with tens of cores each is hard!
  We benefit from our own strong model
  But physical access to non-virtualized large-scale systems is “difficult” today
  And many block IPMC and UDP

Better tools will need to be part of a better assurance property
  Else we know how it should work but not how it does work, or even whether it works correctly!

Summary?

The word on the street is that cloud computing will rule but that the cloud can’t do high assurance

At Cornell we just don’t believe that

Not long from now we’ll put a solution in your hands showing how it can be done