Navigating in the Storm
A Distributed Computing Infrastructure for Autonomic Computing

Ken Birman

Professor, Dept. of Computer Science, Cornell University

CEO, Reliable Network Solutions (www.rnets.com)

April 12, 2002 IBM Almaden Institute (Autonomic Computing) 2

An autonomic computing scenario

Where can I find a display device suitable for displaying these NMR test results?
Are any specialists in epilepsy available to consult on this ER admission?
Where can I find a dozen idle computers with copies of the NMR-3D package and to which I can download this 20MB dataset rapidly?
Is there a free office where I could use a computer and a phone?

Hypothesis

Key challenge is scalable configuration
  How to find resources?
  Need an information framework

A structured world
  Reflects physical, security, role constraints
  Tracks changes as things move and evolve

You talk to the framework through a local portal: a nearby agent

Hypothesis refined

What makes it hard to build such a framework?

Fundamental challenge is data replication

Must replicate data at many scales
  Small scale for high availability
  Medium scale for management of server resources (like SP cluster management but larger)
  Very large scale for our “resource finder” service!

This leads to a dilemma

Decades of work on replication has yielded
  Good solutions for small-scale replication
    For example, the CORBA fault-tolerance standard, to replicate a software object
  Reasonable solutions for medium-scale replication, e.g. in cluster management
    Examples are SP, NT-Clusters, various web clustering architectures

But nothing works at “Internet scale”!

Poor Scalability

Long “rumored” for distributed computing technologies and tools
Famous study by Jim Gray points to scalability issues in distributed databases
Things that scale well:
  Tend to be stateless or based on soft state
  Have weak reliability semantics
  Are loosely coupled

But this mixture is at odds with autonomy…

Why doesn’t anything scale?

With weak semantics…
  Faulty behavior may occur more often as system size increases (think “the Internet”)

With strong semantics…
  Encounter a system-wide cost (e.g. membership reconfiguration, congestion control)
  That can be triggered more often as a function of scale (more failures, or more network “events”, or bigger latencies)

Astrolabe

Astrolabe is our information monitoring and replication architecture
It has two components
  Mariner: a form of database
  Multicast: for faster “few to many” data transfer patterns

Astrolabe fights fire with fire!

Randomized protocols for scalability…
… with probabilistic reliability goals
This overcomes the kind of scalability problems just seen

Then think about hierarchy
  Nearby information needs to be tracked more accurately in real-time
  Remote information can be summarized

First focus on Mariner

Mariner’s role is to track information residing at a vast number of sources
Structured to look like a database
Approach: “peer to peer gossip”. Basically, each machine has a piece of a jigsaw puzzle. Assemble it on the fly.

Mariner in a single domain

Name      Load  Weblogic?  SMTP?  Word Version  …
swift     2.0   0          1      6.2
falcon    1.5   1          0      4.1
cardinal  4.5   1          0      6.0

Row can have many columns
Total size should be k-bytes, not megabytes
Configuration certificate determines what data is pulled into the table (and can change)

[Figure: additional values shown in the slide graphic: 3.1, 5.3, 0.9, 1.9, 3.6, 0.8, 2.1, 2.7, 1.1, 1.8]

Build a hierarchy using a P2P protocol that “assembles the puzzle” without any servers

San Francisco

Name      Load  Weblogic?  SMTP?  Word Version  …
swift     2.0   0          1      6.2
falcon    1.5   1          0      4.1
cardinal  4.5   1          0      6.0

New Jersey

Name      Load  Weblogic?  SMTP?  Word Version  …
gazelle   1.7   0          0      4.5
zebra     3.2   0          1      6.2
gnu       0.5   1          0      6.2

Summary

Name   AvgLoad  WL contact    SMTP contact
SF     2.6      123.45.61.3   123.45.61.17
NJ     1.8      127.16.77.6   127.16.77.11
Paris  3.1      14.66.71.8    14.66.71.12

SQL query “summarizes” data

Dynamically changing query output is visible system-wide
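The summarizing step can be sketched in a few lines of Python. This is only an illustration, not Mariner's actual query mechanism: the `summarize` function, its output field names, and the rule "pick the least-loaded host that runs the service" are assumptions. It uses the load and service flags from the San Francisco and New Jersey tables, and it reports host names as contacts because the leaf tables on the slide carry no IP addresses.

```python
from statistics import mean

# Leaf-zone rows as read from the slide: (name, load, weblogic, smtp).
ZONES = {
    "SF": [("swift", 2.0, 0, 1), ("falcon", 1.5, 1, 0), ("cardinal", 4.5, 1, 0)],
    "NJ": [("gazelle", 1.7, 0, 0), ("zebra", 3.2, 0, 1), ("gnu", 0.5, 1, 0)],
}

def summarize(rows):
    """Collapse one zone's rows into a single summary row (hypothetical schema)."""
    def least_loaded(flag_index):
        # Assumed policy: the contact is the least-loaded host running the service.
        candidates = [r for r in rows if r[flag_index] == 1]
        return min(candidates, key=lambda r: r[1])[0] if candidates else None

    return {
        "avg_load": round(mean(r[1] for r in rows), 2),
        "wl_contact": least_loaded(2),
        "smtp_contact": least_loaded(3),
    }

# One summary row per zone, analogous to the top-level table on the slide.
summary = {zone: summarize(rows) for zone, rows in ZONES.items()}
```

With these inputs the New Jersey average matches the slide's 1.8. The San Francisco average comes out slightly above the slide's 2.6, which suggests the slide value was truncated rather than rounded.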

(1) Query goes out… (2) Compute locally… (3) results flow to top level of the hierarchy

[Figure: the San Francisco, New Jersey, and summary tables from the preceding slide, annotated with the step numbers 1, 2, and 3.]

Hierarchy is virtual… data is replicated

[Figure: the San Francisco, New Jersey, and summary tables from the preceding slides.]

Mariner solves our problem!

A flexible, user-programmable mechanism
  Where can I find a display device suitable for displaying these NMR test results?
  Are any specialists in epilepsy available to consult on this ER admission?
  Find 12 idle computers with copies of the NMR-3D package to which I can download a 20MB dataset rapidly
  Is there a free office where I could use a computer and a phone?

Think of aggregation functions as small agents that look for information

Aggregation and Hierarchy

Nearby information
  Maintained in more detail, can query it directly
  Changes seen sooner

Remote information summarized
  High quality aggregated data
  This also changes as information evolves

So how does it work?

Each computer has its own row and replicas of some objects (queries, other rows, etc.)
Periodically, at a fixed rate, pick a friend “pseudo-randomly” and exchange states efficiently (bound the size of data exchanged)
  States converge exponentially rapidly.
  Loads are low and constant, and the protocol is robust against all sorts of disruptions!
Elect representatives to run the protocol for higher levels of the hierarchy
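A toy simulation can make the "pick a friend and exchange states" step concrete. It is a sketch under simplifying assumptions, not the Mariner protocol: the function name `gossip_rounds` is hypothetical, rounds are synchronized, every row carries a single version number, and the exchange is not size-bounded as the slide says the real one is.

```python
import random

def gossip_rounds(n, seed=0):
    """Count rounds until every node holds a replica of every node's row."""
    rng = random.Random(seed)
    # Each node starts knowing only its own row: {owner: version}.
    state = [{i: 1} for i in range(n)]
    rounds = 0
    while any(len(s) < n for s in state):
        rounds += 1
        for i in range(n):
            # Pick a peer other than i, uniformly at random.
            j = rng.randrange(n - 1)
            if j >= i:
                j += 1
            # Exchange states: both sides keep the newest version of each row.
            merged = dict(state[i])
            for owner, version in state[j].items():
                if version > merged.get(owner, 0):
                    merged[owner] = version
            state[i] = dict(merged)
            state[j] = dict(merged)
    return rounds
```

Calling `gossip_rounds(64)` and then `gossip_rounds(1024)` should show the round count growing far more slowly than the number of machines, which is the behavior the slide describes as exponentially rapid convergence.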

Mariner summary

Scalable: could support millions of machines
Flexible: can easily extend domain hierarchy, define new columns or eliminate old ones. Adapts as conditions evolve.
Secure:
  Uses keys for authentication and can even encrypt
  Handles firewalls gracefully, including issues of IP address re-use behind firewalls
Performs well: updates propagate in seconds
Cheap to run: tiny load, small memory impact

Contrast with most P2P schemes

Our peer-to-peer approach is implemented using pseudo-random gossip
In contrast, most peer-to-peer architectures
  Are specifically intended to support file systems
  Don’t use pseudo-random P2P patterns
  Any hierarchical structure is “real”; ours is an abstraction constructed by the protocol itself

Unlimited scalability!

Probabilistic gossip “routes around” congestion
And probabilistic reliability model lets the system move on if a computer lags behind
Results in:
  Constant communication costs
  Constant loads on links
  Steady behavior even under stress

Bimodal Multicast

A second part of Astrolabe
  Mariner is for “all to all” information
  Bimodal multicast is a peer-to-peer scalable protocol for “few to many” patterns
We pick the destinations using a Mariner aggregation function!
  Then can stream data to these recipients
Protocol scales with constant costs and works even under extreme stress!

Server cluster

User’s computer gets information locally

When resource information changes, Mariner identifies servers to inform

log

Bimodal Multicast

[Figure: the San Francisco, New Jersey, and virtual “summary” tables from the earlier Mariner slides, linked by SQL queries.]

Mariner manages configuration and connection parameters, tracks system membership and state.

Combined Astrolabe technologies could be the basis of an autonomic configuration architecture

Adds up to a new opportunity

Autonomic computing
  … failed in the past because our applications operated in the dark
  … but now may be possible because for the first time, we can make system structure explicit and track it as it changes

With appropriate support, opens door to a new approach to computing

More information?

http://www.cs.cornell.edu/ken/astrolabe.pdf
http://www.cs.cornell.edu/ken/bimodal.pdf

http://www.rnets.com

EXTRA SLIDES

These are for answering questions

Reminder: Multicast scaling issue

[Figure: Virtually synchronous Ensemble multicast protocols. x-axis: perturb rate (0 to 0.9); y-axis: average throughput on nonperturbed members (0 to 250); one curve each for group sizes 32, 64, and 96.]

Start by using unreliable multicast to rapidly distribute the message. But some messages may not get through, and some processes may be faulty. So initial state involves partial distribution of multicast(s)

Uses IP multicast (unreliable) to distribute data

Periodic gossip used to repair message loss. (In reality, it isn’t synchronized, and we do all sorts of things to avoid excessive gossip over WAN links or slow connections)

Rounds of gossip repair gaps

Bimodal Multicast use of gossip

Dates back to NNTP, Clearinghouse. Best papers are by Demers et al.
Periodically, each process picks some other process and merges states
Mathematics: epidemic infection
Must tune to deal with bandwidth, latency issues, other pragmatics
  For example, if a region is missing some data, we re-multicast it locally
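The two phases described on these slides, a lossy multicast followed by rounds of gossip repair, can be sketched as a small simulation. Everything here is an assumption made for illustration: the function name `pbcast_sim`, the loss rate, the synchronized rounds, and the simplification that a gossip exchange hands the peer every message it lacks. The real protocol exchanges digests and tunes for bandwidth and latency, which this sketch ignores.

```python
import random

def pbcast_sim(n=50, n_msgs=20, loss=0.3, fanout=1, max_rounds=50, seed=0):
    """Return (messages missing after phase 1, gossip rounds used, all delivered?)."""
    rng = random.Random(seed)
    all_msgs = set(range(n_msgs))
    # Phase 1: unreliable multicast. Each process independently misses
    # each message with probability `loss`.
    have = [{m for m in all_msgs if rng.random() > loss} for _ in range(n)]
    have[0] = set(all_msgs)  # process 0 is the sender and holds everything
    missing_before = sum(n_msgs - len(h) for h in have)
    # Phase 2: rounds of gossip repair the gaps.
    rounds = 0
    while any(len(h) < n_msgs for h in have) and rounds < max_rounds:
        rounds += 1
        for i in range(n):
            for _ in range(fanout):
                # Pick a peer other than i, uniformly at random.
                j = rng.randrange(n - 1)
                if j >= i:
                    j += 1
                # The peer retrieves from i whatever it is missing.
                have[j] |= have[i]
    complete = all(len(h) == n_msgs for h in have)
    return missing_before, rounds, complete
```

Running `pbcast_sim()` with a higher `loss` or a larger `n` is a quick way to see how few repair rounds are needed relative to the number of gaps left by the first phase.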

Bimodal Multicast is amenable to formal analysis

Figure 5: Graphs of analytical results

[Panel: Pbcast bimodal delivery distribution. x-axis: number of processes to deliver pbcast (0 to 50); y-axis: p{#processes=k} (log scale, 1.E-30 to 1.E+00).]
[Panel: Scalability of Pbcast reliability. x-axis: #processes in system (10 to 60); y-axis: P{failure} (log scale, 1.E-35 to 1.E-05); series: Predicate I, Predicate II.]
[Panel: Effects of fanout on reliability. x-axis: fanout (1 to 10); y-axis: P{failure} (log scale, 1.E-16 to 1.E+00); series: Predicate I, Predicate II.]
[Panel: Fanout required for a specified reliability. x-axis: #processes in system (20 to 50); y-axis: fanout (4 to 9); series: Predicate I for 1E-8 reliability, Predicate II for 1E-12 reliability.]

High Bandwidth measurements with varying numbers of sleepers

[Figure: x-axis: probability of sleep event (0.05 to 0.95); y-axis: throughput measured at unperturbed process (0 to 200); series: Traditional w/1 sleeper, Pbcast w/1 sleeper, Traditional w/3 sleepers, Pbcast w/3 sleepers, Traditional w/5 sleepers, Pbcast w/5 sleepers.]

Good things?

These technologies overcome Internet limitations using randomized P2P gossip
  However, Internet routing can “defeat” our clever solutions unless we know network topology
Both have great scalability and can survive under stress
And both are backed by formal models as well as real code and experimental data
  Indeed, analysis is “robust” too!

Publications

Main paper on Mariner: http://www.cs.cornell.edu/ken/mariner.pdf

Main paper on Multicast: http://www.cs.cornell.edu/ken/bimodal.pdf

Full set of papers: http://www.cs.cornell.edu/Info/Projects/Spinglass/pubs