TRANSCRIPT
Navigating in the Storm: A Distributed Computing Infrastructure for Autonomic Computing
Ken Birman
Professor, Dept. of Computer Science, Cornell University
CEO, Reliable Network Solutions (www.rnets.com)
April 12, 2002, IBM Almaden Institute (Autonomic Computing)
An autonomic computing scenario
- Where can I find a display device suitable for displaying these NMR test results?
- Are any specialists in epilepsy available to consult on this ER admission?
- Where can I find a dozen idle computers with copies of the NMR-3D package, and to which I can download this 20MB dataset rapidly?
- Is there a free office where I could use a computer and a phone?
Hypothesis

Key challenge is scalable configuration:
- How to find resources?
- Need an information framework
A structured world:
- Reflects physical, security, and role constraints
- Tracks changes as things move and evolve
You talk to the framework through a local portal, a nearby agent
Hypothesis, refined

What makes it hard to build such a framework?
The fundamental challenge is data replication
Must replicate data at many scales:
- Small scale for high availability
- Medium scale for management of server resources (like SP cluster management, but larger)
- Very large scale for our "resource finder" service!
This leads to a dilemma

Decades of work on replication has yielded:
- Good solutions for small-scale replication; for example, the CORBA fault-tolerance standard, to replicate a software object
- Reasonable solutions for medium-scale replication, e.g. in cluster management; examples are SP, NT clusters, and various web clustering architectures
- But nothing works at "Internet scale"!
Poor scalability

Long "rumored" for distributed computing technologies and tools
A famous study by Jim Gray points to scalability issues in distributed databases
Things that scale well:
- Tend to be stateless or based on soft state
- Have weak reliability semantics
- Are loosely coupled
But this mixture is at odds with autonomy…
Why doesn't anything scale?

With weak semantics…
- Faulty behavior may occur more often as system size increases (think "the Internet")
With strong semantics…
- We encounter a system-wide cost (e.g. membership reconfiguration, congestion control)
- That cost can be triggered more often as a function of scale (more failures, more network "events", or bigger latencies)
Astrolabe

Astrolabe is our information monitoring and replication architecture. It has two components:
- Mariner: a form of database
- Multicast: for faster "few to many" data transfer patterns
Astrolabe fights fire with fire!

Randomized protocols for scalability…
…with probabilistic reliability goals
This overcomes the kind of scalability problems just seen
Then think about hierarchy:
- Nearby information needs to be tracked more accurately, in real time
- Remote information can be summarized
First focus on Mariner

Mariner's role is to track information residing at a vast number of sources
It is structured to look like a database
Approach: "peer-to-peer gossip". Basically, each machine has a piece of a jigsaw puzzle; assemble it on the fly.
Mariner in a single domain
Name     | Load | Weblogic? | SMTP? | Word Version | …
cardinal | 4.5  | 1         | 0     | 6.0
falcon   | 1.5  | 1         | 0     | 4.1
swift    | 2.0  | 0         | 1     | 6.2

- A row can have many columns
- Total size should be kilobytes, not megabytes
- A configuration certificate determines what data is pulled into the table (and can change)

[Diagram: the machines in the domain, each labeled with its current load value]
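The table above can be sketched in code. Below is a minimal, hypothetical model (class and field names are ours, not Mariner's): each machine owns exactly one row, rows carry version stamps, and two tables merge by keeping the newer copy of each row, which is the rule a gossip exchange applies.

```python
import time
from dataclasses import dataclass, field

@dataclass
class Row:
    name: str
    attributes: dict                 # e.g. {"Load": 2.0, "Weblogic?": 0, "SMTP?": 1}
    version: float = field(default_factory=time.time)

class ZoneTable:
    """Each machine keeps a full replica of its zone's table."""
    def __init__(self):
        self.rows = {}               # machine name -> Row

    def update_own_row(self, name, attributes):
        # A machine writes only its own row; the version stamp lets
        # peers recognize which copy is fresher.
        self.rows[name] = Row(name, dict(attributes))

    def merge(self, other_rows):
        # The core gossip merge rule: keep whichever copy of a row is newer.
        for r in other_rows.values():
            mine = self.rows.get(r.name)
            if mine is None or r.version > mine.version:
                self.rows[r.name] = r
```

After two machines exchange and merge tables in both directions, they hold identical copies; repeated pairwise merges are how the full "jigsaw puzzle" gets assembled.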
Build a hierarchy using a P2P protocol that "assembles the puzzle" without any servers
San Francisco:
Name     | Load | Weblogic? | SMTP? | Word Version | …
cardinal | 4.5  | 1         | 0     | 6.0
falcon   | 1.5  | 1         | 0     | 4.1
swift    | 2.0  | 0         | 1     | 6.2

New Jersey:
Name    | Load | Weblogic? | SMTP? | Word Version | …
gnu     | 1.5  | 0         |       | 6.2
zebra   | 3.2  | 0         | 1     | 6.2
gazelle | 1.7  | 0         | 0     | 4.5

Summary table (an SQL query "summarizes" the data):
Name  | Avg Load | WL contact  | SMTP contact
SF    | 2.6      | 123.45.61.3 | 123.45.61.17
NJ    | 1.8      | 127.16.77.6 | 127.16.77.11
Paris | 3.1      | 14.66.71.8  | 14.66.71.12

Dynamically changing query output is visible system-wide
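Concretely, a summary row of this kind is just the output of an SQL aggregate. Here is a sketch using SQLite (the schema and values are illustrative, based on the example tables; the real system re-evaluates such queries continuously as the underlying rows change):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE machines (zone TEXT, name TEXT, load REAL, smtp INTEGER)")
db.executemany("INSERT INTO machines VALUES (?, ?, ?, ?)", [
    ("SF", "cardinal", 4.5, 0), ("SF", "falcon", 1.5, 0), ("SF", "swift", 2.0, 1),
    ("NJ", "gnu", 1.5, 0), ("NJ", "zebra", 3.2, 1), ("NJ", "gazelle", 1.7, 0),
])

# One aggregation function = one SQL query; each output row becomes a row
# in the next level of the hierarchy.
summary = dict(db.execute(
    "SELECT zone, ROUND(AVG(load), 2) FROM machines GROUP BY zone"))
print(summary)
```

Each zone contributes one summarized row, so the parent level sees a small, slowly growing table no matter how many machines sit below.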
(1) Query goes out… (2) Compute locally… (3) Results flow to the top level of the hierarchy
[Diagram: the same San Francisco and New Jersey tables and summary table as on the previous slide, annotated with the steps: (1) the query goes out, (2) each zone computes locally, (3) results flow to the top level]
Hierarchy is virtual… data is replicated
[Diagram: the same tables; the hierarchy is virtual, and the table data is replicated across the participating machines]
Mariner solves our problem!

A flexible, user-programmable mechanism:
- Where can I find a display device suitable for displaying these NMR test results?
- Are any specialists in epilepsy available to consult on this ER admission?
- Find 12 idle computers with copies of the NMR-3D package to which I can download a 20MB dataset rapidly
- Is there a free office where I could use a computer and a phone?
Think of aggregation functions as small agents that look for information
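For instance, the "idle computers with the NMR-3D package" question could be phrased as a small agent scanning a zone's rows. The column names and load threshold below are invented for the example:

```python
rows = [
    {"name": "swift",    "load": 2.0, "has_nmr3d": True},
    {"name": "falcon",   "load": 1.5, "has_nmr3d": False},
    {"name": "cardinal", "load": 4.5, "has_nmr3d": True},
]

def idle_with_package(rows, max_load=2.5):
    # The "agent": summarize this zone as its list of candidate machines.
    # A parent zone would combine these lists until enough are found.
    return [r["name"] for r in rows if r["has_nmr3d"] and r["load"] <= max_load]

print(idle_with_package(rows))   # ['swift']
```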
Aggregation and hierarchy

Nearby information:
- Maintained in more detail; can query it directly
- Changes seen sooner
Remote information is summarized:
- High-quality aggregated data
- This also changes as information evolves
So how does it work?

Each computer has its own row and replicas of some objects (queries, other rows, etc.)
Periodically, at a fixed rate, it picks a friend "pseudo-randomly" and exchanges states efficiently (bounding the size of data exchanged)
States converge exponentially rapidly
Loads are low and constant, and the protocol is robust against all sorts of disruptions!
Representatives are elected to run the protocol for higher levels of the hierarchy
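The exchange step can be simulated in a few lines to see the exponential convergence claimed above. This is a toy push-pull model, not the actual Mariner protocol:

```python
import random

def rounds_to_converge(n, seed=1):
    """Each node starts knowing only its own row; each round every node
    exchanges state with one pseudo-random friend (push-pull union)."""
    random.seed(seed)
    known = [{i} for i in range(n)]
    rounds = 0
    while any(len(s) < n for s in known):
        rounds += 1
        for i in range(n):
            j = random.randrange(n)
            merged = known[i] | known[j]   # both sides end up with the union
            known[i] = known[j] = merged
    return rounds

# Typically converges in a handful of rounds, on the order of log2(n).
print(rounds_to_converge(128))
```

Because each round roughly doubles how widely any given row is known, the round count grows logarithmically with system size, while the per-node cost per round stays constant.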
Mariner summary

Scalable: could support millions of machines
Flexible: can easily extend the domain hierarchy, define new columns or eliminate old ones; adapts as conditions evolve
Secure: uses keys for authentication and can even encrypt; handles firewalls gracefully, including issues of IP address re-use behind firewalls
Performs well: updates propagate in seconds
Cheap to run: tiny load, small memory impact
Contrast with most P2P schemes

Our peer-to-peer approach is implemented using pseudo-random gossip. In contrast, most peer-to-peer architectures:
- Are specifically intended to support file systems
- Don't use pseudo-random P2P patterns
- Any hierarchical structure is "real"; ours is an abstraction constructed by the protocol itself
Unlimited scalability!

Probabilistic gossip "routes around" congestion
And the probabilistic reliability model lets the system move on if a computer lags behind
Results in:
- Constant communication costs
- Constant loads on links
- Steady behavior even under stress
Bimodal Multicast

A second part of Astrolabe
Mariner is for "all to all" information
Bimodal Multicast is a peer-to-peer scalable protocol for "few to many" patterns
We pick the destinations using a Mariner aggregation function! Then we can stream data to these recipients
The protocol scales with constant costs and works even under extreme stress!
[Diagram: a server cluster with a log; the user's computer gets information locally. When resource information changes, Mariner identifies the servers to inform, and Bimodal Multicast delivers the update. SQL queries over the zone tables feed a virtual "summary" table.]

Mariner manages configuration and connection parameters, and tracks system membership and state.
Combined, the Astrolabe technologies could be the basis of an autonomic configuration architecture
Adds up to a new opportunity

Autonomic computing…
- …failed in the past because our applications operated in the dark
- …but now may be possible because, for the first time, we can make system structure explicit and track it as it changes
With appropriate support, this opens the door to a new approach to computing
More information?
http://www.cs.cornell.edu/ken/astrolabe.pdf
http://www.cs.cornell.edu/ken/bimodal.pdf
http://www.rnets.com
EXTRA SLIDES
These are for answering questions
Reminder: Multicast scaling issue
[Graph: "Virtually synchronous Ensemble multicast protocols"; average throughput on non-perturbed members vs. perturb rate (0 to 0.9), for group sizes 32, 64, and 96]
Start by using unreliable multicast to rapidly distribute the message. But some messages may not get through, and some processes may be faulty, so the initial state involves only partial distribution of the multicast(s).
Uses IP multicast (unreliable) to distribute data
Periodic gossip is used to repair message loss. (In reality it isn't synchronized, and we do all sorts of things to avoid excessive gossip over WAN links or slow connections.)
Rounds of gossip repair the gaps
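The two phases can be modeled in a few lines (a toy model, not the real Pbcast implementation): the unreliable multicast reaches each process with some probability, then each gossip round lets every still-missing process pull the message from one random peer.

```python
import random

def simulate(n=100, p=0.7, rounds=5, seed=42):
    random.seed(seed)
    has = [random.random() < p for _ in range(n)]   # phase 1: lossy IP multicast
    for _ in range(rounds):                         # phase 2: gossip repair
        snapshot = has[:]                           # gossip against last round's state
        for i in range(n):
            if not has[i] and snapshot[random.randrange(n)]:
                has[i] = True
    return sum(has)

print(simulate(), "of 100 processes delivered after repair")
```

The missing fraction shrinks multiplicatively each round, which is what produces the "bimodal" outcome: with overwhelming probability either almost every process delivers the message, or almost none do.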
Bimodal Multicast's use of gossip

Dates back to NNTP and Clearinghouse; the best papers are by Demers et al.
Periodically, each process picks some other process and merges states
Mathematics: epidemic infection
Must tune to deal with bandwidth and latency issues, and other pragmatics
For example, if a region is missing some data, we re-multicast it locally
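That epidemic behavior can be captured by an idealized recurrence (our simplification, not the exact analysis in the papers): with pull-style gossip at fanout f, a process still missing the data stays missing only if all f peers it contacts are also missing, so the missing fraction evolves as m ← m · m^f = m^(f+1).

```python
def missing_fraction(m0=0.3, fanout=1, rounds=4):
    """Idealized epidemic recurrence: returns the missing fraction per round."""
    history = [m0]
    for _ in range(rounds):
        history.append(history[-1] ** (fanout + 1))   # m <- m**(f+1)
    return history

for m in missing_fraction():
    print(f"{m:.2e}")   # 3.0e-1, 9.0e-2, 8.1e-3, 6.6e-5, 4.3e-9
```

Even with fanout 1, the decay is super-exponential, which is why a handful of gossip rounds suffices even for large groups.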
Figure 5: Graphs of analytical results
[Four panels: (a) Pbcast bimodal delivery distribution, p{#processes=k} vs. number of processes to deliver pbcast; (b) scalability of Pbcast reliability, P{failure} vs. #processes in system, for Predicate I and Predicate II; (c) effects of fanout on reliability, P{failure} vs. fanout; (d) fanout required for a specified reliability (Predicate I at 1e-8, Predicate II at 1e-12) vs. #processes in system]
Bimodal Multicast is amenable to formal analysis
[Graph: "High bandwidth measurements with varying numbers of sleepers"; throughput measured at an unperturbed process vs. probability of a sleep event, comparing traditional multicast and Pbcast with 1, 3, and 5 sleepers]
Good things?

These technologies overcome Internet limitations using randomized P2P gossip
- However, Internet routing can "defeat" our clever solutions unless we know the network topology
Both have great scalability and can survive under stress
And both are backed by formal models as well as real code and experimental data
Indeed, the analysis is "robust" too!
Publications

Main paper on Mariner: http://www.cs.cornell.edu/ken/mariner.pdf
Main paper on Multicast: http://www.cs.cornell.edu/ken/bimodal.pdf
Full set of papers: http://www.cs.cornell.edu/Info/Projects/Spinglass/pubs