Improving the Efficiency of Fault-Tolerant Distributed Shared-Memory Algorithms
Eli Sadovnik and Steven Homberg
Second Annual MIT PRIMES Conference, May 19-20, 2012
Introduction
• Shared memory supports concurrent access
  – Read & write interface
• Memory models: single writer, multiple reader (SWMR) and multiple writer, multiple reader (MWMR)
  – Consistency is important
• Strong consistency provides useful semantics
• Abstraction for message-passing networks
  – Shared memory can be emulated
  – Difficult to do, but solutions exist
  – For example, Internet applications such as Dropbox
Our Research Project
THE RAMBO PROJECT
• Framework for emulating shared memory
  – Introduced by Lynch and Shvartsman, extended by Gilbert
  – Implements the MWMR model with strong consistency
  – Designed for dynamic distributed message-passing settings
OUR GOAL
• RAMBO is elegant but not always efficient
• Extend RAMBO with intelligent data management
Consistency & Atomicity
• There are many consistency models
• We are interested in atomicity
[Figure: three timelines, each with one write(8) on a register whose initial value is 0, plus three reads. Reads returning 3, 0, 8 are marked "Violation (Safety)"; reads returning 8, 0, 8 are marked "Violation (Regularity)"; reads returning 8, 8, 8 are marked "Atomicity".]
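The timelines contrast the classic register hierarchy. As an illustration, the following Python sketch assumes Lamport's standard definitions of safe, regular and atomic registers and a simplified execution: one write, sequential reads, and a hypothetical `Pos` tag recording whether each read precedes, overlaps or follows the write. It reports the strongest property the execution satisfies. Because the exact read/write overlaps in the figure did not survive extraction, the sample executions are our own, not a transcription of the slide.

```python
from enum import Enum


class Pos(Enum):
    """Where a read falls relative to the single write (hypothetical model)."""
    BEFORE = "before"  # read completes before the write starts
    DURING = "during"  # read overlaps the write
    AFTER = "after"    # read starts after the write completes


def strongest_guarantee(reads, old, new):
    """Return "atomic", "regular", "safe" or "none" for one write(new) on a
    register initialised to old, given sequential reads as (value, Pos)
    pairs in real-time order."""
    # Safe: a read that does not overlap the write returns the value of the
    # most recently completed write.
    for value, pos in reads:
        if pos is Pos.BEFORE and value != old:
            return "none"
        if pos is Pos.AFTER and value != new:
            return "none"
    # Regular: a read overlapping the write returns either the old or the new value.
    if any(pos is Pos.DURING and value not in (old, new) for value, pos in reads):
        return "safe"
    # Atomic: additionally no new/old inversion, i.e. once some read has
    # returned the new value, no later read returns the old one.
    seen_new = False
    for value, _ in reads:
        if value == new:
            seen_new = True
        elif seen_new and value == old:
            return "regular"
    return "atomic"
```

For example, `strongest_guarantee([(8, Pos.DURING), (0, Pos.DURING), (8, Pos.AFTER)], 0, 8)` returns `"regular"`: reading 0 after a read of 8 is a new/old inversion, which regularity permits but atomicity forbids.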
Emulating Shared Memory
[Figure: a single server (Data: 5, Status: WORKING) serves User 1 (reader), User 2 (writer) and User 3 (reader); all three users see the data 5.]
Weakness of the Centralized Approach
[Figure: the single server has FAILED; User 1 (reader), User 2 (writer) and User 3 (reader) each receive an error.]
Replication in a Distributed Setting
[Figure: the data is replicated on three servers; one has FAILED, while the other two (Data: 5, Status: WORKING) still serve User 1 (reader), User 2 (writer) and User 3 (reader), who all see 5.]
The ABD Algorithm
Hagit Attiya, Amotz Bar-Noy, Danny Dolev
• A SWMR algorithm
• Operation-level wait-freedom
  – Termination unaffected by concurrency
• Designed for a message-passing setting
  – Allows limited failures
  – Communication is reliable
  – Messages can be delayed
Quorum Systems and ABD
• ABD is a quorum-based algorithm
  – A quorum system is a collection of pairwise-intersecting sets
• For example, a majority-voting quorum system
• Data is replicated in a quorum system
  – Quorum system members are networked servers
• Guarantee of atomicity
  – Quorum intersection and read/write protocols
• Reads must write! (…sometimes, as we will see later)
  – A reader must write back the latest data
  – The writer cannot be trusted to complete its write
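The intersection property can be checked mechanically. This Python sketch (function names are ours) enumerates the minimal quorums of a majority quorum system and verifies that every two quorums share a server; that shared server is what lets a reader's quorum observe the writer's latest value. With 2f+1 servers, any f crashes still leave a full majority.

```python
from itertools import combinations


def majority_quorums(servers):
    """Minimal quorums of a majority quorum system: every strict majority."""
    size = len(servers) // 2 + 1
    return [frozenset(c) for c in combinations(servers, size)]


def is_quorum_system(quorums):
    """True if every pair of quorums has at least one server in common."""
    return all(q1 & q2 for q1, q2 in combinations(quorums, 2))


quorums = majority_quorums(["s1", "s2", "s3"])
print(sorted(sorted(q) for q in quorums))  # [['s1', 's2'], ['s1', 's3'], ['s2', 's3']]
print(is_quorum_system(quorums))           # True
```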
Phased Read/Write Protocols
[Figure: three working servers grouped into two overlapping quorums, Q1 and Q2; User 1 (reader), User 2 (writer) and User 3 (reader).]
User 2 writes its data, a 5, to quorum Q1.
Phased Read/Write Protocols
[Figure: after the write, two of the three servers hold 5; User 1 (reader) queries quorum Q2.]
User 1 queries quorum Q2, sees that the latest data is a 5, and writes that back to the computer that does not have the latest data.
Data Versions & Timestamps
[Figure: one server holds 5,t=1 while the other two hold 7,t=2; users see both versions.]
Timestamps allow us to distinguish among different versions of the data.
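The phased protocols and timestamps above can be sketched in Python for the SWMR case. This is a single-process model under simplifying assumptions: servers are in-memory `Replica` objects, "sending" a message is a method call, and the caller picks which quorum responds, whereas real ABD sends to all servers and waits for replies from any quorum. The class and function names are our own.

```python
from dataclasses import dataclass


@dataclass
class Replica:
    value: int = 0
    ts: int = 0  # timestamp of the stored version

    def update(self, value, ts):
        # Keep only the newest version; older or equal timestamps are ignored.
        if ts > self.ts:
            self.value, self.ts = value, ts


class Writer:
    """The single writer of the SWMR model orders versions with a local counter."""

    def __init__(self):
        self.ts = 0

    def write(self, replicas, quorum, value):
        self.ts += 1
        for i in quorum:  # one phase: store the new version at a quorum
            replicas[i].update(value, self.ts)


def read(replicas, query_quorum, propagate_quorum):
    # Phase 1 (query): collect (ts, value) from a quorum and keep the newest.
    ts, value = max((replicas[i].ts, replicas[i].value) for i in query_quorum)
    # Phase 2 (propagate): write that version back to a quorum, so later reads
    # see at least this version even if the writer crashed mid-write.
    for i in propagate_quorum:
        replicas[i].update(value, ts)
    return value


# Replay the slides: 3 servers, Q1 = {0, 1}, Q2 = {1, 2}.
replicas = [Replica() for _ in range(3)]
writer = Writer()
writer.write(replicas, [0, 1], 5)
print(read(replicas, [1, 2], [1, 2]), replicas[2])  # 5 Replica(value=5, ts=1)
```

The last lines replay the slides: User 2 writes 5 to Q1, then a reader queries Q2 and writes 5 back to the one server that lacked it.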
Data Versions & Timestamps
[Figure: all three servers and all three users now hold 7,t=2.]
Quorum Viability
[Figure: two of the three servers have FAILED, so neither quorum (Q1 nor Q2) is available; User 1 (reader), User 2 (writer) and User 3 (reader) each receive an error.]
A weakness of the ABD algorithm is that it depends on a quorum of servers always being viable. When no quorum is available, operations are blocked.
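Viability reduces to a simple check. The Python sketch below (our own names, majority quorums assumed) tests whether any quorum has no failed member; with two of three servers down, as in the figure, no quorum remains and ABD operations cannot complete.

```python
from itertools import combinations


def majority_quorums(servers):
    size = len(servers) // 2 + 1
    return [frozenset(c) for c in combinations(servers, size)]


def has_viable_quorum(quorums, failed):
    """An ABD operation can finish only if some quorum has no failed member."""
    return any(not (q & failed) for q in quorums)


quorums = majority_quorums(["s1", "s2", "s3"])
print(has_viable_quorum(quorums, {"s1"}))        # True: {s2, s3} still answers
print(has_viable_quorum(quorums, {"s1", "s2"}))  # False: operations block
```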
The RAMBO Framework (Reconfigurable Atomic Memory for Basic Objects)
Seth Gilbert
Nancy Lynch
Alexander Shvartsman
Quorum Reconfiguration
[Figure: the old configuration (quorums Q1 and Q2) has one FAILED server and two working servers holding 7,t=2; a new configuration of three working, still empty servers, with its own quorums Q1 and Q2, is installed.]
RAMBO uses quorum reconfiguration to ensure service longevity.
A new quorum system (a new set of servers) is installed to replace the old one, allowing progress in spite of failures.
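A minimal Python model of the configuration sequence, under our own simplifying assumptions: configurations are totally ordered by index, installing one appends it, and everything from the oldest configuration not yet garbage collected onward is active. How the next configuration is agreed upon (RAMBO delegates this to a separate reconfiguration service) is deliberately left out, and `ConfigSequence` is a hypothetical name.

```python
from dataclasses import dataclass, field


@dataclass
class ConfigSequence:
    """Totally ordered configurations (hypothetical, simplified model)."""
    members: list = field(default_factory=list)  # members[i]: servers of config i
    removed: int = 0  # configs with index < removed have been garbage collected

    def install(self, servers):
        """Append a new configuration; it will eventually replace the older ones."""
        self.members.append(frozenset(servers))
        return len(self.members) - 1

    def active(self):
        """Indices of configurations that operations must still contact."""
        return list(range(self.removed, len(self.members)))


configs = ConfigSequence()
configs.install({"a", "b", "c"})  # initial configuration, index 0
configs.install({"d", "e", "f"})  # reconfiguration after "a" fails, index 1
print(configs.active())           # [0, 1] until garbage collection finishes
```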
Replica Transfer
[Figure: the replica 7,t=2 is copied from the two working servers of the old configuration to the servers of the new configuration.]
After a new set of servers is installed, these servers do not have any information.
The replica information (copies of data) must be transferred to the new configuration.
Garbage Collection
[Figure: two servers of the new configuration now hold 7,t=2, and the old configuration, with its FAILED server, is phased out.]
After information is transferred to the new servers, the old servers are phased out of use.
This process is called ‘garbage collection’.
The mechanism for garbage collection has two phases and is analogous to the read/write operations (described in the next slides).
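Replica transfer and its two-phase garbage collection can be sketched as follows. This is a simplified Python model, not RAMBO's actual garbage-collection protocol: it queries one quorum of the old configuration, propagates the newest (timestamp, value) pair to one quorum of the new configuration, and leaves the retirement of the old configuration to the caller. All names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Server:
    value: int = 0
    ts: int = 0


def garbage_collect(servers, old_quorum, new_quorum):
    """Transfer the newest replica from the old configuration to the new one.

    Phase 1 (query): read (ts, value) from a quorum of the old configuration.
    Phase 2 (propagate): store that version at a quorum of the new configuration.
    Only after both phases may the old configuration be removed from use.
    """
    ts, value = max((servers[s].ts, servers[s].value) for s in old_quorum)
    for s in new_quorum:
        if ts > servers[s].ts:
            servers[s].value, servers[s].ts = value, ts
    return ts, value


# Old configuration {a, b, c} with "a" failed; new configuration {d, e, f}.
servers = {name: Server() for name in "abcdef"}
servers["b"] = Server(value=7, ts=2)
servers["c"] = Server(value=7, ts=2)
print(garbage_collect(servers, ["b", "c"], ["d", "e"]))  # (2, 7)
```

The example mirrors the figure: 7,t=2 reaches two of the three new servers, which already form a majority quorum of the new configuration.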
Read/Write Operations
Multi-Configuration Access
[Figure: User 1 (reader) contacts both the old configuration (one FAILED server, two holding 7,t=2) and the new configuration (two servers holding 7,t=2) and obtains 7,t=2.]
What if reads and writes occur during reconfiguration?
Concurrent operations contact all existing configurations to ensure the latest information is accessed.
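Multi-configuration access can be illustrated with a Python sketch (hypothetical names; servers store `(timestamp, value)` tuples, and the caller chooses one live quorum per configuration). During reconfiguration the query phase reads one quorum of every active configuration, and the propagation phase writes to one quorum of every active configuration; this is the baseline behaviour that research question Q3 revisits.

```python
def query_all(servers, quorums):
    """Query phase: contact one quorum in EVERY active configuration and keep
    the newest (ts, value) seen anywhere."""
    return max(servers[s] for quorum in quorums for s in quorum)


def propagate_all(servers, quorums, version):
    """Propagation phase: write the chosen version to one quorum in every
    active configuration. Returns the number of messages sent."""
    messages = 0
    for quorum in quorums:
        for s in quorum:
            messages += 1
            servers[s] = max(servers[s], version)  # servers keep the newer version
    return messages


# servers map name -> (ts, value); old config {a, b, c}, new config {d, e, f}.
servers = {"a": (0, 0), "b": (2, 7), "c": (2, 7), "d": (0, 0), "e": (0, 0), "f": (0, 0)}
active_quorums = [["b", "c"], ["d", "e"]]  # one live quorum per active configuration
version = query_all(servers, active_quorums)
print(version, propagate_all(servers, active_quorums, version))  # (2, 7) 4
```

Counting messages makes the cost explicit: four propagation messages here, two per active configuration.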
Read/Write Operations
Garbage Collection
Old configurations need to be removed from use.
Ongoing read/write operations use their existing configuration knowledge. New operations ignore the old configuration.
[Figure: User 1 (reader) holds 7,t=2; the old configuration, with its FAILED server, is garbage collected while the new configuration's servers hold 7,t=2.]
Research Questions
Q1: Can a reader (respectively, a writer) avoid contacting configurations that it has learned were garbage collected?
Q2: When can a reader skip its second (propagation) phase, and can a reader propagate selectively?
Q3: Can we propagate to the most recent configuration only?
Concurrent Garbage Collection (Q1)
[Figure: User 1 (reader) starts with 5; the old configuration holds 5,t=1 and 7,t=2 while the new configuration's servers start at 0,t=0. Numbered steps 1 to 7 show the reader learning 7,t=2 during garbage collection and finally returning 7.]
We believe that the garbage-collected configuration can in fact be ignored, because the reader learns of the configuration’s information regardless.
Improved Configuration Management (Q1)
• The authors of RAMBO conjecture that operations must contact all configurations that are discovered during the query (respectively, propagate) phase.
• Communicating with configurations learned to be garbage collected mid-operation is unnecessary
  – The garbage-collected configurations are discovered mid-operation from another server
  – That server knows a tag at least as recent as any known in the old configurations
• IMPACT: improves operation liveness
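The Q1 proposal can be illustrated with a simplified Python query loop. It is a sketch of the idea, not a verified protocol: each response is assumed to come from a distinct server and to carry a hypothetical `gc_below` field announcing which configurations the responder knows to be garbage collected. Those configurations are dropped from the set the reader still waits on, while the responder's own timestamp is kept, which the slide argues is at least as recent as anything in the dropped configurations. Newly discovered configurations are ignored for brevity.

```python
def query_phase(responses, quorum_size, active):
    """Query loop illustrating the Q1 proposal (hypothetical model).

    responses: (config, ts, value, gc_below) tuples in arrival order, one per
        distinct server; gc_below means the responder knows every configuration
        with index < gc_below has been garbage collected.
    quorum_size: config index -> number of replies that form a quorum there.
    active: configuration indices the reader knew about when the phase began.
    Returns ((ts, value), configurations that still had to reply).
    """
    needed = set(active)
    replies = {c: 0 for c in active}
    best = (0, 0)
    for config, ts, value, gc_below in responses:
        best = max(best, (ts, value))
        # Proposal: drop configurations reported as garbage collected; the
        # responder already knows a tag at least as recent as any in them.
        needed = {c for c in needed if c >= gc_below}
        if config in replies:
            replies[config] += 1
        if all(replies[c] >= quorum_size[c] for c in needed):
            return best, needed
    raise RuntimeError("no quorum yet: the operation cannot complete")


# Config 0 lost a server: only one of its two required replies ever arrives.
responses = [(0, 2, 7, 0), (1, 2, 7, 1), (1, 2, 7, 1)]
print(query_phase(responses, {0: 2, 1: 2}, [0, 1]))  # ((2, 7), {1})
```

Without the `gc_below` information (all zeros), the same replies never complete the phase and the function raises instead.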
Improved Bookkeeping (Q2)
[Figure: three working servers, all holding 7,t=2, grouped into quorums Q1 and Q2; User 1 (reader) queries them and receives 7,t=2 from each.]
After querying, the reader learns that a majority of nodes already has the up-to-date information, making propagation unnecessary.
Semi-Fast Read Operations (Q2)
• Read operations always propagate
  – Regardless of the actual replica dissemination
  – Redundant messages and slow operations
• The proposed solution
  – During the query phase, the reader records the latest timestamps of the servers with which it communicated
  – The reader propagates only to servers that are not up to date
  – Sometimes this allows omitting the propagation phase entirely (‘semi-fast’ read operations)
• IMPACT: improves operation latency and reduces communication costs
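A Python sketch of a semi-fast read under our own assumptions (hypothetical names, a single configuration, replies as `(timestamp, value)` pairs). After the query phase, the reader checks whether some quorum is already known to store the newest version; if so it skips propagation, and otherwise it propagates only to the stale members of one quorum. Skipping is safe in this model because servers never lower their timestamps and any later read quorum intersects the up-to-date quorum.

```python
from itertools import combinations


def semi_fast_read(replies, quorums):
    """replies: server -> (ts, value) gathered in the query phase.
    quorums: the configuration's quorum system (list of frozensets).
    Returns (value, servers that still need the value propagated to them)."""
    ts, value = max(replies.values())
    up_to_date = {s for s, (t, _) in replies.items() if t >= ts}
    # A whole quorum already stores the newest version: every later read
    # quorum intersects it, so the propagation phase can be skipped.
    if any(q <= up_to_date for q in quorums):
        return value, set()
    # Otherwise propagate selectively: choose the quorum with the fewest stale
    # members and send the value only to those.
    target = min(quorums, key=lambda q: len(q - up_to_date))
    return value, set(target - up_to_date)


quorums = [frozenset(c) for c in combinations(["a", "b", "c"], 2)]
print(semi_fast_read({"a": (2, 7), "b": (2, 7)}, quorums))  # (7, set())
```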
Overly Extensive Propagation (Q3)
[Figure: User 1 (writer) propagates 7,t=2 to the servers of both the old configuration (one FAILED server) and the new configuration.]
Currently, RAMBO both queries and propagates to all active configurations. In fact, just the query phase covering all active configurations is sufficient for atomicity.
Propagate to the Latest Configuration (Q3)
• We believe it is not necessary to propagate to any configuration but the last active configuration.
• Properties of configuration information
  – All configurations are totally ordered.
  – Configurations have a forward link.
  – Discovery is faster than reconfiguration.
• Operations query all active configurations
• IMPACT: reduces communication cost
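The Q3 proposal, sketched in Python under a simplified model (hypothetical names, servers storing `(timestamp, value)` tuples, quorums listed oldest configuration first to mirror the total order). It illustrates the conjecture rather than an established result: the query still covers every active configuration, but propagation targets only the newest one.

```python
def read_latest_only(servers, quorums):
    """Q3 proposal (not yet proven): query one quorum of every active
    configuration, but propagate only to a quorum of the newest one.
    Returns (value, number of propagation messages)."""
    ts, value = max(servers[s] for quorum in quorums for s in quorum)
    latest = quorums[-1]  # configurations are totally ordered; newest is last
    for s in latest:
        servers[s] = max(servers[s], (ts, value))  # keep the newer version
    return value, len(latest)


servers = {"b": (2, 7), "c": (2, 7), "d": (0, 0), "e": (0, 0)}
quorums = [["b", "c"], ["d", "e"]]  # old configuration first, newest last
print(read_latest_only(servers, quorums))  # (7, 2)
```

Here propagation costs two messages instead of the four needed to reach a quorum in both configurations.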
Summary
• Algorithmic optimizations
• Opportunistic benefits
  – A clear advantage when
    • Servers gossip, and
    • Configurations have members in common
• Changes are minimally intrusive
  – Modest increase in bookkeeping and in the size of messages
Future Work
• Formal reasoning
  – Use the Input/Output Automata framework to demonstrate that the new changes preserve the consistency guarantees of RAMBO
• Simulation
  – Use the TEMPO toolkit to simulate RAMBO executions and build confidence in our proofs
• Empirical experiments
  – Augment the existing implementations of RAMBO and collect behavior data on PlanetLab
Special Thanks to:
The MIT PRIMES Program
Supervisor Prof. Nancy Lynch
Mentor Dr. Peter Musial