![Page 1: Two Techniques for Proving Lower Bounds](https://reader035.vdocuments.net/reader035/viewer/2022062410/56816260550346895dd2bd02/html5/thumbnails/1.jpg)
Two Techniques for Proving Lower Bounds
Hagit Attiya, Technion
![Page 2: Two Techniques for Proving Lower Bounds](https://reader035.vdocuments.net/reader035/viewer/2022062410/56816260550346895dd2bd02/html5/thumbnails/2.jpg)
Goal of this Presentation
• Describe two common techniques for proving lower bounds in distributed computing:
  ▫ Information theory arguments
  ▫ Covering
• Variations
• Applications
![Page 3: Two Techniques for Proving Lower Bounds](https://reader035.vdocuments.net/reader035/viewer/2022062410/56816260550346895dd2bd02/html5/thumbnails/3.jpg)
My always first slide…
[Figure: problem, algorithm, implementation; real system architecture vs. nicer system architecture]
![Page 4: Two Techniques for Proving Lower Bounds](https://reader035.vdocuments.net/reader035/viewer/2022062410/56816260550346895dd2bd02/html5/thumbnails/4.jpg)
Part I: Information Theory Arguments
![Page 5: Two Techniques for Proving Lower Bounds](https://reader035.vdocuments.net/reader035/viewer/2022062410/56816260550346895dd2bd02/html5/thumbnails/5.jpg)
Overview
• Bound the flow of information among processes (and memory)
• Show that information takes long to be acquired
• Argue that solving a particular problem requires information about many processes
• Usually applies to:
  ▫ Shared memory systems
  ▫ Synchronous executions (imply lower bounds also for asynchronous executions)
• Details depend on the primitives used
![Page 6: Two Techniques for Proving Lower Bounds](https://reader035.vdocuments.net/reader035/viewer/2022062410/56816260550346895dd2bd02/html5/thumbnails/6.jpg)
Single-writer registers: Possible argument
• Need to read from each process
• The state of a process can be found only in its own register
• Hence, first process must read n registers
![Page 7: Two Techniques for Proving Lower Bounds](https://reader035.vdocuments.net/reader035/viewer/2022062410/56816260550346895dd2bd02/html5/thumbnails/7.jpg)
Not really
When processes take steps together
First process doubles its information in the 2nd step
But it can’t do better than that
![Page 8: Two Techniques for Proving Lower Bounds](https://reader035.vdocuments.net/reader035/viewer/2022062410/56816260550346895dd2bd02/html5/thumbnails/8.jpg)
More Refined Argument
• Consider synchronized executions
  ▫ Processes take steps in rounds
  ▫ All reads appear before all writes
• INF(pi,t-1): The set of inputs influencing process pi at the start of round t
  ▫ For t = 1, INF(pi,t-1) = {pi}
  ▫ For t > 1, if pi reads a value written by pj, INF(pi,t) = INF(pi,t-1) ∪ INF(pj,t-1)
  ▫ For t > 1, if pi writes, INF(pi,t) = INF(pi,t-1)
![Page 9: Two Techniques for Proving Lower Bounds](https://reader035.vdocuments.net/reader035/viewer/2022062410/56816260550346895dd2bd02/html5/thumbnails/9.jpg)
INF determines the state
Lemma: If the states of processes in INF(pi,t-1) are the same in configurations C and C’, then pi takes the same steps in a t-round execution from C and from C’
Proof by case analysis
![Page 10: Two Techniques for Proving Lower Bounds](https://reader035.vdocuments.net/reader035/viewer/2022062410/56816260550346895dd2bd02/html5/thumbnails/10.jpg)
Size of INF
• I(t) = max |INF(pi,t)| (maximum over all processes pi)
Lemma: I(0) = 1, and I(t) ≤ 2 I(t-1)
⇒ I(t) ≤ 2^t
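The doubling bound can be checked on a small example. The sketch below is illustrative only: it assumes a hypothetical read schedule (in round t, process i reads the single-writer register of process i + 2^(t-1) mod n) and the function name `influence_sizes` is not from the slides. It applies the rule INF(pi,t) = INF(pi,t-1) ∪ INF(pj,t-1) and records I(t) for each round.

```python
def influence_sizes(n, rounds):
    """Return [I(0), ..., I(rounds)] for an assumed doubling read schedule."""
    inf = [{i} for i in range(n)]           # INF(p_i, 0) = {p_i}
    sizes = [max(len(s) for s in inf)]      # I(0)
    for t in range(1, rounds + 1):
        stride = 2 ** (t - 1)
        # Reads in round t see only what was known at the end of round t-1,
        # so the new sets are built from the old list in one step.
        inf = [inf[i] | inf[(i + stride) % n] for i in range(n)]
        sizes.append(max(len(s) for s in inf))
    return sizes


sizes = influence_sizes(16, 4)
# I(t) never exceeds 2^t, matching the lemma.
assert all(size <= 2 ** t for t, size in enumerate(sizes))
```

With this schedule the bound is tight: a process needs log n rounds before its influence set can contain all n inputs.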
![Page 11: Two Techniques for Proving Lower Bounds](https://reader035.vdocuments.net/reader035/viewer/2022062410/56816260550346895dd2bd02/html5/thumbnails/11.jpg)
Simple application: Computing OR
• Consider the input configuration C0 = (0, 0, …, 0, …, 0)
• The size of the influence set of a process is < n in all rounds < log n
• Some process pi is not in INF(p1, log n - 1)
• By the lemma, p1 returns the same value in C0 and in C1 = (0, 0, …, 1, …, 0), where only pi has input 1
• A contradiction
![Page 12: Two Techniques for Proving Lower Bounds](https://reader035.vdocuments.net/reader035/viewer/2022062410/56816260550346895dd2bd02/html5/thumbnails/12.jpg)
Application: Approximate agreement
For a small ε > 0
• Processes start with input in [0,1]
• Must decide on an output in [0,1] such that
  ▫ All outputs are within ε of each other (agreement)
  ▫ If all inputs are v, the output is v (validity)
System is asynchronous and a process must decide even if it runs by itself (solo termination)
![Page 13: Two Techniques for Proving Lower Bounds](https://reader035.vdocuments.net/reader035/viewer/2022062410/56816260550346895dd2bd02/html5/thumbnails/13.jpg)
Application: Approximate agreement
[Attiya, Shavit, Lynch]
• Consider input configuration C0 = (0, 0, …, 0)
• Run all processes to completion from C0; they must decide 0
• If the number of rounds T < log n ⇒ I(T) < n ⇒ ∃ process pi ∉ INF(p1,T)
![Page 14: Two Techniques for Proving Lower Bounds](https://reader035.vdocuments.net/reader035/viewer/2022062410/56816260550346895dd2bd02/html5/thumbnails/14.jpg)
Approximate agreement (cont.)
• Consider two input configurations C0 = (0, …, 0) and C1 = (0, …, 1, …, 0), where only pi has input 1
• Run pi to completion from C1; it must decide 1
• pi ∉ INF(p1,T) ⇒ p1 still decides 0 when running from this configuration, contradicting agreement
Theorem: Solo-terminating approximate agreement requires Ω(log n) rounds in a synchronous failure-free run
![Page 15: Two Techniques for Proving Lower Bounds](https://reader035.vdocuments.net/reader035/viewer/2022062410/56816260550346895dd2bd02/html5/thumbnails/15.jpg)
Overhead of solo termination in “nice” runs: without this requirement, a synchronous algorithm can solve the problem in one round.
![Page 16: Two Techniques for Proving Lower Bounds](https://reader035.vdocuments.net/reader035/viewer/2022062410/56816260550346895dd2bd02/html5/thumbnails/16.jpg)
With multi-writer registers
• Previous theorem does not hold
• A wait-free approximate agreement algorithm that takes O(1) rounds in “nice” executions [Schenk]
• Even simpler: An O(1) OR algorithm
![Page 17: Two Techniques for Proving Lower Bounds](https://reader035.vdocuments.net/reader035/viewer/2022062410/56816260550346895dd2bd02/html5/thumbnails/17.jpg)
• Only a few initial configurations to distinguish between
Can you find the O(1) OR algorithm?
Overhead of single-writer registers: this separates single-writer and multi-writer registers
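One possible answer to the O(1) OR question, offered as a sketch and not taken from the slides: a single shared multi-writer register R, initially 0. Every process with input 1 writes 1 to R in the first round, and every process reads R in the second round. The function name and the sequential simulation of the two synchronous rounds are assumptions made for illustration.

```python
def or_with_multiwriter_register(inputs):
    """Simulate a two-round synchronous OR using one multi-writer register."""
    register = 0  # shared multi-writer register R, initially 0
    # Round 1: every process whose input is 1 writes 1 to R.
    # All writers write the same value, so their order does not matter.
    for bit in inputs:
        if bit == 1:
            register = 1
    # Round 2: every process reads R and returns what it read.
    return [register for _ in inputs]
```

This works because the algorithm only has to distinguish the all-zero configuration from every other configuration, which is why two rounds suffice regardless of n.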
![Page 18: Two Techniques for Proving Lower Bounds](https://reader035.vdocuments.net/reader035/viewer/2022062410/56816260550346895dd2bd02/html5/thumbnails/18.jpg)
Information flow with multi-writer registers
The previous argument does not hold
Instead, consider how learning more information allows a process to differentiate between input configurations
Capture this as a partitioning of process states and memory values
[Beame]
[Figure: input configurations such as (0, …, 0), (0, …, 1, …, 0), (1, …, 1, …, 0) and (0, …, 0, …, 1), grouped into classes]
![Page 19: Two Techniques for Proving Lower Bounds](https://reader035.vdocuments.net/reader035/viewer/2022062410/56816260550346895dd2bd02/html5/thumbnails/19.jpg)
Multi-writer registers: Ordering events
Within each round
• Put all reads, then
• Put all writes
⇒ Reads obtain the value written at the end of the previous round
![Page 20: Two Techniques for Proving Lower Bounds](https://reader035.vdocuments.net/reader035/viewer/2022062410/56816260550346895dd2bd02/html5/thumbnails/20.jpg)
Partitioning into equivalence classes
For process p and round t, two input configurations are in the same equivalence class of P(p,t) if p is in the same state after t rounds from both (in a synchronous failure-free execution)
P(t): the number of classes after t rounds (max over p)
V(R,t), V(t) defined similarly for locations R
Lemma: P(t) ≤ P(t-1) V(t-1) and V(t) ≤ n P(t-1) + V(t-1)
⇒ P(t), V(t) ≤ (4n+2)^(2^(t-2))
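The closed-form bound can be sanity-checked numerically. The sketch below iterates the lemma's two recurrences with equality and compares the result with (4n+2)^(2^(t-2)), reading the flattened exponent as 2^(t-2). The initial values P(0) = 2 (binary inputs) and V(0) = 1 are assumptions, as are the function names; with them, P(2) = 4n+2 exactly, which is what suggests this reading of the exponent.

```python
def class_bounds(n, rounds):
    """Iterate the lemma's recurrences with equality from assumed initial values."""
    p, v = 2, 1                       # assumption: P(0) = 2, V(0) = 1
    history = [(p, v)]
    for _ in range(rounds):
        # P(t) = P(t-1) * V(t-1) and V(t) = n * P(t-1) + V(t-1)
        p, v = p * v, n * p + v
        history.append((p, v))
    return history


def closed_form(n, t):
    """The bound (4n+2)^(2^(t-2)); only meaningful for t >= 2."""
    return (4 * n + 2) ** (2 ** (t - 2))


history = class_bounds(3, 6)
for t in range(2, 7):
    assert max(history[t]) <= closed_form(3, t)
```

The induction behind the check: if both quantities are at most X ≥ 4n+2, then the next round gives at most X·X and (n+1)·X, and both are at most X².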
![Page 21: Two Techniques for Proving Lower Bounds](https://reader035.vdocuments.net/reader035/viewer/2022062410/56816260550346895dd2bd02/html5/thumbnails/21.jpg)
Application: The collect problem
• update(v) stores v as latest value of a process
• collect() returns a set of values (one per process)
When each process initially stores one of two values ⇒ there are 2^n possible input configurations
Each leading to a different output
Previous lemma implies (4n+2)^(2^(t-2)) ≥ P(t) ≥ 2^n
⇒ Must have Ω(log n) rounds
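The last implication can be spelled out by taking logarithms (base 2). This is a worked step under the reading of the bound as (4n+2)^(2^(t-2)):

```latex
(4n+2)^{2^{t-2}} \ge 2^{n}
\;\Longrightarrow\;
2^{t-2}\,\log_2(4n+2) \ge n
\;\Longrightarrow\;
t \ge 2 + \log_2\!\frac{n}{\log_2(4n+2)} = \Omega(\log n)
```

The denominator is only logarithmic in n, so it does not change the asymptotic order of the bound.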
![Page 22: Two Techniques for Proving Lower Bounds](https://reader035.vdocuments.net/reader035/viewer/2022062410/56816260550346895dd2bd02/html5/thumbnails/22.jpg)
Also for other primitives (CAS)
Non-reading CAS
Reading CAS returns the old value (can be handled, but we won’t do that)
Can also extend to non-reading kCAS
CAS(R,old,new) {
  if R == old then
    R = new
    return success
  else return fail
}
![Page 23: Two Techniques for Proving Lower Bounds](https://reader035.vdocuments.net/reader035/viewer/2022062410/56816260550346895dd2bd02/html5/thumbnails/23.jpg)
Careful with CAS
More information flow in a sequence of steps
initially, R == 0
cas(R,0,1) cas(R,1,2) . . . cas(R,n−1,n)
On the other hand
cas(R,n−1,n) cas(R,n−2,n−1) . . . cas(R,0,1)
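The difference between the two orders is easy to see by running them. The sketch below is a sequential simulation with a hypothetical helper `run_cas_sequence`; it applies the non-reading CAS operations cas(R, i-1, i) in either ascending or descending order of i and reports the final value of R and the number of successful operations.

```python
def run_cas_sequence(n, ascending=True):
    """Apply cas(R, i-1, i) for i = 1..n in the given order.

    Returns (final value of R, number of successful CAS operations).
    """
    register = 0                                    # initially, R == 0
    order = range(1, n + 1) if ascending else range(n, 0, -1)
    successes = 0
    for i in order:
        if register == i - 1:                       # compare with expected old value
            register = i                            # swap in the new value
            successes += 1
    return register, successes


assert run_cas_sequence(5, ascending=True) == (5, 5)
assert run_cas_sequence(5, ascending=False) == (1, 1)
```

In the ascending order every operation succeeds, so the final value of R depends on all n processes after a single round of steps. In the descending order only cas(R,0,1) succeeds, and R carries information about one process.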
![Page 24: Two Techniques for Proving Lower Bounds](https://reader035.vdocuments.net/reader035/viewer/2022062410/56816260550346895dd2bd02/html5/thumbnails/24.jpg)
Ordering events within a round
Put all reads first.
Put all writes last.
For every register R whose current value is v, consider all CAS events:
  ▫ Put all events with old ≠ v: all fail
  ▫ Put all events with old == v: only the first succeeds (assumes operations are non-degenerate)
This allows proving a lemma analogous to the one for multi-writer registers (with different constants)
![Page 25: Two Techniques for Proving Lower Bounds](https://reader035.vdocuments.net/reader035/viewer/2022062410/56816260550346895dd2bd02/html5/thumbnails/25.jpg)
Information Flow with Bounded Fan-In
Arbitrary objects, but bounded contention
  ▫ Not too many processes access the same base object simultaneously
Isolate processes in a Q-independent execution
  ▫ Only processes in Q take steps
  ▫ Access only objects not modified by other processes in Q
For a process p ∈ Q, a Q-independent execution is indistinguishable from a p-solo execution
![Page 26: Two Techniques for Proving Lower Bounds](https://reader035.vdocuments.net/reader035/viewer/2022062410/56816260550346895dd2bd02/html5/thumbnails/26.jpg)
Constructing independent executions
Lemma: For any algorithm using only objects with contention ≤ w and every t ≥ 0, there is a t-round Qt-independent execution, with |Qt| ≥ n/(w+2)^t
Proof by induction, with a trivial base case.
Induction step: consider a Qt-independent execution. Look at the next steps processes in Qt are about to perform, and construct an undirected graph (V,E)
We use the following result from graph theory.
Turán's theorem: Any graph (V,E) has an independent set of size at least |V|^2/(|V|+2|E|)
![Page 27: Two Techniques for Proving Lower Bounds](https://reader035.vdocuments.net/reader035/viewer/2022062410/56816260550346895dd2bd02/html5/thumbnails/27.jpg)
Induction step: The graph
• V = Qt
• E contains an edge {pi, pj} if
  ▫ pi and pj access the same object, or
  ▫ pi is about to read an object modified by pj, or
  ▫ pj is about to read an object modified by pi
|E| ≤ |Qt|(w+1)/2
Turán's theorem and the inductive hypothesis ⇒ there is an independent set Qt+1 of size ≥ n/(w+2)^(t+1)
Omit all steps of Qt − Qt+1 from the execution to get a Qt+1-independent execution
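The size of the independent set follows by substituting the edge bound into Turán's theorem and then applying the inductive hypothesis |Qt| ≥ n/(w+2)^t. As a worked step:

```latex
|Q_{t+1}|
\;\ge\; \frac{|Q_t|^2}{|Q_t| + 2|E|}
\;\ge\; \frac{|Q_t|^2}{|Q_t| + |Q_t|(w+1)}
\;=\; \frac{|Q_t|}{w+2}
\;\ge\; \frac{n}{(w+2)^{t+1}}
```

Each round therefore shrinks the set of isolated processes by a factor of at most w+2.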
![Page 28: Two Techniques for Proving Lower Bounds](https://reader035.vdocuments.net/reader035/viewer/2022062410/56816260550346895dd2bd02/html5/thumbnails/28.jpg)
Application: Weak Test&Set
Weak test&set: Like test&set but at most one success
Take t such that (w+2)^t < n
Lemma gives a t-round {pi,pj}-independent execution
• Each of pi and pj seems to be running solo ⇒ must succeed ⇒ Contradiction
Theorem: The solo step complexity of weak test&set is Ω(log n / log w)
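A short worked step connects the choice of t to the theorem, assuming the lemma's bound has the form |Qt| ≥ n/(w+2)^t:

```latex
(w+2)^t < n
\;\Longrightarrow\;
|Q_t| \ge \frac{n}{(w+2)^t} > 1
\;\Longrightarrow\;
|Q_t| \ge 2,
\qquad
t < \frac{\log n}{\log (w+2)}
```

So for every t below log n / log(w+2), two processes can still be kept isolated from each other. If either of them could finish within t solo steps, both would succeed, which gives the Ω(log n / log w) bound.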
![Page 29: Two Techniques for Proving Lower Bounds](https://reader035.vdocuments.net/reader035/viewer/2022062410/56816260550346895dd2bd02/html5/thumbnails/29.jpg)
Part II: Covering
![Page 30: Two Techniques for Proving Lower Bounds](https://reader035.vdocuments.net/reader035/viewer/2022062410/56816260550346895dd2bd02/html5/thumbnails/30.jpg)
Covering: The basic idea
Several processes write to the same location
Writes by early processes are lost, if no read in between
Must write to distinct locations
Other processes must read these locations
![Page 31: Two Techniques for Proving Lower Bounds](https://reader035.vdocuments.net/reader035/viewer/2022062410/56816260550346895dd2bd02/html5/thumbnails/31.jpg)
Max Register
• WriteMax(v,R) operation
• ReadMax operation op returns the maximal value written by a WriteMax operation that
  ▫ completed before op started, or
  ▫ overlaps op
• Special case of a linearizable object
![Page 32: Two Techniques for Proving Lower Bounds](https://reader035.vdocuments.net/reader035/viewer/2022062410/56816260550346895dd2bd02/html5/thumbnails/32.jpg)
Lower bound for ReadMax operation
[Jayanti, Tan, Toueg]
The proof is constructive
Theorem: ReadMax must read n different registers.
![Page 33: Two Techniques for Proving Lower Bounds](https://reader035.vdocuments.net/reader035/viewer/2022062410/56816260550346895dd2bd02/html5/thumbnails/33.jpg)
Construction for the lower bound
Proof by induction on k = 0, …, n
[Figure: an execution αk βk γk]
• αk: p1 … pk perform WriteMax operations
• βk: writes by p1 … pk to R1 … Rk
• γk: pn performs a ReadMax operation, which reads R1 … Rk
Base case is simple
Taking k = n yields the result
![Page 34: Two Techniques for Proving Lower Bounds](https://reader035.vdocuments.net/reader035/viewer/2022062410/56816260550346895dd2bd02/html5/thumbnails/34.jpg)
Inductive Step
[Figure: the execution αk βk γk, with WriteMax operations by pk+1 inserted after αk]
• pk+1 performs WriteMax operations; it must write to some R ∉ {R1 … Rk}
• Otherwise, the writes βk by p1 … pk to R1 … Rk overwrite everything pk+1 wrote, and the ReadMax operation by pn in γk does not observe pk+1
![Page 35: Two Techniques for Proving Lower Bounds](https://reader035.vdocuments.net/reader035/viewer/2022062410/56816260550346895dd2bd02/html5/thumbnails/35.jpg)
Inductive Step (cont.)
[Figure: the execution αk πk βk γk, where πk consists of steps by pk+1 performing WriteMax operations]
• pk+1 must write to some R ∉ {R1 … Rk}
• The ReadMax operation by pn in γk must read R ∉ {R1 … Rk}
![Page 36: Two Techniques for Proving Lower Bounds](https://reader035.vdocuments.net/reader035/viewer/2022062410/56816260550346895dd2bd02/html5/thumbnails/36.jpg)
Inductive Step (cont.)
[Figure: the execution αk πk βk γk, where in πk process pk+1 performs WriteMax operations and is about to write to Rk+1]
Claim follows with R1 … Rk, Rk+1 and αk+1 = αk πk
![Page 37: Two Techniques for Proving Lower Bounds](https://reader035.vdocuments.net/reader035/viewer/2022062410/56816260550346895dd2bd02/html5/thumbnails/37.jpg)
Swap objects
Theorem holds for other primitives and objects, e.g., (register-to-memory) swap
Need some care in constructing πk, γk
swap(R,v) {
  tmp = R
  R = v
  return tmp
}
![Page 38: Two Techniques for Proving Lower Bounds](https://reader035.vdocuments.net/reader035/viewer/2022062410/56816260550346895dd2bd02/html5/thumbnails/38.jpg)
Result holds also for other objects
• E.g., counters
• Constructed execution contains many increment operations
• Better algorithms when
  ▫ Few increment operations
  ▫ Max register holds bounded values
[Aspnes, Attiya, Censor-Hillel]
![Page 39: Two Techniques for Proving Lower Bounds](https://reader035.vdocuments.net/reader035/viewer/2022062410/56816260550346895dd2bd02/html5/thumbnails/39.jpg)
Counters with CAS
Counters can be implemented with a single location R, and a single CAS per operation:
• To increment, simply:
  ▫ read previous value from R
  ▫ CAS +1 to R
• To read the counter, simply read R
Lots of contention on R! This is inevitable
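A minimal sketch of such a counter follows. Two things here are assumptions and not part of the slide: the `CasRegister` class, which uses a lock only to simulate an atomic hardware CAS, and the retry loop, which repeats the read and the CAS when another increment got in between.

```python
import threading


class CasRegister:
    """A single location R with read and CAS; the lock only simulates atomicity."""

    def __init__(self, value=0):
        self._value = value
        self._lock = threading.Lock()

    def read(self):
        with self._lock:
            return self._value

    def cas(self, old, new):
        with self._lock:
            if self._value == old:
                self._value = new
                return True
            return False


def increment(register):
    """Read the previous value from R, then CAS value + 1 into R (retry on failure)."""
    while True:
        previous = register.read()
        if register.cas(previous, previous + 1):
            return


def read_counter(register):
    """Reading the counter is a single read of R."""
    return register.read()
```

Every operation touches the same location R, which is the contention the slide refers to: when many processes increment at once, all but one CAS in each batch fails and must be retried.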
![Page 40: Two Techniques for Proving Lower Bounds](https://reader035.vdocuments.net/reader035/viewer/2022062410/56816260550346895dd2bd02/html5/thumbnails/40.jpg)
The memory stalls measure
[Dwork, Herlihy, Waarts]
If k processes access (or modify) the same location at the same configuration
  ▫ The first process incurs one step, and no stalls
  ▫ The second process incurs one step, and one stall
  ▫ …
  ▫ The k’th process incurs one step, and k-1 stalls
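Summing the per-process stalls gives the total cost of one such simultaneous access. As a worked step, with the i-th process incurring i-1 stalls:

```latex
\sum_{i=1}^{k} (i-1) \;=\; \frac{k(k-1)}{2} \ \text{stalls in total},
\qquad
\text{plus } k \text{ steps (one per process)}
```

The last process alone incurs k-1 stalls plus one step, so a single operation can be charged a cost linear in the number of processes poised on the same location.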
![Page 41: Two Techniques for Proving Lower Bounds](https://reader035.vdocuments.net/reader035/viewer/2022062410/56816260550346895dd2bd02/html5/thumbnails/41.jpg)
Lower bound on number of stalls
Theorem: ReadCounter must incur n stalls + steps.
[Figure: p1 … pk perform Increment operations and are poised on R1 … Rm, m ≤ k; then pn performs a ReadCounter operation that accesses R1 … Rm]
Similar construction as in previous theorem
![Page 42: Two Techniques for Proving Lower Bounds](https://reader035.vdocuments.net/reader035/viewer/2022062410/56816260550346895dd2bd02/html5/thumbnails/42.jpg)
The ReadCounter operation by pn incurs k stalls + steps
![Page 43: Two Techniques for Proving Lower Bounds](https://reader035.vdocuments.net/reader035/viewer/2022062410/56816260550346895dd2bd02/html5/thumbnails/43.jpg)
Wrap-up
• There are many lower bound results
• But fewer techniques…
• Some results & techniques are relevant to questions asked in Transform
• Material is based on monograph-in-writing with Faith Ellen
  ▫ Let me know if you want to proof-read it!