Probability in Computing
© 2010, Quoc Le & Van Nguyen Probability for Computing 1
LECTURE 5: MORE APPLICATIONS WITH PROBABILISTIC ANALYSIS, BINS AND BALLS
Agenda
Review: Coupon Collector's problem and Packet Sampling
Analysis of Quick-Sort
Birthday Paradox and applications
The Bins and Balls Model
Coupon Collector Problem
Problem: Suppose that each box of cereal contains one of n different coupons. Once you obtain one of every type of coupon, you can send in for a prize.
Question: How many boxes of cereal must you buy before obtaining at least one of every type of coupon?
Let X be the number of boxes bought until at least one of every type of coupon is obtained.
$E[X] = nH(n) \approx n \ln n$, where $H(n) = 1 + 1/2 + \cdots + 1/n$ is the n-th harmonic number.
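The expectation above can be checked numerically. The following is a minimal sketch, assuming every coupon type is equally likely in each box; the function names `expected_boxes` and `simulate_boxes` are illustrative and not part of the lecture.

```python
import random
from fractions import Fraction


def expected_boxes(n: int) -> float:
    """Exact expectation n * H(n) for n equally likely coupon types."""
    harmonic = sum(Fraction(1, i) for i in range(1, n + 1))
    return float(n * harmonic)


def simulate_boxes(n: int, rng: random.Random) -> int:
    """Buy boxes until all n coupon types have been seen; return the count."""
    seen = set()
    boxes = 0
    while len(seen) < n:
        seen.add(rng.randrange(n))  # each box holds a uniformly random coupon
        boxes += 1
    return boxes


if __name__ == "__main__":
    rng = random.Random(2010)  # fixed seed so the run is repeatable
    n, trials = 50, 2000
    average = sum(simulate_boxes(n, rng) for _ in range(trials)) / trials
    print(f"exact n*H(n) = {expected_boxes(n):.2f}, simulated = {average:.2f}")
```

The simulated average should land close to the exact value, and both sit somewhat above n ln n because H(n) exceeds ln n by a constant.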
Application: Packet Sampling
Sampling packets on a router with probability p:
The number of packets transmitted after the last sampled packet, up to and including the next sampled packet, is geometrically distributed.
From the point of view of the destination host, determining all the routers on the path is like a coupon collector's problem.
If there are n routers, then the expected number of packets that must arrive before the destination host knows all of the routers on the path is approximately n ln(n).
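The geometric gap between sampled packets can be illustrated with a short simulation. This is a sketch under the assumption that each packet is sampled independently with the same probability p; the helper names are hypothetical.

```python
import random


def gap_to_next_sample(p: float, rng: random.Random) -> int:
    """Count packets after the last sample, up to and including the next one."""
    packets = 1
    while rng.random() >= p:  # this packet was not sampled, keep counting
        packets += 1
    return packets


def geometric_pmf(p: float, k: int) -> float:
    """Pr[gap = k] = (1 - p)^(k - 1) * p for k = 1, 2, ..."""
    return (1.0 - p) ** (k - 1) * p


if __name__ == "__main__":
    rng = random.Random(7)
    p = 0.25
    gaps = [gap_to_next_sample(p, rng) for _ in range(20000)]
    print(f"mean gap = {sum(gaps) / len(gaps):.3f}, expected 1/p = {1 / p:.3f}")
```

A geometric random variable with parameter p has mean 1/p, so the printed mean gap should be close to 4 for p = 0.25.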
DoS attack
IP traceback
Marking and Reconstruction
Node append vs. node sampling
Node append
[Figure: attackers A1, A2, A3 reach the victim V through routers R1 to R7. Each router on the path appends its address to the packet D, so the mark grows hop by hop: D, D R6, D R6 R3, D R6 R3 R2, D R6 R3 R2 R1.]
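Node Sampling
[Figure: attackers A1, A2, A3 reach the victim V through routers R1 to R7. The packet D1 carries a single mark, initially R7. At router R2, with p = 0.51, the draw x = 0.2 < p, so the mark is overwritten and the packet becomes D1 R2.]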
Expected Run-Time of QuickSort
Analysis
Worst case: $\Theta(n^2)$.
The running time depends on how we choose the pivot.
Good pivot (divides the list into two sub-lists of nearly equal length) vs. bad pivot.
With a good pivot at every step, the running time is $O(n \lg n)$ [by solving the recurrence].
If we choose the pivot uniformly at random, we get a randomized version of QuickSort.
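The randomized version described above can be sketched as follows. This is an illustrative, out-of-place variant that returns a new list and counts comparisons against the pivot; the name `randomized_quicksort` and the one-element `counter` list are assumptions made for this example, not the lecture's own code.

```python
import random


def randomized_quicksort(items, rng, counter):
    """Return a sorted copy of items; counter[0] accumulates comparisons."""
    if len(items) <= 1:
        return list(items)
    pivot_index = rng.randrange(len(items))  # pivot chosen uniformly at random
    pivot = items[pivot_index]
    smaller, larger = [], []
    for index, value in enumerate(items):
        if index == pivot_index:
            continue
        counter[0] += 1  # one comparison of value against the pivot
        if value < pivot:
            smaller.append(value)
        else:
            larger.append(value)
    return (randomized_quicksort(smaller, rng, counter)
            + [pivot]
            + randomized_quicksort(larger, rng, counter))


if __name__ == "__main__":
    rng = random.Random(42)
    data = list(range(1000))
    rng.shuffle(data)
    counter = [0]
    result = randomized_quicksort(data, rng, counter)
    print(result == sorted(data), "comparisons:", counter[0])
```

Each call compares every other element with the pivot exactly once, so the counter measures the quantity whose expectation is analyzed in the slides.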
Analysis
Let $y_1, y_2, \dots, y_n$ be the input values in sorted order. For $i < j$, let $X_{ij}$ be a random variable that takes value 1 if $y_i$ and $y_j$ are compared with each other, and 0 if they are not compared.
Let X be the total number of comparisons: $E[X] = \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} E[X_{ij}]$
$E[X_{ij}] = 2/(j-i+1)$
Consider the first time the set $Y^{ij} = \{y_i, y_{i+1}, \dots, y_j\}$ is "touched" by a pivot. If this pivot is either $y_i$ or $y_j$, then the two will be compared; otherwise they never are, because the pivot splits them into different sub-lists $(S_1, S_2)$.
Using $k = j-i+1$, we can compute $E[X] \approx 2n \ln n$.
Detailed analysis
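The slide content for this step did not survive extraction. The following is a sketch of the standard calculation, using the substitution $k = j-i+1$ and the harmonic number $H(n)$ from the coupon collector review; it is a reconstruction under those definitions rather than the authors' own slide.

```latex
\begin{align*}
E[X] &= \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} \frac{2}{j-i+1}
      = \sum_{i=1}^{n-1} \sum_{k=2}^{n-i+1} \frac{2}{k} \\
     &= \sum_{k=2}^{n} \sum_{i=1}^{n+1-k} \frac{2}{k}
      = \sum_{k=2}^{n} (n+1-k)\,\frac{2}{k} \\
     &= (n+1) \sum_{k=2}^{n} \frac{2}{k} - 2(n-1)
      = (2n+2)\,H(n) - 4n .
\end{align*}
```

Since $H(n) = \ln n + \Theta(1)$, this gives $E[X] = 2n \ln n + \Theta(n)$.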
Birthday “Paradox”
What is the probability that two persons in a room of 30 have the same birthday?
Birthday Paradox
Ways to assign birthdays to k people without duplicates:
$N = 365 \cdot 364 \cdots (365 - k + 1) = 365! / (365 - k)!$
Ways to assign birthdays to k people with possible duplicates:
$D = 365 \cdot 365 \cdots 365 = 365^k$
Birthday “Paradox”
Assuming real birthdays are assigned uniformly at random:
$N/D$ = probability there are no duplicates
$1 - N/D$ = probability there is a duplicate
$= 1 - 365! / ((365 - k)! \cdot 365^k)$
Generalizing Birthdays
$P(n, k) = 1 - \frac{n!}{(n-k)! \, n^k}$
Given k random selections from n possible values, P(n, k) gives the probability that there is at least 1 duplicate.
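The formula for P(n, k) can be evaluated exactly with rational arithmetic. This is a minimal sketch; the function name `duplicate_probability` is illustrative, and the product form is used so that no large factorials are needed.

```python
from fractions import Fraction


def duplicate_probability(n: int, k: int) -> float:
    """P(n, k): probability that k uniform draws from n values contain a duplicate."""
    if k > n:
        return 1.0  # pigeonhole: more draws than values forces a repeat
    no_duplicate = Fraction(1)
    for i in range(k):
        no_duplicate *= Fraction(n - i, n)  # the (i+1)-th draw avoids i used values
    return float(1 - no_duplicate)


if __name__ == "__main__":
    for k in (20, 23, 30):
        print(f"P(365, {k}) = {duplicate_probability(365, k):.4f}")
```

For the room of 30 people asked about earlier, the probability of a shared birthday comes out at roughly 0.71.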
Birthday Probabilities
P(some two match) = 1 - P(all are different)
P(2 chosen from N are different) $= 1 - 1/N$
P(3 are all different) $= (1 - 1/N)(1 - 2/N)$
P(k trials are all different) $= (1 - 1/N)(1 - 2/N) \cdots (1 - (k-1)/N)$
Writing P for the probability that k trials are all different:
$\ln P = \ln(1 - 1/N) + \ln(1 - 2/N) + \cdots + \ln(1 - (k-1)/N)$
Happy Birthday Bob!
$\ln P = \ln(1 - 1/N) + \cdots + \ln(1 - (k-1)/N)$
For $0 < x < 1$: $\ln(1 - x) < -x$
$\ln P < -(1/N + 2/N + \cdots + (k-1)/N)$
Gauss says: $1 + 2 + 3 + 4 + \cdots + (n-1) + n = \frac{1}{2} n(n+1)$
So,
$\ln P < -\frac{1}{2}(k-1)k/N$
$P < e^{-\frac{1}{2}(k-1)k/N}$
Probability of match $> 1 - e^{-\frac{1}{2}(k-1)k/N}$
Applying Birthdays
$P(n, k) > 1 - e^{-k(k-1)/(2n)}$
For n = 365, k = 20:
$P(365, 20) > 1 - e^{-20 \cdot 19/(2 \cdot 365)}$
$P(365, 20) > .4058$
For $n = 2^{64}$, $k = 2^{32}$: $P(2^{64}, 2^{32}) > .39$
For $n = 2^{64}$, $k = 2^{33}$: $P(2^{64}, 2^{33}) > .86$
For $n = 2^{64}$, $k = 2^{34}$: $P(2^{64}, 2^{34}) > .9996$
Application: Digital Signatures
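The numerical bounds listed above can be reproduced directly from the exponential lower bound. This is a small sketch; `birthday_lower_bound` is an illustrative name, and `math.expm1` is used only to keep precision when the exponent is small.

```python
import math


def birthday_lower_bound(n: int, k: int) -> float:
    """Lower bound 1 - exp(-k(k-1)/(2n)) on the probability of a duplicate."""
    return -math.expm1(-k * (k - 1) / (2 * n))


if __name__ == "__main__":
    cases = [(365, 20), (2**64, 2**32), (2**64, 2**33), (2**64, 2**34)]
    for n, k in cases:
        print(f"n = {n}, k = {k}: P > {birthday_lower_bound(n, k):.4f}")
```

The 64-bit cases show why about $2^{32}$ hash evaluations already give a substantial chance of a collision.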
Digital Signature Scheme: Using Hash Functions
A hash function H maps a message of variable length n bits to a fingerprint of fixed length m bits, with m < n.
This hash value is also called a digest (of the original message).
Since n > m, there exist many messages X that map to the same digest: a collision.
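DS schemes with hash functions
[Figure: Signature Generator: the message X is hashed to H(X), transformed with D_A, and concatenated with X to give X.D_A(H(X)). Signature Verifier: takes X.D_A(H(X)), applies E_A to the signature part and H to X, and compares the two results; 0: Accept, 1: Reject.]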
Main properties
Given a hash function $H: X \to Y$
Long message → short, fixed-length hash
One-way property: given $y \in Y$, it is computationally infeasible to find a value $x \in X$ s.t. H(x) = y
Collision resistance (collision-free): it is computationally infeasible to find any two distinct values $x', x \in X$ s.t. H(x') = H(x)
This property protects against signature forgery
Collisions
Avoiding collisions is theoretically impossible
Dirichlet principle: if n+1 rabbits are put into n cages, at least 2 rabbits go to the same cage
This suggests exhaustive search: try |Y|+1 messages and a collision must be found ($H: X \to Y$)
In practice
Choose |Y| large enough so that exhaustive search is computationally infeasible.
|Y| should not be too large, or signatures become long and processing slow
However, collision-freeness is still hard to achieve
Birthday attack
Can hash values be of 64 bits?
This looks good initially, since a space of size $2^{64}$ is too large to search exhaustively or to compute that many hash values.
However, a birthday attack can easily break a DS with a 64-bit hash function.
In fact, the attacker only needs to create a bunch of $2^{32}$ messages and then launch the attack with a reasonably high probability of success.
How the attack works
Goal: given H, find x, x' such that H(x) = H(x')
Algorithm:
pick a random set S of q values in X
for each $x \in S$, compute $h_x = H(x)$
if $h_x = h_{x'}$ for some $x' \neq x$ then collision found: (x, x'), else fail
The average success probability is approximately $\varepsilon = 1 - \exp(-q(q-1)/(2|Y|))$
Suppose Y has size $2^m$. Choosing $q \approx 2^{m/2}$ gives $\varepsilon \approx 0.39$, and $q \approx 1.17 \cdot 2^{m/2}$ gives $\varepsilon \approx 0.5$.
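The algorithm above can be tried on a deliberately weak hash. This is a toy sketch: it assumes a hash formed by keeping only the top `bits` bits of SHA-256, and it enumerates messages of the hypothetical form `message-<i>` instead of drawing a random set S. Neither choice comes from the lecture.

```python
import hashlib


def truncated_hash(message: bytes, bits: int) -> int:
    """Toy hash: the top `bits` bits of SHA-256 (an assumption for illustration)."""
    digest = int.from_bytes(hashlib.sha256(message).digest(), "big")
    return digest >> (256 - bits)


def birthday_attack(bits: int, q: int):
    """Hash q distinct messages; return a colliding pair (x, x') or None."""
    seen = {}  # maps hash value h_x to the message x that produced it
    for counter in range(q):
        x = b"message-%d" % counter
        h = truncated_hash(x, bits)
        if h in seen:
            return seen[h], x  # two distinct messages with the same hash
        seen[h] = x
    return None


if __name__ == "__main__":
    bits = 32
    pair = birthday_attack(bits, 4 * 2 ** (bits // 2))  # about 4 * 2^(m/2) tries
    print("collision:", pair)
```

With q set to a small multiple of $2^{m/2}$ the search usually succeeds, while a run with too few messages returns `None`, matching the "else fail" branch.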
Balls into Bins
We have m balls that are thrown into n bins, with the location of each ball chosen independently and uniformly at random from n possibilities.
What does the distribution of the balls into the bins look like?
“Birthday paradox” question: is there a bin with at least 2 balls?
How many of the bins are empty?
How many balls are in the fullest bin?
Answers to these questions give solutions to many problems in the design and analysis of algorithms.
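The maximum load
When n balls are thrown independently and uniformly at random into n bins, the probability that the maximum load is more than $3 \ln n / \ln \ln n$ is at most $1/n$ for n sufficiently large.
By the union bound, with $M = 3 \ln n / \ln \ln n$: $\Pr[\text{bin 1 receives at least } M \text{ balls}] \le \binom{n}{M} \left(\frac{1}{n}\right)^M$
Note that: $\binom{n}{M} \left(\frac{1}{n}\right)^M \le \frac{1}{M!} \le \left(\frac{e}{M}\right)^M$
Now, using the union bound again, Pr[any bin receives at least M balls] is at most $n \left(\frac{e}{M}\right)^M = n \left(\frac{e \ln \ln n}{3 \ln n}\right)^{3 \ln n / \ln \ln n}$,
which is at most $1/n$ for n sufficiently large.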
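The statement about the maximum load can be explored empirically. This is a simulation sketch, not a proof; `max_load` and `load_threshold` are illustrative names, and the threshold is only meaningful for n large enough that ln ln n is positive.

```python
import math
import random
from collections import Counter


def max_load(n: int, rng: random.Random) -> int:
    """Throw n balls into n bins uniformly at random; return the fullest bin's load."""
    loads = Counter(rng.randrange(n) for _ in range(n))
    return max(loads.values())


def load_threshold(n: int) -> float:
    """The bound 3 ln n / ln ln n from the statement above (needs n >= 3)."""
    return 3 * math.log(n) / math.log(math.log(n))


if __name__ == "__main__":
    rng = random.Random(27)
    n = 10000
    observed = [max_load(n, rng) for _ in range(20)]
    print(f"largest observed load = {max(observed)}, threshold = {load_threshold(n):.2f}")
```

In runs of this kind the observed maximum typically stays well below the threshold, which is consistent with the bound being loose by a constant factor.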
Application: Bucket Sort
A sorting algorithm that breaks the $\Omega(n \log n)$ lower bound under certain input assumptions
Bucket sort works as follows:
Set up an array of initially empty "buckets."
Scatter: Go over the original array, putting each object in its bucket.
Sort each non-empty bucket.
Gather: Visit the buckets in order and put all elements back into the original array.
A set of $n = 2^m$ integers, randomly chosen from $[0, 2^k)$, $k \ge m$, can be sorted in expected time O(n)
Why: will analyze later!
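The scatter, sort, and gather steps above can be sketched as follows. This version assumes the bucket of a value is given by its top m bits, so that $2^m$ buckets cover $[0, 2^k)$ in order, and it returns a new list instead of writing back into the original array; both are choices made for this example.

```python
def bucket_sort(values, m: int, k: int):
    """Sort integers from [0, 2**k) using 2**m buckets keyed by the top m bits."""
    if k < m:
        raise ValueError("requires k >= m")
    shift = k - m
    buckets = [[] for _ in range(2 ** m)]  # set up initially empty buckets
    for value in values:                   # scatter: top m bits pick the bucket
        buckets[value >> shift].append(value)
    result = []
    for bucket in buckets:                 # sort each bucket, then gather in order
        bucket.sort()
        result.extend(bucket)
    return result


if __name__ == "__main__":
    import random

    rng = random.Random(28)
    m, k = 10, 20
    data = [rng.randrange(2 ** k) for _ in range(2 ** m)]
    print(bucket_sort(data, m, k) == sorted(data))
```

With n = 2^m uniformly random inputs, each bucket receives one element on average, which is the balls-and-bins situation behind the expected O(n) claim.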
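The Poisson Distribution
Consider m balls, n bins
Pr[a given bin is empty] $= (1 - 1/n)^m \approx e^{-m/n}$
Let $X_j$ be an indicator r.v. that is 1 if bin j is empty, 0 otherwise
Let X be a r.v. that represents the number of empty bins: $E[X] = \sum_{j=1}^{n} E[X_j] = n (1 - 1/n)^m \approx n e^{-m/n}$
Generalizing this argument, Pr[a given bin has r balls] $= \binom{m}{r} \left(\frac{1}{n}\right)^r \left(1 - \frac{1}{n}\right)^{m-r}$
Approximately, $p_r \approx \frac{e^{-m/n} (m/n)^r}{r!}$
So: the number of balls in a given bin is approximately Poisson distributed with mean $\mu = m/n$.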
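The quality of the Poisson approximation can be inspected numerically. This is a minimal sketch comparing the exact binomial probability with the Poisson value for mean m/n; the function names are illustrative, and `math.comb` requires Python 3.8 or later.

```python
import math


def binomial_pmf(m: int, n: int, r: int) -> float:
    """Exact Pr[a given bin holds r balls] for m balls and n bins."""
    p = 1.0 / n
    return math.comb(m, r) * p ** r * (1.0 - p) ** (m - r)


def poisson_pmf(mu: float, r: int) -> float:
    """Poisson approximation e^(-mu) * mu^r / r! with mu = m / n."""
    return math.exp(-mu) * mu ** r / math.factorial(r)


def expected_empty_bins(m: int, n: int) -> float:
    """E[X] = n * (1 - 1/n)^m, the expected number of empty bins."""
    return n * (1.0 - 1.0 / n) ** m


if __name__ == "__main__":
    m = n = 1000
    for r in range(5):
        print(r, f"{binomial_pmf(m, n, r):.5f}", f"{poisson_pmf(m / n, r):.5f}")
    print("expected empty bins:", f"{expected_empty_bins(m, n):.2f}")
```

For m = n the two columns agree to about three decimal places, and the expected number of empty bins is close to n/e.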
Limit of the Binomial Distribution