Randomized algorithms. Pasi Fränti, 1.10.2014. Treasure island: treasure worth 20.000 awaits 5000 daa...
To buy or not to buy
Buy the map:
  20000 – 5000 – 3000 = 12.000
Take a chance:
  20000 – 5000 = 15.000
  or 20000 – 5000 – 5000 = 10.000
Expected result when taking a chance: 0.5 ∙ 15000 + 0.5 ∙ 10000 = 12.500
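The comparison above is a plain expected-value calculation. A minimal sketch in Python, where the helper name `expected_value` is only illustrative and the probabilities and payoffs are the ones from the slide:

```python
def expected_value(outcomes):
    """Sum of probability * payoff over (probability, payoff) pairs."""
    return sum(p * v for p, v in outcomes)

# Buying the map gives a certain outcome.
buy_map = expected_value([(1.0, 20000 - 5000 - 3000)])

# Taking a chance: two equally likely outcomes.
take_chance = expected_value([(0.5, 20000 - 5000),
                              (0.5, 20000 - 5000 - 5000)])
```

With these numbers `take_chance` exceeds `buy_map`, so on average the gamble pays slightly better than the map.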
Three types of randomization
1. Las Vegas
   - Output is always a correct result
   - Result is not always found
   - Probability of success p
2. Monte Carlo
   - Result is always found
   - Result can be inaccurate (or even false!)
   - Probability of success p
3. Sherwood
   - Balancing the worst case behavior
Las Vegas
Input: Binary vector A[1, n]
Output: Index of any 1-bit from A
LV(A, n)
  REPEAT
    k ← RAND(1, n);
  UNTIL A[k]=1;
  RETURN k
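The LV routine can be sketched in Python as follows. The function name, the 0-based indexing, and the up-front check for an all-zero vector are assumptions added for this sketch; the slide's version would simply loop forever on such input.

```python
import random

def las_vegas_find_one(bits, rng=random):
    """Return the index of some 1-bit; the answer is always correct."""
    # Guard added for the sketch: without any 1-bit the loop never ends.
    if 1 not in bits:
        raise ValueError("vector contains no 1-bit")
    while True:
        k = rng.randrange(len(bits))  # uniform index in 0..n-1
        if bits[k] == 1:
            return k
```

The running time is random, but whenever the function returns, the result is a valid index of a 1-bit.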
Revise
8-Queens puzzle
INPUT: Eight chess queens and an 8×8 chessboard
OUTPUT: Setup where no queens attack each other
8-Queens brute force
Brute force
• Try all positions
• Mark illegal squares
• Backtrack if dead-end
• 114 setups in total
Random
• Select positions randomly
• If dead-end, start over
Randomized
• Select k rows randomly
• Remaining rows by brute force
[Figure: chessboard with the queens placed so far. Where next…?]
Pseudo code
8-Queens(k)
  FOR i=1 TO k DO                        // k queens randomly
    r ← Random[1,8];
    IF Board[i,r]=TAKEN THEN RETURN Fail;
    ELSE ConquerSquare(i,r);
  FOR i=k+1 TO 8 DO                      // Rest by brute force
    r ← 1; found ← NO;
    WHILE (r≤8) AND (NOT found) DO
      IF Board[i,r] NOT TAKEN THEN ConquerSquare(i,r); found ← YES;
      ELSE r ← r+1;
    IF NOT found THEN RETURN Fail;

ConquerSquare(i,j)
  Board[i,j] ← QUEEN;
  FOR z=i+1 TO 8 DO                      // skip columns outside 1..8
    Board[z,j] ← TAKEN;
    Board[z,j-(z-i)] ← TAKEN;
    Board[z,j+(z-i)] ← TAKEN;
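A minimal Python sketch of the randomized variant is given below. It is not the slide's exact procedure: it assumes the remaining rows are filled by backtracking search (the pseudo code above takes the first free square without backtracking), and the function names are hypothetical.

```python
import random

def place_random_then_search(k, n=8, rng=random):
    """Place k queens in random columns, finish by backtracking.

    Returns cols (cols[i] = column of the queen in row i) or None on failure.
    """
    cols = []

    def safe(row, col):
        # No earlier queen shares the column or a diagonal.
        return all(c != col and abs(c - col) != row - r
                   for r, c in enumerate(cols))

    # First k rows: one random column each; fail at the first conflict.
    for row in range(k):
        col = rng.randrange(n)
        if not safe(row, col):
            return None
        cols.append(col)

    # Remaining rows: exhaustive search with backtracking.
    def extend(row):
        if row == n:
            return True
        for col in range(n):
            if safe(row, col):
                cols.append(col)
                if extend(row + 1):
                    return True
                cols.pop()
        return False

    return cols if extend(k) else None

def las_vegas_queens(k, n=8, rng=random):
    """Repeat the randomized attempt until it succeeds."""
    while True:
        result = place_random_then_search(k, n, rng)
        if result is not None:
            return result
```

A call such as `las_vegas_queens(2)` returns one valid placement; small values of k keep the number of restarts low.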
Probability of success
s = processing time in case of success
e = processing time in case of failure
p = probability of success
q = 1-p = probability of failure
Expected time t:
$t = p s + q (e + t) = p s + q e + q t$
$t - q t = p s + q e$
$p t = p s + q e$
$t = s + \frac{q}{p} e$
Example:
s=e=1, p=1/6
t = 1 + (5/6)/(1/6) ∙ 1 = 6
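The closed form t = s + (q/p)·e is easy to check numerically. A small sketch, where the function name `expected_time` is only illustrative and exact fractions are used to avoid rounding:

```python
from fractions import Fraction

def expected_time(s, e, p):
    """Expected total time t = s + (q/p) * e of a Las Vegas algorithm.

    s: time of a successful run, e: time of a failed run,
    p: probability of success (0 < p <= 1).
    """
    if not 0 < p <= 1:
        raise ValueError("p must be in (0, 1]")
    q = 1 - p
    return s + (q / p) * e

t = expected_time(1, 1, Fraction(1, 6))  # the example: s = e = 1, p = 1/6
```

For the example values this gives t = 6, matching the calculation above.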
Experiments with varying k
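(k = number of randomly placed queens; S, E = time in case of success / failure; T = expected total time; P = probability of success)

k    S      E      T      P
0    114    -      114    100%
1    39.6   -      39.6   100%
2    22.5   36.7   25.2   88%
3    13.5   15.1   29.0   49%
4    10.3   8.8    35.1   26%
5    9.3    7.3    46.9   16%
6    9.1    7      53.5   14%
7    9      7      56.0   13%
8    9      7      56.0   13%

Fastest expected time: T = 25.2 at k = 2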
Two centroids, but only one cluster.
One centroid, but two clusters.
Swap-based clustering
Clustering by Random Swap
RandomSwap(X) → C, P
  C ← SelectRandomRepresentatives(X);
  P ← OptimalPartition(X, C);
  REPEAT T times
    (Cnew, j) ← RandomSwap(X, C);
    Pnew ← LocalRepartition(X, Cnew, P, j);
    Cnew, Pnew ← Kmeans(X, Cnew, Pnew);
    IF f(Cnew, Pnew) < f(C, P) THEN
      (C, P) ← Cnew, Pnew;
  RETURN (C, P);
P. Fränti and J. Kivijärvi, "Randomised local search algorithm for the clustering problem", Pattern Analysis and Applications, 3 (4), 358-369, 2000.
Select random neighbor
Accept only if it improves
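1. Random swap:
   $c_j \leftarrow x_i, \quad j = \mathrm{random}(1, M), \; i = \mathrm{random}(1, N)$
2. Re-partition vectors from old cluster:
   $p_i \leftarrow \arg\min_{1 \le k \le M} d(x_i, c_k)^2 \quad \forall i : p_i = j$
3. Create new cluster:
   $p_i \leftarrow \arg\min_{k = j \,\lor\, k = p_i} d(x_i, c_k)^2 \quad \forall i \in [1, N]$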
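The random swap loop can be sketched in plain Python as below. This is a simplified illustration, not the published implementation: it repartitions all vectors after each swap instead of doing the local repartition of steps 2 and 3, it uses the sum of squared distances as the cost f, and all function names and the `kmeans_iters` parameter are assumptions of the sketch.

```python
import random

def sq_dist(a, b):
    return sum((u - v) ** 2 for u, v in zip(a, b))

def partition(X, C):
    """Index of the nearest centroid for every vector."""
    return [min(range(len(C)), key=lambda k: sq_dist(x, C[k])) for x in X]

def centroids(X, P, C):
    """Mean of each cluster; an empty cluster keeps its old centroid."""
    new_C = []
    for k in range(len(C)):
        members = [x for x, p in zip(X, P) if p == k]
        if members:
            new_C.append(tuple(sum(col) / len(members) for col in zip(*members)))
        else:
            new_C.append(C[k])
    return new_C

def cost(X, C, P):
    return sum(sq_dist(x, C[p]) for x, p in zip(X, P))

def random_swap(X, M, T, kmeans_iters=2, rng=random):
    C = rng.sample(X, M)                 # random representatives
    P = partition(X, C)
    for _ in range(T):
        C_new = list(C)
        C_new[rng.randrange(M)] = rng.choice(X)   # swap one centroid
        P_new = partition(X, C_new)               # full repartition (simplification)
        for _ in range(kmeans_iters):             # fine-tune by k-means
            C_new = centroids(X, P_new, C_new)
            P_new = partition(X, C_new)
        if cost(X, C_new, P_new) < cost(X, C, P): # accept only if it improves
            C, P = C_new, P_new
    return C, P
```

Here `X` is a list of equal-length tuples, `M` the number of clusters and `T` the number of swap iterations.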
Clustering by Random Swap
Choices for swap
Swap is made from a centroid-rich area to a centroid-poor area.
O(M) clusters to be removed × O(M) clusters where to add = $O(M^2)$ different choices in total
Select a proper centroid for removal:
– M clusters in total: $p_{\text{removal}} = 1/M$.
Select a proper new location:
– N choices: $p_{\text{add}} = 1/N$
– M of them significantly different: $p_{\text{add}} = 1/M$
In total:
– $M^2$ significantly different swaps.
– Probability of each is $p_{\text{swap}} = 1/M^2$
– Open question: how many of these are good?
– Theorem: α are good for add and removal.
Probability for successful Swap
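Probability of not finding a good swap in T iterations:
$q = \left(1 - \frac{\alpha^2}{M^2}\right)^T$
$\log q = T \cdot \log\left(1 - \frac{\alpha^2}{M^2}\right)$
Estimated number of iterations:
$T = \frac{\log q}{\log\left(1 - \alpha^2/M^2\right)}$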
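The estimate for T can be evaluated directly. A small sketch, assuming the probability of a good swap is (α/M)² as on the previous slide; the function name `iterations_needed` is only illustrative:

```python
import math

def iterations_needed(q, alpha, M):
    """Iterations T such that the failure probability (1 - (alpha/M)^2)^T equals q."""
    if not 0 < q < 1:
        raise ValueError("q must be in (0, 1)")
    if not 0 < alpha < M:
        raise ValueError("alpha must be in (0, M)")
    p_good = (alpha / M) ** 2          # probability that one swap is good
    return math.log(q) / math.log(1 - p_good)

T = iterations_needed(q=0.05, alpha=1, M=10)
```

For example, with M = 10 clusters, α = 1 and an accepted failure probability of 5%, roughly 300 iterations are needed.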
Clustering by Random Swap
Iterated T times
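Bounds for the iterations
Upper limit:
$T = \frac{\ln q}{\ln\left(1 - \alpha^2/M^2\right)} \le \frac{\ln q}{-\alpha^2/M^2} = -\ln q \cdot \frac{M^2}{\alpha^2}$
Lower limit similarly; resulting in:
$T \approx -\ln q \cdot \frac{M^2}{\alpha^2}$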
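The step behind the upper limit is the standard inequality ln(1 − x) ≤ −x for 0 < x < 1. Spelled out with x = α²/M² (a worked derivation added here for clarity, using only the symbols defined above):

```latex
\[
\ln(1 - x) = -x - \frac{x^2}{2} - \frac{x^3}{3} - \dots \le -x
\qquad (0 < x < 1)
\]
\[
\text{Since } \ln q < 0:\quad
T = \frac{\ln q}{\ln(1 - x)} = \frac{-\ln q}{-\ln(1 - x)}
\le \frac{-\ln q}{x} = -\ln q \cdot \frac{M^2}{\alpha^2},
\qquad x = \frac{\alpha^2}{M^2}.
\]
```

When x is small the two sides are nearly equal, which is why the bound is also a good approximation of T.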
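Total time complexity
Number of iterations needed (T):
$T \approx -\ln q \cdot \frac{M^2}{\alpha^2}$
Time complexity of single step (t):
$t = O(\alpha N)$
Total time:
$T(N, M) = -\ln q \cdot \frac{M^2}{\alpha^2} \cdot \alpha N = -\ln q \cdot \frac{N M^2}{\alpha}$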
Monte Carlo
Input: A bit vector A[1, n], iterations I
Output: An index of any 1-bit from A
MC(A, n, I)
  i ← 0;
  DO
    k ← RAND(1, n);
    i ← i + 1;
  WHILE (A[k]≠1 AND i < I)
  RETURN k
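The Monte Carlo variant always returns after a bounded number of trials, at the price of a possibly wrong answer. A minimal Python sketch with 0-based indices; the function name and the requirement that `max_iters` is at least 1 are assumptions of the sketch:

```python
import random

def monte_carlo_find_one(bits, max_iters, rng=random):
    """Try at most max_iters random indices; return the last one tried.

    The returned index points at a 1-bit only if one was hit in time,
    so the caller cannot rely on the result being correct.
    """
    if max_iters < 1:
        raise ValueError("max_iters must be at least 1")
    for _ in range(max_iters):
        k = rng.randrange(len(bits))
        if bits[k] == 1:
            break
    return k
```

If a fraction r of the bits are 1, the answer is wrong with probability (1 − r) raised to the power `max_iters`.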
Revise
Monte Carlo
Potential problems to be considered:
• Detecting prime numbers
• Calculating integral of a function
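As an illustration of the second item, a definite integral can be estimated by averaging the function at uniformly random points. This is a generic textbook sketch, not material from the slides; the function name and parameters are hypothetical:

```python
import random

def monte_carlo_integral(f, a, b, samples, rng=random):
    """Estimate the integral of f over [a, b] from random sample points."""
    total = sum(f(rng.uniform(a, b)) for _ in range(samples))
    return (b - a) * total / samples

# Example: the integral of x^2 over [0, 1] is exactly 1/3.
estimate = monte_carlo_integral(lambda x: x * x, 0.0, 1.0, 100_000,
                                random.Random(42))
```

The result is always produced, but it is only approximate; the error shrinks roughly as one over the square root of the number of samples.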
To appear in 2014… maybe…
Selection of pivot element
Something about Quicksort and Selection:
• Practical example of re-sorting
• Median selection
Add material for 2014
[Figure: worst-case pivot choices split the data into parts of size N-1 and 1, N-2 and 1, N-3 and 1, … giving O(N²)]
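A common way to balance this worst case is to choose the pivot at random, so that no fixed input triggers the O(N²) behavior. The sketch below shows this for sorting and for selecting the k-th smallest element (for example the median); it is a generic illustration with hypothetical function names, not code from the course:

```python
import random

def randomized_quicksort(items, rng=random):
    """Quicksort with a uniformly random pivot; returns a new sorted list."""
    if len(items) <= 1:
        return list(items)
    pivot = rng.choice(items)
    smaller = [x for x in items if x < pivot]
    equal = [x for x in items if x == pivot]
    larger = [x for x in items if x > pivot]
    return randomized_quicksort(smaller, rng) + equal + randomized_quicksort(larger, rng)

def randomized_select(items, k, rng=random):
    """Return the k-th smallest element (k is 0-based)."""
    items = list(items)
    if not 0 <= k < len(items):
        raise IndexError("k out of range")
    while True:
        pivot = rng.choice(items)
        smaller = [x for x in items if x < pivot]
        equal = [x for x in items if x == pivot]
        if k < len(smaller):
            items = smaller                      # answer is among the smaller ones
        elif k < len(smaller) + len(equal):
            return pivot                         # pivot is the k-th smallest
        else:
            k -= len(smaller) + len(equal)       # continue in the larger part
            items = [x for x in items if x > pivot]
```

For a list `data`, `randomized_select(data, len(data) // 2)` gives the upper median.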
Simulated dynamic linked list
1. Sorted array
   - Search efficient: O(log N)
   - Insert and Delete slow: O(N)
2. Dynamically linked list
   - Insert and Delete fast: O(1)
   - Search inefficient: O(N)
Simulated dynamic linked list
Example
Linked list: Head → 1 → 2 → 4 → 5 → 7 → 15 → 21
Simulated by array (Head=4):

i      1   2   3   4   5   6   7
Value  2   4   15  1   5   21  7
Next   2   5   6   1   7   0   3
SEARCH (A, x)
  i := A.HEAD;
  max := A[i].VALUE;
  FOR k:=1 TO √N DO
    j := RANDOM(1, N);
    y := A[j].VALUE;
    IF (max<y) AND (y≤x) THEN
      i := j; max := y;
  RETURN LinearSearch(A, x, i);
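The search can be sketched in Python on the example array from the previous slide. The sketch assumes that about √N random probes are used, that index 0 marks the end of the list, and that the linear search returns the index of x or 0 if x is absent; the function name and this return convention are assumptions, since the slides do not define LinearSearch.

```python
import math
import random

def sherwood_search(value, nxt, head, x, rng=random):
    """Search x in a sorted linked list stored in arrays (index 0 unused)."""
    n = len(value) - 1
    i, best = head, value[head]          # start from the smallest element
    for _ in range(math.isqrt(n)):       # about sqrt(N) random breakpoints
        j = rng.randint(1, n)
        y = value[j]
        if best < y <= x:                # biggest breakpoint not exceeding x
            i, best = j, y
    while i != 0 and value[i] < x:       # linear search from breakpoint i
        i = nxt[i]
    return i if i != 0 and value[i] == x else 0

# Example from the slide: Head = 4, list 1 → 2 → 4 → 5 → 7 → 15 → 21.
value = [None, 2, 4, 15, 1, 5, 21, 7]
nxt = [0, 2, 5, 6, 1, 7, 0, 3]
head = 4
```

With this data `sherwood_search(value, nxt, head, 15)` returns 3, the array position holding 15, regardless of which breakpoints were sampled.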
Simulated dynamic linked list
Divide-and-conquer with randomization
√N random breakpoints
Biggest breakpoint ≤ x
Value searched
Full search from breakpoint i
Analysis of the search
[Figure: √N breakpoints with √N elements per segment (on average); labels: max, search for]
• Divide into √N segments
• Each segment has N/√N = √N elements
• Linear search within one segment.
• Expected time complexity = √N + √N = O(√N)
Experiment with students
Data (N=100) consists of numbers from 1..100:
1 2 3 4 … 99 100
Select √N breaking points: