divide-and-conquer - upc universitat politècnica de...

Divide-and-conquer

Course 2015

The divide-and-conquer strategy:

1. Break the problem into smaller subproblems,

2. recursively solve each subproblem,

3. appropriately combine their answers.

Julius Caesar (1st century BC): "Divide et impera"

Known examples:

• Binary search

• Mergesort

• Quicksort

• Strassen matrix multiplication

Merge sort: J. von Neumann (1903-57)
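Mergesort is the pattern that the inversion-counting algorithm in these notes builds on. A minimal Python sketch of it, assuming the input is a list of mutually comparable keys (the name `merge_sort` is ours, not from the notes):

```python
def merge_sort(a):
    """Sort a list by divide and conquer: split, recurse, merge."""
    if len(a) <= 1:
        return list(a)
    mid = len(a) // 2
    left = merge_sort(a[:mid])    # conquer the left half
    right = merge_sort(a[mid:])   # conquer the right half
    # combine: merge two sorted lists in linear time
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            out.append(left[i])
            i += 1
        else:
            out.append(right[j])
            j += 1
    out.extend(left[i:])
    out.extend(right[j:])
    return out
```

The merge loop is the linear-time combine step, which gives T(n) = 2T(n/2) + O(n) = O(n lg n).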

Collaborative Filtering

In many commercial websites, collaborative filtering is a technique to match your preferences with those of other customers, in order to guess which products you should be recommended. One way to do it is to rank a given kind of product (music, movies, novels) and match your ranking with similar rankings by other people. One way to quantify how similar two ordered lists are is to count the inversions between the two lists.

Counting inversions

Given n items (1, 2, ..., n), suppose A has the list of preferences LA = {1, 2, ..., n} and B has the list LB = {b1, b2, ..., bn}. We want to see how similar (close) LB is to LA. Two items i, j form an inversion if i comes before j in LA but j comes before i in LB. For example, consider LA = {1, 2, 3, 4, 5, 6, 7, 8} and LB = {1, 5, 3, 2, 7, 4, 8, 6}.

Here LB has 7 inversions: (5,3), (5,2), (5,4), (3,2), (7,4), (7,6), (8,6). At the other extreme, if LB = {8, 7, 6, 5, 4, 3, 2, 1}, all 28 pairs are inversions, while LB = LA has none.

D&C for counting inversions

Brute-force algorithm: look at every pair (i, j) to see whether it is an inversion. There are $\binom{n}{2} = n(n-1)/2 = O(n^2)$ pairs.
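A direct sketch of the brute-force count, assuming (as in the definition) that LA = (1, 2, ..., n), so a pair is an inversion exactly when the larger value appears first in LB. The function name is hypothetical:

```python
from itertools import combinations

def count_inversions_brute(lb):
    """Count pairs (x, y) with x before y in lb but x > y.
    Assumes LA = (1, ..., n); checks all n(n-1)/2 pairs, so O(n^2) time."""
    # combinations keeps the original order: x comes before y in lb
    return sum(1 for x, y in combinations(lb, 2) if x > y)
```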

D&C algorithm

1. Divide: separate the list into two halves.

2. Conquer: recursively count the inversions in each half.

3. Combine: count the inversions (i, j) where i and j are in different halves.

4. Return: the sum of the three quantities.

The strategy for step 3 is similar to Mergesort.

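Combining two halves E and D

Key idea: sort while counting, as in the merge step of Mergesort.

Given subproblems E (left) and D (right) at the same level of recursion, with E and D already sorted:

• scan E and D from left to right;

• compare the current i ∈ E with the current j ∈ D;

• if i < j, then i is not inverted with any remaining element of D;

• if i > j, then j is inverted with i and with every remaining element of E (they are all ≥ i, since E is sorted), so add the number of remaining elements of E;

• append the smaller of the two to the sorted list.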

[Figure: merging the sorted halves E = 1 3 7 8 and D into the sorted list 1 2 3 4 5 6 7 8 while counting cross inversions such as (3,2).]

Complexity: $T(n) = 2T(n/2) + O(n) = O(n \lg n)$.
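A sketch of the sort-and-count recursion in Python, assuming comparable keys and LA = (1, ..., n); the name `sort_and_count` is ours. Each time an element of D is appended, it is counted as inverted with every element still waiting in E:

```python
def sort_and_count(lb):
    """Return (sorted copy of lb, number of inversions) in O(n lg n) time."""
    if len(lb) <= 1:
        return list(lb), 0
    mid = len(lb) // 2
    e, inv_e = sort_and_count(lb[:mid])   # left half E
    d, inv_d = sort_and_count(lb[mid:])   # right half D
    merged, i, j, cross = [], 0, 0, 0
    while i < len(e) and j < len(d):
        if e[i] <= d[j]:
            # e[i] is not inverted with any remaining element of D
            merged.append(e[i])
            i += 1
        else:
            # d[j] < e[i] <= e[i+1] <= ...: inverted with all remaining E
            cross += len(e) - i
            merged.append(d[j])
            j += 1
    merged.extend(e[i:])
    merged.extend(d[j:])
    return merged, inv_e + inv_d + cross
```

For LB = 1 5 3 2 7 4 8 6 it returns the sorted list together with the count 7. Using `<=` means equal keys are not counted as inversions.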

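Example: LB = 1 5 3 2 7 4 8 6

• Level 1 (merging single elements into pairs): 1 5 | 2 3 | 4 7 | 6 8, inversions (3,2), (7,4), (8,6).

• Level 2 (merging pairs): 1 2 3 5 | 4 6 7 8, inversions (5,2), (5,3), (7,6).

• Level 3 (final merge): 1 2 3 4 5 6 7 8, inversion (5,4).

Total inversions = 7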

2D-Closest pair of points

Given n points in the plane, find a pair of points with the smallest Euclidean distance.

Assumption: no two points have the same x-coordinate.

Brute-force algorithm: compute the distance between every pair (i, j) and keep the smallest: $O(n^2)$.

In one dimension it is very easy: sort the points by coordinate and compare consecutive points, O(n lg n).

But the sorting method does not generalize to higher dimensions (≥ 2). Why?
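In one dimension sorting is enough: after sorting, the closest pair must be two consecutive points. A minimal Python sketch of that idea, assuming at least two points given as numbers (the name `closest_pair_1d` is ours):

```python
def closest_pair_1d(xs):
    """Closest pair on a line: after sorting, the closest pair is adjacent.
    Returns (distance, (a, b)); assumes at least two points. O(n lg n)."""
    s = sorted(xs)
    # index k of the smallest gap between consecutive sorted points
    best = min(range(len(s) - 1), key=lambda k: s[k + 1] - s[k])
    return s[best + 1] - s[best], (s[best], s[best + 1])
```

For example, `closest_pair_1d([7, 1, 4, 12, 5])` returns `(1, (4, 5))`.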

D&C for 2D-Closest pair of points

1. Divide: separate the plane by a vertical line L into two halves E and D with the same number of points (±1).

2. Conquer: recursively find the minimal distance between pairs of points in each half.

3. Combine: take into consideration pairs of points (p, q) with p ∈ E and q ∈ D.

4. Return: the pair of points at minimal distance.

[Figure: recursive calls; lines L1, L2, L3 split each region into halves E and D.]

D&C Algorithm

At each step:

Divide: sort the n points by their x-coordinate. Put ⌈n/2⌉ points to the left of L (E) and ⌊n/2⌋ to the right of L (D). Cost: O(n lg n).

Conquer: return $d = \min\{d_E, d_D\}$, where $d_E$ and $d_D$ are the minimal distances found recursively in E and D. Cost: 2T(n/2).

[Figure: line L with points p1, p2 ∈ E and q1, q2 ∈ D.]

Combine: there might be two points, one in E and the other in D, that are closer than d.

D&C Algorithm: Combine phase

Take a vertical band of width 2d around L. Any p ∈ E and q ∈ D with d(p, q) ≤ d must lie in this band, since a point farther than d from L is at distance > d from every point on the other side. There could be many other points inside the band, but we focus only on the points in the band.

To find the closest p, q in this band, sort the points in the band by increasing y-coordinate, obtaining Y = {y1, y2, ..., ym}. Cost: O(n lg n).

[Figure: the band of width 2d around L, with points p1, p2 ∈ E and q1, q2 ∈ D.]

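D&C Algorithm: Combine phase

Consider a grid of d/2 × d/2 cells inside the band, aligned with L so that each cell lies entirely in E or in D.

• There is at most 1 point inside each d/2 × d/2 cell: the diagonal of a cell is $d/\sqrt{2} < d$, so two points in the same cell would be on the same side at distance < d, contradicting the definition of d.

• Two points more than 2 rows of cells apart are at distance ≥ d, since the rows between them have total height ≥ d. (For comparison, two points in two vertically consecutive cells can be up to $d\sqrt{5/4} \approx 1.118d$ apart.)

• Two points more than 2 columns of cells apart are at distance ≥ d, by the same argument.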

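[Figure: the band of width 2d around L divided into d/2 × d/2 cells.]

How many cells can a point influence?

For every point yi in the sorted list Y = {y1, y2, ..., ym}, starting from y1, we only have to compute the distance between yi and the next 11 points in Y. Any yj above yi with d(yi, yj) < d, and every point between them in Y, lies in the 3 rows of cells starting at the row of yi, i.e. in at most 12 cells, one of which contains yi. Hence

$d(y_i, y_j) < d \implies |i - j| \le 11.$

So every point in the band is compared with at most 11 points of Y, for a total cost of at most 11n = O(n).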

Closest-Pair Algorithm:

Closest-Pair(p1, ..., pn)
  Sort the points by x-coordinate to compute L
  d1 = Closest-Pair(E)
  d2 = Closest-Pair(D)
  d = min{d1, d2}
  Delete the points at distance > d from L
  Sort the remaining points by y-coordinate to form the list Y
  Scan Y in order, computing the distance from each point to the next 11 elements
  If any of those distances is < d, update d
  return d

$T(n) = 2T(n/2) + O(n \lg n) = O(n \lg^2 n)$
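A Python sketch of the Closest-Pair pseudocode, assuming points are (x, y) tuples with distinct x-coordinates, at least two points, and returning only the distance (not the pair). Because the band is re-sorted by y at every level, it follows the $O(n \lg^2 n)$ bound; `closest_pair` and `_closest` are hypothetical names:

```python
import math

def closest_pair(points):
    """Smallest Euclidean distance among >= 2 points (distinct x assumed)."""
    pts = sorted(points)          # sort by x-coordinate once
    return _closest(pts)

def _closest(p):
    n = len(p)
    if n <= 3:                    # base case: brute force
        return min(math.dist(p[a], p[b])
                   for a in range(n) for b in range(a + 1, n))
    mid = n // 2
    x_line = p[mid][0]            # the dividing line L
    d = min(_closest(p[:mid]), _closest(p[mid:]))
    # keep only points in the band of width 2d around L, sorted by y
    band = sorted((q for q in p if abs(q[0] - x_line) < d),
                  key=lambda q: q[1])
    for a in range(len(band)):
        # compare with at most the next 11 points in y-order,
        # stopping early once the vertical gap alone reaches d
        for b in range(a + 1, min(a + 12, len(band))):
            if band[b][1] - band[a][1] >= d:
                break
            d = min(d, math.dist(band[a], band[b]))
    return d
```

For example, `closest_pair([(0, 0), (3, 4), (1, 1), (10, 10), (10.5, 10)])` returns `0.5`. `math.dist` requires Python 3.8 or later.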

Do you know how to improve it to O(n lg n)?

Random-Quicksort

Consider the function Ran-Partition:

Ran-Partition(A[p, ..., q])
  r = rand(p, q), chosen u.a.r.
  interchange A[p] and A[r]
  partition A[p, ..., q] around the pivot A[p] and return the pivot's final position

Using Ran-Partition, consider the following randomized Divide and Conquer algorithm, on input A[1, ..., n]:

Ran-Quicksort(A[p, ..., q])
  if p < q then
    r = Ran-Partition(A[p, ..., q])
    Ran-Quicksort(A[p, ..., r − 1])
    Ran-Quicksort(A[r + 1, ..., q])
  end if
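A Python sketch of Ran-Partition and Ran-Quicksort with 0-based, inclusive indices. The notes do not say how Ran-Partition rearranges the subarray after swapping the pivot into A[p]; here we assume a Lomuto-style scan, which makes exactly q − p comparisons against the pivot. The function names mirror the pseudocode but are ours:

```python
import random

def ran_partition(a, p, q):
    """Choose a pivot u.a.r. in a[p..q], move it to a[p], partition
    (assumed Lomuto-style scan) and return the pivot's final index."""
    r = random.randint(p, q)        # inclusive on both ends
    a[p], a[r] = a[r], a[p]
    pivot = a[p]
    m = p                           # a[p+1..m] holds keys < pivot
    for k in range(p + 1, q + 1):   # q - p comparisons with the pivot
        if a[k] < pivot:
            m += 1
            a[m], a[k] = a[k], a[m]
    a[p], a[m] = a[m], a[p]         # place pivot between the two parts
    return m

def ran_quicksort(a, p=0, q=None):
    """Sort a in place (inclusive range a[p..q]) and return it."""
    if q is None:
        q = len(a) - 1
    if p < q:
        r = ran_partition(a, p, q)
        ran_quicksort(a, p, r - 1)
        ran_quicksort(a, r + 1, q)
    return a
```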

Example

Ran-Partition of input

A = {1, 3, 5, 6, 8, 10, 12, 14, 15, 16, 17, 18, 20, 22, 23}

[Figure: the tree of pivots chosen by the successive calls to Ran-Partition on A.]

Expected Complexity of Ran-Quicksort

• The expected running time T(n) of Ran-Quicksort is dominated by the number of comparisons.

• Every call to Ran-Partition on A[p, ..., q] has cost Θ(1) + Θ(number of comparisons), where the number of comparisons is q − p.

• If we can count the number of comparisons, we can bound the total time of Quicksort.

• Let X be the number of comparisons made in all calls of Ran-Quicksort.

• X is a r.v., as it depends on the random choices of Ran-Partition.

Expected Complexity of Ran-Quicksort

• Note: in the first application of Ran-Partition, the pivot A[r] is compared with all n − 1 other elements.

• Key observation: two keys are compared only if one of them is the current pivot, and they are compared at most once, because the pivot is not passed to either recursive call. Keys that end up on different sides of a pivot are never compared.

For simplicity assume all keys are different. Let z1 < z2 < ... < zn be the keys in sorted order and, for 1 ≤ i < j ≤ n, let $Z_{i,j} = \{z_i, z_{i+1}, \dots, z_j\}$.

• Note $|Z_{i,j}| = j - i + 1$.

• Until some key of $Z_{i,j}$ is chosen as pivot, all of $Z_{i,j}$ stays in the same subarray, so the first pivot chosen from $Z_{i,j}$ is uniform over $Z_{i,j}$: each of its keys is that pivot with probability

$$\frac{1}{|Z_{i,j}|} = \frac{1}{j - i + 1}.$$

Define the indicator r.v.:

$$X_{i,j} = \begin{cases} 1 & \text{if } z_i \text{ is compared to } z_j,\\ 0 & \text{otherwise.} \end{cases}$$

Then $X = \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} X_{i,j}$ (this holds because a pair is never compared more than once).

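$$E[X] = E\left[\sum_{i=1}^{n-1} \sum_{j=i+1}^{n} X_{i,j}\right] = \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} E[X_{i,j}]$$

(by linearity of expectation). As $E[X_{i,j}] = 0 \cdot \Pr[X_{i,j} = 0] + 1 \cdot \Pr[X_{i,j} = 1]$,

$$\therefore E[X_{i,j}] = \Pr[X_{i,j} = 1] = \Pr[z_i \text{ is compared to } z_j].$$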

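End of the proof and main theorem

$$E[X] = \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} \Pr[z_i \text{ is compared to } z_j]$$

As $z_i$ and $z_j$ are compared iff one of them is the first pivot chosen from $Z_{i,j}$ (if another key of $Z_{i,j}$ is chosen first, $z_i$ and $z_j$ are sent to different subarrays), then

$$\Pr[X_{i,j} = 1] = \Pr[z_i \text{ is the first pivot from } Z_{i,j}] + \Pr[z_j \text{ is the first pivot from } Z_{i,j}].$$

Because pivots are chosen u.a.r. in $Z_{i,j}$:

$$\Pr[z_i \text{ is the first pivot from } Z_{i,j}] = \Pr[z_j \text{ is the first pivot from } Z_{i,j}] = \frac{1}{j - i + 1}.$$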

Therefore:

$$E[X] = \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} \frac{2}{j - i + 1}.$$

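$$E[X] = \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} \frac{2}{j - i + 1} = 2 \sum_{i=1}^{n-1} \left(\frac{1}{2} + \frac{1}{3} + \cdots + \frac{1}{n - i + 1}\right) < 2 \sum_{i=1}^{n} \left(1 + \frac{1}{2} + \cdots + \frac{1}{n}\right) = 2 \sum_{i=1}^{n} H_n = 2nH_n = O(n \lg n).$$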

A more careful evaluation of the same double sum gives $E[X] = 2(n+1)H_n - 4n = 2n \ln n + \Theta(n)$.

Theorem: The expected complexity of Ran-Quicksort is $E[T_n] = O(n \lg n)$.
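To check the algebra numerically, this sketch evaluates the double sum for E[X] exactly with rational arithmetic and compares it with the closed form 2(n+1)H_n − 4n and the bound 2nH_n (the function names are ours):

```python
from fractions import Fraction

def expected_comparisons(n):
    """E[X] = sum over 1 <= i < j <= n of 2/(j - i + 1), computed exactly."""
    return sum(Fraction(2, j - i + 1)
               for i in range(1, n + 1) for j in range(i + 1, n + 1))

def harmonic(n):
    """H_n = 1 + 1/2 + ... + 1/n as an exact fraction."""
    return sum(Fraction(1, k) for k in range(1, n + 1))

def closed_form(n):
    """Closed form 2(n+1)H_n - 4n of the same double sum."""
    return 2 * (n + 1) * harmonic(n) - 4 * n
```

For instance, `expected_comparisons(3)` is 8/3: the two pairs of adjacent keys contribute 1 each and the pair (z1, z3) contributes 2/3.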

Selection and order statistics

Problem: given a list A of n unordered distinct keys and an integer i, 1 ≤ i ≤ n, select the element x ∈ A that is larger than exactly i − 1 other elements of A.

Notice if:

1. i = 1 ⇒ MINIMUM element

2. i = n ⇒ MAXIMUM element

3. i = ⌊(n+1)/2⌋ ⇒ the MEDIAN

4. i = ⌊0.9 · n⌋ ⇒ an order statistic (the 90th percentile)

Naive approach: sort A (O(n lg n)) and return A[i], for O(n lg n) in total.

Can we do it in linear time? Yes: selection is easier than sorting.

Quick-Select

Given an unordered A[1, ..., n], return the i-th smallest element.

• Quick-Select(A[p, ..., q], i)

• r = Ran-Partition(A[p, ..., q]) to find the position of the pivot

• if i = r, return A[r]

• if i < r, Quick-Select(A[p, ..., r − 1], i)

• else Quick-Select(A[r + 1, ..., q], i)

[Figure: searching for i = 2 in A[1..8] = m u h e c b k v; Ran-Partition(1, 8) returns r = 3, so the search continues in A[1..2].]

Quick-Select Algorithm

Quickselect(A[p, ..., q], i)
  if p = q then
    return A[p]
  else
    r = Ran-Partition(A[p, ..., q])
    k = r − p + 1
    if i = k then
      return A[r]
    if i < k then
      return Quickselect(A[p, ..., r − 1], i)
    else
      return Quickselect(A[r + 1, ..., q], i − k)
    end if
  end if
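A Python sketch of Quickselect, written as a loop instead of tail recursion and assuming distinct keys and 1 ≤ i ≤ len(a). The partition step is an assumed Lomuto-style scan, since the notes leave Ran-Partition's rearrangement unspecified; the name `quickselect` is ours:

```python
import random

def quickselect(a, i):
    """Return the i-th smallest (1-based) of a list of distinct keys.
    Expected O(n) time; works on a copy so the input is not modified."""
    a = list(a)
    p, q = 0, len(a) - 1
    while True:
        if p == q:
            return a[p]
        # random pivot moved to a[p], then partition a[p..q] around it
        r = random.randint(p, q)
        a[p], a[r] = a[r], a[p]
        m = p
        for t in range(p + 1, q + 1):
            if a[t] < a[p]:
                m += 1
                a[m], a[t] = a[t], a[m]
        a[p], a[m] = a[m], a[p]
        k = m - p + 1           # rank of the pivot within a[p..q]
        if i == k:
            return a[m]
        if i < k:
            q = m - 1           # the i-th element is in the left part
        else:
            p = m + 1           # look for the (i - k)-th in the right part
            i -= k
```

On the example above, `quickselect(list("muhecbkv"), 2)` returns `'c'`, the second smallest letter.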

Analysis.

• Lucky: at each recursive call the search space is reduced to at most 9/10 of its size. Then $T(n) \le T(9n/10) + \Theta(n) = \Theta(n)$.

• Unlucky: $T(n) = T(n-1) + \Theta(n) = \Theta(n^2)$. In this case it is worse than sorting!

Theorem: Given A[1, ..., n] and i, the expected number of steps for Quick-Select to find the i-th element of A is O(n).

Proof: Given A[1, ..., n], let T(n) be the r.v. counting the number of steps for Quick-Select to find the i-th element. Ran-Partition(A) places the pivot at position k (i.e. the pivot is the k-th smallest element) with probability 1/|A| = 1/n. Define the indicator r.v.:

$$X_k = \begin{cases} 1 & \text{if the pivot is the } k\text{-th smallest element},\\ 0 & \text{otherwise.} \end{cases}$$

Therefore, $E[X_k] = 1/n$.

To get an upper bound on E[T(n)], assume the desired i-th element always falls in the larger side of the partition. When $X_k = 1$ we have subarrays of sizes k − 1 and n − k, so we get the recurrence

$$T(n) \le \sum_{k=1}^{n} X_k \, T(\max\{k-1, n-k\}) + O(n).$$

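Proof (cont.)

$$E[T(n)] \le E\left[\sum_{k=1}^{n} X_k \, T(\max\{k-1, n-k\})\right] + O(n) = \sum_{k=1}^{n} E\big[X_k \, T(\max\{k-1, n-k\})\big] + O(n)$$

$$= \sum_{k=1}^{n} E[X_k] \, E\big[T(\max\{k-1, n-k\})\big] + O(n) = \frac{1}{n} \sum_{k=1}^{n} E\big[T(\max\{k-1, n-k\})\big] + O(n),$$

where the factorization uses that $X_k$ is independent of the random choices made inside the recursive call.

Notice $\max\{k-1, n-k\} = k-1$ if k > n/2, and n − k otherwise, so each value from ⌊n/2⌋ to n − 1 appears at most twice in the sum:

$$E[T(n)] \le \frac{2}{n} \sum_{k=\lfloor n/2 \rfloor}^{n-1} E[T(k)] + O(n) = O(n),$$

where the last step follows by substitution, guessing $E[T(k)] \le ck$.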

Deterministic linear selection.

Generate deterministically a good split element x:

• Divide the n elements into ⌊n/5⌋ groups of 5 elements each (plus possibly one group with < 5 elements).

• Find the median xi of each group, e.g. by sorting it. (The median of 5 elements can be found with 6 comparisons, i.e. Θ(1).) Total: 6⌈n/5⌉ comparisons.

• Use Select recursively to find the median x of the medians {xi}, 1 ≤ i ≤ ⌈n/5⌉.

• Use the deterministic Partition of Quicksort to rearrange the elements around x, in linear time.

At least (3/2)⌊n/5⌋ ≈ 3n/10 of the elements are ≤ x: at least half of the medians are ≤ x, and each of them has two more elements ≤ it in its group.

Symmetrically, at least (3/2)⌊n/5⌋ ≈ 3n/10 of the elements are ≥ x.

The deterministic algorithm

Select(A, i)
1. Divide the n elements into ⌊n/5⌋ groups of 5 (plus possibly one smaller group).
2. Find the median of each group by insertion sort, taking the middle element.
3. Use Select recursively to find the median x of the ⌈n/5⌉ medians.
4. Use Partition to place x. Let k = rank of x.
5. if i = k then
     return x
   else if i < k then
     use Select recursively to find the i-th smallest element in the left part
   else
     use Select recursively to find the (i − k)-th smallest element in the right part
   end if

Notice steps 4 and 5 are the same as in Quickselect.
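A compact Python sketch of the deterministic Select (median of medians), assuming distinct keys. For brevity it partitions with list comprehensions rather than in place, and it switches to sorting for n ≤ 5 instead of the n ≤ 50 cutoff used in the analysis; both choices are ours, not the notes', and keep the O(n) bound:

```python
def select(a, i):
    """Deterministic selection: return the i-th smallest (1-based) element
    of a list of distinct keys in worst-case O(n) time."""
    if len(a) <= 5:
        return sorted(a)[i - 1]
    # steps 1-2: groups of 5 (the last may be smaller) and their medians
    groups = [a[k:k + 5] for k in range(0, len(a), 5)]
    medians = [sorted(g)[(len(g) - 1) // 2] for g in groups]
    # step 3: median of the medians, found recursively
    x = select(medians, (len(medians) + 1) // 2)
    # step 4: partition around x; k is the rank of x
    left = [y for y in a if y < x]
    right = [y for y in a if y > x]
    k = len(left) + 1
    # step 5: as in Quickselect
    if i == k:
        return x
    if i < k:
        return select(left, i)
    return select(right, i - k)
```

On the input 3 13 9 4 5 1 15 12 10 2 6 14 8 11 17 it picks the medians 5, 10 and 11, partitions around x = 10, and `select(data, 7)` returns 8.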

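Example

Get the median (the ⌊n/2⌋-th smallest element, here the 7th) of the following input by calling SELECT on it:

3 13 9 4 5 1 15 12 10 2 6 14 8 11 17

Groups of 5 and their medians: {3, 13, 9, 4, 5} → 5, {1, 15, 12, 10, 2} → 10, {6, 14, 8, 11, 17} → 11. The median of the medians is x = 10.

PARTITION around 10:

3 4 5 9 1 2 6 8 | 10 | 13 12 15 11 14 17

10 has rank 9 and 7 < 9, so SELECT recurses on the left part to get the 7th element (the median), which is 8.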

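Worst case Analysis.

• At least 3n/10 of the elements are ≥ x.

• At least 3n/10 of the elements are ≤ x.

• Hence, in the worst case, step 5 calls Select recursively on ≤ 7n/10 elements.

• Steps 1, 2 and 4 take O(n) time. Step 3 takes time T(n/5) and step 5 takes time ≤ T(7n/10).

So we have

$$T(n) = \begin{cases} \Theta(1) & \text{if } n \le 50,\\ T(n/5) + T(7n/10) + \Theta(n) & \text{if } n > 50. \end{cases}$$

Therefore, T(n) = Θ(n), because 1/5 + 7/10 = 9/10 < 1, so the work per level of the recursion decreases geometrically.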

Notice: if we make groups of 7, the number of elements ≥ x is about 2n/7, which yields $T(n) \le T(n/7) + T(5n/7) + O(n)$, with solution T(n) = O(n). However, if we make groups of 3, then $T(n) \le T(n/3) + T(2n/3) + O(n)$, whose solution is $T(n) = \Theta(n \lg n)$, so the bound is no longer linear.

Conclusions

• From a randomized algorithm we removed the randomization to get a fast deterministic algorithm for selection.

• From a practical point of view, the deterministic algorithm is slow. Use Quickselect.
