divide-and-conquer - upc universitat politècnica de...

Divide-and-conquer Curs 2015

Post on 09-Feb-2020


Page 1: Divide-and-conquer - UPC Universitat Politècnica de Catalunyamjserna/docencia/grauA/T15/DandC.pdf · The divide-and-conquer strategy. 1.Break the problem into smaller subproblems,

Divide-and-conquer

Course 2015

The divide-and-conquer strategy

1. Break the problem into smaller subproblems,

2. recursively solve each subproblem,

3. appropriately combine their answers.

"Divide et impera" (Julius Caesar, 1st century BC)

Known examples:

• Binary search

• Mergesort (J. von Neumann, 1903-57)

• Quicksort

• Strassen matrix multiplication

Collaborative filtering

On many commercial websites, collaborative filtering is a technique that matches your preferences with those of other customers in order to guess which products should be recommended to you.

One way to do it: collect rankings of a given kind of product (music, movies, novels) and match your ranking with similar rankings by other people.

One way to quantify how similar two ordered lists are is to count the inversions between the two lists.

Counting inversions

Given n items (1, 2, ..., n), suppose A has the list of preferences LA = {1, 2, ..., n} and B has the list LB = {b1, b2, ..., bn}.

We want to see how similar (close) LB is to LA.

Two items i, j form an inversion if i comes before j in LA (i < j) but j comes before i in LB.

For example, consider LA = {1, 2, 3, 4, 5, 6, 7, 8} and LB = {1, 5, 3, 2, 7, 4, 8, 6}.

Inversions in LB: (5,3), (5,2), (5,4), (3,2), (7,4), (7,6), (8,6). Number of inversions: 7.

For the reversed list {8, 7, 6, 5, 4, 3, 2, 1} all pairs are inversions. Number of inversions: 28.

D&C for counting inversions

Brute-force algorithm: Look at every pair (i, j) to see if it is an inversion: $\binom{n}{2} = n(n-1)/2 = O(n^2)$ pairs.

D&C algorithm

1. Divide: Separate the list into two halves

2. Conquer: Recursively count the inversions in each half

3. Combine: Count the inversions where i and j are in different halves

4. Return: The sum of the 3 quantities

The strategy for step 3 will be similar to Mergesort.

Combine two halves E (left) and D (right)

Key idea: sort while counting, as in merge sort.

Given subproblems E and D at the same level of recursion, with E and D already sorted:

• scan E and D from left to right

• compare i ∈ E with j ∈ D

• if i < j, then i is not inverted with j or with any element remaining in D

• if i > j, then j is inverted with i and with every element remaining in E

• append the smaller one to the sorted list.

[Figure: merging two sorted halves into 1 2 3 4 5 6 7 8 while counting the inversions between them, such as the pair (3,2).]

Complexity: T(n) = 2T(n/2) + O(n) = O(n lg n)
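
The combine step can be sketched in Python as follows. This is a minimal illustration, not the course's reference code; the function name `count_inversions` and the choice to return the sorted list together with the count are assumptions made for the example.

```python
def count_inversions(seq):
    """Return (sorted copy of seq, number of inversions in seq)."""
    n = len(seq)
    if n <= 1:
        return list(seq), 0
    mid = n // 2
    left, inv_left = count_inversions(seq[:mid])     # conquer E
    right, inv_right = count_inversions(seq[mid:])   # conquer D
    merged, cross = [], 0
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            # right[j] is smaller than every element still waiting in left,
            # so it is inverted with all of them
            cross += len(left) - i
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged, inv_left + inv_right + cross


if __name__ == "__main__":
    print(count_inversions([1, 5, 3, 2, 7, 4, 8, 6])[1])
```

On the list 1 5 3 2 7 4 8 6 the count is 7, and on the reversed list 8 7 ... 1 it is 28 (all pairs).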

Example

Input: 1 5 3 2 7 4 8 6

[Figure: recursion tree of the D&C algorithm on this input. Merging the pairs finds (3,2), (7,4), (8,6); merging the quarters finds (5,2), (5,3), (7,6); the final merge finds (5,4).]

Total inversions = 7

2D-Closest pair of points

Given n points in the plane, find a pair of points with the smallest Euclidean distance.

Assumption: No two points have the same x-coordinate.

Brute-force algorithm: Compute the distance between every pair (i, j) and compare it with the others: O(n²).

In one dimension it is very easy: sort by coordinate and compare neighbors, O(n lg n).

But the sorting method does not generalize to higher dimensions (already 2). Why?
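
Assuming the sorting remark refers to points on a line (a single coordinate), the idea can be sketched as below: after sorting, the closest pair must be adjacent. The name `closest_gap_1d` is hypothetical.

```python
def closest_gap_1d(xs):
    """Smallest distance between two of the numbers in xs (at least two values)."""
    ordered = sorted(xs)                       # O(n lg n)
    # after sorting, the closest pair is a pair of neighbors
    return min(b - a for a, b in zip(ordered, ordered[1:]))


if __name__ == "__main__":
    print(closest_gap_1d([7, 1, 12, 4.5, 10]))
```

In the plane this fails because two points can be neighbors in x-order and still be far apart in y.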

D&C for 2D-Closest pair of points

1. Divide: Separate the plane by a line L into two halves E and D with the same number of points (±1).

2. Conquer: Recursively find the minimal distance between pairs of points in each half.

3. Combine: Take into consideration the pairs of points (p, q) with p ∈ E and q ∈ D.

4. Return: The pair of points at minimal distance.

[Figure: recursive calls; the plane is split by L1 into E and D, then by the lines L2, then by the lines L3.]

D&C algorithm

At each step:

Divide: Sort the n points by their x-coordinate. Take ⌈n/2⌉ points to the left of L (E) and ⌊n/2⌋ to the right of L (D). Cost: O(n lg n).

Conquer: Return d = min{dE, dD}. Cost: 2T(n/2).

[Figure: the line L splitting the points into E and D, with the closest pairs (p1, q1) and (p2, q2) of the two halves and their distances dE and dD.]

Combine: There might be two points, one in E and the other in D, that are closer than d.

D&C algorithm: Combine phase

Take a vertical band of width 2d around L.

Any p ∈ E and q ∈ D such that d(p, q) ≤ d must lie in this band.

There could be many other points inside the band.

Focus only on the points in the band.

To find the closest p, q in this band: sort the points in the band by increasing y-coordinate, Y = {y1, y2, ..., ym}. Cost: O(n lg n).

[Figure: the band of width 2d centered on L.]

D&C algorithm: Combine phase

Consider a grid of cells of side d/2 inside the band.

• There is at most 1 point inside each d/2 × d/2 cell (the diagonal of the cell is d/√2 < d, and two points on the same side of L are at distance ≥ d).

• Two points more than 2 cell rows apart have distance > d. (The distance between two points in two consecutive cells is at most d√(5/4) ≈ 1.118d.)

• Two points more than 2 cell columns apart have distance > d (same argument as above).

[Figure: the band of width 2d around L divided into cells of side d/2.]

How many squares can a point influence?

For every point in the sorted list Y = {y1, y2, ..., ym}, starting from y1, we only have to explore the distance between yi and the next 10 points in Y:

for yi, yj: d(yi, yj) ≤ d only if |i − j| ≤ 10.

So for every point in the band we only have to compare with the 10 nearest points in Y, with a total cost of 10n.

Closest-Pair algorithm:

Closest-Pair (p1, ..., pn)
  Sort by the x-coordinate to compute L
  d1 = Closest-Pair (E)
  d2 = Closest-Pair (D)
  d = min{d1, d2}
  Delete the points at distance > d from L
  Sort the remaining points by y-coordinate to form the list Y
  Scan Y in order, computing the distance from each point to the next 11 elements
  If any of those distances is < d, update d

T(n) = 2T(n/2) + O(n lg n) = O(n lg² n)

Do you know how to improve it to O(n lg n)?
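
A minimal Python sketch of this O(n lg² n) version follows. It is an illustration under assumptions: points are `(x, y)` tuples, the names `closest_pair` and `_closest` are hypothetical, and the scan of Y stops as soon as the y-gap reaches d instead of looking at a fixed number of successors (the grid argument guarantees that this inner loop runs only a constant number of times).

```python
import math


def closest_pair(points):
    """Smallest Euclidean distance among at least two (x, y) points."""
    return _closest(sorted(points))            # sort once by x-coordinate


def _closest(pts):
    n = len(pts)
    if n <= 3:                                 # small case: brute force
        return min(math.dist(pts[i], pts[j])
                   for i in range(n) for j in range(i + 1, n))
    mid = n // 2
    line_x = pts[mid][0]                       # x-coordinate of the line L
    d = min(_closest(pts[:mid]), _closest(pts[mid:]))
    # keep only the points within distance d of L, sorted by y-coordinate
    band = sorted((p for p in pts if abs(p[0] - line_x) < d),
                  key=lambda p: p[1])
    for i in range(len(band)):
        for j in range(i + 1, len(band)):
            if band[j][1] - band[i][1] >= d:   # too far above band[i]
                break
            d = min(d, math.dist(band[i], band[j]))
    return d


if __name__ == "__main__":
    print(closest_pair([(0, 0), (5, 4), (3, 1), (9, 6), (2, 7), (3.5, 1.5)]))
```

Re-sorting the band at every level is what costs the extra lg n factor; keeping a y-sorted list alongside the recursion is the usual way to remove it.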

Random-Quicksort

Consider the function Ran-Partition:

Ran-Partition (A[p, ..., q])
  r = rand(p, q) u.a.r.
  interchange A[p] and A[r]
  partition A[p, ..., q] around the pivot A[p] and return the final position of the pivot

Using Ran-Partition, consider the following randomized divide-and-conquer algorithm, on input A[1, ..., n]:

Ran-Quicksort (A[p, ..., q])
  r = Ran-Partition (A[p, ..., q])
  if p < q then
    Ran-Quicksort (A[p, ..., r − 1])
    Ran-Quicksort (A[r + 1, ..., q])
  else
    return A[p]
  end if
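
The pseudocode can be made runnable as in the sketch below. Assumptions: 0-based inclusive indices, in-place sorting, and a Lomuto-style partition loop, which the slides do not specify; the function names mirror the pseudocode but are otherwise hypothetical.

```python
import random


def ran_partition(a, p, q):
    """Partition a[p..q] around a random pivot; return the pivot's final index."""
    r = random.randint(p, q)           # pivot index chosen u.a.r.
    a[p], a[r] = a[r], a[p]            # move the pivot to the front
    pivot, i = a[p], p
    for j in range(p + 1, q + 1):      # q - p comparisons
        if a[j] < pivot:
            i += 1
            a[i], a[j] = a[j], a[i]
    a[p], a[i] = a[i], a[p]            # put the pivot between the two sides
    return i


def ran_quicksort(a, p=0, q=None):
    """Sort a[p..q] in place (0-based, inclusive bounds)."""
    if q is None:
        q = len(a) - 1
    if p < q:
        r = ran_partition(a, p, q)
        ran_quicksort(a, p, r - 1)
        ran_quicksort(a, r + 1, q)


if __name__ == "__main__":
    data = [15, 3, 9, 8, 1, 22, 5, 17]
    ran_quicksort(data)
    print(data)
```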

Example

Ran-Partition of input

A = {1, 3, 5, 6, 8, 10, 12, 14, 15, 16, 17, 18, 20, 22, 23}

[Figure: tree of the pivots chosen on this input, level by level: 8; then 3 and 16; then 1, 6, 12, 18; then 5, 10, 15, 17, 22; then 14, 20, 23.]

Expected complexity of Ran-Quicksort

• The expected running time T(n) of Ran-Quicksort is dominated by the number of comparisons.

• Every call to Ran-Partition (A[p, ..., q]) has cost Θ(1) + Θ(number of comparisons), where the number of comparisons is q − p.

• If we can count the number of comparisons, we can bound the total time of Quicksort.

• Let X be the number of comparisons made in all calls of Ran-Quicksort.

• X is a r.v., as it depends on the random choices of Ran-Partition.

Expected complexity of Ran-Quicksort

• Note: In the first application of Ran-Partition, A[r] is compared with all the other n − 1 elements.

• Key observation: Any two keys are compared iff one of them is chosen as a pivot while both are still in the same subarray, and they are compared at most once.

[Figure: a subarray of the example (keys 10 to 23) illustrating keys that never compare.]

For simplicity assume all keys are different. For any 1 ≤ i < j ≤ n, let Zi,j be the ordered set of keys {zi, zi+1, ..., zj}, where zi is the i-th smallest key of the input.

• Note |Zi,j| = j − i + 1.

• Therefore a pivot chosen u.a.r. in Zi,j is any given key with probability 1/|Zi,j| = 1/(j − i + 1).

Define the indicator r.v.:

$$X_{i,j} = \begin{cases} 1 & \text{if } z_i \text{ is compared to } z_j,\\ 0 & \text{otherwise.} \end{cases}$$

Then $X = \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} X_{i,j}$

(this is true because we never compare a pair more than once)

$$E[X] = E\left[\sum_{i=1}^{n-1} \sum_{j=i+1}^{n} X_{i,j}\right] = \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} E[X_{i,j}]$$

As $E[X_{i,j}] = 0 \cdot \Pr[X_{i,j} = 0] + 1 \cdot \Pr[X_{i,j} = 1]$,

$$\therefore\; E[X_{i,j}] = \Pr[X_{i,j} = 1] = \Pr[z_i \text{ is compared to } z_j]$$

End of the proof and main theorem

$$E[X] = \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} E[X_{i,j}] = \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} \Pr[z_i \text{ is compared to } z_j]$$

As zi and zj are compared iff one of them is the first key of Zi,j chosen as a pivot,

$$\Pr[X_{i,j} = 1] = \Pr[z_i \text{ is pivot}] + \Pr[z_j \text{ is pivot}]$$

Because pivots are chosen u.a.r. in Zi,j:

$$\Pr[z_i \text{ is pivot}] = \Pr[z_j \text{ is pivot}] = \frac{1}{j-i+1}$$

Therefore:

$$E[X] = \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} \frac{2}{j-i+1}.$$

$$E[X] = \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} \frac{2}{j-i+1}$$

$$= 2 \sum_{i=1}^{n-1} \left(\frac{1}{2} + \frac{1}{3} + \cdots + \frac{1}{n-i+1}\right)$$

$$< 2 \sum_{i=1}^{n} \left(\frac{1}{2} + \frac{1}{3} + \cdots + \frac{1}{n}\right)$$

$$< 2 \sum_{i=1}^{n} H_n = 2 \cdot n \cdot H_n = O(n \lg n).$$

Therefore, E[X] ≤ 2n ln n + O(n).

Theorem
The expected complexity of Ran-Quicksort is E[Tn] = O(n lg n).
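
As a sanity check of the bound, the double sum of 2/(j − i + 1) over all pairs i < j can be evaluated exactly. The sketch below is only a numerical illustration; the function names are hypothetical, and the closed form 2(n + 1)Hn − 4n used in the comparison is a standard identity that does not appear in the slides.

```python
from fractions import Fraction


def expected_comparisons(n):
    """Exact value of the sum over i < j of 2 / (j - i + 1), for n distinct keys."""
    # a pair at distance k = j - i occurs n - k times
    return sum(Fraction(2 * (n - k), k + 1) for k in range(1, n))


def harmonic(n):
    """The n-th harmonic number H_n = 1 + 1/2 + ... + 1/n."""
    return sum(Fraction(1, m) for m in range(1, n + 1))


if __name__ == "__main__":
    for n in (2, 10, 100):
        exact = expected_comparisons(n)
        print(n, float(exact), float(2 * n * harmonic(n)))
```

For every n the exact value stays below 2 · n · Hn, in agreement with the derivation above.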

Selection and order statistics

Problem: Given a list A of n unordered distinct keys and an i ∈ Z, 1 ≤ i ≤ n, select the element x ∈ A that is larger than exactly i − 1 other elements of A.

Notice that if:

1. i = 1 ⇒ MINIMUM element

2. i = n ⇒ MAXIMUM element

3. i = ⌊(n + 1)/2⌋ ⇒ the MEDIAN

4. i = ⌊0.9 · n⌋ ⇒ order statistics

Simple solution: sort A (O(n lg n)) and search for A[i] (Θ(n)).

Can we do it in linear time?

Yes, selection is easier than sorting.

Quick-Select

Given an unordered A[1, ..., n], return the i-th element.

• Quick-Select (A[p, ..., q], i)

• r = Ran-Partition (p, q) to find the position of the pivot

• if i = r return A[r]

• if i < r Quick-Select (A[p, ..., r − 1], i)

• else Quick-Select (A[r + 1, ..., q], i)

[Figure: example, searching for i = 2 in A[1..8] = m u h e c b k v; Ran-Partition(1, 8) returns 3, so the search continues in the left part.]

Quick-Select algorithm

Quickselect (A[p, ..., q], i)
  if p = q then
    return A[p]
  else
    r = Ran-Partition (A[p, ..., q])
    k = r − p + 1
    if i = k then
      return A[r]
    else if i < k then
      return Quickselect (A[p, ..., r − 1], i)
    else
      return Quickselect (A[r + 1, ..., q], i − k)
    end if
  end if
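
A runnable sketch of Quickselect, in which the pivot A[r] is returned when i = k and the recursion continues on A[p..r − 1] or A[r + 1..q]. Assumptions: 0-based inclusive indices, ranks i counted from 1, a Lomuto-style partition, and a private copy of the input so the caller's list is untouched; the names are hypothetical.

```python
import random


def ran_partition(a, p, q):
    """Partition a[p..q] around a random pivot; return the pivot's final index."""
    r = random.randint(p, q)
    a[p], a[r] = a[r], a[p]
    pivot, m = a[p], p
    for j in range(p + 1, q + 1):
        if a[j] < pivot:
            m += 1
            a[m], a[j] = a[j], a[m]
    a[p], a[m] = a[m], a[p]
    return m


def quickselect(a, i, p=0, q=None):
    """Return the i-th smallest element of a[p..q] (i = 1 is the minimum)."""
    if q is None:                      # top-level call: work on a copy
        a, q = list(a), len(a) - 1
    if p == q:
        return a[p]
    r = ran_partition(a, p, q)
    k = r - p + 1                      # rank of the pivot inside a[p..q]
    if i == k:
        return a[r]
    if i < k:
        return quickselect(a, i, p, r - 1)
    return quickselect(a, i - k, r + 1, q)


if __name__ == "__main__":
    print(quickselect(list("muhecbkv"), 2))
```

On the letters m u h e c b k v with i = 2 the result is c, the second letter in alphabetical order.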

Analysis

• Lucky: at each recursive call the search space is reduced to at most 9/10 of its size. Then T(n) ≤ T(9n/10) + Θ(n) = Θ(n).

• Unlucky: T(n) = T(n − 1) + Θ(n) = Θ(n²). In this case it is worse than sorting!

Theorem
Given A[1, ..., n] and i, the expected number of steps for Quick-Select to find the i-th element of A is O(n).

Proof

Given A[1, ..., n], let T(n) be a r.v. counting the number of steps Quick-Select takes to find the i-th element.

Ran-Partition (A) places the k-th smallest element as the pivot with probability 1/|A| = 1/n.

Define the indicator r.v.:

$$X_k = \begin{cases} 1 & \text{if the pivot has rank } k \text{ (the subarray up to the pivot has exactly } k \text{ elements)},\\ 0 & \text{otherwise.} \end{cases}$$

Therefore, E[Xk] = 1/n.

To get an upper bound on E[T(n)], assume the desired i-th element always falls in the larger side of the partition.

When Xk = 1 we have subarrays of size k − 1 and n − k.

[Figure: k = Ran-Partition(A) splits A into parts of sizes k − 1 and n − k.]

We get the recurrence:

$$T(n) \le \sum_{k=1}^{n} X_k \, T(\max\{k-1, n-k\}) + O(n)$$

Proof (cont.)

$$E[T(n)] \le E\left[\sum_{k=1}^{n} X_k \, T(\max\{k-1, n-k\})\right] + O(n)$$

$$= \sum_{k=1}^{n} E[X_k \, T(\max\{k-1, n-k\})] + O(n)$$

$$= \sum_{k=1}^{n} E[X_k] \, E[T(\max\{k-1, n-k\})] + O(n) \quad \text{(why? } X_k \text{ and } T(\max\{k-1, n-k\}) \text{ are independent)}$$

$$= \frac{1}{n} \sum_{k=1}^{n} E[T(\max\{k-1, n-k\})] + O(n)$$

Notice

$$\max\{k-1, n-k\} = \begin{cases} k-1 & \text{if } k > n/2,\\ n-k & \text{otherwise.} \end{cases}$$

Each term E[T(k)] with ⌊n/2⌋ ≤ k ≤ n − 1 therefore appears at most twice in the sum, so

$$E[T(n)] \le \frac{2}{n} \sum_{k=\lfloor n/2 \rfloor}^{n-1} E[T(k)] + O(n) = O(n)$$
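
One way to check the O(n) solution is the substitution method. The sketch below starts from the recurrence in the form E[T(n)] ≤ (2/n) Σ_{k=⌊n/2⌋}^{n−1} E[T(k)] + a·n, where every subproblem size of at least ⌊n/2⌋ is counted at most twice. The constants a (hidden in the O(n) term) and c (in the guess E[T(k)] ≤ c·k) are assumptions introduced only for this derivation.

```latex
\begin{align*}
E[T(n)] &\le \frac{2}{n}\sum_{k=\lfloor n/2\rfloor}^{n-1} c\,k + a\,n
   && \text{(inductive hypothesis } E[T(k)] \le c\,k\text{)}\\
 &\le \frac{2c}{n}\left(\frac{3n^2}{8} + \frac{n}{4}\right) + a\,n
   && \text{(arithmetic series)}\\
 &= \frac{3cn}{4} + \frac{c}{2} + a\,n\\
 &\le c\,n
   && \text{whenever } c > 4a \text{ and } n \ge \frac{2c}{c-4a}.
\end{align*}
```

Choosing c large enough also covers the finitely many small values of n, which gives E[T(n)] = O(n).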

Deterministic linear selection

Generate deterministically a good split element x.

• Divide the n elements into ⌊n/5⌋ groups, each with 5 elements (plus possibly one group with < 5 elements).

• Find the median of each group, say xi, for example by sorting it. Each group needs a constant number of comparisons (6 are enough to find the median of 5 elements), i.e. Θ(1). Total: ⌈6n/5⌉ comparisons.

• Use Select recursively to find the median x of the medians {xi}, 1 ≤ i ≤ ⌈n/5⌉.

• Use the deterministic Partition (from Quicksort) to rearrange the groups corresponding to the medians {xi} around x, in time linear in the number of medians.

• At least 3 · (1/2)⌊n/5⌋ ≈ ⌊3n/10⌋ of the elements are ≤ x.

• At least 3 · (1/2)⌊n/5⌋ ≈ ⌊3n/10⌋ of the elements are ≥ x.

[Figures: the groups of 5 arranged around the median of medians x.]

The deterministic algorithm

Select (A, i)
  1.- Divide the n elements into ⌊n/5⌋ groups of 5 (plus possibly one group with < 5 elements)
  2.- Find the median of each group by insertion sort, taking the middle element
  3.- Use Select recursively to find the median x of the ⌊n/5⌋ medians
  4.- Use Partition to rearrange A around x. Let k = rank of x
  5.- if i = k then
        return x
      else if i < k then
        use Select recursively to find the i-th smallest in the left part
      else
        use Select recursively to find the (i − k)-th smallest in the right part
      end if

Notice steps 4 and 5 are the same as in Quickselect.
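
The five steps can be sketched in Python as below. Assumptions: all keys are distinct (as in the problem statement), ranks are counted from 1, `sorted` stands in for insertion sort on the groups of 5, and the partition builds new lists instead of rearranging in place; the name `select` mirrors the pseudocode.

```python
def select(a, i):
    """Return the i-th smallest element of a (i = 1 is the minimum); keys distinct."""
    if len(a) <= 5:
        return sorted(a)[i - 1]
    # steps 1-2: split into groups of (at most) 5 and take each group's median
    groups = [sorted(a[s:s + 5]) for s in range(0, len(a), 5)]
    medians = [g[(len(g) - 1) // 2] for g in groups]
    # step 3: median of the medians, found recursively
    x = select(medians, (len(medians) + 1) // 2)
    # step 4: partition around x
    left = [y for y in a if y < x]
    right = [y for y in a if y > x]
    k = len(left) + 1                  # rank of x
    # step 5: recurse on the side that contains the answer
    if i == k:
        return x
    if i < k:
        return select(left, i)
    return select(right, i - k)


if __name__ == "__main__":
    data = [3, 13, 9, 4, 5, 1, 15, 12, 10, 2, 6, 14, 8, 11, 17]
    print(select(data, 7))
```

On the 15-element input 3 13 9 4 5 1 15 12 10 2 6 14 8 11 17, the first pivot is 10 (rank 9) and `select(data, 7)` returns 8.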

Example

Get the median (the ⌊n/2⌋-th element) of the following input:

3 13 9 4 5 1 15 12 10 2 6 14 8 11 17

Call SELECT on this instance:

Groups of 5, each sorted: (3, 4, 5, 9, 13), (1, 2, 10, 12, 15), (6, 8, 11, 14, 17). Medians: 5, 10, 11. Median of the medians: x = 10.

PARTITION around 10:

3 4 5 9 1 2 6 8 | 10 | 13 12 15 11 14 17

To get the 7th element (the median), continue in the left part, since 10 has rank 9.

Worst-case analysis

• At least 3n/10 of the elements are ≥ x.

• At least 3n/10 of the elements are ≤ x.

• In the worst case, step 5 calls Select recursively on ≤ 7n/10 elements.

• Steps 1, 2 and 4 take O(n) time. Step 3 takes time T(n/5) and step 5 takes time ≤ T(7n/10).

So we have

$$T(n) = \begin{cases} \Theta(1) & \text{if } n \le 50,\\ T(n/5) + T(7n/10) + \Theta(n) & \text{if } n > 50. \end{cases}$$

Therefore, T(n) = Θ(n).
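
The linear solution can be checked by substitution. In the sketch below, a is the constant hidden in the Θ(n) term and c is the constant of the guess T(m) ≤ c·m for m < n; both are names introduced only for this derivation.

```latex
\begin{align*}
T(n) &\le T(n/5) + T(7n/10) + a\,n\\
     &\le \frac{c\,n}{5} + \frac{7\,c\,n}{10} + a\,n
        && \text{(inductive hypothesis)}\\
     &= \frac{9\,c\,n}{10} + a\,n\\
     &= c\,n - \left(\frac{c\,n}{10} - a\,n\right)\\
     &\le c\,n
        && \text{whenever } c \ge 10a.
\end{align*}
```

The argument works because 1/5 + 7/10 = 9/10 < 1: the two recursive calls together handle a constant fraction less than n of the elements.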

Notice: If we make groups of 7, the number of elements ≥ x is at least 2n/7, which yields T(n) ≤ T(n/7) + T(5n/7) + O(n), with solution T(n) = O(n).

However, if we make groups of 3, then T(n) ≤ T(n/3) + T(2n/3) + O(n), which has solution T(n) = O(n ln n).

Conclusions

• Starting from a randomized algorithm, we removed the randomization to get a linear-time deterministic algorithm for selection.

• From a practical point of view, the deterministic algorithm is slow. Use Quickselect.