
Upload: doankhuong

Post on 28-Apr-2019


Dynamic Programming, Greedy - Practice

Note that some problems are not covered every semester (e.g. Stair Climbing with or without Health points). If a problem

is not covered this semester, it will not be part of the exam.

Updated with corrections: JS2 (4/5/19)

Book (CLRS) problems:

1. Dynamic programming: 15.4-1, 15.4-2, 15.4-3 and 15.4-5 (here justify the complexity of all steps you take).

(page 396)

2. Greedy: 16.1-3 (page 422)

Other sources: Problems/issues discussed in class, Homework problems, Blackboard quizzes.

Types of problems: 1) Given solution table partially filled out, finish filling it out.

2) Given the gain/cost solution, recover the solution choices that gave this optimal value.

3) Time complexity for

Mixed Problems & Techniques MIX1. Short answer:

a) (6 pts) Name two optimization problems that are equivalent: exactly the same Dynamic Programming

method (set-up and processing) is used to solve both of them.

Unbounded Knapsack and Stair Climbing with Gain

(Rod Cutting from the text book is a special version of these as well)

b) (3pts) Queens is a Dynamic Programming problem. True/False. – NOT for Spring 19 False

c) (5 pts) What problem, discussed in class, used backtracking? (Note: here by backtracking we refer to a specific technique used to compute the solution to the problem, NOT to the recovery of the choices in a dynamic programming problem.) (Your example should NOT be a dynamic programming problem.) – NOTE: The backtracking problem was NOT covered in Spring 19 at the time this problem was posted. Queens

d) True/False: For any DP problem, we can only recover the choices that give the optimal solution AFTER we have computed the actual optimal value. True


Edit Distance (ED), Longest Common Subsequence (LCS), Longest Increasing

Subsequence (LIS)

EDMix1. Short answer:

a) (5 pts) Give the function for computing the longest common subsequence (LCS): D[i][j] = …. You can use

max/min functions on however many arguments. You can write the function as a math function or as code.

Assume i is the index for string X = x1 x2 x3… xM and j is the index for string Y = y1 y2 y3… yN .

$$D[i][0] = D[0][j] = 0$$

$$D[i][j] = \begin{cases} 1 + D[i-1][j-1], & x_i = y_j \\ \max\{D[i-1][j],\ D[i][j-1]\}, & x_i \neq y_j \end{cases}$$

b) (6 pts) Give the math function (or code piece) for computing the cell Dist[i][j] for the Edit Distance between

two strings X and Y. Assume that in the table, X will be vertical and Y will be horizontal. Assume the

index of the first letter in a string is 0 (not 1). (Same question applies to all DP problems covered in class.)

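$$D[i][0] = i, \qquad D[0][j] = j$$

$$D[i][j] = \begin{cases} \min\{D[i-1][j] + 1,\ D[i][j-1] + 1,\ D[i-1][j-1]\}, & x_i = y_j \\ \min\{D[i-1][j] + 1,\ D[i][j-1] + 1,\ D[i-1][j-1] + 1\}, & x_i \neq y_j \end{cases}$$

Here x_i and y_j are the letters compared at cell (i, j). With 0-based strings these are X[i-1] and Y[j-1], because row 0 and column 0 stand for the empty string.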

c) (5 pts) This is meant to be an edit distance table, but something is wrong. What is the problem? (The highlighted cell is shown in brackets.)

0  1   2   3
1  1  [3]  4
2  2   2   4
3  3   3   3

The highlighted cell cannot be 3. It should be at most 2 (diag + 1 = left + 1 = 2).

d) (4 pts) In the table below, the highlighted value (shown in brackets) should be 2 instead of 3, because it calculates the minimum of 2, 3, 4 when the letter is the same: o. True/False. EXPLAIN.

 |  |  s|  o|  m|  e|  t|  h|  i|  n|  g|
-------------------------------------------
 | 0|  1|  2|  3|  4|  5|  6|  7|  8|  9|
-------------------------------------------
s| 1|  0|  1|  2|  3|  4|  5|  6|  7|  8|
-------------------------------------------
o| 2|  1|  0|  1|  2|  3|  4|  5|  6|  7|
-------------------------------------------
m| 3|  2|  1|  0|  1|  2|  3|  4|  5|  6|
-------------------------------------------
e| 4|  3|  2|  1|  0|  1|  2|  3|  4|  5|
-------------------------------------------
o| 5|  4|[3]|  2|  1|  1|  2|  3|  4|  5|
-------------------------------------------
n| 6|  5|  4|  3|  2|  2|  2|  3|  3|  4|
-------------------------------------------

False. The letter is used ONLY for the diagonal: 3 + 0 = 3. When coming from the top: 2 + 1 = 3.


e) (3 pts) In the arrow table for the edit distance, an arrow pointing left (<) shows:

A) A letter from the top string matched with a letter from the vertical string

B) Insertion in the top (horizontal) string

C) Insertion in the left (vertical) string

D) None of these

f)
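The edit distance recurrence from part b) can be sketched in Python as follows. This is a minimal illustration, not the course's reference code: the function name `edit_distance_table` is hypothetical, X is taken as the vertical string and Y as the horizontal one, and every insertion, deletion, or substitution is assumed to cost 1.

```python
def edit_distance_table(x, y):
    """Return the full edit distance table; rows follow x, columns follow y."""
    rows, cols = len(x) + 1, len(y) + 1
    dist = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        dist[i][0] = i  # first i letters of x against the empty string
    for j in range(cols):
        dist[0][j] = j  # empty string against the first j letters of y
    for i in range(1, rows):
        for j in range(1, cols):
            # The letters are compared only for the diagonal move.
            diagonal_cost = 0 if x[i - 1] == y[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,                  # from the top
                dist[i][j - 1] + 1,                  # from the left
                dist[i - 1][j - 1] + diagonal_cost,  # from the diagonal
            )
    return dist


table = edit_distance_table("ROUND", "NONSTOP")
print(table[4])  # row N
print(table[5])  # row D
```

With the ED2 strings (ROUND vertical, NONSTOP horizontal), the last two rows printed are the ones ED2 asks for.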

LIS 2. (8pts) In class we discussed how we can solve the Longest Increasing Subsequence (LIS) problem by

reducing it to another problem.

a) (3pts) What other type of problem did we reduce it to? Longest Common Subsequence (LCS)

b) (5pts) Assume the specific instance you need to solve is 7, 1, 3, 6, 4, 1, 2, 3, 0, 6, and you allow repetitions in the increasing sequence (it is NOT strictly increasing: 3,3,6 is allowed). Give the reduction: produce a problem instance of the other problem type. You do NOT need to solve it; just give the new problem instance corresponding to the given LIS problem.

Make a copy, Y, of the sequence X.

Sort Y in increasing order.

Find the LCS between X and Y. That is the answer: LIS(X) = LCS(X,Y)

For this instance: X = 7, 1, 3, 6, 4, 1, 2, 3, 0, 6 and Y = 0, 1, 1, 2, 3, 3, 4, 6, 6, 7.
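The reduction above can be sketched in Python as follows. This is a minimal illustration with a hypothetical helper name (`lcs_length`). Duplicates are kept in the sorted copy because repetitions are allowed in the increasing sequence.

```python
def lcs_length(x, y):
    """Length of the longest common subsequence of sequences x and y."""
    d = [[0] * (len(y) + 1) for _ in range(len(x) + 1)]
    for i in range(1, len(x) + 1):
        for j in range(1, len(y) + 1):
            if x[i - 1] == y[j - 1]:
                d[i][j] = 1 + d[i - 1][j - 1]
            else:
                d[i][j] = max(d[i - 1][j], d[i][j - 1])
    return d[len(x)][len(y)]


x = [7, 1, 3, 6, 4, 1, 2, 3, 0, 6]
y = sorted(x)               # the new LCS instance is the pair (x, y)
lis = lcs_length(x, y)
print(y)    # [0, 1, 1, 2, 3, 3, 4, 6, 6, 7]
print(lis)  # 5, for example 1, 1, 2, 3, 6
```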


ED2. (6pts) Fill out the last two rows in the table below:

      N  O  N  S  T  O  P
   0  1  2  3  4  5  6  7
R  1  1  2  3  4  5  6  7
O  2  2  1  2  3  4  5  6
U  3  3  2  2  3  4  5  6
N  4  3  3  2  3  4  5  6
D  5  4  4  3  3  4  5  6

ED3. (6pts) Fill-out the last two rows in the edit distance table below. (Here STEM is the complete second

word. ME is the end of the first word. For example the first word could be NAME, TESTME).

      S  T  E  M
…  …  …  …  …  …
…  3  3  2  2  3
M  4  4  3  3  2
E  5  5  4  3  3

ED4. Given the edit distance table (with the cost or the distance), add the retracing arrows for:

Answer: Assume N = length of vertical string, and M = length of horizontal string.

a. all cells (this is a bit long for an exam, but may still be asked for a small example). Give the time and space

complexity for this process. Ans: Space: Θ(1), Time: Θ(M*N),

b. only the cells visited to recover the solution choices. Give the time and space complexity for this process. Ans: Space: Θ(1), Time: O(M+N), Ω(max(M,N))

c. the last two rows. Give the time and space complexity for this process. Ans: Space: Θ(1), Time: Θ(M)

      N  O  N  S  T  O  P
   0  1  2  3  4  5  6  7
R  1  1  2  3  4  5  6  7
O  2  2  1  2  3  4  5  6
U  3  3  2  2  3  4  5  6
N  4  3  3  2  3  4  5  6
D  5  4  4  3  3  4  5  6


ED5. (17 pts) Given the table:

| | s| o| m| e| t| h| i| n| g|

-------------------------------------------

| 0| 1| 2| 3| 4| 5| 6| 7| 8| 9|

-------------------------------------------

s| 1| 0| 1| 2| 3| 4| 5| 6| 7| 8|

-------------------------------------------

o| 2| 1| 0| 1| 2| 3| 4| 5| 6| 7|

-------------------------------------------

m| 3| 2| 1| 0| 1| 2| 3| 4| 5| 6|

-------------------------------------------

e| 4| 3| 2| 1| 0| *1| 2| 3| 4| 5|

-------------------------------------------

o| 5| 4| 3| 2| 1| *1| *2| 3| 4| 5|

-------------------------------------------

n| 6| 5| 4| 3| 2| 2| 2| 3| 3| 4|

-------------------------------------------

e| 7| 6| 5| 4| 3| 3| 3| 3| 4| 4|

-------------------------------------------

(5pts) Mark the path used to recover the alignment with a CIRCLE around each cell from the path. If two

directions give the (same) optimal cost, give preference in this order: diagonal, left, up.

(3pts) Show all 3 strings: the 2 strings that show the word alignments and the 3rd

one showing the cost.

something

some--one

000011101

(3pts) How many different paths are there that give the (same) OPTIMAL cost? 3

(4pts) Show (on the table) at least one other path that gives the optimal cost. Mark with a TRIANGLE the cells

that are part of the new path. Cells that are part of both paths, will have a circle and a triangle.

Alternative path is indicated by the italic and underlined numbers.

(2pts) Give the alignment resulting from that path. Show all 3 strings as above.

Path from above:      Third path (uses the *1 with yellow highlight):
something             something
someo--ne             some-o-ne
000011101             000011101

ED6. (10 pts) Using the edit distance table below, recover the alignment:

  |   | A1| A2| A3| A4| A5|
---------------------------
  |  <|  <|  <|  <|  <|  <|
---------------------------
B1|  ^|  \|  \|  \|  <|  <|
---------------------------
B2|  ^|  \|  \|  \|  \|  \|
---------------------------
B3|  ^|  \|  \|  \|  \|  \|
---------------------------
B4|  ^|  ^|  \|  <|  \|  <|
---------------------------
B5|  ^|  ^|  ^|  \|  <|  \|
---------------------------
B6|  ^|  ^|  ^|  \|  \|  \|
---------------------------

(3 pts) Circle the path in the table above.

(7 pts) Fill-in the string alignment in the table below:

A string:              -    -    A1   A2   A3   A4   A5
B string:              B1   B2   B3   B4   B5   -    B6
Cost string (0 or 1):  1    1    0/1  0/1  0/1  1    0/1

Answering 0 or 1 (0/1) for the letter pairs is a sufficient answer (the table part).

Extra discussion: At least one pair cost must be 0, but probably 2 or 3 are. Because of the ‘change’ in pattern the

underscored 0 are probably the correct answer.
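Following the arrows back from the bottom-right cell, as in ED6, can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the function name `trace_alignment` is hypothetical, and the arrows are encoded as letters (D for the diagonal `\`, L for `<`, U for `^`) purely to keep the string literals simple.

```python
def trace_alignment(arrows, top, left):
    """Follow the arrows from the bottom-right cell back to the origin.

    arrows[i][j] is 'D' (diagonal), 'L' (left) or 'U' (up);
    rows follow the vertical string `left`, columns follow `top`.
    """
    i, j = len(left), len(top)
    top_row, left_row = [], []
    while i > 0 or j > 0:
        move = arrows[i][j]
        if move == "D":      # pair one letter from each string
            top_row.append(top[j - 1])
            left_row.append(left[i - 1])
            i, j = i - 1, j - 1
        elif move == "L":    # letter of the top string against a gap
            top_row.append(top[j - 1])
            left_row.append("-")
            j -= 1
        else:                # 'U': letter of the vertical string against a gap
            top_row.append("-")
            left_row.append(left[i - 1])
            i -= 1
    return top_row[::-1], left_row[::-1]


# Arrow table from ED6, one string per row (row 0 first).
arrows = ["LLLLLL", "UDDDLL", "UDDDDD", "UDDDDD", "UUDLDL", "UUUDLD", "UUUDDD"]
a = ["A1", "A2", "A3", "A4", "A5"]
b = ["B1", "B2", "B3", "B4", "B5", "B6"]
a_row, b_row = trace_alignment(arrows, a, b)
print(a_row)  # ['-', '-', 'A1', 'A2', 'A3', 'A4', 'A5']
print(b_row)  # ['B1', 'B2', 'B3', 'B4', 'B5', '-', 'B6']
```

The loop visits at most M + N cells, which matches the O(M+N) bound given in ED4 b.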

ED7. (18 pts) On the right is part of an edit distance table. CART is

the complete horizontal string. AL is the end of the vertical string (the

first letters of this string are not shown).

a) (6 points) Fill-out the empty rows (finish the table).

b) (4 points) How many letters are missing from the vertical string

(before AL)? Justify your answer. Answer: 5. In the first column every vertical letter is matched to the empty string "", so the cost there equals the number of vertical letters so far. The first-column cost in the row right above A is 5, so 5 letters are missing.

c) (8 points) Using the table and the information from part b), for each of the letters C and A in the horizontal

string, CART, say if it could be one of the missing letters of the vertical string: Yes (it is one of the missing

letters – ‘proof’), No (it is not among the missing ones – ‘proof’), Maybe (it may or may not be among the

missing ones – give example of both cases).

Letter | Yes/No/Maybe | CLEAR justification as explained above. (Each justification that does not make sense will get a penalty of one point.)

C | No | Definitely no. The cost to align the first 5 letters of the vertical string with C is the same (5) as the cost to align them with the empty string. Therefore there was no matching letter for C (or else the cost would have been 4, not 5).

A | Yes | Definitely yes. Same reasoning as above, but here the cost is 1 less, therefore there is one A among the first 5 letters of the vertical string.

Table for ED7:

      C  A  R  T
…  …  …  …  …  …
…  5  5  4  3  3
A  6  6  5  4  4
L  7  7  6  5  5


Knapsack: 0/1 or unbounded, fractional or not. Dynamic Programming & Greedy

NOTE: Unless otherwise specified, any Knapsack problem is assumed to NOT be fractional: items can be

picked up as the whole item or not at all.

K1. Short answer:

a) (3pts) The time complexity for the ITERATIVE solution for the 0/1 Knapsack of max capacity N and d items is: Θ(N*d)

b) (3pts) What information is used to recover the items that give the optimal solution for the 0/1 Knapsack:

A) a star B) an arrow C) the Picked array D) None of these.

c) (5 pts) Give the function for computing the cell sol[i][j] for the 0/1 Knapsack. Assume that i is the index of

the item. You can use value[i] and weight[i] to refer to the value and the weight of item i.

sol[i][j] = max{ sol[i-1][j], sol[i-1][ j-weight[i] ] + value[i] }   (the second term applies only when weight[i] <= j)

d) (5 pts) Is there any Knapsack version for which Greedy (with the ratio value to weight) gives an optimal

solution for? Yes: The fractional Knapsack ( both 0/1 and Unbounded versions)

e)
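The 0/1 Knapsack recurrence from K1 c) can be sketched in Python as follows. This is a minimal illustration with a hypothetical function name (`knapsack_01`). Row 0 of the table stands for "no items", so item i of the formula is stored at list index i-1. The example data are the three items of K5 b).

```python
def knapsack_01(weights, values, capacity):
    """Best total value when each item is used at most once."""
    d = len(weights)
    sol = [[0] * (capacity + 1) for _ in range(d + 1)]
    for i in range(1, d + 1):
        w, v = weights[i - 1], values[i - 1]
        for j in range(capacity + 1):
            sol[i][j] = sol[i - 1][j]          # item i not used
            if w <= j:                         # item i fits: try using it
                sol[i][j] = max(sol[i][j], sol[i - 1][j - w] + v)
    return sol[d][capacity]


# Items A, B, C from K5 b): weights 3, 2, 2 and values 5, 4, 4, capacity 4.
best = knapsack_01([3, 2, 2], [5, 4, 4], 4)
print(best)  # 8 (items B and C)
```

The two nested loops make the Θ(N*d) running time from part a) visible.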


K2. Fill out the remainder of the solution table (columns 13 and 14). You must show your work in the rows for A, B, C, D as shown in class (even though, for the provided solution (columns 0 to 12), the work is not shown in rows A, B, C, D):

          0   1   2   3   4   5   6   7   8   9   10  11  12  13     14
Sol:      0   0   0   0   10  10  21  21  21  21  33  33  42  42     43
Picked:                   A   A   B   B   B   B   C   C   B   B      A
A,4,10                                                        9,31   10,43
B,6,21                                                        7,42   8,42
C,10,33                                                       3,33   4,43
D,12,36                                                       1,36   2,36

K3. (6 pts) Given this solution information, for the unbounded Knapsack problem below, recover the choices

that gave the optimal answer for Knapsack capacity 14. (This is a made-up example.) Show your work.

Item:    C  A  B
Weight:  2  3  4

Item value is not given as it should not be used to recover the solution.

Index   0  1  2  3  4  5  6  7  8  9  10  11  12  13  14
picked  -  -  C  A  B  B  A  B  B  C  A   C   B   A   A

Items picked for capacity 14: A, C, C, B, A (jump back in the solution based on the weight of the picked item)
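The jump-back recovery used in K3 can be sketched in Python as follows. This is a minimal illustration with a hypothetical function name (`recover_items`). `None` plays the role of the "-" entries in the picked row.

```python
def recover_items(picked, weight, capacity):
    """Walk back through the picked array, jumping by the picked item's weight."""
    items = []
    while capacity > 0 and picked[capacity] is not None:
        item = picked[capacity]
        items.append(item)
        capacity -= weight[item]   # jump to the smaller subproblem
    return items


weight = {"C": 2, "A": 3, "B": 4}
# picked row from K3 for indices 0..14
picked = [None, None] + list("CABBABBCACBAA")
chosen = recover_items(picked, weight, 14)
print(chosen)  # ['A', 'C', 'C', 'B', 'A']
```

Only the weights are needed; as the problem notes, the values play no part in the recovery.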

K4. (37pts) a)(10pts) Solve the UNBOUNDED Knapsack problem below. Show work as in class.

Item A B C

Weight 2 3 5

Value 3 4 8

          0  1  2    3    4    5    6    7     8
sol       0  0  3    4    6    8    9    11    12
picked    -  -  A    B    A    C    A    A     A
A, 2, 3   -  -  0,3  1,3  2,6  3,7  4,9  5,11  6,12
B, 3, 4   -  -  -    0,4  1,4  2,7  3,8  4,10  5,12
C, 5, 8   -  -  -    -    -    0,8  1,8  2,11  3,12

b) (3 pts) Recover the items in the solution:

A, A, A,A


c) (5 pts) What is the size of the sliding window for this problem? Justify your answer. Give the problem sizes

for the problems stored in the sliding window (needed to solve problem of size N and to continue solving next

problem sizes).

Sliding window size is 5 because the smallest problem needed to solve for N is N-5.

Problem sizes to be stored: N-5,N-4,N-3,N-2,N-1

d) (5 pts) How do you update the array of solutions to the smaller problems, after you have solved for problem

of size N, so that it is now ready to be used to solve a problem of size N+1? Show with a drawing. (If you prefer

you can also write code, but the drawing is still mandatory).

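Index:          0      1      2      3      4
Solution for:  (N-5)  (N-4)  (N-3)  (N-2)  (N-1)

One way to update after solving for size N: shift every entry one cell to the left (the N-5 entry is dropped) and store the answer for problem of size N in cell 4, so the window now holds N-4, ..., N.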

d) (8 pts) Consider the FRACTIONAL 0/1 Knapsack problem with the same items as above and max capacity

9. What will Greedy based on VALUE (not ratio) do for this problem? Show your work.

Total value for the Greedy solution: 8 + 4 + (1/2)*3 = 13.5

Items sorted from most valuable to least: C (5,8), B (3,4), A (2,3)

Items picked: C, B, 1/2 of A

Picked             C            B            1/2 of A
Remaining weight   4 = (9-5)    1 = (4-3)    0 = [1 - (1/2)*2]

e) (6 pts) What will Greedy based on the ratio (as discussed in class), do for a knapsack of size 14, if there was an

UNLIMITED number of each item and an item cannot be split (NOT fractional)? Show your work.

Total value for the Greedy solution: 8 + 8 + 3 + 3 = 22

Items sorted from most valuable to least based on value/weight ratio: C (8/5 = 1.6), A (3/2 = 1.5), B (4/3 ≈ 1.33)

Items picked: C, C, A, A

Picked             C             C            A            A
Remaining weight   9 = (14-5)    4 = (9-5)    2 = (4-2)    0 = (2-2)
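The three techniques used in K4 can be sketched in Python as follows: the full unbounded Knapsack table from part a), the sliding window from parts c) and d), and the ratio-based Greedy from the last part. This is a minimal illustration, not the class code. All function names are hypothetical, and using `collections.deque` with `maxlen` for the window is an implementation choice made here.

```python
from collections import deque

ITEMS = [("A", 2, 3), ("B", 3, 4), ("C", 5, 8)]  # (name, weight, value) from K4


def unbounded_knapsack(items, capacity):
    """Full table: sol[n] is the best value for capacity n, picked[n] the item used."""
    sol = [0] * (capacity + 1)
    picked = [None] * (capacity + 1)
    for n in range(1, capacity + 1):
        for name, weight, value in items:
            # Strict '>' keeps the first item on ties, as in the K4 table.
            if weight <= n and sol[n - weight] + value > sol[n]:
                sol[n] = sol[n - weight] + value
                picked[n] = name
    return sol, picked


def unbounded_value_window(items, capacity):
    """Same optimal value, storing only the last max-weight answers."""
    size = max(weight for _, weight, _ in items)
    window = deque([0] * size, maxlen=size)  # window[-1] is the answer for n-1
    for n in range(1, capacity + 1):
        best = 0
        for _, weight, value in items:
            if weight <= n:
                best = max(best, window[-weight] + value)
        window.append(best)  # maxlen drops the oldest answer automatically
    return window[-1]


def greedy_by_ratio(items, capacity):
    """Unlimited copies, no splitting: take the best value/weight ratio while it fits."""
    picked, total = [], 0
    for name, weight, value in sorted(items, key=lambda it: it[2] / it[1], reverse=True):
        while weight <= capacity:
            picked.append(name)
            total += value
            capacity -= weight
    return picked, total


sol, picked = unbounded_knapsack(ITEMS, 8)
print(sol)                                # [0, 0, 3, 4, 6, 8, 9, 11, 12]
print(unbounded_value_window(ITEMS, 14))  # 22
print(greedy_by_ratio(ITEMS, 14))         # (['C', 'C', 'A', 'A'], 22)
```

For capacity 14 the Greedy total happens to equal the DP optimum. That is a property of this instance, not a guarantee for the non-fractional problem.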

K5. (14 pts) Use a Greedy method based on the largest value to solve a 0/1 Knapsack problem.

a) (6pts) Give the pseudocode for the algorithm (what it computes, what actions it takes).



// A sufficient version for the exam:
L = list of items sorted in decreasing order of value
currentWeight = knapsack capacity   // remaining capacity
totalValue = 0
Repeat {
    x = first item in L s.t. x->weight <= currentWeight
    // if no item in L has weight <= currentWeight, x is NULL
    If (x == NULL)
        Break
    Else {
        // x is the first (most valuable) item that fits
        currentWeight = currentWeight - x->weight
        totalValue = totalValue + x->value
        remove x from L   // because 0/1 Knapsack
    }
}

// A more detailed version:
L = list of items sorted in decreasing order of value
currentWeight = knapsack capacity
totalValue = 0
x = L   // just to start the loop. It gets overwritten
while (x != null):
    x = null
    curr = L->next   // assume list has dummy node
    while (curr != null && curr->weight > currentWeight)
        curr = curr->next
    if (curr != null)
        x = curr   // x is the first item that fits
        currentWeight = currentWeight - x->weight
        totalValue = totalValue + x->value
        remove x from L   // because 0/1 Knapsack
    // else: L is empty or no item fit in

b) (8pts) Does it give an optimal solution or not? If Yes, justify, if No, prove it with a counter example.

No. Idea: Greedy picks the item with the most value, after which no other item fits. The optimal solution instead uses 2 other items, each of smaller individual value, that both fit and together are worth more.

Use Knapsack capacity: 4.

Greedy can only pick A (remaining weight 1) total value: 5

Optimal solution: pick B and C. Total value: 8

Item A B C

Weight 3 2 2

Value 5 4 4
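The counterexample above can be checked with a short Python sketch of the value-based Greedy. This is a minimal illustration with a hypothetical function name (`greedy_by_value_01`). A single pass over the items in decreasing order of value picks the same items as the repeat loop in part a), because the remaining capacity only shrinks.

```python
def greedy_by_value_01(items, capacity):
    """0/1 Knapsack Greedy: take items in decreasing order of value if they fit."""
    picked, total = [], 0
    for name, weight, value in sorted(items, key=lambda it: it[2], reverse=True):
        if weight <= capacity:
            picked.append(name)   # each item is considered only once (0/1)
            total += value
            capacity -= weight
    return picked, total


items = [("A", 3, 5), ("B", 2, 4), ("C", 2, 4)]  # (name, weight, value)
result = greedy_by_value_01(items, 4)
print(result)  # (['A'], 5), while B and C together give 8
```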


Job Scheduling (or Weighted Interval Scheduling): DP & Greedy

JS1. Short answer

a) Assume that the data has been preprocessed to the point where you have all the pi values for a Job

Scheduling problem with N jobs. What is the time complexity to solve this problem? Circle the tightest

asymptotic bound (O, Θ or Ω) that applies and fill in the function as well. O / Θ / Ω (…N……..)

b)

JS2. (12 pts) a) (10 pts) Use Dynamic Programming to solve the Weighted Job Scheduling below for jobs 1-6 with job values given by vi. Here pi gives the last job that does not overlap with job i. When filling in the answer for m(i), show your work as done in class.

b) (2 pts) Recover the solution (fill in the rightmost column). Jobs: 1, 4, 6 – corrected 4/5/19. (Check: 4 + 5 + 2 = 11 = m(6). Job 2 is not part of the solution: p2 = 0 means job 2 overlaps job 1, and m(2) = 4 comes from job 1.)

i   vi  pi  m(i)                    m(i) used i: Y/N   In opt solution: Y/N
0           0
1   4   0   4 (max of: 0, 4)        Y                  Y
2   3   0   4 (max of: 4, 3)        N                  N
3   4   1   8 (max of: 4, 4+4)      Y                  N
4   5   2   9 (max of: 8, 4+5)      Y                  Y
5   3   2   9 (max of: 9, 4+3)      N                  N
6   2   4   11 (max of: 9, 9+2)     Y                  Y
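The m(i) computation and the recovery step of JS2 can be sketched in Python as follows. This is a minimal illustration with a hypothetical function name (`schedule`). Index 0 of each list is the dummy job 0, and the p values are assumed to be already computed.

```python
def schedule(values, p):
    """Weighted job scheduling: m[i] = max(m[i-1], values[i] + m[p[i]])."""
    n = len(values) - 1
    m = [0] * (n + 1)
    for i in range(1, n + 1):
        m[i] = max(m[i - 1], values[i] + m[p[i]])
    # Recover the choices only after all optimal values are known.
    chosen, i = [], n
    while i > 0:
        if values[i] + m[p[i]] > m[i - 1]:   # m(i) used job i
            chosen.append(i)
            i = p[i]                         # jump to the last compatible job
        else:
            i -= 1
    return m, chosen[::-1]


values = [0, 4, 3, 4, 5, 3, 2]   # v_i from JS2
p = [0, 0, 0, 1, 2, 2, 4]        # p_i from JS2
m, jobs = schedule(values, p)
print(m)     # [0, 4, 4, 8, 9, 9, 11]
print(jobs)  # [1, 4, 6]
```

With the p values given, the fill is a single loop over the N jobs, in line with the bound asked for in JS1 a).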

JS3. (11pts) For a Job-Scheduling problem where each job has the same value (e.g. $10), your goal is to choose

the maximum number of non-overlapping jobs. You take the following approach:

- Sort the jobs in increasing order of START TIME

- Repeat until no job left:

o Pick the job,J, that starts first. Assume no two jobs have the same start time.

o From the remaining jobs, remove all the jobs that overlap with job J.

a) (3 pts) Is this Greedy, Dynamic Programming or Brute Force Search? Greedy

b)(8 pts) Does it give an optimal solution or not? If Yes, justify, if No, prove it with a counter example.

No. Idea: have two overlapping jobs, greedy picks first one to start but the other one gives more value.

Job Start End Value

1 1 3 2

2 2 5 5

Greedy picks job 1, total value: 2.

Optimal solution: job 2, total value: 5.
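Since JS3 states that every job has the same value, a counterexample can also be built without using different values: one long job that starts first and overlaps two shorter jobs. The Python sketch below uses three made-up jobs (they are not from the original problem) and a hypothetical helper name (`greedy_pick`). For contrast it also runs the same loop in order of end time, which is the classic Greedy choice for this equal-value problem.

```python
def greedy_pick(jobs, key):
    """Scan jobs in the given order, keeping each job that does not overlap the last pick."""
    chosen = []
    last_end = float("-inf")
    for start, end in sorted(jobs, key=key):
        if start >= last_end:        # no overlap with the previously chosen job
            chosen.append((start, end))
            last_end = end
    return chosen


# Hypothetical jobs (start, end), all with the same value.
jobs = [(1, 10), (2, 3), (4, 5)]
by_start = greedy_pick(jobs, key=lambda job: job[0])
by_end = greedy_pick(jobs, key=lambda job: job[1])
print(by_start)  # [(1, 10)]
print(by_end)    # [(2, 3), (4, 5)]
```

Earliest-start Greedy schedules one job here, while two non-overlapping jobs are possible.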


JS4. When we solved the Job Scheduling problem, the problem was already ‘preprocessed’ to lend itself to a Dynamic Programming (DP) solution.

a) (5 pts) Fill in the 2 sentences below to say what the ‘preprocessing’ does.

Step 1: Reorder jobs in increasing order of end time (Job 1 is the first one to end, job N is the last one to end).

pi is the last job that ends before job i starts. (A job that ends exactly when job i starts counts, as with jobs 1 and 3 in part b.)

b) (6 pts) Given the jobs in the table to the right, do the preprocessing so

that the problem can be solved with DP. Solve up to and including the

pi.

Draw the picture and fill in the table below:

Start time  End time  Value
5           7         5
1           2         4
2           6         4
4           9         3
3           4         3
8           10        2

Job  Start  End  Value  pi
0    -      -    -      -
1    1      2    4      0
2    3      4    3      1
3    2      6    4      1
4    5      7    5      2
5    4      9    3      2
6    8      10   2      4

[Drawing: timeline from 1 to 10 with jobs 1-6 drawn as intervals between their start and end times.]
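The preprocessing of JS4 can be sketched in Python as follows. This is a minimal illustration with a hypothetical function name (`preprocess`). Using binary search (`bisect_right`) to find each p value is a choice made here, not necessarily the method used in class. A job whose end time equals the start time of job i is treated as non-overlapping, which is what the p values in the table imply.

```python
from bisect import bisect_right


def preprocess(jobs):
    """Sort jobs by end time and compute p[i] for jobs numbered 1..N (p[0] is a dummy)."""
    ordered = sorted(jobs, key=lambda job: job[1])
    ends = [end for _, end, _ in ordered]
    p = [0]
    for start, _, _ in ordered:
        # Number of jobs ending at or before this start = index of the last such job.
        p.append(bisect_right(ends, start))
    return ordered, p


# (start, end, value) rows from the JS4 table, in the given order
jobs = [(5, 7, 5), (1, 2, 4), (2, 6, 4), (4, 9, 3), (3, 4, 3), (8, 10, 2)]
ordered, p = preprocess(jobs)
print(ordered)  # [(1, 2, 4), (3, 4, 3), (2, 6, 4), (5, 7, 5), (4, 9, 3), (8, 10, 2)]
print(p)        # [0, 0, 1, 1, 2, 2, 4]
```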


Stair Climbing with or without Gain/Health Points

SC2. (10 pts) Solve the Stair Climbing with Health Points problem given below. Show work as in class. –

NOTE: this problem is not covered every semester.

Jump size    2  3  5
Health Pts   3  4  8

          0  1  2    3    4    5    6    7     8
sol       0  0  3    4    6    8    9    11    12
picked    -  -  2    3    2    5    2    2     2
2, 3      -  -  0,3  1,3  2,6  3,7  4,9  5,11  6,12
3, 4      -  -  -    0,4  1,4  2,7  3,8  4,10  5,12
5, 8      -  -  -    -    -    0,8  1,8  2,11  3,12

NOTE: The answer for this problem is the same as for the Knapsack K4 (with item weight equal to jump size and value equal to Health Points).


Huffman encoding

H1. Short answer:

f) (3pts) Huffman coding is a Dynamic Programming problem. True/False. False

H2. Assume that the numbers given below represent counts of letters in the hundreds from a file (similar to the CLRS

example). For example, in the file there will be exactly 20 * 100 occurrences of the letter ‘a’ , 11*100 occurrence of the

letter ‘c’, etc. a: 20, c:11, d:2, e: 10, o:15, m:8, s:10, t:22, u: 2

a) What is an optimal Huffman code based on the following set of

frequencies?

1) Draw the tree. Show your work at every step.

2) Fill in the table on the right the Huffman encoding for each letter.

3) We encode the file using the Huffman codes produced above.

How much memory will the file require with this encoding?

b) Fixed-length encoding:

1) Fill in the table the fixed-length encoding for each letter.

2) What will be the file size when the fixed-length encoding is used?

Answer: see next pages.

H3. (5 pts) In 3 to 5 bullet points or SHORT sentences, write the algorithm for building the Huffman tree.

Graded both for correctness and CLARITY.

a) Sort the letter-frequency pairs in increasing order of frequency (they will be the leaves of the tree).

b) Repeat until it is all one tree:

i) Connect the leftmost two nodes as children of a new node. Value of new node = sum of the connected nodes.

ii) Reorder (put the new node in its correct place).
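The steps above can be sketched in Python as follows. This is a minimal illustration, not the graded answer: the function name `huffman_codes` is hypothetical, and a binary heap (`heapq`) stands in for the sort-and-reorder steps. The heap still always merges the two smallest nodes. The individual codes depend on how ties are broken, so only the total cost is checked.

```python
import heapq
from itertools import count


def huffman_codes(freqs):
    """Build a Huffman tree and return a {letter: bit string} mapping."""
    tie = count()  # tie-breaker so the heap never compares tree nodes directly
    heap = [(f, next(tie), letter) for letter, f in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)    # two smallest frequencies
        f2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, next(tie), (left, right)))
    codes = {}

    def walk(node, prefix):
        if isinstance(node, tuple):          # internal node: (left, right)
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")
        else:                                # leaf: a letter
            codes[node] = prefix or "0"

    walk(heap[0][2], "")
    return codes


# Letter counts (in hundreds) from H2
freqs = {"a": 20, "c": 11, "d": 2, "e": 10, "o": 15, "m": 8, "s": 10, "t": 22, "u": 2}
codes = huffman_codes(freqs)
total_bits = sum(freqs[ch] * len(codes[ch]) for ch in freqs)
print(total_bits)       # 294, i.e. 29,400 bits since the counts are in hundreds
print(4 * len(freqs))   # 36; a fixed-length code needs 4 bits for each of the 9 letters
```

With 4 bits per letter and 100 * 100 letters in total, the fixed-length encoding would take 40,000 bits.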

Letter   Encoding: Huffman   Encoding: Fixed length
a
c
d
e
o
m
s
t
u
