cse 101...algorithm design and analysis miles jones mej016@eng.ucsd.edu office 4208 cse building...

Algorithm Design and Analysis

Miles Jones

mej016@eng.ucsd.edu

Office 4208 CSE Building

Lecture 22: Dynamic Programming Examples (Edit Distance/Knapsack)

CSE 101

EDIT DISTANCE

Given two words (strings), how can we define a notion of “closeness”?

For example:

Is PELICAN closer to PENGUIN or POLITICIAN?

EDIT DISTANCE (DEFINITION)

We can keep track of how many “changes” we need to change one

word into another.

The changes can be

▪ insertion,

▪ deletion, or

▪ substitution.

For example, if we line up the words PELICAN and OSTRICH

P E L I C A N

O S T R I C H

s s s s s s s

Is 7 the cheapest cost?

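P E L - I C A N
O S T R I C - H
s s s i     d s

No: lining up the shared letters I and C, at the price of one insertion (i) and one deletion (d), gives a cost of 6.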

Is 4 the cheapest?

P E L I C A N
P E N G U I N
    s s s s

Is 7 the cheapest?

P E L I C A N - - -
P O L I T I C I A N
  s     s s s i i i
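
The cost of one particular alignment can be checked mechanically. The sketch below assumes an alignment is written as two equal-length rows with `-` marking a gap, as on the slides; the function name `alignment_cost` is made up for illustration. It only scores the alignment it is given and does not search for the cheapest one.

```python
def alignment_cost(top: str, bottom: str, gap: str = "-") -> int:
    """Cost of one given alignment: every column that is not a match costs 1."""
    if len(top) != len(bottom):
        raise ValueError("both rows must have the same number of columns")
    cost = 0
    for a, b in zip(top, bottom):
        if a == gap and b == gap:
            raise ValueError("a column cannot be a gap in both rows")
        if a != b:
            # gap on top = insertion, gap on bottom = deletion,
            # two different letters = substitution
            cost += 1
    return cost


print(alignment_cost("PELICAN", "OSTRICH"))      # 7
print(alignment_cost("PEL-ICAN", "OSTRIC-H"))    # 6
print(alignment_cost("PELICAN---", "POLITICIAN"))  # 7
```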

EDIT DISTANCE (BRUTE FORCE)

Brute force: try all possible combinations (alignments) and take the minimum cost over all of them.

What is a lower bound on the number of possible combinations if the words have lengths n and m, with n≤m?

Each column could be one of three things (at least for the first n columns), so there are at least 3^n different combinations. In fact, there are many more than that.
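
To get a feel for how fast the brute-force search space grows, here is a small sketch that counts alignments exactly. It assumes every distinct sequence of columns counts as a separate combination, and the name `count_alignments` is made up for illustration. The last column is a deletion, an insertion, or a letter-over-letter column, which gives a three-way recursion.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def count_alignments(n: int, m: int) -> int:
    """Number of ways to align a word of length n with a word of length m."""
    if n == 0 or m == 0:
        return 1  # only gap columns remain, so there is exactly one way
    return (
        count_alignments(n - 1, m)        # last column: letter over gap
        + count_alignments(n, m - 1)      # last column: gap over letter
        + count_alignments(n - 1, m - 1)  # last column: letter over letter
    )


for n in range(1, 8):
    print(n, 3 ** n, count_alignments(n, n))
```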

EDIT DISTANCE (DYNAMIC PROGRAMMING)

Find the minimum cost of changing the word x[1…n] into the word y[1…m].

Step 1: define subproblems

▪ Let E(i,j) be the minimum cost of aligning the prefixes x[1…i] and y[1…j].

Step 2: base case

What are the base cases?

When the second word is empty, the edit distance is the length of the first word (delete every letter). When the first word is empty, the edit distance is the length of the second word (insert every letter).

▪ E(i,0) = i

▪ E(0,j) = j

Step 3: express recursively:

▪ Split into cases depending on the last column of the alignment of x[1… i] and y[1…j].

▪ Case 1: the last column looks like

x[i]
-

▪ This is a deletion with a cost of 1, so if the minimum-cost alignment of x[1…i] and y[1…j] has this in the last column, then

▪ E(i,j)=1+E(i-1,j)

▪ Case 2: the last column looks like

-
y[j]

▪ This is an insertion with a cost of 1, so if the minimum-cost alignment of x[1…i] and y[1…j] has this in the last column, then

▪ E(i,j)=1+E(i,j-1)

▪ Case 3: the last column looks like

x[i]
y[j]

▪ Case 3.1: x[i]=y[j] (no cost)

▪ E(i,j)=E(i-1,j-1)

▪ Case 3.2: x[i]≠y[j] (substitution cost of 1)

▪ E(i,j)=1+E(i-1,j-1)

▪ So take the minimum of all three cases:

▪ E(i,j)=min( 1 + E(i-1,j), 1 + E(i,j-1), (1-δ(x[i],y[j]))+E(i-1,j-1))

▪ where the delta function is δ(a,b) = 1 if a = b and δ(a,b) = 0 if a ≠ b.
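
The recurrence can be turned into code almost word for word by memoizing the recursion. This is a minimal sketch, not the lecture's own code: the name `edit_distance_memo` is made up, Python strings are 0-indexed so x[i] on the slides becomes `x[i - 1]`, and the recursion depth grows with len(x) + len(y), so it is only suitable for short words.

```python
from functools import lru_cache


def edit_distance_memo(x: str, y: str) -> int:
    """Edit distance via the recurrence for E(i, j), computed top-down."""

    @lru_cache(maxsize=None)
    def E(i: int, j: int) -> int:
        if i == 0:
            return j  # base case E(0, j) = j
        if j == 0:
            return i  # base case E(i, 0) = i
        diff = 0 if x[i - 1] == y[j - 1] else 1  # this is 1 - delta(x[i], y[j])
        return min(
            1 + E(i - 1, j),         # case 1: deletion
            1 + E(i, j - 1),         # case 2: insertion
            diff + E(i - 1, j - 1),  # case 3: match or substitution
        )

    return E(len(x), len(y))


print(edit_distance_memo("PELICAN", "PENGUIN"))     # 4
print(edit_distance_memo("PELICAN", "POLITICIAN"))  # 4
```

So PELICAN is equally close to PENGUIN and POLITICIAN under this measure; the cost-7 alignment with POLITICIAN shown earlier is not the cheapest one.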

Step 4: ordering…..

▪ To calculate E(i,j), we need to know E(i-1,j), E(i,j-1) and E(i-1,j-1)

▪ Think of a 2-D array: where are these entries in relation to (i,j)?

▪ So, order them in such a way to visit all the necessary entries before you visit (i,j).

▪ One way to do this is left to right through rows going from top to bottom.

(i-1,j-1) (i-1,j)

(i,j-1) (i,j)

EditDist(x[1…n],y[1…m])
  for i from 0 to n, E(i,0)=i
  for j from 0 to m, E(0,j)=j
  for i from 1 to n
    for j from 1 to m
      E(i,j)=min( 1 + E(i-1,j), 1 + E(i,j-1), (1-δ(x[i],y[j]))+E(i-1,j-1))
  return E(n,m)
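
For comparison, here is a bottom-up (tabulated) sketch of the same pseudocode in Python. The function name `edit_distance` is an assumption, and the table is stored as a list of lists with row i and column j matching E(i,j). The two nested loops fill (n+1)(m+1) entries with constant work each, so the running time is O(nm).

```python
def edit_distance(x: str, y: str) -> int:
    """Edit distance by filling the table E row by row, left to right."""
    n, m = len(x), len(y)
    E = [[0] * (m + 1) for _ in range(n + 1)]

    # Base cases: aligning a prefix with the empty word.
    for i in range(n + 1):
        E[i][0] = i
    for j in range(m + 1):
        E[0][j] = j

    # Every entry only needs the entries above, to the left, and diagonal.
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diff = 0 if x[i - 1] == y[j - 1] else 1
            E[i][j] = min(
                1 + E[i - 1][j],         # deletion
                1 + E[i][j - 1],         # insertion
                diff + E[i - 1][j - 1],  # match or substitution
            )
    return E[n][m]


print(edit_distance("PELICAN", "OSTRICH"))  # 6
```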

EDIT DISTANCE (EXAMPLE)

[Table: columns ∅ P E L I C A N]

TABULATION/MEMOIZATION

THE KNAPSACK PROBLEM

Suppose you are a burglar who breaks into a store and

you want to leave with the maximum value of items.

Your knapsack can only hold 13 lbs and the items in the store are:

Value 4 9 12 15 19 21

Weight 2 4 5 7 8 9

What is the maximum value you can have from a list of items a[1],…,a[n], where each item has a value v[i] and a weight w[i], given that you cannot carry more weight than W? (The store has plenty of each item, so an item may be taken more than once.)

Step 1: subproblems:

Let K(w) be the maximum value you can have in a w-capacity

knapsack.

Step 2: base cases:

K(0) = 0

Step 3: express recursively:

What is K(w)? For each item i that fits (w[i] ≤ w), take the best value of a knapsack of capacity w - w[i] and add the value of item i. Then keep the best choice.

K(w) = max_{i : w[i] ≤ w} ( K(w - w[i]) + v[i] )

(K(w) = 0 if no item fits.)

Step 4: order.

Then order the subproblems from 1 to W.

pseudocode:

Knapsack(v[1…n],w[1…n],W)
  K(0):=0
  prev(0):=nil
  for w from 1 to W
    K(w):=0
    prev(w):=nil
    for j from 1 to n
      if w[j]≤w and K(w)<K(w-w[j])+v[j] then
        K(w):= K(w-w[j])+v[j]
        prev(w):=j
  return K(W)

Runtime???
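
A Python sketch of the knapsack-with-repeats pseudocode, for experimenting with the example. The function name `knapsack_with_repeats` and the traceback loop that follows the prev pointers are illustrative additions, and the sketch assumes positive integer weights. There are W capacities with n items tried at each, so the running time is O(nW).

```python
def knapsack_with_repeats(values, weights, capacity):
    """Return (best value, chosen item indices); items may be reused."""
    K = [0] * (capacity + 1)
    prev = [None] * (capacity + 1)  # last item added to reach K[w]

    for w in range(1, capacity + 1):
        for j, (v_j, w_j) in enumerate(zip(values, weights)):
            if w_j <= w and K[w] < K[w - w_j] + v_j:
                K[w] = K[w - w_j] + v_j
                prev[w] = j

    # Follow the prev pointers back to recover one optimal selection.
    chosen = []
    w = capacity
    while prev[w] is not None:
        chosen.append(prev[w])
        w -= weights[prev[w]]
    return K[capacity], chosen


best, chosen = knapsack_with_repeats([4, 9, 12, 15, 19, 21], [2, 4, 5, 7, 8, 9], 13)
print(best, chosen)  # 31 [2, 4]
```

Indices here are 0-based, so `[2, 4]` means the items of weight 5 and weight 8.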

THE KNAPSACK PROBLEM

Value 4 9 12 15 19 21
Weight 2 4 5 7 8 9

[Table: one row per item (v[i], w[i]); one column per capacity 0 1 2 3 4 5 6 7 8 9 10 11 12 13]

THE KNAPSACK PROBLEM (NO REPEATS)

Suppose you are a burglar who breaks into somebody’s house where there is only one of each item.

You want to leave with the maximum value of items, but you can’t take more than one of each thing.

Your knapsack can only hold 13 lbs and the items in the house are:

Value 4 9 12 15 19 21

Weight 2 4 5 7 8 9

Step 1: subproblems:

Let K(w,j) be the maximum value you can have in a w-capacity knapsack using only the items a[1],…,a[j].

What is K(w,j)?

Case 1: taking a[j] is better value (only possible if w[j] ≤ w):

add the item a[j] to the best knapsack of capacity w-w[j] that uses only a[1],…,a[j-1], and add the value v[j].

K(w,j) = K(w-w[j], j-1) + v[j]

Case 2: taking a[j] is not better value.

K(w,j) = K(w, j-1)

So take the maximum of these two scenarios.

K(w,j) = max{ K(w-w[j], j-1) + v[j], K(w, j-1) }
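
The two-case recurrence translates directly into a memoized recursion. This is a sketch under stated assumptions: the name `knapsack_no_repeats_memo` is made up, items are 0-indexed in Python so item j on the slides is index `j - 1`, and the first case is only considered when the item fits.

```python
from functools import lru_cache


def knapsack_no_repeats_memo(values, weights, capacity):
    """Best value when each item may be taken at most once (top-down)."""

    @lru_cache(maxsize=None)
    def K(w: int, j: int) -> int:
        if w == 0 or j == 0:
            return 0  # no capacity left, or no items to choose from
        best = K(w, j - 1)  # case 2: leave item j out
        if weights[j - 1] <= w:
            # case 1: take item j, then fill the rest using items 1..j-1
            best = max(best, K(w - weights[j - 1], j - 1) + values[j - 1])
        return best

    return K(capacity, len(values))


print(knapsack_no_repeats_memo([4, 9, 12, 15, 19, 21], [2, 4, 5, 7, 8, 9], 13))  # 31
```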

Step 3: order and base cases.

Base cases:

K(0,j)=0 for all j

K(w,0)=0 for all w

We need to know K(w-w[j], j-1) and K(w, j-1) before computing K(w,j).

Order the problems left to right, top to bottom.

(w-w[j], j-1)  …  (w, j-1)
                  (w, j)

Value 4 9 12 15 19 21
Weight 2 4 5 7 8 9

[Table: one row per item (v[i], w[i]); one column per capacity 0 1 2 3 4 5 6 7 8 9 10 11 12 13]

TABULATION/MEMOIZATION

pseudocode:

Knapsack(v[1…n],w[1…n],W)
  K(0,j):=0 for all j
  K(w,0):=0 for all w
  for w from 1 to W
    for j from 1 to n
      if w[j]≤w then
        K(w,j):=max{K(w-w[j],j-1)+v[j], K(w,j-1)}
      else
        K(w,j):=K(w,j-1)
  return K(W,n)

Runtime???
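
A tabulated Python sketch of the no-repeats pseudocode, with an added traceback that recovers which items were taken. The function name `knapsack_no_repeats` and the traceback are illustrative assumptions, not part of the slides. The table has (W+1)(n+1) entries with constant work each, so the running time is O(nW).

```python
def knapsack_no_repeats(values, weights, capacity):
    """Return (best value, chosen item indices); each item used at most once."""
    n = len(values)
    # K[w][j]: best value with capacity w using only the first j items.
    K = [[0] * (n + 1) for _ in range(capacity + 1)]

    for w in range(1, capacity + 1):
        for j in range(1, n + 1):
            K[w][j] = K[w][j - 1]  # leave item j out
            if weights[j - 1] <= w:
                take = K[w - weights[j - 1]][j - 1] + values[j - 1]
                K[w][j] = max(K[w][j], take)

    # Walk back through the table: item j was taken exactly when
    # including it changed the value.
    chosen = []
    w = capacity
    for j in range(n, 0, -1):
        if K[w][j] != K[w][j - 1]:
            chosen.append(j - 1)
            w -= weights[j - 1]
    chosen.reverse()
    return K[capacity][n], chosen


print(knapsack_no_repeats([4, 9, 12, 15, 19, 21], [2, 4, 5, 7, 8, 9], 13))  # (31, [2, 4])
```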
