Simple and fast linear space computation of Longest common subsequences
Claus Rick, 1999
What is the LCS problem?
A = A A B A C
B = A B C
Finding a sequence of greatest possible length that can be obtained from both A and B by deleting zero or more (not necessarily adjacent) symbols. Here the LCS is A B C.
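As a baseline for the definition above, here is a minimal sketch of the classic quadratic dynamic-programming solution. It is not the algorithm of the article; the function name `lcs` is chosen only for this illustration.

```python
def lcs(a, b):
    """Return one longest common subsequence of a and b (textbook DP, O(mn) space)."""
    m, n = len(a), len(b)
    # table[i][j] = LCS length of the prefixes a[:i] and b[:j]
    table = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if a[i - 1] == b[j - 1]:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])

    # Walk back from the bottom-right corner to recover one LCS.
    out = []
    i, j = m, n
    while i > 0 and j > 0:
        if a[i - 1] == b[j - 1]:
            out.append(a[i - 1])
            i -= 1
            j -= 1
        elif table[i - 1][j] >= table[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return "".join(reversed(out))


print(lcs("AABAC", "ABC"))  # ABC
```

The full table costs O(mn) space, which is exactly what the linear-space methods in the rest of the presentation avoid.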
Some boring history…
Year  Author              Time                          Constants      Paradigm
1975  Hirschberg          O(mn)                         2              Dyn. Prog.
1985  Apostolico, Guerra  O(m lg m + pm)                [2, log m]     contours
1986  Myers               O(n(n-p))                     2              Shortest path
1987  Kumar, Rangan       O(n(m-p))                     3              contours
1990  Wu et al.           O(n(m-p))                     2              Shortest path
1992  Apostolico et al.   O(n(m-p))                     3              contours
1992  Apostolico et al.   O(pm)                         3              contours
1999  Goeman, Clausen     O(min(pm, m lg m + p(n-p)))   [5, 25, lg M]  contours
1999  This article        O(min(pm, p(n-p)))            2              contours
Pre-Info
Divide and conquer: find a midpoint (a match that lies on some LCS), then solve the two subproblems on either side of it.
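The divide-and-conquer idea can be sketched with Hirschberg's 1975 linear-space scheme, which the credits of this presentation name as the basis. This is a sketch of that older scheme, not of the contour-based midpoint search in the article; the names `lcs_lengths` and `hirschberg` are hypothetical.

```python
def lcs_lengths(a, b):
    """Last row of the LCS table: LCS length of a against every prefix of b."""
    prev = [0] * (len(b) + 1)
    for ch in a:
        cur = [0]
        for j, bj in enumerate(b, start=1):
            if ch == bj:
                cur.append(prev[j - 1] + 1)
            else:
                cur.append(max(prev[j], cur[j - 1]))
        prev = cur
    return prev


def hirschberg(a, b):
    """Return one LCS of a and b, keeping only single table rows in memory."""
    if not a or not b:
        return ""
    if len(a) == 1:
        return a if a in b else ""
    mid = len(a) // 2
    # LCS lengths of the top half against prefixes of b ...
    left = lcs_lengths(a[:mid], b)
    # ... and of the bottom half against suffixes of b (computed on reversed strings).
    right = lcs_lengths(a[mid:][::-1], b[::-1])
    n = len(b)
    # The midpoint is the split of b that maximizes the combined length.
    split = max(range(n + 1), key=lambda j: left[j] + right[n - j])
    return hirschberg(a[:mid], b[:split]) + hirschberg(a[mid:], b[split:])


print(hirschberg("AABAC", "ABC"))  # ABC
```

Each recursion level only needs two rows of the table, so the space stays linear in the input length.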
Some basic terms
Ordered pair (i,j): the ith symbol of A together with the jth symbol of B
A = A A B A C
B = A B C
(2,3) = (A,C)
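Some basic terms
Match: an ordered pair (i,j) whose two symbols are equal
A = A A B A C
B = A B C
e.g. (3,2) = (B,B)
Chain: a sequence of matches that is strictly increasing in both i and j
e.g. (1,1), (3,2), (5,3) = A B C
Rank k: a match has rank k if the longest chain ending at it has length k
e.g. (5,3) has rank 3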
Some basic terms
Matching Matrix
[Figure: matching matrix of the strings c b a b b a c a c and a b a c b c b a, with one marked cell (i,j) for every match.]
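A minimal sketch of how the matches and the matching matrix could be listed, using 1-based positions as in the (2,3) = (A,C) example. The function names and the "x"/"." rendering are assumptions made only for this illustration.

```python
def matches(a, b):
    """All ordered pairs (i, j), 1-based, with a[i] == b[j]."""
    return [(i, j)
            for i in range(1, len(a) + 1)
            for j in range(1, len(b) + 1)
            if a[i - 1] == b[j - 1]]


def matching_matrix(a, b):
    """One text row per symbol of a; 'x' marks a match, '.' a mismatch."""
    return ["".join("x" if x == y else "." for y in b) for x in a]


print(matches("AABAC", "ABC"))
# [(1, 1), (2, 1), (3, 2), (4, 1), (5, 3)]

for row in matching_matrix("AABAC", "ABC"):
    print(row)
# x..
# x..
# .x.
# x..
# ..x
```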
Some basic terms
Dominant matches
Within each rank k, the upper-left matches: those that have no other match of rank k above and to the left of them.
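The ranks and the dominant matches can be sketched directly from their definitions. This brute-force version uses the full DP table and a quadratic filter, so it only illustrates the terms and not the efficient contour computation of the article; both function names are hypothetical.

```python
def forward_ranks(a, b):
    """Map every match (i, j), 1-based, to the length of the longest chain ending at it."""
    m, n = len(a), len(b)
    table = [[0] * (n + 1) for _ in range(m + 1)]
    ranks = {}
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if a[i - 1] == b[j - 1]:
                table[i][j] = table[i - 1][j - 1] + 1
                ranks[(i, j)] = table[i][j]
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])
    return ranks


def dominant_matches(ranks):
    """Group by rank the matches that no other match of the same rank dominates."""
    dominant = {}
    for (i, j), k in ranks.items():
        dominated = any(
            k2 == k and (i2, j2) != (i, j) and i2 <= i and j2 <= j
            for (i2, j2), k2 in ranks.items()
        )
        if not dominated:
            dominant.setdefault(k, []).append((i, j))
    return dominant


print(dominant_matches(forward_ranks("AABAC", "ABC")))
# {1: [(1, 1)], 2: [(3, 2)], 3: [(5, 3)]}
```

In the small example the matches (2,1) and (4,1) also have rank 1, but (1,1) lies above them in the same column, so only (1,1) is dominant.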
[Figure: matching matrix of c b a b b a c a c and a b a c b c b a with the dominant matches highlighted, contours numbered 1 to 5.]
Backward contours (BC)
[Figure: the same matrix with the backward contours numbered 1 to 5.]
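Some last basic terms
FCk: the kth forward contour, i.e. the matches for which the longest chain ending at them has length k
BCk: the kth backward contour, i.e. the matches for which the longest chain starting at them has length k
[Figure: forward contours (FC) 1 to 5 and backward contours (BC) 1 to 5 in the matching matrix of c b a b b a c a c and a b a c b c b a.]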
Lemma 1
Let p be the length of an LCS between strings A and B. Then for every match (i,j) the following holds:
• There is an LCS containing (i,j) if and only if, for some k, (i,j) is on the kth forward contour and on the (p-k+1)st backward contour.
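Lemma 1 can be checked on small inputs with a brute-force sketch: compute the forward rank of every match, compute the backward rank by running the same routine on the reversed strings, and keep the matches whose two ranks are k and p-k+1. The helper names are hypothetical and the quadratic table is used only for clarity.

```python
def chain_ranks(a, b):
    """Map every match (i, j), 1-based, to the length of the longest chain ending at it."""
    m, n = len(a), len(b)
    table = [[0] * (n + 1) for _ in range(m + 1)]
    ranks = {}
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if a[i - 1] == b[j - 1]:
                table[i][j] = table[i - 1][j - 1] + 1
                ranks[(i, j)] = table[i][j]
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])
    return ranks


def backward_ranks(a, b):
    """Length of the longest chain starting at each match, via the reversed strings."""
    m, n = len(a), len(b)
    reversed_ranks = chain_ranks(a[::-1], b[::-1])
    return {(m + 1 - i, n + 1 - j): k for (i, j), k in reversed_ranks.items()}


def matches_on_some_lcs(a, b):
    """Matches on FCk and BC(p-k+1) for some k, i.e. the matches Lemma 1 describes."""
    fwd = chain_ranks(a, b)
    bwd = backward_ranks(a, b)
    p = max(fwd.values(), default=0)
    return sorted(ij for ij, k in fwd.items() if bwd[ij] == p - k + 1)


print(matches_on_some_lcs("AABAC", "ABC"))
# [(1, 1), (2, 1), (3, 2), (5, 3)]
```

The match (4,1) is left out: it is on FC1 but only on BC2, and no B follows position 4 of A, so no LCS of length 3 can pass through it.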
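Lemma 1 - proof
[Figure: a chain of length k ending at (i,j) on FCk, joined with a chain of length p-k+1 starting at (i,j) on BC(p-k+1). The two chains share (i,j), so together they have length k + (p-k+1) - 1 = p.]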
Start calculating the contours alternately:
FC1, BC1, FC2, BC2, …
Sooner or later…
Really really last terms
Define sets Mi as:
M0 = M
M1 = M0 \ FC1
M2 = M1 \ BC1
…
M(2i-1) = M(2i-2) \ FCi
M(2i) = M(2i-1) \ BCi
[Figure: the set M of all matches of c b a b b a c a c and a b a c b c b a, followed by the shrinking sets M1, M2, M3, M4, M5.]
Let us call the first empty Mi: M(p’)
Lemma 2
The length of an LCS is p’, and each match in M(p’-1) is a possible midpoint.
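The alternating removal behind the sets Mi and Lemma 2 can be sketched by brute force. The assumptions here: FCi and BCi are the contours of the original match set M, the ranks are computed once with a quadratic table, and every set is stored explicitly. All names are hypothetical.

```python
def chain_ranks(a, b):
    """Map every match (i, j), 1-based, to the length of the longest chain ending at it."""
    m, n = len(a), len(b)
    table = [[0] * (n + 1) for _ in range(m + 1)]
    ranks = {}
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if a[i - 1] == b[j - 1]:
                table[i][j] = table[i - 1][j - 1] + 1
                ranks[(i, j)] = table[i][j]
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])
    return ranks


def alternate_peel(a, b):
    """Remove FC1, BC1, FC2, BC2, ... from M; return (p', matches of M(p'-1))."""
    m, n = len(a), len(b)
    fwd = chain_ranks(a, b)
    # Backward ranks come from the same routine on the reversed strings.
    bwd = {(m + 1 - i, n + 1 - j): k
           for (i, j), k in chain_ranks(a[::-1], b[::-1]).items()}
    remaining = set(fwd)        # M0 = M
    last_nonempty = set()
    step = 0                    # index i of the current set Mi
    while remaining:
        last_nonempty = set(remaining)
        step += 1
        k = (step + 1) // 2     # contour index: FCk on odd steps, BCk on even steps
        side = fwd if step % 2 == 1 else bwd
        remaining = {ij for ij in remaining if side[ij] != k}
    return step, sorted(last_nonempty)


print(alternate_peel("AABAC", "ABC"))
# (3, [(3, 2)])
```

For A A B A C and A B C the first empty set is M3, so p’ = 3, and the only match left in M2 is (3,2), the B in the middle of A B C. Storing every set like this is the expensive part.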
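Lemma 2 - proof
[Figure: each removed contour shortens the longest remaining chain by one. With k = p: M0 has chains of length k, M1 of length k-1, M2 of length k-2, …, M(k-1) of length 1, and M(k) has length 0.]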
Little problem…
We can't keep track of every set: that is very expensive.
What do we do?
Keep only the dominant matches…
When we see a dominant match below: done.
Let's define:
FCf’, BCb’: the contours with the minimal indices f’ and b’ as stated above
Lemma 3
The length of an LCS is b’ + f’ - 1.
Complexity
Finding the dominant matches of each contour:
O(min(m, n-p))
Number of contours:
p
Total:
O(min(pm, p(n-p)))
The End
Simple and fast linear space computation of longest common subsequences
Written by: Claus Rick, 1999
Based on algorithm by: D. Hirschberg, 1975
Cast:
Matrices, Lines
Arrows, Squares
Blue, Red
Brown, Grey, Black
String A, String B
Presentation: Uri Scheiner
No Dominant Matches were harmed during the making of this presentation
Appendix
What is the LCS
Divide and Conquer
Match
Chain
Dominant Matches
FC
BC
Lemma 1
Define M…
Lemma 2
Keep just Dominant…
Lemma 3
Complexity