Post on 18-Dec-2015
Longest Common Subsequence Algorithm on
ASC Processors using Coterie Network
Sabegh Singh Virdi
ASC Processor Group
Computer Science Department
Kent State University
Presentation Outline
Introduction to String Matching and its variations
Role of LCS in Molecular Biology
Overview of LCS
Brief introduction to the Coterie Network
Longest Common Subsequence on the Coterie Network
  Exact match
  Approximate match
Summary and Future Work
String Matching
One of the most fundamental operations in computing
Compares two linear arrays of characters
Applications in bioinformatics, e.g. searching genetic databases
The strings involved are, however, enormous, so efficient string processing is a requirement
String Matching Variations
Is an exact match the only solution?
What if the pattern does not occur in the text?
It still makes sense to find the longest subsequence that occurs in both the pattern and the text. This is the longest common subsequence (LCS) problem
Longest Common Subsequence, Longest Common Substring, Sequence Alignment, and Edit Distance are all variations of the string matching (SM) problem
Role of LCS in Molecular biology
DNA sequences (genes) can be represented as sequences of four letters A, C, G, and T corresponding to the four submolecules forming DNA
When biologists find a new sequence, they typically want to know which other sequences it is most similar to
One way of computing how similar (homologous) two sequences are is to find the length of their longest common subsequence
Role of LCS in Molecular biology
This is a simplification, since in the biological situation one would typically take into account not only the length of the LCS, but also e.g. how gaps occur when the LCS is embedded in the two original sequences.
An obvious measure for the closeness of two strings is to find the maximum number of identical symbols (preserving symbol order)
This is, by definition, the longest common subsequence of the strings
Overview of LCS Algorithm
Given two strings, find the longest subsequence common to both.
Example:
String 1: AGACTGAGGTA
String 2: ACTGAG
List of possible alignments:
AGACTGAGGTA
--ACTGAG---
--ACTGA-G--
A--CTGA-G--
A--CTGAG---
The time complexity of the standard dynamic-programming algorithm is O(nm), where n and m are the lengths of the two strings
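The O(nm) bound comes from filling one table cell per pair of positions. The slides do not reproduce the sequential algorithm, so the following is only a minimal sketch of the textbook dynamic-programming formulation; the function names `lcs_table` and `lcs` are chosen here for illustration.

```python
def lcs_table(u: str, v: str) -> list[list[int]]:
    """Fill the (len(u)+1) x (len(v)+1) table of LCS lengths in O(nm) time."""
    n, m = len(u), len(v)
    table = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if u[i - 1] == v[j - 1]:
                # A match extends the LCS of the two shorter prefixes.
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                # Otherwise reuse the better of the two neighboring results.
                table[i][j] = max(table[i - 1][j], table[i][j - 1])
    return table


def lcs(u: str, v: str) -> str:
    """Trace back through the table to recover one longest common subsequence."""
    table = lcs_table(u, v)
    i, j = len(u), len(v)
    out = []
    while i > 0 and j > 0:
        if u[i - 1] == v[j - 1]:
            out.append(u[i - 1])
            i -= 1
            j -= 1
        elif table[i - 1][j] >= table[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return "".join(reversed(out))


print(lcs("ACTGAG", "AGACTGAGGTA"))  # prints ACTGAG
```

The traceback returns only one LCS; when several exist, which one is returned depends on how ties are broken in the `elif` branch.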
Overview of LCS Algorithm
This time does not depend on the sequences u and v themselves, only on their lengths
The bottleneck in efficiently parallelizing the LCS problem is calculating the values of the diagonal elements, as shown
As seen, the value of element {i,j} depends on the previous element {i-1,j-1} when a match is found
There may be more than one LCS
Associate some parameters
The Smith-Waterman algorithm uses the same concept as the LCS algorithm, but gives the optimal local alignment
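The dependency on element {i-1,j-1} is what limits parallelism: a cell cannot be computed before its upper, left, and upper-left neighbors. A common way around this on conventional parallel machines is to sweep the table by anti-diagonals (a "wavefront"), since cells with the same i + j do not depend on each other. The sketch below illustrates that general technique in sequential Python; it is not the coterie-network algorithm of these slides, and `lcs_length_wavefront` is a name chosen for illustration.

```python
def lcs_length_wavefront(u: str, v: str) -> int:
    """Compute the LCS length by visiting the table one anti-diagonal at a time."""
    n, m = len(u), len(v)
    table = [[0] * (m + 1) for _ in range(n + 1)]
    # Cells with the same d = i + j form one anti-diagonal. Each of them
    # reads only cells from anti-diagonals d - 1 and d - 2, so the inner
    # loop could run on separate PEs at the same time.
    for d in range(2, n + m + 1):
        for i in range(max(1, d - m), min(n, d - 1) + 1):
            j = d - i
            if u[i - 1] == v[j - 1]:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])
    return table[n][m]


print(lcs_length_wavefront("ACTGAG", "AGACTGAGGTA"))  # prints 6
```

Even with one PE per cell, this schedule still needs n + m - 1 steps, which is why the diagonal dependency is described as the bottleneck.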
Overview of LCS Algorithm
LCS table for the text AGACTGAGGTA (columns) and the pattern ACTGAG (rows):
      A  G  A  C  T  G  A  G  G  T  A
   0  0  0  0  0  0  0  0  0  0  0  0
A  0  1  1  1  1  1  1  1  1  1  1  1
C  0  1  1  1  2  2  2  2  2  2  2  2
T  0  1  1  1  2  3  3  3  3  3  3  3
G  0  1  2  2  2  3  4  4  4  4  4  4
A  0  1  2  3  3  3  4  5  5  5  5  5
G  0  1  2  3  3  3  4  5  6  6  6  6
Communication between PEs
In a 2D mesh network, communication between the PEs themselves takes place in two different ways:
By using the nearest-neighbor mesh interconnection network
By using a powerful variation on the nearest-neighbor mesh called the “Coterie network”, developed in response to the requirement for nonlocal communication
The Coterie network has properties significantly different from those of the usual mesh
Coteries [Weems & Herbordt]
“A small often selected group of persons who associate with one another frequently”
Features:
  Related to other reconfigurable broadcast networks
  Describable using hypergraphs
  Dynamic in nature
Advantages:
  Propagation of information quickly over long distances at electrical speed
  Support for one-to-many communication within a coterie, and reconfigurability of the coterie
PEs form Coteries
5 x 5 coterie network with switches shown in “arbitrary” settings. Shaded areas denote coteries (the sets of PEs sharing the same circuit)
Coterie’s Physical Structure
In the physical implementation, each PE controls a set of switches
Four of these switches control access in the different directions (N, S, E, W)
Two switches, H and V, are used to emulate horizontal and vertical buses
The last two switches, NE and NW, are used to create eight-way connected regions
[Figure: switch layout of a single PE. Switch labels in the figure: N, S, E, W, H, V, NW, NE, WS, ES]
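Since a coterie is the set of PEs sharing one circuit, the coteries for a given switch setting are the connected components of the grid. The sketch below computes them with a union-find structure. It models only the four directional switches and assumes, for illustration, that two neighboring PEs are connected when both of their facing switches are closed; the H, V, NE, and NW switches are left out, and the function name `coteries` and the input encoding are hypothetical.

```python
def coteries(closed: list[list[str]]) -> list[set[tuple[int, int]]]:
    """Group PEs into coteries.

    closed[r][c] is a string naming the closed switches of PE (r, c),
    using the letters N, S, E, W (assumed encoding).
    """
    rows, cols = len(closed), len(closed[0])
    parent = {(r, c): (r, c) for r in range(rows) for c in range(cols)}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        parent[find(a)] = find(b)

    for r in range(rows):
        for c in range(cols):
            # East link: this PE's E switch and the right neighbor's W switch.
            if c + 1 < cols and "E" in closed[r][c] and "W" in closed[r][c + 1]:
                union((r, c), (r, c + 1))
            # South link: this PE's S switch and the lower neighbor's N switch.
            if r + 1 < rows and "S" in closed[r][c] and "N" in closed[r + 1][c]:
                union((r, c), (r + 1, c))

    groups: dict[tuple[int, int], set[tuple[int, int]]] = {}
    for pe in parent:
        groups.setdefault(find(pe), set()).add(pe)
    return list(groups.values())


# 2 x 3 grid: PEs (0,0), (0,1), and (1,1) share one circuit.
settings = [["E", "WS", ""],
            ["", "N", ""]]
print(sorted(len(group) for group in coteries(settings)))  # prints [1, 1, 1, 3]
```

Changing a single switch can merge or split coteries, which is the sense in which the network is dynamic.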
LCS Algorithm on Coterie Network
Text:
A G A C T G A G G T A
Content of each PE after the MULTICAST operation:
A G A C T G A G G T A
A G A C T G A G G T A
A G A C T G A G G T A
A G A C T G A G G T A
A G A C T G A G G T A
A G A C T G A G G T A
LCS Algorithm on Coterie Network
Pattern:
A
C
T
G
A
G
Content of each PE after the MULTICAST operation:
A A A A A A A A A A A
C C C C C C C C C C C
T T T T T T T T T T T
G G G G G G G G G G G
A A A A A A A A A A A
G G G G G G G G G G G
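Once both multicasts have completed, every PE holds one text character and one pattern character and can compare them without further communication. The sketch below mimics this sequentially; on the network each comparison would happen in its own PE at the same time. The function name `match_matrix` and the two intermediate grids are illustrative assumptions, not names from the slides.

```python
def match_matrix(text: str, pattern: str) -> list[list[int]]:
    """Return a grid with 1 where pattern[i] equals text[j], else 0."""
    # After the multicasts, the PE in row i and column j holds
    # pattern[i] and text[j].
    text_grid = [list(text) for _ in pattern]
    pattern_grid = [[p] * len(text) for p in pattern]
    # Each PE compares its two characters locally; 1 marks a match.
    return [
        [int(t == p) for t, p in zip(text_row, pattern_row)]
        for text_row, pattern_row in zip(text_grid, pattern_grid)
    ]


for symbol, row in zip("ACTGAG", match_matrix("AGACTGAGGTA", "ACTGAG")):
    print(symbol, *row)
```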
LCS Algorithm on Coterie Network
Match matrix (1 where the pattern character equals the text character, 0 otherwise):
   A G A C T G A G G T A
A  1 0 1 0 0 0 1 0 0 0 1
C  0 0 0 1 0 0 0 0 0 0 0
T  0 0 0 0 1 0 0 0 0 1 0
G  0 1 0 0 0 1 0 1 1 0 0
A  1 0 1 0 0 0 1 0 0 0 1
G  0 1 0 0 0 1 0 1 1 0 0
Inject unique token
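The slides do not spell out how the injected tokens travel. One plausible reading, assumed here purely for illustration, is that a token entering the first row at some column moves one step down and one step right per row, and survives only while every PE it visits reports a match. Under that assumption, a token that reaches the last row marks an exact occurrence of the pattern. The function name `surviving_tokens` is hypothetical.

```python
def surviving_tokens(match: list[list[int]]) -> list[int]:
    """Return the start columns whose token reaches the last row.

    Assumption: the token that starts in column k sits in column k + i
    when it is in row i, and it is dropped at the first 0 it meets.
    """
    cols = len(match[0])
    alive = list(range(cols))
    for i, row in enumerate(match):
        alive = [k for k in alive if k + i < cols and row[k + i] == 1]
    return alive


text, pattern = "AGACTGAGGTA", "ACTGAG"
match = [[int(t == p) for t in text] for p in pattern]
print(surviving_tokens(match))  # prints [2]
```

The loop over rows is sequential here; on a coterie network the closed switches would form one circuit along the diagonal, so the token would not need one step per row.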
LCS Algorithm on Coterie Network
We try to refine the algorithm to support approximate matching
We make use of tokens
The next example demonstrates this problem for the strings:
Text: AGACTGAGGTA
Pattern: ACTAAG
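To see why this pair needs approximate matching, it helps to count the mismatches along each diagonal of the comparison grid. The sketch below does that sequentially. Treating an approximate match as a diagonal with few mismatches is an assumption made here for illustration, not the token method of the slides, and `diagonal_mismatches` is a hypothetical name.

```python
def diagonal_mismatches(text: str, pattern: str) -> list[int]:
    """For each start offset j, count positions where pattern[i] != text[j + i]."""
    n = len(pattern)
    return [
        sum(pattern[i] != text[j + i] for i in range(n))
        for j in range(len(text) - n + 1)
    ]


counts = diagonal_mismatches("AGACTGAGGTA", "ACTAAG")
print(counts)  # prints [4, 6, 1, 4, 6, 6]
print(counts.index(min(counts)))  # prints 2
```

No offset has zero mismatches, so the pattern does not occur exactly; the best diagonal starts at the third text character and differs from the pattern in a single position.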
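LCS Algorithm on Coterie Network
Match matrix for the text AGACTGAGGTA (columns) and the pattern ACTAAG (rows):
   A G A C T G A G G T A
A  1 0 1 0 0 0 1 0 0 0 1
C  0 0 0 1 0 0 0 0 0 0 0
T  0 0 0 0 1 0 0 0 0 1 0
A  1 0 1 0 0 0 1 0 0 0 1
A  1 0 1 0 0 0 1 0 0 0 1
G  0 1 0 0 0 1 0 1 1 0 0
Inject unique token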
Token method
In this method, we explicitly close the W-S switch based on some condition
We inject unique token symbols as shown in the next slide
Where these two symbols intersect within a PE, we close the W-S switch, as shown
Thus we get a path from the first row to the last row, as shown
LCS Algorithm on Coterie Network
[Figure: the 0/1 match matrix for the text AGACTGAGGTA and the pattern ACTAAG, annotated “Inject unique token”]
Summary and Future work
We have presented two variations of the LCS algorithm
We have explored a new network for this problem
Constant-time algorithm for exact match
The approximate-match algorithm depends upon the diameter of the network
Summary and Future work
Future Work:
  Optimize the algorithm for approximate match
  Implement the algorithm on an FPGA model
  Incorporate the Don’t Care symbol
  Extend the idea to support sequence alignment
  Conserve memory by using an encoding scheme
  Use virtual simulation of PEs in case we run out of PEs
Acknowledgements
Professor Walker
Professor Baker
Professor Weems
Professor Herbordt
Professor Piontkivska
Committee members for their time
Kevin Schaffer, Hong Wang, Shannon Steinfadt, Jalpesh Chitalia, and Michael Scherger
THANK YOU
Questions….