Inferring Phylogeny using Permutation Patterns on Genomic Data
Post on 15-Jan-2016
Md Enamul Karim (1), Laxmi Parida (2), Arun Lakhotia (1)
(1) University of Louisiana at Lafayette, (2) IBM T. J. Watson Research Center
Phylogeny
Reconstruction of the evolutionary relationship of a collection of organisms, usually in the form of a tree.
Phylogenetic data
• Behavioral, morphological, metabolic, etc.
• Molecular data: sequence data, gene-order data, etc.
Why gene-order data?
• Low error rate.
• Rearrangements are rare evolutionary events that are unlikely to cause "silent" changes, so gene order can help infer evolutionary history over millions of years.
Genome rearrangements
Original gene order:
1 2 3 4 5 6 7 8 9 10
• Inversion
1 2 3 -8 -7 -6 -5 -4 9 10
• Transposition
1 2 3 9 4 5 6 7 8 10
• Inverted transposition
1 2 3 9 -8 -7 -6 -5 -4 10
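The three rearrangements above can be sketched as operations on a signed permutation stored as a Python list. This is only an illustration: the function names and the half-open index convention (i, j, k) are assumptions made here, not notation from the slides.

```python
def inversion(genome, i, j):
    """Reverse the segment genome[i:j] and flip the sign of each gene in it."""
    return genome[:i] + [-g for g in reversed(genome[i:j])] + genome[j:]


def transposition(genome, i, j, k):
    """Move the segment genome[i:j] so that it follows genome[j:k]."""
    return genome[:i] + genome[j:k] + genome[i:j] + genome[k:]


def inverted_transposition(genome, i, j, k):
    """Move genome[i:j] past genome[j:k], reversing and sign-flipping it."""
    moved = [-g for g in reversed(genome[i:j])]
    return genome[:i] + genome[j:k] + moved + genome[k:]


identity = list(range(1, 11))  # 1 2 3 4 5 6 7 8 9 10

print(inversion(identity, 3, 8))
# [1, 2, 3, -8, -7, -6, -5, -4, 9, 10]
print(transposition(identity, 3, 8, 9))
# [1, 2, 3, 9, 4, 5, 6, 7, 8, 10]
print(inverted_transposition(identity, 3, 8, 9))
# [1, 2, 3, 9, -8, -7, -6, -5, -4, 10]
```

Each call reproduces the corresponding gene order shown on the slide, with the segment 4..8 as the rearranged block.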
Breakpoint distance
A breakpoint is an adjacency present in one genome but not in the other; the breakpoint distance is the number of such adjacencies.
1 2 3 4 5 6 7 8 9 10
1 -3 -2 4 5 9 6 7 8 10
For some datasets, a close-to-linear relationship between the number of breakpoints and the number of evolutionary events may exist.
Can be used for building phylogeny (Blanchette et al.).
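A minimal sketch of the breakpoint count for the two gene orders above, assuming the usual signed convention in which the adjacency (a, b) is the same as (-b, -a) read from the opposite strand. The function names are illustrative only.

```python
def adjacencies(genome):
    """Return the set of signed adjacencies of a linear genome.

    Each adjacency is stored in a canonical form so that (a, b) and
    (-b, -a) are treated as the same adjacency.
    """
    return {min((a, b), (-b, -a)) for a, b in zip(genome, genome[1:])}


def breakpoints(g1, g2):
    """Count adjacencies of g1 that do not occur in g2."""
    return len(adjacencies(g1) - adjacencies(g2))


g1 = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
g2 = [1, -3, -2, 4, 5, 9, 6, 7, 8, 10]

print(breakpoints(g1, g2))  # 5
```

Under this convention the adjacencies 1-2, 3-4, 5-6, 8-9 and 9-10 of the first genome are missing from the second, while 2-3 survives as -3 -2.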
Limitations of breakpoint distance
• The number of breakpoints created by a given number of inversions may vary.
• Transpositions generally create more breakpoints than inversions.
• Computing the breakpoint phylogeny is NP-hard.
MPBE (Maximum Parsimony on Binary Encoding)
• A heuristic for the breakpoint phylogeny (Cosner et al.).
• All ordered pairs of signed genes appearing consecutively are coded as binary features.
• Exponential time complexity, but much faster than BPAnalysis.
Limitations
• May fail to find feasible solutions to the breakpoint phylogeny problem.
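The binary encoding step of MPBE can be illustrated as follows. This is a sketch under assumptions, not the Cosner et al. implementation: it treats (a, b) and (-b, -a) as one feature, the genome names "G1" and "G2" are hypothetical, and the maximum parsimony search that follows the encoding is not shown.

```python
def adjacency_set(genome):
    """Signed adjacencies of a linear genome, with (a, b) == (-b, -a)."""
    return {min((a, b), (-b, -a)) for a, b in zip(genome, genome[1:])}


def binary_encoding(genomes):
    """Encode each genome as a 0/1 vector over all observed adjacencies.

    genomes: dict mapping a genome name to its signed gene order.
    Returns (features, matrix), where features is the sorted list of
    adjacencies and matrix maps each name to its binary row.
    """
    per_genome = {name: adjacency_set(g) for name, g in genomes.items()}
    features = sorted(set().union(*per_genome.values()))
    matrix = {
        name: [int(f in adj) for f in features]
        for name, adj in per_genome.items()
    }
    return features, matrix


genomes = {
    "G1": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],  # hypothetical names
    "G2": [1, -3, -2, 4, 5, 9, 6, 7, 8, 10],
}
features, matrix = binary_encoding(genomes)
for name, row in matrix.items():
    print(name, row)
```

The resulting rows are the character matrix that a parsimony program would take as input.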
Observation: the closer the evolutionary history of two genomes, the more permutations (at different granularities) they have in common.
1 2 3 4 5 6 7 8 9 10
1 2 3 -8 -7 -6 -5 -4 9 10
1 8 -3 -2 -7 -6 -5 -4 9 10
Maximal pi-pattern (Eres et al.)
Matches permutations at different granularity.
Polynomial time complexity.
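pi-pattern
A pattern that occurs in S at least k times, where each occurrence is a permutation of the pattern's characters.
Example:
For S = acbcab and k = 2
All pi-patterns are: ac, bc, abc, abcc
For instance, abc occurs three times in S: as acb, bca, and cab.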
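The example can be checked with a brute-force enumeration over all substrings of the string acbcab that the slides use for this example. This sketch is only for illustration and is not the polynomial-time algorithm of Eres et al.; the function name and the restriction to patterns of length at least 2 are assumptions made here.

```python
from collections import defaultdict


def pi_patterns(s, k, min_len=2):
    """Brute-force pi-pattern search.

    Groups every substring of s by its sorted characters (its multiset)
    and keeps the multisets that occur at least k times.
    Returns a dict: pattern -> list of (start, end) positions, inclusive.
    """
    occurrences = defaultdict(list)
    n = len(s)
    for length in range(min_len, n + 1):
        for start in range(n - length + 1):
            key = "".join(sorted(s[start:start + length]))
            occurrences[key].append((start, start + length - 1))
    return {p: occ for p, occ in occurrences.items() if len(occ) >= k}


found = pi_patterns("acbcab", 2)
print(sorted(found, key=lambda p: (len(p), p)))
# ['ac', 'bc', 'abc', 'abcc']
print(found["abc"])
# [(0, 2), (2, 4), (3, 5)]
```

The three positions reported for abc correspond to the substrings acb, bca and cab.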
Cover
P1 covers P2 => every occurrence of P1 contains an occurrence of P2, and every occurrence of P2 lies within an occurrence of P1.
Example: in S = acbcab, abc covers ac.
Maximal pi-pattern
A pi-pattern which is not covered by any other pi-pattern.
Example: in S = acbcab
pi-patterns: ac, bc, abc, abcc
Maximal pi-patterns: abc, abcc
(abc is not covered by abcc)
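The cover relation and the maximality filter can be sketched on top of a brute-force occurrence table. As before, this is an illustrative check of the slide example rather than the Eres et al. algorithm, and all function names are assumptions.

```python
from collections import defaultdict


def pi_patterns(s, k, min_len=2):
    """Brute force: pattern -> list of inclusive (start, end) occurrences."""
    occurrences = defaultdict(list)
    for length in range(min_len, len(s) + 1):
        for start in range(len(s) - length + 1):
            key = "".join(sorted(s[start:start + length]))
            occurrences[key].append((start, start + length - 1))
    return {p: occ for p, occ in occurrences.items() if len(occ) >= k}


def covers(occ_big, occ_small):
    """True if every small occurrence lies inside some big occurrence
    and every big occurrence contains some small occurrence."""
    def inside(small, big):
        return big[0] <= small[0] and small[1] <= big[1]

    return (all(any(inside(s, b) for b in occ_big) for s in occ_small)
            and all(any(inside(s, b) for s in occ_small) for b in occ_big))


def maximal_pi_patterns(patterns):
    """Keep the patterns that no other pattern covers."""
    return [
        p for p, occ in patterns.items()
        if not any(q != p and covers(patterns[q], occ) for q in patterns)
    ]


patterns = pi_patterns("acbcab", 2)
print(covers(patterns["abc"], patterns["ac"]))    # True
print(covers(patterns["abcc"], patterns["abc"]))  # False
print(maximal_pi_patterns(patterns))              # ['abc', 'abcc']
```

abc stays maximal because its occurrence cab, at the end of S, is not contained in either occurrence of abcc.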
Results
Phylogeny for simulated evolution on synthetic data
[Figure: phylogenetic tree with leaves labeled a through p]
12 genera of Campanulaceae and the outgroup tobacco
Tree1: MPBE tree
Tree2: Neighbor joining tree (using a few different distances)
[Figure: tree with leaves, in order: Tra, Sym, Cam, Ade, Wah, Mer, Leg, Asy, Tri, Cod, Cya, Pla, Tob]
Tree3: Neighbor joining tree using permutation patterns
[Figure: tree with leaves, in order: Tra, Sym, Cam, Ade, Wah, Mer, Asy, Leg, Tri, Cod, Cya, Pla, Tob]
167 maximal pi-patterns (from 10769 pi-patterns) used as binary features
XOR distance measure
A distance/similarity matrix is created to find the neighbor joining tree
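One plausible reading of the XOR distance is the number of binary features on which two taxa differ (the Hamming distance between their feature vectors). The sketch below builds a pairwise distance matrix under that assumption; the taxa names and feature vectors are made up for illustration and are not data from the study.

```python
def xor_distance(u, v):
    """Number of positions where two 0/1 feature vectors differ."""
    return sum(a ^ b for a, b in zip(u, v))


def distance_matrix(features):
    """Pairwise XOR distances between all taxa.

    features: dict mapping a taxon name to its 0/1 feature vector
    (1 = the maximal pi-pattern is present in that genome).
    """
    names = list(features)
    return {
        a: {b: xor_distance(features[a], features[b]) for b in names}
        for a in names
    }


# Hypothetical toy input, not taken from the Campanulaceae data set.
features = {
    "A": [1, 0, 1, 1],
    "B": [1, 1, 0, 1],
    "C": [0, 1, 0, 0],
}
for name, row in distance_matrix(features).items():
    print(name, row)
```

A matrix of this form is what a neighbor joining implementation would take as input to build the tree.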
Tree3 vs Tree2
Conclusion
• Permutation patterns may preserve more evolutionary information.
• Evolutionary events could be counted within permuted segments to develop a hybrid scheme.
• Current approaches remain unable to handle unequal gene content, which could be solved using maximal pi-patterns.