inferring phylogeny using permutation patterns on genomic data

Inferring Phylogeny using Permutation Patterns on Genomic Data1Md Enamul Karim2Laxmi Parida1Arun Lakhotia

1University of Louisiana at Lafayette2IBM T. J. Watson Research Center

Phylogeny

Reconstruction of the evolutionary relationship of a collection of organisms, usually in the form of a tree.

Phylogenetic data Behavioral, morphological,

metabolic, etc. Molecular data: sequence data,

gene-order data etc.gene-order data

Why gene order data?

Low error rate. Rare evolutionary events unlikely

to cause “silent" changes; can help inferring millions of years.

Genomes rearrangements

• Inverted Transposition

1 2 3 9 -8 –7 –6 –5 –4 10

• Inversion

1 2 3 –8 –7 –6 –5 -4 9 10

• Transposition

1 2 3 9 4 5 6 7 8 10

1 2 3 4 5 6 7 8 9 10

Breakpoint distance

Breakpoints are number of adjacencies present in one genome, but not in the other.

1 2 3 4 5 6 7 8 9 10

1 –3 –2 4 5 9 6 7 8 10

For some datasets, a close-to-linear relationship between the breakpoints and evolutionary events may exist.

Can be used for building phylogeny (Blanchette et al.).

Limitations of breakpoint The number of breakpoints created by a

certain number of inversions may vary. Also, transpositions generally create more

breakpoints than inversions. Computing the breakpoint phylogeny is

NP-hard.

MPBE (Maximum Parsimony on Binary Encoding)

A heuristic for the breakpoint phylogeny

(Cosner et al.). All ordered pairs of signed genes

appearing consecutively are coded as binary features.

Exponential time complexity, however, much faster than BPAnalysis.

Limitations

May fail to find feasible solutions to the breakpoint phylogeny problem.

Observation: The closer is the evolution history, the more permutations (of different granularity) are in common

1 2 3 4 5 6 7 8 9 10

1 2 3 –8 –7 –6 –5 –4 9 10

1 8 –3 –2 –7 –6 –5 –4 9 10

Maximal pi-pattern (Eres et al.)

Matches permutations at different granularity.

Polynomial time complexity.

pi-pattern

Example :

For S = and k=2

All pi-patterns are: ac, bc, abc, abcc

acbcabacbcab

Pattern with minimum k permutations

P1 covers P2=> Every P1 has a P2 Every P2 is within a P1

Example In S = acbcababc covers ac

Maximal pi-pattern

pi-pattern which is not covered

Example In S = acbcabpi-patterns: ac, bc, abc, abcc

Maximal pi-patterns: abc, abcc

not covered by abcc

Results

Phylogeny for simulated evolution on synthetic data

12 genera of Campanulaceaeand the outgroup tobacco

Tree1: MPBE tree

Tree2: Neighbor joining tree (using few different distances)

Tree3: Neighbor joining tree using permutation patterns

167 Maximal pi-patterns(from 10769 pi-patterns) used as binary feature

XOR Distance measure

Distance/Similarity matrix is created to find neighbor joining tree

Tree3 vs Tree2

Conclusion Permutation patterns may preserve more

evolutionary information. Evolutionary events could be counted

within permuted segments to develop a hybrid

scheme. Current approaches remain unable to

handle unequal gene content, which could be solved using maximal pi-patterns.

inferring phylogeny using permutation patterns on genomic data

Documents

inferring specifications

on the design of bit permutation based ciphers ·...

from algae to angiosperms inferring the phylogeny of green...

inferring phylogeny - national taiwan...

permutation groups - christian brothers...

permutation groups:

permutation groups

permutation circuits

permutation tests hal whitehead biol4062/5062. introduction...

inferring phylogeny using permutation patterns on genomic...

the utility of morphological characters for inferring...

inferring dispersal: a bayesian approach to phylogeny-based...

permutation groups - christian brothers...

permutation patterns

inferring meaning

algebra - permutation

how to generate a random permutation in a permutation...

phylip phylip (the phylogeny inference package) is a package...

chap. 6 inferring molecular phylogeny -...

permutation & combination