Common Intervals in Sequences, Trees, and Graphs Steffen Heber and Jiangtian Li

Post on 31-Dec-2015

Genome Comparison of Bacteria

[Kim et al., Nat. Biotechnol., 2004]

Gene Order & Function in Bacteria

• Gene order in bacteria is weakly conserved. [Gene order is not conserved in bacterial evolution. Mushegian, Koonin; Trends Genet. 1996]

• Some genes cluster together even in unrelated species.

• Genes inside a cluster are functionally associated. [Conserved clusters of functionally related genes in two bacterial genomes. Tamames et al.; J Mol Evol. 1997]

Formalization of Gene Clusters

Genomes: permutations π1, π2, …, πk

Genes: numbers 1, …, n

π1 = 1 2 3 4 5 6 7 8

π2 = 8 7 6 4 5 2 1 3

π3 = 3 1 2 5 8 7 6 4

π4 = 6 7 4 2 1 3 8 5

Intervals

• For a permutation π of [n] = {1, 2, …, n}, an interval (= gene cluster) is a set {π(i), π(i+1), …, π(j)} for 1 ≤ i < j ≤ n.

• Any permutation of [n] has n(n-1)/2 intervals.

Example: π = 1 3 5 4 2 6 7
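The interval definition can be checked by direct enumeration. The following is a minimal sketch, assuming a permutation is given as a Python list; the function name `intervals` is illustrative and not part of the talk.

```python
def intervals(perm):
    """Return all intervals {perm[i], ..., perm[j]} with i < j as frozensets."""
    n = len(perm)
    return [frozenset(perm[i:j + 1]) for i in range(n) for j in range(i + 1, n)]

pi = [1, 3, 5, 4, 2, 6, 7]
ivs = intervals(pi)
n = len(pi)

# One interval per index pair i < j, hence n(n-1)/2 of them.
print(len(ivs), n * (n - 1) // 2)  # 21 21
```

Each interval is stored as a set because only the gene content matters, not the order inside the cluster.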

Common Intervals

• For a family F = (π0, π1, …, πk-1) of permutations of [n], a common interval of F (= conserved gene cluster) is a subset S ⊆ [n] that is an interval in every πi.

• We write S ∈ C_F.

π0 = 1 3 5 4 2 6 7

π1 = 2 4 5 1 3 7 6
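To make the definition concrete, the common intervals of the two example permutations can be found by brute force. This is only a sketch that intersects the interval sets; it is not the O(kn + K) algorithm, and the helper names are illustrative.

```python
def interval_sets(perm):
    """All intervals of perm (index ranges i < j) as a set of frozensets."""
    n = len(perm)
    return {frozenset(perm[i:j + 1]) for i in range(n) for j in range(i + 1, n)}

def common_intervals(perms):
    """Brute force: keep the sets that are intervals in every permutation."""
    result = interval_sets(perms[0])
    for p in perms[1:]:
        result &= interval_sets(p)
    return result

pi0 = [1, 3, 5, 4, 2, 6, 7]
pi1 = [2, 4, 5, 1, 3, 7, 6]
common = common_intervals([pi0, pi1])

for s in sorted(common, key=lambda s: (len(s), sorted(s))):
    print(sorted(s))
# [1, 3]
# [2, 4]
# [4, 5]
# [6, 7]
# [1, 3, 5]
# [2, 4, 5]
# [1, 3, 4, 5]
# [1, 2, 3, 4, 5]
# [1, 2, 3, 4, 5, 6, 7]
```

For this pair, 9 of the 21 intervals of π0 are also intervals of π1.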

Lemma

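Let F = (π0, π1, …, πk-1) and c, d ∈ C_F.

• If c ∩ d ≠ ∅, then c ∪ d ∈ C_F.

• We call c ∪ d reducible if neither c nor d contains the other; a common interval that cannot be written this way is irreducible.

π0 = 1 3 5 4 2 6 7

π1 = 2 4 5 1 3 7 6

[Figure: reducible and irreducible intervals marked in π0 and π1]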
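The lemma suggests a simple test for reducibility in the permutation case: look for two overlapping common intervals, both proper subsets of c, whose union is c. The sketch below applies that test to the example; it assumes that checking pairs is sufficient for permutations, and all function names are illustrative.

```python
def interval_sets(perm):
    n = len(perm)
    return {frozenset(perm[i:j + 1]) for i in range(n) for j in range(i + 1, n)}

def common_intervals(perms):
    result = interval_sets(perms[0])
    for p in perms[1:]:
        result &= interval_sets(p)
    return result

def irreducible_intervals(common):
    """Keep c unless c = a | b for overlapping common intervals a, b strictly inside c."""
    result = set()
    for c in common:
        subs = [d for d in common if d < c]  # proper subsets only
        reducible = any(a & b and a | b == c for a in subs for b in subs)
        if not reducible:
            result.add(c)
    return result

common = common_intervals([[1, 3, 5, 4, 2, 6, 7], [2, 4, 5, 1, 3, 7, 6]])
irreducible = irreducible_intervals(common)
print(sorted(sorted(c) for c in irreducible))
# [[1, 2, 3, 4, 5, 6, 7], [1, 3], [1, 3, 5], [2, 4], [4, 5], [6, 7]]
```

Here {1, 3, 4, 5} is reducible because it is the union of {1, 3, 5} and {4, 5}, which share the element 5.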

Analysis

• We have K ≤ n(n-1)/2 common intervals and I < n irreducible intervals.

• Find all K common intervals of k ≥ 2 permutations of [n]: O(kn + K) time and O(n) space.

Common Intervals of Trees

Let T,T1,…,Tk be trees with vertex set [n].

Definition:

• S ⊆ [n] is an interval of T iff the induced subgraph T[S] is connected and |S| > 1.

• S ⊆ [n] is a common interval of T1,…,Tk iff S is an interval in all trees.

• Tree intervals generalize intervals of permutations.
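The tree definition reduces to a connectivity test on the induced subgraph. A minimal sketch follows, assuming a tree is given as a list of edges; encoding a permutation as a path (the helper `path_tree`) is an illustrative way to see why tree intervals generalize permutation intervals.

```python
from collections import deque

def is_tree_interval(edges, subset):
    """True iff |subset| > 1 and the subgraph induced by subset is connected."""
    subset = set(subset)
    if len(subset) < 2:
        return False
    adj = {v: set() for v in subset}
    for u, v in edges:
        if u in subset and v in subset:  # keep only edges inside the subset
            adj[u].add(v)
            adj[v].add(u)
    start = next(iter(subset))
    seen = {start}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for w in adj[u] - seen:
            seen.add(w)
            queue.append(w)
    return seen == subset

def path_tree(perm):
    """Path whose consecutive vertices follow the permutation."""
    return list(zip(perm, perm[1:]))

path = path_tree([1, 3, 5, 4, 2, 6, 7])
print(is_tree_interval(path, {5, 4, 2}))  # True
print(is_tree_interval(path, {1, 5}))     # False
```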

Miscellaneous

Example:

Common intervals of T1, T2: { [2], [3], [4], [5] }

• (Common) intervals in trees are induced subtrees.

[Figure: trees T1 and T2 on the vertices 1, …, 5]

Structure of Tree Intervals

• Tree intervals have the Helly property, i.e. for any family of tree intervals (T_i), i ∈ I, the assumption T_p ∩ T_q ≠ ∅ for every p, q ∈ I implies ⋂_{i∈I} T_i ≠ ∅.

Extreme Cases

n-vertex stars S_{n-1}

# non-trivial induced subtrees: 2^(n-1) − 1
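The count for stars can be confirmed by brute force on small n. This sketch assumes the star is labeled with center 1 and counts every vertex subset of size at least 2 whose induced subgraph is connected; the function names are illustrative.

```python
from itertools import combinations

def induced_connected(edges, subset):
    """True iff the subgraph induced by subset is connected."""
    subset = set(subset)
    start = next(iter(subset))
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for a, b in edges:
            if u in (a, b):
                w = b if a == u else a
                if w in subset and w not in seen:
                    seen.add(w)
                    stack.append(w)
    return seen == subset

def count_intervals(n, edges):
    """Number of subsets S of {1..n} with |S| > 1 that induce a connected subgraph."""
    return sum(
        induced_connected(edges, s)
        for size in range(2, n + 1)
        for s in combinations(range(1, n + 1), size)
    )

for n in range(2, 9):
    star = [(1, leaf) for leaf in range(2, n + 1)]  # assumed labeling: center is 1
    path = [(i, i + 1) for i in range(1, n)]
    assert count_intervals(n, star) == 2 ** (n - 1) - 1
    assert count_intervals(n, path) == n * (n - 1) // 2
```

A connected subset of a star with at least two vertices must contain the center, so it is determined by a non-empty choice among the n-1 leaves.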

The Common Interval Graph

• Given T = (T1,…,Tk) and the corresponding set of common intervals C_T, the common interval graph G_T = (V, E) is the graph with

V = C_T

E = {(c,d) | c, d ∈ C_T, c ≠ d, c ∩ d ≠ ∅}

Example

• V = [n], T = (P_n, S_{n-1})

• We have C_T = { [2], [3], …, [n] } and G_T = K(C_T), the complete graph on C_T.

[Figure: the two trees and the graph G_T with vertices [2], [3], [4], …, [n]]
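The path and star example can be reproduced by brute force for a small n. The sketch below assumes the path is 1, 2, …, n and the star is centered at vertex 1 (the slide does not state the labeling), and it builds the common interval graph by connecting distinct common intervals that intersect.

```python
from itertools import combinations

def induced_connected(edges, subset):
    subset = set(subset)
    start = next(iter(subset))
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for a, b in edges:
            if u in (a, b):
                w = b if a == u else a
                if w in subset and w not in seen:
                    seen.add(w)
                    stack.append(w)
    return seen == subset

def tree_intervals(n, edges):
    """All subsets of {1..n} with |S| > 1 that induce a connected subgraph."""
    return {
        frozenset(s)
        for size in range(2, n + 1)
        for s in combinations(range(1, n + 1), size)
        if induced_connected(edges, s)
    }

def common_interval_graph(n, trees):
    """Vertices: common intervals. Edges: pairs of distinct intervals that intersect."""
    vertices = set.intersection(*(tree_intervals(n, t) for t in trees))
    edges = {frozenset((c, d)) for c, d in combinations(vertices, 2) if c & d}
    return vertices, edges

n = 6
path = [(i, i + 1) for i in range(1, n)]        # assumed labeling of P_n
star = [(1, leaf) for leaf in range(2, n + 1)]  # assumed center: vertex 1
vertices, edges = common_interval_graph(n, [path, star])

print(sorted(sorted(v) for v in vertices))
print(len(edges) == len(vertices) * (len(vertices) - 1) // 2)  # True: G_T is complete
```

Every common interval here contains vertex 1, so any two of them intersect and the graph is complete.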

Common Interval Graphs cont’d

A graph is called chordal if it does not contain an induced cycle C_m on m > 3 vertices.

Proposition: Common interval graphs of trees are chordal graphs.

Irreducible Common Intervals

For a common interval c ∈ C_T and a subset V ⊆ C_T we say that V generates c iff

i. for each d ∈ V, d ⊊ c

ii. c = ∪ d over all d ∈ V

iii. G_T[V] is connected.

If there is no such V, then c is irreducible.

The irreducible intervals generate all common intervals.

[Figure: example on the elements 1, …, 7]
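Conditions i to iii can be tested without enumerating every subset V. The sketch below rests on one observation: if some connected V generates c, then the whole connected component of the overlap graph that contains V (restricted to proper sub-intervals of c) also generates c. The input is the list of common intervals of the permutation example π0 = 1 3 5 4 2 6 7, π1 = 2 4 5 1 3 7 6, viewed as two paths; the function name is illustrative.

```python
def is_generated(c, common):
    """True iff some connected set V of proper sub-intervals of c has union c."""
    remaining = {d for d in common if d < c}  # condition i: proper subsets of c
    while remaining:
        start = remaining.pop()
        union, stack = set(start), [start]
        while stack:  # grow one connected component of the overlap graph
            d = stack.pop()
            neighbours = [e for e in remaining if e & d]
            for e in neighbours:
                remaining.remove(e)
                union |= e
                stack.append(e)
        if union == c:  # condition ii holds for this connected component
            return True
    return False

common = [frozenset(s) for s in (
    {1, 3}, {2, 4}, {4, 5}, {6, 7}, {1, 3, 5}, {2, 4, 5},
    {1, 3, 4, 5}, {1, 2, 3, 4, 5}, {1, 2, 3, 4, 5, 6, 7},
)]
irreducible = [c for c in common if not is_generated(c, common)]
print(len(irreducible), len(common))  # 6 9
```

The full set {1, …, 7} stays irreducible: its sub-intervals split into a component covering {1, …, 5} and the isolated interval {6, 7}, and neither covers everything.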

Finding Irreducible Intervals

• We have K < 2^(n-1) common intervals and I < n irreducible intervals.

• Find all irreducible common intervals of k trees on n vertices: O(kn^2) time and O(kn) space.

Finding Irreducible Intervals

• Irreducible intervals are minimal common intervals containing an adjacent vertex pair.

[Figure: example with the adjacent vertices x, y and the further vertices l, z, m]

Graph Intervals

G=(V,E), undirected, connected graph, V=[n]

S ⊆ V is an interval (convex) iff the induced subgraph G[S] is connected and includes every shortest path with end-vertices in S.

[Figure: two vertex subsets of a graph on the vertices 1, 2, 3, 4; one is convex, the other is not]
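Convexity can be tested with breadth-first distances: a vertex w lies on a shortest path between u and v exactly when d(u,w) + d(w,v) = d(u,v). The sketch below uses a hypothetical 4-cycle as input (not the graph from the slide) and assumes, by analogy with tree intervals, that an interval has at least two vertices.

```python
from collections import deque

def bfs_dist(adj, src):
    """Shortest-path distances from src in an unweighted connected graph."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist

def is_convex(adj, subset):
    """True iff every vertex on a shortest path between two vertices of subset is in subset."""
    subset = set(subset)
    if len(subset) < 2:  # assumption: intervals have at least two vertices
        return False
    dist = {v: bfs_dist(adj, v) for v in subset}
    for u in subset:
        for v in subset:
            for w in adj:
                if w not in subset and dist[u][w] + dist[v][w] == dist[u][v]:
                    return False
    return True

# Hypothetical example: the 4-cycle 1 - 2 - 4 - 3 - 1.
cycle = {1: [2, 3], 2: [1, 4], 3: [1, 4], 4: [2, 3]}
print(is_convex(cycle, {1, 2}))     # True
print(is_convex(cycle, {1, 2, 4}))  # False: 3 lies on a shortest path from 1 to 4
```

Because the graph is connected, containing all shortest paths already forces G[S] to be connected, so no separate connectivity check is needed.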

Common Intervals of Graphs

Let G = (G1,…,Gk) be a family of connected undirected graphs with vertex set [n].

Definition: S ⊆ [n] is a common interval of G iff S is an interval in all graphs.

• Graph intervals generalize tree intervals.

[Figure: graphs G0 and G1 on the vertices 1, 2, 3, 4]

Some Differences

• The union of convex sets is NOT always convex.

• The common convex hull of an adjacent vertex pair is NOT always irreducible.

[Figure: graphs G1 and G2 on the vertices 1, 2, 3]

Finding Irreducible Graph Intervals

Sketch: Given G = (G0, G1, …, Gk-1)

For each edge (i,j) ∈ Ei* do

    S(i,j) := {i,j}

    For each (k,l) with k, l ∈ S(i,j)

        Add vertices ‘between’ k and l to S(i,j)

Remove reducible intervals
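The hull-building part of the sketch can be written as a fixpoint iteration. The code below is one possible reading, under stated assumptions: 'between' is taken to mean lying on a shortest path in any of the graphs, the edge set is taken to be the union of all edge sets (the slide's Ei* is not resolved here), and the final removal of reducible intervals is left out. The input graphs are a hypothetical path and star.

```python
from collections import deque

def all_pairs_dist(adj):
    """BFS distances between all vertex pairs of an unweighted connected graph."""
    table = {}
    for src in adj:
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        table[src] = dist
    return table

def common_hull(graphs, seed):
    """Smallest superset of seed that is closed under 'between' in every graph."""
    dists = [all_pairs_dist(adj) for adj in graphs]
    hull = set(seed)
    changed = True
    while changed:  # repeat until no graph contributes a new vertex
        changed = False
        for adj, dist in zip(graphs, dists):
            for u in list(hull):
                for v in list(hull):
                    for w in adj:
                        if w not in hull and dist[u][w] + dist[w][v] == dist[u][v]:
                            hull.add(w)
                            changed = True
    return frozenset(hull)

def candidate_intervals(graphs):
    """Common hulls of all edges (assumption: edges are taken from every graph)."""
    edges = {frozenset((u, w)) for adj in graphs for u in adj for w in adj[u]}
    return {common_hull(graphs, e) for e in edges}

# Hypothetical input: the path 1-2-3-4 and the star centered at 1.
g0 = {1: [2], 2: [1, 3], 3: [2, 4], 4: [3]}
g1 = {1: [2, 3, 4], 2: [1], 3: [1], 4: [1]}
candidates = candidate_intervals([g0, g1])
print(sorted(sorted(c) for c in candidates))  # [[1, 2], [1, 2, 3], [1, 2, 3, 4]]
```

The loop stops only when the set is closed in all graphs at once, which is why adding a vertex in one graph triggers another pass over the others.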

Extreme Cases

Permutations (identical permutations):

• C ≤ n(n-1)/2, I < n

Trees (identical star-trees):

• C < 2^(n-1), I < n

Graphs (complete graphs):

• C < 2^n, I ≤ n(n-1)/2

Example: InterDom

Database of protein domain interactions.

• Gene fusions

• Protein-protein interactions (DIP & BIND)

• Protein complexes (PDB)

Comparing Two Networks

Comparing Three Networks

G: Gene fusion, P: PDB, B: BIND, D: DIP

Irreducible Intervals

[Figure; axis label: size of irreducible interval]

Biologically Meaningful?

RAS family domain protein kinase

ankyrin repeat

PH domain

regulator of chromosome condensation

THANK YOU!!!