A Simplified View of DCJ-Indel Distance Phillip Compeau
A Simplified View of DCJ-Indel Distance
Phillip Compeau
University of California-San Diego, Department of Mathematics
Abstract
• Braga et al., 2010: Solved problem of DCJ-indel sorting in linear time.
• Goals:
1. “Hardwire” DCJ sorting into DCJ-indel sorting.
2. Characterize the solution space for DCJ-indel sorting.
  • The DCJ solution space is known (Braga and Stoye, 2010).
Section 1: Preliminaries
1. Preliminaries
2. Encoding Indels as DCJs
3. DCJ-Indel Sorting
4. The Solution Space of DCJ-Indel Sorting
5. Conclusion
The Discrete Genome
• Genome (Π): formed of two matchings
  • genes (g(Π)): each numbered gene has a head and a tail.
  • adjacencies (a(Π)): a blue matching on V(g(Π))
[Figure: example genomes Γ and Π]
The Discrete Genome
• Chromosome: component of Π (alternating path or cycle)
  • Linear or circular depending on whether the component is a path or a cycle of Π
• Telomere: path endpoint of Π; has null adjacency {v, Ø}
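The definitions above can be sketched in code. This is only an illustration under assumptions not made in the slides: each gene extremity is written as a string such as "1h" (head) or "1t" (tail), a null adjacency {v, Ø} is written as a pair containing None, and the helper names `gene_of` and `chromosomes` are hypothetical.

```python
from collections import defaultdict

def gene_of(vertex):
    """'12h' -> '12': strip the trailing head/tail marker."""
    return vertex[:-1]

def chromosomes(adjacencies):
    """Split a genome into chromosomes, given its adjacencies.

    Each adjacency is a pair of extremities; a pair containing None is a
    null adjacency, i.e. the other entry is a telomere. Returns a list of
    (genes, kind) pairs with kind "linear" (path) or "circular" (cycle).
    """
    neighbours = defaultdict(set)   # gene -> genes adjacent to it
    with_telomere = set()           # genes that carry a telomere
    for u, v in adjacencies:
        if u is None or v is None:
            gene = gene_of(v if u is None else u)
            neighbours[gene]        # register the gene even if isolated
            with_telomere.add(gene)
        else:
            neighbours[gene_of(u)].add(gene_of(v))
            neighbours[gene_of(v)].add(gene_of(u))
    seen, result = set(), []
    for start in sorted(neighbours):
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:
            gene = stack.pop()
            if gene not in component:
                component.add(gene)
                stack.extend(neighbours[gene] - component)
        seen |= component
        kind = "linear" if component & with_telomere else "circular"
        result.append((sorted(component), kind))
    return result

# One linear chromosome (genes 1, 2) and one circular chromosome (gene 3).
genome = [("1t", None), ("1h", "2t"), ("2h", None), ("3h", "3t")]
print(chromosomes(genome))
```

The gene edges (head to tail of the same gene) are left implicit here: grouping by gene name plays the role of the first matching.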
The Double-Cut-and-Join Operation
• Double-cut-and-join operation (DCJ; Yancopoulos et al., 2005): “cuts” the genome in two places and rejoins the resulting ends into new adjacencies.
• DCJ distance (dDCJ(Π, Γ)): minimum # of DCJs required to transform Π into Γ (which must have the same genes).
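A single DCJ can be sketched as follows. The representation is an assumption made for illustration only: a genome is a list of adjacencies, extremities are strings such as "2h" and "2t", None stands for the null end Ø of a telomere, and the function name `dcj` is hypothetical.

```python
def dcj(genome, adj1, adj2, cross=False):
    """Apply one DCJ: cut adj1 = (u, v) and adj2 = (w, x), then rejoin the
    four free ends as (u, w), (v, x), or as (u, x), (v, w) if cross=True.
    None plays the role of the null end of a telomere."""
    if adj1 == adj2 or adj1 not in genome or adj2 not in genome:
        raise ValueError("need two distinct adjacencies of the genome")
    (u, v), (w, x) = adj1, adj2
    joined = [(u, x), (v, w)] if cross else [(u, w), (v, x)]
    kept = [adj for adj in genome if adj not in (adj1, adj2)]
    # (None, None) joins two null ends and is not a real adjacency.
    return kept + [adj for adj in joined if adj != (None, None)]

# Linear chromosome 1 2 3; cutting on both sides of gene 2 and rejoining
# crosswise excises gene 2 as a circular chromosome.
linear = [(None, "1t"), ("1h", "2t"), ("2h", "3t"), ("3h", None)]
print(dcj(linear, ("1h", "2t"), ("2h", "3t"), cross=True))
# -> [(None, '1t'), ('3h', None), ('1h', '3t'), ('2t', '2h')]
```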
The Breakpoint Graph
• B(Π, Γ) is formed from the adjacencies of Π and Γ.
• B(Π, Γ) also comprises (alternating) red-blue paths and cycles.
DCJ Distance Formula
• Bergeron et al., 2006: If Π and Γ share the same genes, then the DCJ distance is given by the following formula:
dDCJ(Π, Γ) = N – c(Π, Γ) – peven(Π, Γ)/2
• N = # of genes
• c(Π, Γ) = # of cycles in B(Π, Γ)
• peven(Π, Γ) = # of even paths in B(Π, Γ)
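A minimal sketch of evaluating N – c – peven/2 for two genomes on the same genes. The conventions are assumptions of this sketch, not taken from the slides: extremities are strings such as "1h", null adjacencies (pairs containing None) contribute no edge to the breakpoint graph, an isolated vertex counts as a path of length 0, and the name `dcj_distance` is hypothetical.

```python
from collections import Counter, defaultdict

def dcj_distance(genes, pi_adj, gamma_adj):
    """N - c - p_even/2, read off the breakpoint graph B(Pi, Gamma)."""
    vertices = [f"{g}{end}" for g in genes for end in "ht"]
    parent = {v: v for v in vertices}

    def find(v):                      # union-find with path halving
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    # Blue (Pi) and red (Gamma) edges; null adjacencies add no edge.
    edges = [a for a in list(pi_adj) + list(gamma_adj) if None not in a]
    degree = Counter()
    for u, v in edges:
        parent[find(u)] = find(v)
        degree[u] += 1
        degree[v] += 1
    edge_count = Counter(find(u) for u, _ in edges)
    members = defaultdict(list)
    for v in vertices:
        members[find(v)].append(v)

    cycles = even_paths = 0
    for root, component in members.items():
        if all(degree[v] == 2 for v in component):
            cycles += 1               # every vertex has a blue and a red edge
        elif edge_count[root] % 2 == 0:
            even_paths += 1           # path with an even number of edges
    return len(genes) - cycles - even_paths // 2

pi = [("1t", None), ("1h", "2t"), ("2h", "3t"), ("3h", None)]      # 1 2 3
gamma = [("1t", None), ("1h", "2h"), ("2t", "3t"), ("3h", None)]   # 1 -2 3
print(dcj_distance([1, 2, 3], pi, gamma))   # one reversal apart -> 1
```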
Indels and the DCJ-Indel Distance
• Indel: the insertion or deletion of a chromosome or chromosomal interval (consecutive genes).
  • Assumption: we can’t remove a gene common to Π and Γ.
• DCJ-indel distance (dindDCJ(Π, Γ)): minimum # of DCJs and indels required to transform Π into Γ.
  • Braga et al., 2010: solved DCJ-indel sorting in linear time.
  • Lots of cases… can we simplify it?
Section 2: Encoding Indels as DCJs
• Ma et al., 2009: View deletion as formation and removal of a circular chromosome.
• Idea: Indel = DCJ creating circular chromosome
  • Wait… what about the deletion of circular chromosomes?
[Figure: a deletion redrawn as a DCJ creating a circular chromosome]
Apparent Exceptions
• Apparent Exception #1: Two deleted circular chromosomes are created from a single DCJ.
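[Figure: deleting the circular chromosome (a b c d) directly takes 1 operation; splitting it with a DCJ into (b c) and (a d) and then deleting both takes 3 operations.]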
Apparent Exceptions
• Apparent Exception #2: A deleted circular chromosome is never involved in a DCJ
• Circular singleton of Π: A circular chromosome of Π that shares no genes with Γ.
• Question: Can we delete all circular singletons first? YES!
Handling Circular Singletons
• Proposition: When transforming Π into Γ via a minimum collection of DCJs and indels, no gene belonging to a circular singleton of Π can ever appear in the same chromosome as a gene of Γ.
• Corollary 1: If Π* is formed from Π by removing a circular singleton from Π, then dindDCJ(Π*, Γ) = dindDCJ(Π, Γ) – 1.
• Let sing(Π, Γ) = # of circular singletons of Π and Γ.
• Corollary 2: If Π0 and Γ0 are formed by removing all circular singletons from Π and Γ, then dindDCJ(Π, Γ) = dindDCJ(Π0, Γ0) + sing(Π, Γ).
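Corollary 2 suggests a simple preprocessing step, sketched here under assumptions of this example: each genome is given as a list of (genes, kind) pairs with kind "linear" or "circular", and the function names are hypothetical.

```python
def circular_singletons(chroms, other_genes):
    """Circular chromosomes that share no gene with the other genome."""
    other = set(other_genes)
    return [genes for genes, kind in chroms
            if kind == "circular" and not other & set(genes)]

def strip_singletons(pi_chroms, gamma_chroms):
    """Return (Pi0, Gamma0, sing): both genomes with all circular
    singletons removed, plus the number of singletons removed."""
    pi_genes = {g for genes, _ in pi_chroms for g in genes}
    gamma_genes = {g for genes, _ in gamma_chroms for g in genes}
    pi_sing = circular_singletons(pi_chroms, gamma_genes)
    gamma_sing = circular_singletons(gamma_chroms, pi_genes)
    pi0 = [c for c in pi_chroms if c[0] not in pi_sing]
    gamma0 = [c for c in gamma_chroms if c[0] not in gamma_sing]
    return pi0, gamma0, len(pi_sing) + len(gamma_sing)

pi = [(["1", "2"], "linear"), (["7"], "circular")]
gamma = [(["1", "2", "3"], "linear"), (["8", "9"], "circular")]
pi0, gamma0, sing = strip_singletons(pi, gamma)
print(sing)   # 2: one singleton in each genome
```

By Corollary 2, the DCJ-indel distance of the original pair is then the distance of (pi0, gamma0) plus `sing`.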
A Novel View of DCJ-Indel Distance
• WLOG we may henceforth assume that sing(Π, Γ) = 0.
• A completion of Π is a genome Π’ such that:
  • g(Π’) = g(Π) ∪ g(Γ)
  • a(Π’) = a(Π) ∪ perfect matching on V(Π’) – V(Π)
• New chromosomes of Π’ are circular: the indels of Π’
• Theorem: dindDCJ(Π, Γ) = min dDCJ(Π’, Γ’), where the minimum is taken over all completions (Π’, Γ’) of (Π, Γ).
• An optimal completion achieves this minimum.
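The two conditions defining a completion can be checked mechanically. The sketch below assumes, for illustration only, that extremities are strings such as "2h", that adjacencies are pairs, and that a null adjacency contains None; the name `is_completion` is hypothetical.

```python
def is_completion(pi_genes, pi_adj, gamma_genes, comp_genes, comp_adj):
    """Check that (comp_genes, comp_adj) is a completion of Pi:
    g(Pi') = g(Pi) U g(Gamma), and a(Pi') is a(Pi) plus a perfect
    matching on the vertices of the genes that were added."""
    def normalise(adjs):
        return {frozenset(a) for a in adjs}   # adjacencies are unordered

    if set(comp_genes) != set(pi_genes) | set(gamma_genes):
        return False
    old, new = normalise(pi_adj), normalise(comp_adj)
    if not old <= new:                        # a(Pi) must be kept intact
        return False
    new_vertices = {f"{g}{end}"
                    for g in set(comp_genes) - set(pi_genes)
                    for end in "ht"}
    covered = [v for adj in new - old for v in adj]
    # No new telomeres, and every new vertex is matched exactly once.
    return None not in covered and sorted(covered) == sorted(new_vertices)

pi_adj = [("1t", None), ("1h", None)]          # linear chromosome: gene 1
print(is_completion({1}, pi_adj, {1, 2}, {1, 2}, pi_adj + [("2h", "2t")]))
# -> True: gene 2 is added as a new circular chromosome
```

Because the added adjacencies form a perfect matching on the new vertices, every added chromosome is necessarily circular, as the slide states.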
Section 3: DCJ-Indel Sorting
Open Vertices
• π-open vertex: vertex not found in Π (must be matched in Π’)
  • A path endpoint in B(Π, Γ) must be π-open/γ-open or a telomere (or both)
  • Define {π, π}-paths, {π, γ}-paths, π-paths in B(Π, Γ)
• Idea: Construct B(Π*, Γ*) from B(Π, Γ) by matching vertices.
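A small sketch of the path types named above. It assumes, as an interpretation not spelled out in the slides, that a π-path (γ-path) has exactly one π-open (γ-open) endpoint and a telomere at the other end, and that a one-vertex path contributes its vertex once. Extremities are strings such as "3h", and the function names are hypothetical.

```python
def open_vertices(pi_genes, gamma_genes):
    """pi-open vertices are the extremities of genes missing from Pi
    (present only in Gamma); gamma-open vertices are symmetric."""
    def ends(genes):
        return {f"{g}{end}" for g in genes for end in "ht"}
    pi_genes, gamma_genes = set(pi_genes), set(gamma_genes)
    return ends(gamma_genes - pi_genes), ends(pi_genes - gamma_genes)

PATH_NAMES = {
    ("pi", "pi"): "{π, π}-path",
    ("gamma", "gamma"): "{γ, γ}-path",
    ("gamma", "pi"): "{π, γ}-path",
    ("pi",): "π-path",
    ("gamma",): "γ-path",
    (): "telomere-to-telomere path",
}

def classify_path(end_a, end_b, pi_open, gamma_open):
    """Name a path of B(Pi, Gamma) by which of its endpoints are open."""
    labels = []
    for v in {end_a, end_b}:          # a one-vertex path counts once
        if v in pi_open:
            labels.append("pi")
        elif v in gamma_open:
            labels.append("gamma")
    return PATH_NAMES[tuple(sorted(labels))]

pi_open, gamma_open = open_vertices({1, 2}, {2, 3})
print(classify_path("3h", "1t", pi_open, gamma_open))   # {π, γ}-path
```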
Necessary Conditions for B(Π*, Γ*)
• Lemma 1: If (Π*, Γ*) is an optimal completion of (Π, Γ), then every {π, π}-path ({γ, γ}-path) of length 2k – 1 in B(Π, Γ) embeds into a cycle of length 2k in B(Π*, Γ*).
• Picture: B(Π’, Γ’) vs. B(Π’’, Γ’), in which the {π, π}-path is closed into a cycle; dDCJ(Π’’, Γ’) < dDCJ(Π’, Γ’).
• Remaining components of B(Π*, Γ*):
  • bracelet: cycle linking {π, γ}-paths
  • chain: path linking π-paths/γ-paths via intermediate {π, γ}-paths
[Figure: a 2-bracelet, a 2-chain, and a 3-chain]
Necessary Conditions for B(Π*, Γ*)
• Lemma 2: B(Π*, Γ*) can contain only 2-bracelets, 2-chains, and 3-chains.
• Picture: B(Π’, Γ’) vs. B(Π’’, Γ’), in which the paths P1 and P2 are relinked so that a cycle is formed; dDCJ(Π’’, Γ’) < dDCJ(Π’, Γ’).
Necessary Conditions for B(Π*, Γ*)
• Lemma 3: B(Π*, Γ*) cannot have one 2-chain joining two odd π-paths and another 2-chain joining two even π-paths. The same holds for γ-paths.
• Picture: B(Π’, Γ’), with odd π-paths P1 and P2 joined into one 2-chain and even π-paths P3 and P4 joined into another, vs. B(Π’’, Γ’), in which the four paths are relinked into two even paths; dDCJ(Π’’, Γ’) < dDCJ(Π’, Γ’).
Sorting Algorithm
1. Remove all circular singletons of Π and Γ.
2. Close every {π, π}-path ({γ, γ}-path) into a cycle by adding a single new adjacency to Π* (Γ*) (Lemma 1).
3. Form a maximum set of 2-bracelets (only chains remaining).
4. Form a maximum set of even 2-chains by linking pairs of π-paths (γ-paths) having opposite parity (Lemma 3).
5. If pπ, γ is odd, then link the remaining {π, γ}-path with any remaining π-path and γ-path.
6. Arbitrarily link pairs of remaining π-paths, all of which have the same parity. Do the same for any γ-paths remaining.
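Steps 3 to 6 can be sketched as a count of how many components of each kind the algorithm forms, given only the path statistics. The function name, argument names, and dictionary keys are hypothetical. The input check relies on an assumption of this sketch: open vertices come in pairs (two per gene), so the number of {π, γ}-paths plus the number of π-paths must be even, and likewise for γ.

```python
def linking_plan(p_pi_gamma, pi_odd, pi_even, gamma_odd, gamma_even):
    """Count the components formed in steps 3-6, from the numbers of
    {pi, gamma}-paths and of odd/even pi-paths and gamma-paths."""
    if ((p_pi_gamma + pi_odd + pi_even) % 2
            or (p_pi_gamma + gamma_odd + gamma_even) % 2):
        raise ValueError("open endpoints cannot be perfectly matched")
    plan = {
        # Step 3: pair up {pi, gamma}-paths into 2-bracelets.
        "2-bracelets": p_pi_gamma // 2,
        # Step 4: even 2-chains from paths of opposite parity.
        "pi_mixed_2chains": min(pi_odd, pi_even),
        "gamma_mixed_2chains": min(gamma_odd, gamma_even),
    }
    left_pi = abs(pi_odd - pi_even)          # all of one parity now
    left_gamma = abs(gamma_odd - gamma_even)
    # Step 5: a leftover {pi, gamma}-path takes one pi-path and one
    # gamma-path; the parity check above guarantees both exist.
    plan["3-chains"] = p_pi_gamma % 2
    left_pi -= plan["3-chains"]
    left_gamma -= plan["3-chains"]
    # Step 6: pair whatever remains, all of the same parity.
    plan["pi_same_2chains"] = left_pi // 2
    plan["gamma_same_2chains"] = left_gamma // 2
    return plan

print(linking_plan(3, pi_odd=3, pi_even=2, gamma_odd=1, gamma_even=0))
```

Step 2 is omitted because it needs no choice: each {π, π}-path or {γ, γ}-path is closed on its own.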
DCJ-Indel Distance
• Theorem: The preceding algorithm solves DCJ-indel sorting in linear time, and it implies a formula for the DCJ-indel distance dindDCJ(Π, Γ) [formula not preserved in the transcript], where δ = 1 only if pπ, γ is odd and either:
1. p^π_odd > p^π_even and p^γ_odd > p^γ_even; or
2. p^π_odd < p^π_even and p^γ_odd < p^γ_even.
Otherwise, δ = 0.
Section 4: The Solution Space of DCJ-Indel Sorting
Encompassing all Possible Cases
• The solution space is known for DCJ-sorting (Braga and Stoye, 2010).
• Thus, we only need to find all optimal completions, and the specific operations will fall out in the wash.
Handling Circular Singletons
• The circular singletons of Π must be removed in sing(Π) steps. We have two options:
1. Delete all the circular singletons of Π.
2. Perform k “fusion” DCJs followed by sing(Π) – k chromosome deletions.
• This poses a straightforward (yet tedious) counting problem.
Adding Necessary Conditions on B(Π*, Γ*)
• Proposition 1: Every π-path embedding into a 3-chain of an optimal completion must have the same parity.
• Proposition 2: If pπ, γ is even, then B(Π*, Γ*) must contain a maximum collection of even 2-chains.
• Proofs are slightly more involved…
Finishing the Job
• Four cases, depending on path statistics.
1. pπ, γ is odd:
  a) p^π_odd > p^π_even, p^γ_odd > p^γ_even (or vice-versa); δ = 1
  b) p^π_odd > p^π_even, p^γ_odd < p^γ_even (or vice-versa); δ = 0
2. pπ, γ is even:
  a) p^π_odd > p^π_even, p^γ_odd > p^γ_even (or vice-versa); δ = 0
  b) p^π_odd > p^π_even, p^γ_odd < p^γ_even (or vice-versa); δ = 0
• These cases are tedious but straightforward and can be handled similarly.
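The value of δ in the four cases follows directly from the path statistics. A minimal sketch, with hypothetical function and argument names:

```python
def delta(p_pi_gamma, pi_odd, pi_even, gamma_odd, gamma_even):
    """delta = 1 only if the number of {pi, gamma}-paths is odd and the
    odd/even comparison points the same way for pi-paths and gamma-paths."""
    both_odd_heavy = pi_odd > pi_even and gamma_odd > gamma_even
    both_even_heavy = pi_odd < pi_even and gamma_odd < gamma_even
    if p_pi_gamma % 2 == 1 and (both_odd_heavy or both_even_heavy):
        return 1
    return 0

print(delta(3, 2, 1, 5, 0))   # case 1a -> 1
print(delta(3, 2, 1, 0, 5))   # case 1b -> 0
print(delta(2, 2, 1, 5, 0))   # case 2a -> 0
```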
Section 5: Conclusion
Future Work
• Correspondence with Braga et al., 2010?
• Varying the indel cost?
  • Charge indel cost ≤ DCJ cost, take minimum total cost.
  • Most of the simplifying sorting lemmas hold, but actually computing the minimum cost appears difficult in this model.
• The problem is solved! (under framework of Braga et al., 2010)
Shameless Plug
• www.rosalind.info
• A novel education website that teaches bioinformatics through programming exercises.
• Has a “professor” environment for assigning programming exercises to your bioinformatics classes.