Genome Rearrangements

Upload: randall-randall

Post on 01-Jan-2016

Page 1: Genome Rearrangements

Genome Rearrangements

Page 2: Genome Rearrangements

Basic Biology: DNA

• Genetic information is stored in deoxyribonucleic acid (DNA) molecules.

• A single DNA molecule is a sequence of nucleotides:
– adenine (A)
– cytosine (C)
– guanine (G)
– thymine (T)

[Figure: a nucleotide (phosphate, pentose sugar, nitrogenous base) and a DNA molecule]

Page 3: Genome Rearrangements

Basic Biology: DNA

• Paired DNA strands are in reverse complementary orientation.
– One in forward, 5’ to 3’ direction
– The other in reverse, 3’ to 5’ direction

• Both strands are complementary.
– A pairs with a T
– G pairs with a C

[Figure: forward strand (5’ to 3’) paired with reverse strand (3’ to 5’)]

Image modified with the permission of the National Human Genome Research Institute (NHGRI), artist Darryl Leja.

Page 4: Genome Rearrangements

Basic Biology: Genome

• The genome is the entire hereditary information of an organism.

• Genomes are partitioned into chromosomes.

• A chromosome can be linear (eukaryotes), or circular (prokaryotes).

Image modified with the permission of the National Human Genome Research Institute (NHGRI), artist Darryl Leja.

Page 5: Genome Rearrangements

The Human Karyogram

Karyotype of a human male.

Courtesy: National Human Genome Research Institute

Page 6: Genome Rearrangements

Changes in Genomic Sequences

• Genomes of different species (even of closely related individuals) differ from one another.

• These differences are caused by
– point mutations, in which only one nucleotide is changed, and
– genome rearrangements, where multiple nucleotides are modified.

Page 7: Genome Rearrangements

Point Mutations

• Insertion …ATGGCG… → …ATGTGCG…
• Deletion …ATGTGCG… → …ATGGCG…
• Substitution …ATGTGCG… → …ATGCGCG…

…ATG-GCATGTGCGATGTGCG…
…ATGTGCATG-GCGATGCGCG…

DNA sequence alignment showing matches, mismatches, and insertions/deletions

Page 8: Genome Rearrangements

Genome Rearrangements

• Reversal: (1 2 3 4 5 6 7 8 9) → (1 2 3 6 5 4 7 8 9)

• Translocation: (1 2 3 4 5 6 7 8 9) (10 11 12 13 14 15) → (1 2 3 4 13 14 15) (10 11 12 5 6 7 8 9)

• Fission: (1 2 3 4 5 6 7 8 9) → (1 2 3 4) (5 6 7 8 9)

• Fusion: (1 2 3 4) (5 6 7 8 9) → (1 2 3 4 5 6 7 8 9)

Page 9: Genome Rearrangements

Levenshtein’s Edit Distance

• Let A and B be two sequences (genomes). The minimum number of edit operations that transforms A into B defines the edit distance, d_edit, between A and B.

• Possible edit operations:
– point mutations
– genome rearrangements

Page 10: Genome Rearrangements

A Word Puzzle

• To transform a start word into a target word, change, add, or delete characters until the target is reached.

• Example: start “spices”, target “lice”:
– spices → slices → slice → lice
– spices → spice → slice → lice

• How many steps do you need to transform
– a republican into a democrat?
– Google into Yahoo?

Page 11: Genome Rearrangements

Edit Distance Using Point Mutations

S1 = AGCTT, S2 = AGCCTG, S3 = ACAG

AGCTT → AGCTG → AGCCTG (T→G, insert C): d_edit(S1,S2) = 2

AGCTT → AGCTG → AGCAG → ACAG (T→G, T→A, delete G): d_edit(S1,S3) = 3

AGCCTG → AGCTG → AGCAG → ACAG (delete C, T→A, delete G): d_edit(S2,S3) = 3
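
The point-mutation edit distance for sequences like S1, S2, and S3 can be computed with the standard dynamic-programming recurrence. The following is a minimal Python sketch, assuming unit cost for every insertion, deletion, and substitution; the function name `edit_distance` is an illustrative choice and does not come from the slides.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum number of single-character
    insertions, deletions, and substitutions that turn a into b."""
    # prev[j] is the distance between the prefix of a seen so far and b[:j]
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into the empty string takes i deletions
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free on a match)
            ))
        prev = curr
    return prev[-1]


s1, s2, s3 = "AGCTT", "AGCCTG", "ACAG"
for x, y in [(s1, s2), (s1, s3), (s2, s3)]:
    print(x, y, edit_distance(x, y))
```

Only one row of the table is kept at a time, which is enough when the distance itself is needed and the sequence of operations is not.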

Page 12: Genome Rearrangements

Edit Distance and Evolution

• The edit distance is often used to infer evolutionary relationships.
• Parsimony assumption: the minimum number of changes reflects the true evolutionary distance.

Parsimonious phylogeny inferred from edit distances

Page 14: Genome Rearrangements

Rearrangements and Anagrams

• An anagram is a rearrangement of a word or phrase into another word or phrase.

• eleven plus two → twelve plus one
• forty five → over fifty

Please visit the Internet Anagram web server at http://wordsmith.org/anagram/.

Page 15: Genome Rearrangements

Rearrangements and Anagrams

Dot plot: “spendit” vs. “stipend”

Dot plot: Mouse genome vs. Human genome

Page 16: Genome Rearrangements

Genome Comparison: Human - Mouse

• Humans and mice have similar genomes, but their genes are in a different order.

• How many edits (rearrangements) are needed to transform human into mouse?

Taken and modified from An Introduction to Bioinformatics Algorithms by Neil Jones and Pavel Pevzner

Page 17: Genome Rearrangements

Transforming Mice into Humans

a) Mouse and human share a common ancestor

b) They share the same genes, but in a different order

c) A series of rearrangements transforms one genome into the other

Page 18: Genome Rearrangements

History of Chromosome X

Rat Consortium, Nature, 2004

Page 19: Genome Rearrangements

Dobzhansky’s Experiment

Drosophila melanogaster life cycle, taken from FlyMove

Giant polytene chromosomes, modified from T.S. Painter, J. Hered. 25:465–476, 1934.

Harvesting polytene chromosomes, taken from BioPix4U

Page 20: Genome Rearrangements

Dobzhansky’s Experiment

Standard and Arrowhead arrangements differ by an inversion from segments 70 to 76

Figures taken from Dobzhansky T, Sturtevant AH. Genetics (1938), 23(1):28-64.

Chromosome 3 of Drosophila pseudoobscura

Page 21: Genome Rearrangements

Dobzhansky’s Experiment

Figures taken from Dobzhansky T, Sturtevant AH. Genetics (1938), 23(1):28-64.

Configurations observed in various inversion heterozygotes

Page 22: Genome Rearrangements

Dobzhansky’s Experiment

Figures taken from Dobzhansky T, Sturtevant AH. Genetics (1938), 23(1):28-64.

Single and Double Inversions Phylogeny for 3rd chromosome of D. pseudoobscura

Page 23: Genome Rearrangements

Unsigned Reversals

[Figure: chromosome with blocks 1 to 10]

1, 2, 3, 4, 5, 6, 7, 8, 9, 10

Taken and modified from An Introduction to Bioinformatics Algorithms by Neil Jones and Pavel Pevzner

Page 24: Genome Rearrangements

Unsigned Reversals

[Figure: chromosome with blocks 1 to 10]

1, 2, 3, 8, 7, 6, 5, 4, 9, 10

Taken and modified from An Introduction to Bioinformatics Algorithms by Neil Jones and Pavel Pevzner

Page 25: Genome Rearrangements

Unsigned Reversals and Gene Orders

π = 5 1 4 3 2 6 7 8 9 10

↓ r(1,2)

π = 1 5 4 3 2 6 7 8 9 10

↓ r(2,5)

π = 1 2 3 4 5 6 7 8 9 10
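
The reversal r(i, j) used in this example can be sketched in a few lines of Python. This assumes that r(i, j) reverses the elements at positions i through j, counted from 1 and inclusive, which matches the two steps shown; the function name `reversal` is an illustrative choice.

```python
def reversal(perm, i, j):
    """Return a copy of perm with positions i..j (1-based, inclusive) reversed."""
    return perm[:i - 1] + perm[i - 1:j][::-1] + perm[j:]


pi = [5, 1, 4, 3, 2, 6, 7, 8, 9, 10]
pi = reversal(pi, 1, 2)
print(pi)  # [1, 5, 4, 3, 2, 6, 7, 8, 9, 10]
pi = reversal(pi, 2, 5)
print(pi)  # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
```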

Page 26: Genome Rearrangements

Reversal Edit Distance

• Goal: Given two permutations, find the shortest series of reversals that transforms one into another

• Input: Permutations π and σ

• Output: A series of reversals r1, …, rt transforming π into σ such that t is minimum

• t - reversal distance between π and σ
• drev(π, σ) - smallest possible value of t, given π and σ

Taken and modified from An Introduction to Bioinformatics Algorithms by Neil Jones and Pavel Pevzner

Page 27: Genome Rearrangements

Sorting by Reversals Problem

• Goal: Given a permutation, find a shortest series of reversals that transforms it into the identity permutation (1 2 … n )

• Input: Permutation π

• Output: A series of reversals r1, …, rt transforming π into the identity permutation such that t is minimum

• Reversal Distance Problem and Sorting by Reversals Problem are equivalent. Why?

Taken and modified from An Introduction to Bioinformatics Algorithms by Neil Jones and Pavel Pevzner
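
One way to see the equivalence is to rename the elements so that the target permutation becomes the identity. Reversals act on positions, not on labels, so the same series of reversals works before and after the renaming. The sketch below illustrates this under the assumption that both permutations are lists over the same elements; the names `relabel`, `pi`, and `sigma` are illustrative.

```python
def relabel(pi, sigma):
    """Rename every element by its 1-based position in sigma,
    so that sigma itself becomes the identity permutation."""
    position = {value: idx for idx, value in enumerate(sigma, start=1)}
    return [position[x] for x in pi]


def reversal(perm, i, j):
    """Reverse positions i..j (1-based, inclusive)."""
    return perm[:i - 1] + perm[i - 1:j][::-1] + perm[j:]


pi = [5, 1, 4, 3, 2]
sigma = [1, 5, 4, 3, 2]

# Relabelling first and reversing afterwards gives the same result
# as reversing first and relabelling afterwards.
print(relabel(reversal(pi, 1, 2), sigma))
print(reversal(relabel(pi, sigma), 1, 2))
```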

Page 28: Genome Rearrangements

Algorithm 1: GreedyReversalSort(π)

for i ← 1 to n – 1
    j ← position of element i in π (i.e. π[j] = i)
    if j ≠ i
        π ← π • r(i, j)
        output π
    if π is the identity permutation
        return

Taken from An Introduction to Bioinformatics Algorithms by Neil Jones and Pavel Pevzner
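
A minimal Python sketch of the greedy procedure, assuming π is given as a list holding a permutation of 1..n. The function name `greedy_reversal_sort` and the choice to return the list of intermediate permutations are illustrative and not part of the slides.

```python
def greedy_reversal_sort(perm):
    """Sort a permutation of 1..n by moving element i to position i, for i = 1..n-1.
    Returns the permutation after each reversal that was applied."""
    perm = list(perm)
    n = len(perm)
    steps = []
    for i in range(1, n):
        j = perm.index(i) + 1  # 1-based position of element i
        if j != i:
            # r(i, j): reverse positions i..j so that element i lands at position i
            perm[i - 1:j] = perm[i - 1:j][::-1]
            steps.append(list(perm))
        if perm == list(range(1, n + 1)):
            break
    return steps


for step in greedy_reversal_sort([6, 1, 2, 3, 4, 5]):
    print(step)
```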

Page 29: Genome Rearrangements

GreedyReversalSort is Not Optimal

• For π = 6 1 2 3 4 5 the algorithm needs 5 steps:
– Step 0: 6 1 2 3 4 5
– Step 1: 1 6 2 3 4 5 (i=1; j=2; r(1,2))
– Step 2: 1 2 6 3 4 5 (i=2; j=3; r(2,3))
– Step 3: 1 2 3 6 4 5 (i=3; j=4; r(3,4))
– Step 4: 1 2 3 4 6 5 (i=4; j=5; r(4,5))
– Step 5: 1 2 3 4 5 6 (i=5; j=6; r(5,6))

• However, two reversals are enough:
– Step 0: 6 1 2 3 4 5
– Step 1: 6 5 4 3 2 1
– Step 2: 1 2 3 4 5 6

Page 30: Genome Rearrangements

Adjacencies & Breakpoints

• An adjacency is a pair of adjacent elements that are consecutive

• A breakpoint is a pair of adjacent elements that are not consecutive

• b(π) is the number of breakpoints in π

π = 5 6 2 1 3 4

Extend π with π0 = 0 and π7 = 7: 0 5 6 2 1 3 4 7

adjacencies: (5,6), (2,1), (3,4)

breakpoints: (0,5), (6,2), (1,3), (4,7), so b(π) = 4

Taken and modified from An Introduction to Bioinformatics Algorithms by Neil Jones and Pavel Pevzner
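
Counting breakpoints follows directly from the definition. The sketch below assumes an unsigned permutation of 1..n given as a Python list, extended with 0 and n+1 as on the slide; the function name `breakpoints` is an illustrative choice.

```python
def breakpoints(perm):
    """Number of breakpoints b(pi) of an unsigned permutation of 1..n."""
    ext = [0] + list(perm) + [len(perm) + 1]  # extend with 0 and n+1
    # neighbours are consecutive (an adjacency) when they differ by exactly 1
    return sum(1 for a, b in zip(ext, ext[1:]) if abs(a - b) != 1)


print(breakpoints([5, 6, 2, 1, 3, 4]))  # 4
```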

Page 31: Genome Rearrangements

Reversal Distance and Breakpoints

One reversal eliminates at most 2 breakpoints.

π = 0 2 3 1 4 6 5 7, b(π) = 5

π = 0 1 3 2 4 6 5 7, b(π) = 4

π = 0 1 2 3 4 6 5 7, b(π) = 2

π = 0 1 2 3 4 5 6 7, b(π) = 0

This implies: reversal distance ≥ b(π) / 2

Taken and modified from An Introduction to Bioinformatics Algorithms by Neil Jones and Pavel Pevzner

Page 32: Genome Rearrangements

Strips

• An interval between two consecutive breakpoints in a permutation is called a strip.
– A strip is increasing if its elements increase.
– Otherwise, the strip is decreasing.
– A single-element strip is considered decreasing with exception of the strips [0] and [n+1].

0 1 5 6 7 4 3 2 8 9 10

Taken and modified from An Introduction to Bioinformatics Algorithms by Neil Jones and Pavel Pevzner
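
The strip decomposition can be sketched as follows, assuming the permutation is a list over 1..n that gets extended with 0 and n+1. The function name `strips` and the (label, elements) output format are illustrative choices.

```python
def strips(perm):
    """Split the extended permutation into strips and label each one
    as 'increasing' or 'decreasing'."""
    ext = [0] + list(perm) + [len(perm) + 1]
    pieces, current = [], [ext[0]]
    for a, b in zip(ext, ext[1:]):
        if abs(a - b) == 1:      # adjacency: the strip continues
            current.append(b)
        else:                    # breakpoint: start a new strip
            pieces.append(current)
            current = [b]
    pieces.append(current)

    labelled = []
    for s in pieces:
        if len(s) > 1:
            kind = "increasing" if s[1] > s[0] else "decreasing"
        else:
            # single elements count as decreasing, except [0] and [n+1]
            kind = "increasing" if s[0] in (0, ext[-1]) else "decreasing"
        labelled.append((kind, s))
    return labelled


for kind, s in strips([1, 5, 6, 7, 4, 3, 2, 8, 9]):
    print(kind, s)
```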

Page 33: Genome Rearrangements

Strips and Breakpoints

Observation 1: If a permutation contains a decreasing strip, then there exists a reversal that will decrease the number of breakpoints.

0 1 5 6 7 4 3 2 8 9 10 → 0 1 2 3 4 7 6 5 8 9 10 via r(3,8)

Observation 2: Otherwise, create a decreasing strip by reversing an increasing strip. The number of breakpoints can be reduced in the next step.

0 1 5 6 7 2 3 4 8 9 10 → 0 1 5 6 7 4 3 2 8 9 10 via r(6,8)

(In both examples, positions are counted from 1 and include the leading 0.)

Taken and modified from An Introduction to Bioinformatics Algorithms by Neil Jones and Pavel Pevzner

Page 34: Genome Rearrangements

Algorithm 2: BreakpointReversalSort(π)

while b(π) > 0
    if π has a decreasing strip
        Choose reversal r that minimizes b(π • r)
    else
        Choose a reversal r that flips an increasing strip in π
    π ← π • r
    output π
return

Taken and modified from An Introduction to Bioinformatics Algorithms by Neil Jones and Pavel Pevzner
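
The following Python sketch follows the structure of the pseudocode above. It is a straightforward, unoptimized version under two assumptions that the slides leave open: the best reversal is found by trying all reversals of the interior positions, and when no decreasing strip exists the first interior increasing strip is flipped. All function names are illustrative.

```python
def count_breakpoints(ext):
    """Breakpoints of an already extended permutation 0, ..., n+1."""
    return sum(1 for a, b in zip(ext, ext[1:]) if abs(a - b) != 1)


def strip_bounds(ext):
    """List (start, end, is_decreasing) index ranges of the strips in ext."""
    bounds, start = [], 0
    for k in range(1, len(ext) + 1):
        if k == len(ext) or abs(ext[k] - ext[k - 1]) != 1:
            end = k - 1
            if end > start:
                decreasing = ext[end] < ext[start]
            else:
                # single elements are decreasing, except the strips [0] and [n+1]
                decreasing = 0 < start < len(ext) - 1
            bounds.append((start, end, decreasing))
            start = k
    return bounds


def flip(ext, i, j):
    """Reverse the slice ext[i..j] (0-based, inclusive)."""
    return ext[:i] + ext[i:j + 1][::-1] + ext[j + 1:]


def breakpoint_reversal_sort(perm):
    """Return the permutation after each reversal until it is sorted."""
    ext = [0] + list(perm) + [len(perm) + 1]
    last = len(ext) - 1
    history = []
    while count_breakpoints(ext) > 0:
        bounds = strip_bounds(ext)
        if any(dec for _, _, dec in bounds):
            # brute force: try every reversal that leaves 0 and n+1 in place
            candidates = (flip(ext, i, j)
                          for i in range(1, last)
                          for j in range(i + 1, last))
            ext = min(candidates, key=count_breakpoints)
        else:
            # only increasing strips: flip the first one not touching 0 or n+1
            i, j = next((s, e) for s, e, _ in bounds if s > 0 and e < last)
            ext = flip(ext, i, j)
        history.append(ext[1:-1])
    return history


for step in breakpoint_reversal_sort([6, 1, 2, 3, 4, 5]):
    print(step)
```

Trying all reversals costs on the order of n² candidates per step, which is fine for small examples but would need a smarter search for long permutations.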

Page 35: Genome Rearrangements

Performance Guarantee

• BreakpointReversalSort (BRS) is an approximation algorithm that will not use more than four times the minimum number of reversals.
– BRS eliminates at least one breakpoint every two steps: d_BRS ≤ 2b(π) steps
– An optimal algorithm eliminates at most two breakpoints every step: d_OPT ≥ b(π) / 2 steps

Performance guarantee: d_BRS / d_OPT ≤ 2b(π) / (b(π)/2) = 4

Taken and modified from An Introduction to Bioinformatics Algorithms by Neil Jones and Pavel Pevzner

Page 36: Genome Rearrangements

Gene Orientation & Genome Representation

modified from http://acim.uqam.ca/~anne/INF4500/Rearrangements.ppt

Page 37: Genome Rearrangements

Genome Rearrangements

Page 38: Genome Rearrangements

Signed Reversals

5’ ATGCCTGTACTA 3’

3’ TACGGACATGAT 5’

5’ ATGTACAGGCTA 3’

3’ TACATGTCCGAT 5’

Break and Invert

Taken and modified from An Introduction to Bioinformatics Algorithms by Neil Jones and Pavel Pevzner
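
The "break and invert" step on a double-stranded molecule amounts to replacing a segment of the forward strand by its reverse complement. A minimal Python sketch, assuming 0-based slice coordinates with an exclusive end; the names `reverse_complement` and `invert_segment` are illustrative.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Complement every base, then read the result backwards."""
    return seq.translate(COMPLEMENT)[::-1]


def invert_segment(strand, start, end):
    """Replace strand[start:end] by its reverse complement."""
    return strand[:start] + reverse_complement(strand[start:end]) + strand[end:]


forward = "ATGCCTGTACTA"
inverted = invert_segment(forward, 3, 9)   # inverts the segment CCTGTA
print(inverted)                            # ATGTACAGGCTA
print(inverted.translate(COMPLEMENT))      # paired strand, read 3' to 5'
```

The segment changes both its order and its strand, which is why a gene inside it changes sign in the signed permutation model.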

Page 39: Genome Rearrangements

Signed Reversals

[Figure: chromosome with blocks 1 to 10]

1, 2, 3, 4, 5, 6, 7, 8, 9, 10

Taken and modified from An Introduction to Bioinformatics Algorithms by Neil Jones and Pavel Pevzner

Page 40: Genome Rearrangements

Signed Reversals

[Figure: chromosome with blocks 1 to 10]

1, 2, 3, -8, -7, -6, -5, -4, 9, 10

Taken and modified from An Introduction to Bioinformatics Algorithms by Neil Jones and Pavel Pevzner

Page 41: Genome Rearrangements

Signed Reversals and Breakpoints

[Figure: chromosome with blocks 1 to 10]

1, 2, 3, -8, -7, -6, -5, -4, 9, 10

The reversal introduced two breakpoints

Taken and modified from An Introduction to Bioinformatics Algorithms by Neil Jones and Pavel Pevzner
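
A signed reversal reverses the order of a segment and flips the sign of every element in it. The sketch below also counts breakpoints, assuming the common convention for signed permutations that two neighbours a, b form an adjacency exactly when b = a + 1 (after extending with 0 and n+1); this convention and the function names are assumptions, not taken from the slides.

```python
def signed_reversal(perm, i, j):
    """Reverse positions i..j (1-based, inclusive) and flip every sign."""
    flipped = [-x for x in reversed(perm[i - 1:j])]
    return perm[:i - 1] + flipped + perm[j:]


def signed_breakpoints(perm):
    """Breakpoints of a signed permutation: neighbours a, b with b != a + 1."""
    ext = [0] + list(perm) + [len(perm) + 1]
    return sum(1 for a, b in zip(ext, ext[1:]) if b - a != 1)


identity = list(range(1, 11))
after = signed_reversal(identity, 4, 8)
print(after)                      # [1, 2, 3, -8, -7, -6, -5, -4, 9, 10]
print(signed_breakpoints(after))  # 2
```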

Page 42: Genome Rearrangements

Summary: Complexity Results

• Sorting by unsigned reversals:
– NP-hard
– can be approximated within a constant factor

• Sorting by signed reversals:
– can be solved in polynomial time

Page 43: Genome Rearrangements

Web Tools

• GRIMM Web Server
– computes signed and unsigned reversal distances between permutations.
http://www-cse.ucsd.edu/groups/bioinformatics/GRIMM

• Cinteny
– a web server for synteny identification and the analysis of genome rearrangement
http://cinteny.cchmc.org/

Page 44: Genome Rearrangements

DCJ Genome Rearrangements

• The DCJ model uses Double-Cut-and-Join genome rearrangement operations.

• DCJ operations break and rejoin one or two intergenic regions (possibly on different chromosomes).

Page 45: Genome Rearrangements

Genome Representation

• In the DCJ model, a genome is grouped into chromosomes (linear/circular).

• A gene g on the forward strand is represented by [-g,+g]

• A gene g on the reverse strand is represented by [+g,-g]

• Telomeres are represented by the special symbol ‘o’.

• An adjacency (intergenic region) is encoded by the unordered pair of neighboring gene/telomere ends.

Example:
• linear c1=(o 1 -2 3 4 o)
• circular c2=(5 6 7)
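
The encoding can be made concrete by listing the adjacencies of the two example chromosomes. This is a minimal Python sketch, assuming a chromosome is passed as a list of signed integers plus a flag for circularity, and that an adjacency is stored as a `frozenset` to reflect that the pair is unordered; the function name `adjacencies` and the constant `TELOMERE` are illustrative.

```python
TELOMERE = "o"  # telomere symbol, as on the slide


def adjacencies(genes, circular=False):
    """List the adjacencies of one chromosome given as signed gene numbers."""
    # A signed gene x has the ends (-x, x): forward g gives (-g, +g),
    # reverse -g gives (+g, -g).
    ends = [(-x, x) for x in genes]
    result = []
    for (_, right), (left, _) in zip(ends, ends[1:]):
        result.append(frozenset((right, left)))
    if circular:
        # the last gene is joined back to the first one
        result.append(frozenset((ends[-1][1], ends[0][0])))
    else:
        # both ends of a linear chromosome are paired with a telomere
        result.append(frozenset((TELOMERE, ends[0][0])))
        result.append(frozenset((ends[-1][1], TELOMERE)))
    return result


print(adjacencies([1, -2, 3, 4]))            # linear c1
print(adjacencies([5, 6, 7], circular=True))  # circular c2
```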

Page 46: Genome Rearrangements

DCJ Operations

• The double-cut-and-join operation “breaks” two adjacencies and rejoins the fragments: {a,b} {c,d} → {a,d} {c,b}, or {a,c} {b,d}.

• a, b, c, and d represent different (signed) gene ends or telomeres (with ‘+o’ = ‘-o’).

• A special case occurs for c=d=o: {a,b} {o,o} ↔ {a,o} {b,o}.

Page 47: Genome Rearrangements

Signed reversal of genes 2 and 3

Page 48: Genome Rearrangements

Chromosome Linearization

Page 49: Genome Rearrangements

Weird genome transformation

Page 50: Genome Rearrangements

Using Graphs to Sort Genomes

• Adjacency graph AG(A,B)=(V,E) is a bipartite graph.
• V contains one vertex for each adjacency of genome A and B.
• Each gene, g, defines two edges:
– e1 connecting the adjacencies with +g of A and B
– e2 connecting the adjacencies with –g.

Example:
genome A: (o 1 -2 3 4 o) (5 6 7)
genome B: (o 1 2 3 4 o) (o 5 6 7 o)

Page 51: Genome Rearrangements

Using Graphs to Sort Genomes

Algorithm 3: DCJSORT(A,B)

Generate adjacency graph AG(A, B) of A and B
for each adjacency {p, q} with p,q≠o in genome B do
    let u={p,l} be the vertex of A that contains p
    let v={q,m} be the vertex of A that contains q
    if u ≠ v then
        replace vertices u and v in A by {p,q} and {l,m}
        update edge set
    end if
end for
for each telomere {p,o} in B do
    let u={p,l} be the vertex of A that contains p
    if l≠o then
        replace vertex u in A by {p,o} and {o,l}
        update edge set
    end if
end for

Example:
genome A: (o 1 -2 3 4 o) (5 6 7)
genome B: (o 1 2 3 4 o) (o 5 6 7 o)
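
The two loops of DCJSORT can be sketched in Python on plain adjacency sets. This is an illustration under several assumptions that are not in the slides: a genome is a list of (genes, is_circular) pairs, adjacencies are stored as `frozenset` pairs, the edge set of the adjacency graph is not maintained explicitly, and a resulting pair {o,o} is simply not stored. The sequence of operations it prints is whatever this sketch produces and need not match any particular worked example.

```python
O = "o"  # telomere symbol


def adjacencies(genome):
    """Adjacencies of a genome given as [(genes, is_circular), ...]."""
    result = []
    for genes, is_circular in genome:
        ends = [(-x, x) for x in genes]  # forward g: (-g, +g); reverse: (+g, -g)
        for (_, right), (left, _) in zip(ends, ends[1:]):
            result.append(frozenset((right, left)))
        if is_circular:
            result.append(frozenset((ends[-1][1], ends[0][0])))
        else:
            result.append(frozenset((O, ends[0][0])))
            result.append(frozenset((ends[-1][1], O)))
    return result


def partner(vertex, p):
    """The other element of a two-element adjacency."""
    (other,) = vertex - {p}
    return other


def dcj_sort(a_adj, b_adj):
    """Transform the adjacencies of A into those of B; return both the
    final adjacency set and the list of (removed, added) operations."""
    a = set(a_adj)
    operations = []

    def vertex_with(p):
        return next(v for v in a if p in v)

    for adj in b_adj:                      # first loop: adjacencies of B
        if O in adj:
            continue
        p, q = adj
        u, v = vertex_with(p), vertex_with(q)
        if u != v:
            l, m = partner(u, p), partner(v, q)
            new = [frozenset((p, q))]
            if (l, m) != (O, O):           # assumption: {o,o} is not stored
                new.append(frozenset((l, m)))
            a -= {u, v}
            a.update(new)
            operations.append(((u, v), tuple(new)))
    for adj in b_adj:                      # second loop: telomeres of B
        if O not in adj:
            continue
        p = partner(adj, O)
        u = vertex_with(p)
        l = partner(u, p)
        if l != O:
            new = (frozenset((p, O)), frozenset((O, l)))
            a.remove(u)
            a.update(new)
            operations.append(((u,), new))
    return a, operations


genome_a = [([1, -2, 3, 4], False), ([5, 6, 7], True)]
genome_b = [([1, 2, 3, 4], False), ([5, 6, 7], False)]
result, ops = dcj_sort(adjacencies(genome_a), adjacencies(genome_b))
print(result == set(adjacencies(genome_b)))
for removed, added in ops:
    print([set(x) for x in removed], "->", [set(x) for x in added])
```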

Page 59: Genome Rearrangements

Using Graphs to Sort Genomes

Algorithm 3: DCJSORT(A,B), example:
genome A: (o 1 -2 3 4 o) (5 6 7)
genome B: (o 1 2 3 4 o) (o 5 6 7 o)
DCJ1: {1,2} {-2,-3} → {1,-2} {2,-3}
DCJ2: {4,o} {7,-5} → {4,-5} {7,o}
DCJ3: {3,-4} {o,o} → {3,o} {o,-4}

[Figure: genome A, the genomes after DCJ1, DCJ2, and DCJ3, and genome B]

Page 60: Genome Rearrangements

Summary: Complexity Results

• Sorting by unsigned reversals:
– NP-hard
– can be approximated within a constant factor

• Sorting by signed reversals:
– can be solved in polynomial time

• Sorting by DCJ rearrangements:
– can be solved in polynomial time

Page 61: Genome Rearrangements

The End

Page 62: Genome Rearrangements

Disclaimer

• Our presentation is in many parts inspired by the textbook An Introduction to Bioinformatics Algorithms by Neil Jones and Pavel Pevzner, by lectures from Anne Bergeron and Julia Mixtacki, as well as many review articles from multiple colleagues.