Phylogeny Reconstruction from Experimental Data Dean L. Zeller Dr. F. F. Dragan, advisor Kent State University April 7th, 2006

Phylogeny Reconstruction from Experimental Data

Dean L. Zeller
Dr. F. F. Dragan, advisor

Kent State University
April 7th, 2006

Phylogeny Reconstruction Slide 2 of 42

“…the great Tree of Life fills with its dead and broken branches the crust of the earth, and covers the surface with its ever-branching and beautiful ramifications.”

Charles Darwin (1809-1882) Father of Evolution

Phylogeny Reconstruction Slide 3 of 42

Outline

• Goals of research
• Evolution trees, phylogenies
• Assumptions
• Atlas of Evolution Trees
• Genetic tests (hypothetical)
• Phylogeny reconstruction methods
• Future Work

Phylogeny Reconstruction Slide 4 of 42

Goals of Research

Specific Goals
• Create methods of phylogeny reconstruction from various hypothetical tests.
• Make use of and create more adequate evolution models.
• Create “teachable” lessons on bioinformatics suitable for a mid-level computer science, mathematics, or biology class.

Long Term Goals
• Discover methods of phylogeny reconstruction from a new perspective.
• Educate the next generation of computational biologists.

Phylogeny Reconstruction Slide 5 of 42

Evolution Tree example

Phylogeny Reconstruction Slide 6 of 42

Evolution Tree example

Phylogeny Reconstruction Slide 7 of 42

Evolution Tree (theoretical)

Phylogeny Reconstruction Slide 8 of 42

Assumptions

• By making simple assumptions, the problem complexity is greatly reduced.

1. Redundant nodes are removed.
2. Multiple-split nodes are replaced with isomorphic approximations.
3. Only isomorphically unique trees are considered.

Phylogeny Reconstruction Slide 9 of 42

Assumption #1

• Redundant nodes are removed without loss of data.

• It is already assumed that the species changes slowly over time. Considering a single point along the way adds nothing to the problem.
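A minimal sketch of Assumption #1 follows. It assumes that a "redundant node" means a node with exactly one child, and it uses a hypothetical representation of a rooted tree as a dictionary mapping each node to its list of children. Neither detail comes from the slides.

```python
# Hypothetical representation: a rooted tree as {node: [children]}.
# Assumption: a "redundant" node is one with exactly one child.

def remove_redundant(tree, root):
    """Return (new_tree, new_root) with every single-child node spliced out."""

    def collapse(node):
        # Follow a chain of single-child nodes down to its last node.
        children = tree.get(node, [])
        while len(children) == 1:
            node = children[0]
            children = tree.get(node, [])
        return node

    new_root = collapse(root)
    new_tree = {}
    stack = [new_root]
    while stack:
        node = stack.pop()
        # Each child is replaced by the end of its single-child chain.
        kids = [collapse(child) for child in tree.get(node, [])]
        if kids:
            new_tree[node] = kids
            stack.extend(kids)
    return new_tree, new_root


# r and y each have a single child, so both are redundant.
example = {"r": ["x"], "x": ["y", "c"], "y": ["z"], "z": ["a", "b"]}
reduced, reduced_root = remove_redundant(example, "r")
print(reduced_root, reduced)
```

The leaves a, b, c and the two branching points x and z survive, so the branching structure is unchanged.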

Phylogeny Reconstruction Slide 10 of 42

Assumption #2

• Multiple split nodes replaced with isomorphic approximations

• Some loss of data, but greatly reduces the problem complexity

Phylogeny Reconstruction Slide 11 of 42

Assumption #3

• Isomorphically unique trees

Phylogeny Reconstruction Slide 12 of 42

Atlas of Evolution Trees (5 leaves)

[Figure: evolution trees (12), (122), (1222), (124), (1224), and (1242), with leaves labeled a through e]

Phylogeny Reconstruction Slide 13 of 42

Atlas of Evolution Trees (6 leaves)

[Figure: evolution trees (122222), (12224), (12242), (12422), (1244a), and (1244b), with leaves labeled a through f]

Phylogeny Reconstruction Slide 14 of 42

Genetic Tests

• At this point, all tests are purely hypothetical.
• Plausible results can be converted from existing tests.
• Binary Two-Species Test (BTST)
• Discrete Two-Species Test (DTST)
• Continuous Two-Species Test (CTST)
• Closer Relative Three-Species Test (CRTST)

Phylogeny Reconstruction Slide 15 of 42

Binary Two-Species Test (BTST)

• Returns 1 if species x and y are genetically close to a certain degree, and 0 otherwise.

• Data collected to form a similarity grid and distance graph (k-leaf root).

Phylogeny Reconstruction Slide 16 of 42

Reconstruction from BTST

Step 1 – Difference Summary Table

   a  b  c  d  e  f
a     1  1  0  0  0
b        1  0  0  0
c           1  0  0
d              1  1
e                 1
f

Step 2 – k-leaf root

Step 3 – phylogeny
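Step 1 can be sketched as turning the table into a graph. The sketch below assumes the upper triangle of the table is stored row by row in a dictionary. The names `upper` and `similarity_graph` are illustrative and do not come from the slides.

```python
species = "abcdef"

# Upper triangle of the BTST table: the row for species x lists the
# results against each species that follows x in the ordering.
upper = {
    "a": [1, 1, 0, 0, 0],
    "b": [1, 0, 0, 0],
    "c": [1, 0, 0],
    "d": [1, 1],
    "e": [1],
}


def similarity_graph(species, upper):
    """Build an undirected adjacency map from the 0/1 test results."""
    adj = {s: set() for s in species}
    for i, x in enumerate(species):
        for y, close in zip(species[i + 1:], upper.get(x, [])):
            if close:
                adj[x].add(y)
                adj[y].add(x)
    return adj


graph = similarity_graph(species, upper)
edges = sorted(x + y for x in graph for y in graph[x] if x < y)
print(edges)  # ['ab', 'ac', 'bc', 'cd', 'de', 'df', 'ef']
```

Finding a k-leaf root of this graph (Step 2) is the hard part and is not attempted here.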

Phylogeny Reconstruction Slide 17 of 42

Reconstruction from BTST

• Linear time solution exists for k = 3 [Br05b]

• … and k = 4 [Br05a]

• An open problem for k ≥ 5
– Severely limits analysis capability.

Phylogeny Reconstruction Slide 18 of 42

Discrete Two-Species Test (DTST)

• Returns a discrete value (k = 2, 3, 4, …) denoting the distance between x and y in the tree.
• Test can be converted from existing tests.
• Data collected to form a distance grid.
• Create distance graphs incrementally.

Phylogeny Reconstruction Slide 19 of 42

Reconstruction from DTST

Difference Summary Table

   a  b  c  d  e  f
a     2  3  5  6  6
b        3  5  6  6
c           4  5  5
d              3  3
e                 2
f

[Figure: distance graphs built from the table for thresholds k of 2, 3, 4, 5, and 6]
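Building the distance graphs incrementally can be sketched as thresholding the table. The sketch assumes that the graph for a given k joins two species when their tabulated distance is at most k. That is the usual k-leaf power convention, but the slides do not state it. The function name `distance_graph` is illustrative.

```python
# DTST distances from the table above, keyed by species pair.
dist = {
    "ab": 2, "ac": 3, "ad": 5, "ae": 6, "af": 6,
    "bc": 3, "bd": 5, "be": 6, "bf": 6,
    "cd": 4, "ce": 5, "cf": 5,
    "de": 3, "df": 3,
    "ef": 2,
}


def distance_graph(dist, k):
    """Edges of the distance graph: pairs whose distance is at most k (assumed rule)."""
    return sorted(pair for pair, d in dist.items() if d <= k)


for k in range(2, 7):
    print(k, distance_graph(dist, k))
```

With this data, the k = 4 graph contains exactly the pairs marked 1 in the BTST table on slide 16, and the k = 6 graph is complete.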

Phylogeny Reconstruction Slide 20 of 42

Reconstruction from DTST

Distance 2: Direct neighbors
Distance 3: Close relatives
Distance 4: Tree complete

Phylogeny Reconstruction Slide 21 of 42

Continuous Two-Species Test (CTST)

• Returns a continuous value d denoting the distance between x and y in the tree.
• Data collected to form a distance grid.
• Tree reconstructed in ascending order of closeness.
• Highest degree of accuracy required

Phylogeny Reconstruction Slide 22 of 42

Reconstruction from CTST

Distance Summary Table

        a      b      c      d      e      f
a           1.96   3.64   7.31   9.07  11.65
b                  3.51   7.64  12.34  10.71
c                         5.90   8.21   7.99
d                                4.73   4.63
e                                       2.31
f

Phylogeny Reconstruction Slide 23 of 42

Reconstruction from CTST

Diff(a,b)  1.96  Make connection
Diff(e,f)  2.31  Make connection
Diff(b,c)  3.51  Make connection
Diff(a,c)  3.64  Connection previously established
Diff(d,f)  4.63  Make connection
Diff(d,e)  4.73  Connection previously established
Diff(c,d)  5.90  Make connection, STOP -- All species included in tree
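The step log above can be reproduced with a short sketch. The slides do not name the procedure. The code below assumes it amounts to taking pairs in ascending order of distance and skipping any pair whose species are already linked, which is essentially Kruskal's minimum-spanning-tree procedure with a union-find structure. The function name `reconstruct` is illustrative.

```python
# CTST distances from the Distance Summary Table on slide 22.
dist = {
    ("a", "b"): 1.96, ("a", "c"): 3.64, ("a", "d"): 7.31, ("a", "e"): 9.07,
    ("a", "f"): 11.65, ("b", "c"): 3.51, ("b", "d"): 7.64, ("b", "e"): 12.34,
    ("b", "f"): 10.71, ("c", "d"): 5.90, ("c", "e"): 8.21, ("c", "f"): 7.99,
    ("d", "e"): 4.73, ("d", "f"): 4.63, ("e", "f"): 2.31,
}


def reconstruct(species, dist):
    """Connect pairs in ascending distance order, skipping linked pairs."""
    parent = {s: s for s in species}

    def find(s):
        # Union-find lookup with path halving.
        while parent[s] != s:
            parent[s] = parent[parent[s]]
            s = parent[s]
        return s

    connections, log = [], []
    for (x, y), d in sorted(dist.items(), key=lambda item: item[1]):
        root_x, root_y = find(x), find(y)
        if root_x == root_y:
            log.append((x, y, d, "previously established"))
            continue
        parent[root_x] = root_y
        connections.append((x, y))
        log.append((x, y, d, "make connection"))
        if len(connections) == len(species) - 1:
            break  # all species are now in one tree
    return connections, log


connections, log = reconstruct("abcdef", dist)
for entry in log:
    print(entry)
```

The sketch records only which pairs get connected. How the connections are drawn as a phylogeny with internal nodes is left to the figures.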

Phylogeny Reconstruction Slide 24 of 42

Actual CTST data

          Bovine  Mouse  Gibbon  Orang  Gorilla  Chimp  Human
Bovine             1.67    1.72   1.66     1.52   1.60   1.59
Mouse                      1.52   1.48     1.45   1.44   1.46
Gibbon                            0.71     0.60   0.62   0.56
Orang                                      0.46   0.51   0.47
Gorilla                                           0.35   0.31
Chimp                                                    0.27
Human

Source: [Fe04]

Phylogeny Reconstruction Slide 25 of 42

Phylogeny Reconstruction

Reconstruction from CTST results in the following tree:

Phylogeny Reconstruction Slide 26 of 42

CTST Results, part 1

• Use the correlation statistical measurement to determine the relationship between the data used to create the tree and the distance data created by the tree. (>0.8 is “strong”.)

species pair     data  distance
chimp human      0.27  2
gorilla human    0.31  3
gorilla chimp    0.35  3
orang gorilla    0.46  3
orang human      0.47  4
orang chimp      0.51  4
gibbon human     0.56  5
gibbon gorilla   0.60  5
gibbon chimp     0.62  5
gibbon orang     0.71  3

Correlation: 0.64 (positive relationship exists)

Note: if gibbon orang were 5 instead of 3, the correlation would be 0.93.
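The two correlation figures can be checked with a short sketch. It assumes the "correlation statistical measurement" is the Pearson correlation coefficient, which the slides do not state. The helper `pearson` is written out by hand so that nothing beyond the standard library is needed.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Columns of the table above, in row order.
data = [0.27, 0.31, 0.35, 0.46, 0.47, 0.51, 0.56, 0.60, 0.62, 0.71]
tree_distance = [2, 3, 3, 3, 4, 4, 5, 5, 5, 3]

r_actual = pearson(data, tree_distance)
print(round(r_actual, 2))  # 0.64

# The note's what-if case: gibbon orang at distance 5 instead of 3.
what_if = tree_distance[:-1] + [5]
r_what_if = pearson(data, what_if)
print(round(r_what_if, 2))  # 0.93
```

Under that assumption the sketch agrees with both values quoted on the slide.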

Phylogeny Reconstruction Slide 27 of 42

CTST Results, part 2

• Use the correlation statistical measurement to determine the relationship between the remaining data and the distance data created by the tree.

species pair     data  distance
mouse chimp      1.44  6
mouse gorilla    1.45  5
mouse human      1.46  6
mouse orang      1.48  4
mouse gibbon     1.52  3
bovine gorilla   1.52  6
bovine human     1.59  7
bovine chimp     1.60  7
bovine orang     1.66  5
bovine mouse     1.67  3
bovine gibbon    1.72  4

Correlation: -0.24 (weak negative relationship)

Phylogeny Reconstruction Slide 28 of 42

CTST Conclusions

• The relationship is statistically significant for the lower data values, which correspond to species close together on the resulting phylogeny, but it is weak for the larger data values.

• There are stronger methods of phylogeny reconstruction, but this serves as a good starting point.

Phylogeny Reconstruction Slide 29 of 42

Closer Relative Three-Species Test (CRTST)

• Returns one of two possible trees on three species.

• Use the Merge Partial Evolution Trees [Li99] algorithm to reconstruct phylogeny.

• Allows for multiple species evolution.

[Figure: the two possible trees on species x, y, z. Type 1: single source. Type 2: split source.]

Phylogeny Reconstruction Slide 30 of 42

Results from CRTST data

[Figure: eight three-species trees returned by the test, with leaves as drawn]
T1: a b d
T2: d c a
T3: a d f
T4: e g h
T5: e f h
T6: e h b
T7: e f c
T8: f g h

Phylogeny Reconstruction Slide 31 of 42

Reconstruction from CRTST

Step 1: Merge T1 with T2

In both trees, species a and d share a common ancestor. Species b shares an ancestor with a, and c shares an ancestor with d. LCA(a,d) is further up in the tree than LCA(a,b) and LCA(c,d).

[Figure: T1 (a b d) + T2 (d c a) = T12 (a b c d)]

Step 2: Merge T12 with T3

In T3, species a and f share a common ancestor higher in the tree than a and d. LCA(a,f) is further up in the tree than LCA(a,d).

[Figure: T12 (a b c d) + T3 (a d f) = T123 (a b c d f)]

Phylogeny Reconstruction Slide 32 of 42

Reconstruction from CRTST

Step 3: Merge T123 with T4

There are no intersections between the leaves in T123 and T4. Thus there is no evidence indicating these species came from the same ancestor. At this point, they cannot be merged. T4 is set aside and will be visited again.

[Figure: T123 (a b c d f) and T4 (e g h), not merged]

Step 4: Merge T123 with T5

In T5, species e and h share a common ancestor higher in the tree than e and f. LCA(e,h) is further up in the tree than LCA(e,f).

[Figure: T123 (a b c d f) + T5 (e f h) = T1235 (a b c d e f h)]

Phylogeny Reconstruction Slide 33 of 42

Reconstruction from CRTST

Step 5: Merge T1235 with T6

In T6, species e and b share a common ancestor higher in the tree than e and h. LCA(e,b) is further up in the tree than LCA(e,h). Since the connection of b, e, and h was already determined in the last step, there is no change to the tree.

[Figure: T1235 (a b c d e f h) + T6 (e h b) = T12356 (a b c d e f h)]

Step 6: Merge T12356 with T7

In T7, species e and c share a common ancestor higher in the tree than e and f. LCA(e,c) is further up in the tree than LCA(e,f). Since the connection of c, e, and f was already determined in the last step, there is no change to the tree.

[Figure: T12356 (a b c d e f h) + T7 (e f c) = T123567 (a b c d e f h)]

Phylogeny Reconstruction Slide 34 of 42

Reconstruction from CRTST

Step 7: Merge T123567 with T8

Species f, g, and h all share the same common ancestor. Since the connection of f and h was already determined in the last step, g is added as a child to LCA(f,h).

[Figure: T123567 (a b c d e f h) + T8 (f g h) = T1235678 (a b c d e f g h)]

Step 8: Merge T1235678 with T4

Species e, g, and h all share the same common ancestor. Since the connection of e, g, and h has already been determined, there is no change to the tree.

[Figure: T1235678 (a b c d e f g h) + T4 (e g h) = T12345678 (a b c d e f g h)]
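The order in which the partial trees are merged in Steps 1 through 8 can be sketched as follows. This is not the Merge Partial Evolution Trees algorithm of [Li99]. It tracks only the leaf sets and the rule from Step 3 that a tree sharing no leaves with the merged result is set aside and revisited later. The function name `merge_order` and the queue-based bookkeeping are assumptions made for illustration.

```python
from collections import deque

# Leaf sets of the eight partial trees from slide 30.
partials = {
    "T1": "abd", "T2": "dca", "T3": "adf", "T4": "egh",
    "T5": "efh", "T6": "ehb", "T7": "efc", "T8": "fgh",
}


def merge_order(partials):
    """Return the merge order and the final leaf set (tree shape is not tracked)."""
    queue = deque(partials.items())
    name, leaves = queue.popleft()
    merged, order = set(leaves), [name]
    stalled = 0  # consecutive trees set aside without any merge
    while queue and stalled < len(queue):
        name, leaves = queue.popleft()
        if merged & set(leaves):
            # A shared leaf gives evidence of common ancestry, so merge.
            merged |= set(leaves)
            order.append(name)
            stalled = 0
        else:
            # No shared leaves: set the tree aside and revisit it later.
            queue.append((name, leaves))
            stalled += 1
    return order, sorted(merged)


order, leaves = merge_order(partials)
print(order)   # ['T1', 'T2', 'T3', 'T5', 'T6', 'T7', 'T8', 'T4']
print(leaves)  # ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h']
```

T4 is deferred at Step 3 and merged last, matching the walk-through. Deciding where each new leaf attaches requires the LCA comparisons described in the steps and is outside this sketch.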

Phylogeny Reconstruction Slide 35 of 42

Literature Review of Related Methods

• Additive and Ultrametric Trees [Wu04]

• Minimum Increment Evolution Tree (MIET) [Wu04]

• Evolutionary Tree Insertion with Minimum Increment (ETIMI) [Wu04]

• Maximum Homeomorphic Agreement Subtree (MHT) [Ga97]

• Maximum Agreement Subtree (MAST) [Ga97]

• Maximum Inferred Consensus Tree (MICT) [Li99]

• Maximum Inferred Local Consensus Tree (MILCT) [Li99]

• Balanced Randomized Tree Splitting (BRTS) [Ka99]

• Merging Partial Evolution Trees (MPET) [Li99]

Phylogeny Reconstruction Slide 36 of 42

Atlas of Distance Graphs

• Inspired by An Atlas of Graphs [Re99]

• Elegant yet simple way to analyze graphs and trees

• Apply same style to phylogenies and distance graphs.

Phylogeny Reconstruction Slide 37 of 42

Atlas of Distance Graphs

[Figure: distance graphs for trees (12), (122), and (1222) at k = 2, k = 3, and k = 4, annotated with leaf groupings such as ab; abc; abcd; and abc, cd]

Phylogeny Reconstruction Slide 38 of 42

Atlas of Distance Graphs

[Figure: distance graphs for tree (124) and for trees on leaves a through e at k = 2 through k = 5, annotated with leaf groupings such as ab; abcd; abcde; ab, cd; abc, cd, ef; and abcd, cde]

Phylogeny Reconstruction Slide 39 of 42

Distance Graph Simulator

[Figure: simulator display of a distance graph on nodes a through i, stepping through k = 2, 3, 4, 5, 6, 7, and 8 until the graph is complete]

Phylogeny Reconstruction Slide 40 of 42

Class Assignments

• Assignment 1 – Drawing Trees
• Assignment 2 – Phylogenetic Distance Graphs
• Assignment 3 – Phylogeny Reconstruction

(Tested on CS10051 students, Spring 2006)

Phylogeny Reconstruction Slide 41 of 42

Future Work

• Additional bioinformatics class assignments
• Atlas of Phylogenetic Distance Graphs
• Implement the Phylogeny Reconstruction Simulator using NetworkX
• Remove redundant node and isomorphic approximation assumptions
• Apply to all nodes in tree instead of just the leaves

Phylogeny Reconstruction Slide 42 of 42

References

[Br05a] A. Brandstädt, V. B. Le, and R. Sritharan (2005). “Structure and Linear Time Recognition of 4-Leaf Powers”, Unpublished manuscript.

[Br05b] A. Brandstädt and V. B. Le (2005). “Structure and Linear Time Recognition of 3-Leaf Powers”, Unpublished manuscript.

[Fe04] J. Felsenstein (2004). Inferring Phylogenies, Sinauer Associates, Inc.

[Ga97] L. Gąsieniec, J. Jansson, A. Lingas, and A. Östlin (1997). “On the complexity of computing evolutionary trees,” Proceedings of the Third Annual International Conference on Computing and Combinatorics (COCOON ’97), Shanghai, China, pp. 134 to 145, August 1997.

[Ka99] M.-Y. Kao, A. Lingas, and A. Östlin (1999). “Balanced Randomized Tree Splitting with Applications to Evolutionary Tree Constructions,” Proceedings of the 16th Annual Symposium on Theoretical Aspects of Computer Science, Trier, Germany, pp. 184 to 196, March 1999.

[Li99] A. Lingas, H. Olsson, and A. Östlin (1999). “Efficient Merging, Construction, and Maintenance of Evolutionary Trees,” Proceedings of the 26th International Colloquium on Automata, Languages, and Programming (ICALP ’99), Prague, Czech Republic, pp. 544 to 553, July 1999.

[Re99] R. C. Read and R. J. Wilson (1999). An Atlas of Graphs, Oxford Science Publications.

[Wu04] B. Y. Wu and K. M. Chao (2004). Spanning Trees and Optimization Problems, Chapman & Hall/CRC.