ab 10 phylo - biotec · by michael schroeder, biotec 21 unrooted and rooted trees rooted tree with...

Post on 02-Jun-2020

Page 1

Michael Schroeder, Biotechnology Center, TU Dresden

Phylogenetic tree

Page 2

Phylogenetic trees

•  Motivation

•  Rooted and unrooted trees

•  Rooted trees: Hierarchical clustering

•  Drawing trees

•  Unrooted trees: Neighbour joining

Page 3

Origin of mitochondria in eukaryotes? Sequence comparison (BLAST) of 601 mitochondrial yeast genes to bacteria and archaea

Page 4

Origin of mitochondria in eukaryotes? Sequence comparison (BLAST) of 601 mitochondrial yeast genes to bacteria and archaea

[Figure: panels labelled Bacteria and Archaea. Source: Horiike et al., Nat Cell Biol, 2001; adapted from Campbell and Heyer, Discovering genomics, proteomics, bioinformatics.]

Page 5

Darwin’s Tree of Life

Page 6

Tree of Life with 2.3 million species

opentreeoflife.org

Page 7

Phylogeny

§  Taxonomists classify and group organisms

§  Aristotle, De Partibus Animalium:
   §  “…discuss each separate species – man, lion, ox, …
   §  or … deal first with the attributes which they have in common…”

Page 8

Schools of Taxonomists

§  Goal: create a taxonomy
§  Approach:
   §  Phenotype
   §  Phylogeny
§  3 schools:
   §  Phenotype only
   §  Evolutionary taxonomists: phenotype (+ phylogeny)
   §  Cladists: phylogeny (+ phenotype)

Page 9

West Nile virus in New York

Page 10

When did Homo sapiens leave Africa?

§  Recent-Africa hypothesis: hundred(s) of thousands of years ago
§  Multi-regional hypothesis: million(s) of years ago

Page 11

§  53 humans
§  Outgroup: chimpanzee

Page 12

Clustal W: over 50 000 citations

Page 13

Thompson, NAR, 1994

ClustalW uses phylogenetic trees as guide trees for multiple sequence alignment

Page 14

Phylogenetic trees

•  Motivation

•  Rooted and unrooted trees

•  Rooted trees: Hierarchical clustering

•  Drawing trees

•  Unrooted trees: Neighbour joining

Page 15

Topixgallery.com

Page 16

Bifurcating Trees

[Figure: rooted tree with leaves A, B, C, D]

§  Edge or branch
§  Ancestral node (root)
§  Internal node (hypothetical ancestor)
§  Terminal node (leaf): genes, proteins, populations, species, ...
§  Bifurcating = each internal node has two descendants
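One way to make these terms concrete is a small Python sketch that stores a bifurcating tree as nested pairs. A leaf is a label string, and an internal node is a 2-tuple of subtrees. The helper names are hypothetical and chosen for illustration. The counts can be compared with the relation stated later in the deck: a rooted tree with m leaves has m-1 internal nodes and 2m-2 edges.

```python
def count_leaves(tree):
    """Terminal nodes: a leaf is a plain string label."""
    if isinstance(tree, str):
        return 1
    left, right = tree  # bifurcating: exactly two children
    return count_leaves(left) + count_leaves(right)


def count_internal(tree):
    """Internal nodes (hypothetical ancestors), including the root."""
    if isinstance(tree, str):
        return 0
    left, right = tree
    return 1 + count_internal(left) + count_internal(right)


def count_edges(tree):
    """Every node except the root hangs below exactly one edge."""
    return count_leaves(tree) + count_internal(tree) - 1


def to_newick(tree):
    """Parenthesised notation as used on the slides, without branch lengths."""
    if isinstance(tree, str):
        return tree
    left, right = tree
    return f"({to_newick(left)},{to_newick(right)})"


tree = (("A", "B"), ("C", "D"))
print(to_newick(tree), count_leaves(tree), count_internal(tree), count_edges(tree))
# ((A,B),(C,D)) 4 3 6
```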

Page 17

Unrooted and Rooted Trees

The principal uses of these numbers will be ... to frighten taxonomists.

Page 18

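Unrooted and Rooted Trees

[Figure: the three rooted trees on leaves A, B, C, namely ((A,B),C), ((A,C),B) and ((B,C),A), and the single unrooted tree on A, B, C]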

Page 19

Unrooted and Rooted Trees

[Figure: unrooted trees and all 15 rooted trees with leaves A, B, C, D]
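The small cases can be checked by brute force. For three leaves there are 3 rooted trees and 1 unrooted tree, and for four leaves there are 15 rooted trees. The Python sketch below uses hypothetical helper names. It builds rooted trees by attaching each new leaf either above the root or onto the edge above any existing node. It represents an unrooted tree by rooting it on the pendant edge of its first leaf. Under that assumption, unrooted trees on n leaves correspond one-to-one to rooted trees on the other n-1 leaves.

```python
def add_leaf(tree, leaf):
    """Yield every tree obtained by attaching `leaf` on the edge above the
    root or above any node of `tree` (2m - 1 positions for m leaves)."""
    yield (tree, leaf)  # new node on the edge above this (sub)tree
    if isinstance(tree, tuple):
        left, right = tree
        for new_left in add_leaf(left, leaf):
            yield (new_left, right)
        for new_right in add_leaf(right, leaf):
            yield (left, new_right)


def rooted_trees(leaves):
    """All rooted bifurcating trees on at least two labelled leaves."""
    trees = [(leaves[0], leaves[1])]
    for leaf in leaves[2:]:
        trees = [bigger for tree in trees for bigger in add_leaf(tree, leaf)]
    return trees


def unrooted_trees(leaves):
    """All unrooted trees on at least three leaves, each rooted on the
    pendant edge of the first leaf so it has a unique representation."""
    return [(leaves[0], tree) for tree in rooted_trees(leaves[1:])]


def canonical(tree):
    """Order-independent form: mirror-image drawings compare equal."""
    if isinstance(tree, str):
        return tree
    return frozenset(canonical(child) for child in tree)


for n in (3, 4, 5):
    leaves = "ABCDE"[:n]
    rooted, unrooted = rooted_trees(leaves), unrooted_trees(leaves)
    # Sanity check: no tree is generated twice.
    assert len({canonical(t) for t in rooted}) == len(rooted)
    print(n, len(rooted), len(unrooted))
# 3 3 1
# 4 15 3
# 5 105 15
```

Each step multiplies the count by the number of attachment positions. This is the same argument the deck uses for the general formula.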

Page 20

Unrooted and Rooted Trees

8,200,794,532,637,891,559,375 rooted trees for 20 leaves (equivalently, unrooted trees for 21 leaves)!

To get a feeling: 8,200,794,532,637,891,559,375 ms (about 8.2 × 10^18 s) is roughly 20 times the age of the universe.

Felsenstein, 1978

Page 21

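Unrooted and Rooted Trees

§  A rooted bifurcating tree with m leaves has m-1 internal nodes and 2m-2 edges.
§  An unrooted bifurcating tree with m leaves has m-2 internal nodes and 2m-3 edges.
§  Let $T_{\text{unroot}}(m)$ be the number of unrooted trees with m leaves.
§  Given an unrooted tree with m leaves, an extra leaf can be attached to any of its 2m-3 edges, giving a tree with m+1 leaves. Hence $T_{\text{unroot}}(m+1) = (2m-3)\,T_{\text{unroot}}(m)$.
§  Starting from $T_{\text{unroot}}(3) = 1$, this is satisfied by $T_{\text{unroot}}(m) = (2m-5)!!$.
§  Double factorial: a factorial that leaves out every other number, e.g. $7!! = 7 \cdot 5 \cdot 3 \cdot 1 = 105$.
§  Placing the root on any of the 2m-3 edges of an unrooted tree gives $T_{\text{root}}(m) = (2m-3)!! = T_{\text{unroot}}(m+1)$ rooted trees.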

Felsenstein, 1978
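The recurrence and the closed forms are easy to check numerically. This sketch uses illustrative function names. It also evaluates the 20-leaf figure from the previous slide: 8,200,794,532,637,891,559,375 equals (2·20-3)!! = 37!!. That is the number of rooted trees with 20 leaves, and also the number of unrooted trees with 21 leaves. The age of the universe (13.8 billion years) is an assumption used only for the rough comparison.

```python
def double_factorial(n):
    """n!! = n * (n - 2) * (n - 4) * ...; by convention (-1)!! = 0!! = 1."""
    result = 1
    while n > 1:
        result *= n
        n -= 2
    return result


def n_unrooted(m):
    """Unrooted bifurcating trees with m >= 3 labelled leaves: (2m - 5)!!."""
    return double_factorial(2 * m - 5)


def n_rooted(m):
    """Rooted trees with m >= 2 leaves: any of the 2m - 3 edges of an
    unrooted tree can carry the root, giving (2m - 3)!!."""
    return double_factorial(2 * m - 3)


# The recurrence T(m+1) = (2m - 3) * T(m) holds for the closed form.
for m in range(3, 30):
    assert n_unrooted(m + 1) == (2 * m - 3) * n_unrooted(m)

print(n_unrooted(20))  # 221643095476699771875
print(n_rooted(20))    # 8200794532637891559375

# Rough scale: one tree per millisecond vs. an assumed 13.8-billion-year universe.
universe_ms = 13.8e9 * 365.25 * 24 * 3600 * 1000
print(n_rooted(20) / universe_ms)  # roughly 19
```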

Page 22

Consequence:

Algorithms that generate all trees, score each one, and pick the best are infeasible for all but a few sequences, because there are far too many trees.

Alternatives:

§  Hierarchical clustering
§  Neighbour joining

Page 23

Phylogenetic trees

•  Motivation

•  Rooted and unrooted trees

•  Rooted trees: Hierarchical clustering

•  Drawing trees

•  Unrooted trees: Neighbour joining

Page 24

Hierarchical clustering

§  Input: pairwise distances between sequences
§  Output: a tree of clusters of sequences

     A   B   C   D   E
A        2   6  10   9
B            5   9   8
C                4   5
D                    3
E

[Figure: dendrogram over leaves A, B, C, D, E]

Page 25

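Hierarchical clustering

     A   B   C   D   E
A        2   6  10   9
B            5   9   8
C                4   5
D                    3
E

The closest pair is A, B (distance 2). They are merged, and the distance from (A,B) to every other sequence is the smaller of its distances to A and to B (single linkage), e.g. D((A,B),C) = min(6, 5) = 5:

         (A,B)   C   D   E
(A,B)            5   9   8
C                    4   5
D                        3
E

[Figure: A and B joined]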

Page 26

Hierarchical clustering

         (A,B)   C   D   E
(A,B)            5   9   8
C                    4   5
D                        3
E

         (A,B)   C   (D,E)
(A,B)            5     8
C                      4
(D,E)

[Figure: A, B joined; D, E joined]

Page 27

Hierarchical clustering

         (A,B)   C   (D,E)
(A,B)            5     8
C                      4
(D,E)

            (A,B)   (C,(D,E))
(A,B)                   5
(C,(D,E))

[Figure: A, B joined; C joined to (D,E)]

Page 28

Hierarchical clustering

            (A,B)   (C,(D,E))
(A,B)                   5
(C,(D,E))

Final tree: ((A,B),(C,(D,E)))

[Figure: dendrogram of ((A,B),(C,(D,E))) over leaves A, B, C, D, E]

Page 29

Algorithm

const m   number of original sequences
var   U   set of current trees; initially one tree per original sequence
      D   distances between the trees in U
begin
   U = the set of one-node trees, one for each original sequence
   while |U| > 1 do
      (u, v) = the roots of the two trees in U with the least distance in D
      make a new tree with root w and with u and v as children
      calculate the lengths of the edges (u, w) and (v, w)
      for each root x of the trees in U - {u, v} do
         D(x, w) = the distance between x and the new node w
      end
      U = (U - {u, v}) ∪ {w}      (update U)
   end
end
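A minimal Python sketch of the pseudocode follows. It assumes distances are stored in a dict keyed by unordered pairs (`frozenset`) and that each tree is represented by its Newick-style string. The `linkage` parameter is an illustrative name for the rule that computes D(x, w). Passing `min` gives single linkage, which reproduces the worked example above. The edge-length step is left out of this sketch.

```python
from itertools import combinations


def cluster(names, dist, linkage=min):
    """Agglomerative clustering following the pseudocode.

    names   -- leaf labels (the original sequences)
    dist    -- dict mapping frozenset({x, y}) -> distance
    linkage -- combines D(x,u) and D(x,v) into D(x,w); min = single linkage
    Returns the final tree as a Newick-style string (edge lengths omitted).
    """
    D = dict(dist)
    U = list(names)  # roots of the current trees
    while len(U) > 1:
        # (u, v) = the two roots with the least distance in D
        u, v = min(combinations(U, 2), key=lambda p: D[frozenset(p)])
        w = f"({u},{v})"  # new root w with children u and v
        for x in U:
            if x not in (u, v):
                D[frozenset((x, w))] = linkage(D[frozenset((x, u))],
                                               D[frozenset((x, v))])
        U = [x for x in U if x not in (u, v)] + [w]  # update U
    return U[0]


names = ["A", "B", "C", "D", "E"]
rows = [[2, 6, 10, 9], [5, 9, 8], [4, 5], [3]]  # upper triangle of the matrix
dist = {frozenset((names[i], names[j])): d
        for i, row in enumerate(rows)
        for j, d in enumerate(row, start=i + 1)}

print(cluster(names, dist))  # ((A,B),(C,(D,E)))
```

Representing a cluster by its string keeps the sketch short. A real implementation would use node objects and record the edge lengths as well.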

Page 30

Hierarchical Clustering

Distance to the new cluster w = (u,v):

§  Single linkage:
   §  $D(x,w) = \min\{D(x,u), D(x,v)\}$
   §  Example: distance from (A,B) to C is 1
§  Complete linkage:
   §  $D(x,w) = \max\{D(x,u), D(x,v)\}$
   §  Example: distance from (A,B) to C is 2
§  Average linkage (WPGMA, weighted pair group method with arithmetic mean):
   §  $D(x,w) = \big(D(x,u) + D(x,v)\big)/2$
   §  Example: distance from (A,B) to C is 1.5
§  More general (UPGMA, unweighted pair group method using arithmetic mean):
   §  $D(x,w) = \big(m_u D(x,u) + m_v D(x,v)\big)/(m_u + m_v)$
   §  $m_u$ is the number of leaves (sequences) in subtree u

Question: What’s the difference between UPGMA and WPGMA?

Note: the names refer to the weight each original sequence receives. UPGMA weights D(x,u) and D(x,v) by subtree size, so every sequence counts equally (“unweighted”). WPGMA gives both subtrees equal weight, so when u and v have different numbers of leaves, the sequences in the smaller subtree are weighted more (“weighted”).

Distances for the examples:

     C   B   A
C
B    1
A    2   1
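The four update rules can be written as small Python functions with hypothetical names. Each takes D(x,u), D(x,v) and the leaf counts m_u and m_v. Applied to the example distances D(C,A) = 2 and D(C,B) = 1, with A and B each a single sequence, they give the values listed on the slide.

```python
def single_linkage(d_xu, d_xv, m_u, m_v):
    """Distance to the closer of the two merged clusters."""
    return min(d_xu, d_xv)


def complete_linkage(d_xu, d_xv, m_u, m_v):
    """Distance to the farther of the two merged clusters."""
    return max(d_xu, d_xv)


def wpgma(d_xu, d_xv, m_u, m_v):
    """Plain mean of the two distances; subtree sizes are ignored."""
    return (d_xu + d_xv) / 2


def upgma(d_xu, d_xv, m_u, m_v):
    """Mean weighted by the number of leaves in each merged subtree."""
    return (m_u * d_xu + m_v * d_xv) / (m_u + m_v)


# Distance from the new cluster (A,B) to C, with D(C,A) = 2 and D(C,B) = 1.
for rule in (single_linkage, complete_linkage, wpgma, upgma):
    print(rule.__name__, rule(2, 1, 1, 1))
# single_linkage 1
# complete_linkage 2
# wpgma 1.5
# upgma 1.5
```

With single-sequence clusters (m_u = m_v = 1), UPGMA and WPGMA coincide. They only diverge once the merged subtrees differ in size.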

Page 31

Are WPGMA and UPGMA the same?

§  Subtree D has 1000 leaves ($m_D = 1000$)
§  Subtree E has 1 leaf ($m_E = 1$)
§  Distance from (D,E) to F is
   §  $(2 + 98)/2 = 50$ for WPGMA
   §  $(1000 \cdot 2 + 1 \cdot 98)/(1000 + 1) \approx 2.10$ for UPGMA

Distances for the example:

     F    E    D
F
E    98
D    2    1
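The same arithmetic in Python uses the slide's numbers: D(F,D) = 2, D(F,E) = 98, m_D = 1000 and m_E = 1. The function names are hypothetical. The last check adds a simplifying assumption for illustration only: each of D's 1000 leaves is itself at distance 2 from F. The UPGMA value then equals the plain mean over all 1001 leaf distances, so every sequence counts once. This is where the "unweighted" in the name comes from.

```python
def wpgma(d_xu, d_xv):
    """Both subtrees count equally, whatever their size."""
    return (d_xu + d_xv) / 2


def upgma(d_xu, d_xv, m_u, m_v):
    """Each subtree counts in proportion to its number of leaves."""
    return (m_u * d_xu + m_v * d_xv) / (m_u + m_v)


d_fd, d_fe = 2, 98  # distances from F to subtrees D and E
m_d, m_e = 1000, 1  # number of leaves in D and E

print(wpgma(d_fd, d_fe))                       # 50.0
print(round(upgma(d_fd, d_fe, m_d, m_e), 3))   # 2.096

# Illustration (assumed leaf distances): if all 1000 leaves of D are at
# distance 2 from F, UPGMA equals the plain mean over every leaf.
leaf_distances = [d_fd] * m_d + [d_fe] * m_e
mean_over_leaves = sum(leaf_distances) / len(leaf_distances)
print(abs(mean_over_leaves - upgma(d_fd, d_fe, m_d, m_e)) < 1e-12)  # True
```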

Page 32

UPGMA (unweighted pair group method using arithmetic mean)

     A   B   C   D   E
A        3   7   8  10
B            6   8   7
C                4   5
D                    6
E

         (A,B)    C     D     E
(A,B)            6.5    8    8.5
C                       4     5
D                             6
E

         (A,B)   (C,D)    E
(A,B)            7.25    8.5
(C,D)                    5.5
E

            (A,B)   ((C,D),E)
(A,B)                 7.67
((C,D),E)

UPGMA: $(2 \cdot 7.25 + 1 \cdot 8.5)/3 \approx 7.67$; WPGMA: $(7.25 + 8.5)/2 = 7.875$
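The full UPGMA run on this matrix can be sketched in Python, using the same dict-of-pairs representation as a plain illustration. The function name `pair_group` is hypothetical. Switching the update to the unweighted mean of the two subtrees gives WPGMA. The two runs differ only in the third update, where the merged subtrees (C,D) and E have different sizes.

```python
from itertools import combinations


def pair_group(names, dist, weighted=False):
    """Agglomerative clustering with UPGMA (default) or WPGMA updates.

    dist maps frozenset({x, y}) -> distance. Returns the merges as
    (new cluster, {other root: distance to the new cluster}) and the final tree.
    """
    D = dict(dist)
    size = {n: 1 for n in names}  # number of leaves below each root
    U = list(names)
    steps = []
    while len(U) > 1:
        u, v = min(combinations(U, 2), key=lambda p: D[frozenset(p)])
        w = f"({u},{v})"
        size[w] = size[u] + size[v]
        updated = {}
        for x in U:
            if x in (u, v):
                continue
            d_xu, d_xv = D[frozenset((x, u))], D[frozenset((x, v))]
            if weighted:  # WPGMA: both subtrees count equally
                d = (d_xu + d_xv) / 2
            else:         # UPGMA: weight by number of leaves
                d = (size[u] * d_xu + size[v] * d_xv) / (size[u] + size[v])
            D[frozenset((x, w))] = updated[x] = d
        U = [x for x in U if x not in (u, v)] + [w]
        steps.append((w, updated))
    return steps, U[0]


names = ["A", "B", "C", "D", "E"]
rows = [[3, 7, 8, 10], [6, 8, 7], [4, 5], [6]]  # upper triangle of the matrix
dist = {frozenset((names[i], names[j])): d
        for i, row in enumerate(rows)
        for j, d in enumerate(row, start=i + 1)}

for weighted, label in ((False, "UPGMA"), (True, "WPGMA")):
    steps, _ = pair_group(names, dist, weighted)
    # The third merge joins E with (C,D); report its distance to (A,B).
    print(label, round(steps[2][1]["(A,B)"], 3))
# UPGMA 7.667
# WPGMA 7.875
```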

Page 33

Does the linkage method change the tree?

     A   B   C   D
A        1   2   5
B            4   5
C                3
D

[Figure: two dendrograms over leaves A, B, C, D]
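The slide's question can be answered by running a self-contained agglomerative sketch on this 4 × 4 matrix, once with single linkage (`min`) and once with complete linkage (`max`). This is a hypothetical illustration, and trees are again Newick-style strings. The two rules produce different topologies.

```python
from itertools import combinations


def cluster(names, dist, linkage):
    """Agglomerative clustering; `linkage` combines D(x,u) and D(x,v)."""
    D = dict(dist)
    U = list(names)
    while len(U) > 1:
        u, v = min(combinations(U, 2), key=lambda p: D[frozenset(p)])
        w = f"({u},{v})"
        for x in U:
            if x not in (u, v):
                D[frozenset((x, w))] = linkage(D[frozenset((x, u))],
                                               D[frozenset((x, v))])
        U = [x for x in U if x not in (u, v)] + [w]
    return U[0]


names = ["A", "B", "C", "D"]
rows = [[1, 2, 5], [4, 5], [3]]  # upper triangle of the matrix above
dist = {frozenset((names[i], names[j])): d
        for i, row in enumerate(rows)
        for j, d in enumerate(row, start=i + 1)}

print("single  ", cluster(names, dist, min))  # (D,(C,(A,B)))
print("complete", cluster(names, dist, max))  # ((A,B),(C,D))
```

After A and B merge, single linkage puts C at distance min(2, 4) = 2 from (A,B), so C joins that cluster next. Complete linkage puts C at max(2, 4) = 4, so the pair C, D (distance 3) is merged first.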

Page 34

Summary

§  Applications of phylogenetic trees
   §  Clustal W
§  Hierarchical clustering
   §  Linkage methods