

Upload: lorena-peters

Post on 31-Dec-2015


23/4/19 1

Phylogenetic Tree

23/4/19 2

Evolution

Evolution of organisms is driven by diversity: different individuals carry different variants of the same basic blueprint.

Mutations: the DNA sequence can be changed by single-base changes, deletion/insertion of DNA segments, etc.

23/4/19 3

Basic Assumptions

Closer related organisms have more similar genomes.

Highly similar genes are homologous (have the same ancestor).

A universal ancestor exists for all life forms.

Phylogenetic relationships can be expressed by a dendrogram (a “tree”).

23/4/19 4

phylogenetic tree

A phylogenetic tree is a tree that describes the sequence of speciation events that led to the formation of a set of present-day species.

23/4/19 5

Common Phylogenetic Tree Terminology

[Figure: a rooted tree with terminal nodes A, B, C, D and E, labelling the ancestral node (root) of the tree, the internal nodes, the branches (lineages) and the terminal nodes.]

23/4/19 6

Phylogenetic trees diagram the evolutionary relationships between the taxa

((A,(B,C)),(D,E)) = The above phylogeny as nested parentheses

[Figure: the same tree drawn with Taxon A to Taxon E at its tips.]

There is no meaning to the spacing between the taxa, or to the order in which they appear from top to bottom.

This dimension either can have no scale, can be proportional to genetic distance or amount of change (for ‘phylograms’ or ‘additive trees’), or can be proportional to time.

The tree says that B and C are more closely related to each other than either is to A, and that A, B and C form a clade that is a sister group to the clade composed of D and E. If the tree has a time scale, then D and E are the most closely related.
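The nested-parentheses notation is machine-readable. Below is a minimal sketch of how the clades can be read off it. It assumes the bare form used here: leaf names, commas and parentheses only, with no branch lengths or internal labels. The helper names parse_newick and clades are our own.

```python
def parse_newick(text):
    """Parse a Newick string without branch lengths into nested tuples.

    Leaves become strings; every pair of parentheses becomes a tuple.
    """
    s = text.strip().rstrip(";")
    pos = 0

    def node():
        nonlocal pos
        if s[pos] == "(":
            pos += 1                      # consume "("
            children = [node()]
            while s[pos] == ",":
                pos += 1                  # consume ","
                children.append(node())
            pos += 1                      # consume ")"
            return tuple(children)
        start = pos                       # a leaf name runs until , ( or )
        while pos < len(s) and s[pos] not in ",()":
            pos += 1
        return s[start:pos].strip()

    return node()


def clades(tree):
    """Return the set of taxa below each internal node; each set is a clade."""
    found = []

    def leaves(t):
        if isinstance(t, str):
            return frozenset([t])
        below = frozenset().union(*(leaves(child) for child in t))
        found.append(below)
        return below

    leaves(tree)
    return found


tree = parse_newick("((A,(B,C)),(D,E))")
print(tree)  # (('A', ('B', 'C')), ('D', 'E'))
for clade in clades(tree):
    print(sorted(clade))
```

The clades printed are {B, C}, {A, B, C}, {D, E} and the whole tree. These are exactly the groupings described in the paragraph above.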

23/4/19 7

Historical Note

Until the mid-1950s, phylogenies were constructed by experts based on their opinion (subjective criteria).

Since then, the focus has been on objective criteria for constructing phylogenetic trees, with thousands of articles in recent decades.

Phylogenies are important for many aspects of biology: classification, and understanding biological mechanisms.

23/4/19 8

Morphological vs. Molecular

Classical phylogenetic analysis uses morphological features: number of legs, lengths of legs, etc.

Modern biological methods allow the use of molecular features: gene sequences and protein sequences.

The analysis is based on homologous sequences in different species.

23/4/19 9

Morphological topology

[Figure: mammalian tree based on morphology (based on McKenna and Bell, 1997). Tips: Bonobo, Chimpanzee, Man, Gorilla, Sumatran orangutan, Bornean orangutan, Common gibbon, Barbary ape, Baboon, White-fronted capuchin, Slow loris, Tree shrew, Japanese pipistrelle, Long-tailed bat, Jamaican fruit-eating bat, Horseshoe bat, Little red flying fox, Ryukyu flying fox, Mouse, Rat, Vole, Cane-rat, Guinea pig, Squirrel, Dormouse, Rabbit, Pika, Pig, Hippopotamus, Sheep, Cow, Alpaca, Blue whale, Fin whale, Sperm whale, Donkey, Horse, Indian rhino, White rhino, Elephant, Aardvark, Grey seal, Harbor seal, Dog, Cat, Asiatic shrew, Long-clawed shrew, Small Madagascar hedgehog, Hedgehog, Gymnure, Mole, Armadillo, Bandicoot, Wallaroo, Opossum, Platypus. Groups: Archonta, Glires, Ungulata, Carnivora, Insectivora, Xenarthra.]

23/4/19 10

Rat QEPGGLVVPPTDA

Rabbit QEPGGMVVPPTDA

Gorilla QEPGGLVVPPTDA

Cat REPGGLVVPPTEG

From sequences to a phylogenetic tree

There are many possible types of sequences to use.

23/4/19 11

Mitochondrial topology (based on Pupko et al.)

[Figure: the same mammals arranged according to mitochondrial sequence data, falling into the groups Perissodactyla, Carnivora, Cetartiodactyla, Rodentia 1, Hedgehogs, Rodentia 2, Primates, Chiroptera, Moles + Shrews, Afrotheria, Xenarthra, and Lagomorpha + Scandentia.]

23/4/19 12

What can we get from phylogenetic trees?

A few examples of what can be inferred from phylogenetic trees built from DNA or protein sequence data:

Which species are the closest living relatives of modern humans?

Did the infamous Florida dentist infect his patients with HIV?

23/4/19 13

Which species are the closest living relatives of modern humans?

Mitochondrial DNA, most nuclear DNA-encoded genes, and DNA/DNA hybridization all show that bonobos and chimpanzees are more closely related to humans than either is to gorillas.

[Figure: tree of chimpanzees, bonobos, humans, gorillas and orangutans, with a time axis in millions of years ago (MYA) running from 0 to 14.]

23/4/19 14

Did the Florida Dentist infect his patients with HIV?

Phylogenetic tree of HIV sequences from the DENTIST, his patients, and local HIV-infected people:

[Figure: tree containing two DENTIST sequences, Patients A to G, and several local controls. Branches outside the dentist's clade are labelled “No”.]

Yes: the HIV sequences from these patients fall within the clade of HIV sequences found in the dentist.

23/4/19 15

Types of trees

An unrooted tree represents the same phylogeny without the root node.

23/4/19 16

Rooted versus unrooted trees

[Figure: an unrooted tree of taxa a, b and c, which represents the three rooted trees Tree A, Tree B and Tree C.]

23/4/19 17

Inferring evolutionary relationships between the taxa requires rooting the tree:

To root a tree mentally, imagine that the tree is made of string. Grab the string at the root and tug on it until the ends of the string (the taxa) fall opposite the root.

[Figure: an unrooted tree of taxa A, B, C and D with a root position marked, and the rooted tree obtained by pulling on that point.]

Note that in this rooted tree, taxon A is no more closely related to taxon B than it is to C or D.

23/4/19 18

Now, try it again with the root at another position:

[Figure: the same unrooted tree of A, B, C and D with the root placed on a different branch, and the resulting rooted tree.]

Note that in this rooted tree, taxon A is most closely related to taxon B, and together they are equally distantly related to taxa C and D.

23/4/19 19

An unrooted, four-taxon tree can theoretically be rooted in five different places to produce five different rooted trees.

[Figure: unrooted tree 1, which groups A with B and C with D, and the five rooted trees 1a to 1e obtained by placing the root on each of its five branches (numbered 1 to 5).]

These trees show five different evolutionary relationships among the taxa!

23/4/19 20

[Figure: unrooted trees for four taxa (A to D), five taxa (A to E) and six taxa (A to F); an x marks one possible root position.]

Each unrooted tree theoretically can be rooted anywhere along any of its branches

Number of possible trees for N taxa:

$$\text{unrooted: } \frac{(2N-5)!}{2^{N-3}\,(N-3)!} \qquad \text{rooted: } \frac{(2N-3)!}{2^{N-2}\,(N-2)!}$$
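As a quick check of the two formulas, the sketch below evaluates them for small N. The function names n_unrooted and n_rooted are our own. The sketch also checks that each unrooted tree contributes one rooted tree per branch, since an unrooted tree on N taxa has 2N - 3 branches.

```python
from math import factorial


def n_unrooted(n):
    """Number of distinct unrooted binary trees for n >= 3 labelled taxa."""
    return factorial(2 * n - 5) // (2 ** (n - 3) * factorial(n - 3))


def n_rooted(n):
    """Number of distinct rooted binary trees for n >= 2 labelled taxa."""
    return factorial(2 * n - 3) // (2 ** (n - 2) * factorial(n - 2))


for n in range(3, 11):
    # Each unrooted tree has 2n - 3 branches, and each branch is a possible root.
    assert n_rooted(n) == n_unrooted(n) * (2 * n - 3)
    print(n, n_unrooted(n), n_rooted(n))
```

For four taxa this gives 3 unrooted and 15 rooted trees, which is five rootings of each unrooted tree. By 10 taxa there are already 2,027,025 unrooted trees, which is why exhaustive search quickly becomes impractical.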

23/4/19 21

There are two major ways to root trees:

By outgroup: uses taxa (the “outgroup”) that are known to fall outside the group of interest (the “ingroup”). This requires some prior knowledge about the relationships among the taxa.

By midpoint or distance: roots the tree at the midway point between the two most distant taxa in the tree, as determined by branch lengths. This assumption is built into some of the distance-based tree-building methods.

[Figure: unrooted tree of taxa A, B, C and D with branch lengths 10, 2, 3, 5 and 2, with the outgroup marked.]

d(A,D) = 10 + 3 + 5 = 18, so the midpoint lies 18 / 2 = 9 from A (and from D) along the path between them.
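Below is a minimal sketch of midpoint rooting, assuming the tree is stored as an adjacency dictionary of branch lengths. The slide spells out only d(A,D) = 10 + 3 + 5. Two parts of the example are therefore assumptions, our own reading of the figure:
- the placement of B and C, each on a branch of length 2;
- the internal node names x and y.

```python
from itertools import combinations


def path(tree, start, goal, prev=None):
    """Nodes on the unique path from start to goal in a tree (adjacency dict)."""
    if start == goal:
        return [start]
    for nbr in tree[start]:
        if nbr != prev:
            rest = path(tree, nbr, goal, start)
            if rest:
                return [start] + rest
    return None


def length(tree, nodes):
    """Sum of branch lengths along a list of consecutive nodes."""
    return sum(tree[a][b] for a, b in zip(nodes, nodes[1:]))


def midpoint_root(tree, leaves):
    """Return (u, v, offset): the root lies on branch u-v, `offset` away from u."""
    # 1. Find the two most distant leaves.
    a, b = max(combinations(leaves, 2),
               key=lambda pair: length(tree, path(tree, *pair)))
    nodes = path(tree, a, b)
    half = length(tree, nodes) / 2
    # 2. Walk from a towards b until half of the distance has been covered.
    walked = 0
    for u, v in zip(nodes, nodes[1:]):
        if walked + tree[u][v] >= half:
            return u, v, half - walked
        walked += tree[u][v]


# Hypothetical reading of the slide's tree: A (10) and B (2) hang off node x,
# C (2) and D (5) hang off node y, and the branch x-y has length 3.
edges = [("A", "x", 10), ("B", "x", 2), ("x", "y", 3), ("C", "y", 2), ("D", "y", 5)]
tree = {}
for u, v, w in edges:
    tree.setdefault(u, {})[v] = w
    tree.setdefault(v, {})[u] = w

print(midpoint_root(tree, "ABCD"))  # ('A', 'x', 9.0)
```

The root falls 9 units from A on A's own branch (length 10), that is, 1 unit from the internal node x.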

23/4/19 22

Two Methods of Tree Construction

Distance: a tree that recursively combines the two nodes at the smallest distance.

Parsimony: a tree with the minimum total number of character changes between nodes.

23/4/19 23

Types of data used in phylogenetic inference:

Character-based methods: use the aligned characters, such as DNA or protein sequences, directly during tree inference.

Taxa        Characters
Species A   ATGGCTATTCTTATAGTACG
Species B   ATCGCTAGTCTTATATTACA
Species C   TTCACTAGACCTGTGGTCCA
Species D   TTGACCAGACCTGTGGTCCG
Species E   TTGACCAGTTCTCTAGTTCG

Distance-based methods: Transform the sequence data into pairwise distances (dissimilarities), and then use the matrix during tree building.

            A     B     C     D     E
Species A   ----  0.20  0.50  0.45  0.40
Species B   0.23  ----  0.40  0.55  0.50
Species C   0.87  0.59  ----  0.15  0.40
Species D   0.73  1.12  0.17  ----  0.25
Species E   0.59  0.89  0.61  0.31  ----
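Computing the fraction of differing sites for each pair of the five aligned sequences above reproduces the upper triangle of the matrix (0.20 for A-B, 0.15 for C-D, and so on). The lower-triangle values are all larger, presumably because a correction for multiple substitutions was applied; the slide does not say which one. The sketch assumes ungapped sequences of equal length, and the name p_distance is our own.

```python
from itertools import combinations

# Aligned sequences from the character-based example above.
seqs = {
    "A": "ATGGCTATTCTTATAGTACG",
    "B": "ATCGCTAGTCTTATATTACA",
    "C": "TTCACTAGACCTGTGGTCCA",
    "D": "TTGACCAGACCTGTGGTCCG",
    "E": "TTGACCAGTTCTCTAGTTCG",
}


def p_distance(s1, s2):
    """Fraction of aligned sites at which the two sequences differ."""
    assert len(s1) == len(s2), "sequences must be aligned to the same length"
    return sum(a != b for a, b in zip(s1, s2)) / len(s1)


dist = {(x, y): p_distance(seqs[x], seqs[y]) for x, y in combinations(seqs, 2)}
for (x, y), d in dist.items():
    print(x, y, f"{d:.2f}")
```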

23/4/19 24

Distance-Based Method

Input: a distance matrix between species. For two sequences $s_i$ and $s_j$, perform a pairwise (global) alignment, and let f be the fraction of sites with different residues. Then (Jukes-Cantor model):

$$d_{ij} = -\frac{3}{4}\ln\left(1 - \frac{4}{3}f\right)$$

Outline: cluster species together. Initially, the clusters are singletons; at each iteration, combine the two “closest” clusters to get a new one.
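The Jukes-Cantor formula translates directly into a few lines of Python. The sketch below is illustrative and uses our own function name, jukes_cantor. The correction is undefined once f reaches 3/4, because the argument of the logarithm is then no longer positive.

```python
import math


def jukes_cantor(f):
    """Jukes-Cantor distance d = -3/4 * ln(1 - 4/3 * f) for a p-distance f."""
    if not 0 <= f < 0.75:
        # For f >= 3/4 the log argument is <= 0 and the distance is undefined.
        raise ValueError("Jukes-Cantor distance is undefined for f >= 0.75")
    return -0.75 * math.log(1 - 4 * f / 3)


for f in (0.15, 0.20, 0.25, 0.50):
    print(f, round(jukes_cantor(f), 3))
```

Rounded to two decimals, jukes_cantor(0.20) gives 0.23 and jukes_cantor(0.15) gives 0.17. These match the B-A and D-C entries in the lower triangle of the earlier distance matrix. Other entries do not match: for A-C the formula gives about 0.82, whereas the matrix shows 0.87. That triangle may therefore have been computed with a different correction model.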

23/4/19 25

Unweighted Pair Group Method using Arithmetic Averages (UPGMA)

UPGMA is a distance-based algorithm.

UPGMA steps:

1. Cluster the two species with the smallest distance, putting them into a single group.

2. Recalculate the distance matrix, giving the distance from the new group to every other group.

3. With the new distance matrix, repeat step 1 until all species have been grouped.

Algorithm

23/4/19 26

23/4/19 27

UPGMA Step 1

Species  A     B     C     D
B        9     –     –     –
C        8     11    –     –
D        12    15    10    –
E        15    18    13    5

Merge D & E (the smallest distance, 5).

[Dendrogram: D and E joined.]

Species  A     B     C
B        9     –     –
C        8     11    –
DE       13.5  16.5  11.5

d(DE)A = 0.5 * (dDA + dEA) = 0.5 * (12 + 15) = 13.5
d(DE)B = 0.5 * (dDB + dEB) = 0.5 * (15 + 18) = 16.5
d(DE)C = 0.5 * (dDC + dEC) = 0.5 * (10 + 13) = 11.5

23/4/19 28

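UPGMA Step 2: merge A & C (the smallest remaining distance, 8)

[Dendrogram: (D,E)]

Species  A     B     C
B        9     –     –
C        8     11    –
DE       13.5  16.5  11.5

[Dendrogram: (D,E) and (A,C)]

Species  B     AC
AC       10    –
DE       16.5  12.5

d(AC)B = 0.5 * (9 + 11) = 10
d(AC)(DE) = 0.5 * (13.5 + 11.5) = 12.5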

23/4/19 29

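UPGMA Steps 3 & 4: merge B & AC, then merge ABC & DE

[Dendrogram: (D,E) and (A,C)]

Species  B     AC
AC       10    –
DE       16.5  12.5

The smallest distance is now 10, so B joins AC. Only ABC and DE remain, and they are merged last.

[Final dendrogram: D, E, A, C, B]

(((A,C),B),(D,E))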
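The UPGMA steps above can be sketched in Python using the distance table from Step 1; upgma is our own helper name, not something from the slides. The sketch repeats three actions until one cluster remains:
- merge the closest pair of clusters;
- replace their two rows with the size-weighted average (the standard UPGMA update);
- drop the old rows.

The size-weighted average reduces to the 0.5 * (...) averages on the slides whenever the two merged groups have the same size, as in every update the slides tabulate.

```python
from itertools import combinations


def upgma(names, leaf_dist):
    """Cluster taxa with UPGMA; return (newick, merges).

    leaf_dist maps frozenset({x, y}) to the distance between taxa x and y.
    Each merge is recorded as (cluster_1, cluster_2, distance).
    """
    size = {name: 1 for name in names}          # taxa per current cluster
    d = {frozenset(pair): leaf_dist[frozenset(pair)]
         for pair in combinations(names, 2)}    # current cluster distances
    merges = []
    while len(size) > 1:
        # Step 1: join the two closest clusters.
        pair = min(d, key=d.get)
        x, y = sorted(pair)
        merged = f"({x},{y})"
        merges.append((x, y, d[pair]))
        # Step 2: distance from the merged cluster to each other cluster is the
        # size-weighted average, i.e. the mean over all pairs of taxa.
        for z in size:
            if z not in (x, y):
                d[frozenset((merged, z))] = (
                    size[x] * d[frozenset((x, z))] + size[y] * d[frozenset((y, z))]
                ) / (size[x] + size[y])
        d = {k: v for k, v in d.items() if x not in k and y not in k}
        size[merged] = size.pop(x) + size.pop(y)
        # Step 3: the loop repeats until a single cluster remains.
    return next(iter(size)), merges


table = {("A", "B"): 9, ("A", "C"): 8, ("A", "D"): 12, ("A", "E"): 15,
         ("B", "C"): 11, ("B", "D"): 15, ("B", "E"): 18,
         ("C", "D"): 10, ("C", "E"): 13, ("D", "E"): 5}
tree, merges = upgma("ABCDE", {frozenset(k): v for k, v in table.items()})
print(tree)  # (((A,C),B),(D,E))
for m in merges:
    print(m)
```

The first three merges reproduce the slides (at distances 5, 8 and 10). The slides do not tabulate the final join. There, size-weighted averaging gives (2 × 12.5 + 16.5) / 3 ≈ 13.83, whereas weighting the two rows equally would give 14.5. The topology is the same either way.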

23/4/19 30

Optimality criterion: the ‘most parsimonious’ tree is the one that requires the fewest evolutionary events (e.g., nucleotide substitutions, amino acid replacements) to explain the sequences.

Parsimony score: the number of character changes (mutations) along the evolutionary tree.

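Example: Most Parsimonious Tree (MP Tree)

[Figure: two candidate trees for the sequences AGA, AAA, AAG and GGA, with internal nodes labelled by inferred ancestral sequences (AAA, AGA) and each branch labelled with its number of changes. One tree has score = 4, the other score = 3.]

Most parsimonious tree: the tree with the minimal parsimony score.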
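The slides define the parsimony score but not how to compute it for a given tree. One standard way is Fitch's algorithm. It is swapped in here as an illustration; the slides do not name it. Working from the leaves upward, each node gets the intersection of its children's state sets if they overlap; otherwise it gets their union, at the cost of one change. The three trees below are the three possible unrooted topologies for the four example sequences. They are written as rooted binary tuples, because the Fitch score does not depend on where the root is placed. The function names are our own.

```python
def fitch(tree, site):
    """Fitch's small-parsimony pass for one alignment column.

    tree is a nested tuple of sequences (leaves are strings). Returns the
    candidate state set at the root and the number of changes required.
    """
    if isinstance(tree, str):
        return {tree[site]}, 0
    left, right = tree
    left_set, left_cost = fitch(left, site)
    right_set, right_cost = fitch(right, site)
    common = left_set & right_set
    if common:                      # children can agree: no extra change
        return common, left_cost + right_cost
    return left_set | right_set, left_cost + right_cost + 1


def parsimony_score(tree, length):
    """Sum the minimum number of changes over all alignment columns."""
    return sum(fitch(tree, i)[1] for i in range(length))


trees = {
    "AGA+AAA | AAG+GGA": (("AGA", "AAA"), ("AAG", "GGA")),
    "AGA+AAG | AAA+GGA": (("AGA", "AAG"), ("AAA", "GGA")),
    "AAA+AAG | AGA+GGA": (("AAA", "AAG"), ("AGA", "GGA")),
}
for name, t in trees.items():
    print(name, parsimony_score(t, 3))
```

Only the tree that pairs AGA with GGA reaches the minimum score of 3; the other two score 4. This matches the two scores on the slide.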

23/4/19 31

There are many trees...

We cannot go over all the trees, so we need a smarter way to find the best one. There are approximate solutions, but what if we want to make sure we find the global optimum?

There is a way that is more efficient than going over all possible trees. It is called BRANCH AND BOUND: a general technique in computer science that can be applied to phylogeny.

23/4/19 32

BRANCH AND BOUND

To illustrate the BRANCH AND BOUND (BNB) method, we will use an example unrelated to evolution: the traveling salesperson problem (TSP). Later, once the general BNB method is understood, we will see how to apply it to finding the MP tree.

23/4/19 33

Branch and Bound for TSP

Find a minimum-cost round trip that visits each intermediate city exactly once.

Greedy approach: A, G, E, F, B, D, C, A = 251

[Figure: weighted graph on the seven cities A to G, with the travel cost on each edge.]

23/4/19 34

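Search all possible paths

[Figure: the same weighted graph, next to a search tree of partial paths from A with their costs: AG (20), AB (46), AC (93); below AG: AGF (88) and AGE (55), then AGFB, AGFE, AGFC; below AC: ACB (175), ACD, ACF; below ACB: ACBE (257).]

Best estimate: 251 (the greedy tour). A partial path that already costs more than this, such as ACBE (257), cannot lead to a better tour, so its branch is pruned.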
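The edge weights of the graph on the slide cannot be recovered from the text. The sketch below therefore uses a small hypothetical 5-city cost matrix, with cities numbered 0 to 4. Like the slides, it seeds the bound with a greedy (nearest-neighbour) tour. It then extends partial paths depth first and abandons any partial path whose cost already reaches the best tour found so far. A brute-force check over all tours is included for comparison. All function names are our own.

```python
from itertools import permutations


def greedy_tour(dist, start=0):
    """Nearest-neighbour tour; its cost serves as the initial bound."""
    tour, cost = [start], 0
    while len(tour) < len(dist):
        here = tour[-1]
        nxt = min((c for c in range(len(dist)) if c not in tour),
                  key=lambda c: dist[here][c])
        cost += dist[here][nxt]
        tour.append(nxt)
    return cost + dist[tour[-1]][start], tour + [start]


def tsp_branch_and_bound(dist, start=0):
    """Exact TSP by depth-first branch and bound, seeded with the greedy tour."""
    n = len(dist)
    best_cost, best_tour = greedy_tour(dist, start)
    best = {"cost": best_cost, "tour": best_tour}

    def extend(path, cost):
        if cost >= best["cost"]:              # bound: cannot beat the best tour
            return
        if len(path) == n:                    # all cities visited: close the tour
            total = cost + dist[path[-1]][start]
            if total < best["cost"]:
                best["cost"], best["tour"] = total, path + [start]
            return
        for city in range(n):                 # branch: try every unvisited city
            if city not in path:
                extend(path + [city], cost + dist[path[-1]][city])

    extend([start], 0)
    return best["cost"], best["tour"]


def tsp_brute_force(dist, start=0):
    """Cost of the best tour found by checking every tour (tiny inputs only)."""
    others = tuple(c for c in range(len(dist)) if c != start)
    return min(sum(dist[a][b] for a, b in zip((start,) + p, p + (start,)))
               for p in permutations(others))


# Hypothetical symmetric 5-city instance (not the graph on the slide).
costs = [
    [0, 20, 46, 93, 35],
    [20, 0, 68, 31, 15],
    [46, 68, 0, 17, 82],
    [93, 31, 17, 0, 57],
    [35, 15, 82, 57, 0],
]
print("greedy:", greedy_tour(costs))
print("branch and bound:", tsp_branch_and_bound(costs))
print("brute force:", tsp_brute_force(costs))
```

The search is still exponential in the worst case. A good initial bound only lets more partial paths be discarded early.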

23/4/19 35

Back to finding the MP tree

Finding the MP tree

BNB helps, though it is still exponential…

23/4/19 36

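The MP search tree

[Figure: stepwise addition. Starting from the only tree of taxa 1, 2 and 3, taxon 4 is added to a branch (e.g. branch 1), giving a four-taxon tree. There are 5 branches in a four-taxon tree, and taxon 5 is added to one of them (e.g. branch 2).]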

23/4/19 37

The MP search tree

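[Figure: search tree of partial trees, each labelled with its parsimony score. Taxon 4 is added to each branch of the three-taxon tree (score 30), giving three four-taxon trees (scores 43, 55 and 39). Taxon 5 is then added to each of their five branches, giving five-taxon trees with scores 52, 54, 52, 53, 58 (below 43); 61, 56, 59, 61, 69 (below 55); and 53, 51, 42, 47, 47 (below 39).]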

23/4/19 38

MP-BNB

[Figure: the same search tree explored depth first; the first complete tree reached (below 43) has score 52.]

Best (minimum) value = 52

23/4/19 39

MP-BNB


Best record = 52

23/4/19 40

MP-BNB


Best record = 52

23/4/19 41

MP-BNB

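[Figure: the subtree below 55 is pruned without being expanded, because 55 already exceeds the best record of 52. Remaining scores: 30; 43, 55, 39; 52, 54, 52, 53, 58 below 43; and 53, 51, 42, 47, 47 below 39.]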

Best record = 52

23/4/19 42

MP-BNB


Best record = 52

23/4/19 43

MP-BNB

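[Figure: the search moves on to the subtree below 39; the tree scoring 53 does not improve on the record, but the tree scoring 51 does.]

Best record = 51 (previously 52)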

23/4/19 44

MP-BNB

[Figure: the tree scoring 42 becomes the new record.]

Best record = 42 (previously 52, then 51)

23/4/19 45

MP-BNB

Best record = 42 (previously 52, then 51)

23/4/19 46

MP-BNB

Best record = 42 (previously 52, then 51)

23/4/19 47

MP-BNB

Best tree: MP score = 42

Total # trees visited: 14 (the three-taxon tree, the three four-taxon trees, and the five children of each of 43 and 39)

23/4/19 48

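Order of Evaluation Matters

[Figure: the same search tree, but only the children of 39 are expanded: scores 30; 43, 55, 39; and 53, 51, 42, 47, 47 below 39.]

Evaluate all 3 four-taxon trees first, then expand the most promising one (39) first.

The bound after searching this subtree will be 42, so the subtrees below 43 and 55 never need to be expanded.

Total trees visited: 9 (1 + 3 + 5)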
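The MP-BNB walkthrough above can be sketched in Python under a few assumptions:
- **Taxon order and leaves.** Taxa are added in a fixed order by stepwise addition, and leaves are the sequences themselves.
- **Scoring.** Each partial tree is scored with Fitch's algorithm, a standard method swapped in here because the slides do not name one.
- **Tree representation.** The unrooted tree is stored as a rooted tree hanging off the first sequence, so attaching a new taxon to every branch is a simple recursive insertion.
- **Pruning.** A partial tree whose score already reaches the best record is not expanded. This is safe because adding a taxon can never lower the parsimony score.
- **Evaluation order.** The flag best_first is a hypothetical option that mimics "evaluate all 3 first": it scores all children before expanding the cheapest one.

All function names are our own.

```python
def fitch_score(tree, length):
    """Fitch parsimony score of a binary tree written as nested tuples of sequences."""
    def post(t, i):
        if isinstance(t, str):
            return {t[i]}, 0
        (ls, lc), (rs, rc) = post(t[0], i), post(t[1], i)
        common = ls & rs
        return (common, lc + rc) if common else (ls | rs, lc + rc + 1)
    return sum(post(tree, i)[1] for i in range(length))


def insertions(t, leaf):
    """Every tree made by attaching `leaf` to one branch of t (incl. the branch above t)."""
    yield (t, leaf)
    if isinstance(t, tuple):
        for new in insertions(t[0], leaf):
            yield (new, t[1])
        for new in insertions(t[1], leaf):
            yield (t[0], new)


def mp_branch_and_bound(seqs, best_first=False):
    """Exact maximum parsimony by stepwise addition plus branch and bound (>= 3 taxa).

    The full unrooted tree is (seqs[0], t); the search adds seqs[3], seqs[4], ...
    """
    length = len(seqs[0])
    best = {"score": float("inf"), "tree": None, "visited": 0}

    def search(t, k):
        score = fitch_score((seqs[0], t), length)
        best["visited"] += 1
        if score >= best["score"]:      # bound: adding taxa never lowers the score
            return
        if k == len(seqs):              # all taxa placed: new best record
            best["score"], best["tree"] = score, (seqs[0], t)
            return
        children = list(insertions(t, seqs[k]))   # branch: add taxon k everywhere
        if best_first:                  # score all children, expand cheapest first
            children.sort(key=lambda c: fitch_score((seqs[0], c), length))
        for child in children:
            search(child, k + 1)

    search((seqs[1], seqs[2]), 3)       # the only tree on the first three taxa
    return best


def all_trees(seqs):
    """Every unrooted tree on seqs, as (seqs[0], t); used only for checking."""
    trees = [(seqs[1], seqs[2])]
    for leaf in seqs[3:]:
        trees = [c for t in trees for c in insertions(t, leaf)]
    return [(seqs[0], t) for t in trees]


five = ["ATGGCTATTCTTATAGTACG", "ATCGCTAGTCTTATATTACA", "TTCACTAGACCTGTGGTCCA",
        "TTGACCAGACCTGTGGTCCG", "TTGACCAGTTCTCTAGTTCG"]
for flag in (False, True):
    result = mp_branch_and_bound(five, best_first=flag)
    print("best_first" if flag else "in order", result["score"], result["visited"])
print("exhaustive", min(fitch_score(t, 20) for t in all_trees(five)))
```

This uses the five aligned sequences from the character-based example. Both search orders report the same score as the exhaustive check over all 15 five-taxon trees. Their visited counts can be compared to see the effect of evaluation order, as on the last slide. On the four-sequence parsimony example (AGA, AAA, AAG, GGA), the search returns the score of 3.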