Combining genes in phylogeny And How to test phylogeny methods… Tal Pupko Department of Cell Research and Immunology, George S. Wise Faculty of Life Sciences, Tel-Aviv University [email protected]

Post on 19-Dec-2015

Page 1

Combining genes in phylogeny

And

How to test phylogeny methods…

Tal Pupko

Department of Cell Research and Immunology, George S. Wise Faculty of

Life Sciences, Tel-Aviv University

[email protected]

Page 2

Multiple sequence alignment (vWF)

Rat QEPGGLVVPPTDAPVSSTTPYVEDTPEPPLHNFYCSK

Rabbit QEPGGMVVPPTDAPVRSTTPYMEDTPEPPLHDFYWSN

Gorilla QEPGGLVVPPTDAPVSPTTLYVEDISEPPLHDFYCSR

Cat REPGGLVVPPTEGPVRATTPYVEDTPESTLHDFYCSR

Page 3

Rat QEPGGLVVPPTDA

Rabbit QEPGGMVVPPTDA

Gorilla QEPGGLVVPPTDA

Cat REPGGLVVPPTEG

VWF

From sequences to a phylogenetic tree

Page 4

Multiple multiple sequence alignment

Rat QEPGGLVVPPTDAPVSSTTPYVEDTPEPPLHNFYCSK

Rabbit QEPGGMVVPPTDAPVRSTTPYMEDTPEPPLHDFYWSN

Gorilla QEPGGLVVPPTDAPVSPTTLYVEDISEPPLHDFYCSR

Cat REPGGLVVPPTEGPVRATTPYVEDTPESTLHDFYCSR

[The slide repeats this alignment block six times.]

Page 5

Murphy et al. (2001b): 19 nuclear genes + 3 mitochondrial genes (16,400 bp)

Phylogenetic studies are now based on the analysis of multiple genes

Page 6

Consensus trees

Page 7

Consensus tree

A consensus tree summarizes information common to two or more trees.

[Figure: three example trees on the taxa a, b, c, d and e]

Page 8

Strict consensus

Strict consensus includes only those groups that occur in all the trees being considered.

[Figure: three input trees on the taxa a, b, c, d and e, and their strict consensus]

Page 9

Strict consensus

Problem: the split {ab} is found 2 out of 3 times, and this is not shown in the strict consensus.

[Figure: three input trees on the taxa a, b, c, d and e, and their strict consensus]

Page 10

Majority-rule consensus

Majority-rule consensus: splits that are found in the majority of the trees are shown.

[Figure: three input trees on the taxa a, b, c, d and e, and their majority-rule consensus]

Page 11

Majority-rule consensus

The percentage of trees supporting each split is indicated.

[Figure: three input trees on the taxa a, b, c, d and e, and their majority-rule consensus, with split supports of 100, 67 and 67]
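The two consensus rules can be sketched in a few lines. This is a minimal illustration, not the software used for these slides: each rooted tree is represented only by its set of non-trivial groups (clades), and the three input trees below are hypothetical, chosen so that the supports come out as 100, 67 and 67.

```python
from collections import Counter

def clades(*groups):
    """Build a tree description from its groups, e.g. clades("ab", "de")."""
    return {frozenset(g) for g in groups}

def clade_support(trees):
    """Fraction of the input trees that contain each group."""
    counts = Counter(clade for tree in trees for clade in tree)
    return {clade: n / len(trees) for clade, n in counts.items()}

def strict_consensus(trees):
    """Keep only the groups that occur in all trees."""
    return {c for c, s in clade_support(trees).items() if s == 1.0}

def majority_rule_consensus(trees):
    """Keep groups found in more than half of the trees, with support in percent."""
    return {c: round(100 * s) for c, s in clade_support(trees).items() if s > 0.5}

# Hypothetical input trees on the taxa a, b, c, d, e (not read off the slide figure).
trees = [
    clades("ab", "abc", "de"),
    clades("ab", "abc", "de"),
    clades("bc", "de", "bcde"),
]

print(sorted("".join(sorted(c)) for c in strict_consensus(trees)))
for clade, pct in sorted(majority_rule_consensus(trees).items(), key=lambda kv: sorted(kv[0])):
    print("".join(sorted(clade)), pct)
```

With these inputs the group {a,b} is lost from the strict consensus although two of the three trees contain it, which is the problem the majority-rule consensus addresses.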

Page 12

Problem with Majority-rule consensus

In this example the majority-rule consensus equals the strict consensus. However, if we consider only {b,c,d}, then in both trees b is closer to c than either b or c is to d.

[Figure: two input trees on the taxa a, b, c, d and e, and their consensus tree]

Page 13

Adams consensus

Adams consensus will give the subtrees that are common to all trees. Adams consensus is useful where there is one or more sequences with unclear positions but there’s a subset of sequences that are common to all trees.

[Figure: the example trees on the taxa a, b, c, d and e, and their Adams consensus]

Page 14

Networks

A network is sometimes used instead of a tree to represent an evolutionary history in which recombination occurred.

[Figure: a network on the taxa a, b, c, d and e]

Page 15

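Maximum Likelihood

[Figure: star tree with internal node X and leaves A, C and S, on branches t1, t2 and t3]

$$P(\text{Data} \mid r) = \sum_{X \in \{AA\}} P(X)\, P_{XA}(t_1 r)\, P_{XC}(t_2 r)\, P_{XS}(t_3 r)$$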

Page 16

Multiple genes analysis: concatenate analysis

e.g., Murphy et al. (2001)

Gene 1 + Gene 2 + Gene 3
Sp1: TCTGT…AACTCTTT…GAATCGTT…GCC
Sp2: TCTGC…GACTCGCT…GGAACGCT…CCC
Sp3: CTTAT…GATCTATT…GGAATATT…CGA
Sp4: CCTAT…GATCCATT…GGACCATT…CCA

[Figure: the concatenated alignment, one evolutionary model, and one tree on Sp1, Sp2, Sp3 and Sp4]
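As a rough illustration of the concatenation step, the sketch below joins per-gene alignments into one sequence per species. The function name and the five-column toy alignments are assumptions made for the example; real alignments would be read from files.

```python
def concatenate(gene_alignments):
    """Join several per-gene alignments ({species: sequence}) end to end."""
    species = list(gene_alignments[0])
    for alignment in gene_alignments:
        if set(alignment) != set(species):
            raise ValueError("every gene must contain the same species")
    return {sp: "".join(alignment[sp] for alignment in gene_alignments)
            for sp in species}

# Toy alignments (hypothetical, five columns per gene).
gene1 = {"Sp1": "TCTGT", "Sp2": "TCTGC", "Sp3": "CTTAT", "Sp4": "CCTAT"}
gene2 = {"Sp1": "TCTTT", "Sp2": "TCGCT", "Sp3": "CTATT", "Sp4": "CCATT"}
gene3 = {"Sp1": "TCGTT", "Sp2": "ACGCT", "Sp3": "ATATT", "Sp4": "CCATT"}

supermatrix = concatenate([gene1, gene2, gene3])
for sp, seq in supermatrix.items():
    print(sp, seq)
```

After this step the gene boundaries are no longer visible to the analysis, so one model and one set of branch lengths are fitted to all columns.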

Page 17

Multiple genes analysis: concatenate analysis

e.g., Murphy et al. (2001)

Gene 1
Sp1: TCTGT…AAC
Sp2: TCTGC…GAC
Sp3: CTTAT…GAT
Sp4: CCTAT…GAT

Gene 2
Sp1: TCTTT…GAA
Sp2: TCGCT…GGA
Sp3: CTATT…GGA
Sp4: CCATT…GGA

Gene 3
Sp1: TCGTT…GCC
Sp2: ACGCT…CCC
Sp3: ATATT…CGA
Sp4: CCATT…CCA

[Figure: next to each gene, an evolutionary model and a tree on Sp1, Sp2, Sp3 and Sp4]

Page 18

What are branch lengths?

Branch lengths correspond to evolutionary distance:

d = AA replacements/site = [AA replacements/(site*year)] * years = evolutionary rate * years

Page 19

Multiple genes analysis: separate analysis

e.g., Nikaido et al. (2001)

Gene 1
Sp1: TCTGT…AAC
Sp2: TCTGC…GAC
Sp3: CTTAT…GAT
Sp4: CCTAT…GAT

Gene 2
Sp1: TCTTT…GAA
Sp2: TCGCT…GGA
Sp3: CTATT…GGA
Sp4: CCATT…GGA

Gene 3
Sp1: TCGTT…GCC
Sp2: ACGCT…CCC
Sp3: ATATT…CGA
Sp4: CCATT…CCA

[Figure: each gene has its own evolutionary model (models 1, 2 and 3) and its own tree on Sp1, Sp2, Sp3 and Sp4]

Page 20

Multiple genes analysis: number of parameters

Number of species = n
Number of genes = g
Number of parameters in the model = m

                                   Concatenate analysis   Separate analysis
Number of parameters               m+(2n-3)               g*(m+(2n-3))
Example (n = 44, g = 22, m = 0)    85                     1870

Page 21

Multiple genes analysis: number of parameters

Both an oversimplified model and over-parameterization may lead to wrong phylogenetic conclusions.

Page 22

Multiple genes analysis: proportional analysis

Gene 1
Sp1: TCTGT…AAC
Sp2: TCTGC…GAC
Sp3: CTTAT…GAT
Sp4: CCTAT…GAT

Gene 2
Sp1: TCTTT…GAA
Sp2: TCGCT…GGA
Sp3: CTATT…GGA
Sp4: CCATT…GGA

Gene 3
Sp1: TCGTT…GCC
Sp2: ACGCT…CCC
Sp3: ATATT…CGA
Sp4: CCATT…CCA

[Figure: each gene has its own evolutionary model (models 1, 2 and 3) and a tree on Sp1, Sp2, Sp3 and Sp4; the three trees are labeled Rate=1, Rate=0.5 and Rate=1.5]
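One way to picture the proportional analysis: all genes share one set of branch lengths, and each gene stretches or shrinks that set by its own rate. The sketch below assumes hypothetical shared branch lengths for a four-species unrooted tree (2n-3 = 5 branches); which of the three rates belongs to which gene is also an assumption, since the slide lists the rates without gene labels.

```python
def gene_branch_lengths(shared_lengths, gene_rates):
    """Scale one shared set of branch lengths by each gene's relative rate."""
    return {gene: [rate * b for b in shared_lengths]
            for gene, rate in gene_rates.items()}

# Hypothetical shared branch lengths (substitutions per site) for Sp1..Sp4.
shared_lengths = [0.10, 0.20, 0.05, 0.30, 0.15]

# The rates 1, 0.5 and 1.5 appear on the slide; the gene assignment is assumed.
gene_rates = {"Gene 1": 1.0, "Gene 2": 0.5, "Gene 3": 1.5}

scaled = gene_branch_lengths(shared_lengths, gene_rates)
for gene, lengths in scaled.items():
    print(gene, [round(b, 3) for b in lengths])
```

The ratios between branches are identical in every gene; only the overall scale differs, which is why the proportional analysis needs far fewer parameters than the separate analysis.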

Page 23

Multiple genes analysis: number of parameters

Number of species = n
Number of genes = g
Number of parameters in the model = m

                                   Concatenate analysis   Separate analysis   Proportional analysis
Number of parameters               m+(2n-3)               g*(m+(2n-3))        g-1+gm+(2n-3)
Example (n = 44, g = 22, m = 0)    85                     1870                106
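The three counts can be checked directly from the formulas on this slide. The function names below are made up for the example; 2n-3 is the number of branches of an unrooted tree with n species.

```python
def n_branches(n):
    """Number of branches in an unrooted binary tree with n species."""
    return 2 * n - 3

def concatenate_params(n, g, m):
    """One model and one set of branch lengths for all genes."""
    return m + n_branches(n)

def separate_params(n, g, m):
    """Each gene has its own model and its own branch lengths."""
    return g * (m + n_branches(n))

def proportional_params(n, g, m):
    """Shared branch lengths, g-1 relative gene rates, one model per gene."""
    return (g - 1) + g * m + n_branches(n)

n, g, m = 44, 22, 0  # the example from the slide
print(concatenate_params(n, g, m),
      separate_params(n, g, m),
      proportional_params(n, g, m))
```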

Page 24

Aims of our study

To compare 3 types of multiple-gene analysis:

• Concatenate analysis

• Separate analysis

• Proportional analysis

3 protein datasets:

• Mitochondrial dataset [56 species, 12 genes]

• Nuclear dataset (“short genes”) [46 species, 6 genes]

• Nuclear dataset (“long genes”) [28 species, 4 genes]

(Short genes: based on the Murphy dataset)

Page 25

Comparing topologies

Morphological topology (based on McKenna and Bell, 1997)

Taxa (in figure order): Bonobo, Chimpanzee, Man, Gorilla, Sumatran orangutan, Bornean orangutan, Common gibbon, Barbary ape, Baboon, White-fronted capuchin, Slow loris, Tree shrew, Japanese pipistrelle, Long-tailed bat, Jamaican fruit-eating bat, Horseshoe bat, Little red flying fox, Ryukyu flying fox, Mouse, Rat, Vole, Cane-rat, Guinea pig, Squirrel, Dormouse, Rabbit, Pika, Pig, Hippopotamus, Sheep, Cow, Alpaca, Blue whale, Fin whale, Sperm whale, Donkey, Horse, Indian rhino, White rhino, Elephant, Aardvark, Grey seal, Harbor seal, Dog, Cat, Asiatic shrew, Long-clawed shrew, Small Madagascar hedgehog, Hedgehog, Gymnure, Mole, Armadillo, Bandicoot, Wallaroo, Opossum, Platypus

Clade labels: Archonta, Glires, Ungulata, Carnivora, Insectivora, Xenarthra

Page 26

Mitochondrial topology

Taxa (in figure order): Donkey, Horse, Indian rhino, White rhino, Grey seal, Harbor seal, Dog, Cat, Blue whale, Fin whale, Sperm whale, Hippopotamus, Sheep, Cow, Alpaca, Pig, Little red flying fox, Ryukyu flying fox, Horseshoe bat, Japanese pipistrelle, Long-tailed bat, Jamaican fruit-eating bat, Asiatic shrew, Long-clawed shrew, Mole, Small Madagascar hedgehog, Aardvark, Elephant, Armadillo, Rabbit, Pika, Tree shrew, Bonobo, Chimpanzee, Man, Gorilla, Sumatran orangutan, Bornean orangutan, Common gibbon, Barbary ape, Baboon, White-fronted capuchin, Slow loris, Squirrel, Dormouse, Cane-rat, Guinea pig, Mouse, Rat, Vole, Hedgehog, Gymnure, Bandicoot, Wallaroo, Opossum, Platypus

Clade labels: Perissodactyla, Carnivora, Cetartiodactyla, Rodentia 1, Hedgehogs, Rodentia 2, Primates, Chiroptera, Moles+Shrews, Afrotheria, Xenarthra, Lagomorpha + Scandentia

Page 27

Nuclear topology

Taxa (in figure order): Round Eared Bat, Flying Fox, Hedgehog, Mole, Pangolin, Whale, Hippo, Cow, Pig, Cat, Dog, Horse, Rhino, Rat, Capybara, Rabbit, Flying Lemur, Tree Shrew, Human, Galago, Sloth, Hyrax, Dugong, Elephant, Aardvark, Elephant Shrew, Opossum, Kangaroo

Clade labels: Cetartiodactyla, Afrotheria, Chiroptera, Eulipotyphla, Glires, Xenarthra, Carnivora, Perissodactyla, Scandentia+Dermoptera, Pholidota, Primate

[The figure also marks four groups, numbered 1 to 4.]

(Madsenl tree)

Page 28

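Comparing different models using the Akaike information criterion (AIC)

$$\mathrm{AIC} = -2\log L + 2P$$

where L is the maximized likelihood of the model and P is its number of free parameters (df).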

A model which minimizes the AIC is considered to be the most appropriate model.
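As a quick check of the criterion, the sketch below recomputes the AIC from a log-likelihood and a parameter count and picks the model with the smallest value. The numbers are the mitochondrial results reported later in this talk; pairing each df with its analysis is an inference from the parameter-count formulas, not something the flattened slide states explicitly.

```python
def aic(log_likelihood, n_params):
    """Akaike information criterion: -2 ln(L) + 2 P."""
    return -2.0 * log_likelihood + 2.0 * n_params

# (ln L, df) for the mitochondrial dataset; the df-to-analysis pairing is inferred.
models = {
    "concatenate": (-91188.71, 121),
    "separate": (-89921.78, 1320),
    "proportional": (-90999.30, 132),
}

scores = {name: aic(ln_l, df) for name, (ln_l, df) in models.items()}
best = min(scores, key=scores.get)

for name, score in sorted(scores.items(), key=lambda kv: kv[1]):
    print(f"{name:13s} {score:.2f}")
print("best:", best)
```

The separate analysis has the highest likelihood, but its 1320 parameters are penalized enough that the proportional analysis ends up with the lowest AIC.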

Page 29

Results: the best multiple gene analysis

The proportional analysis is the best for the mitochondrial dataset

         Concatenate analysis   Separate analysis   Proportional analysis
df       121                    1320                132
Ln(L)    -91188.71              -89921.78           -90999.30
AIC      182619.42              182483.55           182262.60

(Mitochondrial tree, N-Gamma rate model)

Page 30

Results: the best multiple gene method

         Concatenate analysis   Separate analysis   Proportional analysis
df       95                     540                 100
Ln(L)    -11618.67              -11192.12           -11543.87
AIC      23427.33               23464.23            23287.74

(Murphy dataset, Madsenl tree, N-Gamma rate model)

The Proportional analysis is the best for the Nuclear dataset (“Short genes”)

Page 31

Results: the best multiple gene method

The separate analysis is the best for the nuclear dataset (“long genes”)

         Concatenate analysis   Separate analysis   Proportional analysis
df       57                     216                 60
Ln(L)    -31519.10              -31153.28           -31406.81
AIC      63152.21               62738.56            62933.63

(Madsen dataset, Murphyl tree, N-Gamma rate model)

Page 32

Conclusion: the best multiple gene method

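1- The concatenate model is never the best way to analyze multiple genes: by AIC it ranks last for the mitochondrial and “long genes” datasets and second for the “short genes” dataset.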

2- Selecting between the separate analysis or the proportional analysis depends on the data considered:

The proportional model is better suited to short genes, the separate model to longer sequences.

Page 33

Results: mammalian phylogeny

The morphological tree is always rejected (K-H test, P < 0.05):

• whatever the model used

• whatever the dataset

Page 34

Results: mammalian phylogeny

• The mitochondrial tree is the best tree for the mitochondrial dataset. But we cannot reject the nuclear tree.

• The nuclear tree is the best for the nuclear datasets, and we can reject the mitochondrial tree.

Conclusion (Topology): It seems that the nuclear tree is the best tree among the 3 alternative trees.

Page 35

Modeling site-rate variation

Homogeneous model:

$$F(t+x) = F(t)\,P(x)$$

Gamma model (c rate categories R_1, …, R_c):

$$F(t+x) = \sum_{n=1}^{c} \frac{1}{c}\,F(t)\,P(x R_n)$$

[Figure: the gamma distribution f(r); x-axis: substitution rates (R), y-axis: site proportions]

Page 36

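Likelihoods with rate variation

[Figure: star tree with internal node X and leaves A, C and G, on branches d1, d2 and d3, shown once for the continuous and once for the discrete rate model]

Continuous:

$$p(D) = \int_{r} f(r) \sum_{X} p(X)\, P_{XA}(d_1 r)\, P_{XC}(d_2 r)\, P_{XG}(d_3 r)\, dr$$

Discrete:

$$p(D) = \sum_{r_i} p(r_i) \sum_{X} p(X)\, P_{XA}(d_1 r_i)\, P_{XC}(d_2 r_i)\, P_{XG}(d_3 r_i)$$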
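The discrete-rate likelihood for the three-leaf tree on this slide can be sketched as follows. Several choices here are assumptions for illustration only: the Jukes-Cantor (JC69) substitution model stands in for whatever model the talk used, and the branch lengths and the four equally weighted rate categories are made-up values rather than rates derived from a fitted gamma distribution.

```python
import math
from itertools import product

NUCLEOTIDES = "ACGT"

def jc69(x, y, d):
    """JC69 probability of going from x to y over distance d (substitutions/site)."""
    e = math.exp(-4.0 * d / 3.0)
    return 0.25 + 0.75 * e if x == y else 0.25 - 0.25 * e

def site_likelihood(leaves, branch_lengths, rates=(1.0,), weights=(1.0,)):
    """Likelihood of one site on a star tree, averaged over rate categories.

    Sums over the unknown ancestral state X and over the rate categories r_i,
    multiplying each branch length by the rate of the category.
    """
    total = 0.0
    for r, w in zip(rates, weights):
        for x in NUCLEOTIDES:
            term = 0.25  # p(X): equal base frequencies under JC69
            for leaf, d in zip(leaves, branch_lengths):
                term *= jc69(x, leaf, d * r)
            total += w * term
    return total

branch_lengths = (0.1, 0.2, 0.3)   # hypothetical d1, d2, d3
rates = (0.2, 0.7, 1.3, 1.8)       # hypothetical categories with mean rate 1
weights = (0.25, 0.25, 0.25, 0.25)

print(site_likelihood("ACG", branch_lengths))                  # homogeneous
print(site_likelihood("ACG", branch_lengths, rates, weights))  # discrete rates

# Sanity check: the likelihoods of all 64 possible leaf patterns sum to 1.
total = sum(site_likelihood(p, branch_lengths, rates, weights)
            for p in product(NUCLEOTIDES, repeat=3))
print(total)
```

With a single category of rate 1 the function reduces to the homogeneous model, so both versions share the same code path.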

Page 37

Results: the best site-rate variation model

Mitochondrial dataset (mitochondrial tree, proportional analysis)

         Homogeneous model   1-Gamma model   N-Gamma model
df       120                 121             132
Ln(L)    -98998.68           -91094.30       -90999.30
AIC      198237.37           182430.61       182262.60

Page 38

Conclusion: the best site-rate variation model

The N-Gamma model is always the best site-rate variation model.

Page 39

Combining Multiple Genes

Collaborations

Dorothee Huchon (Florida State University)

Masami Hasegawa (Institute of Statistical Mathematics)

Norihiro Okada (Tokyo Institute of Technology)

Ying Cao (ISM)

Page 40

Known phylogenies

Page 41

Known phylogenies

The best way to test different methods of phylogenetic reconstruction is on trees that are known to be true from other sources…

Problem: known phylogenies are very rare.

Known phylogenies: laboratory animals and crop plants (and even those are often suspect). Also, their evolutionary rate is very small…

Page 42

Known phylogenies

David Hillis and colleagues have created “experimental” phylogenies in the lab.

Page 43

Known phylogenies

They used bacteriophage T7 and subdivided cultures of it in the presence of a mutagen. They then sequenced a marker gene from the final cultures and gave the sequences as input to a few phylogenetic methods. The output of the tree-building methods was compared to the true tree.

Page 44

Known phylogenies

In fact, they inferred the phylogeny from restriction sites, using MP, NJ, UPGMA and other methods.

All methods reconstructed the true tree.
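A common way to compare a reconstructed tree with the true one is to count the groups found in one tree but not in the other (a Robinson-Foulds style distance). The sketch below is only an illustration of that idea, not the procedure used in the experiment: trees are written as sets of groups, and the culture names a to f and both "inferred" trees are hypothetical.

```python
def clades(*groups):
    """Describe a rooted tree by its groups, e.g. clades("ab", "cd")."""
    return {frozenset(g) for g in groups}

def rf_distance(tree_a, tree_b):
    """Number of groups present in exactly one of the two trees."""
    return len(tree_a ^ tree_b)

# Hypothetical true tree and two hypothetical reconstructions.
true_tree = clades("ab", "cd", "abcd", "ef")
inferred_same = clades("ab", "cd", "abcd", "ef")
inferred_other = clades("ab", "cd", "cdef", "ef")

print(rf_distance(true_tree, inferred_same))
print(rf_distance(true_tree, inferred_other))
```

A distance of zero means the method recovered every group of the true tree, which is the outcome reported here for all methods.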

Page 45

Known phylogenies

They also compared outputs of ancestral sequence reconstruction, using MP.

97.3% of the ancestral states were correctly reconstructed.

Encouraging!

Page 46

Known phylogenies

Criticism: (1) The true tree was very easy to infer, because it was well balanced and all nodes were accompanied by numerous changes.

(2) The mutations by a single mutagen do not reflect reality.

Page 47

Thank You…