Phylogenomic Convergence Detection - Evolutionary Biology Meeting in Marseille 2014

Convergence for everyone? Detecting genomic adaptive convergence: Initial results, lessons & perspectives 16th September 2014 Joe Parker, Queen Mary University London


Page 1: Phylogenomic Convergence Detection - Evolutionary Biology Meeting in Marseille 2014

Convergence for everyone? Detecting genomic adaptive convergence:

Initial results, lessons & perspectives

16th September 2014

Joe Parker, Queen Mary University London

Page 2: Phylogenomic Convergence Detection - Evolutionary Biology Meeting in Marseille 2014

Adaptive molecular convergence

•  Background & definition
•  Site-based methods
•  Tree-based methods
•  Combined approaches
•  Sampling / phylogenies as parameters
•  Future

Page 3: Phylogenomic Convergence Detection - Evolutionary Biology Meeting in Marseille 2014

Lab Interests

•  Ecology and evolution of traits
•  Echolocation, sociality
•  NGS data for population genetics and phylogenomics

Page 4: Phylogenomic Convergence Detection - Evolutionary Biology Meeting in Marseille 2014

Defining molecular convergence

•  It isn’t:
    –  Divergence (adaptive or neutral)
    –  Conservation or purifying selection
    –  Retention of ancestral states with secondary changes in outgroups
    –  ‘Neutral’ homoplasy

•  It ought to be:
    –  ‘Adaptive’ homoplasies
    –  ‘Excess’ homoplasies

Page 5: Phylogenomic Convergence Detection - Evolutionary Biology Meeting in Marseille 2014

Prestin

•  Gene phylogeny recovers (paraphyletic) mammalian echolocators as monophyletic¹
•  Functional convergence of parallel changes N7T & I384T demonstrated in vitro²

¹ Liu et al. (2010) Curr. Biol. 20:R53; ² Liu et al. (2014) MBE 31(9):2415

Page 6: Phylogenomic Convergence Detection - Evolutionary Biology Meeting in Marseille 2014

Methods

Page 7: Phylogenomic Convergence Detection - Evolutionary Biology Meeting in Marseille 2014

Methods

•  Species phylogeny & inputs
•  Selection detection
•  Site-based convergence detection
•  Tree-based convergence detection

Page 8: Phylogenomic Convergence Detection - Evolutionary Biology Meeting in Marseille 2014

Site-based methods

•  Look at tips
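"Looking at tips" can be illustrated with a minimal sketch: scan an alignment for columns where every focal (for example, echolocating) taxon shares a residue that no other taxon carries. The function name, the taxon names and the four-site toy alignment below are all hypothetical and are not taken from the talk.

```python
def shared_tip_states(alignment, focal):
    """Return (site_index, residue) pairs where all focal taxa share one
    residue that is absent from every non-focal taxon.

    alignment: dict mapping taxon name -> aligned sequence (equal lengths).
    focal:     iterable of taxon names hypothesised to be convergent.
    """
    focal = set(focal)
    others = [name for name in alignment if name not in focal]
    length = len(next(iter(alignment.values())))
    hits = []
    for i in range(length):
        focal_states = {alignment[name][i] for name in focal}
        other_states = {alignment[name][i] for name in others}
        # One state across all focal tips, and never seen outside them.
        if len(focal_states) == 1 and focal_states.isdisjoint(other_states):
            hits.append((i, next(iter(focal_states))))
    return hits


# Toy data (made up for illustration only).
toy_alignment = {
    "bat_a": "MTKT",
    "bat_b": "MTKT",
    "dolphin": "MTRT",
    "fruit_bat": "MNKI",
    "cow": "MNRI",
}
toy_hits = shared_tip_states(toy_alignment, ["bat_a", "bat_b", "dolphin"])
```

A tip-only scan like this cannot tell convergence apart from retention of an ancestral state, which is one reason to go on and reconstruct ancestral changes.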

Page 9: Phylogenomic Convergence Detection - Evolutionary Biology Meeting in Marseille 2014

Site-based methods

•  Look at tips
•  Reconstruct ancestral changes

??? ??

Page 10: Phylogenomic Convergence Detection - Evolutionary Biology Meeting in Marseille 2014

Lysozyme

•  Convergent and parallel substitutions in stomach lysozymes of advanced ruminants

•  Parsimony (‘over-estimate’) and Bayesian (‘under-estimate’) methods

Zhang & Kumar (1997) MBE 14:527
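The distinction between parallel and convergent substitutions can be made concrete once ancestral states have been reconstructed for two branches. The sketch below uses the usual definitions (parallel: same ancestral state, same derived state; convergent: different ancestral states, same derived state). The function name and return labels are illustrative assumptions, and the ancestral states are taken as given rather than inferred.

```python
def classify_pair(anc_a, desc_a, anc_b, desc_b):
    """Classify the changes at one site along two branches.

    anc_*  : reconstructed ancestral residue at the start of each branch
    desc_* : residue at the end of each branch
    """
    if anc_a == desc_a or anc_b == desc_b:
        return "no paired change"      # at least one branch did not change
    if desc_a != desc_b:
        return "divergent"             # both changed, to different residues
    if anc_a == anc_b:
        return "parallel"              # same start, same end
    return "convergent"                # different start, same end
```

For example, `classify_pair("D", "N", "K", "N")` returns `"convergent"`, while `classify_pair("D", "N", "D", "N")` returns `"parallel"`.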

Page 11: Phylogenomic Convergence Detection - Evolutionary Biology Meeting in Marseille 2014

Site-based methods

•  Look at tips
•  Reconstruct ancestral changes

•  Pairwise (conv) ∝ (div) changes

•  BEB posterior probabilities

P(conv|data), P(div|data)

Castoe et al. (2009) PNAS 106(22):8986
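A rough sketch of how P(conv|data) and P(div|data) might be computed at one site for a pair of branches, given posterior probabilities of the states at the start and end node of each branch. It assumes, as a simplification, that the node posteriors are independent; that assumption and the function name are ours and are not necessarily what the cited method does.

```python
import numpy as np


def paired_change_probs(start_a, end_a, start_b, end_b):
    """Return (P(conv), P(div)) at one site for two branches.

    Each argument is a vector of posterior probabilities over states at a
    node. 'Convergent' means both branches change and end in the same state;
    'divergent' means both change and end in different states.
    Assumption: node posteriors are treated as independent.
    """
    start_a, end_a, start_b, end_b = (
        np.asarray(v, dtype=float) for v in (start_a, end_a, start_b, end_b)
    )
    # P(branch ends in state j having changed) = P(end = j) * P(start != j)
    change_a = end_a * (1.0 - start_a)
    change_b = end_b * (1.0 - start_b)
    p_conv = float(np.sum(change_a * change_b))
    p_both_change = float(change_a.sum() * change_b.sum())
    return p_conv, p_both_change - p_conv
```

Summing these two quantities over sites for every pair of branches gives the pairwise convergent and divergent totals that the slide relates to one another.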

Page 12: Phylogenomic Convergence Detection - Evolutionary Biology Meeting in Marseille 2014

Tree-based methods

Page 13: Phylogenomic Convergence Detection - Evolutionary Biology Meeting in Marseille 2014

Tree-based methods

•  de novo tree search
    –  Inference error
    –  Signal : noise
    –  Multiple phylogenies

Page 14: Phylogenomic Convergence Detection - Evolutionary Biology Meeting in Marseille 2014

Tree-based methods

∆SSLS = SSLS(null) − SSLS(alternative) (likelihood support comparison)
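A minimal sketch of the comparison, assuming SSLS is available as a per-site log-likelihood for the same alignment and substitution model under each topology. The function names are hypothetical; producing the per-site values themselves needs a phylogenetics package and is not shown.

```python
import numpy as np


def delta_ssls(site_lnl_null, site_lnl_alt):
    """Per-site support difference: SSLS(null) - SSLS(alternative).

    Negative values mean the site fits the alternative (convergent)
    topology better than the null (species) topology.
    """
    null = np.asarray(site_lnl_null, dtype=float)
    alt = np.asarray(site_lnl_alt, dtype=float)
    if null.shape != alt.shape:
        raise ValueError("both topologies must be scored on the same sites")
    return null - alt


def locus_mean_delta_ssls(site_lnl_null, site_lnl_alt):
    """Locus-level summary: the mean of the per-site differences."""
    return float(np.mean(delta_ssls(site_lnl_null, site_lnl_alt)))
```

With toy values `[-2.0, -3.0, -1.5]` under the null and `[-1.0, -3.5, -1.5]` under the alternative, the per-site differences are `[-1.0, 0.5, 0.0]`: only the first site favours the alternative tree.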

Page 15: Phylogenomic Convergence Detection - Evolutionary Biology Meeting in Marseille 2014

Convergence in echolocating mammals

•  22 mammals, 2326 loci, ~600,000 sites

•  Convergence signals across genome

•  Loci linked to sensory perception

Parker et al. (2013) Nature 502:228

… details see Supplementary Fig. 1 and Methods): H1 corresponds to all echolocating bats in a monophyletic group (‘bat–bat convergence’) and H2 to all echolocating mammals together in a monophyletic group (‘bat–dolphin convergence’). Using this approach we obtained the SSLS values of all amino acids under three different tree topologies. The difference in SSLS for a single site under the species tree and a given convergent tree with an identical substitution model denotes the relative support for the convergence hypothesis; for example, ∆SSLS (H1) = SSLS (H0) − SSLS (H1) (where negative ∆SSLS implies support for convergence; see Supplementary Fig. 2). We quantified the extent of sequence convergence at each locus by taking the mean of its ∆SSLS values, and found 824 loci with mean support for H1 and 392 for H2. Using simulations we confirmed that these convergent signals were not due to neutral processes and were robust to the substitution model used (see Supplementary Methods).

We ranked the mean ∆SSLS for all 2,326 loci under both convergence hypotheses and, to assess the performance of our method, inspected the rank positions of seven hearing genes that have previously been shown to exhibit convergence and/or adaptation in echolocating mammals: prestin (Slc26a5), Tmc1, Kcnq4 (Kqt-4), Pjvk (Dfnb9), otoferlin, Pcdh15 and Cdh23 (see Methods). Prestin was ranked 43rd (H1) and 22nd (H2), whereas several other loci were also ranked highly in the distribution of convergence support values (see Fig. 1b). In addition to these, we also found several other hearing genes in the top 5% supporting H1 (Itm2b, Slc4a11) and H2 (Coch, Itm2b, Ercc3 and Opa1). Because bats and cetaceans are also known to have undergone shifts in spectral tuning and other adaptations in response to living in low light environments (refs 26–28), we also examined the position of genes implicated in vision and found four such loci in the top 5% of genes supporting H1 (Lcat, Slc45a2, Rabggtb and Rp1) and three supporting H2 (Jmjd6, Six and Rho; see examples in Fig. 1b and Supplementary Tables 2 and 3).

We tested statistically whether the strength of sequence convergence among echolocating bats, and between echolocating bats and the bottlenose dolphin, is greater in hearing genes than in other genes (for locus selection, see Methods). For each phylogenetic hypothesis, we averaged the mean ∆SSLS values of all 21 genes in our data set that are listed as linked to either hearing and/or deafness in any taxon based on published functional annotations (see Supplementary Information). By comparing our observed values to null distributions of corresponding values obtained by randomization, we found that hearing genes had significantly more negative average values than expected by chance for bat–dolphin convergence (H2: z = −0.0194, P < 0.05). We repeated this method for 75 genes listed as involved in vision and/or blindness, and found support, although weaker, in both cases of phenotypic convergence (z = −0.0020, P ≤ 0.055 and z = −0.0097, P ≤ 0.09). Loci previously reported to have association with echolocation had strong support by randomization for both hypotheses (P ≤ 0.01 in both cases).
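The gene-set comparison described above can be sketched as a simple randomization test: average the locus-level mean ∆SSLS over the genes of interest, then compare that average with averages from random gene sets of the same size. The excerpt does not give the exact procedure, so the sampling scheme, the one-sided P value and every name below are assumptions.

```python
import numpy as np


def gene_set_randomization_test(locus_means, gene_set, n_draws=999, seed=0):
    """One-sided test that a gene set has a more negative average
    mean delta-SSLS than random sets of the same size.

    locus_means: dict mapping locus name -> mean delta-SSLS for that locus
    gene_set:    set of locus names (e.g. hearing genes) present in locus_means
    Returns (observed_average, p_value).
    """
    names = list(locus_means)
    values = np.array([locus_means[n] for n in names], dtype=float)
    in_set = np.array([n in gene_set for n in names])
    k = int(in_set.sum())
    observed = values[in_set].mean()

    rng = np.random.default_rng(seed)
    # Null distribution: averages of k loci drawn without replacement.
    null = np.array(
        [rng.choice(values, size=k, replace=False).mean() for _ in range(n_draws)]
    )
    # Count the observed value itself so the P value is never zero.
    p_value = (1 + np.sum(null <= observed)) / (n_draws + 1)
    return float(observed), float(p_value)
```

Ranking the loci by their mean ∆SSLS (most negative first) is what gives rank positions such as those quoted for Prestin.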

[Figure 1, two panels. Panel a: tree topologies for Hypothesis H0 (species tree), Hypothesis H1 (‘bat–bat convergence’) and Hypothesis H2 (‘bat–dolphin convergence’). Clade labels: Atlantogenata; Euarchontoglires; Laurasiatheria; Chiroptera; Yinpterochiroptera (echolocating and non-echolocating); Chiroptera; Yangochiroptera (all echolocating); Echolocating bats; Non-echolocating bats; Echolocating bats and dolphin; All other mammal lineages. Tip labels: Armadillo, Elephant, Chimpanzee, Human, Mouse, Pika, Rabbit, Hedgehog, Shrew, Cat, Dog, Horse, Vicuna, Bottlenose dolphin, Cow, Greater false vampire bat, Greater horseshoe bat, Straw-coloured fruit bat, Parnell’s moustached bat, Little brown bat, Large flying fox. Panel b: histograms of ∆SSLS (H1) (‘Support for H1 tree’) and ∆SSLS (H2) (‘Support for H2 tree’), n = 2,326 loci each. Labelled loci: Lcat**, Pcdh15**, Itm2b**, Prestin**, Dfnb59**, Slc44a2*, Prestin*, Pcdh15*, Ddx1**, Jmjd6*, Rho*, Six6*, Tmc1**, Opa1*.]

Figure 1 | Convergence hypotheses and genomic distribution of support. a, For each locus, the goodness-of-fit of three separate phylogenetic hypotheses was considered: (left) H0, the accepted species phylogeny based on recent findings (for example, refs 14, 23–25); (top-right panel) H1, or ‘bat–bat convergence’, in which echolocating bat lineages (shown in brown) are forced to form a monophyletic group to the exclusion of non-echolocating Old World fruit bats (shown in orange); and (bottom-right panel) H2, or ‘bat–dolphin convergence’, in which the echolocating bat lineages and the dolphin (blue) form a monophyletic group to the exclusion of all non-echolocating mammals. See Methods for details of model fitting and topologies. b, The distribution of convergence signal across 2,326 loci in 14–22 representative mammalian taxa, as measured by locus-wise mean site-specific likelihood support for the species topology (H0) over (left) the ‘bat–bat’ hypothesis uniting echolocating bats (that is, ∆SSLS (H1)) and (right) bat–dolphin hypothesis (that is, ∆SSLS (H2)). Representative hearing and vision loci are shown in green and blue, respectively; for each locus significance levels based on simulation denote whether it had significant counts of convergent sites after correcting for expected counts in random (control) phylogenies (*), and additionally whether strength of positive selection (dN/dS) and convergence (∆SSLS) at sites under selection in echolocators were correlated (**); see Supplementary Table 4 and Methods.

LETTER RESEARCH

10 October 2013 | Vol 502 | Nature | 229

©2013 Macmillan Publishers Limited. All rights reserved.

Page 16: Phylogenomic Convergence Detection - Evolutionary Biology Meeting in Marseille 2014

Combined methods

Page 17: Phylogenomic Convergence Detection - Evolutionary Biology Meeting in Marseille 2014

Trees and sites methods

Correlate selection (dN/dS) and incongruence (∆SSLS) signals
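One way to picture the combined approach is a per-site correlation between the two signals. This sketch uses a plain Pearson correlation from NumPy; the choice of statistic and the function name are assumptions, and the talk does not say which statistic was used.

```python
import numpy as np


def selection_convergence_correlation(site_omega, site_delta_ssls):
    """Pearson correlation between per-site dN/dS estimates and per-site
    delta-SSLS values for one locus (both arrays ordered by site)."""
    omega = np.asarray(site_omega, dtype=float)
    delta = np.asarray(site_delta_ssls, dtype=float)
    if omega.shape != delta.shape or omega.size < 3:
        raise ValueError("need two equal-length arrays with at least 3 sites")
    return float(np.corrcoef(omega, delta)[0, 1])
```

Because negative ∆SSLS indicates support for the convergent topology, a negative coefficient here would mean that sites under stronger selection also tend to favour the convergent tree.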

Page 18: Phylogenomic Convergence Detection - Evolutionary Biology Meeting in Marseille 2014

Genomic approaches

•  Pool information across sites
•  Orthology, paralogy

[Histogram: ‘Distribution of genomic convergence, various hypotheses’. x-axis, −1.0 to 1.0: mean locus sitewise-specific likelihood support for H0; ∆SSLS (H0 − Ha). Annotations: ‘Support H0 (species phylogeny)’ and ‘Support Ha (alternative phylogeny)’.]

Page 19: Phylogenomic Convergence Detection - Evolutionary Biology Meeting in Marseille 2014

Genomic approaches

•  Pool information across sites
•  Orthology, paralogy

[Two histograms, each titled ‘Distribution of genomic convergence, various hypotheses’. x-axis, −1.0 to 1.0: mean locus sitewise-specific likelihood support for H0; ∆SSLS (H0 − Ha). Annotations: ‘Support H0 (species phylogeny)’ and ‘Support Ha (alternative phylogeny)’.]

Page 20: Phylogenomic Convergence Detection - Evolutionary Biology Meeting in Marseille 2014

Genomic approaches

•  Pool information across sites
•  Orthology, paralogy

[Two histograms, each titled ‘Distribution of genomic convergence, various hypotheses’. x-axis, −1.0 to 1.0: mean locus sitewise-specific likelihood support for H0; ∆SSLS (H0 − Ha). Annotations: ‘Support H0 (species phylogeny)’ and ‘Support Ha (alternative phylogeny)’.]

Page 21: Phylogenomic Convergence Detection - Evolutionary Biology Meeting in Marseille 2014

Genomic approaches

•  Pool information across sites
•  Orthology, paralogy

[Two histograms, each titled ‘Distribution of genomic convergence, various hypotheses’. x-axis, −1.0 to 1.0: mean locus sitewise-specific likelihood support for H0; ∆SSLS (H0 − Ha). Annotations: ‘Support H0 (species phylogeny)’ and ‘Support Ha (alternative phylogeny)’.]

Page 22: Phylogenomic Convergence Detection - Evolutionary Biology Meeting in Marseille 2014

Genomic approaches

•  Pool information across sites
•  Orthology, paralogy

[Two histograms, each titled ‘Distribution of genomic convergence, various hypotheses’. x-axis, −1.0 to 1.0: mean locus sitewise-specific likelihood support for H0; ∆SSLS (H0 − Ha). Annotations: ‘Support H0 (species phylogeny)’ and ‘Support Ha (alternative phylogeny)’.]

Page 23: Phylogenomic Convergence Detection - Evolutionary Biology Meeting in Marseille 2014

Genomic approaches

•  Pool information across sites
•  Orthology, paralogy

[Two histograms, each titled ‘Distribution of genomic convergence, various hypotheses’. x-axis, −1.0 to 1.0: mean locus sitewise-specific likelihood support for H0; ∆SSLS (H0 − Ha). Annotations: ‘Support H0 (species phylogeny)’ and ‘Support Ha (alternative phylogeny)’.]

Page 24: Phylogenomic Convergence Detection - Evolutionary Biology Meeting in Marseille 2014

Interpretation

•  Notional convergence detected across genome, or not at all

•  Relative measure

•  Strength-of-evidence

Page 25: Phylogenomic Convergence Detection - Evolutionary Biology Meeting in Marseille 2014

Sampling

Page 26: Phylogenomic Convergence Detection - Evolutionary Biology Meeting in Marseille 2014

Which Trees?

Page 27: Phylogenomic Convergence Detection - Evolutionary Biology Meeting in Marseille 2014

Which Trees?

•  Choice of hypothesis, subtly different from usual practice

•  If we accept that tree-space distance is important…

•  … hypotheses are parameters

•  Enumerate over trees?
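Whether enumerating over trees is practical depends on how many there are. The standard counts for fully bifurcating labelled trees are (2n − 3)!! rooted and (2n − 5)!! unrooted topologies for n tips; the short sketch below evaluates them (the function names are ours).

```python
def double_factorial(n):
    """n!! = n * (n - 2) * (n - 4) * ... down to 1 or 2; returns 1 for n <= 1."""
    result = 1
    while n > 1:
        result *= n
        n -= 2
    return result


def n_rooted_trees(n_tips):
    """Number of rooted, bifurcating, labelled topologies: (2n - 3)!!"""
    if n_tips < 2:
        raise ValueError("need at least 2 tips")
    return double_factorial(2 * n_tips - 3)


def n_unrooted_trees(n_tips):
    """Number of unrooted, bifurcating, labelled topologies: (2n - 5)!!"""
    if n_tips < 3:
        raise ValueError("need at least 3 tips")
    return double_factorial(2 * n_tips - 5)
```

Four tips already give 15 rooted topologies and five tips give 105, so exhaustive enumeration is only realistic for small taxon sets or for a restricted family of candidate hypotheses.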

Page 28: Phylogenomic Convergence Detection - Evolutionary Biology Meeting in Marseille 2014

Future

Page 29: Phylogenomic Convergence Detection - Evolutionary Biology Meeting in Marseille 2014

On the horizon

•  Models:
    –  Null model
    –  Alternative / convergent model

•  Phylogeny methods:
    –  Enumerated / unrestricted phylogenies
    –  Tree space ‘distance’
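As one possible reading of tree space ‘distance’, the sketch below computes a Robinson-Foulds-style distance between rooted topologies: the number of clades found in one tree but not the other. The talk does not name a metric, so this choice is an assumption, as are the nested-tuple tree encoding and the four-taxon toy trees.

```python
def clades(tree):
    """Return the set of clades (frozensets of tip names) in a rooted tree
    written as nested tuples, e.g. ((("a", "b"), "c"), "d")."""
    found = set()

    def walk(node):
        if isinstance(node, str):          # a tip
            return frozenset([node])
        tips = frozenset().union(*(walk(child) for child in node))
        found.add(tips)
        return tips

    walk(tree)
    return found


def rf_distance(tree_a, tree_b):
    """Count the clades present in exactly one of the two trees."""
    return len(clades(tree_a) ^ clades(tree_b))


# Toy topologies (hypothetical four-taxon versions of H0, H1 and H2).
h0 = ((("echo_bat_1", "fruit_bat"), "echo_bat_2"), "dolphin")
h1 = ((("echo_bat_1", "echo_bat_2"), "fruit_bat"), "dolphin")
h2 = ((("echo_bat_1", "echo_bat_2"), "dolphin"), "fruit_bat")
```

On these toy trees the ‘bat–bat’ topology is closer to the species tree than the ‘bat–dolphin’ topology is, which illustrates why the distance between hypotheses might matter when comparing their support.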

Page 30: Phylogenomic Convergence Detection - Evolutionary Biology Meeting in Marseille 2014

Conclusion

•  Strong evidence that molecular convergence, or something like our best definition of it, is a pervasive force

•  Very early work; compare, e.g., early attempts to estimate ω with current dN/dS tests

Page 31: Phylogenomic Convergence Detection - Evolutionary Biology Meeting in Marseille 2014

Thanks

Georgia Tsagkogeorga¹, Kalina Davies¹, James Cotton², Elia Stupka³ & Steve Rossiter¹

¹ School of Biological and Chemical Sciences, Queen Mary, University of London
² Wellcome Trust Sanger Institute
³ Center for Translational Genomics and Bioinformatics, San Raffaele Institute, Milan

Chris Walker & Dan Traynor Queen Mary GridPP High-throughput Cluster

Chaz Mein & Anna Terry Barts and The London Genome Centre

Mahesh Pancholi, Seb Bailey, Xiuguang Mao & Chris Faulkes School of Biological and Chemical Sciences

European Research Council; BBSRC (UK); Queen Mary, University of London

(R-L): Joe Parker; Georgia Tsagkogeorga; Kalina Davies; Steve Rossiter; Xiuguang Mao; Seb Bailey

Page 32: Phylogenomic Convergence Detection - Evolutionary Biology Meeting in Marseille 2014

Further information

References
1.  Zhang & Kumar (1997) MBE 14:527
2.  Li et al. (2008) PNAS 105(37):13959
3.  Castoe et al. (2009) PNAS 106(22):8986
4.  Liu et al. (2010) Curr. Biol. 20:R53
5.  Parker et al. (2013) Nature 502:228
6.  Liu et al. (2014) MBE 31(9):2415

Resources
–  Lab: evolve.sbcs.qmul.ac.uk/rossiter
–  SVN: bit.ly/1m96pXM
–  email: [email protected]