correlating traits with phylogenies using bats. phylogeny and trait values a phylogeny describes a...

23
Correlating traits with phylogenies Using BaTS

Upload: kelton-odham

Post on 15-Dec-2015


TRANSCRIPT

Page 1

Correlating traits with phylogenies

Using BaTS

Page 2

Phylogeny and trait values

A phylogeny describes a hypothesis about the evolutionary relationship between individuals sampled from a population

Discrete character traits of interest can be mapped onto the phylogeny

A significant association between a particular trait value and its distribution on a phylogeny indicates a potential causative relationship

Page 3

Phylogeny and trait values

A phylogeny describes a hypothesis about the evolutionary relationship between individuals sampled from a population

Page 4

Phylogeny and trait values

Discrete character traits of interest can be mapped onto the phylogeny

Page 5

Phylogeny and trait values

A significant association between a particular trait value and its distribution on a phylogeny indicates a potential causative relationship

Page 6

Phylogeny and trait values

Often, the phylogeny-trait relationship is not clear-cut by eye, so an analytical framework may be needed.

[Figure: three example trees labelled "clear association", "no association" and "????" (unclear)]

Page 7

Phylogeny and trait values

The null hypothesis

The null hypothesis under test is one of random phylogeny-trait association; that is, that

“No single tip bearing a given character trait is any more likely to share that trait with adjoining taxa than we would expect due to chance”
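To make the null hypothesis concrete: one common way to simulate it is to keep the tree fixed and randomly reassign the observed trait values among the tips, so that each value keeps its frequency but any link to shared ancestry is broken. The Python sketch below assumes traits are held in a dictionary from taxon name to trait value; `shuffle_traits` and the example data are hypothetical and not part of BaTS.

```python
import random


def shuffle_traits(traits, rng):
    """Randomly reassign trait values among taxa (a permutation).

    Hypothetical helper: each trait value keeps its original frequency;
    only the taxon-to-trait pairing is randomized, as under the null.
    """
    taxa = list(traits)
    values = [traits[taxon] for taxon in taxa]
    rng.shuffle(values)
    return dict(zip(taxa, values))


# Hypothetical tip traits (tissue of origin for five sequences).
traits = {"A": "brain", "B": "brain", "C": "lymph", "D": "lymph", "E": "lymph"}
rng = random.Random(1)  # seeded so the replicate is reproducible
null_traits = shuffle_traits(traits, rng)
print(sorted(null_traits.values()))  # ['brain', 'brain', 'lymph', 'lymph', 'lymph']
```

Repeating the shuffle many times and recomputing a statistic on each shuffled copy gives an expected (null) distribution to compare the observed value against.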

Page 8

An example: Salemi et al. (2005)*

Dataset of HIV sequences sampled post mortem from CNS tissues

Analysis by the Slatkin-Maddison (1989) method; reanalyzed in BaTS**

Compartmentalization by tissue type: circulating viral populations are defined by their location in the body:

*Salemi et al. (2005) J. Virol. 79(17): 11343-11352.
**Parker, Rambaut & Pybus (2008) MEEGID 8(3): 239-246.

Statistic        p-value (BaTS)
AI               <0.01
PS               <0.01
Frontal lobe     <0.01
Occipital lobe   <0.01
Meninges         <0.01
Lymph nodes      <0.01
Temporal lobe    <0.01
Spinal cord      <0.01

Page 9

Available methods

Non-phylogenetic: ANOVA (ignores shared ancestry)

Phylogenetic: single-tree mapping; Slatkin-Maddison & AI; BaTS

Page 10

Methods: Single-tree mapping

Method: map traits onto a tree; look for correlation

Pros: fast; simple

Cons: no indication of significance; statistically weak (high Type II error rate); conditional on a single topology

Page 11

Methods: Slatkin-Maddison & AI

Method: map traits onto a tree by parsimony and count migration events (Slatkin-Maddison), or measure the ‘association index’ within clades recursively (AI)

Compare the observed value with a null (expected) value obtained by bootstrapping

Pros: still reasonably fast; indication of significance

Cons: Still conditional on a single topology
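The two single-tree statistics can be sketched in Python as follows. The sketch assumes a rooted, strictly binary tree written as nested tuples with taxon names as leaves (a hypothetical representation, not BaTS's input format). The migration count uses Fitch parsimony, which gives the minimum number of trait changes on the tree; the AI uses the commonly cited definition AI = sum over internal nodes i of (1 - f_i) / 2^(n_i - 1), where n_i is the number of tips below node i and f_i is the frequency of the most common trait among them. For both statistics, smaller values indicate stronger clustering of traits.

```python
from collections import Counter


def parsimony_score(tree, traits):
    """Fitch parsimony: minimum number of trait changes on a binary tree.

    This equals the Slatkin-Maddison count of migration events between
    states. Trees are nested tuples of taxon names (assumed format).
    """
    def visit(node):
        if isinstance(node, str):  # leaf: state set is its own trait
            return {traits[node]}, 0
        left, right = node
        (left_set, left_cost), (right_set, right_cost) = visit(left), visit(right)
        shared = left_set & right_set
        if shared:
            return shared, left_cost + right_cost
        # Disjoint child state sets force one extra change at this node.
        return left_set | right_set, left_cost + right_cost + 1

    return visit(tree)[1]


def tips(node):
    """All leaf names below a node."""
    if isinstance(node, str):
        return [node]
    return [leaf for child in node for leaf in tips(child)]


def association_index(tree, traits):
    """AI = sum over internal nodes of (1 - f) / 2**(n - 1)."""
    if isinstance(tree, str):
        return 0.0  # leaves contribute nothing
    below = tips(tree)
    n = len(below)
    f = Counter(traits[t] for t in below).most_common(1)[0][1] / n
    return (1 - f) / 2 ** (n - 1) + sum(association_index(c, traits) for c in tree)


# Hypothetical five-taxon tree and traits.
tree = ((("A", "B"), "C"), ("D", "E"))
traits = {"A": "brain", "B": "brain", "C": "lymph", "D": "lymph", "E": "lymph"}
print(parsimony_score(tree, traits))  # 1
print(round(association_index(tree, traits), 4))  # 0.1083
```

Both functions work on one fixed tree, which is exactly the limitation noted above: the result is conditional on that single topology.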

Page 12

Methods: BaTS

Method: See below(!)

Pros: indication of significance; statistically powerful, with a correct Type I error rate; accounts for phylogenetic uncertainty

Cons: requires Bayesian MCMC sequence analysis; slower

Page 13

BaTS: under the bonnet

Uses a posterior distribution of phylogenies from a Bayesian MCMC analysis

Calculates the number of migrations, the AI and a variety of other measures of association

Samples the posterior distributions of both the observed and the expected (null) values

Obtains significance by comparing observed vs. expected values
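As a simplified illustration of comparing observed and expected values, the sketch below computes an empirical p-value: the fraction of null (randomized) statistic values that are at least as extreme as the mean observed value across posterior trees. This is an assumption-laden sketch, not necessarily the exact procedure BaTS uses; `empirical_p_value` and the example numbers are hypothetical.

```python
from statistics import mean


def empirical_p_value(observed, null, smaller_is_stronger=True):
    """Share of null values at least as extreme as the mean observed value.

    Hypothetical helper, not the BaTS implementation.
    observed: one statistic value per posterior tree (true tip traits).
    null: statistic values from trait randomizations on the same trees.
    Set smaller_is_stronger=False for statistics such as MC, where larger
    values indicate stronger phylogeny-trait association.
    """
    centre = mean(observed)
    if smaller_is_stronger:
        extreme = sum(1 for value in null if value <= centre)
    else:
        extreme = sum(1 for value in null if value >= centre)
    return extreme / len(null)


# Hypothetical AI values: observed per tree, and pooled null replicates.
observed_ai = [1.5, 1.6]
null_ai = [12.0, 11.5, 1.0, 13.0]
print(empirical_p_value(observed_ai, null_ai))  # 0.25
```

Because the observed values come from many posterior trees rather than one, uncertainty in the topology is carried through into the comparison.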

Page 14

BaTS: analysis workflow

Preparation: sequence alignment; Bayesian MCMC phylogeny reconstruction (BEAST, MrBayes) to obtain a posterior distribution of trees (PST); taxa in the PST marked up with discrete traits

BaTS analysis

Interpretation

Page 15

Workflow: Preparation (i)

Sequence alignment: CLUSTAL, BioEdit, SE-Al

Bayesian MCMC analysis: MrBayes, BEAST

Taxa marked-up with traits

Page 16

Workflow: Preparation (ii)

Taxa marked up with traits. Typical NEXUS format:

[Figure: example file in NEXUS format]

Page 17

Workflow: Preparation (iii)

Taxa marked up with traits:

a) Declare the ‘states’ block: begin states;

b) Assign a trait to each taxon in the order in which the taxa appear in the original #NEXUS file.

c) Close the ‘states’ block.

d) Omit the ‘translate’ and ‘taxa’ blocks.
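Because step b) depends only on the order of taxa, the block can be generated rather than typed. The sketch below assumes each line of the block holds a taxon's position (counting from 1) followed by its trait, and that the block is closed with `End;`; that layout is an assumption to check against the BaTS documentation, and `states_block` is a hypothetical helper.

```python
def states_block(taxa_in_order, traits):
    """Build a 'states' block listing one trait per taxon.

    Hypothetical helper. Taxa are numbered from 1 in the order they appear
    in the original NEXUS file. The 'index trait' line layout and the
    closing 'End;' are assumptions, not confirmed BaTS syntax.
    """
    lines = ["begin states;"]
    for index, taxon in enumerate(taxa_in_order, start=1):
        lines.append(f"{index} {traits[taxon]}")
    lines.append("End;")
    return "\n".join(lines)


taxa = ["A", "B", "C"]  # hypothetical taxa, in file order
traits = {"A": "brain", "B": "brain", "C": "lymph"}
print(states_block(taxa, traits))
```

Generating the block from the same taxon list used for the trees avoids the most common mark-up error: traits assigned out of order.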

Page 18

Workflow: BaTS analysis

To use BaTS from the command-line, type:

java -jar BaTS_beta_build2.jar [single|batch] <treefile_name> <reps> <states>

Where:

single or batch asks BaTS to analyse either a single input file or a whole directory (batch analysis),

<treefile_name> is the name and full location of the tree file or directory to be analysed,

<reps> is the number of state randomizations to perform to yield a null distribution (an integer > 1; typically at least 100), and

<states> is the number of different states seen.
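For scripted or repeated runs, the same command can be assembled programmatically. This Python sketch only builds the argument list described above (plus an optional JVM -Xmx heap setting) and checks the stated constraints; `bats_command` and `run_bats` are hypothetical helpers, and the jar name is taken from the example invocation.

```python
import subprocess


def bats_command(target, reps, states, mode="single",
                 jar="BaTS_beta_build2.jar", max_heap=None):
    """Assemble the BaTS command line as a list of arguments.

    Hypothetical helper.
    target: tree file (single mode) or directory (batch mode).
    reps: number of state randomizations for the null (integer > 1).
    states: number of different trait states in the data.
    max_heap: optional JVM heap size such as "1024m" (passed as -Xmx).
    """
    if mode not in ("single", "batch"):
        raise ValueError("mode must be 'single' or 'batch'")
    if reps <= 1:
        raise ValueError("reps must be an integer > 1 (typically at least 100)")
    cmd = ["java"]
    if max_heap:
        cmd.append(f"-Xmx{max_heap}")  # JVM options go before -jar
    cmd += ["-jar", jar, mode, target, str(reps), str(states)]
    return cmd


def run_bats(cmd):
    """Run BaTS and return its console output (needs Java on the PATH)."""
    return subprocess.run(cmd, check=True, capture_output=True, text=True).stdout


cmd = bats_command("example.trees", 100, 7)
print(" ".join(cmd))  # java -jar BaTS_beta_build2.jar single example.trees 100 7
```

Passing a list (rather than one string) to `subprocess.run` avoids shell-quoting problems with paths that contain spaces.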

Page 19

C:\joeWork\apps\BaTS\BaTS_beta_build2\BaTS_beta_build2>java -jar BaTS_beta_build2.jar single example.trees 100 7

Performing single analysis.
File: example.trees
Null replicates: 100
Maximum number of discrete character states: 7

analysing... 30 trees, with 7 states
analysing observed (using obs state data)
30 29 30 29 30 29 30 29

Statistic observed mean lower 95% CI upper 95% CU null mean lower 95% CI upper 95% CI significance
AI 1.5555052757263184 1.1128820180892944 2.160351037979126 12.03488540649414 11.475320040039 12.6391201928711 0.0
PS 18.5 17.0 20.0 80.7713394165039 77.86666870117188 83.56666564941406 0.0
MC (state 0) 12.633333206176758 9.0 16.0 1.7496669292449951 1.399999976158142 2.1666667461395264 0.009999990463256836
MC (state 1) 19.0 19.0 19.0 1.7480005025863647 1.3333333730697632 2.0999999046325684 0.009999990463256836
MC (state 2) 12.666666984558105 12.0 13.0 1.77991247559 1.33333697632 2.200000047683716 0.009999990463256836
MC (state 3) 8.566666603088379 3.0 11.0 1.66733866943 1.2333333492279053 2.133333444595337 0.009999990463256836
MC (state 4) 11.0 11.0 11.0 1.5526663064956665 1.1666666269302368 2.0999999046325684 0.009999990463256836
MC (state 5) 3.433333396911621 2.0 6.0 1.4840000867843628 1.100000023841858 2.0333333015441895 0.009999990463256836
MC (state 6) 5.066666603088379 5.0 6.0 1.2973339557647705 1.0333333015441895 1.600000023841858 0.009999990463256836
done

Done.

The analysis

30 trees were detected in the input file

Output: statistics, one per line, tabulated

The ‘MC…’ statistics are reported in the order in which the states occur in the input file

(housekeeping and debugging messages)
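To compare runs, the tabulated statistics can be read back in. The sketch below assumes, as the annotations say, that each statistic occupies one line, with its name followed by seven numbers (observed mean and 95% interval, null mean and 95% interval, significance); the field names and `parse_stat_row` are our own hypothetical labels, not BaTS's.

```python
# Hypothetical column labels for the seven numbers on each statistics row.
FIELDS = ("observed_mean", "observed_lower_95", "observed_upper_95",
          "null_mean", "null_lower_95", "null_upper_95", "significance")


def parse_stat_row(line):
    """Split one statistics row into its name and seven numeric columns.

    The name may contain spaces, e.g. 'MC (state 0)', so the last seven
    whitespace-separated tokens are taken as the numbers.
    """
    tokens = line.split()
    if len(tokens) < len(FIELDS) + 1:
        raise ValueError(f"not a statistics row: {line!r}")
    name = " ".join(tokens[:-len(FIELDS)])
    values = [float(token) for token in tokens[-len(FIELDS):]]
    return name, dict(zip(FIELDS, values))


row = "PS\t18.5\t17.0\t20.0\t80.7713394165039\t77.86666870117188\t83.56666564941406\t0.0"
name, stats = parse_stat_row(row)
print(name, stats["observed_mean"], stats["significance"])
```

Splitting on any whitespace handles both tab- and space-separated output, which matters because the separators are not specified on the slide.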

Page 20

Workflow: Interpretation

The null hypothesis

The null hypothesis under test is one of random phylogeny-trait association; that is, that

“No single tip bearing a given character trait is any more likely to share that trait with adjoining taxa than we would expect due to chance”

Page 21

Workflow: Interpretation

The statistics: lower AI and PS values, and higher MC values, indicate increased phylogeny-trait association

Significance is indicated by the p-value

In addition, the observed posterior values are informative for some statistics:

PS: indicates the number of migration events between trait values

MC (trait value): indicates the number of taxa in the largest clade that is monophyletic for that trait value
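The MC statistic for a single tree can be sketched as follows, assuming a rooted tree written as nested tuples with taxon names as leaves (a hypothetical representation). Here a lone tip is treated as a clade of size 1, which is also an assumption; `monophyletic_clade_size` is a hypothetical helper, not the BaTS code.

```python
def monophyletic_clade_size(tree, traits, state):
    """Largest number of tips in a clade whose tips all carry `state`.

    Hypothetical helper. A single tip counts as a clade of size 1 (an
    assumption); returns 0 if no tip has the state.
    """
    best = 0

    def visit(node):
        # Returns (number of tips below node, whether all carry `state`).
        nonlocal best
        if isinstance(node, str):
            size, pure = 1, traits[node] == state
        else:
            results = [visit(child) for child in node]  # visit every child
            size = sum(r[0] for r in results)
            pure = all(r[1] for r in results)
        if pure:
            best = max(best, size)
        return size, pure

    visit(tree)
    return best


tree = (("A", "B"), ("C", ("D", "E")))
traits = {"A": "brain", "B": "brain", "C": "lymph", "D": "lymph", "E": "lymph"}
print(monophyletic_clade_size(tree, traits, "lymph"))  # 3
print(monophyletic_clade_size(tree, traits, "brain"))  # 2
```

Unlike AI and PS, a larger MC means a larger uninterrupted block of one trait, so it reads in the opposite direction.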

Page 22

FAQs / common pitfalls

Java 1.5 or higher is required. See java.sun.com for more.

Large datasets can be slow, so down-sample input tree files (uniformly, not randomly) where necessary, or to check quickly that BaTS input files are marked up correctly.

A RAM (memory) shortage can slow the analysis; use the Java -Xmx switch to increase the maximum heap size available to BaTS*

Check input file mark-up carefully if in doubt.

*See more: http://edocs.bea.com/wls/docs70/perform/JVMTuning.html
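Uniform down-sampling means keeping evenly spaced trees from the posterior sample rather than a random subset. The sketch below assumes each tree in the NEXUS file sits on its own line beginning with `tree ` (as in typical BEAST output); `thin_uniformly` and `thin_nexus_trees` are hypothetical helpers.

```python
def thin_uniformly(items, target):
    """Return `target` evenly spaced items (uniform thinning, not random)."""
    if target <= 0:
        raise ValueError("target must be positive")
    if target >= len(items):
        return list(items)
    step = len(items) / target
    return [items[int(i * step)] for i in range(target)]


def thin_nexus_trees(nexus_text, target):
    """Keep `target` evenly spaced 'tree ...' lines; leave all other lines.

    Hypothetical helper, not part of BaTS. Assumes one tree per line,
    each starting with 'tree '.
    """
    lines = nexus_text.splitlines()
    tree_rows = [i for i, line in enumerate(lines)
                 if line.strip().lower().startswith("tree ")]
    dropped = set(tree_rows) - set(thin_uniformly(tree_rows, target))
    return "\n".join(line for i, line in enumerate(lines) if i not in dropped)


nexus = "\n".join(["#NEXUS", "begin trees;"]
                  + [f"tree STATE_{i} = (A,B);" for i in range(4)]
                  + ["End;"])
print(thin_nexus_trees(nexus, 2))
```

With four trees and a target of two, this keeps STATE_0 and STATE_2, along with the header and closing lines.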

Page 23

Author contact:

Joe Parker

Department of Zoology

Oxford University, UK

OX1 3PS

[email protected]

http://evolve.zoo.ox.ac.uk