Separating Population Structure from Recent Evolutionary History

Upload: amadis

Post on 14-Jan-2016


TRANSCRIPT

Page 1: Separating Population Structure from Recent Evolutionary History

Separating Population Structure

from Recent Evolutionary History

Page 2: Separating Population Structure from Recent Evolutionary History

Problem: Spatial Patterns Inferred Earlier Represent An Equilibrium Between Recurrent Evolutionary Forces Such as Gene Flow and Drift.

E.g.,

$$f_{st} = \frac{1}{4N_{ev}m + 1}$$

But, Can Obtain The Same Pattern Due to Recent Historical Events That Have Not Had Time to Reach Equilibrium

Page 3: Separating Population Structure from Recent Evolutionary History

To Examine Historical Events & Non-Equilibrium States, Need to Study Genetic Variation in Both Space & Time

Directly: Sample Populations From the Past

Indirectly: Reconstruct Variation Through Time

Page 4: Separating Population Structure from Recent Evolutionary History

Direct Study: mtDNA in the Woolly Mammoth

Debruyne et al. 2008. Out of America: Ancient DNA Evidence for a New World Origin of Late Quaternary Woolly Mammoths. Curr. Biol. 18:1320-1326.

Page 5: Separating Population Structure from Recent Evolutionary History

Direct Study: mtDNA in the Woolly Mammoth

Debruyne et al. 2008. Out of America: Ancient DNA Evidence for a New World Origin of Late Quaternary Woolly Mammoths. Curr. Biol. 18:1320-1326.

Page 6: Separating Population Structure from Recent Evolutionary History

Indirect Studies

Recall that $D_t = D_0(1-r)^t$

Therefore, Multi-locus or Multi-site Polymorphic Data Contains Historical Information, and This Retention Is For Long Periods of Time When r Is Small.

Attempts to Reconstruct History Depend Upon Multiple Loci or Upon Multi-Site Haplotypes.
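The decay of linkage disequilibrium recalled on this slide can be sketched numerically. This is a minimal illustration, assuming the standard reading of the formula (D_0 is the initial disequilibrium, r the recombination fraction, t the number of generations); the function names and the example values are hypothetical, not taken from the slides.

```python
import math

def ld_after(d0, r, t):
    """Linkage disequilibrium after t generations: D_t = D_0 * (1 - r)**t."""
    return d0 * (1.0 - r) ** t

def ld_half_life(r):
    """Generations needed for D to fall to half its starting value (0 < r < 1)."""
    return math.log(0.5) / math.log(1.0 - r)

# Illustrative values only: unlinked loci (r = 0.5) versus tightly linked sites.
for r in (0.5, 0.01, 0.0001):
    print(f"r = {r}: half-life of about {ld_half_life(r):.1f} generations")
```

The smaller r is, the longer the half-life, which is the sense in which tightly linked sites retain historical information for long periods.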

Page 7: Separating Population Structure from Recent Evolutionary History

Multiple Loci: Principal Component Analysis of Genetic Data

This procedure has long been used in human genetics to extract multi-locus information about gene flow patterns (e.g., Cavalli-Sforza & Ammerman, 1984).

Page 8: Separating Population Structure from Recent Evolutionary History

Multiple Loci: Principal Component Analysis of Genetic Data

Novembre et al. Nature 31 Aug 2008. Based on 197,146 loci in 1,387 individuals.

Page 9: Separating Population Structure from Recent Evolutionary History


Overlay of the steepest slope values (upper 5%)

Microsatellite survey of naked mole rats in Meru National Park, Kenya (Jon Hess)

Page 10: Separating Population Structure from Recent Evolutionary History

Haplotypes

One Method Is To Look At the Spatial Distribution of Globally Rare, Tip Haplotypes (Although They May be Locally Common)

Coalescent Theory Implies Such Haplotypes Are Recent, And Therefore Are Not In Equilibrium And Have Limited Spatial Distributions

Therefore, Globally Rare, Tip Haplotypes Provide A Straightforward Method of Observing The Movements of Genes Through Space Over Short and Recent Time Periods.

Page 11: Separating Population Structure from Recent Evolutionary History

Schroeder, K. B. et al. Mol Biol Evol 2009 26:995-1016

Geographic distribution of the Asian and American populations genotyped for the microsatellite D9S1120

“Private” 9-repeat allele

Page 12: Separating Population Structure from Recent Evolutionary History

Schroeder, K. B. et al. Mol Biol Evol 2009 26:995-1016

Visual genotypes, clustered by population, for individuals either homozygous or heterozygous for the 9-repeat allele

Implies that this “private allele” is identical by descent in all Western Beringians and Native Americans, which in turn implies that Native Americans Descended (at least in part) From These Western Beringian Populations.

Page 13: Separating Population Structure from Recent Evolutionary History

Method for estimating the TMRCA of copies of an allele from the number of recombination events on its shared haplotypic background

Page 14: Separating Population Structure from Recent Evolutionary History

Under the different best models, the mean TMRCA of the 9-repeat allele ranged from 293 generations to 1,596 generations; using a generation time of 25 years resulted in a TMRCA of 7,325-39,900 years ago. Averaging over all of our best models, the mean TMRCA is 513 generations ago, or about 12,825 years ago. The 95% confidence intervals for all of the best models produced ages for the MRCA of the 9-repeat allele that range from 144 to 1,951 generations ago, or approximately 3,600-48,775 years ago.
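The year figures quoted above follow directly from the generation counts and the stated 25-year generation time. A small sketch of that arithmetic (the function name is hypothetical):

```python
GENERATION_TIME_YEARS = 25  # generation time stated on the slide

def generations_to_years(generations, generation_time=GENERATION_TIME_YEARS):
    """Convert an age in generations to an age in years."""
    return generations * generation_time

# Generation counts quoted on the slide.
for label, gens in [("lowest model mean", 293), ("highest model mean", 1596),
                    ("mean over best models", 513),
                    ("lower 95% bound", 144), ("upper 95% bound", 1951)]:
    print(f"{label}: {gens} generations = {generations_to_years(gens):,} years")
```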

Schematics of the demographic models used for the coalescent simulations: (A) population split with two equal-size descendant populations (Asia and America), (B) population split with NAs/NAm equal to 0.15 at TAs/Am, and (C) population split with NAs/NAm equal to 0.02 at TAs/Am, followed by population growth such that NAs/NAm equals 0.15 at T0. Models D and E are the same as models B and C, respectively, but include population substructure in Asia and in America.

Page 15: Separating Population Structure from Recent Evolutionary History

Haplotype Trees

Are Biologically Meaningful Only When Recombination Is Absent Or Rare

Gives Some Information About Temporal Ordering of Mutational Variation, Both the Rare and the Common Mutations

Not Limited to Recent Events, But Can Go Back Further In Time (But Not Beyond the Most Recent Common Ancestral DNA Molecule)

Page 16: Separating Population Structure from Recent Evolutionary History

A Haplotype Tree Should Never Be Equated To A Tree of Populations. It Is Only The Tree of The Genetic Variation For That DNA Region.

There Is Information About Population History in the Haplotype Tree, But It Must Be Extracted Carefully.

Page 17: Separating Population Structure from Recent Evolutionary History

Haplotype Trees ≠ Species or Population Trees

Page 18: Separating Population Structure from Recent Evolutionary History

It is dangerous to equate a haplotype tree to a species tree.

It is NEVER justified to equate a haplotype tree to a tree of populations within a species because the problem of lineage sorting is greater and the time between events is shorter. Moreover, a population tree need not exist at all.

Page 19: Separating Population Structure from Recent Evolutionary History

Nested Clade Analysis

• Converts Haplotype Trees Into A Nested Statistical Design

• Other Data (Phenotypic or Geographical) Are Then Overlaid Upon The Nested Design

• Statistical Tests Are Performed To Detect Significant Associations Between the Data and The Haplotype Tree

• DOES NOT EQUATE THE HAPLOTYPE TREE TO A POPULATION TREE!

Page 20: Separating Population Structure from Recent Evolutionary History

NCPA Distance Measures

= Sample locations
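The slide itself only shows the distance measures graphically. As a hedged sketch, the code below uses the usual NCPA definitions: the clade distance Dc is the average distance of the members of a clade from that clade's geographic center, and the nested clade distance Dn is their average distance from the geographic center of the clade in which it is nested. Planar kilometre coordinates, the function names, and the toy sample locations are all assumptions made for illustration; real analyses use great-circle distances between sampling sites.

```python
import math

def geographic_center(samples):
    """Sample-size-weighted mean location of (x_km, y_km, count) records."""
    total = sum(n for _, _, n in samples)
    cx = sum(x * n for x, _, n in samples) / total
    cy = sum(y * n for _, y, n in samples) / total
    return cx, cy

def mean_distance_to(samples, center):
    """Average distance of the sampled individuals from a given point."""
    cx, cy = center
    total = sum(n for _, _, n in samples)
    return sum(n * math.hypot(x - cx, y - cy) for x, y, n in samples) / total

def clade_distance(clade):
    """Dc: how geographically widespread the clade itself is."""
    return mean_distance_to(clade, geographic_center(clade))

def nested_clade_distance(clade, nesting_clade):
    """Dn: how far the clade lies from the center of its nesting clade."""
    return mean_distance_to(clade, geographic_center(nesting_clade))

# Toy data (hypothetical): clade X found at one site, its sister clade at another.
clade_x = [(0.0, 0.0, 10)]
nesting = clade_x + [(100.0, 0.0, 10)]
print(clade_distance(clade_x), nested_clade_distance(clade_x, nesting))
```

In this toy case the clade is restricted to one site (Dc of zero) but sits 50 km from the center of its nesting clade, the kind of contrast the NCPA tests look for.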

Page 21: Separating Population Structure from Recent Evolutionary History

A Haplotype Tree In Elephants

[Figure: haplotype tree with sampling locations Tsavo, Amboseli, Sengwa, Hwange, Victoria Falls, and Matetsi]

Page 22: Separating Population Structure from Recent Evolutionary History

Within 1-Step Clades                                Within Total Tree

Haplotype   No. in sample   Dc         Dn           1-Step Clade   Dc         Dn

1           35              1021L***   1027L***
2           20              81S***     657S***
3           1               0          601          1-1            884        1173L***
Old-Young                   944L***    373L***

4           11              959L***    832L***
5           16              114        249S*
6           3               0          156S*        1-2            460S***    768S***
Old-Young                   862L***    598L***

7           27              47         47
8           1               0          126
9           1               0          68           1-3            49S***     759S**
Old-Young                   47         -50          Old-Young      626L***    409L***
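The "Old-Young" rows can be checked by hand. The sketch below assumes (this is an inference from the numbers, not something the slide states) that each contrast is the sample-size-weighted average distance of the older, interior haplotype minus that of the younger, tip haplotypes, with haplotype 1 interior in clade 1-1 and haplotype 7 interior in clade 1-3. The function names are hypothetical.

```python
def weighted_mean(pairs):
    """Sample-size-weighted mean of (distance, sample_count) pairs."""
    total = sum(count for _, count in pairs)
    return sum(value * count for value, count in pairs) / total

def old_minus_young(interior, tips):
    """Interior (older) average distance minus tip (younger) average distance."""
    return weighted_mean(interior) - weighted_mean(tips)

# Clade 1-1: haplotype 1 (n=35) assumed interior; haplotypes 2 and 3 assumed tips.
dc_11 = old_minus_young([(1021, 35)], [(81, 20), (0, 1)])
dn_11 = old_minus_young([(1027, 35)], [(657, 20), (601, 1)])
# Clade 1-3: haplotype 7 (n=27) assumed interior; haplotypes 8 and 9 assumed tips.
dc_13 = old_minus_young([(47, 27)], [(0, 1), (0, 1)])
dn_13 = old_minus_young([(47, 27)], [(126, 1), (68, 1)])

print(round(dc_11), round(dn_11), round(dc_13), round(dn_13))  # 944 373 47 -50
```

Under the same assumption, clade 1-2 gives 863 rather than the tabulated 862 for Dc, presumably because the tabulated distances are themselves rounded.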

Page 23: Separating Population Structure from Recent Evolutionary History

Only When Statistical Significance Is Achieved Is The Biological Significance Interpreted With Explicit, a priori Criteria

• For Example, Under Isolation By Distance, It Takes Many Generations For A New Haplotype To Spread Across Many Demes.

• Therefore, Expect Older Haplotypes To Be More Widespread Than Younger Haplotypes.

• Younger Haplotypes Tend To Have Geographical Ranges Nested Within the Ranges of Their Ancestral Haplotypes.

Page 24: Separating Population Structure from Recent Evolutionary History

A Haplotype Tree In Elephants

[Figure: haplotype tree with sampling locations Tsavo, Amboseli, Sengwa, Hwange, Victoria Falls, and Matetsi; the inference "Gene flow with IBD" is marked at four places on the tree]

Page 25: Separating Population Structure from Recent Evolutionary History

Historical Events Also Leave Lasting Patterns in Haplotype Trees.

For Example, When A Population Expands Into a New Area, Even Haplotypes Recently Created by Mutation Can Become Geographically Widespread, and Haplotypes Created By Mutation After the Expansion Can Be Located Far From the Geographical Center of Their Ancestral Haplotype.

Page 26: Separating Population Structure from Recent Evolutionary History

Range Expansion

Present

Past

Area A Area B Area C

Page 27: Separating Population Structure from Recent Evolutionary History
Page 28: Separating Population Structure from Recent Evolutionary History

Nested Clade Analysis of the Chub (Leuciscus cephalus): Range Expansion (from Durand et al. 1999)

Older Clade

Younger Clade

2-1

SPE

Page 29: Separating Population Structure from Recent Evolutionary History

Historical Events Also Leave Lasting Patterns in Haplotype Trees.

For Example, When A Population Is Fragmented or Otherwise Effectively Isolated, Haplotypes That Arise After The Fragmentation/Isolation Event Cannot Spread to Other Geographical Areas, and With Increasing Time, More Mutations Can Accumulate, Resulting In Larger Than Average Branch Lengths Between Clades in Different Isolates.

Page 30: Separating Population Structure from Recent Evolutionary History

Fragmentation

Recent Old

Area A Area B Area C

Area A Area B Area C

Page 31: Separating Population Structure from Recent Evolutionary History

Fragmentation between Ambystoma tigrinum tigrinum (Clade 4-2) and A. t. mavortium (Clade 4-1)

Page 32: Separating Population Structure from Recent Evolutionary History

The Nested Design Means That Inferences Are Robust To Topological Variation Induced by the Evolutionary Stochasticity of the Coalescent Process

African Elephants (Roca, A. L., N. Georgiadis, and S. J. O'Brien. 2005. Cytonuclear genomic dissociation in African elephant species. 37:96-100.)

Savanna Elephant

Forest Elephant

Page 33: Separating Population Structure from Recent Evolutionary History

Fragmentation Inferences From NCA

All 5 DNA regions had a different topology with respect to the 3 elephant taxa (only BGN gave the “species tree”); yet NCPA inferred a fragmentation event between forest and savanna elephants in all 5 DNA regions.

Highly Significant Fragmentation Events Found In All Five Haplotype Trees

Past Fragmentation

Past Fragmentation Followed By Range Expansion and Secondary Contact

Y-DNA mtDNA

BGN PLP

PHKA2

Page 34: Separating Population Structure from Recent Evolutionary History

Nested Clade Phylogeographic Analysis

Recurrent Gene Flow, Range Expansion and Fragmentation Could All Have Occurred at Different Times and/or Places.

NCPA Therefore Looks For Multiple Patterns, Not Just One

The Relative Temporal Ordering of Events in a Nested Series of Clades Is Also Inferred by NCPA

Page 35: Separating Population Structure from Recent Evolutionary History

Inferences from mtDNA haplotype tree of Ambystoma tigrinum from NCPA and supplemental test for secondary contact (Mol. Ecol. 10: 779-791, 2001)

Fragmentation

Secondary Contact

Range Expansion

Range Expansion

Isolation by Distance

Isolation by Distance

Page 36: Separating Population Structure from Recent Evolutionary History

By Analyzing Haplotype Trees for mtDNA, Y-DNA, X-linked DNA and Autosomal DNA, One Can Sample A Wide Variety of Time Scales and Both Male and Female Mediated Gene Flow and Historical Events

Page 37: Separating Population Structure from Recent Evolutionary History

By Analyzing Multiple Haplotype Trees, One Can Statistically Correct For The Evolutionary Stochasticity of The Coalescent Process For Any One Genomic Region

Page 38: Separating Population Structure from Recent Evolutionary History

Inference Errors in Nested Clade Analysis

Inference Requires That An Appropriate Mutation Occurred At the Right Time and Right Place: Therefore, Some Events and Processes Are Missed With A Particular DNA Region.

Selection and Evolutionary Stochasticity Can Distort The Distribution of Haplotypes in Space and Time, Thereby Leading to False Positive Inferences.

These errors can be minimized by studying multiple loci and requiring each inference (type, place and time) to be cross-validated by two or more loci.

Page 39: Separating Population Structure from Recent Evolutionary History

Multilocus Nested Clade Analysis

• Perform Single Locus NCPA on n loci.

• Discard any inferences made only by a single locus.

• Group together all the inferences made by 2 or more loci that are concordant by type of inference and geographical location.

• Test the null hypothesis that all inferences of an event that are concordant by event type and location are a single event.

• Because gene flow is a recurrent process, inferences of gene flow between two regions are not necessarily concordant in time, but can test the null hypothesis that there was no gene flow between two regions in an interval of time, say t1 to t2, given multiple inferences of gene flow between the two regions.

ALL RETAINED INFERENCES HAVE BEEN CROSS-VALIDATED ACROSS LOCI AND HAVE EXPLICIT, QUANTIFIED STATISTICAL SUPPORT.
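The grouping and discarding steps listed on this slide can be sketched as a small bookkeeping routine. This is only an illustration of the cross-validation rule (keep an inference when two or more loci agree on its type and location); the record layout, function name, and example loci are hypothetical, and the temporal concordance test is a separate step not shown here.

```python
from collections import defaultdict

def cross_validated_inferences(inferences, min_loci=2):
    """Group single-locus inferences by (event type, location) and keep
    only the groups supported by at least `min_loci` different loci."""
    groups = defaultdict(set)
    for locus, event_type, location in inferences:
        groups[(event_type, location)].add(locus)
    return {key: sorted(loci) for key, loci in groups.items()
            if len(loci) >= min_loci}

# Hypothetical single-locus NCPA output: (locus, inference type, location).
single_locus_results = [
    ("locus1", "range expansion", "region A to region B"),
    ("locus2", "range expansion", "region A to region B"),
    ("locus3", "fragmentation", "region B / region C"),
    ("locus1", "gene flow", "region A / region B"),
    ("locus3", "gene flow", "region A / region B"),
]
retained = cross_validated_inferences(single_locus_results)
for (event_type, location), loci in retained.items():
    print(event_type, "|", location, "| supported by", loci)
```

Here the fragmentation inference is dropped because only one locus supports it, while the range expansion and gene flow inferences are retained for the subsequent tests of temporal concordance.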

Page 40: Separating Population Structure from Recent Evolutionary History

Using Theory Developed by Tajima (1983) and Kimura (1970), The Distribution Of The Inference Time Is:

$$f(t \mid k_i, T_i) = \frac{t^{k_i}\, e^{-t(1+k_i)/T_i}}{\left[T_i/(1+k_i)\right]^{1+k_i}\,\Gamma(1+k_i)}$$

where $k_i$ is the average pairwise nucleotide diversity among the haplotypes in DNA region i in the youngest monophyletic clade that contributed in a statistically significant fashion to the NCPA inference of interest, and $T_i$ is the age obtained by the Takahata et al. molecular clock estimator (or perhaps some other method) for this inference from DNA region i.
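A hedged numerical sketch of the inference-time distribution: it takes the distribution to be a gamma density with shape 1 + k and scale T/(1 + k), which is the form of the integrand in the gene-flow test later in this transcript. The function names, the Simpson integrator, and the values k = 3, T = 2 are illustrative assumptions only.

```python
import math

def inference_time_pdf(t, k, T):
    """Gamma density with shape 1 + k and scale T / (1 + k), so its mean is T."""
    if t <= 0:
        return 0.0
    shape = 1.0 + k
    scale = T / shape
    log_pdf = k * math.log(t) - t / scale - shape * math.log(scale) - math.lgamma(shape)
    return math.exp(log_pdf)

def simpson(f, a, b, n=4000):
    """Composite Simpson's rule on [a, b] with an even number n of panels."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3.0

k, T = 3.0, 2.0  # illustrative values, not taken from the slides
area = simpson(lambda t: inference_time_pdf(t, k, T), 0.0, 30.0)
mean = simpson(lambda t: t * inference_time_pdf(t, k, T), 0.0, 30.0)
print(f"area = {area:.4f}, mean = {mean:.4f}")
```

Under this form the density integrates to one and has mean T, so the molecular-clock estimate fixes the location while k controls how tightly the probability mass is concentrated around it.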

Page 41: Separating Population Structure from Recent Evolutionary History

Estimated Times To Common Ancestor (Method of Takahata et al. 2001)

$D_h$ = Nucleotide Differences Within Humans

$D_{hc}$ = Nucleotide Differences Between Humans & Chimps

Human-Chimp Divergence: 6 Million Years Ago

$$\mathrm{TMRCA} = 12\,D_h / D_{hc}$$
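A small worked sketch of this estimator, reading the factor 12 as 2 × 6 million years (twice the human-chimpanzee divergence time shown on the slide). The function name and the example divergence values are hypothetical.

```python
def tmrca_mya(d_h, d_hc, divergence_mya=6.0):
    """TMRCA in millions of years: 2 * divergence time * D_h / D_hc,
    which is 12 * D_h / D_hc for a 6-million-year human-chimp split."""
    return 2.0 * divergence_mya * d_h / d_hc

# Hypothetical values: within-human differences one twelfth of human-chimp differences.
print(tmrca_mya(d_h=0.001, d_hc=0.012))  # about 1 million years
```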

Page 42: Separating Population Structure from Recent Evolutionary History

A Likelihood Ratio Test of The Hypothesis That The Estimated Times of An Event From j Loci Are The Same

Page 43: Separating Population Structure from Recent Evolutionary History

Highly Significant Fragmentation Events Found In All Five Haplotype Trees

Past Fragmentation

Past Fragmentation Followed By Range Expansion and Secondary Contact

Fragmentation Inferences From NCA

Null Hypothesis: there was a single fragmentation event between forest and savanna elephants.

log-likelihood ratio test = 1.497 with 4 degrees of freedom, p= 0.8272. Accept Null Hypothesis, with T = 4.2 MYA.
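The quoted p-value can be reproduced by treating the log-likelihood ratio statistic as chi-square distributed with 4 degrees of freedom. The sketch below uses the closed-form upper tail that exists for even degrees of freedom, so it needs only the standard library; the function name is hypothetical.

```python
import math

def chi2_upper_tail_even_df(x, df):
    """P(chi-square with even df exceeds x) = exp(-x/2) * sum_{i < df/2} (x/2)**i / i!"""
    if df <= 0 or df % 2:
        raise ValueError("this closed form requires a positive even df")
    half = x / 2.0
    return math.exp(-half) * sum(half ** i / math.factorial(i) for i in range(df // 2))

p_value = chi2_upper_tail_even_df(1.497, 4)
print(f"p = {p_value:.4f}")  # p = 0.8272
```

A p-value this large means the five loci give no evidence against a single, temporally concordant fragmentation event.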

There are at least 2 lineages of African Elephants.

Y-DNA mtDNA

BGN PLP

PHKA2

Page 44: Separating Population Structure from Recent Evolutionary History

Performed Nested Clade Analyses on 25 DNA Regions in Humans

• Mitochondrial DNA (Ingman et al. Nature 408, 708-713, 2000; Sykes et al. American Journal of Human Genetics 57, 1463-1475, 1995; Torroni et al. American Journal of Human Genetics 53, 563-590, 1993; American Journal of Human Genetics 53, 591-608, 1993).

• Y-DNA (Hammer et al. Molecular Biology and Evolution 15, 427-441, 1998)

• 11 X-Linked Regions (Balciuniene et al. 2001; Garrigan et al. 2005; Hammer et al. 2004; Harris & Hey 1999, 2001; Kaessmann et al. 1999; Nachman et al. 2004; Saunders et al. 2002; Verrelli et al. 2002; Yu et al. 2002)

• 12 Autosomal Genes (Bamshad et al. 2002; Harding et al. 1997; Hollox et al. 2001; Jin et al. 1999; Koda et al. 2001; Rana et al. 1999; Rogers et al. 2000; Toomajian and Kreitman 2002; Wooding et al. 2002; Zhang & Rosenberg 2000).

Page 45: Separating Population Structure from Recent Evolutionary History

The log likelihood ratio test rejects the null hypothesis that all 15 events are temporally concordant with a probability value of $3.89 \times 10^{-15}$.

P = 0.95

P = 0.51

P = 0.62

Three Out-of-Africa Events, All Defined By Three or More Loci With A High Degree of Temporal Homogeneity But With Highly Significant Heterogeneity Between The Three Events

Page 46: Separating Population Structure from Recent Evolutionary History

There Were At Least Three Out-of-Africa Expansion Events Over the Last 2 Million Years

Page 47: Separating Population Structure from Recent Evolutionary History

Inferences of Gene Flow That Are Concordant Geographically Are NOT Necessarily Concordant Temporally Because Gene Flow is a Recurrent Process. However, We Can Test The Null Hypothesis of NO GENE FLOW Between Two Geographical Regions Over a Specified Time Interval.

Page 48: Separating Population Structure from Recent Evolutionary History

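Test Of The Null Hypothesis of NO GENE FLOW Between Two Geographical Regions Over a Specified Time Interval l to u:

$$\Pr(\text{no gene flow in } [l,u]) = \prod_{i=1}^{j}\left[1 - \int_{l}^{u} \frac{t_i^{k_i}\, e^{-t_i(1+k_i)/T_i}}{\left[T_i/(1+k_i)\right]^{1+k_i}\,\Gamma(1+k_i)}\, dt_i\right]$$

$$\mathrm{LRT}([l,u]) = -2\ln \Pr(\text{no gene flow in } [l,u])$$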
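A rough numerical sketch of this test, under one reading of the slide: each of the j gene-flow inferences has a gamma-distributed time (shape 1 + k_i, scale T_i/(1 + k_i)), the probability of no gene flow in [l, u] is the product over loci of the probability that the inferred time falls outside the interval, and the statistic is minus twice the log of that product. The function names and the example (k_i, T_i) values are hypothetical.

```python
import math

def gamma_pdf(t, k, T):
    """Gamma density with shape 1 + k and scale T / (1 + k)."""
    if t <= 0:
        return 0.0
    shape, scale = 1.0 + k, T / (1.0 + k)
    return math.exp(k * math.log(t) - t / scale
                    - shape * math.log(scale) - math.lgamma(shape))

def simpson(f, a, b, n=2000):
    """Composite Simpson's rule on [a, b] with an even number n of panels."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3.0

def no_gene_flow_lrt(l, u, loci):
    """-2 ln of the product over loci of P(inferred time lies outside [l, u])."""
    log_prob = 0.0
    for k, T in loci:
        inside = simpson(lambda t: gamma_pdf(t, k, T), l, u)
        log_prob += math.log(max(1.0 - inside, 1e-300))  # guard against log(0)
    return -2.0 * log_prob

# Hypothetical (k_i, T_i) pairs, with times in millions of years.
loci = [(2.0, 0.8), (4.0, 1.0), (3.0, 1.3)]
print(f"LRT for [0.5, 1.5]: {no_gene_flow_lrt(0.5, 1.5, loci):.3f}")
```

The more probability mass the loci place inside the interval, the larger the statistic, and the stronger the evidence against isolation during that interval.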

Page 49: Separating Population Structure from Recent Evolutionary History

Gamma Distributions For 19 African/Eurasian Gene Flow Inferences

With Isolation By Distance

Extensive overlap implies cross-validation with the exception of MX1, the only locus with most of its probability mass in the Pliocene.

The lack of clusters implies there were no prolonged breaks in gene flow throughout the Pleistocene

Page 50: Separating Population Structure from Recent Evolutionary History

Testing The Null Hypothesis of No African/Eurasian Gene Flow Throughout

the Pleistocene

The Null hypothesis of isolation (no gene flow) in this time interval is rejected with p < $10^{-8}$

Page 51: Separating Population Structure from Recent Evolutionary History

All of The Cross Validated Inferences Integrate Well Into A Single Overview of The Emergence of Humans.

Page 52: Separating Population Structure from Recent Evolutionary History

Coalescent Simulations

• Set of Fully Specified Phylogeographic Hypotheses

• Simulate Coalescent Process Many Times Under Each Hypothesis

• Virtual Current Generation: Draw Simulated Sample of Same Size as Real Sample, Then Compute Statistics on Simulated Sample

• Real Current Generation: Compute Statistics from Real Sample

• Compare Relative Fits of The Simulated Statistics Under Each Model to The Observed Statistics
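The workflow above can be illustrated with a deliberately tiny simulation. This is a toy sketch, not any published pipeline: the "hypotheses" are just two effective population sizes, the statistic is the mean number of pairwise differences, pairs are treated as independent (a simplification), and every name and parameter value is an assumption made for illustration.

```python
import math
import random

def poisson(lam, rng):
    """Poisson draw by Knuth's multiplication method (fine for modest lam)."""
    threshold = math.exp(-lam)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count

def simulated_statistic(n_e, mu, n_pairs, rng):
    """Mean pairwise differences for one simulated sample: each pair coalesces
    after an exponential time with mean 2 * n_e generations, and mutations
    accumulate on both lineages at rate mu per generation."""
    total = 0
    for _ in range(n_pairs):
        t = rng.expovariate(1.0 / (2.0 * n_e))
        total += poisson(2.0 * mu * t, rng)
    return total / n_pairs

def rank_hypotheses(observed, hypotheses, mu, n_pairs, n_reps, seed=0):
    """Order hypotheses by average |simulated - observed| statistic (best first)."""
    rng = random.Random(seed)
    misfit = {}
    for name, n_e in hypotheses.items():
        sims = [simulated_statistic(n_e, mu, n_pairs, rng) for _ in range(n_reps)]
        misfit[name] = sum(abs(s - observed) for s in sims) / n_reps
    return sorted(misfit, key=misfit.get)

ranking = rank_hypotheses(observed=4.0, hypotheses={"small": 1000, "large": 10000},
                          mu=1e-4, n_pairs=50, n_reps=200)
print(ranking)
```

The output is only a ranking of the hypotheses that were supplied; nothing in the procedure tests whether the best-ranked hypothesis is itself adequate.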

Page 53: Separating Population Structure from Recent Evolutionary History

Strong Vs. Weak Inference

• Falsification is the strongest inference possible in science, so this is called “strong inference.”

• Inference in NCPA is based upon the falsification of null hypotheses.

• Weak inference refers to the relative fit of a non-exhaustive set of alternatives.

• It is rare that an exhaustive set of every conceivable phylogeographic alternative can be simulated, so the coalescent simulation approach results in weak inference.

• Weak inference can give high relative support to a false hypothesis when all the alternatives are also false.

Page 54: Separating Population Structure from Recent Evolutionary History

E.g., Fagundes et al. (PNAS 104:17614-17619, 2007)

Tested 3 Models of Human Evolution via Simulation

Templeton (Yearbook of Physical Anthropology 48:33-59, 2005) Falsified All Three Models, With AFREG Rejected with p < $10^{-17}$

These Results Are NOT Contradictory!

Page 55: Separating Population Structure from Recent Evolutionary History

E.g., Fagundes et al. (PNAS 104:17614-17619, 2007)

Tested 3 Models of Human Evolution via Simulation

Eswaran et al. (J. Human Evol. 49:1-18, 2005) Tested AFREG vs. A Model of Isolation By Distance and Strongly Rejected AFREG.

These Results Are NOT Contradictory!

[Figure labels: Africa, S. Europe, S. Asia]

[Figure labels: Africa, S. Europe, N. Europe, S. Asia, N. Asia, Pacific, Americas]

Page 56: Separating Population Structure from Recent Evolutionary History

Interpretive Criteria

• Simulations assign “probabilities” to complex models as a whole, making it impossible to interpret the biological reason for a low probability.

• In contrast, NCPA allows individual components to be tested, making the biological interpretation clear.

Reject the Null hypothesis of no admixture with p < $10^{-17}$

Page 57: Separating Population Structure from Recent Evolutionary History

Interpretive Criteria

The Null hypothesis of isolation (no gene flow) in the minimal time interval proposed by Fagundes et al. is rejected with p = $1.6 \times 10^{-6}$ by testing with multilocus NCPA.

Page 58: Separating Population Structure from Recent Evolutionary History

Interpretive Criteria

• Although Fagundes et al. (2007) interpreted the rejection of their assimilation model as a rejection of admixture, the confounded nature of simulation inference means that such an interpretation has no logical validity.

• NCPA allows individual components to be tested, making it clear that the part of their assimilation model that is wrong is NOT admixture, but rather the assumption of prior isolation of archaic Africans and Eurasians.

X

Page 59: Separating Population Structure from Recent Evolutionary History

Coherent Inference

• Coherence is a property referring to nested and composite hypotheses.

• The meaning of coherence is most easily illustrated with nested hypotheses:

[Venn diagram: A nested within B]

One measure of fit is the probability of the hypotheses. Because A is a nested subset of B, Prob.(B) ≥ Prob.(A). This relationship is “coherent”.

If one assigned Prob.(A) > Prob.(B), this is mathematically impossible and is said to be “incoherent”.

Page 60: Separating Population Structure from Recent Evolutionary History

E.g., Fagundes et al. (PNAS 104:17614-17619, 2007)

The “assimilation” model (B) allows the possibility of admixture between Africans and Eurasians, measured by the parameter M that can vary between 0 and 1. Note, M=0 corresponds to replacement, so the replacement model (A) is a proper subset of the assimilation model.

Note the probabilities assigned to A and B.

The ABC method is INCOHERENT!

Page 61: Separating Population Structure from Recent Evolutionary History

Why Is ABC INCOHERENT?

There is no correction for dimensionality of the different hypotheses (indexed by i); and

The denominator treats all hypotheses as mutually exclusive events.

Equation 9 From Beaumont, M. A., W. Y. Zhang, and D. J. Balding. 2002. Approximate Bayesian computation in population genetics. Genetics 162:2025-2035.

Page 62: Separating Population Structure from Recent Evolutionary History

E.g., Fagundes et al. (PNAS 104:17614-17619, 2007)

Equation 9 states that

Prob(A or B or C) = P(A) + P(B) + P(C)

[Venn diagram: A, B, and C drawn as mutually exclusive events]

[Venn diagram: A nested within B, with B overlapping C]

Prob(A or B or C) = P(B) + P(C) - P(B & C)

Hence, the fundamental equation of ABC is mathematically incoherent for nested and/or composite hypotheses.
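The probability argument on this slide can be checked on a toy sample space. The sketch below assumes ten equally likely outcomes and three hypothetical events with A nested within B and B overlapping C; the sets themselves are invented for illustration and stand in for no real models.

```python
from fractions import Fraction

OUTCOMES = set(range(10))   # ten equally likely outcomes (toy sample space)
A = {0, 1}                  # nested hypothesis: A is a proper subset of B
B = {0, 1, 2, 3}
C = {3, 4, 5}               # overlaps B in outcome 3

def prob(event):
    """Probability of an event under equally likely outcomes."""
    return Fraction(len(event), len(OUTCOMES))

true_union = prob(A | B | C)
naive_sum = prob(A) + prob(B) + prob(C)    # treats A, B, C as mutually exclusive
corrected = prob(B) + prob(C) - prob(B & C)

print("P(A or B or C) =", true_union)          # 3/5
print("P(A) + P(B) + P(C) =", naive_sum)       # 9/10
print("P(B) + P(C) - P(B & C) =", corrected)   # 3/5
```

Because A lies inside B, adding P(A) counts those outcomes twice; the simple sum matches the true union only when the events really are mutually exclusive.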

Page 63: Separating Population Structure from Recent Evolutionary History

Other Methods of Evaluating Hypotheses in the Coalescent Simulation Approach are Incoherent

•Bayes Factors are known to be incoherent (Lavine, M., and M. J. Schervish. 1999. Bayes Factors: What They Are and What They Are Not. The American Statistician 53:119-122).

•Mesquite and all other programs that treat all phylogeographic hypotheses as mutually exclusive alternatives are incoherent.

•Coalescent Simulations Can Only Be Used to Test Single Parameter Models Against Their Complement (e.g., FST > 0 vs. FST = 0).

Page 64: Separating Population Structure from Recent Evolutionary History

Statistical Phylogeography

• Multilocus NCPA provides a robust, flexible testing framework.

• Simulations have multiple statistical flaws and cannot be used to test composite phylogeographic hypotheses.

• NCPA defines the general model but does not yield insight into details.

• Once the general model framework has been inferred by NCPA, simulations can be used to estimate the underlying parameters.


Page 65: Separating Population Structure from Recent Evolutionary History

Statistical Phylogeography

NCPA and simulation approaches are not so much alternative techniques as they are complementary, and potentially synergistic, techniques. Both add to the statistical toolkit of intraspecific phylogeographers, and both should be used when appropriate.
