Lounès Chikhi, Evolution et Diversité Biologique, CNRS, Université Paul Sabatier, Toulouse
![Page 1: Lounès Chikhi Evolution et Diversité Biologique CNRS Université Paul Sabatier, Toulouse](https://reader035.vdocuments.net/reader035/viewer/2022062409/56814588550346895db26ad3/html5/thumbnails/1.jpg)
The Neolithic transition in Europe:
different views from population genetics
(a tentative discussion around
some methodological questions)
Lounès Chikhi
Evolution et Diversité Biologique
CNRS
Université Paul Sabatier, Toulouse
![Page 2: Lounès Chikhi Evolution et Diversité Biologique CNRS Université Paul Sabatier, Toulouse](https://reader035.vdocuments.net/reader035/viewer/2022062409/56814588550346895db26ad3/html5/thumbnails/2.jpg)
Inference in population genetics
• Data collection
• Genetic typing
• Description of patterns of genetic variability
• Analysis and interpretation
• Test (simulations)
![Page 3: Lounès Chikhi Evolution et Diversité Biologique CNRS Université Paul Sabatier, Toulouse](https://reader035.vdocuments.net/reader035/viewer/2022062409/56814588550346895db26ad3/html5/thumbnails/3.jpg)
• Sampling of “populations”
• “Choice” of the markers (genome sampling)
– mitochondrial DNA: female demography
– Y chromosome: male demography
– nuclear genes (markers: allozymes, microsatellites, RFLP, AFLP, SNPs, etc.)
• Description of the patterns:
– Diversity within samples
– Diversity between samples
– Are there spatial patterns?
Inference in population genetics
![Page 4: Lounès Chikhi Evolution et Diversité Biologique CNRS Université Paul Sabatier, Toulouse](https://reader035.vdocuments.net/reader035/viewer/2022062409/56814588550346895db26ad3/html5/thumbnails/4.jpg)
A similar pattern with Y chromosome data
Semino et al. (2000), Science
What to do with the patterns?
How to interpret them ?
![Page 5: Lounès Chikhi Evolution et Diversité Biologique CNRS Université Paul Sabatier, Toulouse](https://reader035.vdocuments.net/reader035/viewer/2022062409/56814588550346895db26ad3/html5/thumbnails/5.jpg)
• Are the patterns, if any, compatible with hypotheses or demographic scenarios from other areas (archaeology, linguistics, etc.) ?
Inference in population genetics
![Page 6: Lounès Chikhi Evolution et Diversité Biologique CNRS Université Paul Sabatier, Toulouse](https://reader035.vdocuments.net/reader035/viewer/2022062409/56814588550346895db26ad3/html5/thumbnails/6.jpg)
[Figure: a possible scheme of population movements since the Paleolithic; date labels: 45,000 BP, 18,000 BP, 10,000 BP]
![Page 7: Lounès Chikhi Evolution et Diversité Biologique CNRS Université Paul Sabatier, Toulouse](https://reader035.vdocuments.net/reader035/viewer/2022062409/56814588550346895db26ad3/html5/thumbnails/7.jpg)
• Is there a link between these images (archeo-genetico-linguistic) ?
• Can we estimate demographic parameters?
– Population: stable? growing? bottleneck?
– Admixture between populations?
• Can we date these events ?
• Can we detect selection ?
Inference in population genetics
![Page 8: Lounès Chikhi Evolution et Diversité Biologique CNRS Université Paul Sabatier, Toulouse](https://reader035.vdocuments.net/reader035/viewer/2022062409/56814588550346895db26ad3/html5/thumbnails/8.jpg)
Effect of population size changes on some measures of genetic diversity
• nA (number of alleles) drops quicker than He (expected heterozygosity) because rare alleles are eliminated first, and they contribute little to $H_e = 1 - \sum_i p_i^2$.
• gappy allelic size distributions
• range varies little (r = range)
[Figure: allele frequency against allele size (number of repeats), two panels. Panel labelled "Bottleneck": nA = 4, r = 7, nA/r = 0.57, He = 0.71, several gaps. Other panel: nA = 7, r = 8, nA/r = 0.88, He = 0.74, one gap.]
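The summary statistics on this slide can be sketched in a few lines of Python. This is only an illustration: the function name `diversity_summary` and the two allele-frequency tables are hypothetical, invented here and not taken from the talk. The definition of r as the number of size classes between the smallest and largest allele (max − min + 1) is an assumption, chosen because it is consistent with the nA and r values shown in the figure.

```python
def diversity_summary(freqs):
    """Summarise a microsatellite allele-size distribution.

    freqs maps allele size (number of repeats) to its frequency or count.
    Returns (nA, r, nA / r, He).
    """
    present = {size: f for size, f in freqs.items() if f > 0}
    total = sum(present.values())
    n_alleles = len(present)
    # Assumption: r counts every size class from the smallest to the largest allele.
    size_range = max(present) - min(present) + 1
    # Expected heterozygosity: He = 1 - sum of squared allele frequencies.
    he = 1.0 - sum((f / total) ** 2 for f in present.values())
    return n_alleles, size_range, n_alleles / size_range, he


# Hypothetical frequencies, invented for illustration only.
before = {10: 0.05, 11: 0.10, 12: 0.30, 13: 0.25, 14: 0.20, 16: 0.07, 17: 0.03}
after = {11: 0.15, 12: 0.40, 14: 0.35, 17: 0.10}

for label, freqs in (("before", before), ("after", after)):
    n_a, r, ratio, he = diversity_summary(freqs)
    print(f"{label}: nA={n_a} r={r} nA/r={ratio:.2f} He={he:.2f}")
```

In this made-up example the rare alleles are the ones lost, so nA falls from 7 to 4 while He changes much less in relative terms, which is the qualitative point of the slide.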
![Page 9: Lounès Chikhi Evolution et Diversité Biologique CNRS Université Paul Sabatier, Toulouse](https://reader035.vdocuments.net/reader035/viewer/2022062409/56814588550346895db26ad3/html5/thumbnails/9.jpg)
• Thus there is some information in genetic data about ancient demographic events.
• However, this information may be qualitative rather than quantitative, and it does not allow us to determine whether other scenarios (or selection) could have played a role.
Inference in population genetics
![Page 10: Lounès Chikhi Evolution et Diversité Biologique CNRS Université Paul Sabatier, Toulouse](https://reader035.vdocuments.net/reader035/viewer/2022062409/56814588550346895db26ad3/html5/thumbnails/10.jpg)
Recent data from the Y chromosome have been interpreted as indicating a Neolithic contribution of 22% (Semino et al., 2000).
This figure (22%) is the sum of the frequencies of 4 haplotypes called Eu4, 9, 10 and 11
Question : why should the proportion of haplotypes exhibiting a clinal distribution today represent the so-called “Neolithic” contribution?
![Page 11: Lounès Chikhi Evolution et Diversité Biologique CNRS Université Paul Sabatier, Toulouse](https://reader035.vdocuments.net/reader035/viewer/2022062409/56814588550346895db26ad3/html5/thumbnails/11.jpg)
There are two problems with this “estimation”:
1. Clines are only expected for alleles that were present in different frequencies in the populations when they mixed (dilution problem).
Moreover, drift in the last 4000-8000 years may have blurred clines that were visible at the time.
Many haplotypes are observed only 1, 2 or 3 times in each sample (i.e. no cline is going to be as visible by eye as those observed for the 4 selected haplotypes)
2. Even if it were estimated properly it would be meaningless for understanding the processes of European colonization. A single number cannot summarise a cline.
![Page 12: Lounès Chikhi Evolution et Diversité Biologique CNRS Université Paul Sabatier, Toulouse](https://reader035.vdocuments.net/reader035/viewer/2022062409/56814588550346895db26ad3/html5/thumbnails/12.jpg)
PN = proportion of farmers in any admixed population
n = number of admixture events
Geometric decrease of the Neolithic contribution from $P_N$ to $P_N^n$.
Ex: $P_N = 0.9$ and $n = 25$, then $P_N^n = 0.07$.
$\text{Average} = 100\,(P_N + P_N^2 + \dots + P_N^n)/n$
[Figure: Neolithic contribution across successive admixture events for the same value of PN = 0.9 (90% farmers + 10% hunter-gatherers). Symbols: + for n = 10, Δ for n = 25, O for n = 50. Horizontal lines are averages.]
n = 10: average = 62%
n = 25: average = 36%
n = 50: average = 21%
Thus, a lack of pattern or a low average can correspond to a high PN value.
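The dilution argument can be checked numerically with a minimal sketch. The function names are hypothetical, and the code follows the averaging formula exactly as written on the slide (terms k = 1 to n). The averages it prints may differ by a few points from those reported on the slide, which could come from a different indexing convention (for instance, counting the source population as step 0) or from reading values off the plot.

```python
def neolithic_contributions(p_n, n):
    """Contribution remaining after k = 1..n successive admixture events: p_n ** k."""
    return [p_n ** k for k in range(1, n + 1)]


def average_percent(p_n, n):
    """Average = 100 * (PN + PN^2 + ... + PN^n) / n, as the formula is written."""
    return 100.0 * sum(neolithic_contributions(p_n, n)) / n


for n in (10, 25, 50):
    last = neolithic_contributions(0.9, n)[-1]
    print(f"n={n}: final contribution={last:.3f}, average={average_percent(0.9, n):.1f}%")
```

Whatever the exact convention, the average falls steadily as n grows even though PN stays at 0.9 at every step, which is why a single average says little about the local admixture proportion.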
![Page 13: Lounès Chikhi Evolution et Diversité Biologique CNRS Université Paul Sabatier, Toulouse](https://reader035.vdocuments.net/reader035/viewer/2022062409/56814588550346895db26ad3/html5/thumbnails/13.jpg)
Two major models have been proposed
(or at least structure current debate)
The demic diffusion model: significant correlations between archaeological and genetic maps are explained by a movement of people entering Europe from the Levant and Anatolia during the Neolithic. We would expect a significant genetic contribution.
The cultural diffusion model: the spread of agriculture in Europe involved the movement of ideas, not of people. The genetic contribution of Near East farmers to the European gene pool should be limited.
[Figure: demic diffusion (large genetic contribution) versus cultural diffusion (small genetic contribution); label: Average]
![Page 14: Lounès Chikhi Evolution et Diversité Biologique CNRS Université Paul Sabatier, Toulouse](https://reader035.vdocuments.net/reader035/viewer/2022062409/56814588550346895db26ad3/html5/thumbnails/14.jpg)
What kind of inference ?
– Qualitative versus quantitative ?
– Detection versus estimation.
– Models and underlying assumptions.
Inference in population genetics
![Page 15: Lounès Chikhi Evolution et Diversité Biologique CNRS Université Paul Sabatier, Toulouse](https://reader035.vdocuments.net/reader035/viewer/2022062409/56814588550346895db26ad3/html5/thumbnails/15.jpg)
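Admixture model
[Figure: in the past, at time T before the present, Parent 1 (hunter-gatherers) and Parent 2 (farmers) contribute proportions p1 and 1 – p1 to a Hybrid population. Present-day samples: Parent 1 (Basques), Hybrid (Europe), Parent 2 (Near East). Drift parameters: T/N1, T/Nh, T/N2.]
Separates the effects of drift and admixture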
![Page 16: Lounès Chikhi Evolution et Diversité Biologique CNRS Université Paul Sabatier, Toulouse](https://reader035.vdocuments.net/reader035/viewer/2022062409/56814588550346895db26ad3/html5/thumbnails/16.jpg)
Evolution of allele frequencies under genetic drift
[Figure: two panels showing allele frequency (0 to 1) against number of generations (1 to 8) for populations P1, H and P2]
The effect of drift is that the « hybrid » population may not even be intermediate after a limited number of generations.
In other words:
(i) the information on admixture decreases with time.
(ii) It is risky to analyse single locus data when demographic events are ancient.
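The point about drift can be illustrated with a small Wright-Fisher sketch. Everything here is an assumption made for illustration: the function names, the starting frequencies, the number of gene copies and the number of generations are hypothetical and are not the settings behind the figure in the talk.

```python
import random


def drift(p0, n_copies, generations, rng):
    """Wright-Fisher trajectory: binomial resampling of n_copies gene copies per generation."""
    trajectory = [p0]
    p = p0
    for _ in range(generations):
        count = sum(1 for _ in range(n_copies) if rng.random() < p)
        p = count / n_copies
        trajectory.append(p)
    return trajectory


def admixture_then_drift(x1, x2, p1, n_copies, generations, seed=0):
    """Form H as p1 * P1 + (1 - p1) * P2, then let P1, H and P2 drift independently."""
    rng = random.Random(seed)
    h0 = p1 * x1 + (1 - p1) * x2
    return {
        "P1": drift(x1, n_copies, generations, rng),
        "H": drift(h0, n_copies, generations, rng),
        "P2": drift(x2, n_copies, generations, rng),
    }


def fraction_not_intermediate(x1, x2, p1, n_copies, generations, replicates=200):
    """Share of replicates in which H ends outside the interval spanned by P1 and P2."""
    outside = 0
    for seed in range(replicates):
        t = admixture_then_drift(x1, x2, p1, n_copies, generations, seed)
        low, high = sorted((t["P1"][-1], t["P2"][-1]))
        if not low <= t["H"][-1] <= high:
            outside += 1
    return outside / replicates


# Hypothetical settings: small populations, 8 generations.
print(fraction_not_intermediate(x1=0.8, x2=0.4, p1=0.3, n_copies=40, generations=8))
```

Running it with larger `n_copies` or fewer generations is a quick way to see how the amount of drift changes how often the hybrid population stays intermediate at a single locus.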
![Page 17: Lounès Chikhi Evolution et Diversité Biologique CNRS Université Paul Sabatier, Toulouse](https://reader035.vdocuments.net/reader035/viewer/2022062409/56814588550346895db26ad3/html5/thumbnails/17.jpg)
[Figure: distributions obtained for simulations with p1 = 0.3, one with little drift and one with more drift]
1) We simulate data according to the model (figure above) varying some parameters (here drift)
2) The outputs are given to the program implementing the method
3) One distribution is obtained for each simulation
![Page 18: Lounès Chikhi Evolution et Diversité Biologique CNRS Université Paul Sabatier, Toulouse](https://reader035.vdocuments.net/reader035/viewer/2022062409/56814588550346895db26ad3/html5/thumbnails/18.jpg)
1) We note that for the VERY SAME scenario inference can be extremely different !
2) This inference varies from one locus to the other.
3) When two loci produce different estimates, we cannot conclude that they had a different demographic history.
4) Worse : we are in an optimal situation : we « know » the real p and the data were simulated according to the model. This NEVER happens in real life.
p1 = 0.3
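A rough way to see why single-locus inference is so variable is to simulate several independent loci under one scenario and apply a simple estimator to each. The sketch below uses a deliberately naive moment estimator, p1 ≈ (h − x2)/(x1 − x2), which is an assumption made for illustration and is not the method or program referred to in the talk. All names and parameter values are hypothetical.

```python
import random


def drift_end(p, n_copies, generations, rng):
    """Allele frequency after Wright-Fisher drift for the given number of generations."""
    for _ in range(generations):
        p = sum(1 for _ in range(n_copies) if rng.random() < p) / n_copies
    return p


def naive_p1(y1, yh, y2):
    """Naive moment estimate of p1 from present-day frequencies; None if undefined."""
    if y1 == y2:
        return None
    return (yh - y2) / (y1 - y2)


def per_locus_estimates(p1, n_copies, generations, n_loci, seed=1):
    """Simulate n_loci independent loci under the same scenario and estimate p1 at each."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_loci):
        # Hypothetical parental frequencies at the time of admixture.
        x1, x2 = rng.uniform(0.05, 0.95), rng.uniform(0.05, 0.95)
        h = p1 * x1 + (1 - p1) * x2
        y1, yh, y2 = (drift_end(x, n_copies, generations, rng) for x in (x1, h, x2))
        estimate = naive_p1(y1, yh, y2)
        if estimate is not None:
            estimates.append(estimate)
    return estimates


for generations in (1, 20):
    ests = per_locus_estimates(p1=0.3, n_copies=200, generations=generations, n_loci=5)
    print(generations, [round(e, 2) for e in ests])
```

The naive estimator is not even constrained to lie between 0 and 1, so the spread it shows across loci simulated under the very same scenario is only a crude picture of the problem described above.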
![Page 19: Lounès Chikhi Evolution et Diversité Biologique CNRS Université Paul Sabatier, Toulouse](https://reader035.vdocuments.net/reader035/viewer/2022062409/56814588550346895db26ad3/html5/thumbnails/19.jpg)
1) One solution : multi-locus data.
2) Increasing sample sizes is NOT very useful.
3) Better to have multilocus data much later than one locus just after:
Ex: 5 loci after 100 generations versus 1 locus after 1 generation (for N=1000)
4) Don’t throw your allozyme data away.
![Page 20: Lounès Chikhi Evolution et Diversité Biologique CNRS Université Paul Sabatier, Toulouse](https://reader035.vdocuments.net/reader035/viewer/2022062409/56814588550346895db26ad3/html5/thumbnails/20.jpg)
What if we re-analyse Semino et al.’s data ?
![Page 21: Lounès Chikhi Evolution et Diversité Biologique CNRS Université Paul Sabatier, Toulouse](https://reader035.vdocuments.net/reader035/viewer/2022062409/56814588550346895db26ad3/html5/thumbnails/21.jpg)
Y chromosome data (Semino et al., 2000).
p1 represents the hunter-gatherer contribution (descendants = Basques)
Each curve corresponds to the analysis of a European population.
Significant cline observed for (1-p1) values (i.e. Near Eastern contribution) against geog. distance calculated from the Near East.
![Page 22: Lounès Chikhi Evolution et Diversité Biologique CNRS Université Paul Sabatier, Toulouse](https://reader035.vdocuments.net/reader035/viewer/2022062409/56814588550346895db26ad3/html5/thumbnails/22.jpg)
After Semino et al. (2000), Science
![Page 23: Lounès Chikhi Evolution et Diversité Biologique CNRS Université Paul Sabatier, Toulouse](https://reader035.vdocuments.net/reader035/viewer/2022062409/56814588550346895db26ad3/html5/thumbnails/23.jpg)
As a test we can analyse the same data considering Sardinians as descendants of the hunter-gatherers.
We find an extremely similar result.
The « Neolithic » contribution is even slightly higher: on the order of 65% instead of 50%.
![Page 24: Lounès Chikhi Evolution et Diversité Biologique CNRS Université Paul Sabatier, Toulouse](https://reader035.vdocuments.net/reader035/viewer/2022062409/56814588550346895db26ad3/html5/thumbnails/24.jpg)
1) There are significant clines across Europe for the parameter representing the Neolithic contribution.
2) This “trend” is significantly different from that “obtained” by Semino et al. (2000).
3) The Neolithic contribution appears to be around 50% rather than 22%.
4) Re-analysis of all European populations using the Sardinian population as P1 shows very similar results with higher Neolithic contribution (average of 65%).
Conclusion: The cultural diffusion model is unlikely to explain the patterns observed using the Y chromosome data.
Model-based results (i)
(mostly on Y chromosome data)
![Page 25: Lounès Chikhi Evolution et Diversité Biologique CNRS Université Paul Sabatier, Toulouse](https://reader035.vdocuments.net/reader035/viewer/2022062409/56814588550346895db26ad3/html5/thumbnails/25.jpg)
• The tests performed are partial and the model is simplistic, but it is a first step towards quantifying clearly identified demographic parameters.
• Qualitative approach:
– Easy and useful BUT imprecise or misleadingly precise
• Quantitative approach:
– Assumptions are explicit
– Results can be precise (or not) BUT often complicated to interpret and (maybe) model-dependent.
![Page 26: Lounès Chikhi Evolution et Diversité Biologique CNRS Université Paul Sabatier, Toulouse](https://reader035.vdocuments.net/reader035/viewer/2022062409/56814588550346895db26ad3/html5/thumbnails/26.jpg)
Inference in population genetics
• Data collection
• Genetic typing
• Description of patterns of genetic variability
• Analysis and interpretation
• Test (simulations)
![Page 27: Lounès Chikhi Evolution et Diversité Biologique CNRS Université Paul Sabatier, Toulouse](https://reader035.vdocuments.net/reader035/viewer/2022062409/56814588550346895db26ad3/html5/thumbnails/27.jpg)
Inference in population genetics
In case I was not specific enough: beware of using any method whose assumptions you do not understand or which has not been extensively tested on simulations:
– Nested Clade Analysis
– Median-network
Thank you
AND MANY THANKS TO
Mark Beaumont, Mike Bruford, Guido Barbujani, Richard Nichols, etc.