Population Genetics and Molecular Evolution · 2019-07-22 · Bachelor’s Degree in Bioinformatics...



Prof. Antonio Barbadilla

Bachelor’s Degree in Bioinformatics

Population Genetics and Molecular Evolution - Session 5. Evolutionary Inference

Course 2017-18

Session 5.

Evolutionary Inference

Methods for the detection of selection at the DNA level

Test of neutrality

Antonio Barbadilla

Group Genomics, Bioinformatics & Evolution

Institut Biotecnologia I Biomedicina

Departament de Genètica i Microbiologia

UAB

Population Genetics and Molecular Evolution


Casillas, S. and A. Barbadilla. 2017. Molecular Population Genetics. Genetics 205: 1003–1035



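Tajima’s D test (1989)

If segregating variants are neutral, the two estimators of θ agree: π (Tajima’s estimator) = θ_W (Watterson’s estimator)

Watterson’s estimator: $\theta_W = \dfrac{S/m}{\sum_{i=1}^{n-1} 1/i}$

Tajima’s estimator: $\pi = \dfrac{1}{\binom{n}{2}\, m} \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} k_{ij}$

where n is the number of sequences, m the number of sites, S the number of segregating sites, and $k_{ij}$ the number of differences between sequences i and j.

Deviation statistic Tajima’s D: $D = \dfrac{\pi - \theta_W}{\sqrt{\mathrm{Var}(\pi - \theta_W)}}$

Under neutrality and in the absence of recombination, D follows a distribution with mean 0 and variance 1

• D ≠ 0: non-neutral pattern

• D < 0: purifying (negative) selection or recent selective sweep

• D > 0: balancing selection

In the absence of recombination, |D| > 1.8 rejects the null hypothesis at α = 0.05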
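The two estimators on this slide can be sketched in a few lines of Python. This is a minimal illustration, assuming gap-free aligned sequences of equal length; the function names and the toy alignment are hypothetical and not part of the course material.

```python
from itertools import combinations


def watterson_theta(seqs):
    """Watterson's estimator per site: (S / m) / sum_{i=1}^{n-1} 1/i."""
    n = len(seqs)            # number of sequences
    m = len(seqs[0])         # number of sites
    # A site is segregating if more than one nucleotide is observed in its column.
    S = sum(1 for column in zip(*seqs) if len(set(column)) > 1)
    a1 = sum(1 / i for i in range(1, n))
    return (S / m) / a1


def tajima_pi(seqs):
    """Tajima's estimator per site: mean pairwise differences divided by m."""
    n = len(seqs)
    m = len(seqs[0])
    # Sum k_ij, the number of differences, over all pairs i < j.
    total = sum(
        sum(a != b for a, b in zip(s, t)) for s, t in combinations(seqs, 2)
    )
    return total / (n * (n - 1) / 2 * m)


# Hypothetical toy alignment: 4 sequences, 8 sites, 2 segregating sites.
toy_alignment = ["ACGTACGT", "ACGTACGA", "ACCTACGT", "ACGTACGT"]
theta_w = watterson_theta(toy_alignment)
pi = tajima_pi(toy_alignment)
print(f"theta_W = {theta_w:.4f}, pi = {pi:.4f}, pi - theta_W = {pi - theta_w:.4f}")
```

The sign of π − θ_W is the sign of D. Turning the difference into D also requires the variance term, which the slide does not spell out.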



Sequence FOXP2

Lai et al. (2000) Am. J. Hum. Genet. 67, 357-367

Inheritance pattern: autosomal dominant

Chromosome 7, region 7q31

Gene: FOXP2

Protein evolution?

Orangutan: 0 · Gorilla: 0 · Chimpanzee: 0 · Humans: 2

Lai et al. (2001) A forkhead-domain gene is mutated in a severe speech and language disorder. Nature 413(6855): 519–523


Enard, W. et al. (2002) Molecular evolution of FOXP2, a gene involved in speech and language. Nature 418, 869–872


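Tajima’s D test in FOXP2 gene

• 20 individuals

• 14,000 bp sequenced around the gene

• 47 SNPs

$\pi = 7.9 \times 10^{-4}$

$\theta = 3.0 \times 10^{-3}$

$\theta > \pi$

$D = \dfrac{\pi - \theta}{\sqrt{\mathrm{Var}(\pi - \theta)}} = -2.20$

In the absence of recombination, |D| > 1.8 rejects the null hypothesis at α = 0.05. Here |D| = 2.20 > 1.8, so neutrality is rejected.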


Population differentiation: FST test (1989)

Elevation in FST at multiple SNPs in a 3.2-Mb region around the LCT gene. Sample: Europeans, Africans and Asians. Percentiles are computed over all SNPs in the genome.

Bersaglieri, T. et al. 2004. Genetic Signatures of Strong Recent Positive Selection at the Lactase Gene. The American Journal of Human Genetics 74(6): 1111–1120

LCT gene -> enzyme lactase

lactose -> glucose + galactose
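The slide does not give a formula for FST. As an illustration only, the sketch below uses Wright's heterozygosity-based definition, FST = (H_T − H_S) / H_T, for a single biallelic SNP. It assumes equally weighted populations; the estimator used by Bersaglieri et al. (2004) may differ, and the function name and allele frequencies are made up for the example.

```python
def fst_biallelic(freqs):
    """FST = (H_T - H_S) / H_T for one biallelic SNP.

    freqs: frequency of the same allele in each population
    (populations are weighted equally, an assumption of this sketch).
    """
    k = len(freqs)
    p_bar = sum(freqs) / k
    # Expected heterozygosity of the pooled (total) population.
    h_t = 2 * p_bar * (1 - p_bar)
    # Mean expected heterozygosity within populations.
    h_s = sum(2 * p * (1 - p) for p in freqs) / k
    if h_t == 0:
        return 0.0  # monomorphic site: no differentiation to measure
    return (h_t - h_s) / h_t


# Hypothetical frequencies in three populations.
differentiated = fst_biallelic([0.8, 0.1, 0.1])  # common in one population only
uniform = fst_biallelic([0.5, 0.5, 0.5])         # same frequency everywhere
print(f"differentiated SNP: FST = {differentiated:.2f}")
print(f"uniform SNP:        FST = {uniform:.2f}")
```

A genome scan would compute such a value for every SNP and rank the candidate region against the genome-wide distribution, which is what the percentile in the figure caption refers to.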


Population diversity (polymorphism)

Selective sweep (footprint) on neutral variation linked to a beneficial mutation

Vitti, J. J., S. R. Grossman and P. C. Sabeti. 2013. Detecting natural selection in genomic data. Annual Review of Genetics 47: 97–120


Selection tests based on linkage disequilibrium (LD)

| Test* | Compares | References |
|---|---|---|
| EHH | Decay of the association between alleles at increasing distances from a locus | Sabeti et al. (2002) |
| LRH | Searches for high-frequency alleles with long-range linkage disequilibrium | Sabeti et al. (2002) |
| iHS | Searches for alleles under positive selection between shared haplotypes | Voight et al. (2006) |

* EHH, Extended Haplotype Homozygosity; LRH, Long-Range Haplotype; iHS, Integrated Haplotype Score; XP, Cross Population; CLR, Composite Likelihood Ratio.
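To make the EHH idea concrete, here is a minimal sketch. It assumes phased haplotypes given as equal-length strings that all carry the same core allele, and it estimates EHH at a marker as the proportion of haplotype pairs that are identical over the whole interval from the core to that marker. The function name and the four haplotypes are hypothetical.

```python
from collections import Counter
from math import comb


def ehh(haplotypes, core, target):
    """Extended haplotype homozygosity between marker indices core and target.

    EHH = sum over distinct extended haplotypes h of C(n_h, 2) / C(n, 2),
    where n_h is the number of chromosomes carrying h.
    """
    n = len(haplotypes)
    if n < 2:
        raise ValueError("EHH needs at least two haplotypes")
    lo, hi = sorted((core, target))
    # Group chromosomes by their alleles over the whole interval.
    groups = Counter(h[lo:hi + 1] for h in haplotypes)
    return sum(comb(count, 2) for count in groups.values()) / comb(n, 2)


# Hypothetical chromosomes carrying the core allele at index 0.
carriers = ["AAAA", "AAAA", "AAAT", "AACT"]
decay = [ehh(carriers, 0, x) for x in range(4)]
for distance, value in enumerate(decay):
    print(f"marker {distance}: EHH = {value:.3f}")
```

EHH starts at 1 at the core and can only decrease with distance. A slow decay around a high-frequency allele is the long-haplotype signal these tests look for.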


Test using LD and haplotype structure

Elevation of ρexcess at multiple SNPs in a 3.2-Mb region around the LCT gene. Sample: Europeans, Africans and Asians. ρexcess is a measure of the increase in linkage disequilibrium. Percentiles are computed over all SNPs in the genome.

Bersaglieri, T. et al. 2004. Genetic Signatures of Strong Recent Positive Selection at the Lactase Gene. The American Journal of Human Genetics 74(6): 1111–1120


Recent human adaptation

Fan, S., M. E. B. Hansen, Y. Lo and S. A. Tishkoff. 2016. Going global by adapting local: A review of recent human adaptation. Science 354(6308): 54–59


Exercises

Exercise: Estimate Tajima’s D values. Suggest potential selective scenarios for the estimated values.

Sequence data set 1

 1. A G C G T T C T G C T C G
 2. A G A G T T C T G C T C G
 3. A G C T T T A T G C T C G
 4. A G A G T T C T G C T C G
 5. A G A G T T A T G C T C G
 6. A G C G T T C T G C T C G
 7. A G C T T T A T G C T C G
 8. A G C G T T A T G C T C G
 9. A G A T T T A T G C T C G
10. A G A G T T A T G C T C G
11. A G A G T T C T G C T C G
12. A G C T T T A T G C T C G
13. A G A G T T C T G C T C G
14. A G C T T T A T G C T C G

Sequence data set 2

 1. A G C G T T C T G C T C G
 2. A G C G T T C T G C T C G
 3. A G C G T T C T G C T C G
 4. T G C G T T C T G C T C G
 5. A G C G T T C T G C T C G
 6. A G C G T T C T G C T A G
 7. A G G G T T C T G C T C G
 8. A G C G T T C T G C T C G
 9. T G C G T T C T G C T C G
10. A G C G T T A T G C T C G
11. A G C G T T C T G C T C G
12. A G C G T T C T G C T C G
13. A G C G G T C T G C C C G
14. A G C G T T C T G C T C G
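One way to check a hand calculation for this exercise is the sketch below. The slides do not give the variance of π − θ, so the code assumes the constants from Tajima (1989): a1, a2, b1, b2, c1, c2, e1, e2. It works with per-locus values (mean pairwise differences k and S/a1) rather than per-site values; the sequence length cancels in D, so the result is the same. The function and variable names are hypothetical.

```python
from itertools import combinations
from math import sqrt


def tajimas_d(seqs):
    """Tajima's D for gap-free aligned sequences of equal length."""
    n = len(seqs)
    # S: number of segregating sites.
    S = sum(1 for column in zip(*seqs) if len(set(column)) > 1)
    if S == 0:
        raise ValueError("D is undefined without segregating sites")
    # k: mean number of pairwise differences (pi before dividing by length).
    pairs = list(combinations(seqs, 2))
    k = sum(sum(a != b for a, b in zip(s, t)) for s, t in pairs) / len(pairs)
    # Variance constants, assumed from Tajima (1989).
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n**2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)
    return (k - S / a1) / sqrt(e1 * S + e2 * S * (S - 1))


data_set_1 = [
    "AGCGTTCTGCTCG", "AGAGTTCTGCTCG", "AGCTTTATGCTCG", "AGAGTTCTGCTCG",
    "AGAGTTATGCTCG", "AGCGTTCTGCTCG", "AGCTTTATGCTCG", "AGCGTTATGCTCG",
    "AGATTTATGCTCG", "AGAGTTATGCTCG", "AGAGTTCTGCTCG", "AGCTTTATGCTCG",
    "AGAGTTCTGCTCG", "AGCTTTATGCTCG",
]
data_set_2 = [
    "AGCGTTCTGCTCG", "AGCGTTCTGCTCG", "AGCGTTCTGCTCG", "TGCGTTCTGCTCG",
    "AGCGTTCTGCTCG", "AGCGTTCTGCTAG", "AGGGTTCTGCTCG", "AGCGTTCTGCTCG",
    "TGCGTTCTGCTCG", "AGCGTTATGCTCG", "AGCGTTCTGCTCG", "AGCGTTCTGCTCG",
    "AGCGGTCTGCCCG", "AGCGTTCTGCTCG",
]

d1 = tajimas_d(data_set_1)
d2 = tajimas_d(data_set_2)
print(f"Data set 1: D = {d1:.2f}")
print(f"Data set 2: D = {d2:.2f}")
```

Data set 1 has few segregating sites at intermediate frequencies and gives a positive D; data set 2 has more segregating sites, mostly singletons, and gives a negative D. The interpretation in terms of selective scenarios is left to the exercise.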


Readings & Videos

Readings

• Casillas, S. and A. Barbadilla. 2017. Molecular Population Genetics. Genetics 205: 1003–1035 (read pages 1012-1018)

• Nielsen, R. and M. Slatkin. 2013. Chapter 9

• Kliman, R. 2008. The EvolGenius Population Genetics Computer Simulation: How it Works. Nature Education 1(3):7

• Vitti, J. J., S. R. Grossman and P. C. Sabeti. 2013. Detecting natural selection in genomic data. Annual Review of Genetics 47: 97–120

• Fan, S., M. E. B. Hansen, Y. Lo and S. A. Tishkoff. 2016. Going global by adapting local: A review of recent human adaptation. Science 354(6308): 54–59

Videos / Online resources

• Introduction to Genetics and Evolution - Coursera - Prof. Mohamed Noor