natural selection in humans sharareh noorbaloochi cs 374 oct 10, 2006


Natural Selection in Humans

Sharareh Noorbaloochi, CS 374, Oct 10, 2006


Papers to be presented…

Sabeti et al., Science, 16 June 2006, Volume 312

Voight et al., PLoS Biology, March 2006, Volume 4


Overview

• Pursuit of natural selection
• Biological background
• Methods for detecting positive selection
• Genome-wide studies
• From candidate to function

Images from: Voight et al. 2006


Adaptability of Modern Humans

Humans have undergone tremendous cultural and environmental changes during the last ~40-50 KY (thousand years).

• Spread around the world (migration out of Africa ~100 KYA)

• Global warming trend since the last ice age (~14 KYA)

• Transition from hunter-gatherer to agricultural societies (< ~10 KYA)

• Increase in pathogen load due to greater population density and proximity to livestock

Voight et al. (2006)


Pursuit of Natural Selection


Some Facts

• In human beings, 99.9 percent of bases are the same from one person to the next.

• The remaining 0.1 percent makes a person unique:
  – different attributes, characteristics, and traits
  – how a person looks
  – which diseases he or she develops

• These variations can be:
  – Harmless (a change in phenotype)
  – Harmful (e.g., diabetes, cancer, heart disease, Huntington's disease, and hemophilia)
  – Latent (variations in coding and regulatory regions that are not harmful on their own; the effect of the change only becomes apparent under certain conditions, e.g., susceptibility to lung cancer)

Human Genetic Variations

Two types of genetic mutation events are covered today:

• Single base mutation, which substitutes one nucleotide for another
  – Single Nucleotide Polymorphisms (SNPs), the most common type of genetic variation

• Insertion or deletion of one or more nucleotides
  – Tandem Repeat Polymorphisms
  – Insertion/Deletion Polymorphisms

• Structural variations (e.g., copy-number variation) are also important

What is a SNP?

• A SNP is defined as a single base change in a DNA sequence that occurs in a significant proportion (more than 1 percent) of a large population.

For example, a SNP might change the DNA sequence AAGGCTAA to ATGGCTAA.
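The definition has two parts: a single-base difference, and a frequency threshold (more than 1 percent). A minimal sketch in Python, assuming the sequences are already aligned and of equal length (real SNP discovery must also handle alignment and sequencing error); `find_snps` is a hypothetical helper, not a published tool:

```python
from collections import Counter


def find_snps(sequences, min_minor_freq=0.01):
    """Return (position, allele_counts) for every polymorphic site whose
    minor-allele frequency exceeds min_minor_freq (default: 1 percent).

    Assumes pre-aligned sequences of equal length. Positions are 0-based.
    """
    length = len(sequences[0])
    if any(len(seq) != length for seq in sequences):
        raise ValueError("sequences must be aligned to the same length")
    snps = []
    for pos in range(length):
        counts = Counter(seq[pos] for seq in sequences)
        if len(counts) < 2:
            continue  # monomorphic site: everyone carries the same base
        # Everything except the most common base counts as "minor".
        minor = len(sequences) - max(counts.values())
        if minor / len(sequences) > min_minor_freq:
            snps.append((pos, dict(counts)))
    return snps


# The slide's example: two sequences differing at position 1 (A -> T).
print(find_snps(["AAGGCTAA", "ATGGCTAA"]))
```

With only two sequences every difference passes the threshold; in a sample of 200 sequences, a base seen on a single sequence (0.5 percent) would not be called a SNP.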


SNP facts

• SNPs are found in coding and (mostly) noncoding regions.

• They occur with a very high frequency:
  – about 1 in 1,200 bases on average
  – approximately 10 million SNPs occur commonly in the human genome

Allele

• Allele: any one of a number of alternative DNA sequences occupying a given locus (position) on a chromosome.

• Usually alleles are DNA sequences that code for a gene, but sometimes the term is used to refer to a non-gene sequence.

• In a diploid organism such as humans (one that has two copies of each chromosome), two alleles make up the individual's genotype at each locus.

Haplotype

• A haplotype is a set of SNPs on a single chromatid that are statistically associated.

SNP Maps

• Sequence the genomes of a large number of people

• Compare the base sequences to discover SNPs

• Generate a single map of the human genome containing all possible SNPs => SNP maps

SNP Maps


The HapMap Project

• The DNA samples for the HapMap come from a total of 270 people:
  1. Yoruba people in Ibadan, Nigeria (30 trios of both parents and an adult child)
  2. Japanese in Tokyo (45 unrelated individuals)
  3. Han Chinese in Beijing (45 unrelated individuals)
  4. CEPH (European) (30 trios)

• These numbers of samples will allow the Project to find almost all haplotypes with frequencies of 5% or higher.

• Ascertainment bias: there are not enough samples to look at frequencies lower than 5%.

http://www.hapmap.org/index.html.en
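Why 5% is a practical floor can be illustrated with a simple sampling argument. A minimal sketch, assuming chromosomes are sampled independently (which trios violate, since children share their parents' chromosomes) and a hypothetical panel of 60 unrelated individuals, i.e. 120 chromosomes:

```python
def detection_probability(freq, n_chromosomes):
    """Probability that a variant with population frequency `freq` is carried
    by at least one of n_chromosomes sampled independently at random."""
    return 1.0 - (1.0 - freq) ** n_chromosomes


# Hypothetical panel: 60 unrelated individuals = 120 chromosomes.
for f in (0.05, 0.01, 0.005):
    p = detection_probability(f, 120)
    print(f"frequency {f:.3f}: P(seen at least once) = {p:.3f}")
```

This gives roughly 0.998 at 5%, about 0.70 at 1%, and about 0.45 at 0.5%. Seeing a variant once is far weaker than characterizing its haplotype reliably, so these numbers are an optimistic upper bound.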

HapMap, SNPs, Haplotypes, Tag SNPs

The construction of the HapMap occurs in three steps.

• (a) Single nucleotide polymorphisms (SNPs) are identified in DNA samples from multiple individuals.

• (b) Adjacent SNPs that are inherited together are compiled into "haplotypes."

• (c) "Tag" SNPs within haplotypes are identified that uniquely identify those haplotypes.

By genotyping the three tag SNPs shown in this figure, researchers can identify which of the four haplotypes shown here are present in each individual.
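The idea of step (c) can be sketched as a search for the smallest set of SNP positions whose alleles distinguish every haplotype. A minimal brute-force sketch, assuming phased haplotypes are already known and only a handful of SNPs are involved; the haplotypes are hypothetical, and practical tag-SNP selection instead uses LD-based criteria rather than exhaustive search:

```python
from itertools import combinations


def find_tag_snps(haplotypes):
    """Return the smallest list of SNP positions (0-based) whose alleles
    uniquely identify each haplotype, or None if two haplotypes are identical."""
    n_sites = len(haplotypes[0])
    for k in range(1, n_sites + 1):
        for positions in combinations(range(n_sites), k):
            # The alleles at the chosen positions form a "signature".
            signatures = {tuple(h[p] for p in positions) for h in haplotypes}
            if len(signatures) == len(haplotypes):
                return list(positions)
    return None


# Four hypothetical haplotypes over four SNPs.
haps = ["AACG", "AGCG", "TGCA", "TGTA"]
print(find_tag_snps(haps))
```

Here no pair of SNPs separates all four haplotypes, so three tag SNPs (positions 0, 1, 2) are needed, echoing the slide's example.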

SNPs may or may not alter protein structure

• Genetic variants that alter protein function are usually deleterious and are therefore less likely to become common or fixed.

• Synonymous (also known as silent) mutations: do not change the encoded amino acid and usually have no functional effect on the protein.

• Non-synonymous mutations: alter the amino acid (e.g., sickle cell anemia).

(Figure: synonymous vs. non-synonymous mutations)

• Degeneracy of the genetic code: most amino acids are encoded by more than one codon, which is what allows synonymous mutations.
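Degeneracy can be made concrete with the standard codon table. A minimal sketch that classifies a single codon change; `classify_substitution` is a hypothetical helper, and the sickle cell example uses the well-known GAG to GTG (Glu to Val) change in the beta-globin gene:

```python
from collections import Counter
from itertools import product

# Standard genetic code. Codons are enumerated with bases in the order T, C, A, G
# and the first codon position varying slowest (TTT, TTC, TTA, TTG, TCT, ...).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}  # '*' marks a stop codon


def classify_substitution(codon_before, codon_after):
    """Label a codon change as synonymous or non-synonymous."""
    aa_before = CODON_TABLE[codon_before.upper()]
    aa_after = CODON_TABLE[codon_after.upper()]
    return "synonymous" if aa_before == aa_after else "non-synonymous"


print(classify_substitution("GAG", "GAA"))  # Glu -> Glu
print(classify_substitution("GAG", "GTG"))  # Glu -> Val (sickle cell change)

# Degeneracy: how many codons encode each amino acid.
codons_per_aa = Counter(CODON_TABLE.values())
print(codons_per_aa["L"], codons_per_aa["M"])  # leucine vs. methionine
```

Leucine has six codons while methionine has only one, so third-position changes are often silent for some amino acids and never for others.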


How does human history affect genetic variation?

A genome-wide survey of Linkage Disequilibrium

Linkage disequilibrium is a phenomenon whereby genetic variants are associated: people who have one variant tend to have a second variant as well.

Slide by: David Reich, Broad Institute
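"People who have one variant tend to have a second variant as well" is usually quantified with the classical statistics D and r². A minimal sketch for two biallelic loci, assuming phased two-locus haplotypes; the function name and the toy data are hypothetical:

```python
def linkage_disequilibrium(haplotypes, allele_a, allele_b):
    """Return (D, r2) for allele_a at locus 1 and allele_b at locus 2.

    haplotypes: list of (locus1_allele, locus2_allele) tuples, one per chromosome.
    D = p_AB - p_A * p_B; r2 = D^2 / (p_A (1 - p_A) p_B (1 - p_B)).
    """
    n = len(haplotypes)
    p_a = sum(h[0] == allele_a for h in haplotypes) / n
    p_b = sum(h[1] == allele_b for h in haplotypes) / n
    p_ab = sum(h[0] == allele_a and h[1] == allele_b for h in haplotypes) / n
    d = p_ab - p_a * p_b
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both loci must be polymorphic")
    return d, d * d / denom


# Perfect association: A always travels with B, a with b.
print(linkage_disequilibrium([("A", "B"), ("A", "B"), ("a", "b"), ("a", "b")], "A", "B"))
# No association: all four combinations equally common.
print(linkage_disequilibrium([("A", "B"), ("A", "b"), ("a", "B"), ("a", "b")], "A", "B"))
```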


Variation Over time

(Figure: variations in chromosomes within a population; starting from a common ancestor, variations emerge over time through mutation, up to the present.)

Slide by: David Reich, Broad Institute

What Determines Extent of LD?

(Figure: a mutation that arose 2,000 generations ago compared with one that arose 1,000 generations ago, as seen at time = present.)

Slide by: David Reich, Broad Institute

Recombination is the key!
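Why recombination makes older mutations show less LD follows from a classical result: in a large randomly mating population (drift ignored), D shrinks by a factor (1 - c) each generation, where c is the recombination fraction between the two loci, so D_t = D_0 (1 - c)^t. A minimal sketch; the value of c is hypothetical:

```python
def expected_ld(d0, recomb_fraction, generations):
    """Expected linkage disequilibrium after `generations` of random mating,
    starting from d0, when the two loci recombine with probability
    recomb_fraction per generation: D_t = D_0 * (1 - c) ** t."""
    return d0 * (1.0 - recomb_fraction) ** generations


c = 0.001  # hypothetical recombination fraction between the two markers
for t in (1000, 2000):
    print(t, round(expected_ld(0.25, c, t), 4))
```

The mutation that arose 2,000 generations ago has had twice as many opportunities for recombination and retains correspondingly less LD than the one that arose 1,000 generations ago.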

Neutral Evolution Versus Positive Natural Selection

Neutral Evolution

(Figure: frequency of a neutral mutation over generations 1-10, reproduced from Sabeti et al.)

Genetic drift is a slow process: the frequency of neutral mutations in the population changes randomly.
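Random change in the frequency of a neutral mutation is commonly modeled with the Wright-Fisher model, in which each generation is a binomial sample from the previous one. A minimal sketch; the population size, starting frequency, and seeds are hypothetical:

```python
import random


def wright_fisher(p0, pop_size, generations, seed=None):
    """Simulate neutral drift of one allele in a diploid Wright-Fisher
    population of pop_size individuals (2 * pop_size gene copies).

    Returns the allele frequency in every generation, starting with p0.
    """
    rng = random.Random(seed)
    n_copies = 2 * pop_size
    count = round(p0 * n_copies)
    trajectory = [count / n_copies]
    for _ in range(generations):
        p = count / n_copies
        # Each gene copy in the next generation copies a random parental copy,
        # so it carries the allele with probability p (binomial sampling).
        count = sum(1 for _ in range(n_copies) if rng.random() < p)
        trajectory.append(count / n_copies)
    return trajectory


# Ten generations, as on the slide; three replicates with hypothetical settings.
for seed in (1, 2, 3):
    print([round(p, 2) for p in wright_fisher(0.5, 50, 10, seed=seed)])
```

Each replicate wanders in a different direction with no systematic trend; frequencies 0 and 1 are absorbing, so a lost or fixed allele stays that way.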


Positive Natural Selection

(Figure: frequency of a positively selected mutation over generations 1-10, reproduced from Sabeti et al.)

Positive selection: a selective regime that favors the fixation of an allele that increases the fitness of its carrier.

Fixation: the process by which one allele increases in a population until all other alleles go extinct and the locus becomes monomorphic.

Simply: the allele reaches 100% frequency.
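The contrast with drift can be sketched with the textbook deterministic recursion for a favored allele. A minimal sketch, assuming haploid (genic) selection with relative fitness 1 + s and no drift; the values of s and the starting frequency are hypothetical:

```python
def selection_trajectory(p0, s, generations):
    """Deterministic frequency of a favored allele with relative fitness 1 + s
    (versus 1 for the alternative allele), ignoring drift. Mean fitness is
    1 + p * s, so p' = p * (1 + s) / (1 + p * s)."""
    p = p0
    trajectory = [p]
    for _ in range(generations):
        p = p * (1 + s) / (1 + p * s)
        trajectory.append(p)
    return trajectory


traj = selection_trajectory(0.01, 0.1, 200)  # hypothetical s = 0.1
print(round(traj[50], 3), round(traj[100], 3), round(traj[200], 4))
```

Unlike the neutral case, the frequency rises steadily in every generation. In this deterministic model it approaches 100% without ever quite reaching it; actual fixation in a finite population happens once drift removes the last copies of the other allele.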

Methods for detecting selection

• Differences between species
  1. High proportion of function-altering mutations

• Within-species variation
  2. Low diversity
  3. Excess of derived alleles
  4. Long unbroken haplotypes
  5. Differences between populations

Methods for detecting selection

Test 1: Function altering mutations

Age: many millions of years

P. C. Sabeti et al., Science 312, 1614-1620 (2006)

Excess of function-altering mutations in PRM1 exon 2

Test 1: High proportion of function-altering mutations

• Over a prolonged period, positive selection can increase the fixation rate of beneficial function-altering mutations.

• The signature is detected by comparing the rates of function-altering and synonymous mutations.

• Limited power: multiple selected changes are needed before a gene stands out from the background neutral rate.

• Common statistical tests:

• Ka/Ks test

• Relative rate test

• McDonald-Kreitman test

Ka/Ks test (Li et al. 1985)

• Main idea: contrast two types of substitution events.

• Goal of the test: calculate the synonymous rate (Ks) and the non-synonymous rate (Ka) at each codon site.

• Under neutrality, Ka ≈ Ks, so Ka/Ks ≈ 1.

• Purifying (negative) selection: Ka decreases; Ka/Ks < 1 is indicative of purifying selection.

• Positive selection: Ka increases (replacing the amino acid is beneficial to the organism); Ka/Ks > 1.
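The arithmetic behind the ratio is simple once differences and sites have been counted. A minimal sketch, assuming the numbers of synonymous and non-synonymous sites and differences are already known; it uses raw proportions with no correction for multiple hits and is not Li et al.'s exact procedure. All counts are hypothetical:

```python
def ka_ks(nonsyn_diffs, nonsyn_sites, syn_diffs, syn_sites):
    """Return (Ka, Ks, Ka/Ks) from counts of differences and sites.

    Uses raw proportions of differing sites (p-distances) with no correction
    for multiple substitutions at the same site.
    """
    ka = nonsyn_diffs / nonsyn_sites
    ks = syn_diffs / syn_sites
    if ks == 0:
        raise ValueError("no synonymous differences: Ka/Ks is undefined")
    return ka, ks, ka / ks


def interpret(ratio):
    """Crude reading of the ratio; a real analysis tests whether it differs
    significantly from 1."""
    if ratio < 1:
        return "purifying selection"
    if ratio > 1:
        return "positive selection"
    return "neutral"


# Hypothetical gene: 10 non-synonymous differences over 100 non-synonymous
# sites, 2 synonymous differences over 50 synonymous sites.
ka, ks, ratio = ka_ks(10, 100, 2, 50)
print(f"Ka={ka:.3f} Ks={ks:.3f} Ka/Ks={ratio:.2f} -> {interpret(ratio)}")
```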

Within-species Tests

• Test 2: Low diversity, many rare alleles (age < 250,000 years)

• Test 3: Many high-frequency derived alleles (age < 80,000 years)

• Test 4: Long common haplotypes unbroken by recombination (age < 30,000 years)

Sweep Signatures

Within-species Test 5: Population Differences

Extreme population differences (PD) in FY*O allele frequency.

• FY*O allele, which confers resistance to P. vivax malaria, is prevalent and even fixed in many African populations, but virtually absent outside Africa.

Example 1

Age < 50,000 to 70,000 years


New data sets make genome surveys possible

• Full sequence for human, chimpanzee, mouse

• Dense surveys of human genetic variation

Genome-wide Studies

Between-species results

• Limited power to detect selection at single genes

• Powerful for detecting functional classes of genes that change rapidly:
  – sperm-related genes
  – olfactory (sense of smell) receptors

Finding selective sweeps

• Statistical tests: distinguish the pattern of genetic variation expected under neutrality from that expected under natural selection

• Pick a statistical test to detect sweeps

• Apply the statistic across the genome

Finding selective sweeps

Problem: We do not fully know the shape of the neutral distribution or how it is affected by other factors such as demographic history.

However, the best we can do is:

• use a statistic based on simulations

• apply it to empirical genome-wide data sets

• identify the loci in the extreme tail: the most likely candidates of selection
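One way to make "the extreme tail" concrete is an empirical p-value: compare each locus's score with scores simulated under a neutral model. A minimal sketch; the function names, the significance threshold, and the toy data are hypothetical, and the result is only as trustworthy as the demographic model used in the simulations (the problem stated above):

```python
def empirical_p_value(observed, neutral_scores):
    """One-sided empirical p-value: fraction of simulated neutral scores at
    least as extreme as `observed` (with the usual +1 correction)."""
    n_extreme = sum(score >= observed for score in neutral_scores)
    return (n_extreme + 1) / (len(neutral_scores) + 1)


def candidate_loci(locus_scores, neutral_scores, alpha=0.01):
    """Loci whose empirical p-value is below alpha, most extreme first."""
    pvals = {locus: empirical_p_value(s, neutral_scores)
             for locus, s in locus_scores.items()}
    hits = [locus for locus, p in pvals.items() if p < alpha]
    return sorted(hits, key=lambda locus: pvals[locus])


neutral = list(range(100))  # stand-in for scores from neutral simulations
observed = {"locus_a": 150, "locus_b": 50, "locus_c": 99}
print(candidate_loci(observed, neutral, alpha=0.05))
```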

Test based on the relationship between allele frequency and extent of linkage disequilibrium

No selection:
• Old alleles: low or high frequency, short-range LD
• Young alleles: low frequency, long-range LD (long haplotypes)

Positive selection:
• Young alleles: high frequency, long-range LD

Slide by: David Reich, Broad Institute

The signal of selection

(Figure: linkage disequilibrium (homozygosity) versus allele frequency under neutrality and under positive selection.)

Slide by: David Reich, Broad Institute


Let us understand these methods better …


Methods for detecting selection

Test 2: Low genetic diversity/many rare alleles

Age < 250,000 years

Within-species Tests


Test 2: Low genetic diversity/many rare alleles

P. C. Sabeti et al., Science 312, 1614-1620 (2006)

Low diversity and many rare alleles at the Kell blood antigen cluster

• As a selected allele increases in population frequency, variants at nearby locations on the same chromosome (linked variants) rise in frequency along with it. This so-called "hitchhiking" produces a "selective sweep".

• Most common type of variant used: SNPs

• Common statistical tests:

• Tajima’s D

• Hudson-Kreitman-Aguade (HKA)

• Fu and Li’s D*
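Tajima's D compares two estimates of diversity: the average number of pairwise differences (π) and the number of segregating sites (S) scaled by a1. After a sweep, an excess of rare alleles pushes π below S/a1 and D becomes negative. A minimal sketch of the standard formula, assuming aligned sequences; the toy sequences are hypothetical:

```python
import math
from itertools import combinations


def tajimas_d(sequences):
    """Tajima's D for a list of aligned, equal-length sequences."""
    n = len(sequences)
    if n < 4:
        raise ValueError("need at least 4 sequences (variance terms vanish below that)")
    length = len(sequences[0])
    # S: number of segregating (polymorphic) sites.
    s = sum(1 for pos in range(length) if len({seq[pos] for seq in sequences}) > 1)
    if s == 0:
        raise ValueError("no segregating sites")
    # pi: average number of pairwise differences.
    pairs = list(combinations(sequences, 2))
    pi = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs) / len(pairs)
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n**2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)
    return (pi - s / a1) / math.sqrt(e1 * s + e2 * s * (s - 1))


# Every variant is a singleton (rare): the post-sweep pattern, D < 0.
print(tajimas_d(["10000", "01000", "00100", "00010", "00001"]))
# Variants at intermediate frequency: D > 0.
print(tajimas_d(["00", "00", "11", "11"]))
```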


Methods for detecting selection

Test 3: High-frequency derived alleles

Age < 80,000 years

Within-species Tests


Test 3: Many high-frequency derived alleles

• Derived alleles: non-ancestral alleles
  – arise by new mutations
  – typically have lower allele frequencies than ancestral alleles
  – however, in a selective sweep, derived alleles linked to the beneficial allele can hitchhike to high frequency

P. C. Sabeti et al., Science 312, 1614-1620 (2006)

Figure: Excess of high-frequency derived alleles at the Duffy red cell antigen (FY) gene (34)
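One statistic for this signal is Fay and Wu's H (Fay and Wu, 2000, cited later in this talk), which weights high-frequency derived alleles heavily. A minimal sketch of the unnormalized form H = θπ − θH, assuming the ancestral state of each site is known (e.g. inferred from an outgroup such as chimpanzee); the helper names and the example site are hypothetical:

```python
def derived_count(alleles, ancestral):
    """Number of sampled chromosomes carrying a non-ancestral (derived) allele."""
    return sum(1 for allele in alleles if allele != ancestral)


def fay_wu_h(derived_counts, n):
    """Unnormalized Fay and Wu's H = theta_pi - theta_H.

    derived_counts: one entry per segregating site, giving how many of the n
    sampled chromosomes carry the derived allele (between 1 and n - 1).
    Negative values indicate an excess of high-frequency derived alleles.
    """
    denom = n * (n - 1)
    theta_pi = sum(2 * i * (n - i) for i in derived_counts) / denom
    theta_h = sum(2 * i * i for i in derived_counts) / denom
    return theta_pi - theta_h


# Hypothetical site: ancestral allele "A", and 9 of 10 sampled chromosomes
# carry the derived allele "G".
site = list("GGGGGGGGGA")
i = derived_count(site, ancestral="A")
print(i, fay_wu_h([i], n=10))
```

A single high-frequency derived allele already drives H negative, whereas a rare derived allele (count 1 of 10) gives a small positive value.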


Methods for detecting selection

Test 5: Differences between populations

Age < 50,000 to 70,000 years

Within-species Tests

Test 5: Differences between populations

• Geographically separate populations are subject to distinct environmental or cultural pressures, which can change an allele's frequency in one population but not in the other.

• Such differences can only arise when populations are at least partially isolated reproductively.
  – For humans, after the major human migrations out of Africa some 50,000 to 70,000 years ago.

• Weakness of the test: as with other population genetic signatures, it can be hard to distinguish genuine selection from the effect of demographic history (especially population bottlenecks) on genetic variation.

• Common statistical tests:

• FST

• Pexcess

Population bottleneck: reduction in size of a single, previously larger population, with a loss of prior diversity.
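FST measures how much of the total heterozygosity is due to differences between populations. A minimal sketch of the simple Wright-style form FST = (H_T − H_S) / H_T for one biallelic locus with equally weighted populations; it omits the sample-size corrections used by published estimators, and the frequencies are hypothetical:

```python
def fst(freqs):
    """Wright-style FST for one biallelic locus.

    freqs: allele frequency in each population (populations weighted equally).
    H_S = mean within-population heterozygosity 2p(1 - p);
    H_T = heterozygosity 2p(1 - p) at the pooled mean frequency.
    """
    h_s = sum(2 * p * (1 - p) for p in freqs) / len(freqs)
    p_bar = sum(freqs) / len(freqs)
    h_t = 2 * p_bar * (1 - p_bar)
    if h_t == 0:
        return 0.0  # monomorphic overall: no differentiation to measure
    return (h_t - h_s) / h_t


print(fst([0.3, 0.3]))  # identical frequencies
print(fst([0.9, 0.1]))  # strongly differentiated
print(fst([1.0, 0.0]))  # allele fixed in one population, absent in the other
```

The last case mirrors the FY*O pattern (fixed in some populations, virtually absent in others) and gives the maximum value of 1.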

Extreme Population Difference

Extreme population differences (PD) in FY*O allele frequency.

• FY*O allele, which confers resistance to P. vivax malaria, is prevalent and even fixed in many African populations, but virtually absent outside Africa.

Example 1

Extreme Population Difference

Example 2:

• The region around the LCT locus shows a large PD between Europeans and non-Europeans, reflecting strong selection for the lactase persistence allele in Europeans.

(Figure: LCT region)

Genome-wide Survey using Tests: Low diversity and population separation

Outliers: low diversity with high population differentiation


A Little Break?


Interesting fact: Pardis Sabeti is a rock star!

Back to work now…


Methods for detecting selection

Test 4: Long Haplotypes

Age < 30,000 years

Within-species Tests

(Figure: haplotypes around a beneficial allele in the recent past, at the advent of the allele; a short while later; and at present. Axis: distance in Mb.)

Model: Maynard Smith and Haigh, 1974; simulation by SelSim, Coop and Spencer, 2004

Core Haplotypes

(Figure: five core haplotypes, numbered 1-5, within a gene.)

Slide by: David Reich, Broad Institute

Adjacent SNPs that are inherited together are compiled into "haplotypes."

Long-range multi-SNP haplotypes

(Figure: core markers within a gene flanked by long-range markers (C/T, A/G, A/G, C/T, C/T, C/T); LD between the five core haplotypes and the long-range markers decays with distance.)

Slide by: David Reich, Broad Institute

Long-range multi-SNP haplotypes

Decay of homozygosity: the probability, at any distance, that any two haplotypes that start out the same have all the same SNP genotypes.

(Figure: for one core haplotype, homozygosity starts at 100% at the core markers and decays along the long-range markers; values shown include 75%, 35%, and 18%.)

Slide by: David Reich, Broad Institute

Test Statistic: Extended Haplotype Homozygosity (EHH)

(A) Decay of haplotypes in a single region in which a new selected allele (red) is sweeping to fixation, replacing the ancestral allele (blue).

(B) Decay of haplotype homozygosity for ten replicate simulations.

Right side: derived alleles are favored, σ = 2Ns = 250.
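EHH follows the definition on the preceding slides: among chromosomes carrying a given core allele, the probability that two chosen at random are identical at every SNP from the core out to a given distance. A minimal sketch, assuming phased haplotypes written as strings of SNP alleles; the function and the toy haplotypes are hypothetical:

```python
from collections import Counter
from math import comb


def ehh(haplotypes, core_index, core_allele, end_index):
    """Extended haplotype homozygosity between the core SNP and end_index.

    Among chromosomes carrying core_allele at core_index, the probability that
    two chosen at random are identical at every SNP from the core out to
    end_index: sum over distinct extended haplotypes of C(n_i, 2) / C(n, 2).
    """
    carriers = [h for h in haplotypes if h[core_index] == core_allele]
    n = len(carriers)
    if n < 2:
        raise ValueError("need at least two chromosomes carrying the core allele")
    lo, hi = sorted((core_index, end_index))
    groups = Counter(h[lo:hi + 1] for h in carriers)
    return sum(comb(k, 2) for k in groups.values()) / comb(n, 2)


# Hypothetical phased haplotypes; core SNP at index 0.
haps = ["AAAA", "AAAA", "AAAT", "TCCC"]
print([round(ehh(haps, 0, "A", j), 3) for j in range(4)])
```

EHH starts at 1 at the core and drops once a recombination or mutation breaks up the extended haplotype (here at the last SNP, to 1/3).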


Haplotype Homozygosity: Sabeti et al. 2002

Simulations of decay of haplotype homozygosity

10 simulations: SelSim


iHS: measures the extent of haplotypes carrying each allele at a given SNP

(Figure: EHH versus genetic distance for the ancestral allele and the derived allele; EHH is integrated down to 0.05.)

• iHH_A: integrated EHH (iHH) with respect to the ancestral core allele
• iHH_D: iHH with respect to the derived core allele

iHS Score

• Useful for variants that have not yet reached fixation.

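• Large negative iHS: unusually long haplotypes on the derived allele, i.e., the derived allele has swept up in frequency.

• Large positive iHS: unusually long haplotypes on the ancestral allele, e.g., ancestral alleles hitchhiking with the selected site.

• Hence, both cases are considered interesting!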
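In Voight et al., the unstandardized score is ln(iHH_A / iHH_D), which is then standardized within bins of derived allele frequency so that scores are comparable across SNPs. A minimal sketch of that outline; it integrates one side of the core only, uses the trapezoid rule, and the 20 frequency bins and the exact handling of the 0.05 cutoff are assumptions. The EHH curves are hypothetical:

```python
import math
import statistics


def integrated_ehh(distances, ehh_values, cutoff=0.05):
    """Trapezoid-rule area under an EHH curve on one side of the core SNP.

    distances: increasing genetic distances from the core (core at 0);
    integration stops at the first segment that starts with EHH < cutoff.
    """
    area = 0.0
    for i in range(1, len(distances)):
        if ehh_values[i - 1] < cutoff:
            break
        width = distances[i] - distances[i - 1]
        area += width * (ehh_values[i] + ehh_values[i - 1]) / 2
    return area


def unstandardized_ihs(ihh_ancestral, ihh_derived):
    """ln(iHH_A / iHH_D): negative when the derived allele has longer haplotypes."""
    return math.log(ihh_ancestral / ihh_derived)


def standardize(raw_scores, derived_freqs, n_bins=20):
    """Z-score each raw iHS within its derived-allele-frequency bin."""
    bins = {}
    for idx, freq in enumerate(derived_freqs):
        bins.setdefault(min(int(freq * n_bins), n_bins - 1), []).append(idx)
    standardized = [0.0] * len(raw_scores)
    for members in bins.values():
        values = [raw_scores[i] for i in members]
        mean = statistics.mean(values)
        sd = statistics.pstdev(values)
        for i in members:
            standardized[i] = (raw_scores[i] - mean) / sd if sd > 0 else 0.0
    return standardized


# Hypothetical EHH curves: haplotypes around the derived allele decay slowly.
dist = [0.0, 0.1, 0.2, 0.3]
ihh_a = integrated_ehh(dist, [1.0, 0.4, 0.1, 0.02])
ihh_d = integrated_ehh(dist, [1.0, 0.9, 0.8, 0.7])
print(unstandardized_ihs(ihh_a, ihh_d))  # negative: derived allele looks swept
```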

The Data: HapMap Project

• 860,000 SNPs genome-wide

• 60 unrelated individuals:
  – European (CEPH): CEU
  – Nigerians from Ibadan (Yoruba): YRI

• 89 unrelated individuals:
  – Han Chinese from Beijing and Japanese from Tokyo: ASN

(Figure: genome-wide plots of |iHS| and −iHS scores.)

Lines of Evidence: Selection

• Enrichment of signal in genic relative to non-genic regions (p < 10^-20)

• Replication of previously published candidates:
  – LCT (Bersaglieri et al. 2004; Coelho et al. 2005)
  – ADH (Osier et al. 2002)
  – 17q23inv (Stefansson et al. 2005)
  – CYP3A5 (Thompson et al. 2004)
  – Chr. 11 olfactory gene cluster (Gilad et al. 2003)

• Correlates with departures in the frequency spectrum (Fay and Wu, 2000)

SYT1, Yoruba

(Figure: haplotype decay at SYT1, iHS = -4.7; plot of high iHS scores on chromosome 12.)

SYT1 binds Ca2+; implicated in release of neurotransmitters [OMIM]

SPAG4, East Asians

(Figure: haplotype decay at SPAG4, iHS = -3.1; plot of high iHS scores on chromosome 20.)

SPAG4 interacts with ODF27, a gene found in mammalian sperm tails [OMIM]

How much do regions overlap across populations?

Distribution of Regions Across Populations

Do signals of selection correlate with known biological processes?

Enriched Ontological Categories

• Olfaction [CEU/YRI]

• Gametogenesis/Fertilization [ASN/CEU]

• MHC-I related Immunity [CEU/YRI]

• Metabolism [All]


Other Interesting Stories...

• Skin pigmentation: MYO5A, OCA2, DTNBP1, TYRP1, SLC24A5* [CEU]

• Sugar metabolism: MAN2A1 [ASN/YRI]; SI [ASN]; LCT [CEU]

• Processing of fatty acids: SLC27A4, PPARD [CEU]; LEPR [ASN]; NCOA1 [YRI]

Conclusion on iHS Method

• Pervasive signals of positive selection across the human genome

• Both population specific and signals shared between populations

• Strong evidence of selection in Africa (unlike other reports)

• Putative medical relevance, because the signals:
  – have phenotypic consequences
  – differ between populations

Haplotter

http://hg-wen.uchicago.edu/selection/haplotter.html