Association mapping: finding genetic variants for common traits & diseases Manuel Ferreira Queensland Institute of Medical Research Brisbane Genetic Epidemiology 1 WEHI Postgraduate seminar, 31 May 2010


Page 1

Association mapping: finding genetic variants for common traits & diseases

Manuel Ferreira
Genetic Epidemiology
Queensland Institute of Medical Research, Brisbane

WEHI Postgraduate seminar, 31 May 2010

Page 2

Why?

Understand disease aetiology
Predict disease risk / drug response (personalized medicine)

Lancet 2010; 375: 1525–35

Page 3

Rare, monogenic traits

Ng et al. Nature Genetics 2010; 42: 30-35.

Page 4

Common, complex traits

[Diagram: G, E and O contribute to DISEASE RISK]

Page 5

GENETICS OF COMMON DISEASES

[Timeline figure: phenotypic modelling, linkage analysis and association analysis; axis marks at 1990, 2000, 2005, 2008, 2009, 2010 and 2015]

Page 6

Recent advances: assays/analysis

Genetic variation: HapMap, 1000 Genomes
High-throughput genotyping & sequencing
Analytic methods: genome-wide association, imputation, stratification, CNVs, risk prediction

[Diagram: genes, env and other contribute to DISEASE RISK]

Page 7

HapMap project

1. GOALS

“The HapMap was designed to determine the frequencies and patterns of association among roughly 3 million common Single Nucleotide Polymorphisms (SNPs) in four populations, for use in genetic association studies.” [4]

[Figure: genotype matrix, individuals × SNPs]

[1] The International HapMap Consortium. Nature 2003; 426: 789.
[2] International HapMap Consortium. Nature 2005; 437: 1299.
[3] International HapMap Consortium. Nature 2007; 449: 851.
[4] Manolio et al. J Clin Invest 2008; 118: 1590.

Page 8

HapMap project

2. STRATEGY

30 trios, Yoruba in Ibadan, Nigeria (YRI)
30 trios, European descent in Utah (CEU)
45 unrelated Han Chinese from Beijing (CHB)
45 unrelated Japanese from Tokyo (JPT)

Genome-wide SNP discovery (dbSNP)
2002: 1.7 million
2005: 9.2 million
2009: 14.7 million (6.5 million validated)

SNP selection
Phase 1: MAF>0.05, validated, non-synonymous SNPs prioritised (1.27 million total)
Phases 2 and 3 expanded SNP (4 million) and population (11) coverage

Genotyping
7 genotyping platforms used/developed by 12 centres

http://www.hapmap.org/

Page 9

HapMap project

3. OUTCOMES

“Systematic” catalogue of common human variation
Linkage disequilibrium (LD) or correlation between SNPs (tagging, fine-mapping, imputation)
Designing and refining high-throughput genotyping platforms
Population genetics (selection, sub-structure, recombination & mutation)

Page 10

Correlation (LD) between SNPs

[Figure: Gene A, HapMap SNPs and their haplotypes]

LD measures: D’ and r^2
SNP tags (Haploview, Tagger): eg. SNP 1 ‘tags’ 4/10 variants
Genetic coverage: proportion of known SNPs tagged (Haploview)
Fine-mapping: interesting SNPs to follow up; cross-study comparisons
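The two LD measures named on this slide can be illustrated with a minimal sketch. It assumes phased haplotype frequencies are already available; the function name `ld_stats` and the example frequencies are hypothetical and not taken from the talk.

```python
def ld_stats(p_ab, p_a, p_b):
    """Return (D, |D'|, r^2) for two biallelic SNPs.

    p_ab : frequency of the haplotype carrying allele A (SNP 1) and allele B (SNP 2)
    p_a  : frequency of allele A at SNP 1
    p_b  : frequency of allele B at SNP 2
    """
    d = p_ab - p_a * p_b  # raw disequilibrium coefficient
    # D is scaled by the largest value it could take given the allele frequencies.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r2 = d * d / denom if denom > 0 else 0.0
    return d, d_prime, r2


# Hypothetical example: allele A (20%) only ever occurs together with allele B (50%).
d, d_prime, r2 = ld_stats(p_ab=0.2, p_a=0.2, p_b=0.5)
print(round(d_prime, 3), round(r2, 3))
```

In this made-up example D' is 1 while r^2 is only 0.25, which is why r^2 rather than D' is usually the quantity of interest when asking how well one SNP tags another.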

Page 11

1000 Genomes project

GOAL

“The 1000 Genomes Project aims to achieve a nearly complete catalog of common human genetic variants (defined as frequency 1% or higher) by generating high-quality sequence data for >85% of the genome for three sets of 400-500 individuals (...)”

2,500 samples at 4x by 2011

http://www.1000genomes.org/

Page 12

High-throughput genotyping & sequencing

Whole-genome genotyping (from $300 USD/sample)
Affymetrix 6.0 chip: >900,000 SNPs; CNV probes; 82% coverage CEU HapMap; accuracy 99.90%
Illumina Human1M BeadChip: >1 million SNPs; CNV probes; 95% coverage CEU HapMap; accuracy 99.94%

Whole-genome sequencing (from $10,000 USD/sample)
Illumina HiSeq 2000: 30x coverage; 100 bp read length
Complete Genomics: 40x coverage; 35 bp read length

Page 13

Recent advances: assays/analysis

Genetic variation: HapMap, 1000 Genomes
High-throughput genotyping & sequencing
Analytic methods: genome-wide association, stratification, imputation, CNV, risk prediction

Examples: recent GWAS.

Page 14

Analytic methods

1. GENOME-WIDE ASSOCIATION

[Figure: genotype matrix (individuals × SNPs) split into cases and controls; two panels compare cases and controls at a SNP showing no association and at a SNP showing association]
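The per-SNP comparison of cases and controls can be sketched as a simple allelic chi-square test. This is one basic test among many and is not necessarily the one used in the studies discussed later; the function name `allelic_chi2` and the allele counts are hypothetical.

```python
import math


def allelic_chi2(case_counts, control_counts):
    """Allelic 2x2 chi-square test (1 df) for one SNP.

    case_counts, control_counts : (minor allele count, major allele count).
    Returns (chi-square, P value, odds ratio). Assumes no zero margins or cells.
    """
    a, b = case_counts
    c, d = control_counts
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Upper tail of a chi-square with 1 df, via the complementary error function.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    odds_ratio = (a * d) / (b * c)
    return chi2, p_value, odds_ratio


# Hypothetical counts: 500 cases and 500 controls, i.e. 1,000 alleles per group.
no_assoc = allelic_chi2((100, 900), (100, 900))  # same allele frequency in both groups
assoc = allelic_chi2((200, 800), (100, 900))     # minor allele enriched in cases
print(no_assoc, assoc)
```

In a genome-wide scan this test is repeated for every SNP, so the P value is judged against a stringent threshold rather than 0.05.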

Page 15

Analytic methods

Study design: unrelated individuals
  Association tests: between-individual effects
  Software: many (eg. PLINK)
  Pros: more power per $ spent; easier to collect and analyse

Study design: families
  Association tests: between- + within-family effects
  Software: Merlin, etc
  Pros: can assess inheritance (CNVs); robust to population stratification

Page 16

Analytic methods

2. POPULATION STRATIFICATION

Genetic matching

Ind1   Ind2   % shared
A1     A2     100
A1     A3     50
A1     A4     25
A1     A5     10
A1     A6     8
A1     B1     5

[Figure: individuals from groups A and B distributed among cases and controls]
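A minimal sketch of genetic matching follows, assuming that "% shared" is measured as the proportion of alleles two individuals share identical-by-state (IBS) across SNPs. The slide does not define the measure, so this is an assumption, and the function names and genotypes are hypothetical.

```python
def ibs_sharing(g1, g2):
    """Proportion of alleles shared identical-by-state across SNPs.

    Genotypes are coded as minor-allele counts (0, 1 or 2); None marks a missing call.
    """
    shared = 0
    n_alleles = 0
    for a, b in zip(g1, g2):
        if a is None or b is None:
            continue  # skip SNPs missing in either individual
        shared += 2 - abs(a - b)  # 2, 1 or 0 alleles shared at this SNP
        n_alleles += 2
    return shared / n_alleles if n_alleles else float("nan")


def best_match(case_genotypes, controls):
    """Return the id of the control that shares the most alleles with the case."""
    return max(controls, key=lambda cid: ibs_sharing(case_genotypes, controls[cid]))


# Hypothetical genotypes at five SNPs.
case = [0, 1, 2, 1, 0]
controls = {
    "ctrl_A": [0, 1, 2, 0, 0],  # differs from the case by one allele
    "ctrl_B": [2, 1, 0, 1, 2],  # differs from the case at three SNPs
}
print(best_match(case, controls))
```

Comparing each case with genetically similar controls is one way to stop ancestry differences between the groups from appearing as disease association.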

Page 17

Analytic methods

3. IMPUTATION OF UNMEASURED GENOTYPES

[Figure: reference panel (eg. HapMap) and genotyped dataset (individuals × SNPs) combined into a genotyped + imputed dataset]

MAF        N SNPs   Imputation    Proportion  Average          Average
                    info score    of SNPs     imputation rate  concordance
0.01-0.05  27,078   Not imputed   0.000       -                -
                    0-0.5         0.325       0.841            0.966
                    0.5-0.8       0.149       0.917            0.979
                    ≥0.8          0.526       0.992            0.992
0.05-0.15  71,984   Not imputed   0.002       -                -
                    0-0.5         0.164       0.525            0.934
                    0.5-0.8       0.175       0.750            0.961
                    ≥0.8          0.659       0.967            0.989
0.15-0.25  65,918   Not imputed   0.004       -                -
                    0-0.5         0.082       0.248            0.874
                    0.5-0.8       0.164       0.554            0.939
                    ≥0.8          0.750       0.939            0.986
0.25-0.50  146,253  Not imputed   0.004       -                -
                    0-0.5         0.053       0.094            0.777
                    0.5-0.8       0.145       0.389            0.907
                    ≥0.8          0.798       0.917            0.981

Shaun Purcell, Doug Ruderfer (PLINK)

Software: MACH, IMPUTE, BEAGLE
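The last two columns of the table can be illustrated with a small sketch. It assumes that "imputation rate" means the fraction of masked genotypes whose most likely genotype reaches a confidence threshold, and that "concordance" means the fraction of those calls that match the true genotype. These definitions, the 0.8 call threshold and the function name `imputation_summary` are assumptions for illustration, not details given in the talk.

```python
def imputation_summary(posteriors, truth, call_threshold=0.8):
    """Summarise imputation quality on genotypes that were masked and then imputed.

    posteriors     : list of (P(g=0), P(g=1), P(g=2)) tuples, one per masked genotype
    truth          : list of true genotypes (0, 1 or 2), same order
    call_threshold : hypothetical minimum posterior needed to make a call
    Returns (imputation rate, concordance among called genotypes).
    """
    called = 0
    correct = 0
    for probs, true_g in zip(posteriors, truth):
        best = max(range(3), key=lambda g: probs[g])  # most likely genotype
        if probs[best] >= call_threshold:
            called += 1
            if best == true_g:
                correct += 1
    rate = called / len(truth) if truth else float("nan")
    concordance = correct / called if called else float("nan")
    return rate, concordance


# Hypothetical posteriors for four masked genotypes.
posteriors = [(0.90, 0.10, 0.00), (0.10, 0.85, 0.05), (0.40, 0.50, 0.10), (0.05, 0.05, 0.90)]
truth = [0, 1, 1, 1]
print(imputation_summary(posteriors, truth))
```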

Page 18

Combine data from studies genotyped using different platforms

[Figure: SNP sets from Affy, Illumina, Perlegen and HapMap]

Page 19

Example 1: Bipolar Disorder GWAS

Ferreira et al (2008) Nature Genetics 40: 1056

                       WTCCC        STEP-UCL     ED-DUB-STEP2  Overall
Sample size
N (% males)            4,764 (45)   3,467 (47)   2,365 (40)    10,596 (44)
Cases (% males)        1,829 (38)   1,460 (43)   1,098 (44)    4,387 (41)
Controls (% males)     2,935 (49)   2,007 (50)   1,267 (36)    6,209 (47)
Genotype missing rate  0.0027       0.0057       0.0031        0.0038

Power (α = 5 × 10^-8)
MAF 0.05, GRR 1.40     0.05         0.02         <0.01         0.61
MAF 0.20, GRR 1.20     0.03         <0.01        <0.01         0.48
MAF 0.40, GRR 1.15     0.02         <0.01        <0.01         0.31

325,690 SNPs
>1.7 million SNPs

Page 20

ANK3: Ankyrin G

Cases: 7.0%; Controls: 5.3%; Odds ratio = 1.45

Not related to sex, psychosis or age-of-onset

Replicated recently:
Smith et al (2009) Mol Psychiatry 14: 755-63.
Scott et al (2009) Proc Natl Acad Sci USA 106: 7501-6.
[Lee et al (2010) Mol Psychiatry Apr 13 – Han Chinese population]

Page 21

Example 2: analysis of lymphocyte subsets

Ferreira et al. (2010) Am J Hum Genet 86: 88-92

2,538 individuals | CD4+ T cell levels, CD8+ T cell levels, CD4:CD8 ratio

MHC class I
• rs2524054, C
• Increased CD8+ T levels
• Improved host control of HIV (OR = 0.32, P = 10^-9)

MHC class II
• rs9270986, A
• Increased CD4+ T levels
• Protective effect for type-1 diabetes (OR = 0.04, P = 10^-125)
• Protective effect for rheumatoid arthritis (OR = 0.60, P = 10^-15)

Page 22

Analytic methods

4. Structural Variants

Genomic alterations involving a segment of DNA >1kb

Quantitative (Copy Number Variants): deletions, duplications, insertions
Positional (Translocations)
Orientational (Inversions)

Page 23

Detection of CNVs

Non-polymorphic probes

McCarroll et al 2008 Nat Genet 40: 1166

Page 24

Detection of CNVs

Use polymorphic probes from genotyping arrays to identify and genotype new, potentially rarer CNVs

Example: rs1006737 A/G
...AGCCCGAAATGTTTTCAGA... (A allele)
...AGCCCGAAGTGTTTTCAGA... (G allele)

[Plot: intensity of probe 1 against intensity of probe 2, with genotype clusters AA, AG and GG]

Page 25

Detection of CNVs

      Genotype    Copy number for:
Ind   (Mat/Pat)   A    G    Total
1     A/G         1    1    2
2     A/-         1    0    1
3     AA/-        2    0    2
4     -/G         0    1    1
5     -/-         0    0    0
6     AAA/G       3    1    4

[Figure: "Pattern" column showing, for each individual, the maternal and paternal chromosomes (...CG ATG...) with the A or G copies present at the SNP]
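The copy-number columns of this table follow directly from the Mat/Pat genotype strings, as the short sketch below shows. The helper names are hypothetical, and the cut-offs in `classify_total` are the ones stated on the following slide (total CN < 2 for deletions, total CN > 2 for duplications).

```python
def copy_numbers(genotype):
    """Parse a 'Mat/Pat' genotype such as 'AA/-' into (A copies, G copies, total)."""
    a = genotype.count("A")
    g = genotype.count("G")
    return a, g, a + g


def classify_total(total):
    """Label a total copy number relative to the usual two copies."""
    if total < 2:
        return "deletion"
    if total > 2:
        return "duplication"
    return "two copies"


for ind, genotype in enumerate(["A/G", "A/-", "AA/-", "-/G", "-/-", "AAA/G"], start=1):
    a, g, total = copy_numbers(genotype)
    print(ind, genotype, a, g, total, classify_total(total))
```

Individual 3 (AA/-) carries both a duplication and a deletion but has a total of two copies, so total copy number alone would not flag it.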

Page 26

Detection of CNVs

Polymorphic probe in CNV region

[Plot: normalized intensity of allele A (x-axis) against normalized intensity of allele G (y-axis), showing the A/A, A/G and G/G clusters. Individuals with deletion(s), ie. total CN < 2, and individuals with duplication(s), ie. total CN > 2, fall outside these clusters.]

Page 27

Detection of CNVs

Combine information across probes to identify new CNVs

Birdseye (Affy 5.0, 6.0): Korn et al 2008 Nat Genet 40: 1253
PennCNV (Affymetrix and Illumina): Wang et al 2007 Genome Res 17: 1665

For example...
                        Cases      Controls
100kb deletion chr. 2   10/5,000   1/5,000
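With carrier counts this small, one reasonable way to compare cases and controls is Fisher's exact test. The sketch below applies a two-sided version to the slide's illustrative counts (10/5,000 versus 1/5,000); the slide does not say which test was intended, so the choice of test and the function name are assumptions.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows are cases and controls; columns are CNV carriers and non-carriers.
    Sums the probabilities of all tables that are no more likely than the observed one.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x carriers among the cases.
        return comb(row1, x) * comb(row2, col1 - x) / total

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # The small tolerance guards against floating-point ties.
    return sum(p for p in (prob(x) for x in range(lo, hi + 1)) if p <= p_obs * (1 + 1e-9))


# Counts from the slide: 10 of 5,000 cases and 1 of 5,000 controls carry the deletion.
print(fisher_exact_two_sided(10, 4990, 1, 4999))
```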

Page 28

Example 3: Autism whole-genome CNV analysis

Weiss et al. N Engl J Med 2008; 358: 667

Sample                    16p11         Cases     Controls   P
Discovery [Affy 500K]     Del (600kb)   5/1,441   3/4,234    1.1 x 10^-4
                          Dup           7/1,441   2/4,234
Replication 1 (CHB)       Del           5/512     0/434      0.007
[array-CGH]               Dup           4/512     0/434
Replication 2 (deCODE)    Del           3/299     2/18,834   4.2 x 10^-4
[Illumina]                Dup           0/299     5/18,834

CNV calling: COPPER, Birdseye, CNAT

            del   dup
inherited   2     6
de novo     10    1
unknown     1     4

Deletion frequency, Iceland
Autism: 1%
Psychiatric disorder: 0.1%
General population: 0.01%

Page 29

Example 4: SCZ whole-genome CNV analysis

Shaun Purcell

[Figure: CNVs in cases and controls plotted along the chromosomes; two analyses: genome-wide burden and specific loci]

Page 30

Genome-wide burden of rare CNVs in SCZ

Shaun Purcell

3,391 patients with SCZ, 3,181 controls
Filter for <1% MAF, >100kb: 6,753 CNVs

Cases have greater rate of CNVs than controls: 1.15-fold increase, P = 3×10^-5
Rate of genic CNVs in cases versus controls: 1.18-fold increase, P = 5×10^-6
Rate of non-genic CNVs in cases versus controls: 1.09-fold increase, P = 0.16

Results invariant to obvious statistical controls: array type, genotyping plate, sample collection site, mean probe intensity
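A burden comparison of this kind can be sketched as a ratio of mean CNV counts per person, with a permutation test that shuffles case/control labels. This is a generic illustration, not necessarily the procedure used in the study; the function name and the per-person counts are hypothetical.

```python
import random


def burden_permutation_test(case_counts, control_counts, n_perm=2000, seed=0):
    """Compare the number of CNVs per person between cases and controls.

    Returns (ratio of mean CNV counts, one-sided permutation P value for an
    excess in cases). Counts are per-individual numbers of rare CNVs.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n_cases = len(case_counts)
    pooled = list(case_counts) + list(control_counts)

    def mean_diff(values):
        cases, controls = values[:n_cases], values[n_cases:]
        return sum(cases) / len(cases) - sum(controls) / len(controls)

    observed = mean_diff(pooled)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # reassign case/control labels at random
        if mean_diff(pooled) >= observed:
            hits += 1
    ratio = (sum(case_counts) / len(case_counts)) / (sum(control_counts) / len(control_counts))
    # Adding 1 to numerator and denominator keeps the P value above zero.
    return ratio, (hits + 1) / (n_perm + 1)


# Hypothetical per-person CNV counts for eight cases and eight controls.
cases = [2, 3, 2, 4, 3, 2, 3, 4]
controls = [0, 1, 0, 1, 0, 0, 1, 1]
print(burden_permutation_test(cases, controls))
```

Permuting labels within strata such as array type or collection site would be one way to respect the statistical controls listed on the slide.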

Page 31

Similar successes for other common diseases

Page 32

Crohn’s Disease (31 loci, ~10% variance)

[Bar chart: N confirmed loci (y-axis) by period: before Jan 2006; Jan 2006 to Jan 2008]

http://www.genome.gov/gwastudies

Altshuler, Daly & Lander. Science 2008; 322: 881
Manolio, Brooks & Collins. J Clin Invest 2008; 118: 1590

Page 33

Summary

Tremendous recent technological advances
Large-scale genetic association studies feasible
>150 disease loci unequivocally identified since 2006
Provide a solid base to build our knowledge about disease mechanisms
Hundreds of loci yet to be identified for most diseases