HapMap: application in the design and interpretation of association studies Mark J. Daly, PhD on behalf of The International HapMap Consortium

Post on 11-Jan-2016

Page 1: HapMap: application in the design and interpretation of association studies Mark J. Daly, PhD on behalf of The International HapMap Consortium

HapMap:

application in the design and interpretation of association studies

Mark J. Daly, PhD on behalf of

The International HapMap Consortium

Page 2

Goals of this segment

• Briefly summarize HapMap design and current status

• Discuss the application of HapMap to all aspects of association study design, analysis and interpretation

Page 3

HapMap Project

High-density SNP genotyping across the genome provides information about
– SNP validation, frequency, assay conditions
– correlation structure of alleles in the genome

A freely-available public resource to increase the power and efficiency of genetic association studies to medical traits

All data is freely available on the web for application in study design and analyses as researchers see fit

Page 4

HapMap Samples

• 90 Yoruba individuals (30 parent-parent-offspring trios) from Ibadan, Nigeria (YRI)

• 90 individuals (30 trios) of European descent from Utah (CEU)

• 45 Han Chinese individuals from Beijing (CHB)

• 45 Japanese individuals from Tokyo (JPT)

Page 5

HapMap progress

PHASE I – completed, described in Nature paper

* 1,000,000 SNPs successfully typed in all 270 HapMap samples
* ENCODE variation reference resource available

PHASE II – data generation complete, data released this past Monday

* >3,500,000 SNPs typed in total !!!

Page 6

ENCODE-HAPMAP variation project

• Ten “typical” 500kb regions

• 48 samples sequenced

• All discovered SNPs (and any others in dbSNP) typed in all 270 HapMap samples

• Current data set – 1 SNP every 279 bp

A much more complete variation resource by which the genome-wide map can be evaluated

Page 7

Completeness of dbSNP

Vast majority of common SNPs are contained in or highly correlated with a SNP in dbSNP

Page 8

Recombination hotspots are widespread and account for LD structure

7q21

Page 9

Utility of LD in association study

• “If I’m a causal variant, what is relevant to my detection in association studies is how well correlated I am with one of the SNPs or haplotypes examined in the study.”
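The correlation referred to here is commonly measured as r2 between alleles at two sites. A minimal sketch of that calculation, assuming phased haplotypes coded as 0/1 per chromosome; the toy data and variable names are hypothetical and not taken from HapMap:

```python
def r_squared(a, b):
    """Squared correlation (r^2) between two biallelic SNPs.

    a and b are equal-length lists of 0/1 alleles, one entry per phased
    chromosome, listed in the same chromosome order.
    """
    n = len(a)
    p_a = sum(a) / n                               # frequency of allele 1 at SNP a
    p_b = sum(b) / n                               # frequency of allele 1 at SNP b
    p_ab = sum(x & y for x, y in zip(a, b)) / n    # frequency of the 1-1 haplotype
    d = p_ab - p_a * p_b                           # disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    return 0.0 if denom == 0 else d * d / denom


# Hypothetical toy data: 8 chromosomes, one untyped causal variant, two typed SNPs.
causal = [1, 1, 1, 0, 0, 0, 0, 0]
typed_good = [1, 1, 1, 0, 0, 0, 0, 0]   # perfect proxy for the causal variant
typed_poor = [1, 0, 0, 1, 0, 1, 0, 0]   # weakly correlated marker

print(r_squared(causal, typed_good))
print(r_squared(causal, typed_poor))
```

A study that types only `typed_poor` would have little power to detect the causal variant, while `typed_good` would serve as a full proxy.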

Page 10

Coverage of Phase II HapMap (estimated from ENCODE data)

From Table 6 – “A Haplotype Map of the Human Genome”, Nature

Panel      % r2 > 0.8   max r2
YRI        81           0.90
CEU        94           0.97
CHB+JPT    94           0.97

Page 11

Coverage of Phase II HapMap (estimated from ENCODE data)

From Table 6 – “A Haplotype Map of the Human Genome”, Nature

Panel      % r2 > 0.8   max r2
YRI        81           0.90
CEU        94           0.97
CHB+JPT    94           0.97

% r2 > 0.8: Percentage of deeply ascertained common variants highly correlated with a HapMap SNP

Page 12

Coverage of Phase II HapMap (estimated from ENCODE data)

From Table 6 – “A Haplotype Map of the Human Genome”, Nature

Panel      % r2 > 0.8   max r2
YRI        81           0.90
CEU        94           0.97
CHB+JPT    94           0.97

max r2: Average maximum correlation between a deeply ascertained variant and a neighboring HapMap SNP

Page 13

Coverage of Phase II HapMap (estimated from ENCODE data)

Vast majority of common variation (MAF > .05) captured by Phase II HapMap

Panel      % r2 > 0.8   max r2
YRI        81%          0.90
CEU        94%          0.97
CHB+JPT    94%          0.97
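The two columns of the coverage table can be read as simple summaries over per-variant "best proxy" values. A minimal sketch, assuming each deeply ascertained variant has already been assigned its maximum r2 to any HapMap SNP; the input values below are hypothetical and are not the ENCODE data:

```python
def coverage_summary(max_r2_values, threshold=0.8):
    """Return (% of variants with max r^2 above threshold, mean max r^2)."""
    n = len(max_r2_values)
    captured = sum(1 for v in max_r2_values if v > threshold)
    return 100.0 * captured / n, sum(max_r2_values) / n


# Hypothetical best-proxy r^2 values for ten common variants in one panel.
max_r2 = [1.0, 1.0, 0.95, 1.0, 0.6, 1.0, 0.85, 1.0, 0.4, 1.0]
pct, mean_r2 = coverage_summary(max_r2)
print(f"{pct:.0f}% with r2 > 0.8, mean max r2 = {mean_r2:.2f}")
```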

Page 14

Applying the HapMap

• Study design - tagging
• Study coverage evaluation
• Study analysis - improving association testing
• Study interpretation
– Comparison of multiple studies
– Connection to genes/genomic features
– Integration with expression and other functional data
• Other uses of HapMap data
– Admixture, LOH, selection

Page 15

Tagging from HapMap

• Since HapMap describes the majority of common variation in the genome, choosing non-redundant sets of SNPs from HapMap offers considerable efficiency without power loss in association studies

Page 16

Page 17

Pairwise tagging

[Figure: haplotypes across six SNPs (1: A/T, 2: G/A, 3: G/C, 4: T/C, 5: G/C, 6: A/C), with three groups of SNPs in high r2]

Tags: SNP 1, SNP 3, SNP 6 (3 in total)

Test for association: SNP 1, SNP 3, SNP 6

After Carlson et al. (2004) AJHG 74:106
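Pairwise tagging can be sketched as a greedy binning procedure in the spirit of the cited approach. This is an illustration under stated assumptions, not the implementation used by Haploview or Tagger: haplotypes are coded 0/1, the function names are hypothetical, and the toy data are invented so that SNPs 1+2, 3+5 and 4+6 form perfect-LD pairs.

```python
def r_squared(a, b):
    """r^2 between two biallelic SNPs given 0/1 alleles per chromosome."""
    n = len(a)
    p_a, p_b = sum(a) / n, sum(b) / n
    p_ab = sum(x & y for x, y in zip(a, b)) / n
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    return 0.0 if denom == 0 else (p_ab - p_a * p_b) ** 2 / denom


def greedy_pairwise_tags(snps, threshold=0.8):
    """Pick tags greedily: each round, take the SNP that captures the most
    still-untagged SNPs at r^2 >= threshold (ties broken by input order)."""
    names = list(snps)
    proxies = {
        s: {s} | {t for t in names if r_squared(snps[s], snps[t]) >= threshold}
        for s in names
    }
    untagged = set(names)
    tags = {}
    while untagged:
        best = max(names, key=lambda s: len(proxies[s] & untagged))
        tags[best] = sorted(proxies[best] & untagged)
        untagged -= proxies[best]
    return tags


# Hypothetical toy data: 6 SNPs on 8 chromosomes.
toy = {
    "SNP1": [0, 0, 0, 0, 1, 1, 1, 1],
    "SNP2": [0, 0, 0, 0, 1, 1, 1, 1],
    "SNP3": [0, 0, 1, 1, 0, 0, 1, 1],
    "SNP4": [0, 1, 0, 1, 0, 1, 0, 1],
    "SNP5": [0, 0, 1, 1, 0, 0, 1, 1],
    "SNP6": [0, 1, 0, 1, 0, 1, 0, 1],
}
tags = greedy_pairwise_tags(toy)
for tag, captured in tags.items():
    print(tag, "captures", captured)
```

Within a bin of mutually correlated SNPs any member can serve as the tag, so the sketch may pick SNP 4 where the slide shows SNP 6; either choice gives three tags for six SNPs.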

Page 18

Pairwise Tagging Efficiency

Table 7: Number of selected tag SNPs to capture all observed common SNPs in the Phase I HapMap for the three analysis panels using pairwise tagging at different r2 thresholds

Pairwise threshold   YRI       CEU       CHB+JPT
r2 ≥ 0.5             324,865   178,501   159,029
r2 ≥ 0.8             474,409   293,835   259,779
r2 = 1               604,886   447,579   434,476

Tag SNPs were picked to capture common SNPs in release 16c.1 for every 7,000 SNP bin using Haploview.

Tagging Phase I HapMap offers 2-5x gains in efficiency

Page 19

Use of haplotypes can improve genotyping efficiency

Tags: SNP 1, SNP 3, SNP 6 (3 in total)

Test for association: SNP 1, SNP 3, SNP 6

Tags: SNP 1, SNP 3 (2 in total)

Test for association:
– SNP 1 captures 1+2
– SNP 3 captures 3+5
– “AG” haplotype captures SNP 4+6

[Figure: haplotypes across six SNPs (1: A/T, 2: G/A, 3: G/C, 4: T/C, 5: G/C, 6: A/C)]

Tags in multi-marker tests should be conditional on significance of LD in order to avoid overfitting
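The idea that a haplotype of two tags can stand in for an untyped SNP can be illustrated with a small calculation. A minimal sketch, assuming phased chromosomes; the toy alleles are hypothetical and are arranged so that the C allele of SNP 6 occurs only on chromosomes carrying A at SNP 1 and G at SNP 3:

```python
def r_squared(a, b):
    """r^2 between two 0/1 indicator vectors, one entry per chromosome."""
    n = len(a)
    p_a, p_b = sum(a) / n, sum(b) / n
    p_ab = sum(x & y for x, y in zip(a, b)) / n
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    return 0.0 if denom == 0 else (p_ab - p_a * p_b) ** 2 / denom


# Hypothetical phased chromosomes: alleles at SNP 1 and SNP 3 (typed) and SNP 6 (untyped).
chromosomes = [
    ("A", "G", "C"), ("A", "G", "C"),
    ("A", "C", "A"), ("A", "C", "A"),
    ("T", "G", "A"), ("T", "G", "A"),
    ("T", "C", "A"), ("T", "C", "A"),
]

snp1_a = [1 if c[0] == "A" else 0 for c in chromosomes]
snp3_g = [1 if c[1] == "G" else 0 for c in chromosomes]
hap_ag = [1 if (c[0], c[1]) == ("A", "G") else 0 for c in chromosomes]
snp6_c = [1 if c[2] == "C" else 0 for c in chromosomes]

print("SNP 1 alone:   ", r_squared(snp1_a, snp6_c))
print("SNP 3 alone:   ", r_squared(snp3_g, snp6_c))
print("AG haplotype:  ", r_squared(hap_ag, snp6_c))
```

In this toy case neither tag alone is a good proxy, but the two-marker haplotype is. With real reference panels of limited size, such multi-marker predictors can fit noise, which is the overfitting concern raised on the slide.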

Page 20

Efficiency and power

[Figure: relative power (%) versus average marker density (per kb), comparing tag SNPs with random SNPs]

P.I.W. de Bakker et al. (2005) Nat Genet Advance Online Publication 23 Oct 2005

~300,000 tag SNPs needed to cover common variation in whole genome in CEU

Page 21

How to pick tag SNPs?

• What is the genetic hypothesis? Which variants do you want to test for a role in disease?
– functional annotation (coding SNPs)
– allele frequency (HapMap ascertainment)
– previously implicated associations

• Go to http://www.hapmap.org – DCC supported interactive tagging

• Export HapMap data into tools such as Tagger, Haploview (www.broad.mit.edu/mpg)

Page 22

Will tag SNPs picked from HapMap apply to other population samples?

Population differences add very little inefficiency

Platform presentation: Paul de Bakker (#223: Sat 9.30)

[Figure: panels labeled CEU (Utah residents with European ancestry, CEPH), Whites from Los Angeles, CA, and Botnia, Finland]

Page 23

Applying the HapMap

• Study design - tagging
• Study coverage evaluation
• Study analysis - improving association testing
• Study interpretation
– Comparison of multiple studies
– Connection to genes/genomic features
– Integration with expression and other functional data
• Other uses of HapMap data
– Admixture, LOH, selection

Page 24

Genome-wide association coverage

• If genome-wide products are typed on the HapMap sample panel, the SNPs on HapMap not included in the panel provide an evaluation of the coverage of the product
– ENCODE (deep ascertainment)
– Phase II (dense, genome-wide)
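This evaluation can be sketched as follows: for every reference SNP that is not on the product, find its best r2 to any SNP that is on the product, then report the fraction exceeding a cutoff. The function and variable names are hypothetical, the toy data are invented, and a real evaluation would restrict the search to nearby SNPs rather than comparing all pairs:

```python
def r_squared(a, b):
    """r^2 between two biallelic SNPs given 0/1 alleles per chromosome."""
    n = len(a)
    p_a, p_b = sum(a) / n, sum(b) / n
    p_ab = sum(x & y for x, y in zip(a, b)) / n
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    return 0.0 if denom == 0 else (p_ab - p_a * p_b) ** 2 / denom


def product_coverage(reference, product_snps, cutoffs=(0.5, 0.8, 1.0)):
    """Fraction of off-product reference SNPs whose best on-product proxy
    reaches each r^2 cutoff."""
    on_product = [s for s in reference if s in product_snps]
    off_product = [s for s in reference if s not in product_snps]
    best = {
        s: max(r_squared(reference[s], reference[t]) for t in on_product)
        for s in off_product
    }
    return {c: sum(1 for v in best.values() if v >= c) / len(best) for c in cutoffs}


# Hypothetical reference haplotypes for six SNPs; the product types SNP1 and SNP3 only.
reference = {
    "SNP1": [0, 0, 0, 0, 1, 1, 1, 1],
    "SNP2": [0, 0, 0, 0, 1, 1, 1, 1],
    "SNP3": [0, 0, 1, 1, 0, 0, 1, 1],
    "SNP4": [0, 1, 0, 1, 0, 1, 0, 1],
    "SNP5": [0, 0, 1, 1, 0, 0, 1, 1],
    "SNP6": [0, 1, 0, 1, 0, 1, 0, 1],
}
coverage = product_coverage(reference, {"SNP1", "SNP3"})
print(coverage)
```

In this toy example SNP 2 and SNP 5 are captured through their proxies while SNP 4 and SNP 6 are missed, so half of the untyped SNPs are covered at every cutoff.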

Page 25

Association tests with fixed markers

Tests of association: SNP 1, SNP 3

[Figure: haplotypes across six SNPs (1: A/T, 2: G/A, 3: G/C, 4: T/C, 5: G/C, 6: A/C); marked SNPs = SNP on whole-genome product (~1 - 5% common variation directly assayed)]

Page 26

Association tests with fixed markers

Tests of association: SNP 1, SNP 3

[Figure: haplotypes across six SNPs (1: A/T, 2: G/A, 3: G/C, 4: T/C, 5: G/C, 6: A/C), with two high r2 relationships marked]

Page 27

Association tests with fixed markers

Tests of association: SNP 1, SNP 3

SNPs actually tested: SNP 1, SNP 3, SNP 2, SNP 5

[Figure: haplotypes across six SNPs (1: A/T, 2: G/A, 3: G/C, 4: T/C, 5: G/C, 6: A/C), with two high r2 relationships marked]

Page 28

Genome-wide products can capture most common variation

[Figure: fraction of SNPs (0% to 100%) versus R2 cutoff (0 to 1), plotted for CEU and YRI]

Example: 500K data generated by Affymetrix and recently submitted to HapMap DCC

Page 29

More on this topic

• Platform presentations tomorrow morning 8 AM sharp:
– Peer
– Jorgenson
– Lazarus
– As well as several detailed posters!

Page 30

Applying the HapMap

• Study design - tagging
• Study coverage evaluation
• Study analysis - improving association testing
• Study interpretation
– Comparison of multiple studies
– Connection to genes/genomic features
– Integration with expression and other functional data
• Other uses of HapMap data
– Admixture, LOH, selection

Page 31

Can incorporating tests of haplotypes of SNPs on the genome-wide product improve this coverage?

Page 32

Improving association power using data from HapMap

Tests of association: SNP 1, SNP 3

SNPs actually tested: SNP 1, SNP 3, SNP 2, SNP 5

[Figure: haplotypes across six SNPs (1: A/T, 2: G/A, 3: G/C, 4: T/C, 5: G/C, 6: A/C)]

Page 33

Improving association power using data from HapMap

Tests of association: SNP 1, SNP 3

SNPs actually tested: SNP 1, SNP 3, SNP 2, SNP 5

[Figure: haplotypes across six SNPs (1: A/T, 2: G/A, 3: G/C, 4: T/C, 5: G/C, 6: A/C)]

Page 34

Improving association power using data from HapMap

Tests of association: SNP 1, SNP 3, “AG haplotype”

SNPs actually tested: SNP 1, SNP 3, SNP 2, SNP 5, SNP 4, SNP 6

[Figure: haplotypes across six SNPs (1: A/T, 2: G/A, 3: G/C, 4: T/C, 5: G/C, 6: A/C)]

Page 35

Haplotypes increase coverage

[Figure: fraction of SNPs (0% to 100%) versus R2 cutoff (0 to 1), plotted for single-marker predictors, 2-marker predictors and 3-marker predictors]

Page 36

Applying the HapMap

• Study design - tagging
• Study coverage evaluation
• Study analysis - improving association testing
• Study interpretation
– Connection to genes/genomic features
– Comparison of multiple association studies
– Integration with expression and other functional data
• Other uses of HapMap data
– Admixture, LOH, selection

Page 37

Integration with genomic features

• Positive association to a SNP on HapMap enables detailed interpretation:
– How many other SNPs are in LD with this SNP?
– What genes are in LD with this SNP?
– What coding variants and putative functional variants are in LD with this SNP?

Potential to improve power by modifying Bayesian priors of each association test based on this information
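One way to picture the effect of an annotation-informed prior is the standard odds form of Bayes' rule: posterior odds = prior odds × Bayes factor. The sketch below is only an illustration of that arithmetic; the slide does not specify a method, and the prior and Bayes factor values are hypothetical:

```python
def posterior_probability(prior, bayes_factor):
    """Posterior probability of true association from a prior probability
    and a Bayes factor (evidence for association versus no association)."""
    prior_odds = prior / (1.0 - prior)
    posterior_odds = prior_odds * bayes_factor
    return posterior_odds / (1.0 + posterior_odds)


# Hypothetical values: same evidence, two different priors.
evidence = 1e4
baseline = posterior_probability(prior=1e-5, bayes_factor=evidence)   # anonymous SNP
annotated = posterior_probability(prior=1e-4, bayes_factor=evidence)  # SNP in LD with a coding variant

print(f"baseline prior:  {baseline:.3f}")
print(f"annotated prior: {annotated:.3f}")
```

With identical data, the SNP given the higher prior ends up with a markedly higher posterior, which is the sense in which annotation could improve power.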

Page 38

Example: Complement Factor H - AMD

• Original SNP hit in Affy 100K experiment – rs380390

• Extent and structure of LD from HapMap aids in the fine mapping phase of project

Klein et al Science 2005

Page 39

Example: Complement Factor H - AMD

rs380390

Page 40

Example: Complement Factor H - AMD

rs380390

Page 41

Meta-analysis of association studies

• When different marker sets are used to study association (candidate gene or genome-wide), results can be readily integrated when all markers are typed on HapMap samples
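A minimal sketch of how reference haplotypes could relate two reports that used different markers: compute r2 between the two SNPs in the reference sample and check whether the reported risk alleles travel together. The SNP names, alleles and haplotypes below are hypothetical, and the function is an illustration rather than a published method:

```python
def compare_reports(haplotypes, snp_a, allele_a, snp_b, allele_b):
    """Return (r^2 between the two SNPs, True if the two reported alleles
    are positively correlated in the reference haplotypes)."""
    a = [1 if h[snp_a] == allele_a else 0 for h in haplotypes]
    b = [1 if h[snp_b] == allele_b else 0 for h in haplotypes]
    n = len(haplotypes)
    p_a, p_b = sum(a) / n, sum(b) / n
    p_ab = sum(x & y for x, y in zip(a, b)) / n
    d = p_ab - p_a * p_b
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r2 = 0.0 if denom == 0 else d * d / denom
    return r2, d > 0


# Hypothetical reference panel: 8 phased chromosomes typed at both markers.
reference = [{"snpA": "G", "snpB": "T"}] * 3 + [{"snpA": "A", "snpB": "C"}] * 5

# Study 1 reports allele G at snpA; study 2 reports allele T (or C) at snpB.
print(compare_reports(reference, "snpA", "G", "snpB", "T"))
print(compare_reports(reference, "snpA", "G", "snpB", "C"))
```

A high r2 with concordant direction suggests the two studies may be detecting the same signal; a high r2 with opposite direction means the reports point to different haplotypes even though the markers are strongly correlated.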

Page 42

Page 43

Example: DTNBP1 and schizophrenia

• Multiple studies have described modest association to schizophrenia

• Most studies have examined small numbers of non-overlapping sets of SNPs

• HapMap data can be used to determine whether these association finding

Derek Morris, Mousumi Mutsuddi (WCPG meeting)

Page 44

Extensive LD across DTNBP1

Phase II HapMap - 186 SNPs, 180 kb

Page 45

Phylogeny of DTNBP1 tag SNPs

[Figure: haplotype tree rooted at the ancestral haplotype, with branches labeled by tag SNP changes 4 (GA), 5 (CT), 2 (AG), 7 (CT), 10 (AT), 3 (GA)]

Haplotypes shown: AGGCCT, GGATCA, AGGCCA, AGATTA, AAGCCT, AGGCCA

Tag SNP labels: 2, 4, 5, 3, 10, 7

Haplotype frequencies shown: 6%, 33%, 42%, 8%, 11%

Page 46

Associated alleles reported

[Figure: DTNBP1 tag SNP haplotypes (AGGCCT, GGATCA, AGGCCA, AGATTA, AAGCCT, AGGCCA; tag SNP labels 2, 4, 5, 3, 10, 7) aligned with the associated alleles reported by each study]

Rows: Tag SNPs, Straub 2002, Van den Oord 2003

Page 47

Associated alleles reported

[Figure: DTNBP1 tag SNP haplotypes (AGGCCT, GGATCA, AGGCCA, AGATTA, AAGCCT, AGGCCA; tag SNP labels 2, 4, 5, 3, 10, 7) aligned with the associated alleles reported by each study]

Rows: Tag SNPs, Straub 2002, Van den Oord 2003, Schwab 2003

Page 48

Associated alleles reported

[Figure: DTNBP1 tag SNP haplotypes (AGGCCT, GGATCA, AGGCCA, AGATTA, AAGCCT, AGGCCA; tag SNP labels 2, 4, 5, 3, 10, 7) aligned with the associated alleles reported by each study]

Rows: Tag SNPs, Straub 2002, Van den Oord 2003, Van den Bogaert 2003, Funke 2004, Schwab 2003

Page 49

Associated alleles reported

[Figure: DTNBP1 tag SNP haplotypes (AGGCCT, GGATCA, AGGCCA, AGATTA, AAGCCT, AGGCCA; tag SNP labels 2, 4, 5, 3, 10, 7) aligned with the associated alleles reported by each study]

Rows: Tag SNPs, Straub 2002, Van den Oord 2003, Van den Bogaert 2003, Funke 2004, Schwab 2003, Williams 2004, Bray 2005

Page 50

Associated alleles reported

[Figure: DTNBP1 tag SNP haplotypes (AGGCCT, GGATCA, AGGCCA, AGATTA, AAGCCT, AGGCCA; tag SNP labels 2, 4, 5, 3, 10, 7) aligned with the associated alleles reported by each study]

Rows: Tag SNPs, Straub 2002, Van den Oord 2003, Van den Bogaert 2003, Funke 2004, Schwab 2003, Williams 2004, Bray 2005, Kirov 2004

Page 51

Inconsistent findings

• No consistently associated SNP/haplotype pattern across studies

• All studies (European-derived populations) had allele/haplotype frequencies compatible with HapMap-CEU sample

• HapMap can successfully relate associations from diverse marker sets

Page 52

Other Applications – Structural Variation

• 3 papers coming out in the next month describe use of HapMap data to identify large, common deletion polymorphisms

• LD around these polymorphisms permits their assessment with tag SNPs/haplotypes in genome-wide association studies

Page 53

Other Applications – Admixture Scanning

• HapMap data provides a rich source of highly differentiated SNPs for design of admixture panels

• Fine mapping of admixture signals can be focused on the full set of highly differentiated alleles in any region of the genome
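Selecting highly differentiated SNPs can be sketched as ranking markers by the absolute allele frequency difference between two panels. This is one simple measure chosen for illustration (the slide does not name a statistic), and the frequencies, SNP names and threshold below are hypothetical:

```python
def rank_by_delta(freq_pop1, freq_pop2, min_delta=0.5):
    """Return (snp, delta) pairs with |freq difference| >= min_delta,
    sorted from most to least differentiated."""
    deltas = {
        snp: abs(freq_pop1[snp] - freq_pop2[snp])
        for snp in freq_pop1
        if snp in freq_pop2
    }
    keep = [(snp, d) for snp, d in deltas.items() if d >= min_delta]
    return sorted(keep, key=lambda item: item[1], reverse=True)


# Hypothetical allele frequencies of the same allele in two analysis panels.
panel_1 = {"snp1": 0.90, "snp2": 0.50, "snp3": 0.10, "snp4": 0.35}
panel_2 = {"snp1": 0.10, "snp2": 0.45, "snp3": 0.75, "snp4": 0.30}

ranked = rank_by_delta(panel_1, panel_2)
for snp, delta in ranked:
    print(snp, round(delta, 2))
```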

Page 54

Other Applications – LOH

• HapMap identifies
– Regions of extended LD that may manifest themselves as unusually long stretches of homozygosity in individual samples
– The catalog of large deletion variants on the HapMap will differentiate between LOH that is potentially de novo and causal, and that which is simply commonly segregating in the population

LOH analysis cognizant of HapMap patterns under development
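A minimal sketch of the kind of scan described here: find long runs of homozygous genotypes in one sample, then check each run against regions already known to show extended LD or common deletions. The genotype data, positions, region list and `min_snps` threshold are all hypothetical, and missing genotypes are not handled:

```python
def homozygous_runs(genotypes, min_snps=5):
    """genotypes: list of (position, genotype) sorted by position, with
    genotypes written as two-letter strings such as 'AA' or 'AG'.
    Returns (start_pos, end_pos, n_snps) for each run of homozygous calls."""
    runs = []
    start, last, count = None, None, 0
    for pos, gt in genotypes:
        if gt[0] == gt[1]:
            if start is None:
                start, count = pos, 0
            count += 1
            last = pos
        else:
            if start is not None and count >= min_snps:
                runs.append((start, last, count))
            start = None
    if start is not None and count >= min_snps:
        runs.append((start, last, count))
    return runs


def overlaps(run, region):
    """True if a run (start, end, n) overlaps a (start, end) region."""
    return run[0] <= region[1] and region[0] <= run[1]


# Hypothetical sample genotypes and one hypothetical known common region.
sample = [
    (100, "AA"), (200, "AG"), (300, "CC"), (400, "TT"), (500, "GG"),
    (600, "AA"), (700, "CC"), (800, "CT"), (900, "GG"),
]
known_common_regions = [(250, 750)]

runs = homozygous_runs(sample)
for run in runs:
    common = any(overlaps(run, region) for region in known_common_regions)
    print(run, "overlaps known common region" if common else "candidate de novo LOH")
```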

Page 55

Early results encouraging

• At this meeting
– Arking and colleagues describe identification of variant altering QT-interval
– Herbert and colleagues describe a novel gene for obesity
– Wijmenga and colleagues describe a novel gene for celiac disease