Give me your DNA and I tell you where you come from - and maybe more! Lausanne, Genopode 21 April 2010 Sven Bergmann University of Lausanne & Swiss Institute of Bioinformatics http://serverdgm.unil.ch/bergmann


Page 1: Give me your DNA  and I tell you where you come from - and maybe more!

Give me your DNA and I tell you where you come from - and maybe more!

Lausanne, Genopode 21 April 2010

Sven Bergmann

University of Lausanne &

Swiss Institute of Bioinformatics

http://serverdgm.unil.ch/bergmann

Page 2: Give me your DNA  and I tell you where you come from - and maybe more!

Overview

• Population stratification

• Associations: Basics

• Whole genome associations

• Genotype imputation

• Future directions

Page 3: Give me your DNA  and I tell you where you come from - and maybe more!

Overview

• Population stratification

• Associations: Basics

• Whole genome associations

• Genotype imputation

• Future directions

Page 4: Give me your DNA  and I tell you where you come from - and maybe more!

CoLaus = Cohort Lausanne

6’189 individuals

Phenotypes: 159 measurements, 144 questions

Genotypes: 500,000 SNPs

Collaboration with: Vincent Mooser (GSK), Peter Vollenweider & Gerard Waeber (CHUV)

Page 5: Give me your DNA  and I tell you where you come from - and maybe more!

ATTGCAATCCGTGG...ATCGAGCCA…TACGATTGCACGCCG…

ATTGCAAGCCGTGG...ATCTAGCCA…TACGATTGCAAGCCG…

ATTGCAAGCCGTGG...ATCTAGCCA…TACGATTGCAAGCCG…

ATTGCAATCCGTGG...ATCGAGCCA…TACGATTGCACGCCG…

ATTGCAAGCCGTGG...ATCTAGCCA…TACGATTGCAAGCCG…

Genetic variation in SNPs (Single Nucleotide Polymorphisms)

Page 6: Give me your DNA  and I tell you where you come from - and maybe more!

Analysis of Genotypes only

Principal Component Analysis reveals the SNP-vectors that explain the largest variation in the data
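As a rough illustration of the idea on this slide, the sketch below finds the first principal component of a small genotype matrix by power iteration, using only the Python standard library. The function names, the 0/1/2 allele-count coding, the simulated allele frequencies and the toy populations are all assumptions made for the example; they are not taken from the talk, and a real analysis would use a linear algebra library and far more SNPs.

```python
import random

def first_principal_component(genotypes, n_iter=100):
    """Return (loadings, scores) of the first principal component.

    genotypes: list of rows, one per individual; each row holds the
    allele count (0, 1 or 2) at every SNP. Hypothetical helper, and it
    assumes the genotypes are not all identical.
    """
    n, m = len(genotypes), len(genotypes[0])
    # Center every SNP (column) on its mean allele count.
    means = [sum(row[j] for row in genotypes) / n for j in range(m)]
    x = [[row[j] - means[j] for j in range(m)] for row in genotypes]

    # Power iteration on X^T X, evaluated as X^T (X v).
    v = [1.0 / (j + 1) for j in range(m)]  # arbitrary non-zero start vector
    for _ in range(n_iter):
        xv = [sum(r[j] * v[j] for j in range(m)) for r in x]
        w = [sum(x[i][j] * xv[i] for i in range(n)) for j in range(m)]
        norm = sum(c * c for c in w) ** 0.5
        v = [c / norm for c in w]

    # Scores: projection of each individual onto the SNP-vector v.
    scores = [sum(r[j] * v[j] for j in range(m)) for r in x]
    return v, scores

def simulate(freqs, n_individuals):
    # Allele count = two independent draws with the given allele frequency.
    return [[(random.random() < f) + (random.random() < f) for f in freqs]
            for _ in range(n_individuals)]

random.seed(0)
freqs_a = [0.2] * 20 + [0.8] * 20   # toy population A (assumed frequencies)
freqs_b = [0.8] * 20 + [0.2] * 20   # toy population B (assumed frequencies)
genotypes = simulate(freqs_a, 30) + simulate(freqs_b, 30)
loadings, scores = first_principal_component(genotypes)
```

In this toy setting the two simulated populations end up on opposite sides of the first component, which is the kind of clustering the following slides show for real cohorts.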

Page 7: Give me your DNA  and I tell you where you come from - and maybe more!

Ethnic groups cluster according to geographic distances

[Figure: two panels, each plotting PC2 against PC1]

Page 8: Give me your DNA  and I tell you where you come from - and maybe more!

PCA of POPRES cohort

Page 9: Give me your DNA  and I tell you where you come from - and maybe more!

Predicting location according to SNP-profile ...

Page 10: Give me your DNA  and I tell you where you come from - and maybe more!

… is pretty accurate!

Page 11: Give me your DNA  and I tell you where you come from - and maybe more!

The Swiss segregate according to language

Page 12: Give me your DNA  and I tell you where you come from - and maybe more!

Overview

• Population stratification

• Associations: Basics

• Whole genome associations

• Genotype imputation

• Future directions

Page 13: Give me your DNA  and I tell you where you come from - and maybe more!

Phenotypic variation:

Page 14: Give me your DNA  and I tell you where you come from - and maybe more!

What is association?

[Figure: chromosome with SNPs and a trait variant]

Genetic variation yields phenotypic variation

[Figure: distributions of “trait” for the population with ‘ ’ allele and the population with ‘ ’ allele; horizontal axis from -6 to 6, vertical axis from 0 to 1.2]

Page 15: Give me your DNA  and I tell you where you come from - and maybe more!

Association using regression

[Figure: phenotype plotted against genotype and against coded genotype]

Page 16: Give me your DNA  and I tell you where you come from - and maybe more!

Regression formalism

(monotonic) transformation

phenotype (response variable) of individual i

effect size (regression coefficient)

coded genotype (feature) of individual i

error (residual)

p(β=0)

Goal: Find the effect size that best explains all (potentially transformed) phenotypes as a linear function of the genotypes, and estimate the probability (p-value) of the data being consistent with the null hypothesis (i.e. no effect)
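A minimal sketch of this goal for a single SNP is given below. It assumes the transformation is the identity, adds an intercept term (called alpha here) that the slide does not mention, and approximates the p-value for β=0 with a normal distribution, which is only reasonable for large samples. The function name and the simulated data are hypothetical.

```python
import math
import random

def linear_association(genotypes, phenotypes):
    """Least-squares fit of phenotype = alpha + beta * genotype + error.

    Returns (beta, standard_error, p_value). The two-sided p-value for
    the null hypothesis beta = 0 uses a normal approximation to the
    t-statistic (an assumption of this sketch).
    """
    n = len(genotypes)
    mean_x = sum(genotypes) / n
    mean_y = sum(phenotypes) / n
    sxx = sum((x - mean_x) ** 2 for x in genotypes)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(genotypes, phenotypes))
    beta = sxy / sxx                   # effect size (regression coefficient)
    alpha = mean_y - beta * mean_x     # intercept
    # Residual sum of squares gives the error variance estimate.
    rss = sum((y - alpha - beta * x) ** 2
              for x, y in zip(genotypes, phenotypes))
    se = math.sqrt(rss / (n - 2) / sxx)
    z = beta / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail
    return beta, se, p_value

# Simulated example: coded genotypes 0/1/2 with a true effect size of 0.5.
random.seed(1)
genotypes = [random.choice([0, 1, 2]) for _ in range(500)]
phenotypes = [0.5 * g + random.gauss(0, 1) for g in genotypes]
beta, se, p_value = linear_association(genotypes, phenotypes)
```

With 500 simulated individuals the estimated effect size should land close to the simulated value of 0.5, with a very small p-value.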

Page 17: Give me your DNA  and I tell you where you come from - and maybe more!

Overview

• Population stratification

• Associations: Basics

• Whole genome associations

• Genotype imputation

• Future directions

Page 18: Give me your DNA  and I tell you where you come from - and maybe more!

Whole Genome Association

Page 19: Give me your DNA  and I tell you where you come from - and maybe more!

Whole Genome Association

Current microarrays probe ~1M SNPs!

Standard approach: Evaluate significance for association of each SNP independently:

[Figure: significance of association for each SNP]

Page 20: Give me your DNA  and I tell you where you come from - and maybe more!

Whole Genome Association

[Figure: Manhattan plot; significance against chromosome & position]

[Figure: Quantile-quantile plot; observed significance against expected significance]

GWA screens include a large number of statistical tests!

• Huge burden of correcting for multiple testing!

• Can detect only highly significant associations (p < α / #(tests) ~ 10^-7)
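The arithmetic behind the ~10^-7 threshold can be sketched as a Bonferroni correction. The choice α = 0.05 is an assumption (the slide does not state α); the 500,000 tests correspond to the number of SNPs quoted for CoLaus. The function names are hypothetical.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold alpha / #(tests)."""
    return alpha / n_tests

def significant_hits(p_values, alpha=0.05):
    """Indices of tests whose p-value passes the corrected threshold."""
    threshold = bonferroni_threshold(alpha, len(p_values))
    return [i for i, p in enumerate(p_values) if p < threshold]

# 0.05 / 500,000 tests gives a threshold of about 1e-7.
threshold_500k = bonferroni_threshold(0.05, 500_000)

# With ~1M SNPs the same alpha gives about 5e-8.
threshold_1m = bonferroni_threshold(0.05, 1_000_000)

# Toy example with four tests: the corrected threshold is 0.0125.
hits = significant_hits([1e-9, 0.5, 0.01, 0.04])
```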

Page 21: Give me your DNA  and I tell you where you come from - and maybe more!
Page 22: Give me your DNA  and I tell you where you come from - and maybe more!
Page 23: Give me your DNA  and I tell you where you come from - and maybe more!
Page 24: Give me your DNA  and I tell you where you come from - and maybe more!
Page 25: Give me your DNA  and I tell you where you come from - and maybe more!

Current insights from GWAS:

• Well-powered (meta-)studies with (ten-)thousands of samples have identified a few (dozen) candidate loci with highly significant associations

• Many of these associations have been replicated in independent studies

Page 26: Give me your DNA  and I tell you where you come from - and maybe more!

Current insights from GWAS:

• Each locus explains but a tiny (<1%) fraction of the phenotypic variance

• All significant loci together explain only a small fraction (<10%) of the variance

David Goldstein:

“~93,000 SNPs would be required to explain 80% of the population variation in height.”

Common Genetic Variation and Human Traits, NEJM 360;17

Page 27: Give me your DNA  and I tell you where you come from - and maybe more!

So what do we miss?

1. Other variants like Copy Number Variations or epigenetics may play an important role

2. Interactions between genetic variants (GxG) or with the environment (GxE)

3. Many causal variants may be rare and/or poorly tagged by the measured SNPs

4. Many causal variants may have very small effect sizes

5. Overestimation of heritabilities from twin-studies?

Page 28: Give me your DNA  and I tell you where you come from - and maybe more!

Overview

• Population stratification

• Associations: Basics

• Whole genome associations

• Genotype imputation

• Future directions

Page 29: Give me your DNA  and I tell you where you come from - and maybe more!

Genotypes are called with varying uncertainty

[Figure: intensity of allele A plotted against intensity of allele G]

Page 30: Give me your DNA  and I tell you where you come from - and maybe more!

Some genotypes are missing altogether …

Page 31: Give me your DNA  and I tell you where you come from - and maybe more!

… but are imputed with different uncertainties

Page 32: Give me your DNA  and I tell you where you come from - and maybe more!

… using Linkage Disequilibrium!

Markers close together on chromosomes are often transmitted together, yielding a non-zero correlation between the alleles.

[Figure: markers 1, 2, 3, ..., n along a chromosome; labels: LD, D]
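To make the "non-zero correlation between the alleles" concrete, the sketch below computes two common LD measures for a pair of markers from phased haplotypes: the coefficient D and the squared correlation r². The 0/1 allele coding, the function name and the toy haplotypes are assumptions for illustration; r² is a standard measure but is not named on the slide.

```python
def linkage_disequilibrium(haplotypes):
    """Return (D, r2) for two biallelic markers.

    haplotypes: list of (a, b) pairs, one per chromosome, with the
    allele at each marker coded 0 or 1. Both markers must be
    polymorphic, otherwise r2 is undefined (division by zero).
    """
    n = len(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n      # frequency of allele 1 at marker 1
    p_b = sum(b for _, b in haplotypes) / n      # frequency of allele 1 at marker 2
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b                         # deviation from independence
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d, r2

# Alleles always transmitted together: complete LD, r2 close to 1.
d_linked, r2_linked = linkage_disequilibrium([(0, 0)] * 6 + [(1, 1)] * 4)

# All four combinations equally frequent: no LD, D and r2 equal to 0.
d_free, r2_free = linkage_disequilibrium([(0, 0), (0, 1), (1, 0), (1, 1)])
```

A high r² between a measured and an unmeasured marker is what allows the unmeasured genotype to be imputed with low uncertainty.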

Page 33: Give me your DNA  and I tell you where you come from - and maybe more!

Two easy ways of dealing with uncertain genotypes

1. Genotype Calling: Choose the most likely genotype and continue as if it were true (p11=10%, p12=20%, p22=70% => G=2)

2. Mean genotype: Use the weighted average genotype (p11=10%, p12=20%, p22=70% => G=1.6)
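Both options can be written in a few lines. The sketch assumes the genotypes 11, 12 and 22 are coded as 0, 1 and 2, which is the coding that reproduces G=2 and G=1.6 for the probabilities on the slide; the function names are hypothetical.

```python
def call_genotype(probs):
    """Genotype calling: index (0, 1 or 2) of the most likely genotype.

    probs: (p11, p12, p22), the probabilities of the three genotypes.
    """
    return max(range(3), key=lambda g: probs[g])

def mean_genotype(probs):
    """Mean genotype: probability-weighted average of the codes 0, 1, 2."""
    return sum(g * p for g, p in enumerate(probs))

probs = (0.10, 0.20, 0.70)      # p11, p12, p22 from the slide
called = call_genotype(probs)   # most likely genotype is 22, coded 2
dosage = mean_genotype(probs)   # 0*0.1 + 1*0.2 + 2*0.7, about 1.6
```

The called genotype discards the remaining 30% uncertainty, whereas the mean genotype carries it into the regression as a fractional value.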

Page 34: Give me your DNA  and I tell you where you come from - and maybe more!

Overview

• Associations: Basics

• Whole genome associations

• Population stratification

• Genotype imputation

• Uncertain genotypes

• Future directions

Page 35: Give me your DNA  and I tell you where you come from - and maybe more!

The challenge of many datasets: How to integrate all the information?

Organisms

Conditions: Developmental, Physiological, Environmental, Experimental, Clinical

Data types:

– Protein expression

– Tissue specific expression

– Interaction data

– Genotypic data

– Epigenetic data … ?

Biological Insight

Page 36: Give me your DNA  and I tell you where you come from - and maybe more!

Network Approaches for Integrative Association Analysis

Using knowledge on physical gene-interactions or pathways to prioritize the search for functional interactions

Page 37: Give me your DNA  and I tell you where you come from - and maybe more!

Modular Approach for Integrative Analysis of Genotypes and Phenotypes

[Figure: Genotypes (SNPs/Haplotypes) and Phenotypes (Measurements) for the same Individuals, connected by modular links]

Page 38: Give me your DNA  and I tell you where you come from - and maybe more!

Take-home Messages:

• Analysis of genome-wide SNP data reveals that population structure mirrors geography

• Genome-wide association studies reveal candidate loci for a multitude of traits, but have little predictive power so far

• Future improvement will require

– better genotyping (CGH, UHS, …)

– new analysis approaches (interactions, networks, data integration)