Jeremiah M. Scharf, MD, PhD Departments of Neurology, Psychiatry and Center for Human Genetic Research Massachusetts General Hospital Introduction to the Genetics of Complex Disease

Post on 28-Jul-2018



Breakthroughs in Genome Science

• 2001: Human Genome Project (Sequence)
• 2005: HapMap Project (Common Variation)
• 2010: 1000 Genomes Project (Rare Variation)
• 2012: ENCODE Project (Function)

Patterns of Inheritance:

Single Gene Disorders

Recessive

• Example: Sickle Cell Anemia
• Single gene causes disease
• Disease requires two copies of the mutation

Dominant

• Example: Huntington Disease
• Single gene causes disease
• Disease requires one copy of the mutation

Complex Disorders

• Inheritance pattern: multifactorial or “complex”

• Not due to single gene

• Several or many genes may contribute

• Each may have small effect by itself

• Effects may depend on interaction with environment and other genes (epistasis)

Complex Disease Genetics

• Most common medical illnesses are genetically complex

• Aggregate in families but don’t show Mendelian segregation

• Multiple genes contribute to disease in each individual

• Incomplete penetrance and variable expression
  – penetrance = probability of disease given risk genotype

• Gene-gene and gene-environment interaction

Chain of Genetic Research

Adapted from Faraone and Tsuang, 1995.

Questions | Study Methods
Is the disorder familial? | Family study
How much do genes contribute? | Twin and adoption studies
What genes are involved? | Linkage, association, sequencing
How do genes cause disease? | Functional and biological studies

Does it “Run in Families”?

• Compare prevalence (risk) in relatives of affected proband to prevalence in relatives of unaffected controls

• Recurrence risk ratio:

$\lambda_1 = \dfrac{\text{Risk to first-degree relative of affected proband}}{\text{Prevalence in general population}}$
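As a minimal sketch of the recurrence risk ratio defined above, the ratio can be computed directly from the two quantities. The function name and the example numbers are hypothetical and chosen only for illustration; they are not taken from the slides.

```python
def recurrence_risk_ratio(risk_in_relatives: float, population_prevalence: float) -> float:
    """Risk to first-degree relatives of probands divided by population prevalence."""
    if not 0 <= risk_in_relatives <= 1:
        raise ValueError("risk_in_relatives must be a probability in [0, 1]")
    if not 0 < population_prevalence <= 1:
        raise ValueError("population_prevalence must be a probability in (0, 1]")
    return risk_in_relatives / population_prevalence


# Hypothetical illustration: 10% risk in first-degree relatives, 1% prevalence.
lambda_1 = recurrence_risk_ratio(0.10, 0.01)
print(f"lambda_1 = {lambda_1:.1f}")
```

A ratio near 1 would suggest no familial aggregation; larger values indicate that the disorder clusters in families, although that clustering alone does not separate genetic from shared environmental causes.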

[Figure: Familial relative risk (RR) for various neuropsychiatric disorders. Source: Textbook of Neuropsychiatry and Behavioral Neurosciences, 5th Edition. Eds. Yudofsky SC, Hales RE. © 2008 American Psychiatric Publishing, Inc. All rights reserved. www.appi.org]

[Figure: spectrum of inheritance from Mendelian (monogenic, “deterministic”) to complex (genetic + non-genetic (“environmental”), “probabilistic”). Gene labels shown: SNCA, Parkin, etc.; APP, PS1, PS2.]

Twin Studies: Is it “Genetic”?

• Compare concordance in monozygotic (MZ) vs dizygotic (DZ) twins
• MZ > DZ implies genetic contribution
• MZ < 100% implies environmental contribution

Heritability ($h^2$): Proportion of phenotypic variance (in a population) attributable to genetic factors.
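One classical way to turn the MZ/DZ comparison into a heritability estimate is Falconer's approximation, $h^2 \approx 2(r_{MZ} - r_{DZ})$. It is not stated in the slides and is shown here only as a rough sketch. It uses twin correlations (for a quantitative trait or for disease liability) rather than raw concordance rates, and it assumes additive genetic effects and equal environments for MZ and DZ pairs. The example correlations are hypothetical.

```python
def falconer_heritability(r_mz: float, r_dz: float) -> float:
    """Falconer's approximation: h^2 ~= 2 * (r_MZ - r_DZ)."""
    return 2.0 * (r_mz - r_dz)


# Hypothetical twin correlations, for illustration only.
h2 = falconer_heritability(r_mz=0.80, r_dz=0.50)
print(f"h^2 ~= {h2:.2f}")
```

Modern twin analyses usually fit structural equation models instead; the formula is useful mainly for seeing why a larger MZ-DZ gap implies a larger genetic contribution.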

Heritability Caveats

• A heritability of 60% means
  – that at least one gene operates on the trait
  – that 60% of the individual differences in that population can be attributed to differences in the additive effects of certain genes
• A heritability of 60% does not mean
  – that the trait of any one individual is 60% determined by his or her genes, 40% determined by his or her environment
  – that environmental interventions cannot have striking effects
• Ignores heterogeneity in mode of inheritance
• Depends on degree of genetic and environmental variability in the population

Courtesy: Shaun Purcell

Estimated Heritability

Disorder/trait | Approx. $h^2$
Autism | 80%
Schizophrenia | 80%
Bipolar Disorder | 60-80%
Attention Deficit Disorder | ~75%
Tourette Syndrome | 60-80%
Inflammatory Bowel Disease | 65-75%
Multiple Sclerosis | 55%
Alcohol/drug addiction | 55%
Major Depression | 40%
Anxiety Disorders | 30-45%
Breast Cancer | 25%

Where are the genes?

Molecular Genetic Methods

• Linkage analysis: examines the co-inheritance of the phenotype with markers of known chromosomal location
  – Primary application: genome scans (“Where”)
• Association analysis: examines correlation between specific genetic variants and presence of the phenotype
  – Primary application: candidate gene and genomewide studies (“Which”)

Linkage vs. Association

 | Linkage | Association
Question | Where are the genes? | Which alleles confer risk?
Best suited for | Mendelian disease | Complex disease
Genomic scope | Whole genome | [Candidate gene] or whole genome
Subjects | Families | Case/control or nuclear families
Markers | Microsatellites or SNPs | SNPs
Typical marker spacing | < 10 Mb | < 10 kb

McCarthy et al., 2008; Sullivan et al., 2012

Genetic Architecture: landscape of mutations that collectively contribute to disease

• Major gene, large effect. Example: Huntington’s Disease
• Many genes (polygenic), small effects. Example: Height

Genetic methods target different types of mutations (McCarthy et al., 2008; Sullivan et al., 2012)

[Figure: methods matched to variant types: linkage, copy number variants, association (family-based or case-control), and next-gen sequencing. Example disorders shown: early-onset AD (APP, PS1/2), cystic fibrosis, VCFS/DiGeorge, Williams syndrome, idiopathic neurodevelopmental disorders, late-onset AD (APOE), and common disorders (inflammatory bowel disease, multiple sclerosis, type 2 diabetes, schizophrenia).]

SINGLE NUCLEOTIDE POLYMORPHISMS (SNPs): Most common form of human genetic variation

[Figure: a stretch of DNA sequence with a single-base substitution highlighted]

Association Analysis: Co-inheritance of Alleles and Disease Across Families

• Cases vs. controls: Are alleles more common in cases than controls?
• Trios: Are alleles transmitted to affected offspring more than 50% of the time?

[Figure: A/G genotypes shown for cases, controls, and parent-offspring trios]

Association Studies are Like Other Epidemiologic Studies

• General Question: Is Exposure Associated with Disease?
• Is smoking associated with MI?

Smoking | MI+ | MI-
+ | 120 | 50
-- | 54 | 100

OR = (120*100)/(50*54) = 4.44

$\chi^2 = 41.0$, p < .0001
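The odds ratio and chi-square for the 2x2 table above can be reproduced with a few lines of arithmetic. This is a minimal sketch: the function names are illustrative, the chi-square is the Pearson statistic without continuity correction (an assumption that matches the reported 41.0), and the p-value uses the identity that the upper tail of a 1-df chi-square equals `erfc(sqrt(x / 2))`.

```python
import math


def odds_ratio(a: int, b: int, c: int, d: int) -> float:
    """Odds ratio for a 2x2 table [[a, b], [c, d]] (rows: exposure, columns: case/control)."""
    return (a * d) / (b * c)


def chi_square_2x2(a: int, b: int, c: int, d: int) -> float:
    """Pearson chi-square (1 df, no continuity correction) for a 2x2 table."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


def chi_square_p_value_1df(x: float) -> float:
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0))


# Counts from the table: exposed cases, exposed controls, unexposed cases, unexposed controls.
a, b, c, d = 120, 50, 54, 100
or_value = odds_ratio(a, b, c, d)
chi2 = chi_square_2x2(a, b, c, d)
p_value = chi_square_p_value_1df(chi2)
print(f"OR = {or_value:.2f}, chi-square = {chi2:.1f}, p < .0001: {p_value < 1e-4}")
```

The same calculation applies unchanged when the "exposure" is an allele count rather than smoking status.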

Alleles as Exposures

Are alleles more common in cases than controls? I.e., is the G allele associated with MI?

Allele | MI+ | MI-
G | 120 | 50
A | 54 | 100

OR = (120*100)/(50*54) = 4.44

$\chi^2 = 41.0$, p < .0001

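Family-based Association Analysis: Transmission/disequilibrium test (TDT)

[Figure: parent-offspring trios genotyped at a marker with alleles 1 and 2; for each parent, which allele was transmitted to the affected offspring?]

Transmitted (rows) \ Not transmitted (columns) | 1 | 2
1 | a | b
2 | c | d

$\chi^2_{TDT} = \dfrac{(b-c)^2}{b+c}$

Example with b = 120 and c = 50 (the remaining cells, with counts 95 and 195, do not enter the statistic):

$\chi^2_{TDT} = \dfrac{(120-50)^2}{120+50} = 28.8$, p < .0001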
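The TDT statistic is simple enough to check by hand or in code. The sketch below uses the discordant transmission counts from the example (120 and 50); the function names are illustrative, and the p-value again relies on the 1-df chi-square identity `erfc(sqrt(x / 2))`.

```python
import math


def tdt_statistic(b: int, c: int) -> float:
    """McNemar-type TDT chi-square: (b - c)^2 / (b + c), using only discordant transmissions."""
    if b + c == 0:
        raise ValueError("at least one informative (heterozygous-parent) transmission is required")
    return (b - c) ** 2 / (b + c)


def chi_square_p_value_1df(x: float) -> float:
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0))


chi2_tdt = tdt_statistic(b=120, c=50)
p_tdt = chi_square_p_value_1df(chi2_tdt)
print(f"TDT chi-square = {chi2_tdt:.2f}, p < .0001: {p_tdt < 1e-4}")
```

Under the null hypothesis each allele of a heterozygous parent is transmitted half the time, so b and c should be roughly equal and the statistic close to zero.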

Association Study Pitfalls

Problem: False positives
  – Multiple testing (genes x SNPs x phenotypes)
  – Low prior probability for any SNP (even for the “best” candidate gene!)
Solutions: Correct for multiple testing; independent replication!

Problem: False negatives
  – Modest effect sizes of susceptibility alleles
  – Vast majority of studies are underpowered
  – Typical odds ratios for GWAS loci = 1.1-1.3
  – Detection requires samples of tens of thousands
Solution: Increase sample size

Association and Linkage Disequilibrium

Hirschhorn and Daly, 2005

LD and Haplotypes

• Linkage disequilibrium (LD): correlation in the population between alleles at two loci, i.e. non-random association of alleles at linked loci
• Haplotype: A series of alleles at linked loci along a single chromosome
• Haplotype (LD) blocks: genomic regions of LD. The human genome shows a block-like structure with limited haplotype diversity (Gabriel et al. Science, 2002)
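To make "correlation between alleles at two loci" concrete, here is a rough sketch of two standard LD measures for a pair of biallelic loci: $D = p_{AB} - p_A p_B$ and $r^2 = D^2 / (p_A(1-p_A)p_B(1-p_B))$. These formulas are standard population genetics and are not given in the slides; the function name and the haplotype frequencies in the examples are hypothetical.

```python
def ld_statistics(p_ab: float, p_a: float, p_b: float) -> tuple[float, float]:
    """Return (D, r_squared) for two biallelic loci.

    p_ab: frequency of the haplotype carrying allele A at locus 1 and allele B at locus 2
    p_a:  frequency of allele A at locus 1
    p_b:  frequency of allele B at locus 2
    """
    if not (0 < p_a < 1 and 0 < p_b < 1):
        raise ValueError("both loci must be polymorphic (0 < allele frequency < 1)")
    d = p_ab - p_a * p_b
    r_squared = d ** 2 / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d, r_squared


# Hypothetical case 1: allele A always travels with allele B (perfect LD).
d_perfect, r2_perfect = ld_statistics(p_ab=0.5, p_a=0.5, p_b=0.5)

# Hypothetical case 2: haplotype frequency equals the product of allele frequencies (no LD).
d_none, r2_none = ld_statistics(p_ab=0.25, p_a=0.5, p_b=0.5)

print(f"perfect LD: D = {d_perfect}, r^2 = {r2_perfect}")
print(f"no LD:      D = {d_none}, r^2 = {r2_none}")
```

When $r^2$ between two SNPs is close to 1, genotyping one of them captures nearly all the information in the other, which is the idea behind tag SNPs.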

[Figure: example haplotype A-A-T and tag SNPs]

GWAS

[Figure: methods matched to variant types, with association (family-based or case-control) highlighted as method 3 (McCarthy et al., 2008; Sullivan et al., 2012)]

The GWAS Era

• Before 2006: only a handful of genes had been found for any common medical disorders like diabetes, heart disease, inflammatory bowel disease, arthritis
• Since 2006: thousands of confirmed genetic findings for major medical diseases
• What Happened?
  – Powerful DNA chip technology
  – Computational advances
  – Whole genome analysis
  – Much larger studies

Genomewide Association Studies (GWAS)

• Micro-array based genotyping technique
• Assays common DNA variants (“SNPs”) that “tag” blocks of DNA across the human genome
  – mean DNA block size: ~10-20 kb (10,000-20,000 DNA bases)
  – much finer resolution than linkage studies
  – each chip assays > 1 million SNP markers in a single experiment

www.nature.com/.../v5/n5/full/nmeth0508-447.html; http://www.illumina.com; http://www.sanger.ac.uk

Genomewide Association Study (GWAS)

• DNA Microarray (DNA-Chip) with 500K - 5M SNPs covering the genome
• Allele frequencies usually >5%
• Examine for each SNP:
  – allele frequency differences between cases and controls
  – correlation between allele count and quantitative trait
• Threshold for significance: $p < 5 \times 10^{-8}$
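The genome-wide threshold is commonly explained as a Bonferroni correction of a 0.05 significance level for roughly one million independent common-variant tests. That rationale is not stated in the slides, so the sketch below should be read as an illustration under that assumption; the function names and the test p-values are hypothetical.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold that keeps the family-wise error rate at alpha."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


# Assumption: about one million independent tests across the genome.
GENOMEWIDE_THRESHOLD = bonferroni_threshold(alpha=0.05, n_tests=1_000_000)


def is_genomewide_significant(p_value: float, threshold: float = GENOMEWIDE_THRESHOLD) -> bool:
    """True if a single-SNP p-value passes the genome-wide threshold."""
    return p_value < threshold


print(f"threshold = {GENOMEWIDE_THRESHOLD:.0e}")
print(is_genomewide_significant(1e-9), is_genomewide_significant(1e-6))
```

A SNP with p = 1e-6 would look striking in a single-hypothesis study but falls short here, which is one reason GWAS need very large samples.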

Published Genome-Wide Associations through 05/2013

[Figure: published GWA at $p \le 5 \times 10^{-8}$]

Size Matters

[Figure: number of GWAS loci vs. number of cases. Panel annotations: N = 183,727, Loci: 180, Variance: 10%; N = 249,796, Loci: 32, Variance: 2.5%]

• Crohn’s: ~10 genes / 1,000 cases
• Schizophrenia: ~4 / 1,000
• Adult Height: ~3 / 1,000
• (Bipolar Disorder: ~1 gene / 1,000 cases)

Key Plots Summarizing GWAS

Q-Q plot

Manhattan plot

Regional plot
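As a rough sketch of what a Q-Q plot displays, the points can be computed by sorting the observed p-values and pairing each with the quantile expected under the null hypothesis of uniformly distributed p-values, both on the $-\log_{10}$ scale. The plotting position `(rank - 0.5) / n` is one common convention and is an assumption here; the function name and p-values are hypothetical.

```python
import math


def qq_points(p_values: list[float]) -> list[tuple[float, float]]:
    """Return (expected, observed) -log10(p) pairs, most significant first."""
    observed = sorted(p_values)
    n = len(observed)
    points = []
    for rank, p in enumerate(observed, start=1):
        expected_p = (rank - 0.5) / n  # expected quantile under a uniform null
        points.append((-math.log10(expected_p), -math.log10(p)))
    return points


# Hypothetical p-values for illustration only.
points = qq_points([0.5, 0.05, 0.005, 1e-9])
for expected, observed in points:
    print(f"expected = {expected:.2f}, observed = {observed:.2f}")
```

Points that follow the diagonal are consistent with chance; a lift confined to the extreme tail suggests true signals, while inflation across the whole range points to confounding such as population stratification.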

So, you found an association. Is it due to…?

• True association with causal variant?
• Spurious association due to confounding? (population stratification)
• Linkage disequilibrium with nearby causal variant?
• Chance
  – indexed by p value, but beware multiple testing!

Population Genetics

• Study of allele frequency distribution and change

Hardy-Weinberg Equilibrium

Assumptions:
• large population
• no mutation
• no selection
• random mating
• no migration

Allele frequencies: [A] = $p$, [a] = $q$, $p + q = 1$

Genotype frequencies: [AA] = $p^2$, [Aa] = $2pq$, [aa] = $q^2$

Frequencies remain stable
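The Hardy-Weinberg genotype frequencies follow directly from the allele frequency. A minimal sketch, with an illustrative function name and a hypothetical allele frequency of 0.7:

```python
def hardy_weinberg_genotype_frequencies(p: float) -> dict[str, float]:
    """Expected genotype frequencies for allele A at frequency p (q = 1 - p)."""
    if not 0 <= p <= 1:
        raise ValueError("p must be an allele frequency in [0, 1]")
    q = 1.0 - p
    return {"AA": p * p, "Aa": 2 * p * q, "aa": q * q}


# Hypothetical allele frequency for illustration.
freqs = hardy_weinberg_genotype_frequencies(0.7)
for genotype, freq in freqs.items():
    print(f"[{genotype}] = {freq:.2f}")
```

In practice, a strong departure of observed genotype counts from these expectations is often used as a genotyping quality check.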

With genome-wide SNP data, population structure can be detectable to very fine scales...

Novembre et al (2008)

Population Stratification

• Differences in allele frequencies between cases and controls due to systematic differences in ancestry rather than association of genes with disease.

Population Allele Differences Can Confound Association Studies

• Does A/G SNP in CNR1 gene cause MI?
• Cases recruited from MGH patients:
  – 55% European-American
  – 20% African-American
• Controls recruited from volunteers:
  – 85% European American
  – 5% African American

Allele frequencies by population:

Population | G | A
European American | .7 | .3
African American | .4 | .6

[Figure: A/G genotypes in cases (MI+) vs. controls (MI-), p < .0001]
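The confounding in this example can be illustrated by computing the G-allele frequency each group would be expected to show from ancestry alone, assuming the SNP has no effect on MI. This is only a sketch: the listed ancestry percentages do not sum to 100%, so the code renormalizes over the two listed groups, which is an assumption made for illustration, and the function and variable names are hypothetical.

```python
# G-allele frequencies by ancestry, from the table in the slide.
G_FREQUENCY = {"European American": 0.7, "African American": 0.4}


def expected_allele_frequency(ancestry_mix: dict[str, float],
                              allele_freq_by_group: dict[str, float]) -> float:
    """Ancestry-weighted allele frequency, renormalizing the mix to sum to 1."""
    total = sum(ancestry_mix.values())
    return sum(share / total * allele_freq_by_group[group]
               for group, share in ancestry_mix.items())


# Ancestry percentages as listed (renormalized inside the function; an assumption).
cases_mix = {"European American": 55, "African American": 20}
controls_mix = {"European American": 85, "African American": 5}

g_cases = expected_allele_frequency(cases_mix, G_FREQUENCY)
g_controls = expected_allele_frequency(controls_mix, G_FREQUENCY)
print(f"expected G frequency: cases = {g_cases:.3f}, controls = {g_controls:.3f}")
```

Under these assumptions the allele frequencies differ between cases and controls even though the SNP does nothing, so with a large enough sample a naive test would report a significant but spurious association.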

Ioannidis et al. 2009, Nature Rev Genet

Association ≠ Causality

[Figure: methods matched to variant types, with copy number variants highlighted as method 2 (McCarthy et al., 2008; Sullivan et al., 2012)]

Copy Number Variation

• Structural variations of > 1 kb
• Low copy repeats are a common mechanism:
  – Highly homologous sequence elements arising from segmental duplication
• Example: CNVs as a cause of psychiatric illness
  – VCFS/DiGeorge syndrome - microdeletion on 22q: 20-30% incidence of psychotic illness
  – Autism - de novo CNVs in >10% of sporadic cases?

Large, rare CNVs are found across neurodevelopmental disorders

From Morrow, JAACAP 2010

Next-Generation Sequencing

2001: $3 billion 2017-2018: <$1,000

Bras et al. Nature Rev Neurosci, 2012

Related Methods: The “-omics”

• Functional Genomics - a field of molecular biology that attempts to make use of the vast wealth of data produced by genomic projects to describe gene and protein functions and interactions. Focuses on dynamic aspects such as gene transcription, translation, and protein-protein interactions, as opposed to the static aspects of the genomic information such as DNA sequence or structures.
  – Transcriptomics (expression profiling) - examines the expression level of mRNAs in a given cell population, often using high-throughput techniques based on microarray technology.
  – Proteomics - examines the full complement of proteins and their structure, quantity, and function
  – Metabolomics - examines the whole set of small-molecule metabolites (such as metabolic intermediates, hormones and other signalling molecules, and secondary metabolites) to be found within a biological sample or organism
  – Interactomics - examines the whole set of molecular interactions in cells

ENCODE Project

RoadMap Epigenomics Consortium, Nature 2015

Nature 518, 317–330 (19 February 2015) doi:10.1038/nature14248

Genomics In Silico

NIH BISTIC definition

• Bioinformatics: Research, development, or application of computational tools and approaches for expanding the use of biological, medical, behavioral or health data, including those to acquire, store, organize, archive, analyze, or visualize such data.

• Computational Biology: The development and application of data-analytical and theoretical methods, mathematical modeling and computational simulation techniques to the study of biological, behavioral, and social systems.

Systems Biology: The Big Picture

Oltvai, Science, 2002

Translating Genetic Findings to Novel Therapies: How Do We Get There From Here?

[Figure: pipeline diagram. Stages shown: Confirmed Genetic Variants; Biological Characterization; Develop Functional Assays; Small Molecule Screening; Preclinical and Safety Studies; Proof-of-Concept Trials; Larger Clinical Trials. Model systems shown: skin cells, induced stem cells, neurons, glia, animal models.]

Summary

• Most common diseases are complex
  – Aggregate in families with non-Mendelian patterns of inheritance
  – Multiple genes of varying effect
  – +/- Gene-gene interaction (epistasis), gene-environment interaction
• Association analysis is most common method for identifying susceptibility alleles
  – Interpret with care:
    • Beware false positives
    • Replication is essential
• Exome and whole genome sequencing now feasible and successful in identifying rare variants related to Mendelian and complex disorders
  – Ultimately, whole genome sequencing may become the preferred approach

Brief Break?