john p. hussman institute for human genomics university of

Stephan Züchner, MD John P. Hussman Institute for Human Genomics University of Miami Miller School of Medicine

Post on 18-Feb-2022

Page 1: John P. Hussman Institute for Human Genomics University of

Stephan Züchner, MD

John P. Hussman Institute for Human Genomics

University of Miami Miller School of Medicine

Page 2:

• Party to a patent licensing agreement with Athena Diagnostics.

• Receives honoraria from Illumina.

Page 3: Completion of Human Genome Project: 2001 - 2011

• Completion using Sanger sequencing

• Initiation of new sequencing technologies:

  - Shotgun approach

  - Sequencing by synthesis

• Today, the sequencing industry is very competitive and extremely innovative.

Page 4: Recent numbers …

• HiSeq2000 produces >300 billion bases per run (9 days, ~$20K); that is a 100,000-fold improvement in 10 years.

• >600 Gb by mid 2011

• The rate of technical improvement in the sequencing arena by far outpaces Moore's Law (2-fold in 1.5 years).
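As a rough check of the comparison with Moore's Law, the sketch below converts the figures quoted on this slide into fold-changes and doubling times. It assumes steady exponential growth over the whole period, which is a simplification, and the function names are illustrative only.

```python
import math

def fold_change(years: float, doubling_time_years: float) -> float:
    """Fold improvement after `years` of steady doubling."""
    return 2 ** (years / doubling_time_years)

def doubling_time(years: float, fold: float) -> float:
    """Doubling time (in years) implied by a `fold` improvement over `years`."""
    return years / math.log2(fold)

# Figures from the slide: Moore's Law as 2-fold per 1.5 years,
# sequencing output as a 100,000-fold improvement in 10 years.
moore_fold_10y = fold_change(10, 1.5)
seq_doubling_years = doubling_time(10, 100_000)

# Run cost per megabase from the quoted ~$20K and 300 billion bases.
# The slide says ">300 billion", so this is an upper bound on the cost.
cost_per_megabase = 20_000 / (300e9 / 1e6)

print(f"Moore's Law over 10 years: ~{moore_fold_10y:.0f}-fold")
print(f"Sequencing doubling time: ~{seq_doubling_years * 12:.1f} months")
print(f"Approximate run cost: ${cost_per_megabase:.3f} per megabase")
```

Under these assumptions, Moore's Law gives roughly a 100-fold gain in 10 years, whereas a 100,000-fold gain implies a doubling time of about 7 months.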

Page 5: Challenges

• Most challenges relate to the analysis of data.

• Study designs.

• Interdisciplinary teams are key (molecular, bioinformatics, clinical, statistical expertise).

• Ever-evolving tool set; much time is occupied by staying up-to-date.

Page 6:

03/2010

and Richard A. Gibbs, Ph.D.

Commentary in Nat Rev Neurology, S. Züchner 2010

Page 7: Individual genome vs exome sequencing

o Whole human genome sequencing is not yet suitable for routine use:

  o Cost for sequencing (still ~$10K per genome)

  o Cost for data processing and storage

  o Cost and time for bioinformatic analysis and follow-up studies

o For many disease-oriented applications in human genetics, partial sequencing of the human genome is sufficient (linkage peaks, association areas, etc.).

Hence, EXOME sequencing is becoming a major (temporary) application.

Page 8: What (exactly) is the “Exome”?

Source                                              Coding exons    Mb coding exonic sequence
CCDS                                                196,266         ~32
Exome enrichment kits (Roche, Agilent, Illumina)    ~200,000        ~38 - 62

o The exome is the set of all coding exons in the human genome.

o The true size is unknown and will continue to change over the coming years.

o Exome kits capture ~96 - 98% of CCDS (Consensus Coding Sequence).

Page 9: Gene discovery in Mendelian diseases

6,000 monogenic disorders described

<2,000 disease genes identified

For many disorders, Mendelian genes have provided unique guidance to the underlying pathways.

Immediate modeling in vitro and in vivo possible.

Page 10: Rare variant discovery in common disease

GWAS have successfully determined the contribution of common variation to disease.

A large gap of “missing heritability” exists for many phenotypes.

Rare variants may play a significant role in common so-called complex disease.

Page 11: General issues with exon capture and NGS

o % of reads aligning to the human genome reference sequence.

o % of reads on target.

o % of targets covered by a minimum number of reads.

o Allelic bias.

(from Hedges et al., 2009; Nimblegen arrays/ 454 seq)
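The four metrics listed on this slide can be sketched in a few lines. This is a minimal illustration on invented toy data, not the method of Hedges et al.: the `Read` record, the function names, and the choice to express on-target reads as a share of aligned reads are all assumptions. In practice these values would be derived from an alignment file and the capture kit's target intervals.

```python
from dataclasses import dataclass

@dataclass
class Read:
    aligned: bool      # read aligned to the reference genome
    on_target: bool    # aligned read overlaps a capture target

def capture_metrics(reads, target_read_counts, min_reads):
    """Return (% aligned, % on target among aligned reads, % targets with >= min_reads)."""
    n_aligned = sum(r.aligned for r in reads)
    n_on_target = sum(r.aligned and r.on_target for r in reads)
    n_covered = sum(c >= min_reads for c in target_read_counts)
    pct_aligned = 100.0 * n_aligned / len(reads)
    pct_on_target = 100.0 * n_on_target / n_aligned if n_aligned else 0.0
    pct_targets = 100.0 * n_covered / len(target_read_counts)
    return pct_aligned, pct_on_target, pct_targets

def allelic_balance(ref_reads, alt_reads):
    """Fraction of reads carrying the alternate allele at one site.
    Values far from 0.5 at a heterozygous site suggest allelic bias."""
    return alt_reads / (ref_reads + alt_reads)

# Toy input, invented for illustration only.
reads = [Read(True, True)] * 6 + [Read(True, False)] * 2 + [Read(False, False)] * 2
target_read_counts = [25, 12, 0, 8]   # reads per target region

pct_aligned, pct_on_target, pct_targets = capture_metrics(
    reads, target_read_counts, min_reads=10
)
print(pct_aligned, pct_on_target, pct_targets)
print(allelic_balance(ref_reads=12, alt_reads=8))
```

Some pipelines report on-target reads as a share of all reads rather than of aligned reads; the denominator should be stated whenever the figure is quoted.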

Page 12: Uniformity/ evenness of coverage depth

[Figure: bp-wise sequence depth of CMT genes; y-axis: read depth in reads per base position, with ticks at 10, 100, 1000]

• Uniform depth of sequence coverage requires a sequence amount of 100-200 times the target size.
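To make the 100-200 times rule of thumb concrete, the sketch below applies it to the 38 - 62 Mb target sizes quoted for exome enrichment kits elsewhere in this talk. The function name and the choice of units are assumptions made for illustration.

```python
def required_sequence_gb(target_mb, fold_low=100, fold_high=200):
    """Sequence amount (in Gb) needed at 100-200 times the target size (in Mb)."""
    return target_mb * fold_low / 1000, target_mb * fold_high / 1000

# Smallest and largest exome kit target sizes quoted in the talk.
small_low, small_high = required_sequence_gb(38)
large_low, large_high = required_sequence_gb(62)

print(f"38 Mb target: {small_low:.1f} - {small_high:.1f} Gb")
print(f"62 Mb target: {large_low:.1f} - {large_high:.1f} Gb")
```

Under this rule of thumb, a 38 Mb target needs about 3.8 to 7.6 Gb of sequence and a 62 Mb target about 6.2 to 12.4 Gb.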

Page 13: Newer designs of capture kits improve evenness and coverage

[Figure: plots of coverage depth across exons of 40 CMT genes; panels: Nimblegen V. 1, Nimblegen V. 2]

Page 14: Coefficient of variation - EZ exome V1 vs. V2

[Figure: coefficient of variation by capture design; categories: V1 Roche, V2 in house, V2 Roche; * (p<0.05)]

Based on 40 neuropathy-related genes.
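The coefficient of variation used here as an evenness measure is the standard deviation of coverage depth divided by its mean. The sketch below shows the calculation on invented per-base depths. Whether the slide used the population or the sample standard deviation is not stated, so the population form is an assumption.

```python
import statistics

def coefficient_of_variation(depths):
    """Population standard deviation of per-base depth divided by mean depth."""
    mean = statistics.fmean(depths)
    if mean == 0:
        raise ValueError("mean depth is zero; CV is undefined")
    return statistics.pstdev(depths) / mean

# Toy per-base depths with the same mean (100), invented for illustration.
even = [90, 100, 110, 100]
uneven = [10, 300, 40, 50]

cv_even = coefficient_of_variation(even)
cv_uneven = coefficient_of_variation(uneven)
print(f"even: {cv_even:.3f}, uneven: {cv_uneven:.3f}")
```

A lower coefficient of variation means more even coverage at the same mean depth, which is the sense in which a newer capture design can improve on an older one.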

Page 15: Proportion of uncovered bases - EZ exome V1 vs. V2

[Figure: y-axis: avg. proportion of uncovered bases per gene; categories: V1 Roche, V2 in house, V2 Roche]

Based on 40 neuropathy-related genes.
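The average proportion of uncovered bases per gene can be sketched as follows. The slide does not define "uncovered", so treating it as a depth of zero (`min_depth=1`) is an assumption, and the gene names and depths are placeholders.

```python
def proportion_uncovered(depths, min_depth=1):
    """Fraction of bases whose depth is below `min_depth`."""
    return sum(d < min_depth for d in depths) / len(depths)

def average_uncovered_per_gene(depths_by_gene, min_depth=1):
    """Mean of the per-gene uncovered proportions (each gene weighted equally)."""
    props = [proportion_uncovered(d, min_depth) for d in depths_by_gene.values()]
    return sum(props) / len(props)

# Placeholder genes and per-base depths, invented for illustration.
depths_by_gene = {
    "GENE_A": [0, 0, 12, 30],
    "GENE_B": [15, 22, 9, 40],
}

avg_uncovered = average_uncovered_per_gene(depths_by_gene)
print(avg_uncovered)
```

Averaging per gene weights short and long genes equally; pooling all bases before dividing would weight by gene length instead.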

Page 16:

(Miller syndrome)

(Bartter syndrome) November 2009

Page 17:

American Journal of Human Genetics, February, 2011

Page 18:

• Retinitis pigmentosa (RP) causes degeneration of photoreceptors: impaired night vision, loss of peripheral vision, and loss of central vision in later life.

• Prevalence is approximately 1 in 3,000 - 4,500 individuals.

• 50 genes are known to cause RP, but ~50% of RP patients have mutations in unknown genes.

Page 19:

Images from the Foundation Fighting Blindness.

Page 20:

• We studied an RP family of Ashkenazi Jewish origin.

• All known RP genes had been excluded.

• Single pedigree with only three affected siblings

- traditionally very difficult to find the underlying novel gene.

Page 21: Variants detected with exome sequencing

All detected changes

                                               Affected     Affected     Affected
                                               Sibling 1    Sibling 2    Sibling 3
Missense, nonsense, splice site variations     8,712        8,716        8,752
Filtered for homozygosity and novelty          11           18           27

Sharing within family

Affected Sibling 1+2                                       5
Affected Sibling 1+2+3                                     4
Affected Sibling 1+2+3 + NOT in Unaffected Sibling 4       1 (DHDDS)

• Across the four individuals we identified 19,307 coding single nucleotide variants.

• No novel indels co-segregated with disease.
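The filtering cascade on this slide can be sketched with set operations. This is an illustration under stated assumptions, not the authors' pipeline: the `Variant` record and variant IDs are invented, "novel" is taken to mean absent from reference variant databases, and "NOT in Unaffected Sibling 4" is interpreted as excluding variants for which the unaffected sibling is also homozygous.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Variant:
    variant_id: str    # placeholder identifier, e.g. "chr:pos:ref:alt"
    homozygous: bool
    novel: bool        # assumed: absent from reference variant databases

def candidate_ids(variants):
    """IDs of variants that are both homozygous and novel in one individual."""
    return {v.variant_id for v in variants if v.homozygous and v.novel}

def segregating_candidates(affected, unaffected):
    """Candidates shared by all affected individuals and not homozygous
    in any unaffected individual (assumed recessive model)."""
    shared = set.intersection(*(candidate_ids(vs) for vs in affected))
    for vs in unaffected:
        shared -= {v.variant_id for v in vs if v.homozygous}
    return shared

# Toy calls, invented for illustration only.
sib1 = [Variant("v1", True, True), Variant("v2", True, True), Variant("v3", False, True)]
sib2 = [Variant("v1", True, True), Variant("v2", True, True)]
sib3 = [Variant("v1", True, True), Variant("v2", True, True), Variant("v4", True, False)]
sib4 = [Variant("v2", True, True), Variant("v1", False, True)]  # unaffected

result = segregating_candidates([sib1, sib2, sib3], [sib4])
print(result)
```

In the toy data the unaffected sibling carries `v1` only in the heterozygous state, so `v1` survives as a candidate while `v2` is removed.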

Page 22:

Population           Chromosomes screened    Variant observed    Estimated MAF    Estimated homozygous frequency
Jewish               1,434                   8                   0.0056           0.00003136
Non Jewish           13,954                  0                   < 0.000072       < 5.2E-09
Unknown Ethnicity    11,786                  1                   0.000085         7.2E-09
Sum                  27,174                  9

Detailed results of genotyping of population controls for the identified variant in DHDDS.
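The table values appear consistent with a minor allele frequency (MAF) computed as observed alleles divided by chromosomes screened, and a homozygous frequency computed as the square of the MAF under Hardy-Weinberg equilibrium. The sketch below reproduces that arithmetic. The function names are illustrative, and using one hypothetical observation to obtain an upper bound when no variant was seen is an assumption about how the "<" values were derived.

```python
def allele_frequency(allele_count, chromosomes):
    """Observed allele count divided by the number of chromosomes screened."""
    return allele_count / chromosomes

def homozygous_frequency(maf):
    """Expected homozygote frequency under Hardy-Weinberg equilibrium (q squared)."""
    return maf ** 2

jewish_maf = allele_frequency(8, 1_434)
unknown_maf = allele_frequency(1, 11_786)
# No variant observed: one hypothetical observation gives an upper bound.
non_jewish_upper = allele_frequency(1, 13_954)

print(f"Jewish MAF: {jewish_maf:.4f}")
print(f"Jewish homozygous frequency from rounded MAF: {homozygous_frequency(0.0056):.8f}")
print(f"Unknown ethnicity MAF: {unknown_maf:.6f}")
print(f"Non Jewish MAF upper bound: {non_jewish_upper:.6f}")
```

The tabulated 0.00003136 is the square of the rounded MAF (0.0056); squaring the unrounded value gives a slightly smaller number, about 0.0000311.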

Page 23: 1. Pathway analyses

DHDDS (dehydrodolichyl diphosphate synthase) links important pathways in RP

Page 24: 2. Conservation analysis

The mutated amino acid is highly conserved across species

Page 25: 3. In-silico function

3D in silico modeling of protein function

• The K42 (+) residue stabilizes the farnesyl-pyrophosphate (FPP) binding pocket via charge-charge repulsive forces towards R38 (+).

• The mutant E42 (-) will compete for R38 (+) binding.

Page 26: 4. Animal modeling

Morpholino knock-down of DHDDS function in zebrafish

[Figure panels: DHDDS deficient, Morpholino control]

Compared to control zebrafish, morpholino knock-down of DHDDS significantly reduces escape reactions to light changes.

Page 27:

Histopathology of zebrafish eye: photoreceptor rods are degenerated

[Figure panels: DHDDS deficient, Wild type]

Page 28: 5. Additional genetic support

DHDDS mutation found in 15 out of 123 index patients (12%)

Page 29: Summary

• We have identified a novel RP gene, DHDDS, highlighting a key biological pathway.

• Exome sequencing of rare genetically heterogeneous phenotypes will require complementary functional approaches.

• We have demonstrated that in silico protein studies and zebrafish modeling are sufficient, fast, and cost-effective strategies.

Science, November 2011

Page 30:

Team work ...

HIHG

Stephan Züchner, Gary Beecham, Adam Naj, Amjad Farooq, Martin Kohli,

Patrice L. Whitehead, William Hulme, Ioanna Konidari, Juan Young, David

Seo, Susan Blanton, Jeffery M. Vance, and Margaret A Peričak-Vance

Department of Biology

Julia Dallman

BPEI

Byron Lam, Rong Wen, Eduardo Alfonso

Vanderbilt University

Jonathan Haines

Department of Biochemistry

Amjad Farooq

Mt. Sinai Hospital, NYC

Joseph Buxbaum

Page 31: What can go wrong in targeted or exome sequencing?

Capture/ enrichment:

• Technical issues, sample mix-up

• Relevant variant(s) not covered by capture/ enrichment kit (capture probe design, large sequence never 100% suitable for hybridization)

• Uniformity/ evenness low

Sequencing:

• Technical issues

• Insufficient sequence amount (low coverage)

• Read length choice, single vs paired-end reads

Analysis:

• Ambiguous and/ or multiple alignment of reads (pseudo genes, repetitive sequence, GC)

• Variant calling fails for specific reasons (low coverage or quality)

Annotation:

• Automated mass annotation is essential, but can be erroneous or incomplete (splice variants, functional synonymous changes, binding sites for regulatory factors, unknown exons)

Interpretation:

• Wrong assumptions regarding the outcome (statistical model, class of molecular variant)

• Inadequate statistical power

• Human error

Page 32: What do we usually miss with exome sequencing?

• Copy number variation

• Large indels (>20bp)

• Long repeats (STR)

• Homologous regions

• Unknown exons

• UTR

• Regulatory and intronic changes

Page 33: Hussman Institute for Human Genomics

• 7 next-generation sequencing instruments, with a maximum capacity of 1.5 trillion base pairs every 9 days (this will roughly double with an instrument upgrade in early May).

• A single run produces ~4 terabytes of raw data; 1.2 petabytes of disk storage.

• 5,000-node computing cluster.

• Developed fully automated exome capture on a Caliper robot with a capacity of 288 exome samples per week.

Page 34: At HIHG, a wide range of diseases is being studied with targeted and exome sequencing

• Alzheimer disease

• Amyotrophic lateral sclerosis

• Age-related macular degeneration

• Autism

• Club foot

• Charcot-Marie-Tooth disease

• Deafness

• Essential tremor

• Dilated cardiomyopathy

• Hereditary spastic paraplegia

• HIV

• Multiple sclerosis

• Parkinson disease

• Variety of recessive syndromes

• …

Page 35: HIHG faculty have been actively publishing in the exome field since late 2009

• Hedges D et al. (2009) Exome sequencing of a multigenerational human pedigree. PLoS One.

• Martin ER et al. (2010) SeqEM: an adaptive genotype-calling approach for next-generation sequencing studies. Bioinformatics.

• Sirmaci A et al. (2010) MASP1 mutations in patients with facial, umbilical, coccygeal, and auditory findings of Carnevale, Malpuech, OSA, and Michels syndromes. Am J Hum Genet.

• Montenegro G et al. (2011) Exome sequencing allows for rapid gene identification in a Charcot-Marie-Tooth disease family. Annals of Neurology.

• Norton N et al. (2011) Genome-wide Studies of Copy Number Variation and Exome Sequencing Identify Rare Variants in BAG3 as a Cause of Dilated Cardiomyopathy. Am J Hum Genet.

• Züchner S et al. (2011) Whole-exome sequencing links a variant in DHDDS to retinitis pigmentosa. Am J Hum Genet.

• Hedges DJ et al. (2011) Comparison of three targeted enrichment strategies on the SOLiD sequencing platform. PLoS One.

• …

Page 36: Summary

Exome and targeted sequencing is a mature research tool.

Cost-effective: < US $2,000 today; ~$1,000 by end of 2011

It allows entry into Human Genomics with all its complications of data analysis and interpretation.

Is targeted sequencing here to stay (vs whole genome seq)?

• Probably as long as the economics are attractive.

• And as long as new discoveries are indeed possible.