VAAST: Deciphering Genetic Disease with Next-Generation Sequencing

VAAST: Deciphering Genetic Disease with Next-Generation Sequencing. Barry Moore, M.S., Research Scientist, Department of Human Genetics, Department of Biomedical Informatics


Post on 10-May-2015


Page 1: VAAST: Deciphering Genetic Disease with Next-Generation Sequencing

VAAST
Deciphering Genetic Disease with Next-Generation Sequencing

Barry Moore, M.S.
Research Scientist
Department of Human Genetics
Department of Biomedical Informatics

Page 2: VAAST: Deciphering Genetic Disease with Next-Generation Sequencing

Outline

The VAAST Analysis Pipeline

Ogden Syndrome: Application of VAAST to a Genetic Disease of Unknown Cause

The Future of VAAST Development

Page 3: VAAST: Deciphering Genetic Disease with Next-Generation Sequencing

$10,000,000: Venter Genome

$1,000,000: Watson

$5,000: You?

Page 4: VAAST: Deciphering Genetic Disease with Next-Generation Sequencing

geneA geneB geneX geneY geneZ

Disease

Healthy

Next Generation Sequencing

Page 5: VAAST: Deciphering Genetic Disease with Next-Generation Sequencing

Variant Annotation: Variant Annotation Tool (VAT)

Variant Selection: Variant Selection Tool (VST)

Variant Analysis: Variant Annotation, Analysis and Search Tool (VAAST)

Page 6: VAAST: Deciphering Genetic Disease with Next-Generation Sequencing

VAAST Pipeline

[Pipeline diagram]

• Inputs: GVF (3.5 Million Variants), Reference Genome (Fasta), Reference Genes (GFF3)

• VAT (Variant Annotation Tool): produces Annotated Variants (GVF)

• VST (Variant Selection Tool): combines Annotated Variants into Merged Variant Sets (CDR)
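To make the GVF input of the pipeline more concrete, here is a minimal sketch of reading one GVF record. It assumes the usual GFF3-style layout of nine tab-separated columns with `key=value` attributes; the `GvfRecord` class, the `parse_gvf_line` helper and the example line (including its coordinates) are hypothetical and are not taken from the slides or from the VAAST code.

```python
from dataclasses import dataclass


@dataclass
class GvfRecord:
    """One variant line of a GVF file (GFF3-style, nine columns)."""
    seqid: str
    source: str
    type: str
    start: int
    end: int
    score: str
    strand: str
    phase: str
    attributes: dict


def parse_gvf_line(line: str) -> GvfRecord:
    """Split a single GVF data line into its nine columns."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 9:
        raise ValueError(f"expected 9 tab-separated columns, got {len(cols)}")
    attributes = {}
    for field in cols[8].split(";"):
        if not field:
            continue
        key, _, value = field.partition("=")
        # Multi-valued attributes are comma separated.
        attributes[key] = value.split(",")
    return GvfRecord(cols[0], cols[1], cols[2], int(cols[3]), int(cols[4]),
                     cols[5], cols[6], cols[7], attributes)


# Hypothetical example record: a heterozygous SNV (reference G, alleles A and G).
example = ("chr16\texample_caller\tSNV\t1000\t1000\t.\t+\t.\t"
           "ID=var_0001;Variant_seq=A,G;Reference_seq=G")
record = parse_gvf_line(example)
print(record.type, record.start, record.attributes["Variant_seq"])
```

The attribute values are kept as lists so that a site with two observed alleles does not need special handling.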

Page 7: VAAST: Deciphering Genetic Disease with Next-Generation Sequencing


Variant Type

• sequence_alteration
• deletion
• insertion
• duplication
• inversion
• substitution
• SNV
• MNP
• complex substitution
• translocation

Variant Effect

• sequence_variant
• gene_variant
• five_prime_UTR_variant
• three_prime_UTR_variant
• exon_variant
• splice_region_variant
• splice_donor_variant
• splice_acceptor_variant
• intron_variant
• coding_sequence_variant
• stop_retained
• stop_lost
• stop_gained
• synonymous_codon
• non_synonymous_codon
• amino_acid_substitution
• frameshift_variant
• inframe_variant
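A few of the Variant Effect terms listed above can be illustrated for a coding SNV by translating the reference and variant codons. This is only a sketch under the standard genetic code; the `classify_snv` function is hypothetical and is not how VAT itself is implemented.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def classify_snv(ref_codon: str, alt_codon: str) -> str:
    """Return a Variant Effect term for a single-codon substitution."""
    ref_aa = CODON_TABLE[ref_codon]
    alt_aa = CODON_TABLE[alt_codon]
    if ref_aa == "*" and alt_aa == "*":
        return "stop_retained"
    if ref_aa == "*":
        return "stop_lost"
    if alt_aa == "*":
        return "stop_gained"
    if ref_aa == alt_aa:
        return "synonymous_codon"
    return "non_synonymous_codon"


for ref, alt in [("CGA", "TGA"), ("GGT", "GGC"), ("TCC", "CCC")]:
    print(ref, alt, classify_snv(ref, alt))
```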


Page 9: VAAST: Deciphering Genetic Disease with Next-Generation Sequencing

[Diagram] Background Genomes (CDR) and Target Genomes (CDR) → VAAST → VAAST Report: Prioritized Candidate Genes

Page 10: VAAST: Deciphering Genetic Disease with Next-Generation Sequencing

Key Features of VAAST

• Probabilistic

• Feature Based

• Both Allele and Amino Acid Substitution (AAS) Frequencies

• Considers Inheritance Model

• Fast

• Standardized Ontology Based Format

• Modular and Flexible in Design

Page 11: VAAST: Deciphering Genetic Disease with Next-Generation Sequencing

VAAST Uses Variant Frequencies in a Probabilistic Fashion

Likelihood Ratio Test

• Maximum Likelihood of the Null Model (No Difference)

• Maximum Likelihood of the Alternate Model (There is a Difference)
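As a rough illustration of the likelihood ratio test named above, the sketch below compares a null model (one allele frequency shared by target and background) with an alternate model (separate frequencies) at a single site. The binomial model, the function names and the example counts are assumptions made for illustration; VAAST's own statistic also uses amino acid substitution frequencies and is not reproduced here.

```python
import math


def binomial_log_likelihood(k: int, n: int, p: float) -> float:
    """Log-likelihood of k variant alleles among n chromosomes at frequency p.

    The binomial coefficient is omitted because it cancels in the ratio.
    """
    log_lik = 0.0
    if k > 0:
        log_lik += k * math.log(p)
    if n - k > 0:
        log_lik += (n - k) * math.log(1.0 - p)
    return log_lik


def likelihood_ratio_statistic(k_target: int, n_target: int,
                               k_background: int, n_background: int) -> float:
    """2 * (max log-likelihood of alternate - max log-likelihood of null)."""
    # Null model: no difference, one pooled maximum-likelihood frequency.
    p_null = (k_target + k_background) / (n_target + n_background)
    null = (binomial_log_likelihood(k_target, n_target, p_null)
            + binomial_log_likelihood(k_background, n_background, p_null))
    # Alternate model: each group gets its own maximum-likelihood frequency.
    alt = (binomial_log_likelihood(k_target, n_target, k_target / n_target)
           + binomial_log_likelihood(k_background, n_background,
                                     k_background / n_background))
    return 2.0 * (alt - null)


# Hypothetical counts: 3 target genomes (6 chromosomes) against
# 443 background genomes (886 chromosomes).
same_frequency = likelihood_ratio_statistic(3, 6, 443, 886)
private_allele = likelihood_ratio_statistic(6, 6, 0, 886)
print(same_frequency, private_allele)
```

A statistic near zero means the two models fit equally well; an allele that is common in the target and absent from the background gives a large value.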

Page 12: VAAST: Deciphering Genetic Disease with Next-Generation Sequencing

VAAST Uses Variant Frequencies in a Probabilistic Fashion

Page 13: VAAST: Deciphering Genetic Disease with Next-Generation Sequencing

VAAST Uses Variant Frequencies in a Probabilistic Fashion

• VAAST gives us the likelihood of the composite genotype at GENE X in the target given the background.

• Do allele frequencies differ between Background and Target genomes within a given gene or feature?

• Composite likelihood calculation assumes independence across sites. To control for LD, statistical significance is estimated by permutation test.

• Multiple test correction for the number of features (~20,000) is roughly two orders of magnitude less stringent than for the number of variants (~3,500,000).
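The points above can be sketched in a few lines: a composite score that sums per-site statistics as if sites were independent, a permutation test that shuffles target and background labels across whole individuals (so correlation between sites within an individual is preserved), and the multiple-test arithmetic. The scoring function, the toy genotypes and the Bonferroni-style correction are assumptions for illustration and do not reproduce VAAST's actual implementation.

```python
import math
import random


def _log_lik(k, n, p):
    """Binomial log-likelihood without the (cancelling) coefficient."""
    log_lik = 0.0
    if k > 0:
        log_lik += k * math.log(p)
    if n - k > 0:
        log_lik += (n - k) * math.log(1.0 - p)
    return log_lik


def site_score(k_t, n_t, k_b, n_b):
    """Likelihood ratio statistic for one site (pooled vs. separate frequency)."""
    p_null = (k_t + k_b) / (n_t + n_b)
    null = _log_lik(k_t, n_t, p_null) + _log_lik(k_b, n_b, p_null)
    alt = _log_lik(k_t, n_t, k_t / n_t) + _log_lik(k_b, n_b, k_b / n_b)
    return 2.0 * (alt - null)


def gene_score(genotypes, is_target):
    """Composite score: sum of site scores, assuming independent sites."""
    n_t = 2 * sum(is_target)
    n_b = 2 * (len(is_target) - sum(is_target))
    total = 0.0
    for site in range(len(genotypes[0])):
        k_t = sum(g[site] for g, t in zip(genotypes, is_target) if t)
        k_b = sum(g[site] for g, t in zip(genotypes, is_target) if not t)
        total += site_score(k_t, n_t, k_b, n_b)
    return total


def permutation_p_value(genotypes, is_target, n_perm=1000, seed=0):
    """Shuffle labels across individuals and count scores >= observed."""
    rng = random.Random(seed)
    observed = gene_score(genotypes, is_target)
    labels = list(is_target)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)
        if gene_score(genotypes, labels) >= observed - 1e-9:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Toy data: variant allele counts (0, 1 or 2) at two sites of one gene.
targets = [[2, 1], [2, 0], [2, 1]]
background = [[0, i % 2] for i in range(20)]
genotypes = targets + background
is_target = [True] * len(targets) + [False] * len(background)
p_value = permutation_p_value(genotypes, is_target, n_perm=1000, seed=1)

# Multiple-test arithmetic, using a Bonferroni-style correction as an example.
ALPHA = 0.05
N_FEATURES = 20_000
N_VARIANTS = 3_500_000
per_feature_threshold = ALPHA / N_FEATURES
per_variant_threshold = ALPHA / N_VARIANTS
ratio = N_VARIANTS / N_FEATURES  # about two orders of magnitude
print(p_value, per_feature_threshold, per_variant_threshold, ratio)
```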

Page 14: VAAST: Deciphering Genetic Disease with Next-Generation Sequencing

Noise Decreases Dramatically with Increasing Number of Genomes

1 genome target, 1 genome background

Page 15: VAAST: Deciphering Genetic Disease with Next-Generation Sequencing

1 genome target, 10 genome background

Page 16: VAAST: Deciphering Genetic Disease with Next-Generation Sequencing

1 genome target, 250 genome background

Page 17: VAAST: Deciphering Genetic Disease with Next-Generation Sequencing

1 genome target, 250 genome background

Trio Data

Page 18: VAAST: Deciphering Genetic Disease with Next-Generation Sequencing
Page 19: VAAST: Deciphering Genetic Disease with Next-Generation Sequencing

Alleles Responsible for Miller Syndrome in Utah Kindred

[Pedigree figure: Mom, Dad, Son, Daughter, drawn once for each of two loci, CHR 16: DHODH and CHR 5: DNAH5. Allele labels shown: G:R, G:A, R:Q, R:*]

• Ng et al., Nature Genetics 42, 30–35 (2010) doi:10.1038/ng.499

• Roach et al., Science 328, 636 (2010)

Page 20: VAAST: Deciphering Genetic Disease with Next-Generation Sequencing

Schematic of VAAST Analysis of Utah Miller Kindred Using a Single Quartet

[Figure; labeled genes: DNAH5, DHODH]

Page 21: VAAST: Deciphering Genetic Disease with Next-Generation Sequencing

Average Rank for 100 Dominant and Recessive Diseases

[Bar chart: Ave. rank genome-wide (y axis) against size of case cohort (2, 4 and 6 allele copies), for DOMINANT and RECESSIVE models; 443 genomes in background. Bar labels as extracted, some digits fused: 156, 132, 2189, 3]

Page 22: VAAST: Deciphering Genetic Disease with Next-Generation Sequencing

Impact of Missing Data

[Bar chart: Ave. rank genome-wide (y axis) for 2 of 6, 4 of 6 and 6 of 6 allele copies, for DOMINANT and RECESSIVE models; 443 genomes in background. Bar labels as extracted, some digits fused: 639, 373, 61, 219, 3]

Page 23: VAAST: Deciphering Genetic Disease with Next-Generation Sequencing

Outline

The VAAST Analysis Pipeline

Ogden Syndrome: Application of VAAST to a Genetic Disease of Unknown Cause

The Future of VAAST Development

Page 24: VAAST: Deciphering Genetic Disease with Next-Generation Sequencing

A Rare X-linked Mendelian Disorder

• A Utah family coming to the University Hospital for 20+ years

• About half of the male offspring die around 1 year of age

• Aged appearance

• Craniofacial anomalies

• Hypotonia

• Global developmental delays

• Cardiac arrhythmias

Page 25: VAAST: Deciphering Genetic Disease with Next-Generation Sequencing

Four Affected Boys over Two Generations

[Pedigree figure: generations I, II and III]

Page 26: VAAST: Deciphering Genetic Disease with Next-Generation Sequencing

Exome Sequencing

• Agilent SureSelect In-Solution X Chromosome Capture

• Covaris S series Sonication (150-200 bp)

• 76 bp single-end reads on one lane each of the Illumina GAIIx

Variant Calling

• Sequence alignment with bwa

• Remove duplicate reads with Picard

• Realign indel regions with GATK

• Variant calling with Samtools, GATK

Page 27: VAAST: Deciphering Genetic Disease with Next-Generation Sequencing

VAAST Identifies NAA10 as Candidate Gene

Identifying Candidate Genes

• About 20 min. run time

• Proband only: 3 candidate genes (NAA10 ranked second)

• With pedigree: 1 candidate gene (NAA10)

Page 28: VAAST: Deciphering Genetic Disease with Next-Generation Sequencing

Additional Analyses

• Microarray based CNV analysis

    • No likely causal variants found

• Sanger sequencing confirmation

    • Variant segregates perfectly with disease in 13 family members

• Haplotype sharing (STR genotyping)

    • ~11 Mb shared between two affected boys

• A second family discovered, with the same mutation

    • IBD relatedness analysis: independent mutational events

Page 29: VAAST: Deciphering Genetic Disease with Next-Generation Sequencing

N(alpha)-acetyltransferase

• N-alpha-acetylation is one of the most common protein modifications that occurs during protein synthesis.

• NatA (catalytic subunit NAA10, also known as hARD1)

• Eight exons, Crick strand, highly conserved

• A:G transition causes p.Ser37Pro
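The strand detail above matters when reading the variant: for a gene on the Crick (minus) strand, an A to G change on the reference strand is a T to C change in the coding sequence, and a T to C change at the first position of a serine codon gives proline. The sketch below walks through that logic. The specific codon used (TCC) is an assumption for illustration, since the slides do not give the codon sequence, and the helper names are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the sequence as read on the opposite strand."""
    return seq.translate(COMPLEMENT)[::-1]


def minus_strand_effect(ref_triplet: str, alt_triplet: str) -> tuple:
    """Translate reference-strand triplets for a gene on the minus strand."""
    ref_codon = reverse_complement(ref_triplet)
    alt_codon = reverse_complement(alt_triplet)
    return CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]


# Assumed reference-strand triplet GGA; the A:G transition makes it GGG.
# On the coding (minus) strand these read TCC (Ser) and CCC (Pro).
ref_aa, alt_aa = minus_strand_effect("GGA", "GGG")
print(ref_aa, alt_aa)
```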

Page 30: VAAST: Deciphering Genetic Disease with Next-Generation Sequencing

Functional Analyses

• Quantitative in vitro N-terminal acetylation assay (RP-HPLC).

• Four peptide substrates previously shown to be acetylated by NatA (NAA10)

• Assays indicate loss-of-function allele.

Page 31: VAAST: Deciphering Genetic Disease with Next-Generation Sequencing

Functional Analyses

Page 32: VAAST: Deciphering Genetic Disease with Next-Generation Sequencing
Page 33: VAAST: Deciphering Genetic Disease with Next-Generation Sequencing

VAAST in Summary

• Probabilistic Disease Gene Finder

• Feature Based not Variant Based

• Both Allele and AAS Frequencies

• Considers Inheritance Model

• As few as two target genomes can be sufficient to identify causative gene.

• Background Genomes are “Reusable”

• Not Limited to Human Analyses

Page 34: VAAST: Deciphering Genetic Disease with Next-Generation Sequencing

VAAST: Future Directions

• Indel support

• Splice-site

• No-call support

• Pedigree support

• Phylogenetic conservation

Page 35: VAAST: Deciphering Genetic Disease with Next-Generation Sequencing
Page 36: VAAST: Deciphering Genetic Disease with Next-Generation Sequencing

Acknowledgements

VAAST Development: Chad Huff, Hao Hu, Lynn Jorde, Barry Moore, Martin Reese, Marc Singleton, Jinchuan Xing, Mark Yandell

Ogden Syndrome: John Carey, Steven Chin, Heidi Deborah Fain, Gholson Lyon, John Opitz, Theodore J. Pysher, Alan Rope, Reid Robison, Sarah T. South

Chad Huff, Evan Johnson, Barry Moore, Christa Schank, Kai Wang, Jinchuan Xing

Yandell Lab: Michael Campbell, Daniel Ence, Guozhen Fan, Steven Flygare, Hao Hu, Zev Kronenberg, Barry Moore, Marc Singleton, Robert Ross, Mark Yandell

Thomas Arnesen, Rune Evjenth, Johan R. Lillehaug

Leslie G. Biesecker, Jennifer J. Johnston, Cathy A. Stevens

Brian Dalley, Tao Jiang, Jefferey Swensen

Hakon Hakonarson, Lynn B. Jorde, Mark Yandell

Page 37: VAAST: Deciphering Genetic Disease with Next-Generation Sequencing

Acknowledgements