
ANALYSIS OF HUMAN EXOME SEQUENCING DATA: THE «OTHER» INFORMATION

Academic year 2014-2015

MASTER'S DEGREE COURSE IN GENOMIC, INDUSTRIAL AND ENVIRONMENTAL BIOTECHNOLOGY

EXPERIMENTAL THESIS IN BIOINFORMATICS

Internal supervisor: Prof. Stefano Pascarella

Candidate: Agnese Giovannetti

Project in collaboration with:

- Dr. Tartaglia, “Fisiopatologia delle malattie genetiche” unit, Dip. EOMM, ISS

- Dr. Viviana Caputo, Department of Experimental Medicine, Sapienza Università di Roma

External supervisor: Dr. Viviana Caputo

Student ID: 1397519

Next Generation Sequencing

• Whole Genome Sequencing (WGS)

• Whole Exome Sequencing (WES)

Identification of novel disease-causing genes

Whole Exome Sequencing workflow

• Experimental workflow

• Bioinformatic workflow

Variants filtering and prioritization

• DNA VARIANTS: ~20,000 - 30,000 before filtering, ~200 - 400 after

• Discrete filtering and prioritization criteria:

• FUNCTIONAL ANNOTATION (NON-SYNONYMOUS, NONSENSE, FRAME-SHIFT, SPLICE SITE)

• ABSENT IN UNAFFECTED POPULATION CONTROLS

• SEGREGATION ANALYSIS (pedigree information, linkage data)

• A PRIORI KNOWLEDGE
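The discrete filtering idea on this slide can be sketched as a chain of independent yes/no filters, each of which shrinks the variant list. This is only an illustration: the `Variant` record, its field names, and the gene labels are hypothetical and are not taken from the thesis pipeline.

```python
from dataclasses import dataclass

# Hypothetical record layout; field names are illustrative only.
@dataclass
class Variant:
    gene: str
    effect: str         # e.g. "missense", "synonymous", "nonsense"
    in_controls: bool   # observed in unaffected population controls
    segregates: bool    # consistent with the pedigree

# Effect classes kept by the functional-annotation filter
# ("missense" stands in here for a non-synonymous substitution).
KEPT_EFFECTS = {"missense", "nonsense", "frameshift", "splice_site"}

def discrete_filter(variants):
    """Apply the filters one after another; a variant must pass all of them."""
    kept = [v for v in variants if v.effect in KEPT_EFFECTS]
    kept = [v for v in kept if not v.in_controls]
    kept = [v for v in kept if v.segregates]
    return kept

# Made-up example input.
variants = [
    Variant("GENE_A", "missense", False, True),
    Variant("GENE_B", "synonymous", False, True),
    Variant("GENE_C", "nonsense", True, True),
    Variant("GENE_D", "frameshift", False, False),
]
survivors = discrete_filter(variants)
print([v.gene for v in survivors])
```

Note that the synonymous variant is discarded at the first step, which is exactly the kind of information the rest of the talk tries to recover.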

The lost treasures in exome sequencing data

• Genomic analyses: Regions of Homozygosity, Copy Number Variations

• Silent: Synonymous variants

• Non-coding: Intronic/UTR variants, miRNA variants, LincRNA variants

• Mitochondrial DNA variants

Aim of the thesis

Can we use WES data to retrieve «other» information?

• Genomic analyses: Regions of Homozygosity

• Silent: Synonymous variants

• Non-coding: miRNA variants

• A cohort of 47 patients affected by different genetic diseases, including neurological syndromes, developmental diseases and complex phenotypes

• Bioinformatic approach: use of already existing tools and development of novel algorithms

Regions of Homozygosity (ROHs)

• A ROH is a run of consecutive homozygous genotypes along a stretch of the genome

• Homozygosity mapping in monogenic recessive disorders

• Ancestral homozygosity (founder effect), parental consanguinity, uniparental disomy, deletions
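To make the definition concrete, here is a minimal sketch of how runs of consecutive homozygous genotypes could be found in a position-ordered list of calls. It is not the algorithm used by any particular tool; the function names, the `min_run` threshold, and the choice to ignore missing calls are assumptions made for illustration. Genotypes are assumed to be diploid, VCF-style strings such as "0/1" or "1/1".

```python
from itertools import groupby

def is_homozygous(gt):
    """True when both alleles of a diploid VCF-style genotype are equal."""
    a, b = gt.replace("|", "/").split("/")
    return a == b

def find_rohs(genotypes, min_run):
    """Return (first_index, last_index) for each run of homozygous calls.

    genotypes: genotype strings ordered by genomic position.
    min_run:   minimum number of consecutive homozygous calls to report.
    Missing calls (containing ".") are ignored (an assumption of this sketch).
    """
    called = [(i, is_homozygous(gt))
              for i, gt in enumerate(genotypes) if "." not in gt]
    runs = []
    for hom, group in groupby(called, key=lambda item: item[1]):
        idx = [i for i, _ in group]
        if hom and len(idx) >= min_run:
            runs.append((idx[0], idx[-1]))
    return runs

# Made-up genotypes for one chromosome.
calls = ["1/1", "1/1", "0/1", "1/1", "1/1", "1/1", "./.", "1/1", "0/1"]
print(find_rohs(calls, min_run=3))
```

A single heterozygous call ends a run here; real tools typically tolerate a small number of heterozygous calls to allow for genotyping errors.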

Regions of Homozygosity (ROHs) - Results

Determination of homozygosity blocks (80 consecutive variants) using HomozygosityMapper (http://www.homozygositymapper.org/)

[Figure: Distribution of ROH classes]

[Figure: Correlation between cumulative ROHs and parental relationship (first cousins, double first cousins, siblings)]

• Background ancestral homozygosity?

• Uniparental disomy?

• Deletions?
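As a rough yardstick for reading cumulative ROH length against parental relationship, the expected inbreeding coefficient of the offspring is 1/16 for first cousins, 1/8 for double first cousins, and 1/4 for siblings. The sketch below compares an observed ROH fraction with these values. The autosomal genome length and the example ROH lengths are assumptions, and this is not the analysis performed in the thesis; ROHs detected from exome data may miss shorter segments, so the comparison is only indicative.

```python
from fractions import Fraction

# Expected inbreeding coefficient of the offspring for each parental relationship.
EXPECTED_F = {
    "first cousins": Fraction(1, 16),
    "double first cousins": Fraction(1, 8),
    "siblings": Fraction(1, 4),
}

# Approximate autosomal genome length in bp (an assumption of this sketch).
AUTOSOME_BP = 2_881_000_000

def roh_fraction(roh_lengths_bp, genome_bp=AUTOSOME_BP):
    """Fraction of the autosomal genome covered by the given ROHs."""
    return sum(roh_lengths_bp) / genome_bp

def closest_relationship(fraction):
    """Relationship whose expected coefficient is nearest to the observed fraction."""
    return min(EXPECTED_F, key=lambda rel: abs(float(EXPECTED_F[rel]) - fraction))

# Made-up ROH lengths for one sample.
example_rohs = [40_000_000, 55_000_000, 30_000_000, 60_000_000]
f_obs = roh_fraction(example_rohs)
print(round(f_obs, 3), closest_relationship(f_obs))
```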

Synonymous variants

Synonymous variants can affect:

• splice sites

• exonic splicing enhancer/silencer

• mRNA structure

Human diseases associated with synonymous variants
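For reference, a variant is synonymous when the reference and alternate codons encode the same amino acid. A minimal sketch of that check with the standard genetic code follows; the function name is illustrative, and the check works on DNA codons of the coding strand.

```python
# Standard genetic code, built with bases in TCAG order at each codon position.
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W"   # T..
         "LLLLPPPPHHQQRRRR"   # C..
         "IIIMTTTTNNKKSSRR"   # A..
         "VVVVAAAADDEEGGGG")  # G..
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

def is_synonymous(ref_codon, alt_codon):
    """True when both codons translate to the same amino acid (or both to stop)."""
    return CODON_TABLE[ref_codon.upper()] == CODON_TABLE[alt_codon.upper()]

print(is_synonymous("CTG", "CTA"))  # both leucine
print(is_synonymous("GAA", "GAC"))  # glutamate vs. aspartate
```

The protein sequence is unchanged in the first case, but the effects listed on the slide (splicing, mRNA structure) can still differ between the two codons.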

Synonymous variants - Results

[Figure: Synonymous variants distribution (synonymous variants vs. other variants)]

[Figure: SNPs distribution, dbSNP142 (SNP vs. no SNP)]

[Figure: Genotype distribution (homozygosis, heterozygosis, compound heterozygosis, X-linked)]

SYNONYMOUS VARIANTS PRIORITIZATION

• CLINICAL ASSOCIATION: Human Gene Mutation Database (HGMD), a comprehensive collection of germline mutations that underlie, or are associated with, human inherited diseases (http://www.hgmd.cf.ac.uk/ac/index.php)

• DELETERIOUSNESS

Synonymous variants - Results: CLINICAL ASSOCIATION

• DM: disease-causing mutations reported in the literature

• DP: disease-associated polymorphisms, reported to be in significant association with disease

• DFP: disease-associated polymorphisms with supporting functional evidence

• FP: in vitro or in vivo functional polymorphisms

• 0.2% variants

• 41 total variants

Silent Variant Analyzer (http://compbio.cs.toronto.edu/silva/) is a tool for the automated harmfulness prediction of synonymous variants within the human genome.

• Conservation

• Codon usage bias

• Sequence features

• Exon splice enhancer/suppressor motifs

• Splice site motifs

• Pre-mRNA folding free energy

Synonymous variants - Results: DELETERIOUSNESS

miRNA Variants

MicroRNAs are small non-coding RNA molecules (about 22 nucleotides) found in plants, animals, and some viruses, which function in RNA silencing and post-transcriptional regulation of gene expression.

[Figure: miRNA biogenesis]

[Figure: microRNAs in captured target WES regions (Agilent, Roche, Illumina)]

miRNA Variants - Results

Annomir: a script in Python to annotate miRNA variants

miRBase (http://www.mirbase.org/): Homo sapiens (1881 precursors, 2588 mature)

• Accession Number

• Name

• Structure (pre-miR, mature miR)

• ~70 variants

• Median depth: 59X

• ~97% in dbSNP

[Figure: variants in mature miR vs. pre-miR]

Clinical annotation (HGMD)

No tool to prioritize miRNA variants!
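The kind of annotation described for Annomir (accession number, name, and position within the pre-miR or mature miR) can be illustrated with a simple interval lookup. This is not the Annomir code, which is not shown in the slides; the `MiRNA` record, the function name, and all coordinates and identifiers below are placeholders.

```python
from dataclasses import dataclass, field

# Hypothetical record; a real script would load these from miRBase coordinates.
@dataclass
class MiRNA:
    accession: str
    name: str
    chrom: str
    start: int                                   # 1-based, inclusive
    end: int
    mature: list = field(default_factory=list)   # (start, end) of mature products

def annotate(chrom, pos, mirnas):
    """Return (accession, name, structure) for every precursor overlapping pos."""
    hits = []
    for m in mirnas:
        if m.chrom == chrom and m.start <= pos <= m.end:
            in_mature = any(s <= pos <= e for s, e in m.mature)
            hits.append((m.accession, m.name,
                         "mature miR" if in_mature else "pre-miR"))
    return hits

# Placeholder precursor with one mature product (coordinates are made up).
db = [MiRNA("ACC_EXAMPLE", "mir-example", "chr1", 1000, 1080, [(1015, 1036)])]
print(annotate("chr1", 1020, db))
print(annotate("chr1", 1070, db))
print(annotate("chr2", 1020, db))
```

A linear scan is adequate for a few thousand precursors; an interval index would only matter for much larger annotation sets.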


Ongoing Analyses

• Splice site/Branching point/Intron: +/- 2bp, +/- 5bp, > 5bp

• 5’UTR, 3’UTR

• Copy Number Variations

• Mitochondrial DNA
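The +/- 2bp, +/- 5bp and > 5bp classes listed under the ongoing analyses can be read as distance bins. The sketch below assumes they refer to the distance of an intronic variant from the nearest exon boundary; that reading, and the function name, are assumptions rather than something stated on the slide.

```python
def distance_class(distance_bp):
    """Bin an intronic variant by its distance (bp) from the nearest exon boundary.

    The sign only encodes the side of the exon; the bin depends on the magnitude.
    """
    d = abs(distance_bp)
    if d == 0:
        raise ValueError("an intronic position has a non-zero distance")
    if d <= 2:
        return "+/- 2bp"
    if d <= 5:
        return "+/- 5bp"
    return "> 5bp"

for dist in (-1, 4, -20):
    print(dist, distance_class(dist))
```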

Conclusions

• Analysis of the genomic profile from WES data made it possible to identify long regions of homozygosity, which could suggest parental consanguinity, background ancestral homozygosity, deletions, or uniparental disomy.

• The study of silent and miRNA variants showed that WES data can provide reliable calling of these types of variation, which in some cases could be clinically associated.

• This analysis confirmed that NGS approaches are powerful strategies to characterize human DNA variation.

Integrated workflow

• DNA VARIANTS

• FUNCTIONAL ANNOTATION: NON-SYNONYMOUS, NONSENSE, FRAME-SHIFT, SPLICE SITE

• FUNCTIONAL ANNOTATION: SILENT AND NON-CODING VARIANTS

• ABSENT IN UNAFFECTED POPULATION CONTROLS

• SEGREGATION ANALYSIS (pedigree information, linkage data)

• PRIORITIZATION

• GENOMIC ANALYSES: GENOTYPE INFORMATION (ROHs), VARIANT DEPTH INFORMATION (CNVs)

Thank you for your attention!