ANALYSIS OF HUMAN EXOME SEQUENCING DATA: THE «OTHER» INFORMATION
Academic year 2014-2015
MASTER'S DEGREE COURSE IN GENOMIC, INDUSTRIAL AND ENVIRONMENTAL BIOTECHNOLOGIES
EXPERIMENTAL THESIS IN BIOINFORMATICS
Internal supervisor: Prof. Stefano Pascarella
Candidate: Agnese Giovannetti
Project in collaboration with:
- Dr. Tartaglia, “Fisiopatologia delle malattie genetiche” unit, Dept. EOMM, ISS
- Dr. Viviana Caputo, Dept. of Experimental Medicine, Sapienza Università di Roma
External supervisor: Dr. Viviana Caputo
Student ID: 1397519
Next Generation Sequencing
• Whole Genome Sequencing (WGS)
• Whole Exome Sequencing (WES)
Identification of novel disease-causing genes
Variants filtering and prioritization
DNA VARIANTS: ~20,000 - 30,000 before filtering, ~200 - 400 after
• Discrete filtering: FUNCTIONAL ANNOTATION (NON-SYNONYMOUS, NONSENSE, FRAME-SHIFT, SPLICE SITE)
• Discrete filtering: ABSENT IN UNAFFECTED POPULATION CONTROLS
• Discrete filtering: SEGREGATION ANALYSIS (pedigree information, linkage data)
• Prioritization: A PRIORI KNOWLEDGE
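The filtering steps named on this slide (functional annotation, absence in unaffected controls, segregation analysis, then prioritization by a priori knowledge) can be illustrated with a minimal Python sketch. The `Variant` fields, the zero-frequency cutoff and the candidate-gene ranking are assumptions made for illustration; they are not taken from the thesis pipeline.

```python
from dataclasses import dataclass

# Consequence classes kept by the functional-annotation filter (labels from the slide).
FUNCTIONAL_CLASSES = {"non-synonymous", "nonsense", "frame-shift", "splice site"}


@dataclass
class Variant:
    gene: str
    consequence: str      # functional annotation label
    control_freq: float   # hypothetical field: frequency in unaffected controls
    segregates: bool      # hypothetical field: consistent with the pedigree


def discrete_filter(variants):
    """Apply the three discrete filters one after the other."""
    kept = [v for v in variants if v.consequence in FUNCTIONAL_CLASSES]
    kept = [v for v in kept if v.control_freq == 0.0]  # assumed cutoff: absent in controls
    kept = [v for v in kept if v.segregates]
    return kept


def prioritize(variants, candidate_genes):
    """Rank variants in a priori candidate genes first; the sort is stable."""
    return sorted(variants, key=lambda v: v.gene not in candidate_genes)
```

Discrete filters only remove variants, while prioritization only reorders the survivors, so the two stages are kept as separate functions.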
The lost treasures in exome sequencing data
Genomic analyses
• Regions of Homozygosity
• Copy Number Variations
Silent
• Synonymous variants
Non-coding
• Intronic/UTR variants
• miRNA variants
• LincRNA variants
Mitochondrial DNA variants
Aim of the thesis
Genomic analyses: Regions of Homozygosity
Silent: Synonymous variants
Non-coding: miRNA variants
• A cohort of 47 patients affected by different genetic diseases, including neurological syndromes, developmental diseases and complex phenotypes
Can we use WES data to retrieve «other» information?
• Bioinformatic approach: use of already existing tools and development of novel algorithms
Regions of Homozygosity (ROHs)
• A ROH is a run of consecutive homozygous genotypes along a genomic segment
• Homozygosity mapping in monogenic recessive disorders
• Ancestral homozygosity (founder effect), parental consanguinity, uniparental disomy, deletions
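The ROH definition above can be sketched in Python as a scan for runs of consecutive homozygous genotypes. This is only an illustration of the definition, not the algorithm used by any published tool; the function name, the tuple representation of genotypes and the `min_run` parameter are assumptions.

```python
def find_homozygous_runs(genotypes, min_run):
    """Return (start, end) index pairs, end exclusive, for every run of at
    least `min_run` consecutive homozygous genotypes.

    `genotypes` is a list of (allele1, allele2) tuples for one chromosome,
    ordered by genomic position (an assumed input format).
    """
    runs = []
    start = None  # index where the current homozygous run began
    for i, (a, b) in enumerate(genotypes):
        if a == b:
            if start is None:
                start = i
        else:
            # A heterozygous call closes the current run.
            if start is not None and i - start >= min_run:
                runs.append((start, i))
            start = None
    # Close a run that extends to the end of the chromosome.
    if start is not None and len(genotypes) - start >= min_run:
        runs.append((start, len(genotypes)))
    return runs
```

A real analysis would also have to handle missing calls and genotyping errors, which this sketch treats as ordinary genotypes.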
Regions of Homozygosity (ROHs) - Results
Determination of homozygosity blocks (80 consecutive variants) using HomozygosityMapper (http://www.homozygositymapper.org/)
[Figure: Distribution of ROH classes]
[Figure: Correlation between cumulative ROHs and parental relationship (first cousins, double first cousins, siblings)]
- Background ancestral homozygosity?
- Uniparental disomy?
- Deletions?
Synonymous variants
Synonymous variants can affect:
• splice sites
• exonic splicing enhancer/silencer
• mRNA structure
Human diseases associated with synonymous variants
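As a reminder of what "synonymous" means at the codon level, the following Python sketch classifies a single-codon substitution using the standard genetic code. The function name and the returned labels are illustrative assumptions. The check looks only at the encoded amino acid, so a change reported as synonymous here may still affect splicing or mRNA structure, as noted on this slide.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_substitution(ref_codon, alt_codon):
    """Classify a codon change by its effect on the encoded amino acid."""
    ref_aa = CODON_TABLE[ref_codon.upper()]
    alt_aa = CODON_TABLE[alt_codon.upper()]
    if ref_aa == alt_aa:
        return "synonymous"
    if alt_aa == "*":
        return "nonsense"
    if ref_aa == "*":
        return "stop-loss"
    return "non-synonymous"
```

For example, `classify_substitution("GGT", "GGC")` compares two glycine codons, while `classify_substitution("TGG", "TGA")` turns tryptophan into a stop codon.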
Synonymous variants - Results
[Figure: Synonymous variants distribution (synonymous variants vs. other variants)]
[Figure: SNPs distribution, dbSNP142 (SNP vs. no SNP)]
[Figure: Genotype distribution (homozygous, heterozygous, compound heterozygous, X-linked)]
SYNONYMOUS VARIANTS PRIORITIZATION
• CLINICAL ASSOCIATION
• DELETERIOUSNESS
Human Gene Mutation Database (HGMD): a comprehensive collection of germline mutations that underlie, or are associated with, human inherited diseases
http://www.hgmd.cf.ac.uk/ac/index.php
CLINICAL ASSOCIATION - Results
Synonymous variants - Results
• DM: disease-causing mutations reported in literature
• DP: disease-associated polymorphisms, reported to be in significant association with disease
• DFP: disease-associated polymorphisms with supporting functional evidence
• FP: in vitro or in vivo functional polymorphisms
• 0.2% variants
• 41 total variants
Silent Variant Analyzer is a tool for the automated harmfulness prediction of synonymous variants within the human genome.
http://compbio.cs.toronto.edu/silva/
• Conservation
• Codon usage bias
• Sequence features
• Exon splice enhancer/suppressor motifs
• Splice site motifs
• Pre-mRNA folding free energy
DELETERIOUSNESS - Results
Synonymous variants - Results
miRNA Variants
MicroRNAs are small non-coding RNA molecules (about 22 nucleotides) found in plants, animals, and some viruses, which function in RNA silencing and post-transcriptional regulation of gene expression.
[Figure: miRNA biogenesis]
[Figure: microRNAs in captured target WES regions (Agilent, Roche, Illumina)]
miRNA Variants - Results
Annomir: a script in Python to annotate miRNA variants:
• Accession Number
• Name
• Structure
miRBase (http://www.mirbase.org/): Homo sapiens (1881 precursors, 2588 mature)
• ~70 variants
• Median depth: 59X
• ~97% in dbSNP
[Figure: variants in pre-miR vs. mature miR]
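The kind of annotation described on this slide (accession number, name, and whether a variant lies in the precursor or in the mature sequence) can be sketched in Python as an interval lookup. This is not the Annomir script itself, whose code is not shown in the presentation; the record layout, the 1-based inclusive coordinates and the example identifiers are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class MiRNARecord:
    accession: str   # e.g. a miRBase-style accession (hypothetical values below)
    name: str
    chrom: str
    pre_start: int   # precursor start, 1-based inclusive (assumed convention)
    pre_end: int     # precursor end, 1-based inclusive
    mature: list = field(default_factory=list)  # list of (start, end) intervals


def annotate(chrom, pos, records):
    """Return (accession, name, structure) for every precursor overlapping pos."""
    hits = []
    for rec in records:
        if rec.chrom != chrom or not (rec.pre_start <= pos <= rec.pre_end):
            continue
        in_mature = any(start <= pos <= end for start, end in rec.mature)
        structure = "mature miR" if in_mature else "pre-miR"
        hits.append((rec.accession, rec.name, structure))
    return hits


# Hypothetical record, for illustration only.
EXAMPLE = [MiRNARecord("MI_EXAMPLE", "hsa-mir-example", "chr1", 1000, 1080, [(1015, 1036)])]
```

A linear scan is enough for a few thousand precursors; an interval index would only matter for much larger annotation sets.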
Ongoing Analyses
• Splice site/Branching point/Intron (+/- 2bp, +/- 5bp, > 5bp)
• 5’UTR, 3’UTR
• Copy Number Variations
• Mitochondrial DNA
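The +/- 2bp, +/- 5bp and > 5bp classes listed on this slide can be read as distance bins around an exon boundary. The Python sketch below follows that reading, which is an assumption, as are the function name and the 1-based inclusive exon coordinates.

```python
def splice_distance_class(pos, exon_start, exon_end):
    """Bin a position by its distance from the nearest boundary of one exon.

    Coordinates are 1-based and inclusive (assumed convention). Positions
    inside the exon are reported as "exonic".
    """
    if exon_start <= pos <= exon_end:
        return "exonic"
    # Distance in bases from the flanking exon boundary.
    distance = exon_start - pos if pos < exon_start else pos - exon_end
    if distance <= 2:
        return "+/- 2bp"
    if distance <= 5:
        return "+/- 5bp"
    return "> 5bp"
```

With an exon spanning 100 to 200, position 98 falls in the +/- 2bp class and position 90 in the > 5bp class.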
Conclusions
• Analysis of the genomic profile from WES data identified long regions of homozygosity that could suggest parental consanguinity, background ancestral homozygosity, deletions, or uniparental disomy.
• The study of silent and miRNA variants showed that WES data can provide reliable calls for these types of variation, which in some cases may be clinically associated.
• This analysis confirmed that NGS approaches are powerful strategies to characterize human DNA variation.
Integrated workflow
DNA VARIANTS
• NON-SYNONYMOUS, NONSENSE, FRAME-SHIFT, SPLICE SITE: FUNCTIONAL ANNOTATION; ABSENT IN UNAFFECTED POPULATION CONTROLS; SEGREGATION ANALYSIS (pedigree information, linkage data)
• SILENT AND NON-CODING VARIANTS: FUNCTIONAL ANNOTATION; ABSENT IN UNAFFECTED POPULATION CONTROLS; SEGREGATION ANALYSIS (pedigree information, linkage data)
• GENOMIC ANALYSES: GENOTYPE INFORMATION (ROHs); VARIANT DEPTH INFORMATION (CNVs)
PRIORITIZATION