© 2010 Illumina, Inc. All rights reserved. Illumina, illuminaDx, Solexa, Making Sense Out of Life, Oligator, Sentrix, GoldenGate, GoldenGate Indexing, DASL, BeadArray, Array of Arrays, Infinium, BeadXpress, VeraCode, IntelliHyb, iSelect, CSPro, and GenomeStudio are registered trademarks or trademarks of Illumina, Inc. All other brands and names contained herein are the property of their respective owners.
High-Throughput Technologies for Human Genetics
Carsten Rosenow, PhD, Associate Director, Global Market Development
HKU, May 22nd, 2012
2
Our Vision: Innovating for the Future of Genetic Analysis
To be the leading provider of integrated solutions that advance the understanding of genetics and health
From Genome-Wide Discovery… to Targeted Validation and Beyond…
3
Sequencing and Arrays Leading the Next Wave of Discoveries
Next-Gen Sequencing
Custom Genotyping Arrays and Sequencing
Targeted Resequencing
Next-Gen GWAS Arrays
Array Design
4
Continuum of Human Genetics Discovery: From Bench to Bedside
Variant Identification: Cataloging the complete picture of variation in humans
Whole-Genome Applications: Agnostic scan of the entire genome to narrow down regions of interest (3 billion bp → a few Mb)
Targeted Applications: Identification of causal loci in regions of association or candidate genes (a few Mb → a few specific variants)
Prioritization and Functional Understanding: Combining orthogonal data (expression, protein, drug response, etc.)
Translational Genomics: Moving research findings from the research-use-only (RUO) space into the clinic; assessing utility, patient impact and cost
Diagnostics: Routine use of genetic information in the diagnosis, treatment and prevention of disease
5
Whole-Genome Applications
6
Published Genome-Wide Associations through 3/2012: ~700 published GWA at p < 5 × 10⁻⁸
NHGRI GWA Catalog www.genome.gov/GWAStudies
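The p < 5 × 10⁻⁸ cutoff used in the catalog is the conventional genome-wide significance threshold. It is commonly explained as a Bonferroni correction of a 0.05 family-wise error rate for roughly one million independent common-variant tests; the one-million figure is an approximation assumed here, not a number stated on this slide. A minimal sketch of that arithmetic:

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test p-value cutoff that keeps the family-wise error rate at or below alpha."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


# Assumption: ~1,000,000 effectively independent common-variant tests genome-wide.
threshold = bonferroni_threshold(0.05, 1_000_000)
print(f"genome-wide threshold: {threshold:.0e}")  # prints: genome-wide threshold: 5e-08
```

Denser marker sets or rarer variants increase the effective number of tests, which is why this threshold is a convention rather than a fixed law.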
7
Complex Diseases: The GWAS Era
Figure: Total GWAS publications by calendar quarter, 2005 to 6/2011 (951 in total)
8
9
Understanding Heritability
Understanding Biology
Diagram: sample size, rarer variants, copy number, epistasis, environment, heritability measures, novel variants, epigenetics, and further unknowns (?)
10
Heritability Remains Nebulous
Despite thousands of discoveries by GWAS, for any given disease, only a portion of the heritability has been explained.
Figure: Explained vs. missing heritability (0 to 100%) for rare diseases (Huntington's disease, cystic fibrosis) and common diseases/traits (AMD, Crohn's disease, lupus, T2D, HDL cholesterol, height, early MI, fasting glucose)
Adapted from Manolio et al. 2009; Nature, 2008
11
Understanding Heritability
Understanding Biology?
Diagram: sample size, rarer variants, copy number, epistasis, environment, heritability measures, novel variants, epigenetics, and further unknowns (?)
12
Advances in Understanding Blood Lipids
Primary GWAS: 20,000 samples; ~30 loci, 50% implicated in plasma lipids
Meta-analysis: 100,000 samples*; 95 loci, 59 novel, with links to plasma lipids
*Derived from 46 GWAS of 137 to 22,041 individuals each
Teslovich et al. (2010) Nature
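The jump from ~30 loci to 95 loci comes from pooling evidence across 46 studies. The slide does not say which meta-analysis method Teslovich et al. used; the sketch below shows the common inverse-variance weighted, fixed-effect approach, in which a per-SNP effect observed consistently across studies gains precision as studies are added. The class and function names and all study values are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class StudyResult:
    beta: float  # per-allele effect estimate from one study (hypothetical values)
    se: float    # standard error of that estimate


def fixed_effect_meta(results: list[StudyResult]) -> tuple[float, float, float]:
    """Inverse-variance weighted fixed-effect meta-analysis.

    Returns (pooled beta, pooled standard error, two-sided p-value).
    """
    weights = [1.0 / (r.se ** 2) for r in results]
    pooled_beta = sum(w * r.beta for w, r in zip(weights, results)) / sum(weights)
    pooled_se = math.sqrt(1.0 / sum(weights))
    z = pooled_beta / pooled_se
    # Two-sided p-value under a standard normal: p = erfc(|z| / sqrt(2)).
    p = math.erfc(abs(z) / math.sqrt(2))
    return pooled_beta, pooled_se, p


# Hypothetical: three studies with the same modest effect, each only nominally significant.
studies = [StudyResult(beta=0.05, se=0.02) for _ in range(3)]
single_p = fixed_effect_meta(studies[:1])[2]
pooled_beta, pooled_se, pooled_p = fixed_effect_meta(studies)
print(f"single study p = {single_p:.2g}, pooled p = {pooled_p:.2g}")
```

Pooling shrinks the standard error by roughly the square root of the number of comparable studies, which is how loci invisible in any single GWAS can cross the genome-wide threshold.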
13
Functional Validation Shows Biological Relevance: All 95 Loci Identified in This Study Will Be Subject to Further Investigation
“We expect that future investigations of the new loci (for example, resequencing efforts to identify low-frequency and rare variants, or functional experiments in cells and animal models) will uncover additional important new genes.”
Follow-up approaches: targeted resequencing, eQTL analysis, functional validation
14
From Noncoding Variant to Phenotype via SORT1 (Musunuru et al. (2010) Nature)
Identified a common, noncoding causal variant that creates a transcription factor binding site and alters expression of the SORT1 gene.
Figure panels: tag SNP (P = 1 × 10⁻¹⁷⁰) at the 1p13 locus spanning CELSR2, PSRC1, MYBPHL and SORT1; eQTL analysis of SORT1 expression in liver; mobility shift assay showing that the SNP creates an enhancer binding site
“As noncoding DNA variants may alter gene expression, we previously used expression quantitative trait locus (eQTL) analyses to explore whether 1p13 SNPs are cis-acting regulators of nearby genes in human liver.”
15
The Sort1 Gene Alters Cholesterol Levels in Mice (Musunuru et al. (2010) Nature)
Sort1 viral overexpression → 46% decrease in cholesterol
Sort1 RNAi knockdown → 70% increase in cholesterol
16
Evolving Genomic Tools to Explore the Entire Allele Spectrum
Figure: effect size (small to large) vs. allele frequency (low to high)
Very rare variants: large effect size
Rare/intermediate variants
Common variants: small effect size
Also shown, in parentheses: common variants with large effect size; rare variants with minimal effect size
Tools across the spectrum: linkage, NGS, GWAS arrays
17
Illumina’s Whole-Genome Portfolio: Explore the Allelic Spectrum with the Omni Family of Microarrays and WGS
OmniExpress: Highest-throughput common-variant array with industry-proven quality at an exceptional price. 12 samples per array; ~730K markers + up to 200K semi-custom; target MAF >5%; time to run 1,000 samples*: 1 week
Omni2.5: Mid-level microarray with coverage of both common and rare SNP content. 8 samples per array; ~2.3M markers + up to 200K semi-custom; target MAF >2.5%; time to run 1,000 samples*: 1 week
Omni5: Flagship microarray with industry-leading coverage of common and rare variation. 4 samples per array; ~4.3M markers + up to 500K semi-custom; target MAF >1%; time to run 1,000 samples*: 3 weeks
WGS, 5x: Low-depth NGS dataset that leverages informatics to fill in missing data. ~18 samples per flow cell; ~20M markers with imputation; target MAF varies; time to run 1,000 samples*: 9 months
WGS, 30x: Ultimate whole-genome dataset. ~3 samples per flow cell; ~2.85 billion markers; target MAF varies; time to run 1,000 samples*: 4.5 years
*Assumes 1 scanner/sequencer and standard automation
18
The Human OmniZhongHua-8: The First Chinese Population-Specific Whole-Genome Array from Illumina
The OmniZhongHua delivers coverage of both HapMap and 1000 Genomes variants, and exceptional throughput at an enabling price.
! Intelligently selected tag SNPs provide industry-best coverage of Chinese populations
– Higher coverage of Chinese populations compared to OmniExpress and a competing Chinese whole-genome array
! Proven Infinium assay
– Industry-leading call rates and reproducibility
! 8x1 sample format provides exceptional throughput for large-scale GWAS
– Up to 960 samples/week with our standard configuration (1 iScan, 1 AutoLoader, 1 Tecan, 5-day week)
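Using the throughput figures stated for this array (8 samples per BeadChip, up to 960 samples per week on the standard configuration), the time to genotype a study can be estimated with simple arithmetic. This sketch assumes throughput scales linearly and ignores sample QC failures and reruns; the function names are illustrative only.

```python
import math


def beadchips_needed(n_samples: int, samples_per_chip: int = 8) -> int:
    """Number of BeadChips required; a partially filled chip still counts as one."""
    return math.ceil(n_samples / samples_per_chip)


def weeks_to_process(n_samples: int, samples_per_chip: int = 8, samples_per_week: int = 960) -> float:
    """Weeks of instrument time, assuming a constant weekly throughput (an assumption)."""
    chips_per_week = samples_per_week / samples_per_chip
    return beadchips_needed(n_samples, samples_per_chip) / chips_per_week


for n in (1_000, 10_000):
    print(n, beadchips_needed(n), round(weeks_to_process(n), 2))
```

Rounding up to whole BeadChips matters for small studies, where an unfilled chip still consumes a full scan.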
19
The HumanExome BeadChip: Leveraging Microarrays for Validation to Boost Power
! The HumanExome BeadChip
– Collaborative effort among key opinion leaders in genetics
– A tool for assaying functional exonic SNPs in large sample sets to drive statistical power
Over 1 Million Samples Sold
20
Exome SNPs Are Distributed Across Gene Regions
Access ~260K exonic markers from diverse populations
Feature type: number of markers
Total markers: ≈250K
SNPs within RefSeq: ≈248K
SNPs in coding regions: 246K
SNPs within 10 kb of RefSeq: 260K
Non-synonymous: 232K
Promoter: 7K
SNPs in splice sites: 67K
Other regions: ancestry; IBD; GWAS; ADME/MHC; X/Y/Mito; indels
Figure: gene model showing 10 kb upstream, promoter, 5’ UTR, exons, introns, splice sites, 3’ UTR and 10 kb downstream
21
Illumina’s Exome Family: From Variant Discovery to Variant Screening and Back Again
Infinium Exome Content (targeted and whole-genome arrays) and TruSeq Exome (variant discovery and targeted sequencing)
Exome BeadChip (targeted): The most comprehensive set of functional exonic variants for quick, economical screening. ~250K functionally relevant variants
OmniExpress Exome (whole genome): The most economical common-variant GWAS with enhanced coverage of the exome. ~730K common tagSNPs + ~250K exonic variants
Omni2.5 Exome (whole genome): Whole-genome coverage down to 2.5% MAF with enhanced coverage of the exome. ~2.3M tagSNPs down to 2.5% MAF + ~250K exonic variants
Omni5 Exome (whole genome): The ultimate whole-genome array, with coverage down to 1% MAF and the most exonic coverage of any microarray. ~5.3M tagSNPs down to 1% MAF + ~250K exonic variants
TruSeq Exome Seq (variant discovery/targeted): Industry-leading next-generation exome sequencing protocol for variant discovery and targeted screening. Captures and sequences >94% of RefSeq coding exons
22
Custom Genotyping Family of Options: Driving the Cycle of Discovery and Validation, Powering Statistical Significance
Products are listed in order of increasing plexity and increasing price.
VeraCode: The power of the GoldenGate assay on a platform and at a price accessible to all labs. 96 samples per array; 1 to 384 markers; list price $
GoldenGate: Pre-optimized design of custom content using the most trusted low-plex assay with the highest-quality data in the industry. 32 samples per array; 96 to 3,072 markers; list price $
iSelectHD 3K to 90K: Validate genomic discoveries with the most robust data and flexible content design. 24 samples per array; 3,072 to 90K markers; list price $ to $$
iSelectHD 90K to 250K: Enable more genomic discoveries with the most robust data and flexible content design. 12 samples per array; 90K to 250K markers; list price $$
iSelectHD 250K to 1M: Enable more genomic discoveries with the most robust data and flexible content design. 4 samples per array; 250K to 1M markers; list price $$
Semi-Custom: Combine custom content with Illumina’s GWAS or focused genotyping arrays to maximize your association study. Samples per array, markers and list price vary.
23
Illumina’s DNA Technology Portfolio: Complementary Technologies Working Together for Discovery
Strengths: throughput, price, ease of use, discovery technology, capture of more variation
24
RNA-Seq: Analysis of Severe Acute Respiratory Syndrome (SARS)-Infected Mouse RNA Samples
The SARS Virus: SARS-CoV [coronavirus; RNA virus; 29,727 bases]
25
26
27
RNA-Seq: Using Counts Instead of Intensity
28
Even Coverage Across a Transcript (100 kb mRNA)
Figure: read coverage for standard poly(A)-selected mRNA-seq, total RNA, and poly(A) + DSN (duplex-specific nuclease) normalization
29
Experimental strategy
! Infect mouse with SARS Coronavirus (positive-strand, enveloped RNA virus)
! Extract RNA from lung tissue – Mixture of host and pathogen RNA
! Input 1 µg total RNA into the standard Illumina mRNA-seq assay, including poly(A) selection – The capabilities shown here are compatible with all Illumina RNA-seq assays, including Total RNA-seq – Input is now as low as 100 ng
! Align reads against Mouse Genome
! Align reads against SARS Genome
30
Global Mouse Gene Expression Changes
Figure: gene expression in SARS-infected vs. uninfected mouse lung tissue across 19,013 RefSeq genes
31
Largest Fold-Change Increases and Decreases in Mouse after SARS Exposure
Increased expression
Gene, SARS-MF1, Control-MF2, Fold change
Ccl2, 2588, 5, 517.6
Serpina3h, 3590, 7, 512.8571429
Rsad2, 11663, 34, 343.0294118
Cxcl10, 6613, 20, 330.65
EG667977, 1762, 7, 251.7142857
Cxcl9, 1849, 10, 184.9
Slfn4, 5302, 29, 182.8275862
Isg15, 7780, 54, 144.0740741
Oasl1, 3127, 26, 120.2692308
Oas3, 3555, 30, 118.5
Ifi202b, 2422, 22, 110.0909091
Irf7, 22871, 212, 107.8820755
Slfn8, 1604, 15, 106.9333333
Zbp1, 4405, 42, 104.8809524
Mx2, 2758, 27, 102.1481481
Mx1, 4146, 45, 92.13333333
Oas2, 2485, 29, 85.68965517
H2-Q2, 1455, 18, 80.83333333
Ifi44, 5551, 77, 72.09090909
Phf11, 1067, 16, 66.6875
H2-Q10, 1360, 22, 61.81818182
Ifit1, 10147, 179, 56.68715084
Apod, 7419, 137, 54.15328467
Oas1a, 1567, 30, 52.23333333
Mnda, 412, 8, 51.5
Decreased expression
Gene, SARS-MF1, Control-MF2, Ratio
Car3, 16, 1878, 0.00852
Igfbp2, 15, 1268, 0.01183
Psca, 5, 366, 0.013661
Cfd, 29, 1818, 0.015952
Cntn1, 8, 471, 0.016985
Scn3a, 8, 359, 0.022284
Gdpd3, 32, 1370, 0.023358
Ptma, 12, 409, 0.02934
H2-Eb1, 109, 3204, 0.03402
Rps18, 8, 228, 0.035088
Lrrc17, 5, 141, 0.035461
Cyp4f16, 8, 218, 0.036697
Aox3, 70, 1764, 0.039683
Cidec, 25, 577, 0.043328
Cidea, 7, 161, 0.043478
Cyp1a1, 89, 2046, 0.0435
Tnn, 19, 426, 0.044601
Glp1r, 15, 325, 0.046154
A930038C07Rik, 55, 1179, 0.04665
D430041D05Rik, 8, 171, 0.046784
Thrsp, 39, 819, 0.047619
Adrb3, 15, 306, 0.04902
Mamdc2, 98, 1988, 0.049296
Ociad2, 8, 162, 0.049383
Pcolce2, 178, 3604, 0.04939
Chemokine (C-C motif) ligand 2 (CCL2): recruits memory T cells to sites of infection
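The fold-change and ratio columns in the tables above are consistent with a simple ratio of raw read counts, SARS-MF1 divided by Control-MF2 (for example, Ccl2: 2588 / 5 = 517.6; Car3: 16 / 1878 ≈ 0.00852). A minimal sketch reproducing those values follows. It does not normalize for library size, since total read counts per sample are not given here, so it should not be read as the pipeline used to produce the slide; the function name is illustrative.

```python
def count_ratio(sars_count: int, control_count: int) -> float:
    """Raw count ratio SARS / control; undefined when the control count is zero."""
    if control_count == 0:
        raise ZeroDivisionError("control count is 0; the ratio is undefined")
    return sars_count / control_count


# (SARS-MF1, Control-MF2) counts copied from the tables above.
counts = {"Ccl2": (2588, 5), "Rsad2": (11663, 34), "Car3": (16, 1878)}
ratios = {gene: count_ratio(sars, ctrl) for gene, (sars, ctrl) in counts.items()}
for gene, ratio in ratios.items():
    print(f"{gene}: {ratio:.5g}")
```

Comparing raw counts is only fair when both libraries were sequenced to similar depth; otherwise each count would first be scaled by its library's total reads.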
32
Transcripts That Are Not Detected in One Sample or the Other
Increased expression (not detected in the control)
Gene, SARS-MF1, Control-MF2
H2-Ea, 1036, 0
Cyp2b13, 168, 0
Olfr56, 145, 0
Cngb3, 100, 0
Csf3, 96, 0
Saa2, 70, 0
Ccl20, 53, 0
Klri2, 46, 0
Rhcg, 42, 0
Ubd, 40, 0
Trim12, 33, 0
LOC547349, 32, 0
4930503B20Rik, 24, 0
BC049730, 24, 0
Il27, 22, 0
Trim69, 21, 0
1700009N14Rik, 20, 0
Klri1, 20, 0
Decreased expression (not detected after SARS infection)
Gene, SARS-MF1, Control-MF2
Ucp1, 0, 276
Ear2, 0, 181
Tmem45b, 0, 102
Klk1b22, 0, 85
Fabp1, 0, 82
Glrb, 0, 50
Ces5, 0, 49
Slc15a1, 0, 44
Tmem16e, 0, 38
Chrm1, 0, 37
Hes2, 0, 37
Hepacam2, 0, 35
Kcna1, 0, 35
Rbp7, 0, 34
Clca3, 0, 33
Prss35, 0, 32
St6galnac1, 0, 27
Tpsb2, 0, 25
H2-Ea: major histocompatibility complex class II gene with high homology in humans; a candidate susceptibility gene for pulmonary fibrosis
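For transcripts detected in only one sample, a plain count ratio is either zero or undefined (division by zero). A common workaround, shown here as a sketch rather than as the method behind this slide, is a log2 fold change with a small pseudocount added to both counts; the pseudocount of 1 is an assumed, conventional choice, and the function name is illustrative.

```python
import math


def log2_fold_change(sars_count: int, control_count: int, pseudocount: float = 1.0) -> float:
    """log2((SARS + pc) / (control + pc)); the pseudocount keeps zero counts finite."""
    if pseudocount <= 0:
        raise ValueError("pseudocount must be positive")
    return math.log2((sars_count + pseudocount) / (control_count + pseudocount))


# (SARS-MF1, Control-MF2) counts copied from the tables above.
for gene, (sars, ctrl) in {"H2-Ea": (1036, 0), "Ucp1": (0, 276)}.items():
    print(f"{gene}: log2 fold change = {log2_fold_change(sars, ctrl):.2f}")
```

The pseudocount also damps extreme values for genes with very few reads, at the cost of slightly shrinking every fold change toward zero.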
33
GenomeStudio View of IL8RB Expression Changes
Figure: 547 counts in the SARS-infected sample (+ SARS) vs. 30 counts in the control
34
GenomeStudio View of RSAD2 Expression Changes
Figure: 11,663 counts in the SARS-infected sample (+ SARS) vs. 34 counts in the control
35
Coverage of Reads on the SARS Genome in MF1 and MF2
MF1 (+ SARS): ~250K reads; MF2: 9 reads
Figure panels: (a) total length of the SARS sequence; (b) first half of the sequence (MF1)
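The read counts above can be turned into an approximate mean depth over the 29,727-base SARS genome with the standard coverage relation C = N × L / G (number of reads × read length / genome length). The slide does not state the read length, so the 36 bp used below is a placeholder assumption for illustration only; the constant and function names are hypothetical.

```python
SARS_GENOME_LENGTH = 29_727  # bases, as stated on the SARS-CoV slide


def mean_coverage(n_reads: int, read_length: int, genome_length: int = SARS_GENOME_LENGTH) -> float:
    """Average depth C = N * L / G: total sequenced bases divided by genome length."""
    return n_reads * read_length / genome_length


ASSUMED_READ_LENGTH = 36  # hypothetical: the actual read length is not given on the slide
for sample, n_reads in {"MF1": 250_000, "MF2": 9}.items():
    print(f"{sample}: ~{mean_coverage(n_reads, ASSUMED_READ_LENGTH):.1f}x")
```

Under this assumption the infected sample would cover the viral genome hundreds of times over, while the control's 9 reads cover well under 1% of it on average.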
36
SNPs Called by CASAVA and Displayed in GenomeStudio for the SARS Genome
12 SNPs
37
An Instrument for Every Need. Every Budget. Every Lab.
HiScanSQ: Two proven technologies. One powerful platform.
GAIIx: The most widely cited platform, now at half the price.
MiSeq: My Samples. My Study. MiSeq.
HiSeq 1000/1500: Powerful. Flexible. Scalable.
HiSeq 2000/2500: Redefining the trajectory of sequencing.
38
A Solution for Every Application: Illumina Sequencing Platform
Optimal platform by application (MiSeq, HiSeq)
Targeted resequencing
Amplicons (Nextera, tailed PCR): MiSeq √√√
TruSeq Custom Amplicon (up to 100 kb / 50 genes): MiSeq √√√
TruSeq Custom Enrichment (up to 2 Mb / 500 genes): MiSeq √√√, HiSeq √√√
TruSeq Custom Enrichment (up to 20 Mb / 5K genes): HiSeq √√√
TruSeq Exome (62 Mb; >20K genes): HiSeq √√√
RNA-Seq
Small RNA: MiSeq √√√, HiSeq √√
Microbial RNA-Seq: MiSeq √√√, HiSeq √√
Human RNA-Seq: MiSeq √, HiSeq √√√
Whole-genome resequencing and de novo sequencing
Large complex genomes (e.g., human): HiSeq √√√
Microbial genomes: MiSeq √√, HiSeq √√√
Regulation
ChIP-Seq: MiSeq √, HiSeq √√√
WG methylation: HiSeq √√√
39
TruSeq Targeted Resequencing: The Simplest and Most Scalable Targeted Resequencing Solutions
TruSeq Exome Enrichment: targets = 100,000s
TruSeq Custom Enrichment: targets = 1,000s
TruSeq Custom Amplicon: targets = 100s
Nextera PCR Amplicons: targets = 10s
40
TruSeq Custom Amplicon: The Fastest and Easiest Multiplexed Amplicon Assay Optimized for MiSeq
! Rapid & economical
– Up to 384 amplicons per sample
– Up to 96 samples per plate
– Plate-based processing
– <8 hrs from gDNA to sequencing-ready library
– Utilizes standard lab equipment
– No quantification needed before sequencing
! Fully customized target probes & capture
– DesignStudio for interactive design and ordering
– Personalized and easy to use
– Proven extension- and ligation-based assay
– Rapid design turnaround
! Pre-configured, automated data analysis
41
BaseSpace The Best Place to Store Your NGS Data
Stores reads and quality scores; sample and experiment descriptions; analysis results (variants, contigs, metagenomes, coverage statistics, miRNA counts, and more)
! Eliminates need for onsite storage and compute
! Results available anywhere, anytime
! Browse the results via web-based graphical environment
! Access to a growing suite of analysis tools
! Tools for collaboration and sharing
42
Targeted Cancer Sequencing: Lipson et al., Nature Medicine 2012 (e-pub ahead of print)
! Targeted cancer panel
! Deep sequencing
! FFPE samples
! Fusion genes detected by exome sequencing
! Informative graphics
43
Experimental Design Overview
! Panel of 145 cancer-relevant genes with 2,574 coding exons
! 40 colorectal cancer (CRC) and 24 non-small cell lung cancer (NSCLC) specimens
! DNA isolated from 40 microns of formalin-fixed, paraffin-embedded (FFPE) tumor
! For all specimens, malignant tumor cells made up ≥25% of the nuclear area, so no micro- or macrodissection for tissue enrichment was performed
! Sequencing was performed on the HiSeq 2000 instrument (Illumina) with 36 bp paired reads to an average depth of 229X
! Base substitutions: >99% sensitivity at >10% mutant allele frequency
! Indels: >95% sensitivity at >20% mutant allele frequency
! False discovery rate <1%
! At least one clinically relevant genomic alteration was found in 59% of the samples
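Whether a variant at a given mutant allele frequency is detectable at ~229X depth can be approximated with a binomial model in which each read independently carries the mutant allele with probability equal to the allele fraction. The minimum number of supporting reads (10 below) is a hypothetical calling threshold, and the model ignores sequencing error, tumor purity and local depth variation, so it illustrates the reasoning rather than reproducing the study's reported sensitivities.

```python
from math import comb


def detection_probability(depth: int, allele_fraction: float, min_alt_reads: int) -> float:
    """P(at least `min_alt_reads` of `depth` reads carry the variant) under Binomial(depth, allele_fraction)."""
    if not 0.0 <= allele_fraction <= 1.0:
        raise ValueError("allele_fraction must be between 0 and 1")
    # P(X >= k) = 1 - P(X < k); sum the lower tail explicitly.
    lower_tail = sum(
        comb(depth, i) * allele_fraction**i * (1.0 - allele_fraction) ** (depth - i)
        for i in range(min(min_alt_reads, depth + 1))
    )
    return max(0.0, 1.0 - lower_tail)


MIN_ALT_READS = 10  # hypothetical caller threshold, not taken from the study
for af in (0.05, 0.10, 0.20):
    print(f"AF {af:.0%}: P(detect at 229X) = {detection_probability(229, af, MIN_ALT_READS):.4f}")
```

The same function shows why deep coverage matters for FFPE tumor samples: lowering the depth or the allele fraction quickly erodes the chance of seeing enough mutant reads.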
44
40 CRC FFPE specimens
45
24 NSCLC FFPE specimens
46
Validation of the RET fusion genes
! Further screening of 561 lung adenocarcinomas identified 11 additional tumors with KIF5B-RET gene fusions
mRNA-Seq
47
TruSeq® Amplicon – Cancer Panel Hundreds of loci. Rapid prep. FFPE-ready.
! Comprehensive content
– >35 kb total, including oncogenes such as BRAF, KRAS and EGFR
– 212 amplicons in one tube; 48 genes
! Unrivaled multiplexing
– Pool up to 96 samples on MiSeq
– >90% specificity and uniformity
– Detects low-frequency variants (<5%)
! Unparalleled workflow
– FFPE-enabled with sample QC kit
– No qPCR quantification needed for normalization
– Automated paired-end sequencing with MiSeq
– Pre-configured, automated data analysis
For research use only
Genes (48): ABL1, AKT1, ALK, APC, ATM, BRAF, CDH1, CDKN2A, CSF1R, CTNNB1, EGFR, ERBB2, ERBB4, FBXW7, FGFR1, FGFR2, FGFR3, FLT3, GNA11, GNAQ, GNAS, HNF1A, HRAS, IDH1, JAK2, JAK3, KDR, KIT, KRAS, MET, MLH1, MPL, NOTCH1, NPM1, NRAS, PDGFRA, PIK3CA, PTEN, PTPN11, RB1, RET, SMAD4, SMARCB1, SMO, SRC, STK11, TP53, VHL
48
Where Is the Missing Heritability?
What Is the Biology?
Diagram: sample size, rarer variants, copy number, epistasis, environment, heritability measures, novel variants, epigenetics, and further unknowns (?)
49
Interest in Epigenetics Is On The Rise
50
The Infinium HumanMethylation450 includes every content category requested by an expert consortium
Feature: included on array
Total number of sites: 485,553
RefSeq genes: 99%
CpG islands: 96%
CpG island shores: 92%
CpG island shelves: 86%
HMM islands: >63K
FANTOM 4 promoters: >12K
Informatically predicted enhancers: >102K
DNase hypersensitive sites: >60K
MHC sites: >12K
Non-CpG loci: >3K
51
Methylation Sites Are Distributed Across Gene Regions
Feature type: genes mapped; percent of genes covered; average number of loci on array
NM_TSS200: 14,895; 79%; 2.56
NM_TSS1500: 17,820; 94%; 3.41
NM_5'UTR: 13,865; 78%; 3.34
NM_1stExon: 15,127; 80%; 1.62
NM_3'UTR: 13,042; 72%; 1.02
NM_GeneBody: 17,071; 97%; 8.97
Figure: gene regions (TSS1500, TSS200, 5’ UTR, 1st exon, gene body, 3’ UTR)
Overall global average of 17 sites per RefSeq gene region
52
CpG Islands and Shores Are Covered with a Similar Strategy
Feature type: islands mapped; percent of islands covered; average number of loci on array
Island: 26,153; 94%; 5.08
N_Shore: 25,770; 93%; 2.74
S_Shore: 25,614; 92%; 2.66
N_Shelf: 23,896; 86%; 1.97
S_Shelf: 23,968; 86%; 1.94
Figure: CpG island flanked by north (N) and south (S) shores and shelves
53
A Recent Study Validates the Importance of HM450 Gene-Region Coverage and Its Suitability for Epigenome-Wide Association Studies (EWAS)
Compared methylation profiles of the HCT-116 colorectal cancer cell line with normal colon mucosa
Identified distributions of hypermethylated (left) vs. hypomethylated (right) loci across region categories (tumor vs. normal)
Demonstrates the ability of HumanMethylation450 to detect differential methylation across gene and CpG island regions
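On Infinium methylation arrays, each CpG site is usually summarized as a beta value, the fraction of signal coming from the methylated probe: β = M / (M + U + offset), where the offset of 100 follows Illumina's conventional definition. Calling a locus hyper- or hypomethylated in tumor vs. normal then amounts to thresholding the difference in beta values. The 0.2 cutoff and the probe intensities below are hypothetical choices for illustration, not those used in the HCT-116 comparison.

```python
def beta_value(methylated: float, unmethylated: float, offset: float = 100.0) -> float:
    """Methylation level M / (M + U + offset); negative intensities are floored at 0."""
    m = max(methylated, 0.0)
    u = max(unmethylated, 0.0)
    return m / (m + u + offset)


def classify_locus(tumor_beta: float, normal_beta: float, threshold: float = 0.2) -> str:
    """Label a locus by delta-beta (tumor minus normal); the threshold is an assumption."""
    delta = tumor_beta - normal_beta
    if delta >= threshold:
        return "hypermethylated"
    if delta <= -threshold:
        return "hypomethylated"
    return "unchanged"


# Hypothetical probe intensities (M, U) for one CpG site in tumor and normal samples.
tumor = beta_value(9_000, 1_000)
normal = beta_value(1_500, 8_500)
print(round(tumor, 3), round(normal, 3), classify_locus(tumor, normal))
```

The offset keeps beta values stable when both probe signals are weak, which matters for low-quality DNA.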
54
High Throughput for Large-Sample-Size Study Designs
! 12-samples-per-array format
! Scan time: 1 hr per BeadChip
! Manual or automated workflow
! Process up to 96 samples in parallel
! 4 days from DNA to data
! LIMS (Laboratory Information Management System) now available
– Enables positive sample tracking
– Quality assurance in large-sample-size studies
55
CNV Detection: SNPs Provide Maximal CNV Information (Applicable to All Infinium Arrays)
Figure: signal intensity and genotype patterns for normal (diploid), deletion (loss of one copy) and duplication (gain of one copy) regions
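The intensity and genotype patterns on this slide are commonly summarized per SNP as a log R ratio (LRR, observed vs. expected total intensity) and a B allele frequency (BAF). In an idealized model, copy number c shifts LRR to log2(c/2) and splits BAF into bands at k/c. The sketch below encodes only that idealized model and a nearest-match call; it is not Illumina's CNV algorithm, and in real data the LRR shifts are often smaller than these theoretical values, so practical callers use more elaborate statistical models. All function names are hypothetical.

```python
import math
from statistics import mean


def expected_lrr(copies: int) -> float:
    """Idealized log R ratio for `copies` copies relative to a diploid (2-copy) reference."""
    return math.log2(copies / 2)


def expected_baf_bands(copies: int) -> list[float]:
    """Idealized B allele frequency bands: k B alleles out of `copies` total, k = 0..copies."""
    return [k / copies for k in range(copies + 1)]


def call_copy_number(lrr_values: list[float], candidates: tuple[int, ...] = (1, 2, 3)) -> int:
    """Return the candidate copy number whose idealized LRR is closest to the segment mean."""
    segment_mean = mean(lrr_values)
    return min(candidates, key=lambda c: abs(expected_lrr(c) - segment_mean))


for label, c in [("deletion", 1), ("normal", 2), ("duplication", 3)]:
    print(label, round(expected_lrr(c), 3), [round(b, 2) for b in expected_baf_bands(c)])
```

The BAF bands are what make SNP probes more informative than intensity alone: a one-copy deletion loses the heterozygous band at 0.5, while a duplication splits it into bands near 1/3 and 2/3.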
56
“Now this is not the end. It is not even the beginning of the end. But it is,
perhaps, the end of the beginning.”
-- Winston Churchill