140127 abrf interlaboratory study proposal
TRANSCRIPT
The ABRF Next Generation Sequencing Study: Multi-Platform and Cross-Methodological Reproducibility of RNA and DNA Profiling
Genome in a Bottle Consortium Workshop
January 2014
Don A. Baldwin, Ph.D.
CSO, Pathonomics LLC
ABRF is an international organization of over 700 scientists from shared research resource core facilities and biotechnology laboratories.
Members represent over 250 core labs in academic and research institutions, government, and industry.
“Yellow pages” and “MarketPlace” databases of members at www.ABRF.org
Electronic discussion group facilitates sharing of technical advice and core facility networking.
The Journal of Biomolecular Techniques covers genomics, proteomics, imaging, and other biotechnologies, and core facility operational management.
www.abrf.org
The ABRF Next Generation Sequencing (NGS) Study:
• Produce reference data sets to establish baseline performance
• Promote the use of standard samples
• Provide public access to data for self-evaluation, performance monitoring and methods development
Phase I: RNA-Seq and degraded RNA-Seq (2011-2013)
Phase II: DNA-Seq and hard-to-sequence regions (2014-2016)
Phase III: Clinical genetics sequencing panels
Phase IV: Asteroid and Martian surface sequencing
Phase I
1. Cross Platform: HiSeq/MiSeq, 454, PGM, Proton, PacBio
2. Cross Protocol: ribo-depletion, stranded, degraded
3. Cross Site: 3 sites for each platform, replicates at each site
SeQC, ABRF, ENCODE, others
• Provide reference data resources
• Best practices for
– gene quantification,
– isoform characterization,
– dynamic range comparisons,
– managing inter-site and intra-site variation,
– analysis pipelines, and
– cross-platform testing of transcriptome hypotheses
• To address some other aspects of RNA-seq, including
– variant detection,
– allele-specific expression,
– RNA editing, and gene fusions.
• And more …
Phase I Study Design
Sequence mismatches with hg19
Q10 – Q60, most variation at read starts and ends
Higher alignment rates with platform-specific algorithms vs. STAR
Higher single-base mismatch and indel rates with platform-specific algorithms vs. STAR
Gene body coverage, 5’ to 3’
Platforms: 454, ILMN, PAC, PGM, PRO
RNA: polyA, rRNA-depleted, total, degraded
Inter-site CV
Inter-site R2
Variation and correlation between laboratory sites
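A minimal sketch of how the two inter-site metrics named above could be computed, assuming CV means sample standard deviation divided by mean for one gene across sites, and R2 means the squared Pearson correlation between two sites' expression values. The gene names and numbers are made up for illustration; they are not study data, and the study's own normalization and filtering are not shown here.

```python
from statistics import mean, stdev


def inter_site_cv(values):
    """Coefficient of variation (sample SD / mean) of one gene across sites."""
    m = mean(values)
    return stdev(values) / m if m else float("nan")


def r_squared(x, y):
    """Squared Pearson correlation between two sites' expression vectors."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return (sxy * sxy) / (sxx * syy)


# Hypothetical normalized expression values: one list per gene, one entry per site.
expression = {
    "gene1": [10.0, 12.0, 11.0],
    "gene2": [200.0, 180.0, 220.0],
    "gene3": [55.0, 50.0, 60.0],
}

cv_by_gene = {gene: inter_site_cv(vals) for gene, vals in expression.items()}

# Compare site 1 against site 2 across all genes.
site1 = [vals[0] for vals in expression.values()]
site2 = [vals[1] for vals in expression.values()]
r2_site1_vs_site2 = r_squared(site1, site2)
```

CV is computed per gene across sites, while R2 is computed per pair of sites across genes, so the two metrics summarize different directions of the same expression matrix.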
Transcript splice junction detection
Long reads provide efficient junction detection
Most junctions are detected by three or more platforms
Five-set Venn diagram: POLYA (11,820), RIBO (11,294), PRO (13,797), PGM (12,572), 454 (7,579), Total (18,002)
DEGs detected by three or more methods: 61.4%
DEGs detected by two methods: 16.6%
Unique DEGs:
454 1.5%
POLYA 6.2%
RIBO(-) 3.9%
PRO 6.7%
PGM 3.8%
Total unique 22.0%
Sets containing more than 1000 genes are indicated in red; 100-999 in yellow.
Detection of Differentially Expressed Genes, sample A vs. B
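The percentages on this slide group DEGs by how many methods detected them. A sketch of that tally, assuming each method's DEG calls are available as a set of gene identifiers. The function name and the toy gene sets are hypothetical; the study's actual DEG-calling pipeline is not reproduced here.

```python
from collections import Counter


def tally_by_method_count(deg_sets):
    """Summarize DEG calls by how many methods detected each gene."""
    # hits[gene] = number of methods whose DEG set contains that gene.
    hits = Counter(gene for genes in deg_sets.values() for gene in genes)
    total = len(hits)
    return {
        "total": total,
        "three_or_more": sum(n >= 3 for n in hits.values()) / total,
        "two": sum(n == 2 for n in hits.values()) / total,
        # Fraction of all DEGs found by exactly one method, split by method.
        "unique": {
            method: sum(hits[g] == 1 for g in genes) / total
            for method, genes in deg_sets.items()
        },
    }


# Toy DEG sets (illustrative only, not study results).
toy_sets = {
    "454": {"g1", "g2"},
    "POLYA": {"g1", "g2", "g3", "g4"},
    "RIBO": {"g1", "g3", "g5"},
    "PRO": {"g1", "g2", "g6"},
    "PGM": {"g1", "g6"},
}
summary = tally_by_method_count(toy_sets)
```

The three categories partition the union of all DEG sets, so "three_or_more", "two", and the sum of the "unique" values add up to 1, in the same way the slide's 61.4%, 16.6%, and 22.0% add up to 100%.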
Transcript abundance measurements using polyA-enrichment or rRNA-depletion library preparation methods
Correlation with RT-qPCR
PolyA vs. ribo-depletion for detection of differential gene expression
Correlations of measured transcript abundances for high-quality vs. degraded total RNA
- rRNA-depleted
- Illumina HiSeq
Figure: correlation coefficients (y-axis, 0.7 to 1.0) by samples compared (x-axis: pairings among A, dA, B, and dB)
Illumina
PGM
SVA: Leek JT, Storey JD. PLoS Genet. 2007
Surrogate Variable Analysis to remove cross-platform and cross-site variation
ABRF NGS Study
FDA SeQC
Funded by: Vendor donations of sample preparation and sequencing reagents
Participating laboratories
ABRF
Manuscript in review:
6 figures, 2 tables
37 supplementary figures, 7 supplementary tables
The ABRF NGS Study, Phase I
26 primary scientists
34 contributing scientists
21 research institutions
4.3 billion reads
447 billion nucleotides
The ABRF NGS Study, Phase II
DNA sequencing topics were brainstormed and prioritized by the study consortium
Samples were chosen based on the August 2013 Genome in a Bottle Workshop
Phase II DNA sequencing aims
Reference data sets
• Intra- and inter-lab replication to model the range of performance expected under normal service laboratory conditions
Reference samples
• Easily accessible for self-evaluation by comparison to the reference data
• Standardized, stably reproduced, suitable for methods development
Immediate utility
• Performance metrics and data applicable to methods used now or in the near future by core sequencing facilities
Phase II projects
in no particular order, with project scope and sequencing coverage to be prioritized by interest and funding:
Performance using different platforms and technical protocols
• NIST GiaB designated human genomic DNA
• Measure sequencing accuracy and coverage
Performance using damaged DNA and chimeric cell populations
• DNA from formalin-fixed, paraffin embedded cell mixtures
• Measure sequencing accuracy, coverage, and limits of detection for somatic mutations
Performance on small genomes over a range of GC content
• NIST GiaB (with FDA) designated bacterial genomic DNA
• Measure sequencing accuracy and coverage
Sample ID  DNA source                                                          Project
A          Ashkenazim Jew, maternal                                            1
B          Ashkenazim paternal                                                 1
C          Ashkenazim child                                                    1
M          pool of mutant Horizon Dx lines #1, #3 plus Acrometrix lines #2, #4:
           1-2 48% each, 3-4 2% each by cell count
M1         50% C, 50% M cells in FFPE (each target’s copy number = 24% or 1%)  2
M2         80% C, 20% M cells in FFPE (targets = 9.6% or 0.4%)                 2
M3         90% C, 10% M cells in FFPE (targets = 4.8% or 0.2%)                 2
M4         95% C, 5% M cells in FFPE (targets = 2.4% or 0.1%)                  2
M5         99% C, 1% M cells in FFPE (targets = 0.48% or 0.02%)                2
M6         99.5% C, 0.5% M cells in FFPE (targets = 0.24% or 0.01%)            2
M7         99.9% C, 0.1% M cells in FFPE (targets = 0.048% or 0.002%)          2
M8         99.99% C, 0.01% M cells in FFPE (targets = 0.0048% or 0.0002%)      2
Sta        Staphylococcus aureus                                               3
Sae        Salmonella enterica                                                 3
Psa        Pseudomonas aeruginosa                                              3
Cls        Clostridium sporogenes                                              3
P          pooled metagenomic sample with all four bacterial genomes           3
Phase II samples
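The target percentages listed for M1-M8 follow from multiplying each mutant line's share of pool M (48% or 2% by cell count) by the fraction of M cells in the mixture. A short sketch of that arithmetic, using only the numbers in the sample table. The dictionary and function names are illustrative, and the calculation works in cell-count fractions as the table does, without modeling zygosity or ploidy.

```python
from fractions import Fraction

# Share of pool M contributed by each mutant line, from the sample table:
# lines 1-2 at 48% each, lines 3-4 at 2% each (by cell count).
POOL_M_SHARE = {"lines_1_2": Fraction(48, 100), "lines_3_4": Fraction(2, 100)}

# Fraction of M cells in each FFPE mixture; the remainder is sample C.
M_CELL_FRACTION = {
    "M1": Fraction(50, 100),
    "M2": Fraction(20, 100),
    "M3": Fraction(10, 100),
    "M4": Fraction(5, 100),
    "M5": Fraction(1, 100),
    "M6": Fraction(5, 1000),
    "M7": Fraction(1, 1000),
    "M8": Fraction(1, 10000),
}


def target_percent(sample):
    """Percent of cells carrying each class of target in a given mixture."""
    m_fraction = M_CELL_FRACTION[sample]
    return {
        line: float(share * m_fraction * 100)
        for line, share in POOL_M_SHARE.items()
    }


dilution_series = {sample: target_percent(sample) for sample in M_CELL_FRACTION}
```

Exact rational arithmetic via `Fraction` avoids floating-point drift across the four orders of magnitude in the series; values are converted to floats only at the end for display.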
Species                                                  Genome (bp)  Avg % GC  Reference strain   Distributor
Staphylococcus aureus                                    2.8x10^6     33        NRS77 (NCTC 8325)  NARSA #NRS77
Salmonella enterica subsp. enterica serovar Typhimurium  4.9x10^6     52        LT2                ATCC #700720
Pseudomonas aeruginosa                                   6.7x10^6     67        PAO1               ATCC #47085
Clostridium sporogenes                                   4.1x10^6     28        Metchnikoff        ATCC #15579
Small genomes project: sizes and GC content
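A rough sketch of what pool P might look like in sequence data, assuming "equimolar" means equal genome copies per species and that coverage is uniform. Neither assumption is confirmed by the slides, and uniform coverage in particular is unlikely to hold across this GC range. Under those assumptions each species' share of sequenced bases is proportional to its genome size, and the pooled GC content is the size-weighted mean. Genome sizes and GC values are taken from the table above; the function name is illustrative.

```python
# (genome size in bp, average % GC) from the small genomes table.
GENOMES = {
    "Sta": (2.8e6, 33),
    "Sae": (4.9e6, 52),
    "Psa": (6.7e6, 67),
    "Cls": (4.1e6, 28),
}


def equimolar_pool_profile(genomes):
    """Expected base share per species and pooled % GC for equal genome copies."""
    total_bp = sum(size for size, _ in genomes.values())
    # With one copy of each genome, a species' share of bases is size / total.
    share = {name: size / total_bp for name, (size, _) in genomes.items()}
    # Pooled GC is the genome-size-weighted mean of the per-species GC values.
    pooled_gc = sum(size * gc for size, gc in genomes.values()) / total_bp
    return share, pooled_gc


base_share, pooled_gc = equimolar_pool_profile(GENOMES)
```

Comparing observed per-species read shares against these expected shares is one simple way a lab could look for GC-related coverage bias in its own pool P data.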
Platform                      Project 1 Samples         Project 2 Samples  Project 3 Samples
Illumina HiSeq 2000           A, B, C                   -                  Sta, Sae, Psa, Cls, P
Illumina HiSeq 2500           C                         M1-M8              -
Illumina 2500 RapidTrack      C                         -                  -
Illumina MiSeq                C for long-read scaffold  -                  Sta, Sae, Psa, Cls, P
Illumina Moleculo             A, B, C                   -                  -
Life Technologies Proton      A, B, C                   M1-M8              Sta, Sae, Psa, Cls, P
Life Technologies PGM         -                         -                  Sta, Sae, Psa, Cls, P
Pacific Biosciences           C for long-read scaffold  -                  Sta, Sae, Psa, Cls, P
New platforms?                ?                         ?                  ?
  (Illm X10, NextSeq 500; Qiagen GeneReader, Oxford MinION…)

Library Protocol
Nextera on HiSeq              C                         M1                 Sta, Sae, Psa, Cls
NuGEN on HiSeq                C                         M1                 Sta, Sae, Psa, Cls
New England Biolabs on HiSeq  C                         M1                 Sta, Sae, Psa, Cls
Sigma WGA on Proton           C                         M1                 Sta, Sae, Psa, Cls
NuGEN WGA on Proton           C                         M1                 Sta, Sae, Psa, Cls
Qiagen WGA on Proton          C                         M1                 Sta, Sae, Psa, Cls
Platforms and library methods
An ABRF – GiaB collaboration
NIST
• Extract high-quality genomic DNA from cultured cells for A, B, C, Sta, Sae, Psa and Cls
• Prepare equimolar blend of bacterial DNA for pool P
• Procure somatic mutation cell lines, create pools M1-M8 titrated by cell counts
• Extract genomic DNA from FFPE blocks of cell suspensions
• Distribute aliquots of DNA reference stocks to participating study labs
ABRF
• Assemble platform groups with at least 3 labs per instrument or method
• Each platform group will determine a consensus protocol for library preparation and sequencing
• Sequence one library per sample per site (intra-lab replicates encouraged)
• Collect and annotate data in a central repository
• Analyze sequencing performance
Name              email              Contact regarding:
Baldwin, Don      [email protected]  study design
Grills, George    [email protected]  vendor and partner relations
Mason, Chris      [email protected]  data analysis
Nicolet, Charlie  [email protected]  sequencing methods
Tighe, Scott      [email protected]  logistics
The ABRF NGS Study leadership group
in alphabetical order, with level of participation and devotion to be prioritized by alcoholic intake: