Day 19 - Noel Chen - Introduction to Novogene
Leading Edge Genomic Services & Solutions
Introduction of Novogene Bioinformatics Corporation
Noel Chen, Ph.D. VP of Business Development
Novogene Overview
• Founded in 2011
• Rapid growth each year, now 1,000 employees
• Providing high-quality next-generation sequencing and bioinformatics services in research and clinical markets
• Currently the largest Illumina customer and the only Illumina IGN partner in China
• The largest sequencing center in China by capacity
• Preparing for IPO
• Revenue has recently surpassed BGI's in the research market in China
Headquarters in Beijing
Worldwide Branches of Novogene
Beijing
Tianjin
HK
UK
US
Est. 2011
Est. 2014
Founder and Chief Executive
Dr. Ruiqiang Li
• One of the world's leading experts in genomics and bioinformatics
• Best known for developing the SOAP software for ultra-fast sequence mapping, variation detection, and de novo genome assembly
• Prior experience
  • Vice President of BGI
  • Principal Investigator at Peking University & Peking/Tsinghua Center for Life Sciences
• 70 publications (30 in Nature and Science series), cited over 12,000 times
• PhD in Biology from University of Copenhagen
The Development of Novogene
[Charts: Employees and Revenue by year, 2011 to 2014. Data labels: 28 (2011), 86 (2012), 198 (2013), 506 (2014).]
Both revenue and the number of employees have more than doubled each year since our founding.
1,000 Professional Employees
By department:
Administration 8%
Sequencing 15%
Service & Support 25%
R & D 12%
Bioinformatics 40%
By degree:
Doctorates 14%
Masters 61%
Bachelors 23%
Others 2%
Our focus: high-quality service and customer satisfaction.
75% of our employees have advanced degrees.
The Largest Sequencing Center in China
Platform          Read Length   Q30 (Data Quality Guarantee)
HiSeq X           2×150 bp      ≥80%
HiSeq 2500/2000   2×250 bp      ≥80%
                  2×125 bp      ≥85%
                  1×50 bp       ≥90%
HiSeq 4000        2×150 bp      ≥80%
MiSeq             2×300 bp      -
                  2×250 bp      ≥80%
NextSeq 500       2×75 bp       ≥80%
                  1×75 bp       ≥80%
Total Output/Month: 234 Tb
Illumina’s Official Quality Guarantee
Our data quality guarantee exceeds Illumina’s official guarantee. We are the only company providing this guarantee.
The Largest Sequencing Center in China
Platform             Read Length   Novogene Q30 Guarantee   Average Q30 Delivered   Illumina Q30 Guarantee
10 HiSeq X           2×150 bp      ≥80%                     88.11%                  ≥75%
10 HiSeq 2500/2000   2×250 bp      ≥80%                     91.22%                  ≥80%
                     2×125 bp      ≥85%                     88.29%                  ≥80%
                     1×50 bp       ≥90%                     96.57%                  ≥80%
1 HiSeq 4000         2×150 bp      ≥80%                     90.10%                  ≥75%
4 MiSeq              2×300 bp      -                        75.20%                  ≥70%
                     2×250 bp      ≥80%                     -                       ≥75%
1 NextSeq 500        2×75 bp       ≥80%                     90.37%                  ≥80%
                     1×75 bp       ≥80%                     85.58%                  ≥80%
Total Output/Month: 234 Tb
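For readers unfamiliar with the metric: Q30 is the share of base calls with a Phred quality score of at least 30, which corresponds to an estimated error probability of at most 1 in 1,000. Below is a minimal sketch of how such a percentage could be computed from FASTQ quality strings. It assumes Phred+33 encoding; the function names and sample strings are illustrative and are not part of Novogene's pipeline.

```python
def q30_fraction(quality_strings, offset=33):
    """Return the fraction of bases whose Phred score is >= 30.

    Assumes Phred+33 ASCII encoding (an assumption; some older
    formats used an offset of 64).
    """
    total = 0
    passing = 0
    for qual in quality_strings:
        for ch in qual:
            total += 1
            if ord(ch) - offset >= 30:
                passing += 1
    return passing / total if total else 0.0


def phred_to_error_probability(q):
    """Convert a Phred score to its estimated error probability, 10^(-Q/10)."""
    return 10 ** (-q / 10)


# Hypothetical quality strings: 'I' encodes Q40, '?' encodes Q30, '5' encodes Q20.
reads = ["IIII?", "II555"]
print(f"Q30 = {q30_fraction(reads):.0%}")
print(f"Error probability at Q30 = {phred_to_error_probability(30)}")
```

A guarantee such as "≥80%" would then mean that this fraction, computed over all delivered reads, is at least 0.80.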
Our Human Whole Genome Sequencing Service
Service Parameter
Platform: HiSeq X Ten
Read length: 2×150 bp
Turnaround time: 15 working days
Standard analysis: additional 8 working days
Advanced analysis: upon request

Data Output and Quality
Different Batch      Flow Cell 1   Flow Cell 2
Output               972.0 G       943.2 G
Q30                  90.30%        88.90%

Different Sample     Sample 1      Sample 2
Raw Data             105.5 G       105.8 G
Mapping Ratio        99.90%        99.90%
Effective Coverage   31.7          31.8
“I am extremely satisfied with the quality of the WGS results Novogene delivered.”
From customer Justin Loe, CEO of Full Genomes Corporation, Maryland, USA
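The relationship between the raw data volume and the effective coverage reported for the two WGS samples can be sketched with simple arithmetic: mean depth is roughly the number of sequenced bases divided by the genome size. The example below assumes a human genome size of about 3.1 Gb and reads "G" as gigabases; both are assumptions, not figures stated on the slide. Effective coverage is typically lower than the raw figure because low-quality, duplicate, and unmapped reads are excluded.

```python
HUMAN_GENOME_GB = 3.1  # assumed approximate human genome size, in gigabases


def raw_depth(raw_data_gb, genome_size_gb=HUMAN_GENOME_GB):
    """Mean sequencing depth before filtering: bases sequenced / genome size."""
    return raw_data_gb / genome_size_gb


# Raw data (G) and effective coverage as reported for the two WGS samples
samples = {"Sample 1": (105.5, 31.7), "Sample 2": (105.8, 31.8)}

for name, (raw_gb, effective) in samples.items():
    depth = raw_depth(raw_gb)
    print(f"{name}: raw depth ~{depth:.1f}x, effective {effective}x "
          f"({effective / depth:.0%} of raw)")
```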
Human Whole Genome Sequencing (WGS)
Standard Bioinformatics Pipeline of WGS
Raw data
Clean data
Alignment
SNP, InDel, SV, CNV
Case-control samples: Somatic SNV, Somatic InDel
Annotation
Extensive Quality Controls
Workflow
Steps: Sample Received, DNA/RNA Preparation, Library Preparation, Sequencing, Bioinformatics Analysis, Data Delivery
Stage durations: ~5 days, ~4 days, ~3-12 days, ~15 days
QC outputs: Sample QC Report, Library QC Report, Raw data QC Report, Data Report, Delivery Records
To ensure the accuracy and reliability of the sequencing data, Novogene strictly controls the quality of every step.
Data Analysis on HPC (High Performance Computing) Platform
DELL Computing Nodes
Memory Size: 17 TB
Computing Power: 73 TFLOPS
Storage: 3.2 PB

Data Type       Data Analysis Capacity / Month
Human Genome    360 Tb / 4000 samples
Exome           40 Tb / 8000 samples
Transcriptome
Products and Services Overview
Life Science Research Services
• Human whole genome & exome sequencing
• Transcriptome sequencing
• Plant and animal sequencing
• Microbial sequencing
• Bioinformatics analysis
Clinical Genetic Testing (China)
• Cancer genetic testing & risk assessment
• Cancer drug panel
• ctDNA detection
• NIPT
Service Portfolio for Global Researchers
Whole Genome Sequencing
• Whole genome re-sequencing
• Whole exome sequencing
• Single-cell DNA sequencing
• Target region sequencing
Transcriptome Sequencing
• mRNA sequencing
• Single-cell RNA sequencing
• lncRNA sequencing
• Whole genome bisulfite sequencing
Microbial Genome Sequencing
• Microbial genome re-sequencing
• Microbial de novo sequencing
• Metagenomic sequencing
• 16S/18S/ITS amplicon sequencing
Animal & Plant Genome Sequencing
• Animal & Plant re-sequencing
• Animal & Plant de novo sequencing
• Pan-genome re-sequencing
• Genotyping by Sequencing
Solutions for Human Disease Research
DNA Level
  Technology: WGS, WES, Target-seq, Cancer panel
  Focus: SNP/InDel/SV/CNV, Somatic mutations, Driver gene, Clonal evolution
RNA Level
  Technology: RNA-seq, DGE, Small RNA-seq, LncRNA-seq
  Focus: Differentially expressed genes, Alternative splicing, Fusion gene, LncRNA
Epigenetics
  Technology: WGBS
  Focus: Methylation analysis, Differential methylation region (DMR)
Single cell
  Technology: WGS, WES, RNA-seq, LncRNA-seq
  Focus: SNP/InDel/SV/CNV, Differentially expressed genes, Heterogeneity
Whole Exome Sequencing (WES)
Service Parameter (State-of-the-Art Platform)
Platform: HiSeq 4000
Exome Capture: Agilent SureSelect V6 (58M) / V5
Read length: 2×150 bp (longer reads with 20% more data than PE125)
Turnaround time: 15 working days
Standard analysis: additional 5 working days

Standard Bioinformatics Pipeline
Raw data
Clean data
Alignment
SNP, InDel
Case-control samples: Somatic SNP, Somatic InDel
Annotation
ExAC database, including 17 international exome databases, for free
Cancer Panel Solutions
Inherited Susceptibility Gene Screening: NovoCR™
• Individual cancer panels
• Multi-cancer testing
Personalized Cancer Therapy: NovoPM™
• Tissue samples: Standard 47 genes; Professional 483 genes
• ctDNA: Standard 40 genes; Professional 483 genes
• NovoPM detects SNVs, indels, CNVs, and fusions, and their relationships with cancer drugs, to guide personalized cancer therapy.
• NovoCR assesses an individual's risk of developing cancer.
• We also offer a custom panel service based on Agilent capture and HiSeq 4000.
RNA Sequencing
Service Parameters
Platform: HiSeq 4000
Read length: 2×150 bp
Turnaround time: 15 working days
Standard analysis: additional 15 working days
Novogene Advantages
• Sequencing strategy: HiSeq paired-end 150 bp (longer reads)
• Rich experience: over 3,000 customer projects successfully completed
• Bioinformatics analysis: self-developed software (NovoFinder), which aims to find the genes you need
RNA Sequencing Data Analysis
Total RNA, mRNA Library, Sequencing, Data QC
Genome Available: Genome Mapping
  Gene Structure
  • Alternative Splicing
  • Antisense Transcripts
  • SNP & InDel
  • Differential exon usage
  Gene Expression
  • Gene Expression Level
  • Sample Correlation
  • Differentially Expressed Genes
  • GO/KEGG Enrichment
Genome Unavailable: Transcriptome Assembly
  Transcripts Sequence
  • Length Distribution
  • Function Annotation
  • SNP & InDel
Long Non-coding RNA Sequencing
Service Parameter
Platform: HiSeq 4000
Read length: Paired-end 150 bp
Turnaround time: 40 working days
Standard Bioinformatics Analysis (by an all-PhD team)
Long non-coding RNAs play important regulatory roles. Our service enables researchers to obtain information on mRNAs and lncRNAs simultaneously.
Microbial Genome Sequencing
• Bacterial genome sequencing: Draft map, Fine map, Complete map, Re-sequencing
• Fungal genome sequencing: Draft map, Fine map, Re-sequencing
• Small genome sequencing: Fosmid, plasmid, Mitochondria, Chloroplast, Virus
• Amplicon sequencing: 16S, 18S, ITS
• Meta sequencing: Meta-genomic sequencing, Meta-survey
Platform: HiSeq PE 150
1000+ samples sequenced per month
Single Cell Sequencing
Why? Heterogeneity
Germ cell
Neurons
Stem cell
Immune cell
Tumor cell (CTC)
Single Cell Sequencing
Novogene Advantages
• Amplification technology: MALBAC for DNA, SMARTer for RNA
• Rich experience: 2 papers published (Nature & Science)
• Sequencing strategy: HiSeq X for human, HiSeq 2500/4000 for other species
Customer Projects Completed in 2014
[Chart: customer projects completed in 2014 by service type. Categories: Transcriptome Sequencing, Human Resequencing, Microbial Sequencing, Plant & Animal resequencing, Plant & Animal de novo sequencing. Data labels: 3122, 3181, 813, 243, 32.]
We Understand Science
• >100 published articles with a total impact factor of 649 in just 4 years
• 34 patents in NGS and bioinformatics
• Numerous articles in submission
Novogene Case 1
Human preimplantation embryos and embryonic stem cells
Methods: Single Cell lncRNA+mRNA Seq
Sample groups:
• Metaphase II oocyte, zygote, 2-cell, 4-cell, 8-cell, morula, and late blastocyst at hatching stage
• 3-30 biological replicates per group
• 124 cells in total
Method: lncRNA+mRNA; HiSeq 2000, PE100; 20M-60M clean reads; 438 Gb of data in total
Findings:
• 22,687 maternally expressed genes detected, including 8,701 lncRNAs; 9,735 more than with microarray
• 2,733 novel lncRNAs discovered, many of which are expressed in specific developmental stages
• EPI cells and primary hESC outgrowth have dramatically different transcriptomes, with 1,498 genes showing differential expression
Novogene Case 2
Human Single Sperm Cells
Methods: Single Cell Whole Genome Sequencing
• One healthy Asian male in his late 40s
• 70x
• 99 sperm, MALBAC
• 93 sperm at ~1x: 23% coverage
• 6 sperm at ~5x: 43% coverage
• 2.8 million SNPs
• 1.4 million hetSNPs
• 2,368 autosomal crossover events in the sperm cells; 26.6 crossovers per cell on average
• Constructed a genetic map of recombination for the individual
• 5% of sperm were detected as having autosomal aneuploidy
Novogene Case 3: allotetraploid cotton
Sample: allotetraploid cotton genomic DNA
Methods: de novo and Transcriptome Sequencing
De novo: 245x
• ~96% of the estimated allotetraploid genome (total scaffold length 2.4/2.5 Gb)
• 265,279 contigs (N50 = 34.0 kb) and 40,407 scaffolds (N50 = 1.6 Mb)
• Total length 2.43 Gb
RNA-seq: 97 samples (from different organs, developmental stages, and adverse conditions)
Findings:
• Evolutionary mechanism of allotetraploid cotton and function of the A-subgenome and D-subgenome
• A branch of the MYB gene family plays an important role in fiber development
• Many CESA genes underwent significant positive selection during domestication
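N50, the assembly statistic quoted for the cotton contigs and scaffolds, is the length L such that sequences of length at least L together cover at least half of the total assembly. Below is a minimal sketch of the standard calculation, using made-up contig lengths rather than the cotton assembly data.

```python
def n50(lengths):
    """Return the N50 of a list of sequence lengths.

    Sorts lengths from longest to shortest and returns the length at
    which the running total first reaches half of the summed length.
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    half = sum(lengths) / 2
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= half:
            return length


# Hypothetical contig lengths in kb (illustrative only)
contigs = [80, 70, 50, 40, 30, 20, 10]
print(f"N50 = {n50(contigs)} kb")
```

Because a few long sequences dominate the running total, N50 rewards contiguity: the same total length split into fewer, longer scaffolds yields a higher value.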
De novo Sequencing Publications
Pan-genome Sequencing
• Brief Introduction
Applying next-generation sequencing and state-of-the-art assembly algorithms makes the construction of a pan-genome map feasible. Constructing genome maps for several individuals provides unprecedented opportunities to investigate detailed genetic diversity at the population level.
• Key Points
The pan-genome is the superset of all the genes in all the strains of a species:
  ✓ Core Genome: genes present in all strains
  ✓ Dispensable Genome: genes present in two or more strains, but not all
  ✓ Specific Genes: genes specific to a single strain
[Figure: aligned strain sequences illustrating the core, dispensable, and specific genome]
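The three pan-genome categories can be computed directly from per-strain gene sets. The sketch below is illustrative only: the strain names and gene IDs are hypothetical, and it treats "dispensable" as genes found in at least two strains but not in all of them, which is an assumption about the intended definition.

```python
def classify_pan_genome(strain_genes):
    """Split genes into core, dispensable, and strain-specific sets.

    strain_genes maps a strain name to the set of gene IDs found in it.
    """
    n_strains = len(strain_genes)
    counts = {}
    for genes in strain_genes.values():
        for gene in genes:
            counts[gene] = counts.get(gene, 0) + 1

    core = {g for g, c in counts.items() if c == n_strains}
    specific = {g for g, c in counts.items() if c == 1 and n_strains > 1}
    dispensable = set(counts) - core - specific
    return core, dispensable, specific


# Hypothetical strains and gene IDs
strains = {
    "strainA": {"g1", "g2", "g3"},
    "strainB": {"g1", "g2", "g4"},
    "strainC": {"g1", "g5"},
}
core, dispensable, specific = classify_pan_genome(strains)
print("core:", sorted(core))
print("dispensable:", sorted(dispensable))
print("specific:", sorted(specific))
```

The union of the three sets is the pan-genome itself, so adding a new strain can only move genes between categories or grow the total.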
• Pan-Genome Sequencing
Material selection and QC
Library construction: 230 bp / 500 bp / 2 kb / 5 kb
Sequencing depth: simple genome 60X; complex genome 100X
Genome preliminary assembly
Pan-genome construction
Gene annotation
Customized bioinformatics analysis:
• Comparative genomics analysis
• SNP/InDel/SV/CNV/novel sequence
• Gene family analysis
• Phylogenetic analysis
• Co-linear analysis
Pan-genome analysis of soybean wild relatives (IF: 39.08)
Organizations Collaborated with Novogene in 2014
We look forward to working with you!
Website: www.novogene.com
Thank you very much for your attention! Questions?