CS273a Lecture 5, Win07, Batzoglou
Quality of assemblies—mouse
Terminology: N50 contig length
If we sort contigs from largest to smallest and start covering the genome in that order, N50 is the length of the contig that just covers the 50th percentile.
7.7X sequence coverage
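The N50 definition above can be sketched in a few lines of Python. This is a minimal illustration, not code from the lecture: the function name `n50` and the optional `genome_size` argument are assumptions. When `genome_size` is omitted, the 50% target is taken from the total contig length, which is a common convention; pass the genome size to follow the slide's "covering the genome" wording.

```python
def n50(contig_lengths, genome_size=None):
    """Return the length of the contig that first brings the running
    total to at least 50% of the target size.

    genome_size is a hypothetical optional argument: if omitted, the
    target is the sum of all contig lengths.
    """
    ordered = sorted(contig_lengths, reverse=True)  # largest contig first
    total = genome_size if genome_size is not None else sum(ordered)
    target = total / 2
    running = 0
    for length in ordered:
        running += length
        if running >= target:
            return length
    return None  # contigs never reach half of the target (or no contigs)


if __name__ == "__main__":
    contigs = [80, 70, 50, 40, 30, 20]    # made-up lengths, in Kb
    print(n50(contigs))                   # half of 290 is reached by the 70 Kb contig
    print(n50(contigs, genome_size=400))  # half of 400 is reached by the 50 Kb contig
```

A few long contigs raise N50 much more than many short ones, which is why it is reported alongside sequence coverage.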
Quality of assemblies—chimp
3.6X sequence coverage
Assisted assembly
History of WGA
• 1982: λ-virus, 48,502 bp
• 1995: H. influenzae, 1 Mbp
• 2000: fly, 100 Mbp
• 2001–present: human (3 Gbp), mouse (2.5 Gbp), rat*, chicken, dog, chimpanzee, several fungal genomes
Gene Myers: "Let’s sequence the human genome with the shotgun strategy"
Phil Green: "That is impossible, and a bad idea anyway"
1997
Pyrosequencing on a chip
Mostafa Ronaghi, Stanford Genome Technologies Center
454 Life Sciences
Short read sequencing protocol
• Random, high-coverage clone library (CovG = 7 – 10x)
• Low-coverage of clone by reads (CovR = 1 – 2x)
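[Figure: genome → FRAGMENT → clones (e.g. 1234, 1235) → AMPLIFY & READ → reads labeled by clone; CovG is the coverage of the genome by clones, CovR the coverage of each clone by reads]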
Short read sequencing protocol
RANDOMLY SELECT 200,000 FRAGMENTS
CLONE
FRAGMENT AND SELECT 150 KB SEGMENTS
FRAGMENT
A C G A
bead attachment primer
adapter
clone id tag
LIGATE ADAPTERS
166 clone batch
CLONE ON BEADS BY PCR EMULSION
ACGATGATCGATGATTAC...
TGCTCAGACTTAGCTATT...
CAATTTATATCAGAGACA...
ACGAAATCGAGAGCAAGA...
clone id tag
SEQUENCE 250,000 READS ON PLATE
sequence read
1200 plates
ASSEMBLY
target genome
“clones”
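The numbers on this slide can be checked against the coverage targets with some quick arithmetic. The sketch below is only a sanity check under stated assumptions: a 3 Gbp (human-sized) target genome, the 200 bp read length quoted on the assembly-quality slide, and one 166-clone batch per plate. None of these is stated on this slide itself.

```python
# Figures taken from the slide
N_CLONES = 200_000          # randomly selected fragments ("clones")
CLONE_LEN_BP = 150_000      # 150 kb segments
CLONES_PER_BATCH = 166      # clones pooled per batch
READS_PER_PLATE = 250_000   # reads sequenced per plate
N_PLATES = 1200

# Assumptions (not from this slide)
GENOME_BP = 3_000_000_000   # human-sized target genome
READ_LEN_BP = 200           # read length from the assembly-quality slide

# CovG: coverage of the genome by clones
cov_g = N_CLONES * CLONE_LEN_BP / GENOME_BP

# Plates needed if each plate carries exactly one batch (assumption)
plates_needed = N_CLONES / CLONES_PER_BATCH

# CovR: coverage of each clone by its share of one plate's reads
bases_per_clone = READS_PER_PLATE * READ_LEN_BP / CLONES_PER_BATCH
cov_r = bases_per_clone / CLONE_LEN_BP

# Net coverage of the genome by reads
net_cov = N_PLATES * READS_PER_PLATE * READ_LEN_BP / GENOME_BP

if __name__ == "__main__":
    print(f"CovG          = {cov_g:.1f}x")
    print(f"plates needed = {plates_needed:.0f}")
    print(f"CovR          = {cov_r:.2f}x")
    print(f"net coverage  = {net_cov:.1f}x")
```

Under these assumptions CovG comes out at 10x and CovR at about 2x, the upper ends of the ranges on the previous slide, and the roughly 1,205 batches line up with the 1200 plates shown.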
Ordering clones into clone contigs
[Figure: clone graph with nodes 293, 1001, 1234, 882, 7, 94; NODE CONTRACTION turns it into a clone contig containing clones 1234, 293, 1001, 94, 7, 882]
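The slide does not say how edges of the clone graph are defined or how the contraction proceeds. As a rough illustration only, assume an edge means that two clones were found to overlap; repeatedly contracting edges then merges overlapping clones into groups, which a union-find structure does directly. The function `clone_contigs` and the overlap pairs are hypothetical, and this sketch only groups clones; it does not recover their order along the genome.

```python
def clone_contigs(clones, overlaps):
    """Group clones into contigs by contracting every overlap edge.

    clones   -- iterable of clone ids
    overlaps -- iterable of (clone_a, clone_b) pairs (hypothetical input)
    """
    parent = {c: c for c in clones}

    def find(c):
        # Follow parent links to the representative, halving the path as we go.
        while parent[c] != c:
            parent[c] = parent[parent[c]]
            c = parent[c]
        return c

    for a, b in overlaps:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # contract the edge: merge the two groups

    groups = {}
    for c in parent:
        groups.setdefault(find(c), []).append(c)
    return sorted(sorted(group) for group in groups.values())


if __name__ == "__main__":
    ids = [7, 94, 293, 882, 1001, 1234]  # clone ids from the figure
    # Made-up overlaps, chosen only to produce a single contig
    edges = [(1234, 293), (293, 1001), (1001, 94), (94, 7), (7, 882)]
    print(clone_contigs(ids, edges))
```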
Contig assembly
CONSTRUCT READ SETS
Euler assembler
intersection read set
subtraction read set
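The slides name the Euler assembler without describing it. As general background rather than lecture content, Euler-style assemblers build a de Bruijn graph from the k-mers of the reads and spell the sequence along an Eulerian path. The toy sketch below shows that idea on error-free, single-strand reads with no repeated (k-1)-mers; the function names are illustrative, and a real assembler must also handle sequencing errors, reverse complements, and repeats.

```python
from collections import defaultdict


def kmers_of(reads, k):
    """Collect the distinct k-mers in the reads; each becomes one graph edge."""
    kmers = set()
    for read in reads:
        for i in range(len(read) - k + 1):
            kmers.add(read[i:i + k])
    return kmers


def spell_eulerian_path(kmers):
    """Spell the sequence along an Eulerian path of the de Bruijn graph.

    Assumes the graph has an Eulerian path (toy input only).
    """
    graph = defaultdict(list)
    in_deg = defaultdict(int)
    out_deg = defaultdict(int)
    for kmer in sorted(kmers):
        prefix, suffix = kmer[:-1], kmer[1:]  # edge: prefix -> suffix
        graph[prefix].append(suffix)
        out_deg[prefix] += 1
        in_deg[suffix] += 1

    nodes = sorted(set(in_deg) | set(out_deg))
    # Start where out-degree exceeds in-degree by one, else at any node with an edge.
    start = next((n for n in nodes if out_deg[n] - in_deg[n] == 1), min(graph))

    # Hierholzer's algorithm with an explicit stack
    stack, path = [start], []
    while stack:
        node = stack[-1]
        if graph[node]:
            stack.append(graph[node].pop())
        else:
            path.append(stack.pop())
    path.reverse()
    return path[0] + "".join(node[-1] for node in path[1:])


if __name__ == "__main__":
    reads = ["ATGGCG", "GCGTAC", "GTACC"]  # made-up overlapping reads
    print(spell_eulerian_path(kmers_of(reads, k=4)))  # ATGGCGTACC
```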
Contig assembly
CONTIG ASSEMBLY 1: READ SETS
CONSTRUCT READ SETS (intersection read set, subtraction read set)
Euler assembler
CONTIG ASSEMBLY 2: CONTIG SETS
CONSTRUCT CONTIG SETS (contig set)
Euler assembler
CONTIG ASSEMBLY 3: CLONE CONTIGS
assembly
Assembly quality
Genome (size)              Sequence coverage   Contig N50 (Kb)   Base quality (Q)   Misassemblies (#/Mb)   Small indels (#/Mb)
D. melanogaster (118 Mb)   94.2%               160.2             38.4               2.5                    1.6
Human chr21 (34 Mb)        97.5%               79.0              35.6               1.9                    2.3
Human chr11 (131 Mb)       96.3%               57.4              34.4               2.8                    1.9
Human chr1 (223 Mb)        96.2%               63.0              34.4               3.0                    2.0

Read length = 200 bp, Error rate = 1%, Net coverage = 20.0x
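To read the base quality column, it helps to convert Q values into error probabilities. The sketch below assumes the slide's Q is the usual Phred-scaled quality, Q = -10 log10(p); the slide does not state this explicitly, and the function names are illustrative.

```python
import math


def phred_to_error_prob(q):
    """Per-base error probability for a Phred-scaled quality q."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p):
    """Phred-scaled quality for a per-base error probability p."""
    return -10 * math.log10(p)


if __name__ == "__main__":
    # Assembled base qualities from the table
    for name, q in [("D. melanogaster", 38.4), ("Human chr21", 35.6),
                    ("Human chr11", 34.4), ("Human chr1", 34.4)]:
        print(f"{name}: Q{q} -> about {phred_to_error_prob(q):.1e} errors per base")
    # The 1% read error rate quoted under the table, on the same scale
    print(f"1% read error rate -> Q{error_prob_to_phred(0.01):.0f}")
```

On this reading, reads with 1% error (about Q20) are assembled into consensus bases of roughly Q34 to Q38, that is, on the order of one error per several thousand bases.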
Some future directions for sequencing
1. Personalized genome sequencing
• Find your ~3,000,000 single nucleotide polymorphisms (SNPs)
• Find your rearrangements
• Goals:
  • Link genome with phenotype
  • Provide personalized diet and medicine
  • (???) designer babies, big-brother insurance companies
• Timeline:
  • Inexpensive sequencing: 2010-2015
  • Genotype–phenotype association: 2010-???
  • Personalized drugs: 2015-???
Some future directions for sequencing
2. Environmental sequencing
• Find your flora: organisms living in your body
  • External organs: skin, mucous membranes
  • Gut, mouth, etc.
• Normal flora: >200 species, >trillions of individuals
• Flora–disease, flora–non-optimal health associations
• Timeline:
  • Inexpensive research sequencing: today
  • Research & associations within next 10 years
  • Personalized sequencing 2015+
• Find diversity of organisms living in different environments
  • Hard to isolate
  • Assembly of all organisms at once
Some future directions for sequencing
3. Organism sequencing
• Sequence a large fraction of all organisms
• Deduce ancestors
  • Reconstruct ancestral genomes
  • Synthesize ancestral genomes
  • Clone: Jurassic Park!
• Study evolution of function
  • Find functional elements within a genome
  • How those evolved in different organisms
  • Find how modules/machines composed of many genes evolved
Genome Evolution – Macro Events
• Inversions
• Deletions
• Duplications