genomic techniquesbioinformatica.uab.cat/base/documents/mastergp/genomic... · 2013-05-15 · 3 µm...
Genomic techniques
Marta Puig Institut de Biotecnologia i Biomedicina
Universitat Autònoma de Barcelona
The genomic revolution
Whole genome sequences of tens of species and individuals (and more to come!)
Gene expression profiling in multiple conditions, tissues, individuals and species
Mapping of functional regions in the genome
Genomic techniques usage
[Figure: number of publications per year, 1995-2012, for microarrays and next-generation sequencing (two y-axes, 0-8000 and 0-3000)]
First description of cDNA microarrays (Schena et al. 1995)
First description of oligonucleotide arrays (Lockhart et al. 1996)
Release of the 454 sequencing system (2005)
Overview
DNA microarrays
Next-generation sequencing technologies
Microarrays
Common features of microarrays
Parallelism/high-throughput (thousands of genes analyzed simultaneously)
Miniaturization (small feature and chip size)
Automation (chip manufacture, processing and analysis)
Based on nucleic-acid hybridization and base pairing
Detection and quantification (of hybridized molecules)
• Probes fixed on a glass slide
• Labeling of target samples
Two-sample competitive hybridization: 2 colors / relative quantification
Single-sample hybridization: 1 color / absolute quantification
[Figure: labeled target molecules hybridizing by base pairing to complementary probes fixed on the slide]
Microarray principles
Classes of microarrays – probes
Custom/spotted/two-color microarrays (cDNAs, BACs)
High-density oligonucleotide arrays (GeneChip-Affymetrix)
Long oligonucleotide microarrays
- Agilent (25-60 bases)
- Illumina (50 bases)
- Nimblegen (50-75 bases)
1. Isolation or PCR amplification of DNA fragments (≈1-200 kb) including the genes or regions of interest
2. Spotting of DNAs at high density onto a glass microscopy slide (5000-10000 spots per slide) and cross-linking to the glass surface
Custom microarrays technology
probe (on chip)
sample (labeled)
3. Two independent mRNA or DNA samples are fluorescently labeled with Cy3 (green) or Cy5 (red)
4. The two labeled populations are combined in equal amounts and hybridized to the array
5. A laser scanner reads the slide, and the ratio of fluorescence intensities between the two samples is calculated for each spot
Cy3 Cy5
Higher expression in sample 1
Expression in both samples
Higher expression in sample 2
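The ratio in step 5 is usually handled on a log2 scale, which makes over- and under-expression symmetric around zero. The following is a minimal sketch, not the scanner software's actual method: it assumes sample 1 is labeled with Cy3 and sample 2 with Cy5, uses a hypothetical twofold threshold, and skips background correction and normalization. The spot intensities are invented for illustration.

```python
import math

def log2_ratio(cy3, cy5):
    """Return log2(Cy5 / Cy3) for one spot (assumes both intensities > 0)."""
    return math.log2(cy5 / cy3)

def classify(cy3, cy5, threshold=1.0):
    """Classify a spot; threshold=1.0 means a twofold difference (assumption)."""
    m = log2_ratio(cy3, cy5)
    if m >= threshold:
        return "higher in sample 2"   # Cy5 (red) dominates
    if m <= -threshold:
        return "higher in sample 1"   # Cy3 (green) dominates
    return "expressed in both samples"

# Hypothetical (Cy3, Cy5) intensities for three spots.
spots = {
    "geneA": (4000.0, 500.0),
    "geneB": (1200.0, 1100.0),
    "geneC": (300.0, 2400.0),
}
calls = {gene: classify(cy3, cy5) for gene, (cy3, cy5) in spots.items()}
for gene, call in calls.items():
    print(gene, call)
```

With these made-up values, geneA falls in the "higher in sample 1" class, geneB in the shared class and geneC in the "higher in sample 2" class, matching the three colors of the figure legend.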
Gene expression arrays
The entire array is formed by >500,000 cells, each containing a different oligo
The array elements are a series of 25-mer oligos designed from known sequence and synthesized directly on the surface
Gene expression arrays
Detection of transcription and quantification of expression levels
Short oligonucleotides (25 bases)
Multiple probes used for each transcript
Probes hybridize to the exons located near the 3’ end of the analyzed transcript
End of CDS or 3’ UTR (100-500 bp)
High expression level
Intermediate expression level
No expression
Gene expression arrays
3 µm beads coated with probes are located in microwells on the array surface
Each probe has an address used to identify the location of each bead through a hybridization-based procedure
High redundancy (30 beads/probe)
This technology can be used to create different types of arrays (gene expression, SNP genotyping…)
Illumina Bead arrays
Probe: 50 bases; address: 29 bases
http://www.illumina.com/technology/direct_hybridization_assay.ilmn
Figure 3. Shoemaker et al. (2001) Nature 409, 922-927
Long oligonucleotides (60 bases)
Overlapping probes that cover completely the region of interest
Identification of new transcribed sequences
Characterization of a novel testis transcript using tiling arrays
Genome tiling arrays
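A tiling design can be sketched as a sliding window over the region of interest. This is only an illustration: the 60-base probe length comes from the slide, while the 30-base step and the function name `tile_probes` are assumptions, and real designs also filter probes for repeats and melting temperature.

```python
def tile_probes(region, probe_len=60, step=30):
    """Return (start, probe) pairs covering the whole region with overlapping probes.

    step < probe_len guarantees that consecutive probes overlap.
    """
    if len(region) < probe_len:
        raise ValueError("region is shorter than one probe")
    last_start = len(region) - probe_len
    starts = list(range(0, last_start + 1, step))
    if starts[-1] != last_start:
        starts.append(last_start)  # extra probe so the 3' end is covered too
    return [(s, region[s:s + probe_len]) for s in starts]

# Hypothetical 200-base region.
region = "ACGT" * 50
probes = tile_probes(region)
covered = set()
for start, probe in probes:
    covered.update(range(start, start + len(probe)))
print(len(probes), "probes; fully covered:", covered == set(range(len(region))))
```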
Figure 1. Cáceres et al. (2003) PNAS 100: 13030–13035.
Detection of many genes with higher expression levels in human brain than in chimpanzee and macaque
Heat map
[Figure: rows are genes (Gene1, Gene2, Gene3), columns are samples (Sample A, Sample B); color scale from high to low expression]
Graphical representation of microarray results
SNP arrays
Genotyping of SNPs
DNA is hybridized to the microarray
Can genotype millions of SNPs in a single experiment
Figure 2. LaFramboise (2009) Nucleic Acids Research 37: 4181–4193
Single-base extension
Applications of microarrays
RNA:
Measuring transcript abundance (expression arrays)
Identification of transcribed sequences (tiling arrays)
Analysis of alternative splicing (exon or exon-junction arrays)
DNA:
SNP genotyping (SNP oligonucleotide arrays)
Estimating DNA copy number (CGH on BAC or oligonucleotide arrays)
Chromatin:
Identifying protein binding sites (ChIP-chip on tiling or oligonucleotide arrays)
Detection of epigenetic modifications (ChIP-chip on tiling or oligonucleotide arrays)
Limitations of microarrays
Reliance on available genomic sequence and gene annotations
Cross hybridization
Limited dynamic range and saturation
High amount of RNA required
High cost
Sequencing methods
Sanger – Classical sequencing
1. DNA amplification
Cloning: DNA fragments + plasmid vector; transformation into bacteria; isolation of plasmid DNA
PCR: denaturation + primer annealing; copy of template DNA with a thermotolerant DNA polymerase (Taq); several cycles; amplified fragment
RESULTS
Long reads (500-1000 bp)
Small scale (96 reactions/run)
2. Sequencing reaction
Primer
Dideoxynucleotides (ddNTPs)
Once a ddNTP is incorporated, no other nucleotide can be added.
Each dideoxynucleotide is labeled with a different fluorophore.
3. Capillary electrophoresis
Separation by size in a polyacrylamide gel + fluorescence detection
Chromatogram
C T A T G C T C G
Figure 1. Shendure and Ji (2008) Nature Biotechnology 26: 1135-1145.
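The logic of chain termination followed by size separation can be sketched in a few lines. This is a toy model under simplifying assumptions: every possible terminated product is present exactly once, the primer is ignored, and the "fluorophore" is simply the last letter of each fragment. The example strand is the one shown in the chromatogram.

```python
def terminated_fragments(synthesized):
    """All chain-terminated products: each prefix ends in a labeled ddNTP."""
    return [synthesized[:i] for i in range(1, len(synthesized) + 1)]

def read_sequence(fragments):
    """Sort fragments by size (electrophoresis) and read each terminal label."""
    ordered = sorted(fragments, key=len)
    return "".join(fragment[-1] for fragment in ordered)

strand = "CTATGCTCG"
mixture = terminated_fragments(strand)
mixture.reverse()  # the reaction tube holds the fragments in no particular order
print(read_sequence(mixture))
```

Because each fragment length occurs once and carries the label of its last base, reading the labels from shortest to longest fragment recovers the synthesized strand.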
2nd and 3rd generation sequencing technologies
454/Roche – Pyrosequencing
Illumina – Reversible termination
SOLiD – Sequencing by ligation
2nd generation sequencing technologies
1. Preparation of sequencing library with adapters
2. Solid-phase amplification
3. Sequencing reaction
[Figure: DNA, fragmentation, DNA fragments of proper size, adapter ligation, SEQUENCING LIBRARY]
MASSIVELY PARALLEL SEQUENCING METHODS
Common steps in next-generation methods
1. DNA fragmentation and adapter ligation
2. Emulsion PCR within water-in-oil droplets
LIBRARY
beads
Figures 1 and 2. Shendure and Ji (2008) Nature Biotechnology 26: 1135-1145.
454/Roche – Pyrosequencing
http://www.appliedbiosystems.com/absite/us/en/home/applications-technologies/solid-next-generation-sequencing/next-generation-systems/solid-sequencing-chemistry.html
[Figure: water droplet in oil containing DNA template, primers, DNA polymerase and beads + P1]
Emulsion PCR
RESULT
3. Bead distribution in individual wells
4. Pyrosequencing
Figure 3. Metzker (2010) Nature Reviews Genetics 11: 31-46.
Figure 1. England and Pettersson (2005) Nature Methods 2: Application Note
454/Roche – Pyrosequencing
Read length = 250-500 bp
>1 million reads per run
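In pyrosequencing, one nucleotide species is flowed at a time and the light signal grows with the number of bases incorporated in that flow. The sketch below models an idealized, noise-free flowgram; the cyclic flow order "TACG" and the function name `flowgram` are assumptions for illustration, and `sequence` stands for the strand being synthesized.

```python
def flowgram(sequence, flow_order="TACG", max_flows=200):
    """Return (nucleotide, incorporations) for each flow until the read ends."""
    signals = []
    pos = 0
    flow = 0
    while pos < len(sequence) and flow < max_flows:
        base = flow_order[flow % len(flow_order)]
        run = 0
        # A homopolymer run is incorporated entirely within a single flow.
        while pos < len(sequence) and sequence[pos] == base:
            run += 1
            pos += 1
        signals.append((base, run))
        flow += 1
    return signals

print(flowgram("TTACCCG"))
```

A run of identical bases produces one stronger signal rather than several separate ones, which is why homopolymer lengths are the typical source of insertion/deletion errors for this chemistry.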
1. DNA fragmentation and adapter ligation
2. Solid-phase amplification and cluster generation by bridge PCR
Figures 1 and 2. Shendure and Ji (2008) Nature Biotechnology 26: 1135-1145.
Illumina – Reversible termination
LIBRARY
>100 million reads per run
3. Flowing of fluorescent reversible terminator dNTPs and incorporation of a single base per cycle
4. Reading of the identity of each base of a cluster from sequential images taken after each nucleotide incorporation
Figure 2. Metzker (2010) Nature Reviews Genetics 11: 31-46.
Illumina – Reversible termination
Read length = 75-150 bp
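Step 4 amounts to picking, for every cycle, the base whose fluorescence channel is brightest in the image of a given cluster. The following is a minimal sketch with invented intensities; the channel order "ACGT" is an assumption, and real base callers also correct for channel cross-talk and phasing, which is ignored here.

```python
CHANNELS = "ACGT"  # assumed order of the four intensity values per cycle

def call_bases(cycles):
    """Call one base per cycle: the channel with the highest intensity."""
    read = []
    for intensities in cycles:
        best = max(range(len(CHANNELS)), key=lambda i: intensities[i])
        read.append(CHANNELS[best])
    return "".join(read)

# Hypothetical normalized intensities (A, C, G, T) for one cluster over four cycles.
cluster = [
    (0.05, 0.90, 0.03, 0.02),
    (0.88, 0.04, 0.05, 0.03),
    (0.02, 0.03, 0.06, 0.89),
    (0.04, 0.02, 0.91, 0.03),
]
read = call_bases(cluster)
print(read)
```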
Amplification by emulsion PCR
Oligonucleotides labelled with a different fluorophore according to their first two bases
Hybridization and ligation to the initial sequencing primer
Image capture, removal of the fluorophore and repetition for several cycles
Replacement of the initial primer with a slightly offset one to restart the ligation cycles
Bases sequenced in non-consecutive pairs
Figure 3. Metzker (2010) Nature Reviews Genetics 11: 31-46.
SOLiD – Sequencing by ligation
100-500 million reads per run
http://www.youtube.com/watch?v=nlvyF8bFDwM
http://www.appliedbiosystems.com/absite/us/en/home/applications-technologies/solid-next-generation-sequencing/next-generation-systems/solid-sequencing-chemistry.html
SOLiD – Sequencing by ligation
Read length = 50-100 bp
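Because each color reports on a pair of adjacent bases, a SOLiD read is a string of colors that can only be translated into bases if the first base is known. The sketch below uses a compact XOR formulation of the two-base color code (A=0, C=1, G=2, T=3, color = XOR of the two codes); this representation and the example sequences are assumptions for illustration, not taken from the slides.

```python
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
BASE = "ACGT"

def encode(seq):
    """Translate a base sequence into colors, one per pair of adjacent bases."""
    return [CODE[a] ^ CODE[b] for a, b in zip(seq, seq[1:])]

def decode(first_base, colors):
    """Translate colors back into bases, starting from the known first base."""
    bases = [first_base]
    for color in colors:
        bases.append(BASE[CODE[bases[-1]] ^ color])
    return "".join(bases)

reference = encode("ATGGC")
variant = encode("ATCGC")   # single-base change at the third position
print(reference, variant)
print(decode("A", reference))
```

In this encoding a single-base difference changes two adjacent colors, whereas a single miscalled color would alter every base decoded after it, which helps tell true variants from measurement errors.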
Practical exercise
Sanger
454/Roche
Illumina
SOLiD
SOLiD
LEGEND
How will a heterozygous SNP be detected during the sequencing of a genome?
(The first nucleotide is an A)
Increase read length
Improve sequence accuracy
Single-molecule sequencing (no amplification)
De novo assembly of complex genomes
Challenges of existing NGS methods
Third generation sequencing method
Amplification by emulsion PCR
Each bead is located in a well of a semiconductor chip connected to an ion-sensitive layer able to detect changes in pH
http://www.iontorrent.com/technology-how-does-it-work-more/
Figure 1. Rothberg et al. (2011) Nature 475:348-52.
Ion Torrent – Semiconductor sequencing
Unlabelled nucleotides flow sequentially through the chip
Incorporation of a nucleotide into the DNA strand being synthesized releases a proton. Detecting the resulting pH change within the well converts chemical information into digital information
Non-optical DNA sequencing (cheaper, faster)
1 million sensors
6 million sensors
11 million sensors
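Turning the per-flow pH signal of one well into bases can be sketched as rounding each signal to a whole number of incorporations. This is a simplified illustration, not the vendor's algorithm: it assumes signals are already normalized so that 1.0 corresponds to one incorporated nucleotide, and the flow order "TACG" and the signal values are invented.

```python
def decode_flows(signals, flow_order="TACG"):
    """Convert normalized per-flow signals into the synthesized sequence."""
    read = []
    for i, signal in enumerate(signals):
        # Round to the nearest whole number of incorporated nucleotides.
        count = max(0, int(signal + 0.5))
        read.append(flow_order[i % len(flow_order)] * count)
    return "".join(read)

# Hypothetical signals from one well over seven flows (T, A, C, G, T, A, C).
well = [2.1, 0.9, 0.1, 1.0, 0.0, 0.0, 2.9]
print(decode_flows(well))
```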
Figure 4. Metzker (2010) Nature Reviews Genetics 11: 31-46.
Single-molecule with no amplification step
Immobilized DNA polymerase
Fluorescence pulse of the corresponding color when a nucleotide is incorporated
Pacific Biosciences – Real-time sequencing
Table 1. Kircher and Kelso (2010) Bioessays 32: 524-536.
Comparison of sequencing technologies
Technologies compared: Roche/454, Illumina, SOLiD, Pacific Biosciences
Library amplification method
Roche/454: emPCR
Illumina: Bridge PCR
SOLiD: emPCR
Pacific Biosciences: NA (single molecule detection)
Sequencing method
Roche/454: Polymerase-mediated incorporation of unlabelled nucleotides
Illumina: Polymerase-mediated incorporation of end-blocked fluorescent nucleotides
SOLiD: Ligase-mediated addition of 2-base encoded fluorescent oligonucleotides
Pacific Biosciences: Polymerase-mediated incorporation of terminal phosphate labelled fluorescent nucleotides
Detection method
Roche/454: Light emitted from secondary reactions initiated by release of phosphates (PPi)
Illumina: Fluorescent emission from incorporated dye-labelled nucleotides
SOLiD: Fluorescent emission from ligated dye-labelled oligonucleotides
Pacific Biosciences: Real time detection of fluorescent dye in polymerase active site during incorporation
Post incorporation method
Roche/454: NA
Illumina: Chemical cleavage of fluorescent dye and 3′ blocking group
SOLiD: Chemical cleavage removes fluorescent dye and 3′ end of oligonucleotide
Pacific Biosciences: NA
Error model
Roche/454: Substitution errors rare, insertion/deletion errors at homopolymers
Illumina: End of read substitution errors
SOLiD: End of read substitution errors
Pacific Biosciences: Random insertion/deletion errors
Read length
Roche/454: 400 bp
Illumina: 150 bp
SOLiD: 75 bp
Pacific Biosciences: >1,000 bp
Comparison of sequencing technologies
Based on Table 1. Mardis, ER (2011) Nature 470: 198-203. NA = not applicable
Applications of NGS technologies
Application: Objective
Complete genome resequencing: Discovery of variants among individuals
Exome sequencing: Resequencing of specific regions within a genome
RNA-seq: Sequencing of transcripts and quantification of gene expression levels
ChIP-seq: Genome-wide mapping of protein-DNA interactions
Paired-end sequencing: Discovery of structural variants
Metagenomics: Sequencing of all the genomes in a given ecological habitat
Reference mapping:
Assembly of reads by similarity with a reference genome sequence
Comparison of each read with the reference sequence
New sequences and structural rearrangements can NOT be detected
Not applicable the first time a genome is sequenced
De novo assembly:
Assembly of reads based on the overlap of their sequences
Comparison of each read with all the other reads
New sequences and structural rearrangements CAN be detected
More complex, slower and requires more computational resources
AGTATTAGCCTTAAGAAATTAAGTATTAGAA
AAAAA CTTAAGAAATTAAGTATTAGAA
AAAAATTAAGTATTAG AAGTATTAGAA
AAAAATTAAGTATTAGCCTTA TTAGAA
AAAAATTAAGTATTAGCCTTAAGAAAT
GCCTTAAGAAATTAAGTATTAGAA
AAA AAATTAAGTATTAGAA
AAAAATTAA AAGTATTAGAA
AAAAATTAAGTATTAG
AAAAATTAAGTATTAGCCTTAAG
Complete genome sequencing – Assembly
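The two strategies can be contrasted with a toy example. The sketch below is illustrative only: the reads and reference are invented and error-free, `map_reads` uses exact string matching instead of a real aligner, and `greedy_assemble` is a naive greedy overlap merge that would fail on repeats, which real assemblers handle with graph-based methods.

```python
def map_reads(reference, reads):
    """Reference mapping: compare each read with the reference (exact match)."""
    return {read: reference.find(read) for read in reads}

def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that equals a prefix of b."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0

def greedy_assemble(reads, min_len=3):
    """De novo assembly: compare every read with all others, merge best overlap."""
    contigs = list(reads)
    while len(contigs) > 1:
        best = (0, None, None)
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best[0]:
                        best = (k, i, j)
        k, i, j = best
        if k == 0:
            break  # no remaining overlaps: leave separate contigs
        merged = contigs[i] + contigs[j][k:]
        contigs = [c for idx, c in enumerate(contigs) if idx not in (i, j)]
        contigs.append(merged)
    return contigs

reference = "ATGGCGTACGTTAGC"
reads = ["ATGGCGTA", "GCGTACGTT", "ACGTTAGC"]
positions = map_reads(reference, reads)
contigs = greedy_assemble(reads)
print(positions)
print(contigs)
```

Mapping needs one comparison per read, while the assembly loop compares all pairs of reads in every round, which illustrates why de novo assembly is slower and needs more computational resources.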
Targeted capture – Exome sequencing
http://www.lcsciences.com/applications/genomics/specialty-genomics-microarrays/specialty-genomics-microarrays-application-notes/
Probes
Figure 1. Gnirke et al. (2009) Nature Biotechnology 27:182-189
Capture by array hybridization or solution hybridization
Targets: all exons, microRNAs, candidate regions
RNA-seq
AAAAA
Figure 1. Wang et al. (2009) Nature Reviews Genetics 10: 57-63
Sequencing of all the transcripts in a sample using NGS technologies
RNA-seq mapping of short reads in exon-exon junctions
CCGAAAATCAAGTCATCCCTAAAGACTAAGTAAGTAACCATATTACATTAAGGAAGGCACTTTAAAAGTTTATAATCATTTGTAGACTCCCACCAAAGCCACTGACTCGCAAGG
Exon Exon Intron
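A read that spans an exon-exon junction does not match the genome contiguously, because the intron is missing from the mature transcript. A minimal sketch of split-read mapping follows; the sequences, the `min_anchor` parameter and the function name `map_spliced` are hypothetical, matching is exact, and only the first occurrence of the left part is tried, unlike real spliced aligners.

```python
def map_spliced(read, genome, min_anchor=3):
    """Map a read contiguously if possible, otherwise try splitting it in two."""
    pos = genome.find(read)
    if pos != -1:
        return ("contiguous", pos)
    for i in range(min_anchor, len(read) - min_anchor + 1):
        left = genome.find(read[:i])
        if left == -1:
            continue
        # The right part must map downstream, leaving a gap (the intron).
        right = genome.find(read[i:], left + i + 1)
        if right != -1:
            return ("spliced", left, left + i, right)
    return None

# Invented exon / intron / exon sequences.
exon1, intron, exon2 = "ACGTACGGAT", "GTAAGTTTTTCAG", "CCATGGCTAA"
genome = exon1 + intron + exon2
junction_read = exon1[-5:] + exon2[:5]
hit = map_spliced(junction_read, genome)
print(hit)
```

For a spliced hit, the last two coordinates delimit the skipped genomic segment, so `genome[hit[2]:hit[3]]` is the inferred intron.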
RNA-seq
ADVANTAGES
Does not depend on the availability of a genomic sequence
Detection of new transcripts
Single-nucleotide precision
Detection of splicing variants and alternative transcription starts and ends
Detection of SNPs in transcribed regions
Detection of allele-specific transcription
Accurate quantification of expression levels (wide range of measurements)
Great reproducibility
Small amount of initial RNA needed
ChIP-seq
Figure 1. Massie and Mills (2008) EMBO reports 9: 337-343.
Figure 2. Park (2009) Nature Reviews Genetics 10: 669-680.
Chromatin immunoprecipitation (ChIP) + Sequencing
Detection of transcription factor binding sites and other DNA-protein interactions