genomic techniquesbioinformatica.uab.cat/base/documents/mastergp/genomic... · 2013-05-15 · 3 µm...

48
Genomic techniques Marta Puig Institut de Biotecnologia i Biomedicina Universitat Autònoma de Barcelona

Upload: others

Post on 27-Feb-2020

2 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Genomic techniquesbioinformatica.uab.cat/base/documents/masterGP/Genomic... · 2013-05-15 · 3 µm beads coated with probes are located in microwells on the array surface Each probe

Genomic techniques

Marta Puig Institut de Biotecnologia i Biomedicina

Universitat Autònoma de Barcelona

Page 2: Genomic techniquesbioinformatica.uab.cat/base/documents/masterGP/Genomic... · 2013-05-15 · 3 µm beads coated with probes are located in microwells on the array surface Each probe

The genomic revolution

Whole genome sequences of tens of species and individuals (and more to come!)

Gene expression profiling in multiple conditions, tissues, individuals and species

Mapping of functional regions in the genome

Page 3: Genomic techniquesbioinformatica.uab.cat/base/documents/masterGP/Genomic... · 2013-05-15 · 3 µm beads coated with probes are located in microwells on the array surface Each probe

Genomic techniques usage

0

1000

2000

3000

4000

5000

6000

7000

8000

1995

1996

1997

1998

1999

2000

2001

2002

2003

2004

2005

2006

2007

2008

2009

2010

2011

2012

0

500

1000

1500

2000

2500

3000

Microarrays

Next-generation sequencing

Number of publications:

First description of cDNA microarrays (Schena et al.1995)

First description of oligonucleotide arrays (Lockhart et al. 1996)

Release of the 454 sequencing system (2005)

Page 4: Genomic techniquesbioinformatica.uab.cat/base/documents/masterGP/Genomic... · 2013-05-15 · 3 µm beads coated with probes are located in microwells on the array surface Each probe

Overview

DNA microarrays

Next-generation sequencing technologies

Page 5: Genomic techniquesbioinformatica.uab.cat/base/documents/masterGP/Genomic... · 2013-05-15 · 3 µm beads coated with probes are located in microwells on the array surface Each probe

Microarrays

Page 6: Genomic techniquesbioinformatica.uab.cat/base/documents/masterGP/Genomic... · 2013-05-15 · 3 µm beads coated with probes are located in microwells on the array surface Each probe

Common features of microarrays

Parallelism/high-throughput (thousands of genes analyzed simultaneously)

Miniaturization (small feature and chip size)

Automation (chip manufacture, processing and analysis)

Based on nucleic-acid hybridization and base pairing

Detection and quantification (of hybridized molecules)

• Probes fixed on a glass slide • Labeling of target samples

Page 7: Genomic techniquesbioinformatica.uab.cat/base/documents/masterGP/Genomic... · 2013-05-15 · 3 µm beads coated with probes are located in microwells on the array surface Each probe

A G C T A A C G G T

T A C C G A C A T G

A G C T A A C G G T

A G C T A A C G G T

T A C C G A C A T G

T A C C G A C A T G

Two-sample competitive hybridization 2 colors / relative quantification

Single sample hybridization 1 color / absolute quantification

A G C T A A C G G T

T A C C G A C A T G

A G C T A A C G G T

A G C T A A C G G T

T A C C G A C A T G

T A C C G A C A T G

A T G G C T G T A C

A T G G C T G T A C

A T G G C T G T A C

C G T G C

A T

A

T C

Microarray principles

Page 8: Genomic techniquesbioinformatica.uab.cat/base/documents/masterGP/Genomic... · 2013-05-15 · 3 µm beads coated with probes are located in microwells on the array surface Each probe

Classes of microarrays – probes

Custom/spotted/two-color microarrays (cDNAs, BACs)

High-density oligonucleotide arrays (GeneChip-Affymetrix)

Long oligonucleotide microarrays

- Agilent (25-60 bases)

- Illumina (50 bases)

- Nimblegen (50-75 bases)

Page 9: Genomic techniquesbioinformatica.uab.cat/base/documents/masterGP/Genomic... · 2013-05-15 · 3 µm beads coated with probes are located in microwells on the array surface Each probe

2. Spotting of DNAs at high density onto a glass microscopy slide (5000-10000 spots per slide) and cross-linking to the glass surface

1. Isolation or PCR amplification of DNA fragments (≈1-200 Kb) including the genes or regions of

interest

Custom microarrays technology

Page 10: Genomic techniquesbioinformatica.uab.cat/base/documents/masterGP/Genomic... · 2013-05-15 · 3 µm beads coated with probes are located in microwells on the array surface Each probe

probe (on chip)

sample (labeled)

3. Two independent mRNA or DNA samples are fluorescently labeled with Cy3 (green) or Cy5 (red)

4. The two labeled populations are combined in equal amounts and hybridized to the array

5. A laser scans the slide and calculates the ratio of fluorescence intensities between the two samples

Cy3 Cy5

Custom microarrays technology

Higher expression in sample 1

Expression in both samples

Higher expression in sample 2

Page 11: Genomic techniquesbioinformatica.uab.cat/base/documents/masterGP/Genomic... · 2013-05-15 · 3 µm beads coated with probes are located in microwells on the array surface Each probe

Gene expression arrays

The entire array is formed by >500,000 cells, each containing a different oligo

The array elements are a series of 25-mer oligos designed from known sequence and synthesized directly on the surface

Page 12: Genomic techniquesbioinformatica.uab.cat/base/documents/masterGP/Genomic... · 2013-05-15 · 3 µm beads coated with probes are located in microwells on the array surface Each probe

Gene expression arrays

Page 13: Genomic techniquesbioinformatica.uab.cat/base/documents/masterGP/Genomic... · 2013-05-15 · 3 µm beads coated with probes are located in microwells on the array surface Each probe

Detection of transcription and quantification of expression levels

Short oligonucleotides (25 bases)

Multiple probes used for each transcript

Probes hybridize to the exons located near the 3’ end of the analyzed transcript

End of CDS or 3’ UTR (100-500 bp)

High expression level

Intermediate expression level

No expression

Gene expression arrays

Page 14: Genomic techniquesbioinformatica.uab.cat/base/documents/masterGP/Genomic... · 2013-05-15 · 3 µm beads coated with probes are located in microwells on the array surface Each probe

3 µm beads coated with probes are located in microwells on the array surface

Each probe has an address to identify the location of each bead through an hybridization-based procedure

High redundancy (30 beads/probe)

This technology can be used to create different types of arrays (gene expression, SNP genotyping…)

Illumina Bead arrays

50 bases 29 bases

http://www.illumina.com/technology/direct_hybridization_assay.ilmn

Page 15: Genomic techniquesbioinformatica.uab.cat/base/documents/masterGP/Genomic... · 2013-05-15 · 3 µm beads coated with probes are located in microwells on the array surface Each probe

Figure 3. Shoemaker et al. (2001) Nature 409, 922-927

Long oligonucleotides (60 bases)

Overlapping probes that cover completely the region of interest

Identification of new transcribed sequences

Characterization of a novel testis transcript using tiling arrays

Genome tiling arrays

Page 16: Genomic techniquesbioinformatica.uab.cat/base/documents/masterGP/Genomic... · 2013-05-15 · 3 µm beads coated with probes are located in microwells on the array surface Each probe

Figure 1. Cáceres et al. (2003) PNAS 100: 13030–13035.

Detection of a many genes with a higher expression levels in human brain in comparison with chimpanzee and macaque

Heat map

High Low

Gene1

Gene2

Sam

ple

A

Sam

ple

B

Gene3

Graphical representation of microarray results

Page 17: Genomic techniquesbioinformatica.uab.cat/base/documents/masterGP/Genomic... · 2013-05-15 · 3 µm beads coated with probes are located in microwells on the array surface Each probe

SNP arrays

Genotyping of SNPs

DNA is hybridized to the microarray

Can genotype millions of SNPs in a single experiment

Figure 2. LaFramboise (2009) Nucleic Acids Research 37: 4181–4193

Single-base extension

Page 18: Genomic techniquesbioinformatica.uab.cat/base/documents/masterGP/Genomic... · 2013-05-15 · 3 µm beads coated with probes are located in microwells on the array surface Each probe

Applications of microarrays

Measuring transcript abundance (expression arrays)

Identification of transcribed sequences (tiling arrays)

Analysis of alternative splicing (exon or exon-junction arrays)

SNP genotyping (SNP oligonucleotide arrays)

Estimating DNA copy number (CGH on BAC or oligonucleotide arrays)

Identifying protein binding sites (ChIP-chip on tiling or oligonucleotide arrays)

Detection of epigenetic modifications (ChIP-chip on tiling or oligonucleotide arrays)

RNA

DNA

Chromatin

Page 19: Genomic techniquesbioinformatica.uab.cat/base/documents/masterGP/Genomic... · 2013-05-15 · 3 µm beads coated with probes are located in microwells on the array surface Each probe

Limitations of microarrays

Reliance on available genomic sequence and gene annotations

Cross hybridization

Limited dynamic range and saturation

High amount of RNA required

High cost

Page 20: Genomic techniquesbioinformatica.uab.cat/base/documents/masterGP/Genomic... · 2013-05-15 · 3 µm beads coated with probes are located in microwells on the array surface Each probe

Sequencing methods

Page 21: Genomic techniquesbioinformatica.uab.cat/base/documents/masterGP/Genomic... · 2013-05-15 · 3 µm beads coated with probes are located in microwells on the array surface Each probe

1. DNA amplification

DNA

Denaturalization +

Primer annealing

Cloning PCR

Seqüenciació clàssica pel mètode Sanger

Taq

Taq

Copy of template DNA with a thermotolerant DNA polymerase

Amplified fragment

Plasmid vector

DNA fragments

Cloning

+

Transformation into bacteria

Isolation of plasmid DNA

Sanger – Classical sequencing

Seve

ral c

ycle

s

Page 22: Genomic techniquesbioinformatica.uab.cat/base/documents/masterGP/Genomic... · 2013-05-15 · 3 µm beads coated with probes are located in microwells on the array surface Each probe

Did RESULTS

Long reads (500-1000 pb)

Small scale (96 reactions/run)

Separation by size in a polyacrylamide gel

2. Sequencing reaction

Fluorescence detection +

3. Capillary electrophoresis

Cromatograma

C T A T G C T C G

No other nucleotide can be added.

Each dideoxynucleotide is labeled with a different fluorophore.

Dideoxynucleotides (ddNTPs)

Figure 1. Shendure and Ji (2008) Nature Biotechnology 26: 1135-1145.

Primer

Sanger – Classical sequencing

Page 23: Genomic techniquesbioinformatica.uab.cat/base/documents/masterGP/Genomic... · 2013-05-15 · 3 µm beads coated with probes are located in microwells on the array surface Each probe
Page 24: Genomic techniquesbioinformatica.uab.cat/base/documents/masterGP/Genomic... · 2013-05-15 · 3 µm beads coated with probes are located in microwells on the array surface Each probe

Mètodes de seqüenciació de segona i tercera generació

2nd and 3rd generation sequencing technologies

Page 25: Genomic techniquesbioinformatica.uab.cat/base/documents/masterGP/Genomic... · 2013-05-15 · 3 µm beads coated with probes are located in microwells on the array surface Each probe

454/Roche – Pyrosequencing

Illumina – Reversible termination

SOLiD – Sequencing by ligation

2nd generation sequencing technologies

Page 26: Genomic techniquesbioinformatica.uab.cat/base/documents/masterGP/Genomic... · 2013-05-15 · 3 µm beads coated with probes are located in microwells on the array surface Each probe

1. Preparation of sequencing library with adapters

2. Solid-phase amplification

3. Sequencing reaction

DNA fragments of proper size

SEQUENCING LIBRARY

DNA

Adapters Adapter ligation

Fragmentation

MASSIVELY PARALLEL SEQUENCING METHODS

Common steps in next-generation methods

Page 27: Genomic techniquesbioinformatica.uab.cat/base/documents/masterGP/Genomic... · 2013-05-15 · 3 µm beads coated with probes are located in microwells on the array surface Each probe

1. DNA fragmentation and adapter ligation

2. Emulsion PCR within water-in-oil droplets

LIBRARY

beads

Figures 1 and 2. Shendure and Ji (2008) Nature Biotechnology 26: 1135-1145.

454/Roche – Pyrosequencing

Page 28: Genomic techniquesbioinformatica.uab.cat/base/documents/masterGP/Genomic... · 2013-05-15 · 3 µm beads coated with probes are located in microwells on the array surface Each probe

PCR en emulsió

http://www.appliedbiosystems.com/absite/us/en/home/applications-technologies/solid-next-generation-sequencing/next-generation-systems/solid-sequencing-chemistry.html

DNA template Primers

DNA polymerase Beads + P1

Oil

Water droplet

Emulsion PCR

RESULT

Page 29: Genomic techniquesbioinformatica.uab.cat/base/documents/masterGP/Genomic... · 2013-05-15 · 3 µm beads coated with probes are located in microwells on the array surface Each probe

3. Bead distribution in individual wells 4. Pyrosequencing

454/Roche – Piroseqüenciació

Figure 3. Metzker (2010) Nature Reviews Genetics 11: 31-46. Figure 1. England and Pettersson (2005) Nature Methods 2: Application Note

454/Roche – Pyrosequencing

Read length = 250-500 bp >1 million reads per run

Page 30: Genomic techniquesbioinformatica.uab.cat/base/documents/masterGP/Genomic... · 2013-05-15 · 3 µm beads coated with probes are located in microwells on the array surface Each probe

2. Solid-phase amplification and cluster generation by bridge PCR

Illumina – Terminació reversible

1. DNA fragmentation and adapter ligation

Figures 1 and 2. Shendure and Ji (2008) Nature Biotechnology 26: 1135-1145.

Illumina – Reversible termination

LIBRARY

>100 million reads per run

Page 31: Genomic techniquesbioinformatica.uab.cat/base/documents/masterGP/Genomic... · 2013-05-15 · 3 µm beads coated with probes are located in microwells on the array surface Each probe

3. Flowing of fluorescent reversible terminator dNTPs and incorporation of a single base per cycle

4. Reading of the identity of each base of a cluster from sequential images taken after each nucleotide incorporation

Figure 2. Metzker (2010) Nature Reviews Genetics 11: 31-46.

Illumina – Reversible termination

Read length = 75-150 bp

Page 32: Genomic techniquesbioinformatica.uab.cat/base/documents/masterGP/Genomic... · 2013-05-15 · 3 µm beads coated with probes are located in microwells on the array surface Each probe

Amplification by emulsion PCR

Oligonucleotides labelled with different fluorescence according to the two first bases

Hybridization and ligation to the initial sequencing primer

Image capture, elimination of fluorophore and repetition for several cycles

Change of the initial primer for one slightly displaced to restart the ligation cycles

Bases sequenced in non-consecutive pairs

Figure 3. Metzker (2010) Nature Reviews Genetics 11: 31-46.

SOLiD – Sequencing by ligation

100-500 million reads per run

http://www.youtube.com/watch?v=nlvyF8bFDwM

Page 33: Genomic techniquesbioinformatica.uab.cat/base/documents/masterGP/Genomic... · 2013-05-15 · 3 µm beads coated with probes are located in microwells on the array surface Each probe

http://www.appliedbiosystems.com/absite/us/en/home/applications-technologies/solid-next-generation-sequencing/next-generation-systems/solid-sequencing-chemistry.html

A

SOLiD – Sequencing by ligation

Read length = 50-100 bp

Page 34: Genomic techniquesbioinformatica.uab.cat/base/documents/masterGP/Genomic... · 2013-05-15 · 3 µm beads coated with probes are located in microwells on the array surface Each probe

Practical exercise

Sanger

454/Roche

Illumina

SOLiD

SOLiD

LEGEND

How will an heterozygote SNP be detected during the sequencing of a genome?

(The first nucleotide is an A)

Page 35: Genomic techniquesbioinformatica.uab.cat/base/documents/masterGP/Genomic... · 2013-05-15 · 3 µm beads coated with probes are located in microwells on the array surface Each probe

Increase read length

Improve sequence accuracy

Single-molecule sequencing (no amplification)

De novo assembly of complex genomes

Challenges of existing NGS methods

Page 36: Genomic techniquesbioinformatica.uab.cat/base/documents/masterGP/Genomic... · 2013-05-15 · 3 µm beads coated with probes are located in microwells on the array surface Each probe

Third generation sequencing method

Amplification by emulsion PCR

Each bead is located in a well of a semiconductor chip connected to an ion-sensitive layer able to detect changes in pH

http://www.iontorrent.com/technology-how-does-it-work-more/ Figure 1. Rothberg et al. (2011) Nature 475:348-52.

Ion Torrent – Semiconductor sequencing

Page 37: Genomic techniquesbioinformatica.uab.cat/base/documents/masterGP/Genomic... · 2013-05-15 · 3 µm beads coated with probes are located in microwells on the array surface Each probe

Ion Torrent – Semiconductor sequencing

Unlabelled nucleotides flow sequentially through the chip

Incorporation to DNA strand in synthesis causes the release of a proton. Detection of pH change within the well allows the conversion of chemical information into digital information

Non-optical DNA sequencing (cheaper, faster)

1 million sensors

6 million sensors

11 million sensors

Page 38: Genomic techniquesbioinformatica.uab.cat/base/documents/masterGP/Genomic... · 2013-05-15 · 3 µm beads coated with probes are located in microwells on the array surface Each probe

Figure 4. Metzker (2010) Nature Reviews Genetics 11: 31-46.

Single-molecule with no amplification step

Immobilized DNA polymerase

Fluorescence pulse of the corresponding color when a nucleotide is incorporated

Pacific Biosciences – Real-time sequencing

Page 39: Genomic techniquesbioinformatica.uab.cat/base/documents/masterGP/Genomic... · 2013-05-15 · 3 µm beads coated with probes are located in microwells on the array surface Each probe

Table 1. Kircher and Kelso (2010) Bioessays 32: 524-536.

Comparison of sequencing technologies

Page 40: Genomic techniquesbioinformatica.uab.cat/base/documents/masterGP/Genomic... · 2013-05-15 · 3 µm beads coated with probes are located in microwells on the array surface Each probe

Roche/454 Illumina SOLiD Pacific Biosciences

Library amplification method

emPCR Bridge PCR emPCR NA (single molecule detection)

Sequencing method

Polymerase-mediated incorporation of unlabelled nucleotides

Polymerase-mediated incorporation of end-blocked fluorescent nucleotides

Ligase-mediated addition of 2-base encoded fluorescent oligonucleotides

Polymerase-mediated incorporation of terminal phosphate labelled fluorescent nucleotides

Detection method

Light emitted from secondary reactions initiated by release of phosphates (PPi)

Fluorescent emission from incorporated dye-labelled nucleotides

Fluorescent emission from ligated dye-labelled oligonucleotides

Real time detection of fluorescent dye in polymerase active site during incorporation

Post incorporation method

NA

Chemical cleavage of fluorescent dye and 3′ blocking group

Chemical cleavage removes fluorescent dye and 3′ end of oligonucleotide

NA

Error model Substitution errors rare, insertion/deletion errors at homopolymers

End of read substitution errors

End of read substitution errors

Random insertion/deletion errors

Read length 400 bp 150 bp 75 bp >1,000 bp

Comparison of sequencing technologies

Based on Table 1. Mardis, ER (2011) Nature 470: 198-203. NA = not applicable

Page 41: Genomic techniquesbioinformatica.uab.cat/base/documents/masterGP/Genomic... · 2013-05-15 · 3 µm beads coated with probes are located in microwells on the array surface Each probe

Applications of NGS technologies

Application Objective

Complete genome resequencing Discovery of variants among individuals

Exome sequencing Resequencing of specific regions within a genome

RNA-seq Sequencing of transcripts and quantification of gene expression levels

ChIP-seq Genome-wide mapping of protein-DNA interactions

Paired-end sequencing Discovery of structural variants

Metagenomics Sequencing of all the genomes in a given ecological habitat

Page 42: Genomic techniquesbioinformatica.uab.cat/base/documents/masterGP/Genomic... · 2013-05-15 · 3 µm beads coated with probes are located in microwells on the array surface Each probe

Reference mapping De novo assembly

Assembly of reads by similarity with a reference genome sequence

Assembly of reads based on the overlap of their sequences

Comparison of each read with the reference sequence

Comparison of each read with all the other reads

New sequences and structural rearrangements can NOT be detected

New sequences and structural rearrangements CAN be detected

Not applicable the first time a genome is sequenced

More complex, slower and requires more computational resources

AGTATTAGCCTTAAGAAATTAAGTATTAGAA

AAAAA CTTAAGAAATTAAGTATTAGAA

AAAAATTAAGTATTAG AAGTATTAGAA

AAAAATTAAGTATTAGCCTTA TTAGAA

AAAAATTAAGTATTAGCCTTAAGAAAT

GCCTTAAGAAATTAAGTATTAGAA

AAA AAATTAAGTATTAGAA

AAAAATTAA AAGTATTAGAA

AAAAATTAAGTATTAG

AAAAATTAAGTATTAGCCTTAAG

Complete genome sequencing – Assembly

Page 43: Genomic techniquesbioinformatica.uab.cat/base/documents/masterGP/Genomic... · 2013-05-15 · 3 µm beads coated with probes are located in microwells on the array surface Each probe

Targeted capture – Exome sequencing

http://www.lcsciences.com/applications/genomics/specialty-genomics-microarrays/specialty-genomics-microarrays-application-notes/

Probes

Figure 1. Gnirke et al. (2009) Nature Biotechnology 27:182-189

Array hybridization Solution hybridization All exons

microRNAs

Candidate regions

Page 44: Genomic techniquesbioinformatica.uab.cat/base/documents/masterGP/Genomic... · 2013-05-15 · 3 µm beads coated with probes are located in microwells on the array surface Each probe

RNA-seq

AAAAA

Figure 1. Wang et al. (2009) Nature Reviews Genetics 10: 57-63

Sequencing of all the transcripts in a sample using NGS technologies

Page 45: Genomic techniquesbioinformatica.uab.cat/base/documents/masterGP/Genomic... · 2013-05-15 · 3 µm beads coated with probes are located in microwells on the array surface Each probe

RNA-seq mapping of short reads in exon-exon junctions

CCGAAAATCAAGTCATCCCTAAAGACTAAGTAAGTAACCATATTACATTAAGGAAGGCACTTTAAAAGTTTATAATCATTTGTAGACTCCCACCAAAGCCACTGACTCGCAAGG

Exon Exon Intron

RNA-seq

Page 46: Genomic techniquesbioinformatica.uab.cat/base/documents/masterGP/Genomic... · 2013-05-15 · 3 µm beads coated with probes are located in microwells on the array surface Each probe

ADVANTAGES Independence of the existence of an available genomic sequence

Detection of new transcripts

Single-nucleotide precision

Detection of splicing variants and alternative transcription starts and ends

Detection of SNPs in transcribed regions

Detection of allele-specific transcription

Accurate quantification of expression levels (wide range of measurements)

Great reproducibility

Small amount of initial RNA needed

RNA-seq

Page 47: Genomic techniquesbioinformatica.uab.cat/base/documents/masterGP/Genomic... · 2013-05-15 · 3 µm beads coated with probes are located in microwells on the array surface Each probe

ChIP-seq

Figure 1. Massie and Mills (2008) EMBO reports 9: 337-343. Figure 2. Park (2009) Nature Reviews Genetics 10: 669-680.

Chromatin immunoprecipitation (ChIP) + Sequencing

Detection of transcription factor binding sites and other DNA-protein interactions

Page 48: Genomic techniquesbioinformatica.uab.cat/base/documents/masterGP/Genomic... · 2013-05-15 · 3 µm beads coated with probes are located in microwells on the array surface Each probe