Department of Animal Sciences Department of Biostatistics & Medical Informatics University of Wisconsin - Madison Guilherme J. M. Rosa Microarrays and RNAseq: Technologies and Data Processing


Department of Animal Sciences Department of Biostatistics & Medical Informatics

University of Wisconsin - Madison

Guilherme J. M. Rosa

Microarrays and RNAseq: Technologies and Data Processing


OUTLINE

• Introduction
  – Central dogma of molecular biology
• Transcriptional Profiling Technologies
  – Earlier methods
  – Microarrays
  – RT-PCR
  – RNA-Seq
• Data Acquisition and Data Pre-processing


CENTRAL DOGMA OF MOLECULAR BIOLOGY


[Diagram labels: Genetics, Environment, Gene expression, Phenotype]


GENE EXPRESSION ASSAY TECHNOLOGIES (TRANSCRIPTION LEVEL):

• Southern blotting and Northern blotting
• Microarrays
• RT-PCR
• RNAseq


SOUTHERN BLOTTING

Detection of specific DNA fragments by gel-transfer hybridization:

1) The mixture of double-stranded DNA fragments generated by restriction nuclease treatment of DNA is separated according to length by electrophoresis.
2) A sheet of either nitrocellulose paper or nylon paper is laid over the gel, and the separated DNA fragments are transferred to the sheet by blotting.
3) The gel is supported on a layer of sponge in a bath of alkali solution, and the buffer is sucked through the gel and the nitrocellulose paper by paper towels stacked on top of the nitrocellulose.
4) As the buffer is sucked through, it denatures the DNA and transfers the single-stranded fragments from the gel to the surface of the nitrocellulose sheet, where they adhere firmly. (This transfer is necessary to keep the DNA firmly in place while the hybridization procedure is carried out.)
5) The nitrocellulose sheet is carefully peeled off the gel.
6) The sheet containing the bound single-stranded DNA fragments is placed in a sealed container together with buffer containing a radioactively labeled DNA probe specific for the required DNA sequence.
7) The sheet is exposed for a prolonged period to the probe under conditions favoring hybridization.
8) The sheet is removed from the container and washed thoroughly, so that only probe molecules that have hybridized to the DNA on the paper remain attached.
9) After autoradiography, the DNA that has hybridized to the labeled probe will show up as bands on the autoradiograph.


SOUTHERN BLOTTING

•  An adaptation of this technique to detect specific sequences in RNA is called Northern blotting. In this case mRNA molecules are electrophoresed through the gel and the probe is usually a single-stranded DNA molecule.

•  Northern blots allow investigators to determine the molecular weight of an mRNA and to measure relative amounts of the mRNA present in different samples.


MICROARRAY TECHNOLOGY

Microarrays use the natural chemical attraction between DNA and RNA molecules to determine the expression level of genes.

• C pairs with G, and A pairs with T (in DNA) or U (in RNA)
• A good match sticks, a bad match doesn't
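As a toy illustration of the base-pairing rule on this slide, the sketch below checks whether a DNA probe is perfectly complementary to an RNA target. The function name and the example sequences are hypothetical, and real hybridization is not all-or-nothing: it depends on temperature, salt concentration, and sequence composition, none of which is modeled here.

```python
# Watson-Crick partner (in RNA) for each base of a DNA probe.
DNA_TO_RNA_PARTNER = {"A": "U", "T": "A", "C": "G", "G": "C"}


def is_perfect_match(dna_probe: str, rna_target: str) -> bool:
    """Return True if the probe pairs base-by-base with the target.

    The two strands pair in antiparallel orientation, so the probe is
    compared against the reversed target.
    """
    if len(dna_probe) != len(rna_target):
        return False
    return all(
        DNA_TO_RNA_PARTNER.get(p) == t
        for p, t in zip(dna_probe, reversed(rna_target))
    )


print(is_perfect_match("TACG", "CGUA"))  # True: every base finds its partner
print(is_perfect_match("TACG", "CGUU"))  # False: one mismatch
```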


MICROARRAY TECHNOLOGY

Homemade arrays: PCR products or pre-synthesized oligonucleotide probes are spotted using robot technology; two-color system

Affymetrix: high-density oligonucleotide array, short (25-mer) oligos synthesized in situ (photolithography); single-channel

Agilent (HP): pre-synthesized oligonucleotide (60-mer) probes are printed using inkjet technology; two-color system

Illumina: pre-synthesized oligonucleotide (50-mer); single-channel, multiple arrays (6 or 8) per slide

NimbleGen: pre-synthesized oligonucleotide (60-mer); multiple probes per gene; 4-plex (4 samples per array)

Variations: kind of probe (PCR product or oligos), length of oligos, how probes are deposited on the slide, pre-synthesized or synthesized in situ, number of samples co-hybridized on each slide

TWO COLOR VS. SINGLE CHANNEL SYSTEMS


MICROARRAY PLATFORMS

[Figure panels: single-channel systems; two-color systems]


Multi-step process to extract RNA from the sample and make millions of copies.

Chop up the RNA

At the same time the RNA is copied, molecules of a chemical called biotin (orange cups) are attached to each strand. These biotin molecules will act as a molecular glue for fluorescent molecules.


Wash Sample Over the Array


Fluorescent stain that sticks to the biotin

The amount of fluorescent stain is proportional to the number of RNA molecules that hybridized to the DNA probe


Comparing Gene Expression between “loud speakers” and “normal speakers”


[Flow diagram: Biological question (differentially expressed genes, sample class prediction, etc.) → Experimental design → Microarray experiment → Image analysis → Normalization → Estimation, Testing, Clustering, Discrimination → Biological verification and interpretation]


Microarray Technology

Two-color systems


MICROARRAY TECHNOLOGY TWO-COLOR PLATFORMS (cDNA or LONG OLIGOS)


IMAGE ANALYSIS

An Actual Gene Expression Image

Example: 4 x 12 patches (print tips), 19 x 19 spots per patch


STEPS IN IMAGE PROCESSING

1. Addressing: locate the spot centers.

2. Segmentation: classify pixels as either signal or background (e.g., using seeded region growing).

3. Information extraction: for each spot of the array, calculate signal intensity pairs, background, and quality measures.

Some image analysis software: ArrayWorx, Dapple, GenePix, ImaGene, ScanAlyze, Spot, UCSF Spot, etc.


SEGMENTATION METHODS

• Fixed circles
• Adaptive circle
• Adaptive shape
  – Edge detection
  – Seeded region growing
• Histogram methods
  – Adaptive threshold
• Clustering algorithms
  – Robust to “sickle-cell” and “donut-shaped” spots

[Figure panels: Fixed Circle; Seeded Region Growing (Yang et al., 2002)]
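To make the simplest of these methods concrete, here is a minimal sketch of fixed-circle segmentation on a small patch of pixel intensities. It assumes the spot center is already known from the addressing step; the function name, the 5 × 5 patch, and the radius are made up for illustration and do not come from any of the software packages named in these slides.

```python
def fixed_circle_segment(image, center_row, center_col, radius):
    """Split the pixels of a patch into (foreground, background) lists.

    Pixels within `radius` of the spot center count as signal; every
    other pixel in the patch counts as background.
    """
    foreground, background = [], []
    for r, row in enumerate(image):
        for c, value in enumerate(row):
            inside = (r - center_row) ** 2 + (c - center_col) ** 2 <= radius ** 2
            (foreground if inside else background).append(value)
    return foreground, background


# Hypothetical 5 x 5 patch with a bright spot in the middle.
patch = [
    [10, 12, 11, 10, 13],
    [11, 50, 90, 55, 12],
    [10, 95, 99, 92, 11],
    [12, 52, 91, 54, 10],
    [11, 10, 12, 11, 12],
]
fg, bg = fixed_circle_segment(patch, center_row=2, center_col=2, radius=1)
print(sorted(fg))  # [90, 91, 92, 95, 99]
print(len(bg))     # 20
```

With a radius of 1 the circle misses the dimmer corner pixels of the spot (50 to 55), which end up in the background. That is the kind of error the adaptive methods in the list are meant to reduce.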


SOME LOCAL BACKGROUNDS

[Figure panels: local background regions used by GenePix, QuantArray, ScanAlyze, and GeneTAC LS IV]

The background adjustment method is more important than the segmentation method (Yang et al., 2002)


QUANTIFICATION OF EXPRESSION

• For each spot on the slide we may calculate:

Red intensity = Rfg - Rbg

Green intensity = Gfg - Gbg

(fg = foreground, bg = background)

• And combine them in the log (base 2) ratio:

log2(Red intensity / Green intensity)
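The calculation above can be sketched in a few lines. The function name and the intensity values are illustrative assumptions. Returning `None` when a background-corrected intensity is not positive is only one possible convention, since the slide does not say how such spots are handled.

```python
import math


def log2_ratio(r_fg, r_bg, g_fg, g_bg):
    """Background-corrected log2(red / green) for one spot."""
    red = r_fg - r_bg
    green = g_fg - g_bg
    if red <= 0 or green <= 0:
        return None  # log ratio undefined (assumed convention)
    return math.log2(red / green)


# Red signal 1000 vs. green signal 500 after background subtraction.
print(log2_ratio(r_fg=1100, r_bg=100, g_fg=600, g_bg=100))  # 1.0
```

A log2 ratio of 1 corresponds to a two-fold difference, and a ratio of 0 to equal intensities.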


Data pre-processing (Normalization)


Sources of Variability in Microarray Experiments (Geschwind, 2001)

• Biological heterogeneity
• Specimen collection/handling effects
• Biological heterogeneity within specimen
• RNA extraction/amplification
• Fluor labeling
• Hybridization
• Size/shape of spot
• Sample distribution across slide
• Scanning (voltage/power/software)


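DYE-INTENSITY BIAS

M-A Plot (log intensity ratio vs. mean log-intensity)

$$M = \log\left(\frac{\mathrm{Cy3}}{\mathrm{Cy5}}\right) = \log(\mathrm{Cy3}) - \log(\mathrm{Cy5})$$

$$A = \log\sqrt{\mathrm{Cy3} \times \mathrm{Cy5}} = \frac{\log(\mathrm{Cy3}) + \log(\mathrm{Cy5})}{2}$$

[Plot: M on the vertical axis against A on the horizontal axis (low to high intensities). Ideal situation: points scattered around M = 0. Points with M > 0 have Cy3 > Cy5; points with M < 0 have Cy3 < Cy5.]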
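A small worked example of the two coordinates of the M-A plot follows. The slide writes "log" without a base; base 2 is assumed here because the log ratio earlier in the deck uses it. The function name and the intensities are hypothetical.

```python
import math


def ma_values(cy3, cy5):
    """Return (M, A) for one spot from its two channel intensities."""
    log_cy3, log_cy5 = math.log2(cy3), math.log2(cy5)
    m = log_cy3 - log_cy5        # log intensity ratio
    a = (log_cy3 + log_cy5) / 2  # mean log intensity
    return m, a


print(ma_values(cy3=1024, cy5=256))  # (2.0, 9.0)
```

Swapping the two channels flips the sign of M and leaves A unchanged, which is why a dye bias shows up as a systematic departure of the point cloud from M = 0.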


M-A Plot

DYE-INTENSITY BIAS


LO(W)ESS: Locally-Weighted Regression and Smoothing Scatterplots

Basic idea:

• For x = x0 (the center of the neighborhood), specify a neighborhood, defined by a certain radius or by a smoothing parameter (measured as a percentage of the data points)

• Use weighted least squares (with weights given by a decreasing function of the distances from x0) to fit linear or quadratic functions at x0
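The basic idea can be sketched as a single local fit at one point x0. This is a minimal illustration, not the implementation behind the plots in these slides. It assumes a local linear (not quadratic) fit, a neighborhood defined by the nearest `span` fraction of points, and tricube weights, which are a common choice of decreasing weight function but are not specified on the slide.

```python
def loess_fit(x, y, x0, span=0.5):
    """Weighted local linear fit at x0; returns the fitted value at x0."""
    n = len(x)
    k = min(n, max(2, int(round(span * n))))  # neighborhood size
    dists = sorted(abs(xi - x0) for xi in x)
    radius = dists[k - 1] or 1e-12            # distance to the k-th nearest point
    # Tricube weights: decrease with distance from x0, zero outside the radius.
    w = [max(0.0, 1 - (abs(xi - x0) / radius) ** 3) ** 3 for xi in x]
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx if sxx > 0 else 0.0
    return my + slope * (x0 - mx)


# On exactly linear data the local fit reproduces the line.
xs = [float(i) for i in range(10)]
ys = [2 * xi + 1 for xi in xs]
print(round(loess_fit(xs, ys, x0=4.5), 6))  # 10.0
```

Evaluating `loess_fit` at every observed A value, with M as the response, gives the fitted curve used for intensity-dependent normalization.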


M-A Plot

[Plot: M against A, showing the M = 0 line and the fitted curve]


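Normalization

• Normalized M, where $\hat{M}$ is the loess-predicted M:

$$M^* = M - \hat{M}$$

• Adjusted values (normalized intensities):

$$\log(\mathrm{Cy3})^* = A + \frac{M^*}{2} \quad \text{and} \quad \log(\mathrm{Cy5})^* = A - \frac{M^*}{2}$$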
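As a worked example of this adjustment, the sketch below normalizes a single spot given its M, its A, and a loess-predicted value for M. The function name and the numbers are illustrative; in practice the predicted value would come from a loess fit of M on A over all spots of the array.

```python
def normalize_spot(m, a, m_hat):
    """Return (M*, log(Cy3)*, log(Cy5)*) for one spot."""
    m_star = m - m_hat           # remove the intensity-dependent dye bias
    log_cy3_star = a + m_star / 2
    log_cy5_star = a - m_star / 2
    return m_star, log_cy3_star, log_cy5_star


print(normalize_spot(m=1.5, a=9.0, m_hat=0.5))  # (1.0, 9.5, 8.5)
```

The adjusted log intensities still average to A and differ by M*, so the normalization changes the ratio but not the overall intensity of the spot.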


Normalization

• LOESS (Local Regression)

[Figure: M-A plots before and after normalization]


M-A plots for two arrays (each column), before and after quantile normalization (each row).

Large points represent 20 spiked-in probes

Irizarry et al. (2003)


PRINT-TIP SPECIFIC NORMALIZATION

SCALE NORMALIZATION

• Some scale adjustments may be required so that the relative expression levels from one particular experiment (slide) do not dominate the average relative expression levels across replicate experiments (Yang et al., 2002).


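WITHIN-SLIDE SCALE NORMALIZATION

Scale normalization across print-tips

Let $M_{ij}$ be the jth log-ratio in the ith print-tip ($i = 1, \ldots, I$; $j = 1, \ldots, n_i$), with

$$M_{ij} \sim N(0, a_i^2 \sigma^2)$$

Estimate of the scale factor $a_i$:

$$\hat{a}_i^2 = \frac{\sum_{j=1}^{n_i} M_{ij}^2}{\sqrt[I]{\prod_{k=1}^{I} \sum_{j=1}^{n_k} M_{kj}^2}}$$

Normalized values:

$$M_{ij}^* = \frac{M_{ij}}{\hat{a}_i}$$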
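The scale adjustment can be sketched as follows, assuming the estimate takes the form of each print-tip's sum of squared log-ratios divided by the geometric mean of those sums over all print-tips, with every print-tip holding the same number of spots. The function names and the toy log-ratios are hypothetical.

```python
import math


def print_tip_scale_factors(groups):
    """Estimate one scale factor per print-tip.

    `groups` is a list of lists of log-ratios, one inner list per
    print-tip. Each sum of squares must be positive.
    """
    sums = [sum(m * m for m in g) for g in groups]
    # Geometric mean of the sums of squares (I-th root of their product).
    geo_mean = math.exp(sum(math.log(s) for s in sums) / len(sums))
    return [math.sqrt(s / geo_mean) for s in sums]


def scale_normalize(groups):
    """Divide each log-ratio by the scale factor of its print-tip."""
    factors = print_tip_scale_factors(groups)
    return [[m / a for m in g] for g, a in zip(groups, factors)]


# Two hypothetical print-tips; the first is twice as spread out.
tips = [[1.0, -1.0, 2.0, -2.0], [0.5, -0.5, 1.0, -1.0]]
factors = print_tip_scale_factors(tips)
normalized = scale_normalize(tips)
print([round(a, 4) for a in factors])  # [1.4142, 0.7071]
```

By construction the factors multiply to 1, so the adjustment equalizes the spread across print-tips without changing the overall scale of the slide.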


SPATIAL EFFECTS ON SLIDE

Top 2.5% of ratios red, bottom 2.5% of ratios green


SPATIAL EFFECTS ON SLIDE

SPATIAL EFFECTS ON SLIDE

For each patch (or print tip), visualize the intensities of control genes, or use “robust” measures such as the median and trimmed means.

Trimmed mean ($\bar{x}_\alpha$): the mean after eliminating the 100α% smallest and the 100α% largest values, i.e., the mean of the middle 100(1 − 2α)% of the values.
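A short sketch of the trimmed mean defined above. Rounding the number of trimmed values down to an integer is an assumption made here for simplicity (other conventions exist), and the function name and data are illustrative.

```python
def trimmed_mean(values, alpha):
    """Mean after dropping the 100*alpha% smallest and largest values."""
    if not 0 <= alpha < 0.5:
        raise ValueError("alpha must be in [0, 0.5)")
    data = sorted(values)
    k = int(alpha * len(data))  # number of values dropped from each tail
    kept = data[k:len(data) - k]
    return sum(kept) / len(kept)


intensities = [1, 2, 3, 4, 100]  # one outlying spot
print(trimmed_mean(intensities, alpha=0.2))  # 3.0
print(sum(intensities) / len(intensities))   # 22.0 (ordinary mean)
```

The single outlier moves the ordinary mean far from the bulk of the data but has no effect on the trimmed mean, which is the sense in which the measure is robust.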


SPATIAL TREND (‘Median filter’) (Wilson et al., 2003)

• Median log ratio over the spatial neighborhood of each spot (e.g., a 3 × 3 block)

• Patch (print-tip) within array, and spot within patch, may be included in the model for the analysis of the data (we’ll see it later)
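The median filter can be sketched as below for a grid of spot log-ratios. The treatment of edge spots (using only the neighbors that exist) and the function name are assumptions of this sketch, not details taken from Wilson et al. (2003).

```python
from statistics import median


def spatial_median_trend(log_ratios, half_width=1):
    """Median log-ratio over the block around each spot.

    `log_ratios` is a 2-D list (rows x columns). `half_width=1` gives a
    3 x 3 block; blocks are truncated at the edges of the grid.
    """
    n_rows, n_cols = len(log_ratios), len(log_ratios[0])
    trend = []
    for r in range(n_rows):
        row_out = []
        for c in range(n_cols):
            block = [
                log_ratios[i][j]
                for i in range(max(0, r - half_width), min(n_rows, r + half_width + 1))
                for j in range(max(0, c - half_width), min(n_cols, c + half_width + 1))
            ]
            row_out.append(median(block))
        trend.append(row_out)
    return trend


# Hypothetical 3 x 3 grid with one strongly expressed spot in the center.
grid = [
    [0.0, 0.1, 0.2],
    [0.1, 5.0, 0.3],
    [0.2, 0.3, 0.4],
]
trend = spatial_median_trend(grid)
print(trend[1][1])  # 0.2
```

The estimated trend at the center ignores the single extreme value. A spatial normalization would then presumably subtract this trend from each spot's log-ratio.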


Wilson et al. (2003)

black: low expression flagged spots

Row 1: M-A plot and Image of log(R/G) spot values

Row 2: Same after Loess normalization

Row 3: Same after pin normalization


Array 5: before and after housekeeping normalization

Before and after spatial normalization (rows 1 and 2); four arrays (each column)

Wilson et al. (2003)


Microarray Technology

High Density Oligonucleotide Arrays


MICROARRAY TECHNOLOGY Illumina BeadChip Technology


MICROARRAY TECHNOLOGY Affymetrix Genechip® Gene Expression Microarrays


In situ Oligonucleotide Synthesis: Photolithography


22 different probes for each gene (11 pairs of PM-MM)


An Actual Gene Expression Image


Signal (expression index):

$$s_g = \frac{1}{P} \sum_{i=1}^{P} (PM_{ig} - MM_{ig})$$


EXPRESSION INDEX

“Average difference” or “signal”:

$$y_g = \frac{1}{J} \sum_{j=1}^{J} (PM_{gj} - MM_{gj})$$

• SOME ALTERNATIVE APPROACHES
  – Model Based Expression Index (MBEI): Li and Wong (2001)
  – MAS 5.0 Statistical Algorithm: Affymetrix (2001)
  – Robust Multichip Average (RMA): Irizarry et al. (2003)
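The “average difference” index is just the mean of the PM minus MM differences over the probe pairs of a gene. A minimal sketch, with a hypothetical function name and made-up intensities for three probe pairs (the arrays described earlier use 11 pairs per gene):

```python
def average_difference(pm, mm):
    """Mean of (PM - MM) over the probe pairs of one gene."""
    if len(pm) != len(mm) or not pm:
        raise ValueError("pm and mm must be non-empty and of equal length")
    return sum(p - m for p, m in zip(pm, mm)) / len(pm)


pm = [200, 350, 410]  # perfect-match intensities
mm = [50, 100, 110]   # mismatch intensities
print(round(average_difference(pm, mm), 2))  # 233.33
```

Because MM can exceed PM for some probe pairs, this index can be negative, which is one motivation for the alternative approaches listed on the slide.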


QUANTILE NORMALIZATION (Bolstad et al., 2003)

• Goal: to give every array the same distribution of probe intensities

• Let $X_{p \times n} = [x_1, \ldots, x_n]$, where $x_i$ holds the probe intensities for array i
• Sort each column of X to give X_sort
• Take the means across rows of X_sort and assign the row mean to each element in that row to get X*
• Get X_normalized by rearranging each column of X* to have the original ordering as in X
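The procedure can be sketched in plain Python, with each array stored as a list of p probe intensities (one column of X). The function name and the toy data are illustrative, and ties are broken by position, which is a simplification made for this sketch.

```python
def quantile_normalize(columns):
    """Quantile-normalize a list of arrays of equal length p."""
    p = len(columns[0])
    sorted_cols = [sorted(col) for col in columns]  # X_sort
    # Row means of X_sort: the mean across arrays at each rank.
    rank_means = [
        sum(col[r] for col in sorted_cols) / len(columns) for r in range(p)
    ]
    normalized = []
    for col in columns:
        # Probe indices ordered from the smallest to the largest intensity.
        order = sorted(range(p), key=lambda i: col[i])
        out = [0.0] * p
        for rank, idx in enumerate(order):
            out[idx] = rank_means[rank]  # restore the original ordering
        normalized.append(out)
    return normalized


arrays = [[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]]
print(quantile_normalize(arrays))  # [[5.5, 1.5, 3.5], [3.5, 1.5, 5.5]]
```

After normalization both arrays contain exactly the same set of values (1.5, 3.5, 5.5), each placed where the corresponding rank originally occurred.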


QUANTILE NORMALIZATION (Bolstad et al., 2003)

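With rows indexing probes (1 to p) and columns indexing arrays (1 to n):

$$X = \begin{bmatrix} x_{11} & x_{12} & \cdots & x_{1n} \\ x_{21} & x_{22} & \cdots & x_{2n} \\ \vdots & \vdots & & \vdots \\ x_{p1} & x_{p2} & \cdots & x_{pn} \end{bmatrix}
\;\rightarrow\;
X_{sort} = \begin{bmatrix} x_{(1)1} & x_{(1)2} & \cdots & x_{(1)n} \\ x_{(2)1} & x_{(2)2} & \cdots & x_{(2)n} \\ \vdots & \vdots & & \vdots \\ x_{(p)1} & x_{(p)2} & \cdots & x_{(p)n} \end{bmatrix}$$

$$\rightarrow\;
X^* = \begin{bmatrix} \bar{x}_{(1)} & \bar{x}_{(1)} & \cdots & \bar{x}_{(1)} \\ \bar{x}_{(2)} & \bar{x}_{(2)} & \cdots & \bar{x}_{(2)} \\ \vdots & \vdots & & \vdots \\ \bar{x}_{(p)} & \bar{x}_{(p)} & \cdots & \bar{x}_{(p)} \end{bmatrix}
\;\rightarrow\;
X_{norm} = \begin{bmatrix} \bar{x}_{(r_{11})} & \bar{x}_{(r_{12})} & \cdots & \bar{x}_{(r_{1n})} \\ \bar{x}_{(r_{21})} & \bar{x}_{(r_{22})} & \cdots & \bar{x}_{(r_{2n})} \\ \vdots & \vdots & & \vdots \\ \bar{x}_{(r_{p1})} & \bar{x}_{(r_{p2})} & \cdots & \bar{x}_{(r_{pn})} \end{bmatrix}$$

where $x_{(k)i}$ is the kth smallest intensity on array i, $\bar{x}_{(k)}$ is the mean of row k of $X_{sort}$, and $r_{ji}$ is the rank of $x_{ji}$ within array i.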


CYCLIC LOESS (High Density Oligonucleotide Arrays) (Bolstad et al., 2003)

• M-A plot from two arrays at a time
• Normalization (LOESS) carried out in a pairwise manner
• Adjustments for each of the arrays in each pair are recorded
• For any array k, the adjustments relative to arrays 1, … , k – 1, k + 1, … , n are weighted and applied to array k
• Generally only 1 or 2 complete iterations suffice


qRT-PCR Technology


PCR


RELATIVE QUANTIFICATION (COMPARATIVE CT)

CT


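Estimating Fold Change Between Two Observations

$$FC_{q,cb} = \frac{X_{S,q}}{X_{S,cb}} = \frac{K(1+E)^{-\Delta C_{T,q}}}{K(1+E)^{-\Delta C_{T,cb}}} = (1+E)^{-\Delta\Delta C_T}$$

If E = 1:

$$FC_{q,cb} = \frac{X_{S,q}}{X_{S,cb}} = 2^{-\Delta\Delta C_T}$$

If $E_X \neq E_R \neq 1$:

$$FC = \frac{X_{S,q}}{X_{S,cb}} = \frac{(1+E_X)^{\left[C_T(X)_{cb} - C_T(X)_{q}\right]}}{(1+E_R)^{\left[C_T(R)_{cb} - C_T(R)_{q}\right]}}$$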
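A worked example of the fold-change calculation follows, written so that the efficiency-corrected form reduces to the 2^(−ΔΔCT) form when both efficiencies equal 1. The function and argument names are hypothetical: "target" stands for the gene of interest (X), "ref" for the reference gene (R), `q` for the sample of interest, and `cb` for the calibrator. The CT values are made up.

```python
def fold_change(ct_target_q, ct_ref_q, ct_target_cb, ct_ref_cb,
                e_target=1.0, e_ref=1.0):
    """Fold change of sample q relative to calibrator cb.

    Efficiencies are expressed so that E = 1 means perfect doubling
    of product at each PCR cycle.
    """
    numerator = (1 + e_target) ** (ct_target_cb - ct_target_q)
    denominator = (1 + e_ref) ** (ct_ref_cb - ct_ref_q)
    return numerator / denominator


# The target crosses the threshold 2 cycles earlier in q than in cb,
# while the reference gene is unchanged: delta-delta-CT = -2.
print(fold_change(ct_target_q=22.0, ct_ref_q=18.0,
                  ct_target_cb=24.0, ct_ref_cb=18.0))  # 4.0
```

With a target efficiency below 1 (for example `e_target=0.9`), the same CT values give a smaller fold change, which is why ignoring unequal efficiencies can bias the estimate.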


COMMENTS

• ΔΔCT methods lack generality for statistical analyses of hierarchically replicated qRT-PCR data

• Linear mixed models are more appropriate for the analysis of relative quantification RT-PCR data *

* Steibel JP, Poletto R, Coussens PM and Rosa GJM. A powerful and flexible linear mixed model framework for the analysis of relative quantification RT-PCR data. Genomics 94: 146-152, 2009.


RNAseq Technology


FIRST GENERATION SEQUENCING (Sanger, 1977)


The Sanger Method

1. Create an entire sequence of nested sub-fragments, including the original fragment

2. Figure out which base each fragment ends with


Automation of the Sanger method

[Figure: the original procedure, using 4 tubes and a gel]

• Fluorescently labeled dideoxynucleotides

• The gel is “read” by a fluorimeter and the data are stored in a computer file


NEXT GENERATION SEQUENCING

• Takes advantage of miniaturization to engage in massively parallel analysis

• Applications: whole genome sequencing, RNA-Seq, ChIP-Seq, etc.

• Anything we can do with microarrays, we can probably do better with sequencing techniques


DIFFERENT PLATFORMS

Platforms: Illumina Solexa System; 454 Roche; ABI SOLiD; Helicos BioSciences

• Read length
• Number of reads
• Total throughput (size of the data)
• Time for the analysis
• Costs


RNA-Seq TECHNOLOGY


Measuring transcriptomes with RNA-Seq


Two key concepts related to RNA-Seq

• Paired-end sequencing
  [Figure: two reads obtained from one fragment; labels: 220 bp, 80 bp, 150 bp]

• Mapping
  [Figure: reads mapped to the genome and to the transcriptome; splice junction fragments]


Tasks with RNA-Seq data

• Differential expression:
  – Given: RNA-Seq reads from two different samples and transcript sequences
  – Do: Predict which transcripts have different abundances between the two samples

• Assembly:
  – Given: RNA-Seq reads (and possibly a genome sequence)
  – Do: Reconstruct full-length transcript sequences from the reads

• Quantification:
  – Given: RNA-Seq reads and transcript sequences
  – Do: Estimate the relative abundances of transcripts (“gene expression”)


Advantages of RNA-Seq over Microarrays

1.  No reference sequence needed

2.  Low background noise

3.  High technical reproducibility

4.  Larger dynamic range of expression levels

5.  Analysis of alternative splicing

6.  Identification and characterization of novel transcripts



RNA-Seq Computational Pipeline


MEASURING EXPRESSION

• The number of reads (counts) mapping to the biological feature of interest (e.g. gene, transcript or exon) is considered to be linearly related to the abundance of the target feature

NORMALIZATION

Gene counts depend on:
  – sequencing depth
  – gene length
  – expression level


Normalization method

• Differential expression requires comparison of 2 or more RNA-Seq samples

• The number of reads (coverage) will not be exactly the same for each sample

Problem: Need to scale RNA counts per gene to total sample coverage

Solution: Divide counts by the total number of reads, expressed in millions

Problem: Longer genes have more reads, which gives a better chance to detect DE

Solution: Divide counts by gene length

$$\mathrm{RPKM} = \frac{\text{number of reads of the region}}{\dfrac{\text{total reads}}{1{,}000{,}000} \times \dfrac{\text{region length}}{1000}}$$

Counts are divided by the transcript length (kb) times the total number of millions of mapped reads: reads per kb per million reads sequenced
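A worked example of the RPKM formula, with a hypothetical function name and made-up counts:

```python
def rpkm(region_reads, total_reads, region_length_bp):
    """Reads per kilobase of region per million mapped reads."""
    millions_of_reads = total_reads / 1_000_000
    length_kb = region_length_bp / 1000
    return region_reads / (millions_of_reads * length_kb)


# 500 reads on a 2 kb transcript in a library of 20 million mapped reads.
print(rpkm(region_reads=500, total_reads=20_000_000, region_length_bp=2_000))  # 12.5

# The same count on a transcript twice as long gives half the RPKM.
print(rpkm(region_reads=500, total_reads=20_000_000, region_length_bp=4_000))  # 6.25
```

The second call shows the length correction at work: equal read counts do not imply equal expression when the transcripts differ in length.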


ANALYSIS OF DIFFERENTIAL EXPRESSION

• Parametric approaches: counts modeled using known probability distributions such as Binomial, Poisson or Negative Binomial; linear model methodology (Gaussian approximation) for transformed data, e.g. log transformation

• Non-parametric approaches: Chi-squared based approaches; Fisher’s exact test; sampling-based approaches
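As an illustration of one of the approaches named above, here is a minimal sketch of a two-sided Fisher's exact test on a 2 × 2 table of read counts for a single gene. The table layout is an assumption of this sketch: rows are the two samples, and columns are reads mapping to the gene versus all other reads. The function name and the toy counts are hypothetical, and the direct use of binomial coefficients is only practical for small totals, not for real libraries with millions of reads.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided p-value for the 2 x 2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def pmf(k):
        # Hypergeometric probability of k in the top-left cell,
        # with all table margins held fixed.
        return comb(row1, k) * comb(row2, col1 - k) / comb(n, col1)

    p_observed = pmf(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables at least as extreme as the
    # observed one (a small tolerance absorbs floating-point ties).
    return sum(
        pmf(k) for k in range(lowest, highest + 1)
        if pmf(k) <= p_observed * (1 + 1e-9)
    )


# Toy table: sample 1 has 3 gene reads and 1 other read,
# sample 2 has 1 gene read and 3 other reads.
print(round(fisher_exact_two_sided(3, 1, 1, 3), 4))  # 0.4857
```

With so few reads the difference between the two samples is far from significant. The test gains power as counts grow, but it treats the reads as the only source of variation and does not account for biological replication.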