NESCENT : NGS : Measuring expression Jen Taylor Bioinformatics Team CSIRO Plant Industry

Page 1

NESCENT : NGS : Measuring expression

Jen Taylor

Bioinformatics Team

CSIRO Plant Industry

Page 2

CSIRO. Nescent August 2011 - Measuring Expression

Measuring Expression
• What & Why
  • What is expression and why do we care?
• How
  • Platforms / Technology
    • Closed approaches – Microarray
    • Open approaches – Sequencing
  • Experimental Design
  • Analysis
    • Biases
    • Bioinformatics
    • Statistical Issues and Analysis
• In action
  • Workshop – Detection of Differential Expression
  • Case Studies in Plant functional genomics

Page 3

What is expression / transcriptome?

Figure: RNA species transcribed from DNA: mRNA, rRNA, tRNA, siRNA, microRNA, piRNA, tasiRNA, lncRNA

Page 4

Commemorative stained glass window for F.C. Crick, designed by Maria McClafferty (Photograph: Paul Forster). Gonville & Caius College, Cambridge, UK.

Beyond the Genome:

1995 – Human Genome sequencing begins in earnest
“Mapping the Book of Life”
2000 – First Draft
2003 – Essential Completion

= approx. 140,000 genes
= 30,000–40,000 genes ??
= 24,195 genes !!!???

Page 5

“The failure of the human genome”

“despite more than 700 genome-scanning publications and nearly $100bn spent, geneticists still had not found more than a fractional genetic basis for human disease”

Manolio et al., Nature, 2009

“The most likely explanation for why genes for common diseases have not been found is that, with few exceptions, they do not exist. … if inherited genes are not to blame for our commonest illnesses, can we find out what is?”

Guardian, 2011

Page 6

Commemorative stained glass window for F.C. Crick, designed by Maria McClafferty (Photograph: Paul Forster). Gonville & Caius College, Cambridge, UK.

Beyond the Genome:

Gene Number ≠ Complexity

Figure labels: Complexity, Regulation, Gene, Transcriptome

Page 7

Why the expression?

• High-throughput friendly
• Context dependent
• Regulatory network
• Predicts Biology

Figure: Genome → Transcriptome → Proteome (Li et al., 2004)

Page 8

Measuring Expression?

Parts description
• Function?
• Interconnectedness?

Comparisons
• Population-level
• Between genomes

Page 9

Measuring Expression?

What are the important members of a transcriptome?

mRNA
• polyadenylated, coding
• alternatively spliced

Noncoding RNA (including small RNA)
• varying lengths and functions (small RNAs: 18–32 bases)
• microRNA, siRNA, piRNA, tasiRNA, long non-coding RNA

“Dark” RNA
• transcription outside of annotated genes
• non-polyadenylated

Anti-sense transcription

Page 10

Measuring Expression?

How does the transcriptome vary to give rise to phenotype?

Changes in abundance
• Change in abundance = rate of transcription – rate of decay

Changes in function
• Availability for function – polyadenylation, silencing, localisation
• Suitability for function – alternative splicing
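The balance between transcription and decay can be sketched as a simple kinetic model. This is a minimal sketch under stated assumptions, not taken from the slides: the transcription rate `k_tx` is constant and decay is first order with rate constant `k_deg` (both hypothetical parameters). Under these assumptions abundance changes by transcription minus decay and settles at `k_tx / k_deg`.

```python
def simulate_abundance(k_tx, k_deg, a0=0.0, dt=0.01, t_end=50.0):
    """Euler integration of dA/dt = k_tx - k_deg * A (hypothetical constant rates)."""
    a = a0
    for _ in range(int(round(t_end / dt))):
        # Net change per step = rate of transcription - rate of decay
        a += (k_tx - k_deg * a) * dt
    return a


if __name__ == "__main__":
    k_tx, k_deg = 10.0, 0.5  # hypothetical rates
    print(round(simulate_abundance(k_tx, k_deg), 3))  # approaches k_tx / k_deg = 20
```

Doubling the transcription rate doubles the steady-state abundance. Halving the decay rate has the same effect, which is why abundance alone cannot tell the two mechanisms apart.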

Page 11

How to measure Expression

PLATFORMS / TECHNOLOGY

Page 12

Measuring Expression : platforms

• Closed systems – microarray
  • Probes immobilised on a substrate profile target species in the transcriptome

Page 13

Page 14

Single and two colour arrays

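Figure: Probe Library → Array Manufacture → Array; sample Labelling → Hybridisation → Scanning. Two colour: Control and Experimental samples are labelled separately and hybridised to the same array. Single colour: Sample A is labelled and hybridised on its own.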

Page 15

Array profiling

Affymetrix Array Targets

• Arabidopsis Genome 24,000
• C. elegans Genome 22,500
• Drosophila Genome 18,500
• E. coli Genome 20,366
• Human Genome U133 Plus 47,000
• Mouse Genome 39,000
• Yeast Genome
  • S. cerevisiae 5,841
  • S. pombe 5,031
• Rat Genome 30,000
• Zebrafish 14,900
• Plasmodium / Anopheles
  • P. falciparum 4,300
  • A. gambiae 14,900
• Barley (25,500), Soybean (37,500 + 23,300 pathogen), Grape (15,700)
• Canine (21,700), Bovine (23,000)
• B. subtilis (5,000), S. aureus (3,300 ORFs), Xenopus (14,400)

Page 16

Page 17

Page 18

Closed System – Microarray

• Pros
  • High-throughput
  • Targeted profiling
  • Inexpensive – “population friendly”
  • Analytical methods are standardised
• Cons
  • “Closed system”: novel transcripts are invisible
  • Difficult to see allele-specific expression
  • Biases due to hybridisation
    • SNPs
    • Competitive and non-specific hybridisation

Page 19

Open systems – RNA Sequencing

Technology:
• Illumina
• SOLiD, Ion Torrent
• 454

Pros:
• Transcript discovery
• Allelic expression
• High-resolution abundance measures

Cons:
• Analysis can be complex
• Expensive
• Sensitivity depends on sequencing depth

Page 20

RNA Sequencing

Mortazavi et al., 2008

Page 21

RNASeq - Correspondence

• Dynamic range > 5 orders of magnitude

• Better detection of low-abundance transcripts

Marioni et al., 2009

Page 22

Platform Choice / Sample Preparation Choice

What do you want to profile?

• Polyadenylated
  • PolyA RNA extraction
• Small RNA (< 100 bases)
  • Size filtering by gel
• Strand-specific
• RNA – Protein Interactions
  • RNA Immunoprecipitation (IP)

Page 23

RNASeq - Workflow

Figure: RNASeq workflow
Sample → Total RNA, PolyA RNA or Small RNA → Library Construction → Sequencing → Base calling & QC
→ Mapping to Genome or Assembly to Contigs
→ Differential Expression, SNP detection, Transcript structure, Secondary structure, Targets or Products

Page 24

Illumina RNASeq : TruSeq

Page 25

Small RNA sequencing

Figure: small RNA separation by PAGE (small RNA < 35 bp)

Page 26

Strand-specificity

Using adaptors:
• SMART: addition of C’s on 5’ end
• Ligation: 3’ and 5’ adaptors added sequentially

Using chemical modification:
• dUTP: incorporated into the second cDNA strand, which is removed after selection

Levin et al., 2010

Page 27

Levin et al., 2010

Page 28

Non-polyA methods

• Total RNA extraction
  • Ribosomal RNA and tRNA make up > 95–97% of total RNA
• Ribosomal reduction methods
  • Subtractive hybridisation with rRNA probes
  • Exonuclease digestion of rRNA
  • NuGEN – “proprietary combination of reverse transcriptase and primers in the Ovation RNA-Seq System”
• cDNA normalisation methods
  • Partial digestion of any highly abundant species (Evrogen)

Page 29

Platform Choice / Sample Preparation Choice

What do you want to profile?

• Polyadenylated
  • PolyA RNA extraction
• Small RNA (< 100 bases)
  • Size filtering by gel
• Strand-specific
• RNA – Protein Interactions
  • RNA Immunoprecipitation (IP)
• Non-PolyA
  • rRNA reduction

Page 30

EXPERIMENTAL DESIGN and ANALYSIS

Page 31

RNASeq Experimental Design

• Issues:
  • Sequencing depth – how much?
  • Number of replicates – how many?
• Aims of the data:
  • Transcriptome assembly / transcript characterisation
    • Maximise depth
  • Detection of differential expression (de novo or reference)
    • Balance depth and replication

CSIRO. Sequencing Depth V.S. Number of Replicates

Page 32

Defining Replicates

• Technical Replicates
• Biological Replicates

Figure: Technical replicates: one individual, Libraries 1 and 2 run across Lanes 1–4 (depth = 2 × 100% lane / sample). Biological replicates: Individual 1 and Individual 2, each with its own library and lane (depth = 100% lane / sample). Multiplexing: Libraries 1–4 (L1–L4) pooled in Lane 1 (depth = 25% lane / sample).

Page 33

Page 34

Coverage Depth

Page 35

Number of Replicates

Thresholds: edgeR ≤ 0.01, DESeq ≤ 0.01

More information in biological replicates than in depth for differential expression

# Replicates      2     4     6     8     10    12
False positive    0.03  0.03  0.03  0.03  0.03  0.03
False negative    0.84  0.72  0.64  0.59  0.54  0.50
True positive     0.16  0.28  0.36  0.41  0.46  0.50
True negative     0.97  0.97  0.97  0.97  0.97  0.97

Page 36

RNASeq Analysis

• Overall aim:
  • To get an accurate measurement of transcript abundance, structure and identity
• Biases and Compositions
• Alignment
  • TopHat / Cufflinks
• Assembly
  • ABySS

Page 37

Assumptions

• Every transcript / k-mer has an equal chance of being sequenced
• Number of sequences observed ≈ transcript abundance

Gene A = z reads / million; Gene B = y reads / million
z = 2 × y, so Gene A > Gene B
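The reads-per-million comparison can be sketched in a few lines. This is a minimal sketch that makes the same equal-chance assumption as the slide. The gene names and counts are hypothetical.

```python
def reads_per_million(counts):
    """Scale raw read counts by library size (total mapped reads) to reads per million."""
    total = sum(counts.values())
    return {gene: n * 1e6 / total for gene, n in counts.items()}


if __name__ == "__main__":
    # Hypothetical counts from one library of 3,000,000 mapped reads
    counts = {"geneA": 200_000, "geneB": 100_000, "other": 2_700_000}
    rpm = reads_per_million(counts)
    # z = 2 x y, so under the equal-chance assumption Gene A > Gene B
    print(rpm["geneA"], rpm["geneB"])
```

Scaling by library size lets counts from libraries of different depth sit on one scale. It does nothing about the length and sequence biases discussed on the following slides.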

Page 38

Length Bias

Oshlack and Wakefield, 2009

Page 39

Alignment Bias

Page 40

Alignment Bias

Page 41

Sequencing Bias

Hansen et al., 2010

Page 42

Bias

• Every transcript / k-mer has an equal chance of being sequenced
• Number of sequences observed ≈ transcript abundance

Gene A = z reads / million / kb; Gene B = y reads / million / kb

Weighting schemes (e.g. Cufflinks):
• Mappability
• k-mer / fragment frequencies
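The per-kilobase term corrects for length bias: at the same abundance, a longer transcript yields more fragments. This is a minimal sketch of reads per kilobase per million mapped reads (RPKM, as introduced by Mortazavi et al., 2008). The counts and lengths are hypothetical. It does not include the mappability or k-mer weighting that tools such as Cufflinks apply.

```python
def rpkm(count, transcript_length_bp, total_mapped_reads):
    """Reads per kilobase of transcript per million mapped reads."""
    per_million = count / (total_mapped_reads / 1e6)
    return per_million / (transcript_length_bp / 1e3)


if __name__ == "__main__":
    total = 10_000_000  # hypothetical library size
    # Gene A is twice as long as gene B and collects twice as many reads
    a = rpkm(2_000, 4_000, total)
    b = rpkm(1_000, 2_000, total)
    print(a, b)  # equal values: the raw count difference was a length effect
```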

Page 43

Bias

• Every transcript / k-mer has an equal chance of being sequenced
• Number of sequences observed ≈ transcript abundance

Gene A1 = z reads per million; Gene A2 = y reads per million
z = 2 × y
Sample A vs Sample B

Page 44

Read density variability

Page 45

RNASeq – Compositional properties

Depth of Sequence
• Sequence count ≈ transcript abundance
• The data can be dominated by a small number of highly abundant transcripts
• The ability to observe lower-abundance transcripts depends on sequencing depth
• Fixed budget of reads

Page 46

A simple example – compositional bias

Sequencing budget / depth: 4,000 reads

Transcript   Sample I expected counts   Sample II expected counts
A            1,000                      2,000
B            1,000                      2,000
C            1,000                      –
D            1,000                      –
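The arithmetic of this example can be reproduced directly. This is a minimal sketch that assumes the fixed 4,000-read budget is split in proportion to relative transcript abundance. The transcript names follow the slide, and the function name is hypothetical.

```python
def expected_counts(abundances, budget):
    """Split a fixed read budget in proportion to relative transcript abundance."""
    total = sum(abundances.values())
    return {t: budget * a / total for t, a in abundances.items()}


if __name__ == "__main__":
    budget = 4000
    # A and B have the same true abundance in both samples;
    # C and D are expressed only in sample I.
    sample_i = expected_counts({"A": 1, "B": 1, "C": 1, "D": 1}, budget)
    sample_ii = expected_counts({"A": 1, "B": 1}, budget)
    for t in ("A", "B"):
        # A and B look two-fold "up" in sample II purely because of composition
        print(t, sample_i[t], sample_ii[t])
```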

Page 47

Soil diversity by phylogenetic analysis - Phylum level

Figure: % distribution of recognized bacterial phyla in samples A, B and C (454-sequence analysis of the bacterial 16S rRNA gene, ~410,000 sequences)

A. Richardson, CSIRO

Page 48

RNASeq Bioinformatics Analysis

• Aims:
  • To get an accurate measurement of transcript abundance, structure and identity
• Biases and Compositions
  • Relative abundances NOT absolute
• Alignment
  • TopHat
• Assembly
  • ABySS

Page 49

RNA Sequencing analysis

Figure: analysis flow
Sequence Data → Alignment (Genome?) → Read Density → Differential Expression, SNPs, Transcript Characterisation
Sequence Data → Assembly → Contigs

Page 50

RNASeq – Alignment Considerations

Reads with multiple locations

• Discard / Random Allocation

• Clustering - local coverage

• Weighting

Reads Spanning Exons

• Make and align to exon junction libraries

• De novo junction detection

Summarisation of counts

• Exons

• Transcript boundaries

• Inferred read boundaries
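The weighting option for reads with multiple locations can be sketched as follows. This is a minimal illustration of one common idea: each multi-mapping read is split across its candidate loci in proportion to the unique-read coverage at those loci, in the spirit of the "clustering – local coverage" and "weighting" bullets. The loci, counts and function name are hypothetical, and this is not the exact algorithm of any particular aligner.

```python
from collections import defaultdict


def allocate_multireads(unique_counts, multireads):
    """Fractionally assign multi-mapping reads using local unique-read coverage.

    unique_counts: locus -> number of uniquely mapped reads
    multireads: one list of candidate loci per multi-mapping read
    """
    totals = defaultdict(float, {locus: float(n) for locus, n in unique_counts.items()})
    for candidates in multireads:
        weights = [unique_counts.get(locus, 0) for locus in candidates]
        norm = sum(weights)
        if norm == 0:
            # No unique evidence at any candidate: fall back to an even split
            weights, norm = [1] * len(candidates), len(candidates)
        for locus, w in zip(candidates, weights):
            totals[locus] += w / norm
    return dict(totals)


if __name__ == "__main__":
    unique = {"gene1": 90, "gene2": 10}
    # Ten hypothetical reads that align equally well to a region shared by both genes
    print(allocate_multireads(unique, [["gene1", "gene2"]] * 10))
```

Unlike discarding, this keeps every read. Unlike random allocation, it is deterministic, at the cost of assuming the unique reads are representative of the shared region.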

Page 51

TopHat

Trapnell et al., 2009; Roberts et al., 2011

Multimapping : ≤10 sites

Assembly : consensus ‘island’ exon

Page 52

TopHat / Cufflinks

Trapnell et al., 2009; Roberts et al., 2011

Heuristics :

• “Correct” errors in low coverage areas

• Grabs 45 bp either side of islands to capture splice sites

• Collapse small islands

• Looks for junctions within large, highly covered islands

Cufflinks :

• Calculates the probability of observing a given fragment within a transcript, given the surrounding fragments.

Page 53

Alignment

• Great if you have a fully annotated reference
• Okay if you have a partially annotated reference
• “Different” if you have a big bunch of ESTs

Options:
• Align to a neighbouring genome or EST library
• De novo transcriptome assembly

Tools:
• ABySS, MIRA, Trinity, HTSeq, SAMtools

Page 54

RNA Sequencing analysis

Figure: analysis flow
Sequence Data → Alignment (Genome?) → Read Density → Differential Expression, SNPs, Transcript Characterisation
Sequence Data → Assembly → Contigs

Page 55

De novo transcriptome assembly

• ABySS, MIRA, Trinity, Velvet, ALLPATHS, SOAPdenovo, EULER, CABOG, Edena, SHARCGS, VCAKE, SSAKE, CAP3
• Will run on reasonable computer resources for large genomes (e.g. < 1 TB of RAM)
• Paired-end data handling
• Platform flexible
• Handles haplotype complexity and polyploid genomes

Page 56

De novo transcriptome assembly

• ABySS, MIRA, Trinity, Velvet, ALLPATHS, SOAPdenovo, EULER, CABOG, Edena, SHARCGS, VCAKE, SSAKE, CAP3
• Will run on reasonable computer resources for large genomes (e.g. < 1 TB of RAM)
• Handles paired-end data
• Handles data from all platforms
• Handles haplotype complexity and polyploid genomes

Page 57

Assembly – Kmer graphs

k = 4

Miller et al., 2010

Page 58

Assembly – Kmer graphs

Spurs

• Sequencing error

Bubbles

• Sequencing error

• Polymorphism

Frayed Rope / Cycles

• Repeats

Miller et al., 2010
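A k-mer graph of the kind shown (k = 4) can be sketched in a few lines. This is a minimal, hypothetical example. Nodes are (k − 1)-mers, each k-mer in a read adds or reinforces an edge, and a read with one miscalled final base creates a short, low-count dead-end branch (a spur). Real assemblers such as ABySS or Velvet also handle reverse complements, coverage thresholds, bubbles and repeats, none of which this sketch attempts.

```python
from collections import Counter, defaultdict


def kmer_graph(reads, k):
    """Build a de Bruijn-style graph: nodes are (k-1)-mers, edges are k-mers with counts."""
    edges = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            edges[(kmer[:-1], kmer[1:])] += 1  # edge weight = k-mer occurrence count
    return edges


def find_spurs(edges, max_count=1):
    """Low-count edges leaving a branching node and ending in a dead end (candidate spurs)."""
    out_degree = defaultdict(int)
    for src, _dst in edges:
        out_degree[src] += 1
    return [
        (src, dst)
        for (src, dst), count in edges.items()
        if out_degree[src] > 1 and out_degree[dst] == 0 and count <= max_count
    ]


if __name__ == "__main__":
    reads = ["ACGTTGCA"] * 3 + ["ACGTTGCT"]  # hypothetical; last read has an error at the end
    graph = kmer_graph(reads, k=4)
    print(find_spurs(graph))
```

The true path end (`TGC` → `GCA`) is also a dead end, but it is supported by three reads, so only the single-read branch is flagged.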

Page 59

Assembly – Kmer graphs

Spurs

• Sequencing error

Bubbles

• Sequencing error

• Polymorphism

Frayed Rope / Cycles

• Repeats

Miller et al., 2010

Page 60

ABySS & TransABySS

• User specifies k

• Optimal k depends on sequencing depth
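One way to see why the optimal k depends on depth: a read of length L contains only L − k + 1 k-mers. The expected k-mer coverage is therefore lower than the base coverage and falls as k grows, C_k = C × (L − k + 1) / L. This is a minimal sketch of that relationship. The coverage values are hypothetical, and the 75 bp read length echoes the rice trial later in the deck.

```python
def kmer_coverage(base_coverage, read_length, k):
    """Expected k-mer coverage from base coverage: C_k = C * (L - k + 1) / L."""
    if not 0 < k <= read_length:
        raise ValueError("k must be between 1 and the read length")
    return base_coverage * (read_length - k + 1) / read_length


if __name__ == "__main__":
    read_length = 75
    for coverage in (5, 50):  # hypothetical rare vs abundant transcript
        print(coverage, [round(kmer_coverage(coverage, read_length, k), 1) for k in (25, 45, 65)])
```

A large k that works for an abundant transcript can leave a rare one with under one k-mer per position, breaking its path in the graph. This suggests why iterating over several k values helps.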

Page 61

ABySS & TransABySS

• Sequencing depth is relative to transcript abundance
  • Iterate over multiple k and merge
• Contigs contained within a larger contig are “buried”
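Merging assemblies from several k values produces redundant contigs. This is a minimal sketch of removing "buried" contigs. It assumes a contig is buried when its sequence, or its reverse complement, occurs exactly inside a longer contig. The function names and contigs are hypothetical, and this is not the TransABySS implementation.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence (A/C/G/T/N only)."""
    return seq.translate(COMPLEMENT)[::-1]


def remove_buried(contigs):
    """Keep only contigs not contained, on either strand, in another kept contig."""
    kept = []
    # Longest first (ties broken alphabetically) so containers are kept before the
    # contigs they bury; set() drops exact duplicates produced by different k runs.
    for contig in sorted(set(contigs), key=lambda s: (-len(s), s)):
        rc = reverse_complement(contig)
        if not any(contig in other or rc in other for other in kept):
            kept.append(contig)
    return kept


if __name__ == "__main__":
    # Hypothetical contigs pooled from assemblies at several k
    merged = ["ACGTTGCA", "GTTG", "CAAC", "TTTT", "ACGTTGCA"]
    print(remove_buried(merged))
```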

Page 62

Assessing assembly quality?

• Comparisons between assembly algorithms
• Contig summary statistics
• Comparisons to known resources (e.g. ESTs)

Trial on Rice Transcriptome:
• 120 million 75 bp single-end Illumina reads – embryo
• ABySS:
  • Number of contigs = 6,804
  • Contig length range = 38–2,818 bp [mean = 203 bp]
• Database comparisons:
  • Rice public cDNA sequences: 67,393
  • Contigs with high-quality matches to cDNA: 6,555 (96%)
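Contig summary statistics like those reported for the rice trial are simple to compute. This is a minimal sketch that uses hypothetical contig lengths. It adds N50 (the length L such that contigs of length ≥ L hold at least half of the assembled bases), a common statistic that the slide does not report.

```python
def contig_stats(lengths):
    """Number, range, mean and N50 of a list of contig lengths."""
    if not lengths:
        raise ValueError("no contigs")
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            n50 = length
            break
    return {
        "n_contigs": len(ordered),
        "min": ordered[-1],
        "max": ordered[0],
        "mean": sum(ordered) / len(ordered),
        "n50": n50,
    }


if __name__ == "__main__":
    print(contig_stats([100, 200, 300, 400, 1000]))  # hypothetical lengths
    # Share of the 6,804 ABySS contigs with high-quality cDNA matches (slide figures)
    print(f"{100 * 6555 / 6804:.0f}%")
```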

Page 63

RNASeq Bioinformatics Analysis

• Aims:
  • To get an accurate measurement of transcript abundance, structure and identity
• Biases and Compositions
  • Relative abundances NOT absolute
• Alignment
• Assembly

Page 64

STATISTICAL ISSUES

Page 65

Measuring Expression – Statistical Issues

• Data elements

• Normalisation

• Detection of Differential Expression

Page 66

Count Data : of what?

Page 67

Count Data : of what?

Garber et al., 2011

Page 68

Statistical analysis of RNASeq

• Count data
  • Distribution is positively skewed, not normal
  • Between-sample variability in counts – normalisation

Page 69

Normalization is required

Two scenarios:

1. Different total numbers of reads (library size)

2. Fixed library size, but a subset of highly expressed transcripts takes up a large share of the reads in one sample

Both reduce the sequencing budget available to the majority of transcripts

Page 70

Normalisation

• Assume the majority of log ratios = 0 [no change]

Robinson and Oshlack, 2010

TMM : Trimmed Mean of M values (log ratios)

Adjust library sizes so that the TMM between samples is 0
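The idea behind TMM can be sketched as follows. This is a simplified, unweighted version for two samples that trims 30% of M values from each end. The edgeR implementation (Robinson and Oshlack, 2010) also trims on average expression (A values) and uses precision weights, so its factors will differ. The function name and counts are hypothetical.

```python
import math


def tmm_factor(sample, reference, trim=0.3):
    """Simplified, unweighted TMM scale factor for `sample` relative to `reference`.

    M values are per-gene log2 ratios of read proportions; genes with a zero
    count in either sample are skipped. `trim` is the fraction of M values
    removed from each end before averaging.
    """
    if not 0 <= trim < 0.5:
        raise ValueError("trim must be in [0, 0.5)")
    n_s, n_r = sum(sample), sum(reference)
    m_values = sorted(
        math.log2((x / n_s) / (y / n_r))
        for x, y in zip(sample, reference)
        if x > 0 and y > 0
    )
    if not m_values:
        raise ValueError("no genes with non-zero counts in both samples")
    cut = int(len(m_values) * trim)
    kept = m_values[cut:len(m_values) - cut]
    # Factor = 2 ** (trimmed mean of M); effective library size = n_s * factor
    return 2 ** (sum(kept) / len(kept))


if __name__ == "__main__":
    reference = [100] * 10 + [100]
    sample = [100] * 10 + [1100]  # one gene soaks up about half of the reads
    f = tmm_factor(sample, reference)
    print(round(f, 3), round(f * sum(sample)))  # effective library size matches the reference
```

Scaling by the raw library size (2,100 vs 1,100) would make the ten unchanged genes look about two-fold down. Trimming away the one extreme M value recovers the "no change" majority.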

Page 71

DE genes with and without TMM normalization

Page 72

RNASeq data – Poisson Distributions

• Poisson distributions are used when things are counted
  • The probability of seeing n events in a fixed time or space
  • The number of lions on a 1-day safari
  • The number of raindrops on a tennis court
  • The number of flying elephants in a year
• Requires λ: rate of events
  • Variance = mean = λ
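The variance = mean = λ property can be checked numerically. This is a minimal sketch that sums the Poisson probability mass function P(n) = e^(−λ) λ^n / n! over n = 0…200, which covers essentially all of the mass for the example rates used here. It works in log space so that large n does not overflow. The λ values are arbitrary examples.

```python
import math


def poisson_pmf(n, lam):
    """P(N = n) for a Poisson distribution with rate lam > 0, computed in log space."""
    return math.exp(n * math.log(lam) - lam - math.lgamma(n + 1))


def poisson_moments(lam, n_max=200):
    """Mean and variance computed from the pmf, truncated at n_max."""
    probs = [poisson_pmf(n, lam) for n in range(n_max + 1)]
    mean = sum(n * p for n, p in enumerate(probs))
    var = sum((n - mean) ** 2 * p for n, p in enumerate(probs))
    return mean, var


if __name__ == "__main__":
    for lam in (0.5, 4.0, 50.0):
        mean, var = poisson_moments(lam)
        print(lam, round(mean, 6), round(var, 6))  # mean and variance both ≈ lam
```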

Page 73

RNASeq data – Negative Binomial

• RNASeq data is more variable than Poisson
  • Variance > mean (= λ)
  • Less prominent for large mean
  • Over-dispersed Poisson

Noise types
• Shot noise
  • Unavoidable, prominent for low mean
• Technical noise
  • Small, hopefully; can be managed
• Biological noise
  • Sample differences
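The over-dispersed Poisson is commonly written as a negative binomial with variance μ + φμ², where φ is a dispersion parameter. edgeR uses this parameterisation, while DESeq instead fits the extra variance as a smooth function of the mean. This is a minimal sketch with a hypothetical φ = 0.1. It shows that the Poisson (shot noise) share of the variance dominates at low means and shrinks as the mean grows. It also shows that the squared coefficient of variation, 1/μ + φ, levels off at φ.

```python
def nb_variance(mu, phi):
    """Negative binomial variance: Poisson (shot) noise mu plus extra-Poisson phi * mu**2."""
    return mu + phi * mu ** 2


def shot_noise_share(mu, phi):
    """Fraction of the total variance that is Poisson shot noise."""
    return mu / nb_variance(mu, phi)


def squared_cv(mu, phi):
    """Squared coefficient of variation, 1/mu + phi: tends to phi for large means."""
    return nb_variance(mu, phi) / mu ** 2


if __name__ == "__main__":
    phi = 0.1  # hypothetical dispersion (biological + technical noise)
    for mu in (1, 10, 100, 1000):
        print(mu, nb_variance(mu, phi), round(shot_noise_share(mu, phi), 3), round(squared_cv(mu, phi), 4))
```

With φ = 0 the model collapses back to the Poisson, where variance equals the mean.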

Page 74

RNA Seq

• Variance also depends on the mean

Anders, 2010

Page 75

RNASeq Model

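The total counts $K_{cj}$ for a transcript in sample $j$ from condition $c$:

$K_{cj} \sim \mathrm{NB}\left(s_j \mu_c,\; s_j \mu_c + s_j^2 v_c\right)$

where $s_j$ is the library normalisation (size factor), $\mu_c$ the mean value for condition $c$, and $v_c$ the fitted variance (overdispersion).

For a given gene, test for a difference in counts between conditions: is a model with separate means ($\mu_{c1}$, $\mu_{c2}$) statistically better than one with a single common mean ($\mu_{c1} = \mu_{c2}$)?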

Page 76

RNASeq DE Testing

• DESeq – Anders and Huber, 2010
• edgeR – Robinson et al., 2009 – R
• baySeq – Hardcastle and Kelly, 2010 – R
• DEGseq – Wang et al., 2010 – R
• NBP – Di et al., 2011
• LOX – Zhang et al., 2010
  • Infers expression measures, allowing noise from different methodologies to be incorporated in one experimental design

Page 77

Measuring Expression
• What & Why
  • What is expression and why do we care?
• How
  • Platforms / Technology
    • Closed approaches – Microarray
    • Open approaches – Sequencing
  • Experimental Design
  • Analysis
    • Biases
    • Bioinformatics
    • Statistical Issues and Analysis
• In action
  • Workshop – Detection of Differential Expression
  • Case Studies in Plant functional genomics

Page 78

Contact Us
Phone: 1300 363 400 or +61 3 9545 2176
Web: www.csiro.au

Thank you

Plant Industry
Jennifer M Taylor
Bioinformatics Leader
Phone: +61 2 62464929

Acknowledgements
Jose Robles
Stuart Stephen
Hua Ying
Andrew Spriggs
Alexie Pa

NESCENT Funding