Alistair Chalk, 2008 Gene Expression Goals • To understand current high throughput strategies for measuring gene expression • To understand quality control and normalisation in gene expression data • To understand the factors behind choice of gene expression measurement strategies • To identify downstream analysis methods for gene expression data

Post on 19-Dec-2015


Alistair Chalk, 2008

Gene Expression

• Goals

• To understand current high throughput strategies for measuring gene expression

• To understand quality control and normalisation in gene expression data

• To understand the factors behind choice of gene expression measurement strategies

• To identify downstream analysis methods for gene expression data


Microarray analysis workflow

• Standard microarray analysis workflow


Gene Expression

• Data Collection

• How do you measure the ‘transcriptome’?

• Arrays

• Affymetrix (3’ arrays)

• Illumina

• cDNA arrays

• Boutique arrays

• Deep sequencing

• 454 (Roche)

• Solexa (Illumina)

• SOLiD (ABI)

• Helicos single molecule sequencing

• More to come, $1000 genome is the goal


Gene Expression

• Data Collection

• All these platforms (arrays) generate large amounts of data.

• Arrays: 50 000 to 70 000 data points (500 000 at probe level)

• Large numbers of samples or patients increase this 10-, 50- or 100-fold.

• Data Analysis

• Single comparisons, gene by gene, sample by sample.

• Clustering of genes or samples - common patterns or trends.

• Multivariate approach, identifying combinations of genes – biomarkers, gene ‘signatures’.


Gene Expression

• Affymetrix arrays

• Arrays for every model organism, and more.

• Human, mouse, rat, chicken, zebrafish, Drosophila, C. elegans, maize, Arabidopsis.

• These are really 3’ UTR arrays, as most probes are placed there for specificity.

• About 50-80K probesets per array.

• More than 0.5 million probes.

• Reproducible results, standard procedures.


Gene Expression

• Illumina ‘bead’ arrays

• Similar attributes to Affy arrays

• More even hybridisation, as this occurs in solution.


Gene Expression

• Affymetrix exon arrays

• Arrays for every predicted exon on the genome

• Human, mouse, rat so far

• 1.4 million probesets

• More probes, but noisier

• Probe set regions have variable length

• Short probe regions (~exons) make probes harder to design


Gene Expression

• Affymetrix exon junction arrays

• Arrays for every predicted exon-junction on the genome

• Experimental – release late 2008

• Potentially noisy – junctions are quite similar, so cross-hybridisation is an issue


Gene Expression

• cDNA arrays

• Older technology.

• Low density

• Issues with evenness of spots.

• Important to control for technical variation within the array (blocks)

• Messy dye swaps used to control for labelling variations

• Plagued by spatial effects and inconsistencies.


Gene Expression

• Boutique arrays

• New array manufacturers (NimbleGen, etc.) allow custom arrays to be made.

• Suitable for focused approaches where the knowledge from research is turned into an array to survey a particular process or signature.

• Examples

• Alternative splicing (transcription)

• Human promoters (ChIP-chip)

• Genes expressed in a particular cancer subtype

• miRNA

• How to analyse these datasets?

• Which method do I use to measure my sample?


Gene Expression

• ‘Deep’ sequencing

• Fantastic advances in sequencing technologies.

• Shotgun approach.

• Millions of ‘reads’ possible.

• Profiling communities.

• Metagenomics

• Analysis of this kind of data is still under development

• Reference genome

• What to do with novel genomes? Homologies?

• Virtual microarrays!

Curves for the various samples show the number of orthologous groups seen in each per megabase of sequencing. Parentheses indicate the lower bound of the total number of orthologous groups for each sample.

"Comparative Metagenomics of Microbial Communities," Science 308: 554-557 (2005).


Sequencing technology development


Gene Expression

• Choice of technology

• Cost vs exploratory value

• The most exploratory methods are the most expensive and the least accessible

• Deep sequencing

• Exon arrays

• Exon-junction arrays

• “Standard” microarrays are cheap!

• Illumina ref6/8

• Affymetrix U133 plus 2.0

• Study type and platform choice

• How much is known about the system already?

• Studies of well defined systems are less likely to discover new genes

• New or difficult to handle cell types are more likely to contain novel genes not on standard microarrays

• Can also combine multiple expression techniques


Complex transcriptome sampling strategy

Sample: hOSC expanded neurospheres

• Illumina beadarray (transcript expression): highly supported genes; low cost; 70+ samples; ~20k transcripts; HumanRef-8_V2

• Affymetrix exon array (exon expression): high coverage of the exons of genes and predicted genes; moderate cost; 12+ samples; 1.4 million probe sets; HuEx1.0ST

• Illumina CAGE (transcript start site expression): discovery based TSS; high cost; 4 samples; ~27M 27bp tags

• Illumina beadarray (genotyping): HapMap; low cost; 2+ samples; >620,000 markers, median marker spacing 2.7kb; Human610-Quad

Scales: high cost / few samples to low cost / many samples; discovery based to well defined


Complex transcriptome sampling strategy

• Illumina beadarray (transcript expression): sample known genes and well defined variants

• Affymetrix exon array (exon expression): sample known and predicted genes and exons

• Illumina CAGE (transcript start site expression): unbiased TSS discovery and annotation

• Illumina beadarray (genotyping): sample known genotype variants

• Interchangeable gene model: Ensembl, RefSeq, AceView, Vega

“Complexity is anxiety”


Analysis

• What do you want to do with your data?

• Quality control of array data – is your data good enough?

• Any outliers? Strange values?

• Differential gene expression (DE or DGE) between two or more samples.

• Normalisation

• Gene expression values

• Multiple comparisons

• A list of differentially expressed genes

• Identify common pathways differentially expressed

• Identify common TFs that bind to DE genes

• Identify biomarkers indicative of a particular trait or process.

• What does the expression of a biomarker look like?

• Identify gene signatures to allow you to classify a particular sample.

• Is there a particular pattern of expression associated with disease?


Quality Control

• Plot your data!

• Raw image

• Raw values


Quality Control – Short Read Data

• Bioconductor ShortRead package

• Currently Solexa/Illumina

• Will soon include SOLiD

• Short sequence data contains errors

• Currently few standards for this type of data beyond “matches to the genome with <=2 mismatches”

• Data needs to be filtered!
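The "<=2 mismatches" rule quoted above can be sketched as a simple filter. This is only an illustration, assuming ungapped alignments supplied as hypothetical (read, start position) pairs; it is not how ShortRead or any particular aligner implements filtering.

```python
def count_mismatches(read, reference_window):
    """Number of positions where the read differs from the reference."""
    if len(read) != len(reference_window):
        raise ValueError("read and reference window must have equal length")
    return sum(1 for a, b in zip(read, reference_window) if a != b)


def filter_alignments(alignments, genome, max_mismatches=2):
    """Keep alignments with at most max_mismatches against the genome.

    alignments: list of (read_sequence, start_position) pairs (hypothetical format)
    genome: reference sequence as a plain string
    """
    kept = []
    for read, start in alignments:
        window = genome[start:start + len(read)]
        if len(window) < len(read):
            continue  # alignment runs off the end of the reference
        if count_mismatches(read, window) <= max_mismatches:
            kept.append((read, start))
    return kept


# Toy reference and reads, made up for illustration
genome = "ACGTACGTTAGCCGATAGCTAGGCTA"
alignments = [
    ("ACGTACGT", 0),  # perfect match
    ("ACGAACGA", 0),  # two mismatches, kept
    ("TTTTACGA", 0),  # four mismatches, dropped
]
kept = filter_alignments(alignments, genome)
```

Real pipelines would also weigh base quality scores and multiple mapping positions, which this sketch ignores.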


Short Read Data – genome alignment

• Data is useless until mapped to a known genome to identify transcribed elements

• Correct genome build

• Including splice junctions, ribosomal RNA, databases of likely contaminants

• Many methods for fast short read alignment

• Vendor specific

• ELAND – closed source

• Open source strategies

• Mapq, etc. (many new methods being published)

• Helicos – open source platform

• Short sequence data contains errors

• Currently few standards for this type of data beyond “matches to the genome with <=2 mismatches”


Differential Gene Expression

Normalisation

• In order to make comparisons between chips of the same experiment possible, every chip has to be normalised

• Normalisation is an attempt to eliminate all the non-biological variation in microarray experiments without affecting the biological variations

• There is a danger that some or even most of the biological information will also be removed in normalisation

• To avoid this, keep the amount of normalisation to the minimum needed


Differential Gene Expression

Median centering

• The median intensity of every chip is brought to the same value

• Achieved by calculating the median of the log ratios for one microarray and producing the centered data by subtracting this median from the log ratio of every gene

• Median centering does not change the spread of the data, which also means that the original information content is not altered.

• Only applicable for linear data

• Quantile normalisation also brings the medians to the same value, because it makes the whole intensity distribution the same on every chip.
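A minimal sketch of median centering as described above: subtract the chip's median log ratio from every gene on that chip. The function name and the toy values are assumptions for illustration.

```python
from statistics import median


def median_center(log_ratios):
    """Subtract the chip median from every log ratio on that chip."""
    chip_median = median(log_ratios)
    return [value - chip_median for value in log_ratios]


# Two toy chips with the same pattern; chip_b is shifted up by 0.5
chip_a = [0.5, 1.5, -0.5, 2.5, 0.0]
chip_b = [1.0, 2.0, 0.0, 3.0, 0.5]

centered_a = median_center(chip_a)
centered_b = median_center(chip_b)

# After centering both chips have median 0 and the same values;
# the spread (max minus min) of each chip is unchanged.
spread_before = max(chip_a) - min(chip_a)
spread_after = max(centered_a) - min(centered_a)
```

The shift between the two chips disappears while the spread of each chip stays the same, which is the property the slide points out.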


Differential Gene Expression

• Quantile normalisation

• Based on the assumption that the distribution of the intensity values is similar on every chip.

• Holds well for genes with low expression, but not necessarily well for highly expressed genes.
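The idea can be sketched in a few lines: sort each chip, average the values at each rank across chips, then give every gene the average for its rank. This is a bare-bones illustration with made-up intensities; ties are broken arbitrarily by sort order, which production implementations handle more carefully.

```python
def quantile_normalise(chips):
    """Force every chip to share the same intensity distribution.

    chips: list of equal-length lists of intensities, one list per chip.
    """
    n_genes = len(chips[0])
    if any(len(chip) != n_genes for chip in chips):
        raise ValueError("all chips must have the same number of genes")

    # Mean intensity at each rank, taken across the sorted chips
    sorted_chips = [sorted(chip) for chip in chips]
    rank_means = [
        sum(s[rank] for s in sorted_chips) / len(chips)
        for rank in range(n_genes)
    ]

    normalised = []
    for chip in chips:
        # Gene indices ordered from lowest to highest intensity on this chip
        order = sorted(range(n_genes), key=lambda i: chip[i])
        out = [0.0] * n_genes
        for rank, gene_index in enumerate(order):
            out[gene_index] = rank_means[rank]
        normalised.append(out)
    return normalised


chips = [
    [5.0, 2.0, 3.0, 4.0],
    [4.0, 1.0, 4.5, 2.0],
    [3.0, 4.0, 6.0, 8.0],
]
normalised = quantile_normalise(chips)
```

After normalisation every chip contains exactly the same set of values, and only the assignment of values to genes differs between chips.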


Differential Gene Expression

Distribution of Intensities

• Most genes expressed at low levels

• Taking the log of these values brings them closer to a normal distribution

• There is still a skewed distribution!
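A small numerical illustration of the points above, using invented intensities: the log transform reduces the skew of the distribution but does not remove it. The skewness helper is a plain moment-based estimate written for this sketch.

```python
import math
from statistics import mean, pstdev


def skewness(values):
    """Moment-based skewness: mean cubed deviation in standard deviation units."""
    centre = mean(values)
    spread = pstdev(values)
    return sum(((v - centre) / spread) ** 3 for v in values) / len(values)


# Made-up raw intensities: many low values, a few very high ones
raw = [50, 60, 80, 100, 120, 150, 200, 400, 1600, 12800]
logged = [math.log2(v) for v in raw]

raw_skew = skewness(raw)
log_skew = skewness(logged)
# raw_skew is larger than log_skew, and log_skew is still above zero
```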


Differential Gene Expression

Distribution of Intensities

• Signal and noise give the characteristic shape of the gene expression intensities.

• Now we can estimate gene expression (RMA, Li-Wong, etc.)

• … and then differential expression.

abundances + noise = observed values


Differential Gene Expression

• Probes and Transcript assignment

• Probes are sequences that hybridise to specific transcripts

• Have varying efficacy and specificity

• Often change between chip versions

• Do not always target the newly discovered genes

• Transcript models

• Change between genome releases

• Assignment of probe to transcript is done by the manufacturer (traditionally quite badly)

• Assignment is in the format of CDF files, so this can be changed to suit


Differential Gene Expression

• Statistical Testing

• Select an appropriate statistical test, e.g. t-test or ANOVA.

• Select a significance threshold for the p-value

• Form the pair of hypotheses you want to compare

• Calculate the test statistic

• Find the p-value which corresponds to the test statistic

• Draw conclusions

• These tests assume that the data are normally distributed.

• The hypothesis being tested is the NULL hypothesis: that there is NO difference in the means of the distributions being compared.

• How to apply this to many genes from a microarray experiment?
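One way to picture the gene-by-gene answer is to run the same test once per gene. The sketch below computes a Welch t statistic per gene and, to stay self-contained, gets the p-value from an exact permutation test instead of the t distribution; that substitution, the gene names and the values are all assumptions made for illustration.

```python
from itertools import combinations
from statistics import mean, variance


def t_statistic(group_a, group_b):
    """Welch's t statistic for the difference in group means."""
    se_squared = variance(group_a) / len(group_a) + variance(group_b) / len(group_b)
    return (mean(group_a) - mean(group_b)) / se_squared ** 0.5


def permutation_p_value(group_a, group_b):
    """Two-sided p-value over every relabelling of the pooled samples.

    Assumes the values are not all identical within a relabelled group,
    otherwise the standard error would be zero.
    """
    pooled = group_a + group_b
    observed = abs(t_statistic(group_a, group_b))
    hits = 0
    total = 0
    for indices in combinations(range(len(pooled)), len(group_a)):
        chosen = set(indices)
        a = [pooled[i] for i in indices]
        b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        total += 1
        if abs(t_statistic(a, b)) >= observed - 1e-12:
            hits += 1
    return hits / total


# Hypothetical log2 expression values: (control, treated) per gene
genes = {
    "GENE_A": ([7.1, 7.3, 6.9, 7.2], [9.0, 9.4, 8.8, 9.1]),  # clear shift
    "GENE_B": ([5.0, 5.6, 4.7, 5.3], [5.2, 4.9, 5.5, 5.1]),  # no shift
}
p_values = {
    name: permutation_p_value(control, treated)
    for name, (control, treated) in genes.items()
}
```

With four samples per group there are only 70 relabellings, so the smallest possible two-sided p-value is 2/70. Small group sizes limit what any test can show, and running thousands of such tests is what makes the multiple testing problem unavoidable.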


Differential Gene Expression

p-values

• The p-value is associated with a statistical test: it is the probability of seeing a difference at least as large as the one observed when the null hypothesis is actually true.

• When a cutoff for the p-value is decided, the values below the threshold are considered statistically significant, and the values above the threshold are considered not statistically significant.

• Often a threshold of 0.05 is used. This means that when there is really no difference between groups, about one test in 20 will still come out statistically significant by chance alone.
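A quick calculation shows what the 0.05 threshold implies at microarray scale. The figure of 20,000 tested genes is an assumption for illustration, and the family-wise formula assumes the tests are independent.

```python
def expected_false_positives(n_tests, alpha):
    """Expected number of significant results when no gene truly differs."""
    return n_tests * alpha


def family_wise_error_rate(n_tests, alpha):
    """Chance of at least one false positive across independent tests."""
    return 1.0 - (1.0 - alpha) ** n_tests


n_genes = 20000  # assumed number of genes tested
alpha = 0.05

false_hits = expected_false_positives(n_genes, alpha)  # about 1000 genes
fwer_20 = family_wise_error_rate(20, alpha)            # already about 0.64 for 20 tests
```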


Differential Gene Expression

Multiple testing correction and the False Discovery Rate

• When many analyses are performed on a data set, many results will meet the arbitrary significance level by chance alone.

• Often the p-value is corrected to account for this problem.

• Bonferroni correction is the simplest correction.

• The original p-value is multiplied by the number of comparisons to create a new corrected p-value.

• FDR controls the expected proportion of false positives among the genes called significant

• The FDR adjusted p-values are often called q-values

• If all genes with a q-value below a threshold, e.g. 0.05, are selected as differentially expressed, then the expected proportion of false discoveries in the selected group is controlled to be less than the threshold value, in this case 5%
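The two corrections can be compared on a handful of p-values. The FDR adjustment sketched here is the Benjamini-Hochberg step-up procedure, which is one common way to obtain the adjusted values the slide calls q-values; the p-values themselves are invented.

```python
def bonferroni(p_values):
    """Multiply each p-value by the number of tests, capped at 1."""
    n = len(p_values)
    return [min(1.0, p * n) for p in p_values]


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone
    for rank in range(n, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * n / rank)
        adjusted[index] = running_min
    return adjusted


p_values = [0.001, 0.008, 0.012, 0.041, 0.20]

bonf = bonferroni(p_values)
fdr = benjamini_hochberg(p_values)

selected_bonferroni = [p for p in bonf if p < 0.05]
selected_fdr = [q for q in fdr if q < 0.05]
```

On these values Bonferroni keeps two genes and the FDR adjustment keeps three, which illustrates why FDR is usually preferred when some false discoveries can be tolerated.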


Differential Gene Expression

• The distribution for each gene is compared between samples

• A p-value is given to that particular t-test comparison.

• This is usually corrected for FDR.

• Genes are ranked by p-value,

• and a cut-off is usually imposed for significance, say 0.05.

This produces…

• A list of “top 100” genes

• Lacks any connections or structure

• Where to start/stop?

• Unconscious bias to “favourite” genes

• How do I control “false discovery” without also controlling “discovery”?


Prediction of disease specific exon skipping events

GenomeGraphs Bioconductor package (with modifications for gene model selection).


Alternative splicing - FIRMA

Figure labels: probe variation; probe intensities; gene expression; residuals; alt splicing?

Red – hOSC ???; Green – hOSC ???; Black – Non hOSC

Red – hOSC PD; Green – hOSC Control; Black – Non hOSC


Alternative splicing – Other methods

• Other (Bioconductor) methods for exon analysis

• aroma.affymetrix

• exonmap

• Easy as this: exon analysis in xmap to find a set of differentially expressed genes

    library(exonmap)                                      # load the exonmap package
    raw.data <- read.exon()                               # read the raw exon array data
    raw.data@cdfName <- "exon.pmcdf"                      # set the CDF used for the exon array
    x.rma <- rma(raw.data)                                # RMA expression values
    pc.rma <- pc(x.rma, "group", c("a", "b"))             # compare groups "a" and "b"
    keep <- (abs(fc(pc.rma)) > 1) & (tt(pc.rma) < 1e-4)   # filter on the fc() and tt() values
    sigs <- featureNames(x.rma)[keep]                     # names of the features that pass

• This ease of analysis is RARE (appreciate it!)

• Additional exon analysis is available here

• http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=2323405


Differential Gene Expression

What next (according to DAVID)?

• Identify enriched biological themes, particularly GO terms

• Discover enriched functional-related gene groups

• Cluster redundant annotation terms

• Visualize genes on BioCarta & KEGG pathway maps

• Display related many-genes-to-many-terms on 2-D view.

• Search for other functionally related genes not in the list

• List interacting proteins

• Explore gene names in batch

• Link gene-disease associations

• Highlight protein functional domains and motifs
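The idea behind "enriched" terms can be sketched with a plain hypergeometric test: how surprising is the overlap between a gene list and the genes annotated to a term? This is a generic over-representation calculation with invented counts; DAVID's own scoring may differ from it.

```python
from math import comb


def enrichment_p_value(n_background, n_annotated, n_selected, n_overlap):
    """Probability of an overlap of at least n_overlap by chance.

    n_background: genes on the array
    n_annotated:  background genes carrying the annotation term
    n_selected:   genes in the differentially expressed list
    n_overlap:    selected genes that carry the term
    """
    upper = min(n_annotated, n_selected)
    total = comb(n_background, n_selected)
    tail = sum(
        comb(n_annotated, k) * comb(n_background - n_annotated, n_selected - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


# Hypothetical counts: 20,000 genes on the array, 200 carry the term,
# 100 genes in the list, 8 of which carry the term (1 expected by chance)
p_enriched = enrichment_p_value(20000, 200, 100, 8)
```

Because one such test is run for every annotation term, the resulting p-values need the same multiple testing correction as the gene-level tests.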


Differential Gene Expression

Pathway based approaches

• A list of “top ranking” pathways

• Includes functional connections between genes

• Dependent on underlying database and method

• Ingenuity

• DAVID

• GSEA

• More functionally motivated - but

• False pathway detection

• Overlapping pathways

• Many pathways poorly understood

• What about unknown genes?


Differential Gene Expression

Network based / Systems biology approaches

• A list of “top ranking” networks

• Essentially a looser pathway description where interactions can be identified from multiple data sources of very different types

• Ingenuity can be used in this way (“grow network/pathway” functionality)

• Networks/Pathways can be very complex

• transcription factors

• miRNAs

• promoter status

• Protein-protein interaction

• Multiple levels of data


Differential Gene Expression

• Alternatives – Classifier Approaches

• Multivariate

• Looking at combinations of genes that tell us something about the differences in the samples under study.

• Support Vector Machines

• A supervised machine learning way of training a ‘machine’ on one kind of data (diseased, or a state) and testing on some unseen data.

• Principal Components Analysis (PCA)

• Standard procedure of finding combinations with greatest variance

• Exploratory analysis, to see natural groupings, and to detect outliers

• To identify combinations of features that usefully characterize samples or genes.

• Good for QC of arrays!
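As a sketch of PCA used for array QC, the code below finds the first principal component and flags the array that sits furthest along it. Power iteration on the covariance matrix stands in for the usual SVD-based routine to keep the example dependency-free, and the expression values are invented.

```python
def first_principal_component(samples, iterations=200):
    """Return (loadings, scores) for the first principal component.

    samples: list of equal-length expression vectors, one per array.
    """
    n = len(samples)
    d = len(samples[0])
    means = [sum(s[j] for s in samples) / n for j in range(d)]
    centered = [[s[j] - means[j] for j in range(d)] for s in samples]

    # Gene-by-gene covariance matrix
    cov = [
        [sum(row[i] * row[j] for row in centered) / (n - 1) for j in range(d)]
        for i in range(d)
    ]

    # Power iteration: repeated multiplication converges on the direction
    # of greatest variance (the start vector is an arbitrary choice)
    vec = [1.0 / (j + 1) for j in range(d)]
    for _ in range(iterations):
        nxt = [sum(cov[i][j] * vec[j] for j in range(d)) for i in range(d)]
        norm = sum(x * x for x in nxt) ** 0.5
        if norm == 0:
            break
        vec = [x / norm for x in nxt]

    scores = [sum(row[j] * vec[j] for j in range(d)) for row in centered]
    return vec, scores


# Five hypothetical arrays measured on four genes; the last one is odd
arrays = [
    [8.0, 5.1, 6.0, 7.2],
    [8.1, 5.0, 6.1, 7.0],
    [7.9, 5.2, 5.9, 7.1],
    [8.0, 4.9, 6.0, 7.3],
    [5.0, 8.0, 9.1, 4.1],
]
loadings, scores = first_principal_component(arrays)
outlier = max(range(len(arrays)), key=lambda i: abs(scores[i]))
```

The sign of a principal component is arbitrary, so the outlier is picked by absolute score. In practice the first two or three components would be plotted and inspected by eye.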


SVMs, higher dimensions and kernels


F-Josef Müller et al. Nature 000, 1-5 (2008) doi:10.1038/nature07213

Sample collection and analysis for the stem cell matrix.


F-Josef Müller et al. Nature 000, 1-5 (2008) doi:10.1038/nature07213

Clusters of samples based on machine learning algorithm.


F-Josef Müller et al. Nature 000, 1-5 (2008) doi:10.1038/nature07213

Clusters of samples based on machine learning algorithm.


Microarray data analysis - multivariate

• GeneRaVE analysis

• Use gene expression as a predictor of a response

• continuous

• multigroup

• ordered categorical

• survival

• Integrated variable selection and model fitting

• Able to handle many variables

• Sparse classifiers – analyse gene expression values to find a small set of genes ‘predictive’ of a response

• Sparse networks – build a network around the response of interest

• https://www.bioinformatics.csiro.au


- KLF4, necessary for preventing the entry into mitosis following DNA damage

- UCHL1, ubiquitin thiolesterase. Associated with Parkinson’s and Alzheimer’s due to expression in nerve cells.

- FABP6, binds fatty acids, bile acids.

- SLC7A11, cystine/glutamate transport system xc(-), mediates cystine entry into cell in exchange for intracellular glutamate, which accumulates in response to oxidative stress.

- MMP10, regulated by reactive oxygen species, which also modulate tumor progression

Local gene network – smoking


Gastro Intestinal expression

- OLFM4, expressed in the inflamed colonic epithelium

- SPINK4, gastrointestinal protease inhibitor

- REG1A, expression is closely related to the carcinoma invasiveness of gastric neoplasms

- CA1, downregulated in GI mucosal neoplasms

Membrane-bound transporters

- AQP8, water channel expressed in pancreas and colon

- SLC26A3, epithelial Cl- absorption and HCO3- secretion

Hormone controlled carbohydrate homeostasis regulators of cellular glycogen release

- GCG, induces glucose production

- PYY, increases after meals

- SST, interacts with pituitary growth hormone, thyroid stimulating hormone, and most hormones of the gastrointestinal tract

- INSL5, insulin-like protein, has a newly identified receptor in the colon

Cellular and developmental patterning

- PRAC, predictor of colon position

- HOXB13, important in embryonic patterning along the axis of the organism

- CLDN8, determines cellular polarity, pathology in colon cancer

uncharacterised

Local gene network – colon biology


Microarrays, Diagnostics in Cancer

• A gene-expression signature to predict survival in breast cancer across independent data sets

• A Naderi, A E Teschendorff, N L Barbosa-Morais, S E Pinder, A R Green, D G Powe, J F R Robertson, S Aparicio, I O Ellis, J D Brenton and C Caldas


Analysis - DAVID

• DAVID tools for functional analysis of gene lists

• http://david.abcc.ncifcrf.gov/

• “The Database for Annotation, Visualization and Integrated Discovery (DAVID) 2008 is the sixth version of our original web-accessible programs. DAVID now provides a comprehensive set of functional annotation tools for investigators to understand biological meaning behind large list of genes.”

• We use this one in house often.

• Common questions

• What are major enriched GO terms?

• What are the highly active pathways?

• What are the frequently interacting proteins?

• What are the known disease associations?


GSEA Analysis

• GSEA: Gene Set Enrichment Analysis

• Similar to DAVID

• Gene sets are curated differently

• See tutorial

• http://www.broad.mit.edu/gsea/doc/desktop_tutorial.jsp

• A note on gene/probe identifiers

• When running the gene set enrichment analysis, it is critical that all of your data files use the same gene or probe identifiers. You can either use the probe identifiers native to your expression dataset, or collapse each probe set into a gene vector and use HUGO gene symbols as your identifiers
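Collapsing probe sets to gene symbols can be sketched as below. Keeping the highest probe value per gene is one common choice and is an assumption here, as are the probe identifiers, the mapping and the values.

```python
def collapse_to_genes(probe_values, probe_to_gene):
    """Collapse probe-level values to one value per gene symbol.

    Keeps the maximum value among the probes assigned to each gene.
    Probes without a gene assignment are dropped.
    """
    genes = {}
    for probe, value in probe_values.items():
        gene = probe_to_gene.get(probe)
        if gene is None:
            continue
        if gene not in genes or value > genes[gene]:
            genes[gene] = value
    return genes


# Hypothetical probe identifiers and annotation
probe_values = {"probe_1": 8.4, "probe_2": 9.2, "probe_3": 6.5, "probe_4": 3.1}
probe_to_gene = {"probe_1": "TP53", "probe_2": "TP53", "probe_3": "KLF4"}

gene_values = collapse_to_genes(probe_values, probe_to_gene)
```

Whatever rule is used, the same identifiers then have to appear in the expression file and in the gene sets.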


Analysis – common TFs

• Commonly regulated genes often have common transcription factors

• Use sets of up/down regulated genes in your experiment to find common binding patterns

• TF databases

• TRANSFAC

• Large, commercial, noisy database

• JASPAR

• A curated, non-redundant set of 123 profiles, derived from published collections of experimentally defined transcription factor binding sites for multicellular eukaryotes.

• Open data access, non-redundancy and quality

• When to use JASPAR? When seeking models for specific factors or structural classes, or if experimental evidence is paramount

• http://jaspar.genereg.net/


What next??

• Validation of results?

• Good for biomarkers, RT-PCR.

• Literature.

• Other results in the lab.

• Any independent datasets? – check GEO/ArrayExpress.

• Biological significance of results

• Do your results suggest further experiments?


Gene Expression

• Goals revisited

• To understand current high throughput strategies for measuring gene expression

• To understand quality control and normalisation in gene expression data

• To understand the factors behind choice of gene expression measurement strategies

• To identify downstream analysis methods for gene expression data