transcriptomics - university of birmingham · 2019-02-12 · transcriptomics. general approaches of...
TRANSCRIPT
Transcriptomics
General approaches of microarrays and pyrosequencing. What do they deliver?
EU research network on flame retardants (INFLAME)
Tim Williams
1. Introduction: Transcriptomics – What? Why?
2. DNA Microarrays
3. High Throughput Sequencing
4. What does it Deliver?
5. Experimental Design
11.20 – Metabolomics – Mark Viant
11.40 – Proteomics – Caroline Vanparys
13.00 – Omics, biomarkers & risk assess. – Kevin Chipman
13.30 – Bioinformatics & predictive tox. – Francesco Falciani
[Figure: from genome to metabolome. Genomics: genome, DNA, 3 billion bases (H. sapiens), 20,000‐30,000 genes. Transcriptomics: transcriptome, mRNA, produced by transcription (differential splicing, RNA stability etc). Proteomics: proteome, ~100,000 proteins, produced by translation (post‐translational modification etc). Metabolomics: metabolome, metabolites, produced by enzyme activity. Also labelled: functional genomics, epigenetics.]
[Figure: diagram of a cell showing the nucleus and ribosome. Labels: toxic chemical; damage; receptor proteins; signal transduction; activate transcription factor (Tf) proteins; DNA response element; Tf; gene 1; Pol II transcription complex; transcription; RNA; mRNA; enzyme activity.]
Specific transcription factors bind to specific response elements, at specific genes.
These bind the transcription complex, which then increases RNA transcription from that gene.
[Figure labels: translation; protein; compound A; compound B]
Transcription (heavily simplified)
The Transcriptome: The collection of ALL mRNA transcripts present within a biological sample at any one time
Biological sample:
• Single cell type
• Single organ: a few cell types
• Whole organism: many tissue and cell types
Each eukaryotic cell contains ~360,000 mRNA transcripts in total at any one time
Most eukaryotic cells transcribe >10,000 different mRNA transcripts
Gene expression varies over many orders of magnitude.
Highly expressed genes dominate mRNA: generally >75% of the mRNA comes from <5% of the genes.
To study the transcriptome we therefore need a sensitive technique with a wide dynamic range, to get the most information on the greatest number of transcripts.
Transcriptomics: Finding the amount of mRNA that has been produced from each gene simultaneously
DNA Microarrays
• Make a slide with ‘probes’ specifically binding each gene
• Hybridise your mRNA to it
• Quantify how much bound to each probe
High Throughput Sequencing
• Isolate mRNA, sequence the whole lot
• Find how many sequences you got from each individual gene
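As a rough illustration of the sequencing approach above, the sketch below counts how many reads were assigned to each gene and scales the counts to reads per million, so that samples sequenced to different depths can be compared. The gene names and read assignments are hypothetical, and the sketch assumes each read has already been matched to a gene (real pipelines first align reads to a genome or transcriptome).

```python
from collections import Counter


def counts_per_million(read_gene_assignments):
    """Count reads per gene and scale to counts per million (CPM).

    read_gene_assignments: iterable of gene names, one entry per sequenced
    read (hypothetical input; alignment is assumed to be done already).
    """
    counts = Counter(read_gene_assignments)
    total_reads = sum(counts.values())
    if total_reads == 0:
        return {}
    return {gene: n * 1_000_000 / total_reads for gene, n in counts.items()}


# Toy example: four reads, three from geneA and one from geneB.
reads = ["geneA", "geneA", "geneB", "geneA"]
cpm = counts_per_million(reads)
print(cpm)
```

Scaling by the total number of reads is only the simplest possible normalisation; it does not correct for gene length or for a few very highly expressed genes dominating the total.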
DNA Microarrays: Specificity and Sensitivity
Probes: Design. Start with the sequences of all genes from your species, find unique 20‐ to 60‐nucleotide sequences for each gene (all in silico), then synthesise oligonucleotides (single‐stranded DNA, eg 5’‐ATCGGTGCATGCATGTAGAGTAGGGGTTTCATTCAGTAACT‐3’) that specifically bind the mRNA (done on‐chip by the company: Agilent, Affymetrix etc).
Slides or Chips: Oligos can be printed on and bound to slides in precisely located ‘spots’ that are small (microns across), so that 100,000s or more can be printed in the area of a microscope slide.
Hybridisation: Under stringent hybridisation conditions each oligonucleotide will specifically bind only the complementary sequence from the mixture of mRNAs in your sample. The sample can be co‐hybridised with a control labelled with a different fluorophore, eg as an internal control.
Detection: Fluorescent labelling of the mRNA sample allows very sensitive detection. Fluorophores are excited by a laser, and the light emitted at a specific wavelength passes through filters and is detected by photomultiplier tubes.
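The in silico probe design step described above can be sketched as a search for subsequences that occur in one gene only. This is a minimal illustration under strong assumptions: the gene names and sequences are made up, the probe length is shortened to 5 for readability (the slides quote 20 to 60 nucleotides), and only exact matches are checked. Real probe design also has to consider near matches, melting temperature and strand orientation.

```python
from collections import defaultdict


def unique_probe_candidates(gene_sequences, probe_length):
    """Return, per gene, the subsequences of probe_length found in no other gene."""
    owners = defaultdict(set)  # subsequence -> set of genes that contain it
    for gene, seq in gene_sequences.items():
        for start in range(len(seq) - probe_length + 1):
            owners[seq[start:start + probe_length]].add(gene)

    candidates = {gene: [] for gene in gene_sequences}
    for gene, seq in gene_sequences.items():
        for start in range(len(seq) - probe_length + 1):
            kmer = seq[start:start + probe_length]
            # Keep the subsequence only if this gene is its sole owner.
            if owners[kmer] == {gene} and kmer not in candidates[gene]:
                candidates[gene].append(kmer)
    return candidates


# Hypothetical toy sequences sharing the subsequence GGTGC.
toy_genes = {"geneA": "ATCGGTGC", "geneB": "GGTGCATT"}
probes = unique_probe_candidates(toy_genes, probe_length=5)
print(probes)
```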
Microarray Scheme
• Design specific probes for each mRNA transcript
• Print array slides
• Extract RNA from test and control cells or tissue
• Label cRNA with fluorescent dye, eg Cy‐3 dCTP
• Hybridize to array
• Read with scanner
• Analyse spot intensities
• Compare between test and control samples
Fluorescence of a spot is proportional to the amount of RNA for that specific gene
Microarrays are species‐specific, but are commercially available for model species, eg human, rat, mouse, yeast, E. coli etc
[Figure: Stickleback 8x15k Agilent array. Panels: one 15,000‐spot subarray; zoomed in on spots (false colour).]
High Throughput Sequencing – RNA Seq
Next‐Generation Sequencing
Roche 454: 400 Mb per run, 400 bp seqs
Ion Torrent: 200 Mb to 1 Gb per run, 200 bp seqs
ABI SOLiD: 5‐20 Gb per run, 50 bp seqs
Illumina/Solexa: >100 Gb per run, 35‐100 bp seqs
Throughput improving rapidly
Highly expressed genes dominate mRNA, generally >75% of the mRNA comes from <5% of the genes. Therefore more and more sequencing is required to find ‘rare’ transcripts. This is referred to as ‘sequencing depth’.
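To make the idea of sequencing depth concrete, the sketch below estimates how many reads are needed to see a rare transcript at least once. It assumes, as a simplification, that every read is drawn independently and in proportion to transcript abundance, so the chance of detecting a transcript that makes up a fraction f of the mRNA in N reads is 1 - (1 - f)^N. The function names are illustrative; the figure of ~360,000 transcripts per cell is taken from the earlier slide.

```python
import math


def detection_probability(transcript_fraction, n_reads):
    """Probability of at least one read from a transcript in n_reads reads."""
    return 1.0 - (1.0 - transcript_fraction) ** n_reads


def reads_needed(transcript_fraction, target_probability):
    """Smallest number of reads giving at least target_probability of detection."""
    if not 0.0 < transcript_fraction < 1.0:
        raise ValueError("transcript_fraction must be between 0 and 1")
    if not 0.0 < target_probability < 1.0:
        raise ValueError("target_probability must be between 0 and 1")
    # Solve 1 - (1 - f)^N >= p for N, using log1p for accuracy at small f.
    return math.ceil(math.log1p(-target_probability) / math.log1p(-transcript_fraction))


# A transcript present as one copy among ~360,000 mRNA molecules in a cell.
rare_fraction = 1 / 360_000
print(reads_needed(rare_fraction, 0.95))
```

Seeing a transcript once is far from quantifying it, so the depth needed for reliable comparison between samples is considerably higher than this lower bound.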
[Figure: number of sequences (y‐axis) plotted along the gene sequence (x‐axis) for samples 1, 2 and 3. Example from: Oliver et al., BMC Genomics 2009, 10:641]
Samples 1 and 2 were wild‐type Listeria, sample 3 was a sigma 70 transcription factor knockout
High‐throughput sequencing identified genes controlled by sigma 70 transcription factor
BREAKING NEWS:
January 10, 2012 5:06 am
Machine to read individual’s DNA for $1,000
By Clive Cookson
A US biotechnology company will on Tuesday announce the first machine that can read all 3bn letters of an individual’s DNA for as little as $1,000 – a development that will greatly accelerate medical treatment tailored to a patient’s genes but also raises ethical questions.
Life Technologies says its new Ion Proton sequencer – a $149,000 instrument about the size of a laser printer – can read a whole human genome in less than a day for $1,000 including all chemicals, running costs and preliminary data analysis.
What do you get from Transcriptomics?
•A simple experiment (note that biological replication is essential)
[Figure: control and treated groups. Each replicate sample goes through an omics assay, giving for each group a table of mean and SD per gene (Gene 1, Gene 2, Gene 3, ... Gene 20000). The two tables are then compared using stats tests, FDR correction and fold change.]
Generate a list of transcripts that are significantly differentially expressed in test v control, showing the direction and apparent magnitude of the change, eg
CYP1A1 (Cytochrome P4501A1): 11‐fold induced with FDR 0.001
MT1 (Metallothionein I): 3‐fold repressed with FDR 0.05
Etc.
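The slides do not say which FDR correction was used. As one common choice, the sketch below applies the Benjamini-Hochberg procedure to a list of per-gene p-values and keeps the genes whose adjusted value falls below a threshold. The gene names and p-values are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices, smallest p first
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonic adjusted values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values from a per-gene test of treated v control.
genes = ["gene1", "gene2", "gene3", "gene4"]
raw_p = [0.01, 0.04, 0.03, 0.005]
fdr = benjamini_hochberg(raw_p)
significant = [g for g, q in zip(genes, fdr) if q <= 0.03]
print(significant)
```

With around 20,000 genes tested at once, an uncorrected cut-off of p < 0.05 would be expected to pass roughly a thousand genes by chance alone, which is why the correction matters.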
Omics Techniques
Lots of Data!
What do you do now?
Biological Experiment
Scatter Plot: Logarithmic
[Figure: example scatter plot of cadmium‐treated flounder, day 1. X‐axis: log2 of controls; Y‐axis: log2 of test; logarithmic axes with ticks from 10 to 10,000. Gene list: Good Cd (8117). Lines mark the 1:1 ratio, 2‐fold up and 2‐fold down; induced genes, repressed genes and HSP30 clones are labelled.]
Each dot represents one array probe
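The 2‐fold guide lines on a plot like this correspond to a log2 ratio of +1 and -1 between test and control. A minimal sketch of sorting probes into induced, repressed and unchanged on that basis follows; the intensities are hypothetical, and a fold‐change cut‐off alone says nothing about statistical significance.

```python
import math


def classify_probe(control_intensity, test_intensity, fold_threshold=2.0):
    """Label a probe by its log2(test / control) ratio against a fold threshold."""
    log_ratio = math.log2(test_intensity / control_intensity)
    cutoff = math.log2(fold_threshold)  # 2-fold corresponds to a log2 ratio of 1
    if log_ratio >= cutoff:
        return "induced"
    if log_ratio <= -cutoff:
        return "repressed"
    return "unchanged"


# Hypothetical spot intensities (control, test) for three probes.
for control, test in [(100, 250), (100, 40), (100, 150)]:
    print(control, test, classify_probe(control, test))
```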
Enrichment Analysis: Approaches for inferring functional change from gene annotation, eg Gene Ontology
Interpretation
Most genes induced encode ribosomal proteins
Evidently something going on at the ribosome
Ribosomes translate mRNA into protein
Increased protein synthesis is required when cells proliferate
Cancer cells are proliferating
What we have just done is ‘Enrichment Analysis’
Genes induced in tumour
Computational: use eg Fisher’s Exact Test with a multiple testing correction
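As a sketch of the computational version, the function below runs a one‐sided Fisher's exact test for over‐representation: given how many genes in the induced list carry an annotation (for example "ribosomal protein"), it returns the probability of seeing at least that many by chance, given the annotation's frequency among all genes on the array. All counts in the example are hypothetical, and in practice the p‐values for every annotation term tested would then need a multiple testing correction.

```python
from math import comb


def enrichment_p_value(hits_in_list, list_size, hits_in_background, background_size):
    """One-sided Fisher's exact test (upper tail of the hypergeometric distribution).

    hits_in_list:       genes in the induced list that carry the annotation
    list_size:          total genes in the induced list
    hits_in_background: genes on the whole array that carry the annotation
    background_size:    total genes on the whole array
    """
    max_hits = min(list_size, hits_in_background)
    total = comb(background_size, list_size)
    tail = sum(
        comb(hits_in_background, k)
        * comb(background_size - hits_in_background, list_size - k)
        for k in range(hits_in_list, max_hits + 1)
    )
    return tail / total


# Hypothetical counts: 12 of 50 induced genes are annotated as ribosomal,
# against 80 ribosomal genes among 8,000 genes on the array.
print(enrichment_p_value(12, 50, 80, 8000))
```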
Some Options:
• DAVID – good for model species: http://david.abcc.ncifcrf.gov/
• Blast2GO – good for de novo annotation of non‐model species: http://www.blast2go.com/b2ghome
• EASE via TMeV – very flexible, analyse any type of annotation: http://www.tm4.org/mev/
• Ingenuity Pathway Analysis (IPA) – commercial package (limited free trial), integrates enrichment analyses and interaction data etc: http://www.ingenuity.com/
Finding significantly enriched functional groups of genes helps organise, visualise and understand the data.
Example – Flounder fish response to single intraperitoneal injection with cadmium over a timecourse
Highlights functional groups and pathways responding to treatment
Brominated Flame Retardants
• Pentamix – commercial penta‐brominated diphenylether mixture
• 70 µg/kg to 70 mg/kg pentamix‐dosed sediment and food
• Adult male flounders
• Livers sampled at 3 months
[Figure: mean fold change versus controls (y‐axis, 1 to 1.8) against Pentamix concentration in sediment (x‐axis: 0, 0.07, 0.7, 7 and 70 mg/kg). Annotations: significantly induced at low dose; correlate >0.9 with Pentamix concentration. Functional groups shown: vitellogenesis, cell cycle, cell death; oxidative phosphorylation, protein ubiquitination, pentose phosphate pathway, amino acid metabolism.]
High UK PBDE environmental sediment concentration: ~1 mg/kg (late ‘90s)
• Changes in transcription of metabolic enzymes at and below environmental concentrations
– Induction of energy pathways
– Increase in protein turnover
• Endocrine disruption at highest concentration
– Vitellogenesis (Vtg A and Vtg B) in male fish
– Disruption of cell cycle
– Induction of cell‐death related transcripts
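The "correlate >0.9 with Pentamix concentration" criterion in the dose‐response figure can be illustrated with a Pearson correlation between dose and a transcript's mean fold change. The doses below are those of the experiment; the fold‐change values are invented for illustration, and the slides do not state which correlation measure or dose scale was actually used.

```python
import math


def pearson_correlation(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2 values")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


doses_mg_per_kg = [0, 0.07, 0.7, 7, 70]          # from the experiment
mean_fold_change = [1.0, 1.05, 1.1, 1.3, 1.7]    # hypothetical transcript
r = pearson_correlation(doses_mg_per_kg, mean_fold_change)
print(r, r > 0.9)
```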
[Figure: vitellogenin expression (log scale, 0.01 to 100) for control and Pentamix doses of 0.07, 0.7, 7 and 70.]
Thinking about the experiment
• 1 – What is your biological question?
• 2 – Has it been done before? In which species?
– Search the databases, eg PubMed, CTD, GEO, ArrayExpress
– Consider using real‐time PCRs, PCR‐arrays etc to compare with the response in your species
• 3 – Use data from previous studies to plan yours
– No point exceeding the lethal dose. Consider the concentrations and timepoints carefully
• 4 – Physiological Anchoring & Multi‐omics
– Data from other assays integrated with omics allow more in‐depth analyses and provide confidence
– Multi‐omics gives greater insight into the interplay between regulation and mechanism
“Maximum information from minimum effort”
Balance: cost, time and lab capacity against information (quantity and quality)
Variation and ‘Omics
• Systematic variation
– Must be avoided (Experimental design)
• Technical variation
– Must be minimised (Quality control)
• Biological variation
– Is INEVITABLE
• Evolution requires it!
• The ‘Real World’ rarely involves inbred animal strains or clonal cell cultures
• Biologists must use statistics intelligently
• Replicates, replicates and more replicates
• If possible do a Power Study on preliminary data before finalising experimental design
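A power study can be sketched with the standard normal‐approximation formula for comparing two group means: n per group = 2 × ((z for alpha/2 + z for power) × SD / difference)². The function below is a rough planning aid under that assumption, not a replacement for a proper analysis. The SD would come from preliminary data, and for omics work alpha should be much stricter than 0.05 to allow for multiple testing.

```python
import math
from statistics import NormalDist


def samples_per_group(sd, difference, alpha=0.05, power=0.8):
    """Approximate replicates per group for a two-sided, two-group comparison.

    sd:         standard deviation within a group (eg of log2 expression)
    difference: smallest difference in means worth detecting (same units)
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_power) * sd / difference) ** 2)


# Hypothetical planning values on a log2 scale: SD of 1, looking for a
# 2-fold change (a difference of 1 in log2 units).
print(samples_per_group(sd=1.0, difference=1.0))
print(samples_per_group(sd=1.0, difference=1.0, alpha=0.001))
```

Halving the within‐group SD, or doubling the effect size of interest, cuts the number of replicates needed by roughly a factor of four, which is one reason controlling technical variation pays off.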
• Normal variability in gene expression in the mouse: up to 68‐fold change! (Pritchard et al 2001)
When Arrays Go Bad
TECHNICAL VARIATION
[Figure: array image illustrating insufficient labelling]
As with any experiment, optimal experimental design is essential. Transcriptomics tends to reveal problems with experimental design that are less obvious with other techniques.
For example consider gene expression in 2 groups of mice ‐
Group 1 (‘control’ group)           | Group 2 (‘test’ group)
Untreated                           | Treated with BFR in solvent carrier
Fed at 10am                         | Fed at 10am
Sampled at 4pm                      | Sampled at 11am
RNA prepared on 26th by Amy         | RNA prepared on 31st by Bob
Microarrays run on slide batch ‘C’  | Microarrays run on slide batch ‘D’
These are likely to show SYSTEMATIC VARIATION between groups
Are the differences in gene expression between the 2 groups caused by BFR or other factors? – Solvent? Circadian? Nutrition? Operator differences? Slide Batch?
RANDOMIZATION can help avoid systematic variation
Remember the basics – only change one factor while the rest are kept the same!!
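One way to put randomisation into practice is to shuffle the samples within each treatment group and then deal them out across slide batches (or operators, or processing days), so that no batch contains only controls or only treated animals. The sketch below assumes hypothetical sample names and the two slide batches from the example above; the fixed seed is only there to make the plan reproducible.

```python
import random


def randomise_to_batches(samples_by_group, batches, seed=0):
    """Randomly spread each group's samples evenly across the batches.

    samples_by_group: dict mapping group name -> list of sample names
    batches:          list of batch labels (eg slide batches or operators)
    """
    rng = random.Random(seed)
    assignment = {}
    for group, samples in samples_by_group.items():
        shuffled = list(samples)
        rng.shuffle(shuffled)
        # Random starting batch, then round-robin so groups stay balanced.
        offset = rng.randrange(len(batches))
        for position, sample in enumerate(shuffled):
            assignment[sample] = batches[(position + offset) % len(batches)]
    return assignment


# Hypothetical samples: four control and four treated mice, two slide batches.
groups = {
    "control": ["c1", "c2", "c3", "c4"],
    "treated": ["t1", "t2", "t3", "t4"],
}
plan = randomise_to_batches(groups, ["C", "D"], seed=1)
for sample, batch in sorted(plan.items()):
    print(sample, batch)
```

The same function can be reused for each nuisance factor in turn (RNA preparation day, operator, sampling order), while feeding and sampling times are simply kept identical for all animals.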
Experimental Design
Bovine Micro-array
Photo courtesy of Brendan Wren
Acknowledgements
Birmingham – Kevin Chipman, Francesco Falciani, Mark Viant, Leda Mirbahai, Nil Turan, Olga Hrydziuszko, Huifeng Wu, Anthony Jones, Laine Wallace
Cefas Weymouth – Brett Lyons, Grant Stentiford, Ioanna Katsiadaki, John Bignell
Stirling – Steve George, Mike Leaver, Amer Diab, John Taggart, Carolynn Mackenzie, Katie Bartie, Vicky Sabine
AWI Bremerhaven – Angela Kohler, Katya Broeg
Glasgow Caledonian – John Craft, Kate Dempsey
Holland – Ron van der Oost, Erwin Roex, Edwin Foekma, Tinka Murk
Far East – Beijing Genomics Institute, University of Singapore
Funding