A brief introduction to transcriptomics: from
sampling to data analysis
Leeds-omics introduction series
Outline
1. Introduction to transcriptomes
2. Sample collection
3. RNA extraction methods and RNA quality assessment and quantification
4. RNA sequencing techniques
5. Bioinformatic analyses - Typical pipeline: quality assessment, trimming
6. Special type of analyses: mapping onto genome, quantification of expression, variant calling (SNPs)
Transcriptomes give us information of gene expression
Identify genes differentially expressed, identify functional changes…
Why use transcriptomes in biological research?
Pros
• Easy, accessible way to see and quantify gene expression
• Immediate access to the protein-coding portion of the genome
• Identify alternative splicing
• Identify Single Nucleotide Polymorphisms (SNPs) in coding regions
Cons
• Snapshot in time (different times, different expression patterns)
• Absence of a gene from the transcriptome does not mean it is not present in the genome.
• Difficult to ensure that you have sampled a single cell type.
• Statistical analysis is highly dependent on experimental design.
The stage of gene expression we capture
• RNA-seq captures the mature messenger RNA (mRNA)
• Targets the characteristic poly-A tail of the mRNA
• The assumption is that the amount of mRNA for any gene is reflective of its impact on the cell function
Sampling design
VERY IMPORTANT: what is your research question? Will you have enough to address your question?
Things to bear in mind:
• What tissues to target: relevant to your research question
• Homogeneous sampling of tissues: to the extent you can manage
• Replicates: account for variation and are important to validate results
• Developmental stage of studied individuals
• Consult sequencing specialists (Ian Carr and Steve Moss) for advice on sampling
Some techniques commonly used to stabilise RNA
• Snap freezing (liquid nitrogen): immediate storage at -80°C.
• RNAlater (Ambion): small pieces of tissue (<0.5 cm in length) put in 5 volumes of it. Long-term storage: -20°C or -80°C.
• NAP buffer ("homemade"): similar to RNAlater.
• Other commercial products customised to sample types (e.g. blood)
[Figures: snap freezing (liquid nitrogen); preserving tissue with RNAlater. Source: Thermo Fisher Scientific]
Considerations when preserving samples
• mRNA is fragile and unstable, susceptible to degradation: act fast.
• Ensure aseptic conditions: use tubes and tools that are RNase-free.
• Amount of tissue that you need: some tissues have high yields (e.g. liver), and others tend to give low yield (e.g. adipose tissue, brain).
• Storage: ideally at -80°C
Comparison between preserving methods and samples
Camacho-Sanchez et al. 2013. Molecular Ecology Resources 13, 663–673
• Snap frozen: best results
• Followed by RNAlater and NAP buffer
Obtaining the mRNA
ORGANIC EXTRACTION PROTOCOL
1. Tissue
2. Lyse and homogenise
3. Add gDNA eliminator and chloroform
4. Separate phases
5. Add ethanol to aqueous phase
6. Bind total RNA
7. Wash
8. Elute
9. Total RNA
IMPORTANT CONSIDERATIONS:
• Extraction of RNA is complicated by the presence of ribonucleases in tissues
• RNases are difficult to inactivate
Other RNA extraction methods
Filter-based, spin basket formats
Benefits:
• Convenient and easy
• Amenable to single-sample and 96-well processing
• Can be automated
Drawbacks:
• Can become clogged with particulates
• gDNA and other large nucleic acids are often retained
• Automation requires complex vacuum systems/centrifugation
Magnetic particle methods
Benefits:
• Can be automated
• Rapid sample collection/concentration
• No risk of filter clogging
Drawbacks:
• Magnetic particles can be carried through
• Less efficient in viscous solutions
• Laborious when performed manually
Direct lysis methods
Benefits:
• Works well with small samples
• Can be automated
• Scalable
• Potential for most accurate RNA representation
Drawbacks:
• Dilution-based
• Spectrophotometric measurement of yield is not possible
• Possible residual RNase activity
• Performance can be suboptimal
RNA quality assessment and quantification
It is important to establish both the purity and concentration of RNA that has been extracted
UV Spectroscopy
• Measures absorbance of diluted RNA sample at 260 and 280 nm
• Nucleic acid concentration is calculated using the Beer-Lambert law
$A = \varepsilon C l$
where:
• $A$ = absorbance at a particular wavelength
• $\varepsilon$ = extinction coefficient; $\varepsilon_{\mathrm{RNA}} = 0.025\ (\mu\mathrm{g/mL})^{-1}\,\mathrm{cm}^{-1}$
• $C$ = concentration of nucleic acid
• $l$ = path length of the spectrophotometer cuvette (typically 1 cm)
e.g. A260 = 1.0 is equivalent to ~40 μg/mL RNA
A260/A280 ratio indicates RNA purity
• 1.8-2.1 indicates highly purified RNA
IMPORTANT CONSIDERATIONS:
• pH
• Cuvette
• RNA dilution range
• Does not discriminate between DNA and RNA (use RNase-free DNase to remove contaminating DNA)
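As a worked illustration of the UV spectroscopy calculation, here is a minimal Python sketch that turns an A260 reading into an RNA concentration and checks the A260/A280 ratio. The conversion factor (~40 μg/mL per absorbance unit) and the 1.8-2.1 purity range come from the slide; the function names, the dilution factor argument and the example readings are assumptions made for illustration only.

```python
# Slide value: A260 = 1.0 corresponds to ~40 ug/mL RNA for a 1 cm path length.
RNA_UG_PER_ML_PER_A260 = 40.0


def rna_concentration(a260, dilution_factor=1.0, path_length_cm=1.0):
    """Estimate RNA concentration (ug/mL) of the undiluted sample.

    Beer-Lambert: A = eps * C * l, so C = A / (eps * l), with 1/eps = 40 ug/mL.
    """
    if path_length_cm <= 0:
        raise ValueError("path length must be positive")
    return a260 * RNA_UG_PER_ML_PER_A260 * dilution_factor / path_length_cm


def purity_ratio(a260, a280):
    """Return the A260/A280 ratio used as a purity indicator."""
    return a260 / a280


def looks_pure(a260, a280, low=1.8, high=2.1):
    """True if the A260/A280 ratio falls in the range the slide calls highly purified."""
    return low <= purity_ratio(a260, a280) <= high


# Hypothetical reading: a 1:10 dilution measured at A260 = 0.5 and A280 = 0.25.
concentration = rna_concentration(0.5, dilution_factor=10)
ratio = purity_ratio(0.5, 0.25)
```

Because the absorbance at 260 nm does not distinguish DNA from RNA, a value computed this way is only meaningful after contaminating DNA has been removed.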
Agilent® 2100 Bioanalyzer
• Combination of microfluidics, capillary electrophoresis and fluorescent dye
• Evaluates both RNA concentration and integrity
• Nano (ng/μL) and pico (50-5000 pg/μL) systems available
• Size and mass are determined as RNA molecules fluoresce in the chip channels
• System produces a gel-like image and an electropherogram
• Compares unknown concentrations to the Agilent® RNA 6000 Ladder
• RNA Integrity Number (RIN) determined by analysis algorithm (max value 10)
[Figures: Bioanalyzer lab chip; example traces at RIN ~10 and RIN ~6]
RNA Sequencing
• Whole transcriptome shotgun sequencing (WTSS)
• Reveals the presence and quantity of RNA in a biological sample at a given moment in time
Workflow:
1. RNA ISOLATION
2. RNA SELECTION/DEPLETION:
• Poly-A selection (isolated RNA; selection via poly(T) magnetic beads; poly(A) RNA molecules bind to poly(T) beads)
• rRNA depletion
• RNA capture
3. cDNA SYNTHESIS
RNA sequencing
IMPORTANT CONSIDERATIONS:
• COST
• SINGLE VS PAIRED-END READS
  • SE: for expression analysis of well annotated genomes
  • PE: better for characterisation of poorly annotated transcriptomes
• READ LENGTH
• DEPTH OF COVERAGE
  • Determined by number of samples (libraries) in one lane
• REPLICATES, RANDOMISATION AND MULTIPLEXING
RAW READS -> DATA ANALYSIS
Sample type | Reads needed for differential expression (millions) | Reads needed for rare transcript or de novo assembly (millions) | Read length
Small genomes (bacteria/fungi) | 5 | 30-65 | 50 SE or PE for positional info
Intermediate genomes (Drosophila, C. elegans) | 10 | 70-130 | 50-100 SE or PE for positional info
Large genomes (human/mouse) | 15-25 | 100-200 | >100 SE or PE for positional info
E.g. (Liu et al., 2014):
• 12 samples in one lane of Illumina HiSeq = 10 million reads per sample
• 4 samples in one lane of Illumina HiSeq = 30 million reads per sample
• 3X more reads per sample = 1.5X cost increase = ~25% more differentially expressed genes detected
Liu, Y., Zhou, J., and White, K.P. (2014) RNA-seq differential expression studies: more sequence or more replication? Bioinformatics Feb 1;30(3):301-4
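The multiplexing arithmetic in this example can be sketched in a few lines of Python. The lane yield of 120 million reads is not stated on the slide; it is inferred from the two figures given (12 samples at 10 million reads each, 4 samples at 30 million each) and should be treated as an assumption, as should the function names.

```python
# Assumed lane yield, inferred from the slide: 12 x 10M = 4 x 30M = 120M reads.
LANE_READS_MILLIONS = 120


def reads_per_sample(lane_reads_millions, n_samples):
    """Average sequencing depth per library when n_samples share one lane."""
    if n_samples < 1:
        raise ValueError("need at least one sample")
    return lane_reads_millions / n_samples


def max_samples_per_lane(lane_reads_millions, target_reads_millions):
    """Largest number of libraries that still reach the target depth."""
    if target_reads_millions <= 0:
        raise ValueError("target depth must be positive")
    return int(lane_reads_millions // target_reads_millions)


depth_12 = reads_per_sample(LANE_READS_MILLIONS, 12)  # 10.0 million reads each
depth_4 = reads_per_sample(LANE_READS_MILLIONS, 4)    # 30.0 million reads each

# With the table's 15-25 million reads for large genomes, one lane would hold
# between 4 and 8 libraries under this assumed yield.
upper = max_samples_per_lane(LANE_READS_MILLIONS, 15)
lower = max_samples_per_lane(LANE_READS_MILLIONS, 25)
```

Real lanes do not split reads perfectly evenly between libraries, so these numbers are planning estimates rather than guarantees.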
Bioinformatics - Analysis of transcriptomic data
Pasteurella in Saiga Antelope host
Mass mortality hit Saiga Antelope in Spring 2015. -> Pasteurella infection?
4 samples of different tissues
- 3 antelopes died from infection
- 1 antelope died from other cause
2 objectives:
1) Get expression level of virulent Pasteurella genes (counting reads)
2) Identify other possible mutations (variant calling)
Transcriptomic pipeline
NGS data – what it looks like
Example size for sample of Saiga transcriptome: 12 Gb
File formats: .fastq, .sff, .fa, .csfasta/.qual
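To make the .fastq format concrete, the sketch below reads FASTQ records in plain Python. It assumes the common layout of four lines per record (header starting with `@`, sequence, separator starting with `+`, quality string); the two example reads are invented for illustration and the function name is hypothetical.

```python
import io


def read_fastq(handle):
    """Yield (name, sequence, quality_string) from a 4-line-per-record FASTQ stream."""
    while True:
        header = handle.readline()
        if header == "":
            return  # end of file
        if header.strip() == "":
            continue  # tolerate blank lines between records
        seq = handle.readline().rstrip("\n")
        plus = handle.readline()
        qual = handle.readline().rstrip("\n")
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError("malformed FASTQ record")
        if len(seq) != len(qual):
            raise ValueError("sequence and quality lengths differ")
        yield header[1:].strip(), seq, qual


# Two made-up reads, only to show the record layout.
EXAMPLE_FASTQ = "@read1\nACGTACGT\n+\nIIIIIIII\n@read2\nTTGCA\n+\n!!5?I\n"
records = list(read_fastq(io.StringIO(EXAMPLE_FASTQ)))
```

For a 12 Gb file, a generator like this keeps only one record in memory at a time, which is why the sketch yields records instead of building a list internally.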
Sequencing quality check
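Fastq quality score: $Q = -10 \log_{10} P$, where $P$ is the probability of incorrect base identification
Quality score | Probability of incorrect identification | Accuracy of base identification
40 | 1 in 10000 | 99.99%
30 | 1 in 1000 | 99.9%
20 | 1 in 100 | 99%
10 | 1 in 10 | 90%
[Figure: FastQC interface]
Tools:
• FastQC: visualisation
• Trimmomatic: trim reads
• Cutadapt: remove adaptors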
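The quality score formula can be checked numerically with a short Python sketch. The conversion functions follow the formula on the slide directly; the decoding function assumes the Phred+33 ASCII encoding used by current Illumina FASTQ files, which the slide does not state, and all names are illustrative.

```python
import math


def phred_to_error_prob(q):
    """P = 10^(-Q/10): probability that the base call is wrong."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p):
    """Q = -10 * log10(P)."""
    return -10 * math.log10(p)


def decode_quality_string(qual, offset=33):
    """Convert a FASTQ quality string to Phred scores (assumes Phred+33 encoding)."""
    return [ord(char) - offset for char in qual]


def mean_quality(qual, offset=33):
    """Arithmetic mean of the Phred scores of one read."""
    scores = decode_quality_string(qual, offset)
    return sum(scores) / len(scores)


p_q30 = phred_to_error_prob(30)         # 1 in 1000, as in the table
scores = decode_quality_string("I5!")   # 'I' -> 40, '5' -> 20, '!' -> 0
```

Trimming tools apply thresholds to scores like these; which threshold is appropriate depends on the data set and is not prescribed here.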
Mapping reads to Pasteurella genome
• Extract Pasteurella reads from samples
• Cases where there is no reference genome
Reference: Pasteurella (FASTA file, NCBI)
Sample: Pasteurella from Saiga antelope
Mapping to reference genome
Output file:
• SAM (Sequence Alignment/Map): tab-delimited text
• BAM (Binary Alignment/Map): compressed binary version of SAM
Common software examples:
For DNA:
- BWA (Burrows-Wheeler Aligner)
- Bowtie
For RNA:
- TopHat
- STAR
Picard Tools, Samtools
SAM format and alignment statistics
Statistics: Samtools 'flagstat'
[Figure: SAM format]
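To show what the SAM format contains and what a flagstat-style summary counts, here is a small Python sketch. The field order (QNAME, FLAG, RNAME, POS, MAPQ, CIGAR, ..., SEQ, QUAL) and the flag bits 0x4 (unmapped) and 0x10 (reverse strand) come from the SAM specification; the three alignment records are invented, and the summary is a simplified stand-in for `samtools flagstat`, not a reproduction of its output.

```python
FLAG_UNMAPPED = 0x4   # read did not align to the reference
FLAG_REVERSE = 0x10   # read aligned to the reverse strand


def parse_sam_line(line):
    """Split one SAM alignment line into its main mandatory fields."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 11:
        raise ValueError("a SAM alignment line has at least 11 fields")
    return {
        "qname": fields[0],
        "flag": int(fields[1]),
        "rname": fields[2],
        "pos": int(fields[3]),
        "mapq": int(fields[4]),
        "cigar": fields[5],
        "seq": fields[9],
        "qual": fields[10],
    }


def summarise(sam_lines):
    """Count total, mapped and reverse-strand records, skipping '@' header lines."""
    summary = {"total": 0, "mapped": 0, "reverse": 0}
    for line in sam_lines:
        if line.startswith("@"):
            continue
        record = parse_sam_line(line)
        summary["total"] += 1
        if not record["flag"] & FLAG_UNMAPPED:
            summary["mapped"] += 1
            if record["flag"] & FLAG_REVERSE:
                summary["reverse"] += 1
    return summary


# Made-up records: one forward hit, one reverse hit, one unmapped read.
EXAMPLE_SAM = [
    "@SQ\tSN:ref\tLN:100",
    "r1\t0\tref\t5\t60\t8M\t*\t0\t0\tACGTACGT\tIIIIIIII",
    "r2\t16\tref\t20\t60\t5M\t*\t0\t0\tTTGCA\tIIIII",
    "r3\t4\t*\t0\t0\t*\t*\t0\t0\tGGGG\tIIII",
]
stats = summarise(EXAMPLE_SAM)
```

The fraction `mapped / total` is the kind of figure one would look at to judge how many reads in a Saiga sample actually align to the Pasteurella reference.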
Mpileup file
Samtools 'mpileup': SAM file -> Mpileup file
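A pileup line summarises, for one reference position, every read base covering it. The Python sketch below counts reference matches and mismatches in the read-bases column. It assumes the standard six-column pileup layout (chromosome, position, reference base, depth, read bases, base qualities), where `.` and `,` mean a match, letters mean a mismatch, `^` plus one character marks a read start, `$` marks a read end, and `+n`/`-n` introduce indels. The example line is invented and the function names are hypothetical.

```python
import re


def count_pileup_bases(bases):
    """Count reference matches ('ref') and mismatching bases in a pileup bases column."""
    # Drop read-start markers (^ followed by one mapping-quality character),
    # then read-end markers ($).
    cleaned = re.sub(r"\^.", "", bases).replace("$", "")
    counts = {"ref": 0}
    i = 0
    while i < len(cleaned):
        char = cleaned[i]
        if char in "+-":
            # Indel: sign, length digits, then that many bases. Skipped in this sketch.
            j = i + 1
            while j < len(cleaned) and cleaned[j].isdigit():
                j += 1
            i = j + int(cleaned[i + 1:j])
            continue
        if char in ".,":
            counts["ref"] += 1
        elif char.upper() in "ACGTN":
            counts[char.upper()] = counts.get(char.upper(), 0) + 1
        i += 1
    return counts


def parse_pileup_line(line):
    """Return (chromosome, position, reference base, depth, base counts)."""
    chrom, pos, ref, depth, bases = line.rstrip("\n").split("\t")[:5]
    return chrom, int(pos), ref, int(depth), count_pileup_bases(bases)


# Made-up position: six reads, four agree with the reference, two show a T.
EXAMPLE_LINE = "ref\t10\tA\t6\t..,^I.Tt$\tIIIIII"
parsed = parse_pileup_line(EXAMPLE_LINE)
```

Counts like these are the raw material for variant calling: a position where a substantial share of reads disagree with the reference is a candidate SNP.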
Count reads mapping a region
Common software: htseq-count
Compare gene expressions. -> Differential expression
Sample 1: total of 1350 reads mapping to gene A
Sample 2: total of 10 reads mapping to gene A
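Raw counts such as 1350 versus 10 are only comparable once sequencing depth is taken into account. The Python sketch below normalises the slide's two counts to counts per million (CPM) and computes a log2 fold change. The library sizes of 10 million reads per sample and the pseudocount of 1 are assumptions chosen for illustration; they are not given on the slide.

```python
import math


def counts_per_million(gene_count, library_size):
    """Scale a raw read count by the total number of reads in the library."""
    if library_size <= 0:
        raise ValueError("library size must be positive")
    return gene_count * 1_000_000 / library_size


def log2_fold_change(cpm_a, cpm_b, pseudocount=1.0):
    """log2 ratio of two normalised values; the pseudocount avoids division by zero."""
    return math.log2((cpm_a + pseudocount) / (cpm_b + pseudocount))


# Counts for gene A from the slide; library sizes are hypothetical.
cpm_sample1 = counts_per_million(1350, 10_000_000)  # 135.0
cpm_sample2 = counts_per_million(10, 10_000_000)    # 1.0
lfc = log2_fold_change(cpm_sample1, cpm_sample2)    # log2(136 / 2), about 6.09
```

A single pair of numbers like this says nothing about significance; as the sampling design slide stresses, replicates are needed before a difference can be called differential expression.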
Compare reference to ‘sample’ Pasteurella
Common software: VarScan (Java)
Variant calling:
- SNPs (single nucleotide polymorphisms)
- Indels
[Figure: IGV]
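The idea of comparing the reference to the sample can be illustrated with a toy Python function that reports positions where two already-aligned, equal-length sequences differ. This is not how VarScan works: real variant callers operate on pileups and take read depth and base quality into account, and indels are not handled here. The sequences and the function name are invented for illustration.

```python
def find_snps(reference, sample):
    """Return (1-based position, reference base, sample base) for each mismatch."""
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned and of equal length")
    snps = []
    pairs = zip(reference.upper(), sample.upper())
    for position, (ref_base, sample_base) in enumerate(pairs, start=1):
        # Only report unambiguous single-base substitutions.
        if ref_base != sample_base and ref_base in "ACGT" and sample_base in "ACGT":
            snps.append((position, ref_base, sample_base))
    return snps


# Made-up fragment: the sample differs from the reference at positions 4 and 7.
snps = find_snps("ACGTACGT", "ACGAACCT")
```

Candidate positions found by a real caller can then be inspected visually in a genome browser such as IGV.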
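Summary
• Quality check and trimming: FastQC, Trimmomatic / Cutadapt
• Mapping: BWA, Bowtie, STAR, TopHat
• Alignment processing: Samtools (flagstat, mpileup)
• Read counting: HTSeq-count
• Variant calling: VarScan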
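One possible way to chain the tools in this summary is sketched below as a Python function that only builds the command lines and does not run anything. All file names, the choice of BWA as the mapper, and the selection of options are assumptions for illustration; trimming is omitted, and the exact parameters should be checked against each tool's documentation and agreed with the sequencing specialists before use.

```python
def build_pipeline(reads="sample.fastq", reference="pasteurella.fasta",
                   annotation="pasteurella.gff", varscan_jar="VarScan.jar"):
    """Return (description, argv, stdout_file) tuples; nothing is executed here.

    stdout_file is where the command's standard output would be redirected,
    or None when the tool writes its own output files.
    """
    return [
        ("Quality check", ["fastqc", reads], None),
        ("Index reference", ["bwa", "index", reference], None),
        ("Map reads", ["bwa", "mem", reference, reads], "aln.sam"),
        ("Sort alignments",
         ["samtools", "sort", "-o", "aln.sorted.bam", "aln.sam"], None),
        ("Alignment statistics",
         ["samtools", "flagstat", "aln.sorted.bam"], "flagstat.txt"),
        ("Pileup",
         ["samtools", "mpileup", "-f", reference, "aln.sorted.bam"], "aln.mpileup"),
        ("Count reads per feature",
         ["htseq-count", "-f", "bam", "-r", "pos", "aln.sorted.bam", annotation],
         "counts.txt"),
        ("Call SNPs",
         ["java", "-jar", varscan_jar, "mpileup2snp", "aln.mpileup"], "snps.txt"),
    ]


# Print the plan as shell-like lines for review before running anything.
for description, argv, stdout_file in build_pipeline():
    redirect = f" > {stdout_file}" if stdout_file else ""
    print(f"# {description}\n{' '.join(argv)}{redirect}")
```

Keeping the plan as data makes it easy to review or log each step first; the commands could later be executed one by one, for example with `subprocess.run`, once the parameters have been confirmed.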
Need help?
Advice on appropriate pipeline:
• Ian Carr: [email protected]
• Stephen Moss: [email protected]
Unix command, script, software parameters:
• Natacha Chenevoy: [email protected]
Coming 2-day workshop in the new year: "Introduction to standard transcriptome analysis"
Steve Moss
Acknowledgements
M. O’Connell
Members of the O’Connell Lab
Members of the Creevey Lab at Aberystwyth University
Sequencing advice: Ian Carr, Steve Moss
Simon Goodman