NGS Bioinformatics Workshop
1.5 Tutorial – Genome Annotation
April 5th, 2012
IRMACS 10900
Facilitator: Richard Bruskiewich
Adjunct Professor, MBB
Workflow for Today
Prepare to visualize annotation
Get a genomic sequence from GenBank
Repeat mask it.
Retrieve a genomic sequence…
Retrieve a (relatively small, <100 kb, eukaryote) genomic sequence clone from GenBank
Query Nucleotide division
e.g. Arabidopsis BAC clone (HE601748.1)
Select FASTA
Save.. To File.. As “Fasta” (rename?)
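The same retrieval can also be scripted rather than clicked through. The following is a minimal sketch, assuming NCBI's E-utilities `efetch` endpoint is used instead of the web form; the function names `efetch_fasta_url` and `save_fasta` and the output file name are illustrative and not part of the workshop material.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# NCBI E-utilities endpoint for fetching records
EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_fasta_url(accession):
    """Build the efetch URL that returns one nucleotide record as FASTA text."""
    params = {
        "db": "nuccore",      # the Nucleotide database
        "id": accession,      # e.g. "HE601748.1"
        "rettype": "fasta",
        "retmode": "text",
    }
    return EFETCH + "?" + urlencode(params)


def save_fasta(accession, path):
    """Download the record and write it to a local file (needs network access)."""
    with urlopen(efetch_fasta_url(accession)) as response, open(path, "wb") as out:
        out.write(response.read())


if __name__ == "__main__":
    # Hypothetical output file name, chosen here for illustration
    save_fasta("HE601748.1", "HE601748.fasta")
```

Building the URL separately from the download makes it easy to paste the URL into a browser first and check that the accession resolves.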
Blast is a low hanging fruit…
Use BLAST to quickly survey for similar sequences
Megablast against nucleotide
e.g. HE601748 is closest to A. thaliana chr. 5?
Megablast against reference RNA sequence db
Repeat Masking
Upload the clone file to RepeatMasker on the web and run with appropriate parameters:
http://www.repeatmasker.org/cgi-bin/WEBRepeatMasker
Save the results (including the masked sequence) to your computer
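Once the masked sequence is saved, a quick sanity check is to see how much of the clone was actually masked. The sketch below assumes masked bases appear as N or X, or as lower-case letters if soft masking was selected; the helper names `read_fasta` and `masked_fraction` and the file name are illustrative. Any N already present in the unmasked clone would be counted as well.

```python
def read_fasta(text):
    """Return {header: sequence} parsed from FASTA-formatted text."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]
            records[header] = []
        elif header is not None:
            records[header].append(line)
    return {name: "".join(parts) for name, parts in records.items()}


def masked_fraction(seq):
    """Fraction of bases that are N/X (hard masked) or lower case (soft masked)."""
    if not seq:
        return 0.0
    masked = sum(1 for base in seq if base in "NnXx" or base.islower())
    return masked / len(seq)


if __name__ == "__main__":
    # Hypothetical file name for the masked sequence saved from the web server
    with open("HE601748.fasta.masked") as handle:
        for name, seq in read_fasta(handle.read()).items():
            print(name, f"{100 * masked_fraction(seq):.1f}% masked")
```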
ab initio Gene Predictions
Genscan:
http://genes.mit.edu/GENSCAN.html
Cut and paste results as text to a file
Fgenesh:
www.softberry.com
Blast2GO
http://www.blast2go.com
Annotation workbench, via Gene Ontology (GO) terms.
First, save the predicted peptides (e.g. from Fgenesh)
Need to fix the FASTA headers to assign proper identifiers (could write a script?)
Start the Blast2GO workbench (Java Web Start)
Load in peptides
Do the analysis… e.g. run blastp, GO, annotation, InterPro, etc.
See www.geneontology.org for details on GO
See http://www.ebi.ac.uk/interpro/ for InterPro info
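The header-fixing script suggested on this slide could look something like the following. This is a minimal sketch: it gives every record a short sequential identifier and keeps the original header text as the description. The `pep` prefix, the zero-padded numbering, and the file names are assumptions made for illustration, not a convention required by Blast2GO.

```python
def rename_headers(fasta_text, prefix="pep"):
    """Replace each FASTA header with '>prefix_NNN original header text'."""
    out = []
    count = 0
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            count += 1
            original = line[1:].strip()
            # Short unique identifier first, original header kept as description
            out.append(f">{prefix}_{count:03d} {original}".rstrip())
        else:
            out.append(line)
    return "\n".join(out) + ("\n" if out else "")


if __name__ == "__main__":
    # Hypothetical file names for the saved gene predictions
    with open("fgenesh_peptides.fasta") as handle:
        fixed = rename_headers(handle.read(), prefix="HE601748_pep")
    with open("fgenesh_peptides.renamed.fasta", "w") as handle:
        handle.write(fixed)
```

Keeping the original header after the new identifier means nothing from the gene predictor's output is lost, while tools that read only the first word of the header see a unique name.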
EMBOSS
European Molecular Biology Open Software Suite (EMBOSS):
http://emboss.sourceforge.net
Download and install version of interest (e.g. Linux, Mac OSX, Windows…)
Decide what to do:
http://emboss.sourceforge.net/apps/groups.html
Let’s try a CpG island plot (cpgplot)
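To get a feel for what a CpG island plot shows, the two quantities usually plotted can be computed by hand: the observed/expected CpG ratio and the percent G+C in a sliding window. The sketch below is not the cpgplot implementation; it assumes the common definition Obs/Exp = (number of CpG × window length) / (number of C × number of G), and the window and step sizes are illustrative.

```python
def cpg_stats(window):
    """Return (observed/expected CpG ratio, percent G+C) for one window."""
    window = window.upper()
    n = len(window)
    c = window.count("C")
    g = window.count("G")
    cpg = window.count("CG")  # CpG dinucleotides
    gc_percent = 100.0 * (c + g) / n if n else 0.0
    # The expected CpG count is (C count * G count) / window length
    obs_exp = (cpg * n) / (c * g) if c and g else 0.0
    return obs_exp, gc_percent


def sliding_cpg(seq, window=100, step=1):
    """Return a list of (start, obs_exp, gc_percent) along the sequence."""
    return [
        (start,) + cpg_stats(seq[start:start + window])
        for start in range(0, len(seq) - window + 1, step)
    ]


if __name__ == "__main__":
    # Toy sequence for illustration only
    for start, obs_exp, gc in sliding_cpg("CGCGATATCGCG", window=4, step=4):
        print(start, round(obs_exp, 2), round(gc, 1))
```

Windows where both values stay high over a long enough stretch are the candidates a CpG island finder would report.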
Study Genes by Comparative Genomics
JGI Vista toolkit:
http://genome.lbl.gov/vista
GenomeVista
rVista