Practical course in genome bioinformatics
Day 6 – lecture
Manual annotation (Web Apollo & automatic tools)
24 Feb 2017
http://ekhidna.biocenter.helsinki.fi/downloads/teaching/spring2017
26.2.2016 juhana.kammonen@helsinki.fi http://padlet.com/juhana_kammonen/bioinfo
Genome project “roadmap”
• After experimental design and preparations a genome project can be roughly split into the following steps:
1. Sequencing
2. (de novo) assembly, scaffolding
3. RNA-sequencing and mapping
4. Gene prediction
5. Manual & functional annotation
6. Submission and publication of the genome in a biodatabase
7. Further downstream analysis
You are here: step 5, manual & functional annotation.
Manual annotation - outline
• Repetitive element prediction (ab initio) finds and masks repeats in the genome
• Automated gene prediction reveals potential gene content from the genome
  • A challenging task, especially in eukaryotic genomes
• After gene prediction the genome must be manually curated
  • Apply a set of trained human eyes to evaluate different tracks of evidence and find potential errors in gene predictions
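As a rough illustration of what "masking" leaves behind, the sketch below measures how much of each sequence is soft-masked. It assumes the common convention that soft-masked repeats are written in lowercase; the function names and the tiny example sequences are hypothetical and not part of the course material.

```python
def read_fasta(lines):
    """Yield (name, sequence) pairs from an iterable of FASTA lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            # Keep only the first word of the header as the sequence name.
            name, chunks = line[1:].split()[0], []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def masked_fraction(sequence):
    """Return the fraction of bases written in lowercase (assumed soft-masked)."""
    if not sequence:
        return 0.0
    masked = sum(1 for base in sequence if base.islower())
    return masked / len(sequence)


# Hypothetical toy input: two short "scaffolds".
example = """>scaffold_1 toy example
ACGTacgtacgtACGT
>scaffold_2
ACGTACGTAC
""".splitlines()

fractions = {name: masked_fraction(seq) for name, seq in read_fasta(example)}
for name, fraction in fractions.items():
    print(f"{name}\t{fraction:.2f}")
```

A very high masked fraction on a scaffold is one hint that gene predictions there deserve extra scrutiny during curation.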
The “rocky path” from de novo assembly to manual annotation
Evidence tracks not covered well by gene prediction
• RNA-sequencing coverage
  • Correlates relatively weakly with actual gene location, but does indicate expression level
  • Should be included as an evidence track in Web Apollo
• Splicing in eukaryotes
  • Exons may be incorrectly linked or separated in the gene models
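One simple check a curator can automate for wrongly linked or separated exons is whether each intron implied by a gene model has the canonical GT...AG splice sites. The sketch below is a minimal illustration under stated assumptions: exon coordinates are 1-based and inclusive, the sequence is a plain string, and all function names and the toy sequence are hypothetical. Non-canonical introns do exist in real genomes, so a flagged intron is a prompt to look at the evidence tracks, not proof of an error.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(str.maketrans("ACGTacgt", "TGCAtgca"))[::-1]


def intron_splice_sites(genome_seq, exons, strand="+"):
    """List (intron_start, intron_end, donor, acceptor) for a gene model.

    exons: (start, end) tuples in 1-based inclusive genomic coordinates.
    Donor and acceptor are the first and last two bases of each intron,
    read on the strand of the gene.
    """
    exons = sorted(exons)
    sites = []
    for (_, left_end), (right_start, _) in zip(exons, exons[1:]):
        intron_start, intron_end = left_end + 1, right_start - 1
        intron = genome_seq[intron_start - 1:intron_end].upper()
        if strand == "-":
            intron = reverse_complement(intron)
        sites.append((intron_start, intron_end, intron[:2], intron[-2:]))
    return sites


def non_canonical_introns(genome_seq, exons, strand="+"):
    """Return only the introns whose splice sites are not GT...AG."""
    return [site for site in intron_splice_sites(genome_seq, exons, strand)
            if (site[2], site[3]) != ("GT", "AG")]


# Hypothetical toy sequence: exon (1-6), intron (7-18), exon (19-24).
genome = "ATGGCC" + "GTAAGTTTTCAG" + "GGATAA"

good_model = [(1, 6), (19, 24)]
shifted_model = [(1, 5), (19, 24)]  # first exon ends one base too early

print(non_canonical_introns(genome, good_model))
print(non_canonical_introns(genome, shifted_model))
```

In the second model the exon boundary is off by one base, so the intron starts with CG instead of GT and is reported for review.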
Gene prediction and splicing
Cantarel BL, Korf I, Robb SMC, Parra G, Ross E, Moore B, Holt C, Sanchez Alvarado A, Yandell M (2008). MAKER: An easy-to-use annotation pipeline designed for emerging model organism genomes. Genome Research, 18(1), 188–196.
Gene prediction accuracy revisited
Web Apollo manual annotation tool
Lee E, Helt GA, Reese JT, Munoz-Torres MC, Childers CP, Buels RM, Stein L, Holmes IH, Elsik CG, Lewis SE (2013). Web Apollo: a web-based genomic annotation editing platform. Genome Biology, 14(8), R93.
Web Apollo annotation view
Basic recipe for manually annotating a single gene
• Use as many relevant tracks of evidence as possible to verify a predicted gene
• Align the predicted sequence against various databases
  • BLAST
  • Possible databases of related species
• Add comments on your findings to the annotation
  • If the predicted annotation was modified, specify the reason
  • Web Apollo allows annotations to be marked, e.g. as “needs revision”
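The database-search step of the recipe can be sketched as follows, assuming the BLAST search was saved in the standard 12-column tabular format (BLAST+ "-outfmt 6": qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore). The sequence identifiers, the example rows, and the e-value and identity thresholds are illustrative assumptions, not values from the course.

```python
from typing import NamedTuple


class BlastHit(NamedTuple):
    query: str
    subject: str
    pident: float
    length: int
    evalue: float
    bitscore: float


def parse_blast_tabular(lines):
    """Parse 12-column tab-separated BLAST output into BlastHit records."""
    hits = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        hits.append(BlastHit(fields[0], fields[1], float(fields[2]),
                             int(fields[3]), float(fields[10]),
                             float(fields[11])))
    return hits


def best_hits(hits, max_evalue=1e-5, min_identity=30.0):
    """Keep the highest-bitscore hit per query that passes both thresholds."""
    best = {}
    for hit in hits:
        if hit.evalue > max_evalue or hit.pident < min_identity:
            continue
        current = best.get(hit.query)
        if current is None or hit.bitscore > current.bitscore:
            best[hit.query] = hit
    return best


# Hypothetical example rows; all identifiers are made up.
example = [
    "gene_model_1\trelated_sp_protA\t78.50\t200\t43\t0\t1\t200\t5\t204\t1e-90\t310",
    "gene_model_1\trelated_sp_protB\t45.00\t180\t99\t2\t10\t189\t1\t182\t2e-30\t120",
    "gene_model_2\trelated_sp_protC\t28.00\t60\t43\t1\t1\t60\t1\t60\t0.5\t25",
]

supported = best_hits(parse_blast_tabular(example))
for query, hit in supported.items():
    print(query, hit.subject, hit.pident, hit.evalue)
```

Gene models with no hit passing the thresholds (here gene_model_2) are natural candidates for a closer look and an explanatory comment in the annotation.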
Today’s features
• Web Apollo – collaborative online manual annotation tool
  • Lee E, Helt GA, Reese JT, Munoz-Torres MC, Childers CP, Buels RM, Stein L, Holmes IH, Elsik CG, Lewis SE (2013). Web Apollo: a web-based genomic annotation editing platform. Genome Biology, 14(8), R93.
• BLAST – Basic Local Alignment Search Tool
  • “Swiss army knife” of a bioinformatician
  • Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ (1990). Basic local alignment search tool. Journal of Molecular Biology, 215(3), 403–410.
• MAFFT – Multiple Alignment Using Fast Fourier Transform
  • Katoh K, Misawa K, Kuma K, Miyata T (2002). MAFFT: a novel method for rapid multiple sequence alignment based on fast Fourier transform. Nucleic Acids Research, 30(14), 3059–3066.
Manual annotation efforts at Viikki campus
• Betula pendula (silver birch) genome annotation in spring 2014
  • 1000 genes curated and annotated during 3 weeks
• Taphrina betulina genome annotation in 2015
  • 800 genes curated and annotated during 2 weeks
• P. hispida saimensis (Saimaa ringed seal) genome annotation coming up later this year
Next: Computer exercises
• Getting familiar with Web Apollo: http://apollo.berkeleybop.org
• Example annotation of a gene in the Apis mellifera (honeybee) genome with Web Apollo
• Confirmation of proper annotation using BLAST and MAFFT
• Download the exercise sheet from: http://ekhidna.biocenter.helsinki.fi/downloads/teaching/spring2017/Exercises_day6.pdf
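For the confirmation step, one quick sanity check is the percent identity between the curated gene product and a homolog after alignment. The sketch below assumes the two sequences have already been aligned (MAFFT writes aligned FASTA with "-" for gaps by default) and are passed in as equal-length strings; the function name and the toy sequences are hypothetical and not taken from the exercise sheet.

```python
def pairwise_identity(aligned_a, aligned_b):
    """Percent identity over alignment columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    matches = 0
    compared = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue  # ignore columns with a gap in either sequence
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


# Hypothetical aligned protein fragments (same length, gaps as "-").
curated_model = "MKT-LLVA"
homolog = "MKTALLIA"

identity = pairwise_identity(curated_model, homolog)
print(f"{identity:.1f}% identity over ungapped columns")
```

Ignoring gapped columns is only one of several possible identity definitions; long gaps in the alignment are themselves worth inspecting, since they can indicate a missing or extra exon in the gene model.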