DIYA: An annotation pipeline for any genomics lab
Do It Yourself Annotator
An annotation pipeline for every genomics lab
Andrew Stewart*, Timothy Read
●Genomics Department, Biological Defense Research
Directorate, Naval Medical Research Center,
Rockville, Maryland, United States
●Distribution and source code are available at
https://sourceforge.net/projects/diyg/
●Contact: [email protected]
●DIYA is an open source pipeline for the rapid annotation of genomic sequences.
It takes DNA contigs as input, either complete genomes or the output of
shotgun sequencing of a genome library, and produces a fully annotated
sequence as output.
●The DIYA pipeline is modular and easily extended with further forms of
feature finding. Each module follows a similar structure, reading and writing
a standard format that acts as the conduit between pipeline stages. The
system demonstrates the usefulness of BioPerl (http://bioperl.org) as a
format conversion utility and parser. Sun Grid Engine (SGE) support allows
multiple sequences to be run in parallel.
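The modular design described above can be illustrated with a minimal sketch. It is written in Python for brevity and is not DIYA's own Perl/BioPerl code; the `Record` and `Feature` types, the `run_pipeline` function, and the toy stage are all hypothetical names chosen for this example. The point is only that every stage accepts and returns the same record type, so stages can be added or reordered freely.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Feature:
    """One annotated region on a sequence (hypothetical record type)."""
    kind: str    # e.g. "CDS" or "tRNA"
    start: int   # 1-based, inclusive
    end: int     # 1-based, inclusive
    source: str  # name of the stage that produced the feature


@dataclass
class Record:
    """The shared format handed from one stage to the next."""
    seq_id: str
    sequence: str
    features: List[Feature] = field(default_factory=list)


# A stage is any function that takes a Record and returns a Record.
Stage = Callable[[Record], Record]


def run_pipeline(record: Record, stages: List[Stage]) -> Record:
    """Apply each stage in order, passing the record along."""
    for stage in stages:
        record = stage(record)
    return record


def toy_start_codon_finder(record: Record) -> Record:
    """Placeholder stage: mark every ATG so the example has visible output."""
    seq = record.sequence.upper()
    pos = seq.find("ATG")
    while pos != -1:
        record.features.append(
            Feature("start_codon", pos + 1, pos + 3, "toy_start_codon_finder")
        )
        pos = seq.find("ATG", pos + 1)
    return record


if __name__ == "__main__":
    result = run_pipeline(Record("contig1", "ccATGaaaATGtt"), [toy_start_codon_finder])
    for feature in result.features:
        print(feature.kind, feature.start, feature.end, feature.source)
```

A new feature finder would be one more function with the same signature, appended to the list passed to `run_pipeline`.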
Background
●“A sequencing center in every genomics lab”
●Thus, an annotation pipeline in every genomics lab
●Decentralization of sequencing technology creates a
need for sequence analysis tools
Background
●Explosion of tools in the bioinformatics community
●Inconsistent formats, need for ‘pipelining’, BioPerl
Background: BDRD
●454 Life Sciences GS FLX sequencers
●Push data off onto servers
  o Assembly
  o Annotation
  o Analysis
Outline of the pipeline
●diya-assemble-pseudocontig
●diya-glimmer
●diya-blast
●diya-rfam_scan
●diya-tRNAscan
●Auxiliary scripts
Installation requirements
●Software
  o Perl v5+, SGE, MUMmer, Glimmer, BLAST, tRNAscan-SE, Infernal, rfamscan.pl
●Databases
  o Protein Clusters, Rfam
●Perl libraries
  o BioPerl, Getopt::Long, Data::Dumper, XML::Simple, etc.
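Before a first run it can help to confirm that the external programs are on the `PATH`. The sketch below is an illustration in Python, not part of DIYA. The executable names are assumptions based on each package's usual command name (`nucmer` for MUMmer, `glimmer3` for Glimmer, `blastall` for legacy BLAST, `cmsearch` for Infernal, `qsub` for SGE) and may differ on a given installation.

```python
import shutil
from typing import Callable, Dict, Iterable, List, Optional

# Assumed executable names; adjust to match the local installation.
REQUIRED_TOOLS = (
    "perl", "qsub", "nucmer", "glimmer3", "blastall", "tRNAscan-SE", "cmsearch",
)


def find_tools(
    names: Iterable[str],
    which: Callable[[str], Optional[str]] = shutil.which,
) -> Dict[str, Optional[str]]:
    """Map each tool name to its resolved path, or None if it is not found."""
    return {name: which(name) for name in names}


def missing_tools(
    names: Iterable[str],
    which: Callable[[str], Optional[str]] = shutil.which,
) -> List[str]:
    """Return the tools that could not be located, in the order given."""
    return [name for name, path in find_tools(names, which).items() if path is None]


if __name__ == "__main__":
    missing = missing_tools(REQUIRED_TOOLS)
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("All required executables were found.")
```

The `which` parameter exists only so the lookup can be replaced in tests; in normal use the default `shutil.which` is sufficient.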
Pipeline: diya.pl
●Controller script for the pipeline
●Manages configuration and project data table
generation
●Fires off jobs to SGE
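As a rough illustration of what a controller of this kind does, the sketch below turns a small project table into SGE `qsub` command lines, one chain of jobs per sequence. It is Python rather than DIYA's Perl and does not reproduce `diya.pl`. The stage names come from the pipeline outline, but the assumption that each stage script takes a FASTA path as its only argument is hypothetical, as are the table columns `id` and `fasta`. The `qsub` options used (`-N`, `-cwd`, `-b y`, `-o`, `-e`, `-hold_jid`) are standard SGE options.

```python
import shlex
from typing import Dict, List, Sequence


def qsub_command(
    job_name: str,
    command: Sequence[str],
    log_dir: str = "logs",
    hold_for: Sequence[str] = (),
) -> List[str]:
    """Build (but do not run) a qsub argument list for one job."""
    argv = [
        "qsub",
        "-N", job_name,                      # job name, reused for dependencies
        "-cwd",                              # run in the current working directory
        "-b", "y",                           # the command is an executable, not a job script
        "-o", f"{log_dir}/{job_name}.out",   # stdout log
        "-e", f"{log_dir}/{job_name}.err",   # stderr log
    ]
    if hold_for:
        # Wait until the named jobs have finished before starting this one.
        argv += ["-hold_jid", ",".join(hold_for)]
    return argv + list(command)


def jobs_for_project(
    project: Sequence[Dict[str, str]], stages: Sequence[str]
) -> List[List[str]]:
    """One job per (sequence, stage); each stage waits for the previous one."""
    jobs = []
    for row in project:
        previous = None
        for stage in stages:
            name = f"{stage}_{row['id']}"
            hold = [previous] if previous else []
            # Hypothetical calling convention: stage script followed by a FASTA path.
            jobs.append(qsub_command(name, [stage, row["fasta"]], hold_for=hold))
            previous = name
    return jobs


if __name__ == "__main__":
    table = [
        {"id": "sampleA", "fasta": "sampleA.fasta"},
        {"id": "sampleB", "fasta": "sampleB.fasta"},
    ]
    for argv in jobs_for_project(table, ["diya-glimmer", "diya-blast"]):
        print(shlex.join(argv))
```

Because jobs for different sequences have no dependencies on each other, the scheduler is free to run them in parallel, while stages for the same sequence run in order.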
Pipeline: Assembly
●Generate a ‘pseudocontig’
●MUMmer v3.20 (http://mummer.sourceforge.net/)
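The idea of a pseudocontig can be sketched as follows: contigs are joined into one sequence with a spacer between them, and a layout table records where each contig starts so that coordinates can later be mapped back. This Python sketch assumes the contigs have already been ordered (the slide lists MUMmer for this stage, but how DIYA uses it is not shown here). The spacer of 20 `N` characters and the function names are assumptions made for illustration.

```python
from bisect import bisect_right
from typing import List, Optional, Sequence, Tuple

SPACER = "N" * 20  # assumed linker; the spacer DIYA actually uses may differ

# (contig_id, 0-based offset in the pseudocontig, contig length)
Layout = List[Tuple[str, int, int]]


def build_pseudocontig(
    contigs: Sequence[Tuple[str, str]], spacer: str = SPACER
) -> Tuple[str, Layout]:
    """Join (id, sequence) pairs in the given order and record their offsets."""
    parts: List[str] = []
    layout: Layout = []
    offset = 0
    for index, (name, seq) in enumerate(contigs):
        if index > 0:
            parts.append(spacer)
            offset += len(spacer)
        layout.append((name, offset, len(seq)))
        parts.append(seq)
        offset += len(seq)
    return "".join(parts), layout


def locate(position: int, layout: Layout) -> Optional[Tuple[str, int]]:
    """Map a 0-based pseudocontig position back to (contig_id, 0-based position).

    Returns None when the position falls in a spacer or outside the sequence.
    """
    starts = [offset for _, offset, _ in layout]
    i = bisect_right(starts, position) - 1
    if i < 0:
        return None
    name, offset, length = layout[i]
    local = position - offset
    return (name, local) if local < length else None


if __name__ == "__main__":
    pseudo, layout = build_pseudocontig([("c1", "ACGT"), ("c2", "GG")], spacer="NN")
    print(pseudo)
    print(layout)
    print(locate(6, layout))
```

The same layout table is what a later disassembly step would need in order to assign each annotated feature back to its original contig.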
Pipeline: Glimmer
●Prediction of gene coding regions
●Glimmer v3.02 (http://www.cbcb.umd.edu/software/glimmer/)
  o g3-iterated.csh: two rounds of iteration
●Uses interpolated Markov models to distinguish
between coding and non-coding regions
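To give a feel for how a Markov model can separate coding from non-coding sequence, here is a deliberately simplified toy in Python. It is not Glimmer's algorithm: Glimmer's interpolated Markov models are frame-aware and choose their interpolation weights statistically, whereas this sketch blends context lengths 0 to `max_order` using a simple count-based weight. The class name `ToyIMM`, the `pseudo_weight` parameter, and the tiny training strings are all made up for the example.

```python
import math
from collections import Counter
from typing import Iterable


class ToyIMM:
    """Toy interpolated Markov model over DNA (illustration only)."""

    def __init__(self, max_order: int = 2, pseudo_weight: float = 4.0):
        self.max_order = max_order
        self.pseudo_weight = pseudo_weight  # larger values trust short contexts more
        self.counts = Counter()             # occurrences of context + next base
        self.context_totals = Counter()     # times a context was followed by any base

    def train(self, sequences: Iterable[str]) -> None:
        for seq in sequences:
            seq = seq.upper()
            for i in range(len(seq)):
                for k in range(self.max_order + 1):
                    if i - k < 0:
                        break
                    context = seq[i - k:i]
                    self.counts[context + seq[i]] += 1
                    self.context_totals[context] += 1

    def prob(self, base: str, context: str) -> float:
        """Blend estimates from the shortest context up to the longest one seen."""
        p = 0.25  # uniform fallback over A, C, G, T
        for k in range(min(len(context), self.max_order) + 1):
            ctx = context[len(context) - k:]
            n = self.context_totals[ctx]
            if n == 0:
                break  # context never seen: keep the shorter-context estimate
            weight = n / (n + self.pseudo_weight)
            p = weight * self.counts[ctx + base] / n + (1 - weight) * p
        return p

    def log_likelihood(self, seq: str) -> float:
        seq = seq.upper()
        total = 0.0
        for i, base in enumerate(seq):
            context = seq[max(0, i - self.max_order):i]
            total += math.log(self.prob(base, context))
        return total


def coding_log_odds(seq: str, coding: ToyIMM, noncoding: ToyIMM) -> float:
    """Positive values mean the sequence looks more like the coding training set."""
    return coding.log_likelihood(seq) - noncoding.log_likelihood(seq)


if __name__ == "__main__":
    coding, noncoding = ToyIMM(), ToyIMM()
    coding.train(["GCT" * 8])     # made-up stand-in for known genes
    noncoding.train(["AT" * 12])  # made-up stand-in for intergenic sequence
    print(coding_log_odds("GCTGCTGCT", coding, noncoding))
    print(coding_log_odds("ATATATAT", coding, noncoding))
```

A real gene finder would train on long open reading frames from the genome itself and score each reading frame separately; the sketch only shows the interpolation and log-odds comparison.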
Pipeline: BLAST
●BLAST v2.2.16 (ftp://ftp.ncbi.nih.gov/blast/)
●Two rounds of BLAST, against:
  o Reference genome
  o Protein Clusters database
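One way to work with the results of the two searches is to parse tabular BLAST output and keep the best hit per query. The Python sketch below assumes the 12-column tab-separated format produced by legacy `blastall -m 8` (query, subject, percent identity, alignment length, mismatches, gap openings, query start and end, subject start and end, e-value, bit score). The e-value cutoff and the idea of sending only genes without a reference hit to the second database are assumptions for illustration; the slide does not say how DIYA combines the two rounds.

```python
from typing import Dict, Iterable, Iterator, List, NamedTuple


class Hit(NamedTuple):
    query: str
    subject: str
    identity: float
    evalue: float
    bitscore: float


def parse_tabular(lines: Iterable[str]) -> Iterator[Hit]:
    """Yield hits from 12-column tabular BLAST output, skipping comments."""
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) < 12:
            continue  # not a complete hit line
        yield Hit(fields[0], fields[1], float(fields[2]),
                  float(fields[10]), float(fields[11]))


def best_hits(hits: Iterable[Hit], max_evalue: float = 1e-5) -> Dict[str, Hit]:
    """Keep the highest-scoring hit per query among those passing the cutoff."""
    best: Dict[str, Hit] = {}
    for hit in hits:
        if hit.evalue > max_evalue:
            continue
        current = best.get(hit.query)
        if current is None or hit.bitscore > current.bitscore:
            best[hit.query] = hit
    return best


def unassigned(all_queries: Iterable[str], best: Dict[str, Hit]) -> List[str]:
    """Queries with no accepted hit, e.g. candidates for a second search."""
    return [query for query in all_queries if query not in best]


if __name__ == "__main__":
    example = [
        "gene1\tref_0001\t98.50\t200\t3\t0\t1\t200\t1\t200\t1e-100\t380",
        "gene1\tref_0002\t80.00\t180\t36\t0\t1\t180\t5\t184\t1e-20\t90.5",
        "gene2\tref_0003\t30.00\t50\t35\t0\t1\t50\t1\t50\t0.5\t20.1",
    ]
    best = best_hits(parse_tabular(example))
    print(best["gene1"].subject)
    print(unassigned(["gene1", "gene2", "gene3"], best))
```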
Pipeline: rfam_scan
●Identification of ncRNA (rRNA, tRNA)
●Infernal v0.81 (http://infernal.janelia.org/)
●Rfam (http://www.sanger.ac.uk/Software/Rfam/)
●rfamscan.pl v0.1 (http://www.sanger.ac.uk/Users/sgj/code/)
Pipeline: tRNAscan-SE
●Identification of tRNA
●tRNAscan-SE v1.23 (http://lowelab.ucsc.edu/tRNAscan-SE/)
Pipeline: Auxiliary scripts
●Locus tag reordering (cleanup)
●Protein extraction (i.e., PIPA input)
●Pseudocontig disassembly
●Hooks
  o Load databases
  o Report genome statistics
  o WikiLIMS integration
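Locus tag reordering is the kind of cleanup that becomes necessary once several feature finders have added features independently. A minimal Python sketch of the idea follows: sort features by position and assign sequential tags. The feature dictionaries, the `DIYA` prefix, and the zero-padded width are hypothetical choices for this example, not the format used by the actual script.

```python
from typing import Dict, List, Sequence, Tuple


def renumber_locus_tags(
    features: Sequence[Dict[str, object]],
    prefix: str = "DIYA",
    width: int = 5,
) -> Tuple[List[Dict[str, object]], Dict[str, str]]:
    """Return position-sorted copies of the features with sequential locus tags.

    Also returns a mapping from each old tag to its new tag so that
    cross-references elsewhere can be updated. The input is not modified.
    """
    ordered = sorted(features, key=lambda f: (f["start"], f["end"]))
    renamed: List[Dict[str, object]] = []
    mapping: Dict[str, str] = {}
    for number, feature in enumerate(ordered, start=1):
        new_tag = f"{prefix}_{number:0{width}d}"
        mapping[str(feature["locus_tag"])] = new_tag
        renamed.append({**feature, "locus_tag": new_tag})
    return renamed, mapping


if __name__ == "__main__":
    feats = [
        {"locus_tag": "tmp_3", "start": 500, "end": 900},
        {"locus_tag": "tmp_1", "start": 10, "end": 300},
        {"locus_tag": "tmp_2", "start": 350, "end": 420},
    ]
    renamed, mapping = renumber_locus_tags(feats)
    for feature in renamed:
        print(feature["locus_tag"], feature["start"], feature["end"])
```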
Modularity
●Adding extra modules is rather simple
●Things to come...
  o CRISPR elements
  o Pseudogenes
  o Prophages
Do It Yourself Genomics
●A project community and collection of bioinformatics
tools and applications for the analysis of genomic
sequence data, intended to put these tools into the
hands of small- to medium-scale sequencing labs.
DIYG on disk
●OS (Linux) distribution with DIYG pre-installed
●Simplifies installation, compilation, and
‘prerequisite gathering’
●Run analysis directly on sequencer workstation?
●Easy deployment across a high performance
computing cluster
DIYG: Virtual Machine
●Virtualization creates a complete, self-contained
deployment of an operating system
●“Disposable” analysis machine
DIYG: Cloud Computing
●Ideal for labs without direct access to an HPC cluster
●Truly an annotation pipeline in every genomics lab
Deployment at BHSAI
●Make sequence annotation available to wider DOD
community
●Concerns about the ‘Perl’ nature of DIYA
●Need to determine HPC guidelines
●Possible integration / hook into PIPA
Deployment at BHSAI
●Conventional installation (integration into existing
systems, à la PIPA)
●Sourced from disk image
●Virtualization servers? (if available)