MetaPathways Installation v0 - Hallam Lab, hallam.microbiology.ubc.ca/MetaPathways/resources/...




MetaPathways v0.8 Installation

1. Downloading MetaPathways. Download the zip file MetaPathways_v0.8.zip appropriate for your operating system from http://hallam.microbiology.ubc.ca/MetaPathways/. MetaPathways should work on any Unix-based 64-bit operating system. After you have downloaded the file, unzip it and inspect the contents of the MetaPathways/ folder (Figure 1).

Figure 1 — An example of the MetaPathways/ folder from the MetaPathways_v0.8.zip file. Notice that the folder has a number of different files and folders inside it. The template configuration (template_config.txt) and parameter configuration (template_param.txt) files are used to configure and set parameter settings of each of the analytical steps of the pipeline. Additionally, the Python script, MetaPathways.py, is used to start the pipeline.

A Tour of the MetaPathways/ folder:

blastDB/ - place where BLAST databases are stored along with name-mapping and taxonomic support files for specific databases like KEGG and COG

daemon.py - a script that carries out external operations on supercomputing grids using the Sun Grid Engine, if available

executables/ — contains various analytical and data handling programs that process the inputs and outputs of different steps of the pipeline e.g. BLAST, Prodigal, trna-scan, etc.

libs/ — the code library folder contains different Perl and Python functions and code that coordinate different steps of the pipeline

MetaPathways.py — the starter script/program that runs the pipeline with specific configuration and parameter settings for each of the steps

MetaPathwaysrc — a Unix source file that ensures that the computer system knows where the MetaPathways/ folder is, sets the local Python and Perl paths, and compiles some executable code

template_config.txt — a configuration file that specifies the location of different programming resources on the computer, e.g. the location of BLAST databases, Perl, Python, etc.


template_header.txt — a template header for GenBank (.gbk) files

template_param.txt — a parameter file that specifies the analytical settings for all pipeline steps. e.g. BLAST cut-offs, steps to include in a run of the analysis, what order to annotate databases in, etc.

testdata/ — contains some simple .fasta files to do a dry-run to ensure that everything in the pipeline is working properly
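If you are comfortable in the Terminal, the tour above can double as a checklist. The following is a minimal sketch and is not part of MetaPathways. `check_metapathways_layout` is a hypothetical shell function whose file list simply mirrors the tour, so it may need adjusting for other releases. It defaults to ~/MetaPathways, which is the location this guide uses.

```shell
# Hypothetical helper (not part of MetaPathways): verify that an unzipped
# MetaPathways/ folder contains the entries described in the tour above.
# Prints each missing entry and returns non-zero if anything is missing.
check_metapathways_layout() {
    local dir="${1:-$HOME/MetaPathways}" missing=0 d f
    # Folders listed in the tour
    for d in blastDB executables libs testdata; do
        if [ ! -d "$dir/$d" ]; then
            echo "missing folder: $dir/$d"
            missing=$((missing + 1))
        fi
    done
    # Files listed in the tour
    for f in daemon.py MetaPathways.py MetaPathwaysrc template_config.txt \
             template_header.txt template_param.txt; do
        if [ ! -f "$dir/$f" ]; then
            echo "missing file: $dir/$f"
            missing=$((missing + 1))
        fi
    done
    [ "$missing" -eq 0 ]
}
```

For example, once the folder sits in your home directory:

$ check_metapathways_layout ~/MetaPathways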

For simplicity, we perform this installation in the user home folder, which is /Users/[username]/ by default on OSX. In Unix commands the tilde (~) character is shorthand for your home directory. On OSX systems, open a new Finder window using any of the first three methods below, then go to the home folder and move MetaPathways into it:

•Double-click the "Macintosh HD" icon on the Desktop

•Right-click (control-click) the "Finder" icon in the Dock and select "New Finder Window"

•Left-click the "Finder" icon and press (command + n)

•Click the home symbol in the left-hand panel of the new window. This should be named after your username.

•Drag and drop the newly extracted MetaPathways/ folder into the home directory. It should sit as ~/MetaPathways/ when accessing it through the terminal.

Note: MetaPathways requires the use of the unix terminal to issue commands. On OSX systems this is done through the Terminal program located in:

•Applications > Utilities > Terminal

•Double-click to open

•You may want to drag the Terminal program to your Dock for convenience

2. Installing programming languages Python, Perl, and GCC. Install Python 2.x, Perl 5.x, and a C compiler (GCC). For OSX users these are all contained within the current release of Xcode 4, which can be obtained for free at https://developer.apple.com/xcode/ or from the Apple App Store on modern releases of OSX.

Note: Apple Xcode installs a large number of programming languages and related packages, which makes it a large download (approx. 2 GB).

Alternatively, Python, Perl, and GCC installation files and documentation can be obtained from their respective websites:

Python 2.x - http://docs.python.org/using/unix.html

Perl 5.x - http://www.perl.org/get.html

GCC - http://gcc.gnu.org

These can also be obtained through a package management system such as Synaptic (http://www.nongnu.org/synaptic/).


Note: In an academic setting, this may require some discussion with your local system administrator. A restart of the computer might also be required after installing any new programming language. It is also a good idea to open the Terminal after installation and use the which command to check that these programs are on your system's $PATH variable.

# tests to see if perl, python, and gcc are included in your Unix $PATH variable

$ which perl

/usr/bin/perl

$ which python

/usr/bin/python

$ which gcc

/Developer/usr/bin/gcc
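The three `which` checks above can be rolled into one loop. The sketch below uses the POSIX `command -v` builtin instead of `which`, since both report where a program is found on `$PATH`. The function name `check_prereqs` is hypothetical. Note that it only confirms each program exists, not that it is the right version (Python 2.x, Perl 5.x).

```shell
# Hypothetical helper: report where each named program is found on $PATH.
# Returns non-zero if any of them is missing.
check_prereqs() {
    local status=0 tool
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "$tool: $(command -v "$tool")"
        else
            echo "$tool: NOT FOUND on \$PATH"
            status=1
        fi
    done
    return "$status"
}
```

Usage:

$ check_prereqs perl python gcc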

3. Pathway Tools. One of the final steps of the MetaPathways pipeline uses the software Pathway Tools to build a Pathway/Genome Database (PGDB) from your metagenomic sample. The Pathway Tools 16.0 software can be obtained directly from SRI International and requires an academic licence (http://biocyc.org/download.shtml). The licence is free for academic users, and approval usually takes 1-2 business days. Problems with licensing can be emailed to [email protected]. SRI International provides installation instructions for OSX and Unix, and the software is extensively documented at its homepage: http://bioinformatics.ai.sri.com/ptools/.

In short, you will obtain an install file such as pathway-tools-16.0-macosx-tier1-install.dmg. Mounting this disk image on the desktop opens a folder containing a file that starts an installation wizard (Figure 2).


Figure 2 — The Pathway Tools 16.0 install wizard for OSX. We recommend that installation defaults are followed, placing the pathway-tools/ and ptools-local/ directories in their default location of the user’s home folder.

For ease of instruction we encourage the use of the default installation locations of Pathway Tools directories in the standard home folder locations: ~/pathway-tools and ~/ptools-local.

~/pathway-tools — contains the actual Pathway Tools software

~/ptools-local — contains the PGDBs once they have been built

After installing Pathway Tools you can test to see if it runs by launching it from the Terminal:

# running pathway tools

$ cd ~

$ ./pathway-tools/pathway-tools

Note: Pathway Tools may fail to launch if X11 is not installed. X11 is included on your original OSX install disc. On recent versions of OSX (Mountain Lion, 10.8.x), X11 has been dropped in favor of XQuartz (http://xquartz.macosforge.org/landing/). Pathway Tools version 16.5 is compatible with this change.
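Before moving on, you can confirm from the Terminal that the default locations described above are in place. The full path to the Pathway Tools program is needed again later in template_config.txt. This is a minimal sketch that assumes the default ~/pathway-tools and ~/ptools-local locations. `check_ptools` is a hypothetical name.

```shell
# Hypothetical helper: confirm the default Pathway Tools layout.
# $1 defaults to the home folder; pass another folder if you changed the defaults.
check_ptools() {
    local base="${1:-$HOME}"
    local exe="$base/pathway-tools/pathway-tools"
    if [ ! -x "$exe" ]; then
        echo "not found or not executable: $exe" >&2
        return 1
    fi
    # ptools-local/ holds PGDBs once they are built; warn (but do not fail) if absent
    if [ ! -d "$base/ptools-local" ]; then
        echo "warning: $base/ptools-local not found" >&2
    fi
    # Print the full path of the Pathway Tools program (a likely PATHOLOGIC_EXECUTABLE value)
    echo "$exe"
}
```

Running `$ check_ptools` with no argument checks the home folder and prints the program's full path if it is found.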


4. BLAST Databases. The Basic Local Alignment Search Tool (BLAST) is used in a number of pipeline steps: specifically, the functional annotation of Open Reading Frames (ORFs) and the taxonomic identification of sequences through rRNA homology. To perform these steps locally you need a copy of the databases on your computer. We provide only the MetaCyc database (metacyc-v5-2011-10-21). It is the same as a file distributed with the Pathway Tools software (uniprot-seq-ids.seq), reformatted into the common .fasta format.

However, the choice of database often depends on the specific scientific question you are asking, and many databases are freely available for download from public FTP servers.

Note: These databases are large and grow in size every day. Downloads run into the gigabytes (GB), so a high-speed internet connection is required. Many of these databases are hosted on file transfer protocol (FTP) servers; we recommend Cyberduck (http://cyberduck.ch) as a free, simple, and user-friendly FTP client.

The following databases are compatible with MetaPathways:

Protein Databases

RefSeq — a major protein reference database maintained by the National Center for Biotechnology Information (NCBI) http://www.ncbi.nlm.nih.gov/RefSeq/

• connect to the BLAST database FTP server ftp://ftp.ncbi.nlm.nih.gov/blast/db

• download the set of files named refseq_protein.XX.tar.gz, where XX are numbers

• extract the .tar.gz archives (usually by simply double-clicking on them)

Note: RefSeq comes as a pre-compiled BLAST database, spread over a number of separate folders after extraction from the .tar.gz archives. You may find the following helpful to quickly transfer these to the MetaPathways/blastDB folder:

$ cd ~/Downloads/

$ mv refseq_protein.0*/* ~/MetaPathways/blastDB/

COG — Clusters of Orthologous Groups of proteins (COGs), also maintained by the NCBI http://www.ncbi.nlm.nih.gov/COG/

• connect to the COG FTP server ftp://ftp.ncbi.nih.gov/pub/COG/COG/

• download the file myva, which is a .fasta file containing the sequences

KEGG — The Kyoto Encyclopedia of Genes and Genomes http://www.genome.jp/kegg/, http://www.bioinformatics.jp/en/keggftp.html

Note: At the time of writing, KEGG requires a subscription fee.

Nucleotide Taxonomic Databases

Silva — a comprehensive ribosomal RNA database project http://www.arb-silva.de/download/

• navigate to Download > Archive > Current > Exports

• download the current SSU database (SSURef_111_NR_tax_silva.fasta.tgz) and the current LSU database (LSURef_111_tax_silva.fasta.tgz)

Greengenes — 16S rRNA gene database and workbench compatible with ARB http://greengenes.lbl.gov/cgi-bin/nph-index.cgi

•navigate the links: Download > Sequence Data > Fasta_data_files


•download current_GREENGENES_gg16S_unaligned.fasta.gz

Note: You need only download the databases in .fasta format and place them in the blastDB/ folder. MetaPathways is programmed to format them automatically on-the-fly.
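The downloads above arrive in several formats: .tar.gz and .tgz archives, a gzipped .fasta file, and a plain file. If you prefer the Terminal to double-clicking, a sketch like the following unpacks each one into blastDB/. The function name `stage_db` is hypothetical. It only uses the standard `tar`, `gunzip`, and `cp` commands. It assumes each archive unpacks to plain files rather than a subfolder, and it does no BLAST formatting, which is left to MetaPathways as the note above describes.

```shell
# Hypothetical helper: unpack one downloaded database file into blastDB/.
#   *.tar.gz / *.tgz  -> extracted with tar (e.g. refseq_protein.XX.tar.gz, Silva exports)
#   *.gz              -> decompressed with gunzip (e.g. the Greengenes .fasta.gz)
#   anything else     -> copied as-is (e.g. the COG myva file)
stage_db() {
    local src="$1" dest="${2:-$HOME/MetaPathways/blastDB}" name
    mkdir -p "$dest" || return 1
    case "$src" in
        *.tar.gz|*.tgz)
            tar -xzf "$src" -C "$dest" ;;
        *.gz)
            # Drop the .gz suffix for the output file name
            name=$(basename "$src" .gz)
            gunzip -c "$src" > "$dest/$name" ;;
        *)
            cp "$src" "$dest/" ;;
    esac
}
```

For example, to stage every RefSeq archive in your Downloads folder:

$ for f in ~/Downloads/refseq_protein.*.tar.gz; do stage_db "$f"; done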

5. Configuring the template_config.txt. The template_config.txt file configures the pipeline to find the resources it needs to run. Paths will have to be set for the PERL_EXECUTABLE, PYTHON_EXECUTABLE, PATHOLOGIC_EXECUTABLE, and METAPATHWAYS_PATH.

• direct the Terminal to the MetaPathways folder and source the MetaPathwaysrc file. This compiles the Perl and Python code and locates Perl, Python, and the MetaPathways directory for the config file:

$ cd ~/MetaPathways

$ source MetaPathwaysrc

Checking for Python and Perl:

Python found in /usr/bin/python

Please set variable PYTHON_EXECUTABLE in file template_config.txt as:

PYTHON_EXECUTABLE /usr/bin/python

Perl found in /usr/bin/perl

Please set variable PERL_EXECUTABLE in file template_config.txt as:

PERL_EXECUTABLE /usr/bin/perl

Adding installation folder of MetaPathways to PYTHONPATH

Your MetaPathways is installed in :

Please set variable METAPATHWAYS_PATH in file template_config.txt as:

METAPATHWAYS_PATH /Users/username/MetaPathways
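The suggested `VARIABLE value` lines are easy to lose among the rest of this output. A minimal sketch for pulling them out is shown below. `config_suggestions` is a hypothetical name, and the sketch assumes the suggestion lines begin with the variable name, as in the sample output above. Redirecting the output of `source` to a file keeps the command running in the current shell, so its path settings still take effect.

```shell
# Hypothetical helper: keep only the "VARIABLE value" suggestion lines
# from saved MetaPathwaysrc output (read on standard input).
config_suggestions() {
    grep -E '^[[:space:]]*(PYTHON_EXECUTABLE|PERL_EXECUTABLE|METAPATHWAYS_PATH)[[:space:]]'
}
```

Usage (mp_setup.log is an arbitrary file name):

$ source MetaPathwaysrc > mp_setup.log

$ config_suggestions < mp_setup.log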

• follow the on-screen instructions and update the PERL_EXECUTABLE, PYTHON_EXECUTABLE, PATHOLOGIC_EXECUTABLE, and METAPATHWAYS_PATH variables in template_config.txt (Figure 3):


Figure 3 — An example of the template_config.txt file containing all the configuration settings that tell MetaPathways where its resources reside on a specific computer. Run source MetaPathwaysrc in the MetaPathways directory to find the correct entries for PYTHON_EXECUTABLE, PERL_EXECUTABLE, and METAPATHWAYS_PATH. Type in the full path of the location where you installed Pathway Tools under the PATHOLOGIC_EXECUTABLE variable.
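Editing template_config.txt in any text editor works. For a scripted setup, the sketch below rewrites a single variable in place. It assumes, based on the MetaPathwaysrc output above, that each setting is one line of the form `VARIABLE value`. The function name `set_config_var` is hypothetical. It uses `awk` and a temporary file rather than `sed -i`, whose syntax differs between OSX and Linux.

```shell
# Hypothetical helper: set VARIABLE to VALUE in a config file whose lines look
# like "VARIABLE value". An existing line for VARIABLE is replaced; if there is
# none, a new line is appended. (awk -v interprets backslashes in VALUE.)
set_config_var() {
    local file="$1" key="$2" value="$3" tmp status
    tmp=$(mktemp) || return 1
    awk -v k="$key" -v v="$value" '
        $1 == k { print k " " v; found = 1; next }
        { print }
        END { if (!found) print k " " v }
    ' "$file" > "$tmp" && cat "$tmp" > "$file"   # cat keeps the file's permissions
    status=$?
    rm -f "$tmp"
    return "$status"
}
```

Usage, with the values reported by MetaPathwaysrc:

$ set_config_var template_config.txt PYTHON_EXECUTABLE /usr/bin/python

$ set_config_var template_config.txt PERL_EXECUTABLE /usr/bin/perl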

6. Setting up the template_param.txt. The template_param.txt file needs to be updated with the exact names of your protein and nucleotide databases in the blastDB folder (Figure 4).


Figure 4 — The template_param.txt file. The exact names of the BLAST databases need to be listed in the highlighted lines, replacing the generic names metacyc, kegg, cog, etc. These must exactly match the names of the files in the blastDB/ folder. For instance, metacyc needs to be replaced with metacyc-v5-2011-10-21, which should have been included in the blastDB/ directory of the original MetaPathways download.
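A misspelled database name is easy to miss, so it can be worth checking each name against the contents of blastDB/ before a run. This is a sketch that assumes you pass the names by hand exactly as written in template_param.txt. `check_db_names` is a hypothetical helper and does not read template_param.txt itself. A name is accepted if blastDB/ holds a file with exactly that name (such as a .fasta file) or files that start with that name followed by a dot (such as pre-compiled BLAST files).

```shell
# Hypothetical helper: check that each database name given on the command line
# matches something in the blastDB/ folder (first argument).
check_db_names() {
    local dir="$1" name status=0
    shift
    for name in "$@"; do
        # exact file, or files like NAME.phr / NAME.00.pin from a pre-built database
        if [ -e "$dir/$name" ] || ls "$dir/$name".* >/dev/null 2>&1; then
            echo "found:   $name"
        else
            echo "MISSING: $name"
            status=1
        fi
    done
    return "$status"
}
```

Usage:

$ check_db_names ~/MetaPathways/blastDB metacyc-v5-2011-10-21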

7. Connecting with the Grid (optional). MetaPathways can offload computationally heavy tasks, such as protein BLAST searches, to supercomputing facilities that use the Sun Grid Engine. This step is optional but highly recommended. However, it requires ssh access and sufficient user permissions to set up password-less (key-based) ssh login on a supercomputing server. This might be a good time to ask your local system administrator whether this kind of setup is permissible.

• test to see if you can connect to your account via ssh:

$ ssh [email protected]


You should be asked for your password.

• check that there is a .ssh/ folder in your home directory on the server:

$ ls ~/.ssh/

authorized_keys known_hosts

• if not, create it:

$ mkdir ~/.ssh/

• press control + d to return to your local computer

• navigate to the ~/.ssh/ directory:

$ cd ~/.ssh/

• run ssh-keygen to create an RSA public and private key pair:

$ ssh-keygen -t rsa

Generating public/private rsa key pair.

Enter file in which to save the key (/Users/username/.ssh/id_rsa):

Enter passphrase (empty for no passphrase):

Enter same passphrase again:

Your identification has been saved in id_rsa.

Your public key has been saved in id_rsa.pub.

• copy your public key to the .ssh/ folder on the grid server with scp:

$ scp id_rsa.pub [email protected]:~/.ssh/

• log back in to your external server account using ssh:

$ ssh [email protected]

• navigate to the ~/.ssh/ directory again:

$ cd ~/.ssh

• append the public key to a file called authorized_keys:

$ cat id_rsa.pub >> authorized_keys

• change the permissions of the authorized_keys file and the .ssh/ directory so that only your username can read and write them:

$ chmod 600 ~/.ssh/authorized_keys

$ chmod 700 ~/.ssh/

• log out and return to your local computer by pressing control + d

• try to log in again using ssh; this time you should not need to type in your password:

$ ssh [email protected]

If the above procedure did not work, you likely have a more complicated setup on your hands. At this point it would be good to ask a local system administrator to help you set up keyless login. If this is not possible, a useful search term is "ssh keyless login".
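Two quick checks can help when keyless login misbehaves. The sketch below uses hypothetical helper names, `check_ssh_perms` and `test_keyless`. The first, run on the server, confirms that ~/.ssh/ and authorized_keys have the 700 and 600 permissions set above. It uses the POSIX `find -perm` test, which matches only an exact mode. The second, run locally, asks `ssh` to run a trivial command with `-o BatchMode=yes`. With that option, ssh fails instead of prompting for a password, so the test succeeds only if key-based login works.

```shell
# Hypothetical helper (run on the server): verify the permissions set above.
check_ssh_perms() {
    local dir="${1:-$HOME/.ssh}" ok=0
    # "find PATH -prune -perm 700" prints PATH only if its mode is exactly 700
    if [ -z "$(find "$dir" -prune -perm 700 2>/dev/null)" ]; then
        echo "expected mode 700 on $dir"
        ok=1
    fi
    if [ -z "$(find "$dir/authorized_keys" -prune -perm 600 2>/dev/null)" ]; then
        echo "expected mode 600 on $dir/authorized_keys"
        ok=1
    fi
    return "$ok"
}

# Hypothetical helper (run locally): succeeds only if login needs no password.
test_keyless() {
    ssh -o BatchMode=yes -o ConnectTimeout=10 "$1" true
}
```

For example, with username@your.grid.server standing in for your own account:

$ test_keyless username@your.grid.server && echo "keyless login works"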

Congratulations! You have completed an involved MetaPathways setup, and with some luck the MetaPathways pipeline is now ready for action. Proceed to the Examples and Use Cases page for some ideas on trying it out.