Lane Medical Library & Knowledge Management Center
http://lane.stanford.edu
Nimble Perl Programming Using Scriptome
Yannick Pouliot, PhD
Bioresearch Informationist
Lane Medical Library & Knowledge Management Center
1/22/2009
Objectives
Determining whether Scriptome can …
1. Enable you to perform operations otherwise difficult/time-consuming/error-prone?
2. Help you learn Perl?
And don’t worry: This experiment won’t hurt a bit!
Also, we’ll be using anonymous polling to determine whether you’re happy with the material and speed of delivery …
So What Is Scriptome?
Scriptome is a resident Perl program that performs various data manipulation tasks useful to biologists
Originally developed by Harvard’s FAS Center for Systems Biology
Maintained and extended by many more volunteers not associated with Harvard
Why Bother With Scriptome?
Code is visible, so you can learn how to do things in Perl … or not
Can handle arbitrarily large files
  No size limitations, unlike e.g. Excel
Free; runs on everything: PC, Mac, Linux
It’s programmatic!
  Much faster than manual operations
  You can string operations together and save these in e.g. a .bat file
How Do You Use Scriptome?
You tell Scriptome which function you want it to perform (more later)
You can also string Scriptome functions into a protocol
Input: Scriptome operates on text files
  No binary files, but you could add that capability yourself
  E.g., process Excel files in native form using Perl modules such as Spreadsheet::ParseExcel
Output: printed on the command line or written into another file
Scriptome: Pick Your Flavor
http://sysbio.harvard.edu/csb/resources/computational/scriptome/
http://lane.stanford.edu/howto/index.html?id=_1257
Installing Scriptome - Windows
1. Download Scriptome_exe.tar.gz using this link:
http://sysbio.harvard.edu/csb/resources/computational/scriptome/bin/Scriptome_exe.tar.gz
→ Final location: I suggest C:/Program Files/Scriptome
2. Create a directory named “Scriptome”
3. Decompress Scriptome_exe.tar.gz by double-clicking
→ Notice the four files inside
4. Update the PATH variable: add this string at the END of the contents of the PATH variable:
;C:\Program Files\Scriptome\Scriptome;C:\Program Files\Scriptome\ScriptPack;C:\Program Files\Scriptome\Scriptome.bat;C:\Program Files\Scriptome\ScriptPack.bat
Scriptome Usage
1. Using a specific tool:
Scriptome flags toolname [input_filenames] [> output_filename]
Example: Scriptome -t change_fasta_to_tab LONGhmcad.fst
2. Finding a tool by type:
Scriptome -t tooltype
where tooltype is one of: Calc, Choose, Sort, Fetch, Merge, Change
Example: Scriptome -t Calc
Let’s examine each area briefly before going over specifics…
Polling Time: How’s the speed?
1. Too fast
2. Too slow
3. More or less OK
4. I feel nauseous
Examples and noteworthy tools
Calc Tool Examples - 1
Compute column sums: Scriptome -t calc_col_sum SubjectData1.tab
→ select columns to add
IMPORTANT: column numbers start at 0, not 1
Note the visible Perl code → easy to modify and expand:
perl -e "$col=1; while(<>) { s/\r?\n//; @F=split /\t/, $_; $sum += $F[$col];}warn qq~\nSum of column $col for $. lines\n\n~;print qq~$sum\n~" file.tab
Calc Tool Examples - 2
Compute row sums: Scriptome -t calc_row_sum SubjectData1.tab
→ enter 1 for column 1, 2 for column 2, etc.
perl -e "@cols=(1, 2, 3); while(<>) { s/\r?\n//; @F=split /\t/, $_; $sum = 0; foreach $col (@cols) { $sum += $F[$col] }; print qq~$_\t$sum\n~;}warn qq~\nSum of columns @cols for each line ($. lines)\n\n~" in.tab
Change Tool Examples - 1
Create tab-delimited file from FASTA file:
Scriptome -t change_fasta_to_tab LONGhmcad.fst > LONGhmcad.fst.tab
→ change_fasta_to_tab is an important tool because many Scriptome tools use tab-delimited files
perl -e "$count=0;$len=0;while(<>) { s/\r?\n//; s/\t/ /g; if (s/^>//) { if ($. != 1) { print qq~\n~} s/ |$/\t/;$count++;$_ .= qq~\t~; } else { s/ //g;$len += length($_) } print $_;}print qq~\n~;warn qq~\nConverted $count FASTA records in $. lines to tabular format\nTotal sequence length: $len\n\n~;" seqs.fna
Change Tool Examples - 2
Change rows to columns or vice versa:
Scriptome -t change_transpose_table SubjectData1.tab
Note: change_transpose_table operates on tab-delimited files
Change Tool Examples - 3
Convert a sequence file from one format to another:
Scriptome -t change_bio_format_to_bio_format LONGhmcad.fst
→ enter ‘fasta’ as input format (no quotes)
→ enter ‘genbank’ as output format (no quotes)
change_bio_format_to_bio_format addresses the common problem of converting formats
Important: requires Bioperl to be installed
perl -MBio::SeqIO -e "$informat= qq~genbank~;$outformat= qq~fasta~; $count = 0;for $infile (@ARGV) { $in = Bio::SeqIO->newFh(-file => $infile , -format => $informat); $out = Bio::SeqIO->newFh(-format => $outformat); while (<$in>) { print $out $_;$count++; }}warn qq~Translated $count sequences from $informat to $outformat format\n~" myseqs.genbank > myseqs.fasta
* Notice anything interesting? *
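Since this tool depends on BioPerl, it can help to check for the module before running it. The snippet below is one way to do that from a Unix shell: it asks Perl to load Bio::SeqIO and records whether that worked. The variable name is arbitrary.

```shell
# perl exits with a non-zero status if Bio::SeqIO cannot be loaded.
if perl -MBio::SeqIO -e 1 2>/dev/null; then
  bioperl_status="installed"
else
  bioperl_status="missing"
fi
echo "BioPerl (Bio::SeqIO) is $bioperl_status"
```

If the module is reported as missing, BioPerl has to be installed (for example from CPAN) before change_bio_format_to_bio_format will run.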
Conclusions
Scriptome is …
A good solution for manipulating medium to large data files quickly and reliably
A way to learn Perl in a “real” context (no toy problems)
Able to perform a wide range of tasks, from simple, generic file manipulations to bio-specific complex tasks
Resources
For Perl help, see the resources in the workshop description for Lane’s Perl Programming for Biologists
Some recommended titles:
Polling Time: Do you think Scriptome will be useful to your research?
1. Definitely
2. Likely
3. Not likely
4. No way
5. What’s the question again?