Lane Medical Library & Knowledge Management Center
http://lane.stanford.edu
Perl Programming for Biologists, Second Edition
Part 1: 9/11/2007
Yannick Pouliot, PhD
Bioresearch Informationist
Lane Medical Library & Knowledge Management Center
Class Requirements
You must:
- have wireless access
- have the admin password to your machine
To Do
Please download all class materials from http://lane.stanford.edu/howto/index.html?id=_2796 into C:\course
Class Focus
1. Creating, writing and reading Excel files
2. Reformatting data files for input to an analysis program
3. Writing to and reading from a database such as MS Access or another locally installed relational database, as well as from databases available on the Internet

And remember: ask LOTS OF QUESTIONS
Cautions
- All examples pertain to MS Office 2003
  - Examples still work in MS Office 2007
  - However, the Perl modules used here do not work with MS Office 2007-formatted documents
- All examples pertain to Perl 5.x, not 6.x
  - Versions 5 and 6 are NOT compatible
  - Version 5 is far more common, so this is not much of an issue
So Why Perl?
- Perl = Practical Extraction and Report Language
- Free
- Very widely used
  - Especially in the biological community
- Very flexible and portable
- Not the only language of this type
  - E.g., Python
- Not the absolute easiest
  - … but pretty easy
- Not suited for everything
  - E.g., for ultra-fast, mathematically oriented code, C is still best
Today’s session:
- Installing and understanding what is required to run Perl
- Understanding the basics of a Perl program
Part 1: Installation
Components to Install & Configure
1. Perl itself
   - More accurately, the Perl interpreter
   - We’ll use ActiveState Perl 5.8.x (ActivePerl)
   - www.activestate.com/store/freedownload.aspx?prdGuid=81fbce82-6bd5-49bc-a915-08d58c2648ca
2. Additional Perl modules
   - Module = extra functions not part of the interpreter
   - Described at the Comprehensive Perl Archive Network (CPAN)
3. Open Perl IDE
   - IDE = integrated development environment:
     - Editor to write/edit your program
     - Debugger to find bugs
     - A compiler/interpreter to run your program from within the IDE
   - sourceforge.net/project/showfiles.php?group_id=23334&release_id=91440
4. Configuring the ODBC manager (next week)
   - Part of Windows
   - Allows different programs to interact with databases on your machine or anywhere on the Web via a single “doorway”
What is an Interpreter?
- = A program that translates an instruction into the computer’s language and executes it before proceeding to the next instruction
  - = compiled and executed one instruction at a time
- Perl is usually used in interpreted mode
  - Can also be compiled once (= faster)
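
A small sketch may help make the idea concrete. Strictly speaking, perl first compiles a whole script into an internal form and only then runs it, so "one instruction at a time" is a simplification. The example below imitates this with a string `eval`: a snippet with a syntax error in its second statement never runs its first statement. The helper name `run_snippet` and the `@log` array are made up for this illustration.

```perl
use strict;
use warnings;

# Messages are collected here so we can count how many statements ran.
my @log;

# Hypothetical helper: compile and run a snippet of Perl held in a string.
# Returns (compiled?, number of statements that logged something).
sub run_snippet {
    my ($code) = @_;
    @log = ();
    my $result   = eval $code;            # the whole string is compiled before any of it runs
    my $compiled = defined $result ? 1 : 0;
    return ($compiled, scalar @log);
}

# The second statement is deliberately malformed.
my ($c1, $n1) = run_snippet('push @log, "ran"; my $x = ; 1;');
print "broken snippet: compiled=$c1, statements run=$n1\n";

# Same snippet with the error fixed.
my ($c2, $n2) = run_snippet('push @log, "ran"; my $x = 42; 1;');
print "valid snippet:  compiled=$c2, statements run=$n2\n";
```

The broken snippet should report that nothing ran, even though its first statement is perfectly fine on its own.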
Installing Perl from ActiveState
1. Go to www.activestate.com/store/freedownload.aspx?prdGuid=81fbce82-6bd5-49bc-a915-08d58c2648ca
   We’ll be downloading Perl 5.8.x.x:
   1. Select Windows MSI package for Windows X86
   2. Run the installer
   3. Install under c:\Perl
Installing Additional Perl Modules
- The fountain of all things Perl: CPAN
  - = Comprehensive Perl Archive Network
  - http://www.cpan.org/
- What does a module look like?
- Why modules?
- PPM for downloading & installing modules
- What modules are in MY Perl?
Perl Modules We’ll Be Using
Status      Name                      Function
Included    File::Copy                manipulating files
Included    File::Find                manipulating files
Included    File::Path                manipulating files
You do it!  File::Rename              manipulating files
Included    IO::File                  accessing the insides of files
Included    Spreadsheet::WriteExcel   writing into an MS Excel spreadsheet
Included    Spreadsheet::ParseExcel   parsing an MS Excel spreadsheet
Included    Spreadsheet::BasicRead    reading the contents of an MS Excel spreadsheet
Included    Win32::OLE                provides easy access to Windows (e.g., launching Excel)
Included    DBI                       provides access to relational databases
Included    DBD::ODBC                 provides access to relational databases
Included    URI                       accessing URLs
Included    LWP::Simple               interacting with a Web site via http
Included    Array::Unique             returns unique elements of an array
Included    List::Uniq                returns unique elements of a list
Included    Data::Dumper              dumping data out of a data structure
Included    Switch                    switch function ("multiple if-else-then")
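
One way to answer "what modules are in MY Perl?" for the modules listed above is to try loading each one and report whether it worked. This is a minimal sketch, not part of the class materials; the helper name `module_available` is made up here, and the list in the loop is only a sample of the table.

```perl
use strict;
use warnings;

# Hypothetical helper: return 1 if this Perl can load the named module, else 0.
sub module_available {
    my ($module) = @_;
    # Accept only plausible module names before handing the string to eval.
    return 0 unless $module =~ /\A\w+(?:::\w+)*\z/;
    my $loaded = eval "require $module; 1";
    return $loaded ? 1 : 0;
}

# Check a few of the modules used in this class.
for my $module (qw(File::Copy File::Find IO::File Data::Dumper
                   Spreadsheet::WriteExcel DBI LWP::Simple)) {
    printf "%-26s %s\n", $module,
        module_available($module) ? 'installed' : 'MISSING';
}
```

Any module reported as MISSING would then need to be installed before the class examples that use it will run.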
The PPM Module: Installing Perl Modules the Easy Way
- Perl modules can be downloaded and installed manually from CPAN (hard)
- They can also be installed via the Perl Package Manager, PPM (easy)
Installing an environment to run and edit Perl:
Integrated Development Environment (IDE)
Why an IDE?
- IDE = integrated development environment:
  - Editor to write/edit your program
  - Debugger to find bugs
  - A “runner” (compiler/interpreter) to run your program from within the IDE
- IDEs provide facilities that make writing & debugging easier
  - E.g., automatic code highlighting
- We’ll use Open Perl IDE
  - Free, open source, portable
  - sourceforge.net/project/showfiles.php?group_id=23334&release_id=91440
- IDE: Definition, description
- For our Mac friends: Affrus
Installing Open Perl IDE
1. Go to sourceforge.net/project/showfiles.php?group_id=23334&release_id=91440 and download the code
2. Create folder Program Files/OpenPerlIDE
3. Unzip into Program Files/OpenPerlIDE
4. Update Path (under System Properties, Advanced, Environment Variables, System Variables)
   → this makes it possible to run Open Perl IDE from anywhere on your machine…
BREAK
Part 2: What does it all do?
Example Short Program
1. Start Open Perl IDE
2. Load Simple1.pl
3. Run Simple1.pl
Learning by Example
Simple2.pl
Exploring Perl’s Major Language Elements
- Norman Matloff’s introduction to Perl: http://heather.cs.ucdavis.edu/~matloff/Perl/PerlIntro.pdf
- Perl language reference: http://en.wikipedia.org/wiki/Perl#Data_types
Additional Key Books/Resources
- Learning by example: Perl Cookbook
- Perl Programming for Biologists
- Perl Quick Reference Guide
- My favorite: Perl Quick Reference
Going Further: Programming Tips
- Plan your program
  - Write down how you intend to process the data in more-or-less plain language
  - Goal: making sure that it really does make sense
  - Hacking doesn’t really pay…
- Have documentation handy
  - ActivePerl documentation (searchable)
  - Perl language reference
  - → eBooks: help served on a silver platter
  - Lane FAQs
- When you’re stuck: search the Web
  - Google can answer almost any programming question
  - … though quality documentation is still best
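
As an illustration of "plan your program", here is a sketch in which the plain-language plan is written first as comments and the code then follows it step by step. The task (turning one tab-delimited line into a comma-separated line) is only an assumed example of the "reformatting data files" goal of this class; the function name `tab_line_to_csv` and the sample values are made up.

```perl
use strict;
use warnings;

# Plan, written first in plain language:
#   1. Take one tab-delimited line of data and drop its line ending.
#   2. Split the line into fields.
#   3. Put quotes around any field that contains a comma or a double quote.
#   4. Join the fields back together with commas.

sub tab_line_to_csv {
    my ($line) = @_;
    chomp $line;                           # step 1
    my @fields = split /\t/, $line, -1;    # step 2 (-1 keeps trailing empty fields)
    for my $field (@fields) {              # step 3
        if ($field =~ /[",]/) {
            $field =~ s/"/""/g;            # double any embedded quotes
            $field = qq{"$field"};
        }
    }
    return join ',', @fields;              # step 4
}

# Made-up sample line: an identifier, a symbol and a description with a comma.
print tab_line_to_csv("seq1\tgeneA\tsome description, with a comma\n"), "\n";
```

Writing the plan as comments first makes it easier to check that each step makes sense before any code exists.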
Toying with Excel3.pl, a “real” program
Excel3.pl: Introducing Object Programming
- Purpose: from an Excel worksheet that lists public identifiers for DNA sequences associated with genes, the program retrieves:
  - UniGene cluster ID
  - Gene symbol
  - NCBI Gene ID
  - … and writes the result into another Excel worksheet
- Mix of procedural and object programming
- Relevant links:
  - http://www.ncbi.nlm.nih.gov/sites/entrez?db=unigene&orig_db=unigene
  - Entrez Utilities
What Excel3.pl Does
[Flowchart]
1. Sequence identifier → UniGene ESearch: search UniGene for cluster ID → result ID
2. Result ID → UniGene ESummary: retrieve UniGene description for that cluster → cluster ID → write to Excel report
3. Cluster ID → Gene ESearch: search Gene with cluster ID → result ID
4. Result ID → Gene ESummary: retrieve Gene description for that gene → gene symbols & descriptions → write to Excel report
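
The ESearch and ESummary steps in the flowchart are calls to NCBI's Entrez Utilities over the Web. The sketch below is not the code of Excel3.pl, which is not reproduced in these slides. It only shows how such request URLs could be assembled and how record IDs could be pulled out of an ESearch reply. The helper names, the database name `gene`, the search term and the sample XML are all placeholders made up for illustration, and nothing is actually sent over the network.

```perl
use strict;
use warnings;

# Base address of NCBI's Entrez Utilities.
my $EUTILS = 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils';

# Hypothetical helper: percent-encode an ASCII string for use in a URL.
sub url_escape {
    my ($text) = @_;
    $text =~ s/([^A-Za-z0-9\-._~])/sprintf('%%%02X', ord $1)/ge;
    return $text;
}

# Build the URL for an ESearch request (find record IDs matching a term).
sub esearch_url {
    my ($db, $term) = @_;
    return "$EUTILS/esearch.fcgi?db=" . url_escape($db) . '&term=' . url_escape($term);
}

# Build the URL for an ESummary request (describe one record).
sub esummary_url {
    my ($db, $id) = @_;
    return "$EUTILS/esummary.fcgi?db=" . url_escape($db) . '&id=' . url_escape($id);
}

# Pull the record IDs out of the XML text of an ESearch reply.
sub ids_from_esearch_xml {
    my ($xml) = @_;
    my @ids = $xml =~ m{<Id>(\d+)</Id>}g;
    return @ids;
}

# Hand-written stand-in for a reply; a real one would be fetched from the
# URL, for example with get() from LWP::Simple.
my $sample_reply = '<eSearchResult><IdList><Id>111</Id><Id>222</Id></IdList></eSearchResult>';

print esearch_url('gene', 'example identifier'), "\n";
print esummary_url('gene', $_), "\n" for ids_from_esearch_xml($sample_reply);
```

A regular expression is enough for this tiny reply; for real replies an XML parser would be a more robust choice.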
Assignments
- Look at code for Example3.pl
- Modify it, break it
- Write down at least one question so we can talk about it next week
eBooks Rule
What Does A Module Look Like?