Perl Programming for Biologists, Second Edition. Part 1: 9/11/2007

Yannick Pouliot, PhD
Bioresearch Informationist
Lane Medical Library & Knowledge Management Center
http://lane.stanford.edu



Class Requirements

You must:

- have wireless access
- have the admin password to your machine


To Do

Please download all class materials from http://lane.stanford.edu/howto/index.html?id=_2796 into C:\course


Class Focus

1. Creating, writing and reading Excel files
2. Reformatting data files for input to an analysis program
3. Writing and reading from a database such as MS Access or other locally installed relational database, as well as from databases available on the Internet

And remember: Ask LOTS OF QUESTIONS


Cautions

- All examples pertain to MS Office 2003
  - Examples still work in MS Office 2007
  - However, Perl modules used here do not work with MS Office 2007-formatted documents
- All examples pertain to Perl 5.x, not 6.x
  - V.5 and 6 are NOT compatible
  - V.5 is far more common, so not much of an issue


So Why Perl?

- Perl = Practical Extraction and Report Language
- Free
- Very widely used
  - Especially in biological community
- Very flexible and portable
- Not the only language of this type
  - E.g., Python
- Not the absolute easiest
  - … but pretty easy
- Not suited for everything
  - E.g., for ultra-fast mathematically-oriented code, C is still best


Today’s session:

- Installing and understanding what is required to run Perl

- Understanding the basics of a Perl program


Part 1: Installation


Components to Install & Configure

1. Perl itself
  - More accurately, the Perl interpreter
  - We’ll use ActiveState Perl 5.8x (ActivePerl)
  - www.activestate.com/store/freedownload.aspx?prdGuid=81fbce82-6bd5-49bc-a915-08d58c2648ca
2. Additional Perl modules
  - Module = extra functions not part of the interpreter
  - Described at Comprehensive Perl Archive Network (CPAN)
3. Open Perl IDE
  - IDE = integrated development environment:
    - Editor to write/edit your program
    - Debugger to find bugs
    - A compiler/interpreter to run your program from within the IDE
  - sourceforge.net/project/showfiles.php?group_id=23334&release_id=91440
4. Configuring the ODBC manager (next week)
  - Part of Windows
  - Allows different programs to interact with databases on your machine or anywhere on the Web via single “doorway”


What is an Interpreter?

- = A program that translates an instruction into the computer’s language and executes it before proceeding to the next instruction
  - = compiled and executed one instruction at a time
- Perl is usually used in interpreted mode
  - Can also be compiled once (= faster)


Installing Perl from ActiveState

1. Go to www.activestate.com/store/freedownload.aspx?prdGuid=81fbce82-6bd5-49bc-a915-08d58c2648ca

We’ll be downloading Perl 5.8.x.x:

1. Select Windows MSI package for Windows X86

2. Run the installer

3. Install under c:\Perl
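Once the installer has finished, a quick way to confirm that the interpreter runs is a tiny script like the sketch below. It is not part of the class materials; it only prints the version and location of whichever Perl is being used.

```perl
use strict;
use warnings;

# $] holds the interpreter version as a number, e.g. 5.008008 for Perl 5.8.8.
my $version = $];

# The class targets Perl 5.x, so flag anything else.
if ($version >= 5 && $version < 6) {
    # $^X is the path of the perl executable that is running this script.
    print "Perl $version is installed at $^X\n";
} else {
    print "Unexpected Perl version: $version\n";
}
```

Save it under any name ending in .pl and run it from the IDE or from a command prompt.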


Installing Additional Perl Modules

The fountain of all things Perl: CPAN = Comprehensive Perl Archive Network http://www.cpan.org/

What does a module look like?

Why modules?

PPM for downloading & installing modules

What modules are in MY Perl?
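One possible way to answer "What modules are in MY Perl?" for a handful of names is to try loading each one. The sketch below is an illustration, not the method shown in class; `module_available` is a made-up helper name.

```perl
use strict;
use warnings;

# Return true if the named module can be loaded by this Perl installation.
sub module_available {
    my ($name) = @_;
    # Only accept well-formed module names such as File::Copy.
    return 0 unless $name =~ /\A\w+(?:::\w+)*\z/;
    # require dies if the module is missing; eval traps that and yields false.
    return eval "require $name; 1" ? 1 : 0;
}

for my $module (qw(File::Copy IO::File DBI Spreadsheet::WriteExcel)) {
    printf "%-25s %s\n", $module,
        module_available($module) ? 'installed' : 'missing';
}
```

Modules reported as missing are the ones to install before the next session.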


Perl Modules We’ll Be Using

Status      Name                      Function
Included    File::Copy                manipulating files
Included    File::Find                manipulating files
Included    File::Path                manipulating files
You do it!  File::Rename              manipulating files
Included    IO::File                  accessing the insides of files
Included    Spreadsheet::WriteExcel   writing into an MS Excel spreadsheet
Included    Spreadsheet::ParseExcel   parsing an MS Excel spreadsheet
Included    Spreadsheet::BasicRead    reading the contents of an MS Excel spreadsheet
Included    Win32::OLE                provides easy access to Windows (e.g., launching Excel)
Included    DBI                       provides access to relational databases
Included    DBD::ODBC                 provides access to relational databases
Included    URI                       accessing URLs
Included    LWP::Simple               interacting with a Web site via http
Included    Array::Unique             returns unique elements of an array
Included    List::Uniq                returns unique elements of a list
Included    Data::Dumper              dumping data out of a data structure
Included    Switch                    switch function ("multiple if-else-then")
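To give a feel for the file-handling modules in this list, here is a minimal sketch that uses File::Path, File::Copy and File::Find together. The file and folder names are invented for illustration, and everything happens in a temporary folder (created with File::Temp, which is not on the list above) so that nothing in C:\course is touched.

```perl
use strict;
use warnings;
use File::Copy qw(copy);
use File::Find qw(find);
use File::Path qw(make_path);
use File::Spec;
use File::Temp qw(tempdir);

# Throwaway working folder, deleted automatically when the script ends.
my $root   = tempdir(CLEANUP => 1);
my $backup = File::Spec->catdir($root, 'backup', 'run1');
make_path($backup);                       # File::Path: create nested folders

# Create a small data file to play with (hypothetical name and contents).
my $source = File::Spec->catfile($root, 'genes.txt');
open my $out, '>', $source or die "Cannot write $source: $!";
print {$out} "GENE1\nGENE2\n";
close $out;

# File::Copy: duplicate the file into the backup folder.
my $target = File::Spec->catfile($backup, 'genes.txt');
copy($source, $target) or die "Copy failed: $!";

# File::Find: walk the folder tree and collect every .txt file.
my @found;
find(sub { push @found, $File::Find::name if -f $_ && /\.txt\z/ }, $root);

print "Found ", scalar(@found), " text files\n";
```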


The PPM Module: Installing Perl Modules the Easy Way

Perl modules can be downloaded and installed manually from CPAN (hard)

They can also be installed via the Perl Package Manager: PPM (easy)


Installing an environment to run and edit Perl: Integrated Development Environment (IDE)


Why an IDE?

- IDE = integrated development environment:
  - Editor to write/edit your program
  - Debugger to find bugs
  - A “runner” (compiler/interpreter) to run your program from within the IDE
- IDEs provide tools that make writing & debugging easier
  - E.g., automatic code highlighting
- We’ll use Open Perl IDE
  - Free, open source, portable
  - sourceforge.net/project/showfiles.php?group_id=23334&release_id=91440
- IDE: Definition, description
- For our Mac friends: Affrus


Installing Open Perl IDE

1. Go to sourceforge.net/project/showfiles.php?group_id=23334&release_id=91440 and download the code
2. Create folder Program Files/OpenPerlIDE
3. Unzip into Program Files/OpenPerlIDE
4. Update Path (under System Properties, Advanced, Environment Variables, System Variables) → this makes it possible to run Open Perl IDE from anywhere on your machine…
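After updating the Path, a short script can confirm that the change took effect. The sketch below is an optional check, not a class file; `dir_on_path` is a hypothetical helper, and the folder name assumes the IDE was unzipped on drive C: as in the steps above.

```perl
use strict;
use warnings;
use File::Spec;

# Return true if $dir appears among the given Path entries.
# The comparison ignores case and slash direction, as Windows does.
sub dir_on_path {
    my ($dir, @entries) = @_;
    my $normalize = sub {
        my ($p) = @_;
        $p =~ tr{\\}{/};      # backslashes to forward slashes
        $p =~ s{/+\z}{};      # drop any trailing slash
        return lc $p;
    };
    my $wanted = $normalize->($dir);
    return scalar grep { $normalize->($_) eq $wanted } @entries;
}

# Assumed install folder; adjust to match your machine.
my $ide_dir = 'C:/Program Files/OpenPerlIDE';
my @path    = File::Spec->path();    # the Path split into individual folders

my $status = dir_on_path($ide_dir, @path) ? 'on' : 'NOT on';
print "Open Perl IDE folder is $status the Path\n";
```

A new command prompt (or a restart of the IDE) is usually needed before a changed Path becomes visible.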


BREAK


Part 2: What does it all do?


Example Short Program

1. Start Open Perl IDE

2. Load Simple1.pl

3. Run Simple1.pl
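The contents of Simple1.pl are not reproduced in these slides. As a stand-in, the sketch below shows what a first short Perl program might look like; the sequence and the `reverse_complement` function are invented for illustration and are not taken from the class file.

```perl
#!/usr/bin/perl
use strict;      # catch typos in variable names
use warnings;    # report suspicious constructs

# Return the reverse complement of a DNA sequence.
sub reverse_complement {
    my ($dna) = @_;
    my $rc = reverse $dna;          # reverse the string
    $rc =~ tr/ACGTacgt/TGCAtgca/;   # swap each base for its complement
    return $rc;
}

my $sequence = 'ATGGCC';            # a scalar variable holding a string
print "Sequence:           $sequence\n";
print "Length:             ", length($sequence), "\n";
print "Reverse complement: ", reverse_complement($sequence), "\n";
```

Loading and running it in the IDE follows the same three steps as above.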


Learning by Example

Simple2.pl


Exploring Perl’s Major Language Elements

Norman Matloff’s introduction to Perl: http://heather.cs.ucdavis.edu/~matloff/Perl/PerlIntro.pdf

Perl language reference http://en.wikipedia.org/wiki/Perl#Data_types
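The data types covered in those references come down to three building blocks: scalars, arrays and hashes. The following sketch shows one of each; the gene name and the three-codon "genetic code" are toy values chosen only to keep the example short.

```perl
use strict;
use warnings;

# Scalar ($): a single value, either a number or a string.
my $gene = 'TP53';

# Array (@): an ordered list, indexed from 0.
my @codons = ('ATG', 'GCC', 'TAA');

# Hash (%): key/value pairs, here a tiny slice of the genetic code.
my %amino_acid = (ATG => 'M', GCC => 'A', TAA => '*');

# Loop over the array and look each codon up in the hash.
my $protein = '';
for my $codon (@codons) {
    $protein .= $amino_acid{$codon};
}

print "$gene has ", scalar(@codons), " codons: @codons\n";
print "Translation: $protein\n";
```

Note how the leading symbol changes with what is being accessed: `$amino_acid{$codon}` fetches a single (scalar) value out of the hash `%amino_acid`.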


Additional Key Books/Resources

- Learning by example: Perl Cookbook
- Perl Programming for Biologists
- Perl Quick Reference Guide
- My favorite: Perl Quick Reference


Going Further: Programming Tips

- Plan your program
  - Write down how you intend to process the data in more-or-less plain language
  - Goal: making sure that it really does make sense
  - Hacking doesn’t really pay…
- Have documentation handy
  - ActivePerl documentation (searchable)
  - Perl language reference
  - → eBooks: help served on a silver platter
  - Lane FAQs
- When you’re stuck: Search the Web
  - Google can answer almost any programming question
  - … though quality documentation is still best


Toying with Excel3.pl, a “real” program


Excel3.pl: Introducing Object Programming

- Purpose: From an Excel worksheet that lists public identifiers for DNA sequences associated with genes, the program retrieves:
  - UniGene cluster ID
  - Gene symbol
  - NCBI Gene ID
  - … and writes the result into another Excel worksheet
- Mix of procedural and object programming
- Relevant links:
  - http://www.ncbi.nlm.nih.gov/sites/entrez?db=unigene&orig_db=unigene
  - Entrez Utilities
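To show what a "mix of procedural and object programming" can look like, the sketch below combines an ordinary function with the object interface of IO::File. It is not an excerpt from Excel3.pl: the row values are made up, and a temporary text file stands in for the Excel worksheet.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use IO::File;

# Procedural style: a plain function that takes data and returns a result.
sub format_row {
    my (@fields) = @_;
    return join("\t", @fields) . "\n";
}

# Scratch file so the sketch does not depend on any course data.
my (undef, $path) = tempfile(UNLINK => 1);

# Object style: IO::File->new returns an object, and the work is done by
# calling methods on that object with the -> arrow.
my $out = IO::File->new($path, 'w') or die "Cannot open $path: $!";
$out->print(format_row('Accession', 'Symbol'));
$out->print(format_row('ACC0001',   'GENE1'));   # made-up values
$out->close;

my $in    = IO::File->new($path, 'r') or die "Cannot open $path: $!";
my @lines = $in->getlines;
$in->close;

print "Wrote ", scalar(@lines), " rows\n";
```

The Spreadsheet:: modules follow the same pattern: create an object with `new`, then call methods on it.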


What Excel3.pl Does

[Flowchart: sequence identifier → search UniGene for cluster ID (UniGene ESearch) → result ID → retrieve UniGene description for that cluster (UniGene ESummary) → cluster ID → search Gene with cluster ID (Gene ESearch) → result ID → retrieve Gene description for that gene (Gene ESummary) → gene symbols & descriptions. Results of both branches are written to the Excel report.]
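The ESearch and ESummary steps are both plain web requests to the Entrez Utilities. The sketch below only builds the request addresses and never contacts NCBI; it is not taken from Excel3.pl. The helper names, the identifier 'ACC0001' and the ID '12345' are placeholders, and whether a given database name is currently served by NCBI is not assumed.

```perl
use strict;
use warnings;

# Base address of the NCBI Entrez Utilities.
my $EUTILS = 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils';

# Percent-encode a query value by hand so no extra module is needed.
sub url_encode {
    my ($text) = @_;
    $text =~ s/([^A-Za-z0-9\-_.~])/sprintf('%%%02X', ord($1))/ge;
    return $text;
}

# ESearch: look up a term in a database and get back result IDs.
sub esearch_url {
    my ($db, $term) = @_;
    return "$EUTILS/esearch.fcgi?db=$db&term=" . url_encode($term);
}

# ESummary: fetch the summary record for one result ID.
sub esummary_url {
    my ($db, $id) = @_;
    return "$EUTILS/esummary.fcgi?db=$db&id=" . url_encode($id);
}

# Placeholder values, not real identifiers.
my $search  = esearch_url('unigene', 'ACC0001');
my $summary = esummary_url('unigene', '12345');
print "$search\n$summary\n";
```

In a full program, each address would be fetched (for instance with LWP::Simple) and the returned XML parsed for the fields to write into the Excel report.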


Assignments

- Look at code for Example3.pl
- Modify it, break it
- Write down at least one question so we can talk about it next week


eBooks Rule


What Does A Module Look Like?
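At its simplest, a module is a named package containing functions. The sketch below defines a made-up module called SeqUtils and calls it from the main program in the same file; a real module would normally live in its own file (here that would be SeqUtils.pm), end with a true value such as `1;`, and be loaded with `use`.

```perl
use strict;
use warnings;

# Hypothetical module for illustration only.
package SeqUtils;

# Return the fraction of G and C bases in a DNA sequence.
sub gc_content {
    my ($seq) = @_;
    return 0 unless length $seq;
    my $gc = ($seq =~ tr/GCgc//);   # tr with an empty replacement just counts
    return $gc / length $seq;
}

# Back to the main program.
package main;

printf "GC fraction: %.2f\n", SeqUtils::gc_content('ATGCGC');
# prints "GC fraction: 0.67"
```

The `::` in names such as File::Copy or Spreadsheet::WriteExcel is the same package separator used in `SeqUtils::gc_content`.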