Programming and Perl for Bioinformatics, Part I

Upload: shepry

Post on 24-Jan-2016



TRANSCRIPT

Page 1: Programming and Perl for  Bioinformatics Part I

Programming and Perl for Bioinformatics

Part I

Page 2: Programming and Perl for  Bioinformatics Part I

A Taste of Perl: print a message

perltaste.pl: Greet the entire world.

#!/usr/bin/perl
#greet the entire world
$x = 6e9;
print "Hello world!\n";
print "All $x of you!\n";

- Line 1: command interpretation header
- Line 2: a comment
- Line 3: variable assignment statement
- Lines 4-5: function calls (output statements)

Page 3: Programming and Perl for  Bioinformatics Part I

Basic Syntax and Data Types

- Whitespace doesn't matter to Perl (outside of quoted strings). One can write all statements on one line.
- All Perl statements end in a semicolon ; just like C.
- Comments begin with '#' and Perl ignores everything after the # until end of line.
  Example: #this is a comment
- Perl has three basic data types:
  - scalar
  - array (list)
  - associative array (hash)

Page 4: Programming and Perl for  Bioinformatics Part I

Scalars

- Scalar variables begin with '$' followed by an identifier.
  Example: $this_is_a_scalar;
- An identifier is composed of upper or lower case letters, numbers, and underscore '_'. Identifiers are case sensitive (like all of Perl).

$progname = "first_perl";
$numOfStudents = 4;

= sets the content of $progname to be the string "first_perl" and $numOfStudents to be the integer 4.

Page 5: Programming and Perl for  Bioinformatics Part I

Scalar Values

Numerical Values
- integer: 5, "3", 0, -307
- floating point: 6.2e9, -4022.33
- hexadecimal/octal: 0xd4f, 0477
- binary: 0b011011

NOTE: Perl has no separate integer and floating-point variable types. A scalar holds any number, and Perl converts between native integers and double-precision floating-point numbers automatically as needed.
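To make the literal notations above concrete, here is a minimal sketch that prints the decimal value of each form. The variable names are illustrative and do not come from the slides.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Illustrative variable names; each holds the same kind of scalar.
my $hex    = 0xd4f;      # hexadecimal literal (leading 0x)
my $octal  = 0477;       # octal literal (leading 0)
my $binary = 0b011011;   # binary literal (leading 0b)
my $string = "3";        # numeric string, converted when used as a number

print "hex:    $hex\n";                    # 3407
print "octal:  $octal\n";                  # 319
print "binary: $binary\n";                 # 27
print "string + 1: ", $string + 1, "\n";   # 4
```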

Page 6: Programming and Perl for  Bioinformatics Part I

Do the Math

Mathematical operations work pretty much as you would expect:
4+7
6*4
43-27
256/12
2/(3-5)

Example
#!/usr/bin/perl
print "4+5\n";
print 4+5 , "\n";
print "4+5=" , 4+5 , "\n";
$myNumber = 88;

Note: use commas to separate multiple items in a print statement.

What will be the output?

4+5
9
4+5=9

Page 7: Programming and Perl for  Bioinformatics Part I

Scalar Values

String values

Example:
$day = "Monday";
print "Happy Monday!\n";
print "Happy $day!\n";
print 'Happy Monday!\n';
print 'Happy $day!\n';

- Double-quoted: interpolates (replaces variable name/control character with its value)
- Single-quoted: no interpolation done (as-is)

What will be the output?

Happy Monday!<newline>
Happy Monday!<newline>
Happy Monday!\n
Happy $day!\n

Page 8: Programming and Perl for  Bioinformatics Part I

String Manipulation

Concatenation
$dna1 = "ACTGCGTAGC";
$dna2 = "CTTGCTAT";

- Juxtapose in a string assignment or print statement:
  $new_dna = "$dna1$dna2";
- Use the concatenation operator '.':
  $new_dna = $dna1 . $dna2;

Substring
$dna = "ACTGCGTAGC";
$exon1 = substr($dna,2,5);   # TGCGT

The second argument (2) is the starting position, counted from 0; the third argument (5) is the length of the substring.
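The concatenation and substring operations above can be tried out with a short script. This is a sketch only; the names $by_interpolation and $by_operator are made up for the illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $dna1 = "ACTGCGTAGC";
my $dna2 = "CTTGCTAT";

# Two equivalent ways to join the strings (illustrative names).
my $by_interpolation = "$dna1$dna2";
my $by_operator      = $dna1 . $dna2;

# substr(STRING, OFFSET, LENGTH): offsets count from 0.
my $exon1 = substr($dna1, 2, 5);

print "Interpolated: $by_interpolation\n";
print "Operator:     $by_operator\n";
print "Exon 1:       $exon1\n";    # TGCGT
```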

Page 9: Programming and Perl for  Bioinformatics Part I

Substitution

DNA transcription: T -> U

Substitution operator s/// :
$dna = "GATTACATACACTGTTCA";
$rna = $dna;
$rna =~ s/T/U/g;   # "GAUUACAUACACUGUUCA"

=~ is a binding operator that tells Perl to examine the contents of $rna for a matching pattern.

Ex: Start with $dna = "gaTtACataCACTgttca"; and do the same as above. What will be the output?

Page 10: Programming and Perl for  Bioinformatics Part I

Example

transcribe.pl:

$dna ="gaTtACataCACTgttca";
$rna = $dna;
$rna =~ s/T/U/g;
print "DNA: $dna\n";
print "RNA: $rna\n";

Does it do what you expect? If not, why not?

Patterns in substitution are case-sensitive! What can we do?
- Convert all letters to upper/lower case (preferred when possible)
- If we want to retain mixed case, use the transliteration/translation operator tr///
  $rna =~ tr/tT/uU/; #replace all t by u, all T by U
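The three behaviours discussed above can be compared side by side. The sketch below wraps each one in a small subroutine; the subroutine names are hypothetical and are not part of the original transcribe.pl.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Case-sensitive substitution: only upper-case T is replaced.
sub transcribe_strict {
    my ($dna) = @_;
    (my $rna = $dna) =~ s/T/U/g;
    return $rna;
}

# Convert to upper case first, then substitute.
sub transcribe_upper {
    my ($dna) = @_;
    (my $rna = uc $dna) =~ s/T/U/g;
    return $rna;
}

# Keep the mixed case by transliterating both t and T.
sub transcribe_mixed {
    my ($dna) = @_;
    (my $rna = $dna) =~ tr/tT/uU/;
    return $rna;
}

my $dna = "gaTtACataCACTgttca";
print "DNA:    $dna\n";
print "strict: ", transcribe_strict($dna), "\n";   # gaUtACataCACUgttca
print "upper:  ", transcribe_upper($dna),  "\n";   # GAUUACAUACACUGUUCA
print "mixed:  ", transcribe_mixed($dna),  "\n";   # gaUuACauaCACUguuca
```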

Page 11: Programming and Perl for  Bioinformatics Part I

Case conversion

$string = "acCGtGcaTGc";

Upper case:
$dna = uc($string);   # "ACCGTGCATGC"
or $dna = uc $string;
or $dna = "\U$string";

Lower case:
$dna = lc($string);   # "accgtgcatgc"
or $dna = "\L$string";

Sentence case (ucfirst alone only capitalizes the first letter, so lower-case the string first):
$dna = ucfirst(lc($string));   # "Accgtgcatgc"
or $dna = "\u\L$string";

Page 12: Programming and Perl for  Bioinformatics Part I

Reverse Complement

5'- A C G T C T A G C . . . . G C A T -3'
3'- T G C A G A T C G . . . . C G T A -5'

Reverse: reverses a string
$string = "ACGTCTAGC";
$string = reverse($string);   # "CGATCTGCA"

Complementation: use the transliteration operator
$string =~ tr/ACGT/TGCA/;
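Putting the two steps together gives a reverse-complement routine. The following is a minimal sketch; the subroutine name reverse_complement and the handling of lower-case bases are assumptions added for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: returns the reverse complement of a DNA string.
sub reverse_complement {
    my ($seq) = @_;
    my $revcomp = reverse $seq;         # scalar context: reverses the characters
    $revcomp =~ tr/ACGTacgt/TGCAtgca/;  # complement each base, keeping its case
    return $revcomp;
}

my $string = "ACGTCTAGC";
print "Sequence:           $string\n";
print "Reverse complement: ", reverse_complement($string), "\n";   # GCTAGACGT
```

Note that reverse only reverses the characters of a string when it is evaluated in scalar context, which is why the result is assigned to a scalar before it is returned.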

Page 13: Programming and Perl for  Bioinformatics Part I

More on String Manipulation

String length:
length($dna)

Index:
#index STR,SUBSTR,POSITION   (POSITION is optional)
index($strand, $primer, 2)
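As a sketch of how length and index might be used together, the script below collects every position at which a primer occurs in a strand. The sequences and the @hits array are invented for the example; index returns -1 when the substring is not found.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Made-up example data.
my $strand = "ACTGCGTAGCTTGCG";
my $primer = "GCG";

print "Strand length: ", length($strand), "\n";   # 15

# Collect every start position of $primer in $strand.
my @hits;
my $position = index($strand, $primer);           # first match, or -1
while ($position != -1) {
    push @hits, $position;
    # Resume the search one character past the previous match.
    $position = index($strand, $primer, $position + 1);
}

print "Primer found at: @hits\n";                 # 3 12
```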

Page 14: Programming and Perl for  Bioinformatics Part I

Flow Control

Conditional Statements
- parts of code are executed depending on the truth value of a logical statement
- "truth" (logical) values in Perl:
  false = {0, 0.0, 0e0, "", "0", undef}; the default false value returned by operators is ""
  true = anything else; the default true value returned by operators is 1

($a, $b) = (75, 83);
if ( $a < $b ) {
    $a = $b;
    print "Now a = b!\n";
}
if ( $a > $b ) { print "Yes, a > b!\n" }   # Compact
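The truth rules can be checked directly. This sketch uses a hypothetical helper, truth_label, to report how Perl treats a handful of values; note that the string "0.0" is true even though the number 0.0 is false.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: reports whether Perl treats a value as true or false.
sub truth_label {
    my ($value) = @_;
    if ($value) {
        return "true";
    }
    return "false";
}

# Numbers 0, 0.0 and 0e0 are false; so are "" and the string "0".
# Other strings, including "0.0" and "00", are true.
foreach my $value (0, 0.0, 0e0, "", "0", "0.0", "00", "ACGT") {
    print "[$value] is ", truth_label($value), "\n";
}
print "undef is ", truth_label(undef), "\n";
```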

Page 15: Programming and Perl for  Bioinformatics Part I

Comparison Operators

Comparison                  String   Number
Equality                    eq       ==
Inequality                  ne       !=
Greater than                gt       >
Greater than or equal to    ge       >=
Less than                   lt       <
Less than or equal to       le       <=
  (the operators above return 1/null)
Comparison                  cmp      <=>
  (returns -1, 0, 1)
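Choosing the string or the number column matters, because the same two values can compare differently. A small sketch, with illustrative variable names:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my ($first, $second) = ("10", "9");

# Numeric comparison: 10 is greater than 9.
if ($first > $second) {
    print "numerically, $first > $second\n";
}

# String comparison works character by character: "1" sorts before "9".
if ($first lt $second) {
    print "as strings, '$first' lt '$second'\n";
}

my $numeric_order = $first <=> $second;   # 1
my $string_order  = $first cmp $second;   # -1
print "<=> gives $numeric_order, cmp gives $string_order\n";

# Sequences are strings, so compare them with eq/ne, not ==/!=.
my ($seq1, $seq2) = ("ACGT", "acgt");
if ($seq1 ne $seq2) {
    print "$seq1 and $seq2 differ as written\n";
}
if (uc($seq1) eq uc($seq2)) {
    print "$seq1 and $seq2 match when case is ignored\n";
}
```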

Page 16: Programming and Perl for  Bioinformatics Part I

Logical Operators

Operation   Computerese   English version
AND         &&            and
OR          ||            or
NOT         !             not

Page 17: Programming and Perl for  Bioinformatics Part I

if/else/elsif

allows for multiple branching/outcomes

$a = rand();
if ( $a < 0.25 ) {
    print "A";
}
elsif ( $a < 0.50 ) {
    print "C";
}
elsif ( $a < 0.75 ) {
    print "G";
}
else {
    print "T";
}
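The branching above picks one base at random. As a sketch of how it might be reused, the script below wraps it in a hypothetical subroutine, random_base, and calls it repeatedly to build a random sequence; the length of 20 is an arbitrary choice.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: returns A, C, G or T with equal probability.
sub random_base {
    my $draw = rand();            # random number in [0, 1)
    if ( $draw < 0.25 ) {
        return "A";
    }
    elsif ( $draw < 0.50 ) {
        return "C";
    }
    elsif ( $draw < 0.75 ) {
        return "G";
    }
    else {
        return "T";
    }
}

# Build a random sequence one base at a time.
my $sequence = "";
for ( my $i = 0 ; $i < 20 ; $i++ ) {
    $sequence = $sequence . random_base();
}
print "Random sequence: $sequence\n";
```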

Page 18: Programming and Perl for  Bioinformatics Part I

Conditional Loops

- while ( statement ) { commands ... }
  repeats commands until statement is no longer true
- do { commands } while ( statement );
  same as while, except commands are executed at least once
  NOTE the ';' after the while statement!!
- Short-circuiting commands: next and last
  next; #jumps to end, do next iteration
  last; #jumps out of the loop completely
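A short sketch of next and last inside a while loop: it copies bases from a sequence, skips any N, and stops at the first '*'. The input string and the convention that '*' marks the end are assumptions made for the example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Made-up input: N marks an unknown base, * marks where to stop.
my $raw   = "ACNNGT*AC";
my $clean = "";
my $i     = 0;

while ( $i < length($raw) ) {
    my $base = substr($raw, $i, 1);
    $i++;                      # advance first, so next cannot loop forever

    if ( $base eq "*" ) {
        last;                  # leave the loop completely
    }
    if ( $base eq "N" ) {
        next;                  # skip to the next iteration
    }
    $clean = $clean . $base;
}

print "Cleaned sequence: $clean\n";   # ACGT
```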

Page 19: Programming and Perl for  Bioinformatics Part I

while

Example:

while ($alive) {
    if ($needs_nutrients) {
        print "Cell needs nutrients\n";
    }
}

Any problem?
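One possible reading of the problem is that nothing inside the loop ever changes $alive, so if it starts out true the loop never ends. The sketch below shows one way to make the loop terminate; the $nutrients counter and the rule that the cell stops once nutrients run out are invented for the illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Invented state for the example.
my $alive     = 1;
my $nutrients = 3;
my $cycles    = 0;

while ($alive) {
    if ( $nutrients > 0 ) {
        $nutrients--;          # consume one unit per cycle
        $cycles++;
    }
    else {
        print "Cell needs nutrients\n";
        $alive = 0;            # the condition changes, so the loop can end
    }
}

print "Cell survived $cycles cycles\n";   # 3
```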

Page 20: Programming and Perl for  Bioinformatics Part I

for and foreach loops

- Execute a code loop a specified number of times, or for a specified list of values
- for and foreach are identical: use whichever you want

Incremental loop ("C style"):
for ( $i=0 ; $i < 50 ; $i++ ) {
    $x = $i*$i;
    print "$i squared is $x.\n";
}

Loop over list ("foreach" loop):
foreach $name ( "Billy", "Bob", "Edwina" ) {
    print "$name is my friend.\n";
}
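To illustrate that the two keywords are interchangeable, this sketch deliberately swaps them: foreach drives a C-style loop and for walks a list. Both loops add up the squares of 0 to 4; the variable names are illustrative.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# C-style loop written with the foreach keyword.
my $sum_c_style = 0;
foreach ( my $i = 0 ; $i < 5 ; $i++ ) {
    $sum_c_style += $i * $i;
}

# List loop written with the for keyword.
my $sum_list = 0;
for my $n ( 0, 1, 2, 3, 4 ) {
    $sum_list += $n * $n;
}

print "C-style sum: $sum_c_style\n";   # 30
print "List sum:    $sum_list\n";      # 30
```

Swapping the keywords like this is only a demonstration; many programmers keep for with the C-style form and foreach with lists for readability.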

Page 21: Programming and Perl for  Bioinformatics Part I

Basic Data Types

Perl has three basic data types:
- scalar
- array (list)
- associative array (hash)