Programming and Perl for Bioinformatics, Part I
A Taste of Perl: print a message
perltaste.pl: Greet the entire world.
#!/usr/bin/perl
#greet the entire world
$x = 6e9;
print "Hello world!\n";
print "All $x of you!\n";
- #!/usr/bin/perl : command interpretation header
- #greet the entire world : a comment
- $x = 6e9; : variable assignment statement
- print ...; : function calls (output statements)
Basic Syntax and Data Types
Whitespace doesn't matter to Perl. One can write all statements on one line.
All Perl statements end in a semicolon ; just like C.
Comments begin with '#' and Perl ignores everything after the # until end of line.
Example: #this is a comment
Perl has three basic data types:
- scalar
- array (list)
- associative array (hash)
Scalars
Scalar variables begin with '$' followed by an identifier.
Example: $this_is_a_scalar;
An identifier is composed of upper or lower case letters, numbers, and underscore '_'. Identifiers are case sensitive (like all of Perl).
$progname = "first_perl";
$numOfStudents = 4;
= sets the content of $progname to be the string "first_perl" and $numOfStudents to be the integer 4.
Scalar Values
Numerical Values
integer: 5, "3", 0, -307
floating point: 6.2e9, -4022.33
hexadecimal/octal: 0xd4f, 0477
binary: 0b011011
NOTE: Perl has no separate integer and floating-point types for scalars. Numerical values behave like double-precision floating-point numbers (internally Perl may keep a value as a native integer when it fits).
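As a minimal sketch of the literal forms listed above, the script below assigns each kind of literal to a scalar and prints its decimal value. The variable names are illustrative assumptions, not part of the original slides.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Illustrative variable names; the literals are the ones from the slide.
my $hex    = 0xd4f;      # hexadecimal literal (prefix 0x)
my $octal  = 0477;       # octal literal (leading zero)
my $binary = 0b011011;   # binary literal (prefix 0b)
my $float  = 6.2e9;      # floating point in exponent notation
my $string = "3";        # a string that looks like a number

print "hex:    $hex\n";                # 3407
print "octal:  $octal\n";              # 319
print "binary: $binary\n";             # 27
print "float:  $float\n";              # 6200000000
print "sum:    ", $string + 4, "\n";   # "3" is converted to the number 3, giving 7
```

Note that a leading zero changes the meaning of a literal: 0477 is octal, not the decimal number 477.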
Do the Math
Mathematical functions work pretty much as you would expect:
4+7
6*4
43-27
256/12
2/(3-5)
Example
#!/usr/bin/perl
print "4+5\n";
print 4+5 , "\n";
print "4+5=" , 4+5 , "\n";
$myNumber = 88;
Note: use commas to separate multiple items in a print statement
What will be the output?
4+5
9
4+5=9
Scalar Values
String values
Example:
$day = "Monday";
print "Happy Monday!\n";
print "Happy $day!\n";
print 'Happy Monday!\n';
print 'Happy $day!\n';
Double-quoted: interpolates (replaces a variable name or control character with its value)
Single-quoted: no interpolation done (as-is)
What will be the output?
Output of each print statement, in order:
Happy Monday!<newline>
Happy Monday!<newline>
Happy Monday!\n
Happy $day!\n
String Manipulation
Concatenation
$dna1 = "ACTGCGTAGC";
$dna2 = "CTTGCTAT";
Juxtapose in a string assignment or print statement:
$new_dna = "$dna1$dna2";
Use the concatenation operator '.':
$new_dna = $dna1 . $dna2;
Substring
$dna = "ACTGCGTAGC";
$exon1 = substr($dna,2,5); # TGCGT
The second argument (2) is the starting offset, counted from 0. The third argument (5) is the length of the substring.
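A short sketch of a few substr variations on the same sequence may help here. The two-argument and negative-offset forms are standard Perl behavior but are not shown on the slide; the variable names $tail and $last3 are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $dna = "ACTGCGTAGC";

# Offset 2, length 5: the call from the slide.
my $exon1 = substr($dna, 2, 5);

# Omitting the length takes everything from the offset to the end.
my $tail = substr($dna, 7);

# A negative offset counts from the end of the string.
my $last3 = substr($dna, -3);

print "exon1: $exon1\n";   # TGCGT
print "tail:  $tail\n";    # AGC
print "last3: $last3\n";   # AGC
```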
Substitution
DNA transcription: T -> U
Substitution operator s/// :
$dna = "GATTACATACACTGTTCA";
$rna = $dna;
$rna =~ s/T/U/g; # "GAUUACAUACACUGUUCA"
=~ is a binding operator: it tells Perl to examine the contents of $rna for a pattern match.
Ex: Start with $dna = "gaTtACataCACTgttca"; and do the same as above. What will be the output?
Example transcribe.pl:
$dna ="gaTtACataCACTgttca";
$rna = $dna;
$rna =~ s/T/U/g;
print "DNA: $dna\n";
print "RNA: $rna\n";
Does it do what you expect? If not, why not?
Patterns in substitution are case-sensitive! What can we do?
- Convert all letters to upper/lower case (preferred when possible)
- If we want to retain mixed case, use the transliteration/translation operator tr///
$rna =~ tr/tT/uU/; #replace all t by u, all T by U
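The two options above can be compared side by side. The sketch below is one way to do it, using the mixed-case sequence from transcribe.pl; the variable names $rna_mixed, $rna_upper and $t_count are illustrative assumptions. It also shows that tr/// returns the number of characters it matched, which is handy for counting bases.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $dna = "gaTtACataCACTgttca";

# Option 1: keep the mixed case, translating t->u and T->U in one pass.
my $rna_mixed = $dna;
$rna_mixed =~ tr/tT/uU/;

# Option 2: normalize to upper case first, then substitute.
my $rna_upper = uc($dna);
$rna_upper =~ s/T/U/g;

# tr/// with an empty replacement list changes nothing and returns
# the number of matching characters: a simple way to count bases.
my $t_count = ($dna =~ tr/tT//);

print "DNA:         $dna\n";
print "RNA (mixed): $rna_mixed\n";   # gaUuACauaCACUguuca
print "RNA (upper): $rna_upper\n";   # GAUUACAUACACUGUUCA
print "t/T count:   $t_count\n";     # 6
```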
Case conversion
$string = "acCGtGcaTGc";
Upper case:
$dna = uc($string); # "ACCGTGCATGC"
or $dna = uc $string;
or $dna = "\U$string";
Lower case:
$dna = lc($string); # "accgtgcatgc"
or $dna = "\L$string";
Sentence case:
$dna = ucfirst(lc($string)); # "Accgtgcatgc"
or $dna = "\u\L$string";
(ucfirst alone only upper-cases the first character and leaves the rest unchanged, so the string is lower-cased first.)
Reverse Complement
5'- A C G T C T A G C . . . . G C A T -3'
3'- T G C A G A T C G . . . . C G T A -5'
Reverse: reverses a string
$string = "ACGTCTAGC";
$string = reverse($string); # "CGATCTGCA"
Complementation: use transliteration operator
$string =~ tr/ACGT/TGCA/;
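Putting the two steps together, a reverse complement could be sketched as below. The subroutine name reverse_complement is an assumption for illustration (subroutines are not covered in this part), and the lower-case handling in tr/// is an addition to the slide's upper-case-only version.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: returns the reverse complement of a DNA string.
sub reverse_complement {
    my ($seq) = @_;
    # 'scalar' forces string reversal; in list context reverse()
    # would reverse a list of arguments instead.
    my $rc = scalar reverse($seq);
    # Complement each base, keeping upper and lower case as they were.
    $rc =~ tr/ACGTacgt/TGCAtgca/;
    return $rc;
}

my $strand  = "ACGTCTAGC";
my $revcomp = reverse_complement($strand);

print "5'-$strand-3'\n";
print "5'-$revcomp-3'\n";   # GCTAGACGT
```

Applying the helper twice returns the original strand, which is a quick sanity check for this kind of function.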
More on String Manipulation
String length:
length($dna)
Index:
#index STR,SUBSTR,POSITION
index($strand, $primer, 2)
POSITION is optional: it is the offset at which the search starts. index returns the offset of the first match, or -1 if SUBSTR is not found.
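As a sketch of how the optional POSITION argument can be used, the loop below finds every occurrence of a primer by restarting the search just past the previous hit. The sequences and the names $from and $hits are made up for illustration, and the while loop used here is introduced later in these slides.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Made-up example data.
my $strand = "GATTACATTAGATTC";
my $primer = "ATT";

my $first   = index($strand, $primer);      # search from the start
my $second  = index($strand, $primer, 2);   # search from offset 2 onward
my $missing = index($strand, "CCC");        # not present: returns -1

print "first: $first, from offset 2: $second, missing: $missing\n";

# Find all occurrences, including overlapping ones.
my $from = 0;
my $hits = 0;
while ((my $at = index($strand, $primer, $from)) != -1) {
    print "primer found at offset $at\n";
    $hits++;
    $from = $at + 1;   # continue one position past this hit
}
print "total hits: $hits\n";
```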
Flow Control
Conditional Statements
Parts of code are executed depending on the truth value of a logical statement.
"Truth" (logical) values in Perl:
false = {0, 0.0, 0e0, "0", "", undef}, default ""
true = anything else, default 1
(The default is the value Perl's own operators return for false or true.)
($a, $b) = (75, 83);
if ( $a < $b ) {
    $a = $b;
    print "Now a = b!\n";
}
if ( $a > $b ) { print "Yes, a > b!\n" } # Compact
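The truth rules have a few corner cases worth trying out. The sketch below uses a hypothetical helper, is_true, to show how Perl evaluates several values in a condition. Note the difference between the numbers 0.0 and 0e0 (false) and the strings "0.0" and "0e0" (true, because only the exact string "0" and the empty string are false).

```perl
#!/usr/bin/perl
use strict;
use warnings;

# is_true is a hypothetical helper: it reports how Perl evaluates a
# value in a condition (1 for true, 0 for false).
sub is_true {
    my ($value) = @_;
    return $value ? 1 : 0;
}

print 'number 0     : ', is_true(0),     "\n";   # false
print 'number 0.0   : ', is_true(0.0),   "\n";   # false
print 'number 0e0   : ', is_true(0e0),   "\n";   # false
print 'string ""    : ', is_true(""),    "\n";   # false
print 'string "0"   : ', is_true("0"),   "\n";   # false
print 'undef        : ', is_true(undef), "\n";   # false
print 'string "0.0" : ', is_true("0.0"), "\n";   # true
print 'string "00"  : ', is_true("00"),  "\n";   # true
print 'string " "   : ', is_true(" "),   "\n";   # true
```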
Comparison Operators
Comparison                  String   Number
Equality                    eq       ==
Inequality                  ne       !=
Greater than                gt       >
Greater than or equal to    ge       >=
Less than                   lt       <
Less than or equal to       le       <=
Comparison                  cmp      <=>
The first six operators return 1 (true) or null, the empty string (false).
cmp and <=> return -1, 0, or 1.
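Choosing the wrong column of the table gives a different answer for the same data. The following sketch, with illustrative variable names, compares "10" and "9" both ways: numerically 10 is greater than 9, but as strings "10" sorts before "9" because the character "1" comes before "9".

```perl
#!/usr/bin/perl
use strict;
use warnings;

my ($x, $y) = ("10", "9");

# Numeric comparison: 10 > 9 is true.
my $numeric_greater = ($x > $y)  ? "yes" : "no";

# String comparison: "10" gt "9" is false.
my $string_greater  = ($x gt $y) ? "yes" : "no";

# Three-way comparisons return -1, 0, or 1.
my $numeric_order = $x <=> $y;
my $string_order  = $x cmp $y;

print "numeric greater: $numeric_greater\n";   # yes
print "string greater:  $string_greater\n";    # no
print "numeric order:   $numeric_order\n";     # 1
print "string order:    $string_order\n";      # -1
```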
Logical Operators
Operation   Computerese   English version
AND         &&            and
OR          ||            or
NOT         !             not
if/else/elsif
Allows for multiple branching/outcomes
$a = rand();
if ( $a < 0.25 ) {
    print "A";
}
elsif ( $a < 0.50 ) {
    print "C";
}
elsif ( $a < 0.75 ) {
    print "G";
}
else {
    print "T";
}
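The same branching can be wrapped in a reusable piece of code. The sketch below assumes a hypothetical subroutine, random_base, that accepts an optional number between 0 and 1 so the branches can be checked with fixed values; when no argument is given it calls rand() as in the slide.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: maps a number in [0, 1) to one of the four bases,
# each with probability 0.25 when the number comes from rand().
sub random_base {
    my ($r) = @_;
    $r = rand() unless defined $r;   # no argument: draw a random number
    if ( $r < 0.25 ) {
        return "A";
    }
    elsif ( $r < 0.50 ) {
        return "C";
    }
    elsif ( $r < 0.75 ) {
        return "G";
    }
    else {
        return "T";
    }
}

# Fixed inputs exercise each branch in turn.
print random_base(0.10), random_base(0.30), random_base(0.60), random_base(0.90), "\n";

# Without an argument the result is random.
print "random base: ", random_base(), "\n";
```

Because each elsif is only reached when the earlier tests failed, the condition $r < 0.50 effectively means 0.25 <= $r < 0.50.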
Conditional Loops
while ( statement ) { commands ... }
repeats commands until statement is no longer true
do { commands } while ( statement );
same as while, except commands are executed at least once
NOTE the ';' after the while statement!!
Short-circuiting commands: next and last
next; #jumps to end, do next iteration
last; #jumps out of the loop completely
while
Example:
while ($alive) {
    if ($needs_nutrients) {
        print "Cell needs nutrients\n";
    }
}
Any problem?
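One possible answer to the question above: nothing inside the loop ever changes $alive, so once the loop starts it never stops. A minimal sketch of a terminating version follows; the counters $nutrients and $cycles are assumptions added for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $alive     = 1;
my $nutrients = 3;   # illustrative starting supply
my $cycles    = 0;   # how many cycles the cell completed

while ($alive) {
    if ($nutrients <= 0) {
        print "Cell needs nutrients\n";
        $alive = 0;        # the loop condition becomes false
    }
    else {
        $nutrients--;      # consume one unit per cycle
        $cycles++;
    }
}

print "Cell completed $cycles cycles\n";   # 3
```

The key point is that the body of a while loop must eventually make its condition false (or call last), otherwise the loop runs forever.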
for and foreach loops
Execute a code loop a specified number of times, or for a specified list of values
for and foreach are identical: use whichever you want
Incremental loop ("C style"):
for ( $i=0 ; $i < 50 ; $i++ ) {
    $x = $i*$i;
    print "$i squared is $x.\n";
}
Loop over list ("foreach" loop):
foreach $name ( "Billy", "Bob", "Edwina" ) {
    print "$name is my friend.\n";
}
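Both loop styles are useful on sequence data. As a sketch, the script below counts G and C bases with a C-style loop over string positions, then uses a foreach loop over a made-up list of sequences. The variable names and example sequences are illustrative assumptions.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $dna = "ACTGCGTAGC";

# C-style loop: visit each position and count G or C.
my $gc = 0;
for ( my $i = 0 ; $i < length($dna) ; $i++ ) {
    my $base = substr($dna, $i, 1);
    $gc++ if ( $base eq 'G' || $base eq 'C' );
}
my $gc_fraction = $gc / length($dna);
print "GC count: $gc, GC fraction: $gc_fraction\n";   # 6 and 0.6

# foreach loop: process each sequence in a list.
my $total_gc = 0;
foreach my $seq ( "ACGT", "GGCC", "ATAT" ) {
    my $count = ( $seq =~ tr/GC// );   # tr/// returns the number of matches
    print "$seq has $count G/C bases.\n";
    $total_gc += $count;
}
print "Total G/C bases in the list: $total_gc\n";     # 6
```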
Basic Data Types
Perl has three basic data types:
- scalar
- array (list)
- associative array (hash)