BINF 634 Fall 2015 Lect 5 1
BINF634 Lecture 5
Outline
Program 1 Solution
Quiz 2 Solution
Program 2 Discussions
Regular Expressions
Regular Expressions Lab
Time to Work on Program 2
Program 1 Discussions
You must test all of your code on binf.
I am testing your code on binf; I can't possibly know the configuration of the machine your code was developed on.
The path to perl on binf must be the first line in your program: #!/usr/bin/perl
Program 1 Solution
#!/usr/bin/perl
use strict;
use warnings;

# File: cpg.pl
# Author: Jeff Solka
# Date: 01 Aug 2015
#
# Purpose: Read sequences from a FASTA format file
# Programming Assignment #1

# the argument list should contain the file name
die "usage: cpg.pl filename\n" if scalar @ARGV < 1;

# get the filename from the argument list
my ($filename) = @ARGV;

# Open the file given as the first argument on the command line
open(INFILE, '<', $filename) or die "Can't open $filename: $!\n";

# variable declarations:
my @header = ();     # array of headers
my @sequence = ();   # array of sequences
my $count = 0;       # number of sequences

# read FASTA file
my $n = -1;          # index of current sequence
while (my $line = <INFILE>) {
    chomp $line;                  # remove trailing \n from line
    if ($line =~ /^>/) {          # line starts with a ">"
        $n++;                     # this starts a new header
        $header[$n] = $line;      # save header line
        $sequence[$n] = "";       # start a new (empty) sequence
    }
Program 1 Solution (cont.)
    else {
        next if not @header;      # ignore data before first header
        $sequence[$n] .= $line;   # append to end of current sequence
    }
}
$count = $n+1;                    # set count to the number of sequences
close INFILE;

# remove white space from all sequences
for (my $i = 0; $i < $count; $i++) {
    $sequence[$i] =~ s/\s//g;
}

########## Sequence processing starts here:
##### REST OF PROGRAM

my $maxlength = 0;
my $minlength = 1E99;
my $sumlength = 0;
my $avlength = 0;

# process the sequences
for (my $i = 0; $i < $count; $i++) {
    $sumlength += length($sequence[$i]);
    if(length($sequence[$i]) > $maxlength){
        $maxlength = length($sequence[$i]);
    }
    if(length($sequence[$i]) < $minlength){
        $minlength = length($sequence[$i]);
    }
}

# avoid dividing by zero when the file contains no sequences
die "No sequences found in $filename\n" if $count == 0;
$avlength = $sumlength/$count;

# print out statistics
print "Report for file $filename \n";
print "There are $count sequences in the file \n";
print "Total sequence length = $sumlength \n";
print "Maximum sequence length = $maxlength \n";
print "Minimum sequence length = $minlength \n";
print "Ave sequence length = $avlength \n";
Program 1 Solution (cont.)
# print out sequence information
for (my $i = 0; $i < $count; $i++) {
    print "$header[$i]\n";
    print "Length:",length($sequence[$i]),"\n";

    # Notice that we can use scalar variables to hold numbers.
    my $a = 0; my $c = 0; my $g = 0; my $t = 0;
    my $cg = 0;

    # Use a regular expression "trick", and five while loops,
    # to find the counts of the four bases plus the CpG dinucleotide
    while($sequence[$i] =~ /a/ig){$a++}
    while($sequence[$i] =~ /c/ig){$c++}
    while($sequence[$i] =~ /g/ig){$g++}
    while($sequence[$i] =~ /t/ig){$t++}
    while($sequence[$i] =~ /cg/ig){$cg++}

    printf "A:%d %0.2f \n", $a, $a/length($sequence[$i]);
    printf "C:%d %0.2f \n", $c, $c/length($sequence[$i]);
    printf "G:%d %0.2f \n", $g, $g/length($sequence[$i]);
    printf "T:%d %0.2f \n", $t, $t/length($sequence[$i]);
    printf "CpG:%d %0.2f \n", $cg, $cg/length($sequence[$i]);
}
exit;
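The five counting loops in the solution can also be written with tr///, which counts characters without an explicit loop. The following is only a sketch of that alternative, not the assignment's required method; the helper name base_counts and the test sequence are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the lecture): count bases with tr///
# and count the CpG dinucleotide with a global regex match.
sub base_counts {
    my ($seq) = @_;
    my %n;
    $n{A}   = ($seq =~ tr/Aa//);     # tr with an empty replacement only counts
    $n{C}   = ($seq =~ tr/Cc//);
    $n{G}   = ($seq =~ tr/Gg//);
    $n{T}   = ($seq =~ tr/Tt//);
    $n{CpG} = () = $seq =~ /cg/gi;   # list-assignment idiom: number of matches
    return %n;
}

my %n = base_counts("ACGTcgCG");     # made-up test sequence
print "$_: $n{$_}\n" for qw(A C G T CpG);
```

tr/// works on single characters only, so the two-letter CpG count still needs a regular expression.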
Quiz 2 Solution
#!/usr/bin/perl -w
use strict;
use warnings;
#quiz2 Fall 2015
#Jeff Solka
my(@a)=(1..10);
print "array a prior to the function call \n";
print "@a \n";
myfun(\@a);
print "array a after the function call \n";
print "@a \n";
exit;
sub myfun{
    my($i)=@_;
    my $element;
    foreach $element(@$i) {
        $element = $element**2;
    }
}
Quiz 2 Program in Action
Program in action.
[binf:binf634/quizzs/myquizzes] jsolka% ./quiz2.pl
array a prior to the function call
1 2 3 4 5 6 7 8 9 10
array a after the function call
1 4 9 16 25 36 49 64 81 100
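The array changes because myfun receives a reference to @a, and a foreach loop variable is an alias for each element rather than a copy. A minimal sketch contrasting in-place modification with returning a new list follows; the names square_in_place and squared_copy are hypothetical and are not part of the quiz solution.

```perl
use strict;
use warnings;

# Hypothetical helper: squares the caller's array through the reference.
sub square_in_place {
    my ($aref) = @_;
    $_ = $_ ** 2 for @$aref;          # $_ aliases each element of the caller's array
    return;
}

# Hypothetical helper: builds a new list and leaves the original untouched.
sub squared_copy {
    my ($aref) = @_;
    return map { $_ ** 2 } @$aref;
}

my @nums = (1 .. 5);
my @copy = squared_copy(\@nums);
print "copy:                  @copy\n";
print "original after copy:   @nums\n";
square_in_place(\@nums);
print "original after update: @nums\n";
```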
ANY QUESTIONS ON PROGRAM 2?
Program 2 Discussions
Regular Expression Humor
A relevant cartoon
Regular Expressions
Bioinformatics programs often have to look for patterns in strings:
  Find DNA sequences containing only C's and G's
  Look for a sequence that begins with ATG and ends with TAG
Regular expressions are a way of describing a PATTERN:
  "all the words that begin with the letter A"
  "every 10-digit phone number"
We create a regular expression to match the different parts of the pattern we're looking for:
  Ordinary characters match themselves
  Meta-characters are special symbols that match a group of characters; for example, \d matches any digit
Regular Expression (Why?)
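As a rough illustration of the two bioinformatics patterns mentioned above, the sketch below tests a few strings against each one. The helper names and the test sequences are assumptions made up for this example.

```perl
use strict;
use warnings;

# Hypothetical helpers for the two example patterns.
sub only_c_and_g    { my ($s) = @_; return $s =~ /^[CG]+$/    ? 1 : 0; }
sub atg_through_tag { my ($s) = @_; return $s =~ /^ATG.*TAG$/ ? 1 : 0; }

for my $seq (qw(GCCGGC ATGCCCTAG ATGCCC)) {   # made-up sequences
    printf "%-10s only C/G: %d  ATG...TAG: %d\n",
        $seq, only_c_and_g($seq), atg_through_tag($seq);
}
```

The ^ and $ anchors make each pattern describe the whole string; without them the patterns would also accept strings that merely contain a matching stretch.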
Meta Characters (see Camel Book, Ch. 5)
.          match any single character (except newline, unless the /s modifier is used)
[atcg]     match any single a, t, c, or g
[A-Z]      match any character in given range
[^atcg]    match any character NOT in the set
\CHAR      takes away meta meaning of character CHAR; [\.\|\*] matches "." or "|" or "*"
^ or \A    true at start of string
$ or \z    true at end of string ($ also matches just before a final newline)
\b         true at word boundary
\B         true when not at word boundary
\d         match any digit
\D         match any non-digit
\n         match newline character
\t         match tab character
\s         match any white space character
\S         match any non-whitespace character
\w         match any "word" character (alphanumeric plus "_")
\W         match any non-word character
Regular Expression (How?)
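One practical use of a negated character class is checking a sequence for characters that are not bases. The sketch below assumes a hypothetical helper find_non_bases and a made-up input string; it is an illustration, not part of the lecture code.

```perl
use strict;
use warnings;

# Hypothetical helper: return every character that is not A, C, G, or T
# (in either case, because of the /i modifier).
sub find_non_bases {
    my ($seq) = @_;
    my @hits;
    while ($seq =~ /([^ACGT])/gi) {
        push @hits, $1;
    }
    return @hits;
}

my @bad = find_non_bases("ACGTNNACGT-acgt");   # made-up sequence
print "unexpected characters: @bad\n";
```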
Ways to Control Patterns (see Camel Book, Ch. 5)
PATTERN1|PATTERN2   matches either PATTERN1 or PATTERN2
PATTERN*            matches zero or more instances of PATTERN; [A-Z]* = any number of capital letters (including 0)
PATTERN+            matches one or more instances of PATTERN; [A-Z]+ = one or more capital letters
PATTERN{N}          matches exactly N instances of PATTERN; [ATCG]{3} = one codon
PATTERN{MIN,MAX}    matches at least MIN but not more than MAX times; A[C]{2,4}G matches ACCG, ACCCG, or ACCCCG
PATTERN{MIN,}       matches at least MIN times
*?                  matches 0 or more times, minimally
+?                  matches 1 or more times, minimally
{MIN,MAX}?          matches MIN to MAX times, minimally
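To see the counted quantifier and the minimal form in action, here is a small sketch. The test strings and the helper greedy_and_minimal are assumptions chosen for illustration.

```perl
use strict;
use warnings;

# Made-up test strings for the A[C]{2,4}G example.
for my $s (qw(ACG ACCG ACCCG ACCCCG ACCCCCG)) {
    my $result = ($s =~ /^AC{2,4}G$/) ? "match" : "no match";
    print "$s: $result\n";
}

# Hypothetical helper: return what a greedy and a minimal quantifier capture.
sub greedy_and_minimal {
    my ($dna) = @_;
    my ($greedy)  = $dna =~ /(C{2,4})/;    # takes as many C's as allowed
    my ($minimal) = $dna =~ /(C{2,4}?)/;   # stops as soon as MIN is satisfied
    return ($greedy, $minimal);
}

my ($greedy, $minimal) = greedy_and_minimal("ACCCCG");
print "greedy: $greedy  minimal: $minimal\n";
```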
Examples
# match if string $str consists only of white space (0 or more characters)
$str =~ /^\s*$/;
# string $str consists entirely of capital letters (at least one)
$str =~ /^[A-Z]+$/;
# string $str contains a capital letter followed by 0 or more digits
$str =~ /[A-Z]\d*/;
# number $n contains some digits before and after a decimal point
$n =~ /^\d+\.\d+$/;
# string contains A and B separated by any two characters
$s =~ /A..B/;
# string does NOT contain ATG
$s !~ /ATG/;
Regular Expression (Practice)
Examples
# match if string $str contains any sequence of three consecutive A's
$str =~ /AAA/;
$str =~ /A{3}/;
# match if string $str consists of exactly three A's
$str =~ /^AAA$/;
$str =~ /^A{3}$/;
# match if $str contains a codon for Alanine (GCA, GCT, GCC, GCG)
$str =~ /GC./;
# match if $str contains a STOP codon (TAA, TAG, TGA)
$str =~ /TA[AG]|TGA/;
$str =~ /T(AA|AG|GA)/;
$str =~ /T(A[AG]|GA)/;
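The stop-codon patterns above match anywhere in the string, regardless of reading frame. If the frame matters, one possible approach is to step through the sequence three bases at a time; the sketch below assumes reading starts at index 0, and the helper first_inframe_stop and the test sequence are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: return the 0-based index of the first in-frame stop
# codon (reading codons from the start of $dna), or -1 if there is none.
sub first_inframe_stop {
    my ($dna) = @_;
    for (my $i = 0; $i + 3 <= length $dna; $i += 3) {
        my $codon = substr($dna, $i, 3);
        # (?: ) groups without capturing; same alternatives as T(A[AG]|GA)
        return $i if $codon =~ /^T(?:A[AG]|GA)$/;
    }
    return -1;
}

my $dna = "ATGCTGAAATAGCC";   # made-up sequence with an out-of-frame TGA at index 4
print "pattern matches somewhere in the string\n" if $dna =~ /T(A[AG]|GA)/;
print "first in-frame stop codon at index ", first_inframe_stop($dna), "\n";
```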
Examples
# string contains any word containing all capital letters
$str =~ /\b[A-Z]+\b/;
# A followed by any number of C or G's followed by T or A
$str =~ /A[CG]*(T|A)/;
$str =~ /A[CG]{0,}[TA]/;
# TT followed by one or more CA's followed by anything except G
$str =~ /TT(CA)+[^G]/;
# string begins with B and has between 5 and 10 characters
$str =~ /^B.{4,9}$/;
# string consists of a 10 digit phone number: ddd-ddd-dddd
$str =~ /^\d\d\d\-\d\d\d\-\d\d\d\d$/;
$str =~ /^\d{3}\-\d{3}\-\d{4}$/;
Capturing Matches
When we match a string with a regular expression, we may want to find out what matched.
Do this by surrounding the part of interest with ( ).
Then access special variables $1, $2, etc. to get matches:
$str = "Perl is a programming language used for bioinformatics.";
$str =~ /(.*) is.*(b.*)\./;
$first = $1;
$second = $2;
print "$first $second\n"; # prints "Perl bioinformatics"
# or, you can capture the results in a list assignment:
($first, $second) = $str =~ /(.*) is.*(b.*)\./;
print "$first $second\n"; # prints "Perl bioinformatics"
Regular Expression (What Did We Match?)
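The list-assignment form is convenient when a match might fail, because the assignment can be tested directly. As a sketch, the hypothetical helper parse_header below pulls an identifier and a description out of a FASTA header line; the header text is made up.

```perl
use strict;
use warnings;

# Hypothetical helper: split a FASTA header into an identifier and a description.
# Returns an empty list when the line is not a header.
sub parse_header {
    my ($line) = @_;
    return $line =~ /^>(\S+)\s*(.*)$/;
}

my $header = ">seq42 Homo sapiens chromosome 7";   # made-up header
if (my ($id, $desc) = parse_header($header)) {
    print "id: $id\n";
    print "description: $desc\n";
}
else {
    print "not a FASTA header\n";
}
```

Testing the assignment avoids reading $1 and $2 after a failed match, when they may still hold values from an earlier successful match.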
Capturing Matches
When we match a string with a regular expression, we may want to find out what matched.
Do this by surrounding the part of interest with ( ).
Then access special variables $1, $2, etc. to get matches:
$str = "Perl is a programming language used for bioinformatics.";
$str =~ /(P.*l)/;
$word = $1;
print $word; # prints "Perl is a programming l"
$str =~ /(P.*?l)/;
$word = $1;
print $word; # prints "Perl"
$str =~ /\b(u.*?)\b/;
$word = $1;
print $word; # prints "used"
Capturing Matches
If no string is given to the match operators, $_ is assumed
@A = qw / ATGGCT CCCCGGTAT GCAGTGG /;
for (@A) {
($first, $second) = /(.+)GG(.+)/;
print "$first $second\n" if ($first and $second);
}
OUTPUT:
AT CT
CCCC TAT
Q. Why no output for third string?
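One way to explore the question is to run the same pattern with + and with * on the three strings. In the third string GG sits at the very end, so a (.+) after it has nothing left to match. The helper names below are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helpers: the same pattern with + (one or more) and * (zero or more).
sub flanks_plus { my ($s) = @_; return $s =~ /(.+)GG(.+)/; }
sub flanks_star { my ($s) = @_; return $s =~ /(.*)GG(.*)/; }

for my $seq (qw(ATGGCT CCCCGGTAT GCAGTGG)) {
    my @plus = flanks_plus($seq);
    my @star = flanks_star($seq);
    print "$seq with (.+): ", (@plus ? "'$plus[0]' '$plus[1]'" : "no match"), "\n";
    print "$seq with (.*): ", (@star ? "'$star[0]' '$star[1]'" : "no match"), "\n";
}
```

Note that a test such as "if ($first and $second)" also rejects captures that are the empty string or "0", so checking whether the match itself succeeded is usually the safer test.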
#!/usr/bin/perl
use strict;
use warnings;

my $string = "Several rapidly developing RNA interference (RNAi)
methodologies hold the promise to selectively inhibit gene expression in
mammals. RNAi is an innate cellular process activated when a
double-stranded RNA (dsRNA) molecule of greater than 19 duplex
nucleotides enters the cell, causing the degradation of not only the
invading dsRNA molecule, but also single-stranded (ssRNAs) RNAs of
identical sequences, including endogenous mRNAs.";

# find all words containing "RNA"
while ( $string =~ /(\w*RNA\w*)/g ) {
    print "$1\n";
}
exit;

Output:
RNA
RNAi
RNAi
RNA
dsRNA
dsRNA
ssRNAs
RNAs
mRNAs
#!/usr/bin/perl
use strict;
use warnings;

my $string = "Several rapidly developing RNA interference (RNAi)
methodologies hold the promise to selectively inhibit gene expression in
mammals. RNAi is an innate cellular process activated when a
double-stranded RNA (dsRNA) molecule of greater than 19 duplex
nucleotides enters the cell, causing the degradation of not only the
invading dsRNA molecule, but also single-stranded (ssRNAs) RNAs of
identical sequences, including endogenous mRNAs.";

# find words with at least one word character on each side of "RNA"
while ( $string =~ /(\w+RNA\w+)/g ) {
    print "$1\n";
}
exit;

Output:
ssRNAs
mRNAs
#!/usr/bin/perl
use strict;
use warnings;

my $string = "Several rapidly developing RNA interference (RNAi)
methodologies hold the promise to selectively inhibit gene expression in
mammals. RNAi is an innate cellular process activated when a
double-stranded RNA (dsRNA) molecule of greater than 19 duplex
nucleotides enters the cell, causing the degradation of not only the
invading dsRNA molecule, but also single-stranded (ssRNAs) RNAs of
identical sequences, including endogenous mRNAs.";

# find non-whitespace runs with at least one character on each side of "RNA"
while ( $string =~ /(\S+RNA\S+)/g ) {
    print "$1\n";
}
exit;

Output:
(RNAi)
(dsRNA)
(ssRNAs)
mRNAs.
Capturing Matches
When we match a string with a regular expression, several special variables get set automatically:

$string =~ /REGEXP/;
$` = part of string to the left of the match
$& = part of string matched by the regular expression REGEXP
$' = part of string to the right of the match

$string = "ATCGCAT";
$string =~ /T.G/;
print "left part: $` \n";
print "match: $& \n";
print "right part: $' \n";

Output:
left part: A
match: TCG
right part: CAT
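The same three pieces can also be recovered from the match offsets in @- and @+, or with the /p modifier available in Perl 5.10 and later. The perlvar documentation notes that in older Perl versions, mentioning $`, $& or $' anywhere in a program could slow down every regular expression match, which is one reason these alternatives exist. The sketch below is one possible way to use them; the helper split_around_match is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: split a string into the parts before, inside, and
# after the first match, using the offset arrays @- and @+.
sub split_around_match {
    my ($string, $re) = @_;
    return unless $string =~ $re;
    my ($start, $end) = ($-[0], $+[0]);   # start of match, offset just past it
    return (
        substr($string, 0, $start),
        substr($string, $start, $end - $start),
        substr($string, $end),
    );
}

my ($left, $match, $right) = split_around_match("ATCGCAT", qr/T.G/);
print "left part: $left\n";
print "match: $match\n";
print "right part: $right\n";

# The /p modifier offers named equivalents of the three variables.
if ("ATCGCAT" =~ /T.G/p) {
    print "with /p: ", ${^PREMATCH}, " [", ${^MATCH}, "] ", ${^POSTMATCH}, "\n";
}
```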
A Nice Application of Capturing Matches
#!/usr/bin/perl
print ("\nEnter string or cntl-D to quit\n");
print ("Square brackets indicate text that matched pattern\n\n");
$prompt = "test> ";
print $prompt;
while(<STDIN>) {
    chomp;
    if(/REGEXP Goes Here/) {
        print("$`\[$&]$'\n");
    }
    else {
        print("no match\n");
    }
    print $prompt;
}
exit;
Regular Expression (A Regular Expression Tester)
An Even Nicer Implementation of This Idea - I
#!/usr/bin/perl
use strict;
use warnings;
# File: regex_tester.pl
# Author: Jim Logan
#
# Fully interactive version (i.e., no recompiles required) of a regular expression
# tester based on a script by Fernando J. Pineda as presented to
# class of BINF623 by Jeff Solka on 10/1/12.
# Particularly useful in an Eclipse environment using its cut and paste facility.
# instructions for use
print "\nAccepts keyboard entry of a regular expression and then permits\n";
print "successive entry of strings to test that expression.\n";
print "Square brackets in output indicate the text that matched pattern\n\n";
print "Note: Depending upon the environment (e.g. Eclipse), you may be\n";
print "able to cut and paste into both the \"Next expression\" and the\n";
print "\"New test string\" fields and then edit as desired.\n";
Regular Expression (A Nicer Regular Expression Tester)
An Even Nicer Implementation of This Idea - II
# initialization
my $regex = '/^.*$/'; #default regex to start and to demonstrate
my $string = 'This is a test string';
my $input = "";
my $stripped_regex = "";
while (1) { # outer loop to sequence regular expressions
    print "\nCurrent regular expression: $regex\n";
    print "Enter a new expression to change or ENTER to continue without change.\n";
    print "(\"quit\" terminates the program)\n";
    print "New expression: ";
    $input = <STDIN>;
    chomp $input;
    if ($input =~ /^q.*$/i) {exit};
    if ($input !~ /^$/) {
        $regex = $input;
    }
    $stripped_regex = substr ($regex, 1, length ($regex) -2);
An Even Nicer Implementation of This Idea - III
    # User includes the two slashes for a regular expression
    # but they are stripped here so that variable is just the pattern
    # that will be interpolated in /pattern/ context.
    while (1) { # inner loop to sequence strings to test the expression
        print "\nCurrent test string: $string\n";
        print "Enter a new test string to change or ENTER to reset the regex.\n";
        print "New test string: ";
        $input = <STDIN>;
        chomp $input;
        if ($input =~ /^$/) { # for blank line, go back to set expression
            last; }
        else {
            $string = $input; # else run regex over input
        }
An Even Nicer Implementation of This Idea - IV
        if( $string =~ /$stripped_regex/) {
            print("$`\[$&]$'\n"); } # show match in context of input
        else {
            print("no match\n");
        }
    }
}
exit;
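Because the tester interpolates whatever the user types into a pattern, an invalid expression such as an unbalanced parenthesis stops the program with a regex compilation error. One possible safeguard, sketched here under the assumption that the pattern text has already had its slashes stripped, is to compile it with qr// inside eval. The helper compile_pattern is hypothetical and is not part of the script above.

```perl
use strict;
use warnings;

# Hypothetical helper: compile user-supplied pattern text, returning undef
# instead of dying when the text is not a valid regular expression.
sub compile_pattern {
    my ($text) = @_;
    my $re = eval { qr/$text/ };
    return $re;
}

for my $text ('T.G', 'A(CG') {   # the second pattern is deliberately invalid
    my $re = compile_pattern($text);
    if (!defined $re) {
        print "invalid pattern: $text\n";
    }
    elsif ("ATCGCAT" =~ $re) {
        print "$text matched\n";
    }
    else {
        print "$text did not match\n";
    }
}
```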
Finding the position of matches
If we use the global modifier g, then pos($string) returns position after the match:
$string = "ATCGCATGGAA";
$string =~ /T.G/g;
print "$& ends at position ", pos($string)-1, "\n";
$string =~ /T.G/g;
print "$& ends at position ", pos($string)-1, "\n";
Output:
TCG ends at position 3
TGG ends at position 8
Regular Expression (Where Did the Match Occur?)
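pos gives the offset just past the match; the start of the match is available as $-[0]. The sketch below collects both for every match in a loop. The helper match_spans is hypothetical, and positions are 0-based and inclusive, in keeping with the pos($string)-1 convention used above.

```perl
use strict;
use warnings;

# Hypothetical helper: list [match, start, end] for every match of T.G,
# with 0-based inclusive positions.
sub match_spans {
    my ($string) = @_;
    my @spans;
    while ($string =~ /(T.G)/g) {
        push @spans, [ $1, $-[0], pos($string) - 1 ];
    }
    return @spans;
}

for my $span (match_spans("ATCGCATGGAA")) {
    my ($text, $start, $end) = @$span;
    print "$text spans positions $start to $end\n";
}
```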
#!/usr/bin/perl
use strict;
use warnings;

my $string = "Several rapidly developing RNA interference (RNAi)
methodologies hold the promise to selectively inhibit gene expression in
mammals. RNAi is an innate cellular process activated when a
double-stranded RNA (dsRNA) molecule of greater than 19 duplex
nucleotides enters the cell, causing the degradation of not only the
invading dsRNA molecule, but also single-stranded (ssRNAs) RNAs of
identical sequences, including endogenous mRNAs.";

# find non-whitespace runs with at least one character on each side of "RNA"
while ( $string =~ /(\S+RNA\S+)/g ) {
    print "$1 ends at position ", pos($string)-1, "\n";
}
exit;

Output:
(RNAi) ends at position 49
(dsRNA) ends at position 211
(ssRNAs) ends at position 374
mRNAs. ends at position 431
Some Useful URLs
http://docs.python.org/library/re.html
http://www.regular-expressions.info/
http://www.regular-expressions.info/tutorial.html
http://www.troubleshooters.com/codecorn/littperl/perlreg.htm
Additional Reading
Homework
Remember we meet Tuesday 10/13/15 at the usual place and time due to the Columbus Day holiday.
Program 2 is due Tuesday 10/13/15 at 7:00 pm.
Quiz 3 will occur next week.
Remember that on Tuesday October 19, 2015 we will have our in-class midterm exam. It will be open book and notes.
On the Horizon
Regular Expression Lab Counts as a quiz grade
100 possible points
Our Regular Expression Lab