PERL Part 3 1. Subroutines 2. Pattern matching and regular expressions

Post on 20-Dec-2015


PERL

Part 3

1. Subroutines

2. Pattern matching and regular expressions

(1) Subroutines

• Subroutines provide a way for programmers to group a set of statements, set them aside, and turn them into mini-programs within a larger program.

• These mini-programs can be executed several times from different places in the overall program

Working with Subroutines

• You can create a subroutine by placing a group of statements into the following format:

    sub subroutine_name {
        set of statements
    }

• For example, an outputTableRow subroutine:

    sub outputTableRow {
        print '<TR><TD>One</TD><TD>Two</TD></TR>';
    }

• Execute the statements in this subroutine by preceding the name with an ampersand:

    &outputTableRow;

Subroutine Example Program

    #!/usr/bin/perl
    use CGI ':standard';
    print header, start_html( 'Subroutine Example' );
    print 'Here is simple table <TABLE BORDER=1>';
    &outputTableRow;
    &outputTableRow;
    &outputTableRow;
    print '</TABLE>', end_html;

    # Prints one table row; called three times above.
    sub outputTableRow {
        print '<TR><TD>One</TD><TD>Two</TD></TR>';
    }

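Would output the text "Here is simple table" followed by a bordered table with three identical rows, each containing the cells One and Two.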

(2) Pattern matching and regular expressions

• Use Perl pattern matching and regular expressions to filter input data

• Work with files to enable a program to store and retrieve data

Patterns in String Variables

• Many programming problems require matching, changing, or manipulating patterns in string variables.

– An important use is verifying input fields of a form

• This helps provide security against accidental or malicious attacks.

• For example, if expecting a form field to provide a telephone number as input, your program needs a way to verify that the input comprises a string of seven digits.
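A minimal sketch of the seven-digit check described above. The helper name is_seven_digits and the sample inputs are assumptions made for illustration; the \d class and the ^ and $ anchors it relies on are covered later in this part.

```perl
use strict;
use warnings;

# Hypothetical helper: returns 1 when the input is exactly seven digits, else 0.
sub is_seven_digits {
    my ($input) = @_;
    return $input =~ m/^\d\d\d\d\d\d\d$/ ? 1 : 0;
}

foreach my $phone ( '5551234', '555-1234', '55512345' ) {
    my $verdict = is_seven_digits($phone) ? 'valid' : 'invalid';
    print "$phone is $verdict\n";
}
```

Anchoring at both ends is what rejects the eight-digit input; without ^ and $ the pattern would accept any string that merely contains seven digits in a row.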

Four Different Constructs

– The match operator enables your program to look for patterns in strings.

– The substitute operator enables your program to change patterns in strings.

– The split function enables your program to split strings into separate variables based on a pattern.

– Regular expressions provide a pattern matching language that can work with these operators and functions to work on string variables.

The Match Operator

• The match operator is used to test if a pattern appears in a string.

– It is used with the binding operator ("=~") to see whether a variable contains a particular pattern.

    if ( $name =~ m/edu/ ) {
        set of statements to execute
    }

– These statements execute if 'edu' is ANYWHERE in the contents of the string variable $name.

– m/edu/ tries to match the pattern inside the slashes ("/"). In this case the pattern is "edu".

– The binding operator ("=~") indicates to examine the contents of $name.
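As a small illustration of the match operator, the sketch below runs m/edu/ against a few sample strings. The list @names and its values are made up for this example.

```perl
use strict;
use warnings;

# Sample values (assumed for illustration).
my @names = ( 'www.school.edu', 'education today', 'www.shop.com' );

foreach my $name (@names) {
    # True when "edu" appears anywhere in $name.
    if ( $name =~ m/edu/ ) {
        print "'$name' contains edu\n";
    } else {
        print "'$name' does not contain edu\n";
    }
}
```

Note that 'education today' matches too: the operator looks for the three characters e, d, u in sequence, not for a domain suffix.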

Other Delimiters?

• Slash ("/") is the most common match pattern delimiter

– Others are possible. For example, both of the following use valid match operator syntax:

– if ( $name =~ m!Dave! ) {

– if ( $name =~ m<Dave> ) {

• The reverse binding operator ("!~") tests if the pattern is NOT found:

    if ( $color !~ m/blue/ ) {
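The sketch below tries the same pattern with three delimiters and then uses the reverse binding operator. The variable names and values are assumptions for the example.

```perl
use strict;
use warnings;

my $name  = 'Dave Smith';
my $color = 'dark green';

# The same pattern written with three different delimiters.
my $slash   = ( $name =~ m/Dave/ ) ? 1 : 0;
my $bang    = ( $name =~ m!Dave! ) ? 1 : 0;
my $bracket = ( $name =~ m<Dave> ) ? 1 : 0;
print "slash=$slash bang=$bang bracket=$bracket\n";

# !~ is true when the pattern is NOT found.
my $no_blue = ( $color !~ m/blue/ ) ? 1 : 0;
print "'$color' has no blue: $no_blue\n";
```

An alternative delimiter is mostly useful when the pattern itself contains slashes, as in the date patterns later in this part, because the slashes then need no escaping.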

Substitutes

• The substitute operator replaces the first occurrence of the search pattern with the change pattern in the string variable.

• For example, the following changes the first occurrence of t to T:

    $name = "tom turtle";
    $name =~ s/t/T/;
    print "Name=$name";

• The output of this code would be

    Name=Tom turtle

Changing All Occurrences

• You can place a g (for global substitution) at the end of the substitution expression to change all occurrences of the target pattern in the search string. For example,

    $name = "tom turtle";
    $name =~ s/t/T/g;
    print "Name=$name";

• The output of this code would be

    Name=Tom TurTle
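To compare the two forms side by side, here is a short sketch that applies s/t/T/ and s/t/T/g to copies of the same string. The variable names are chosen for this example; the count relies on the substitution operator returning the number of replacements it made.

```perl
use strict;
use warnings;

my $first = 'tom turtle';
my $all   = 'tom turtle';

$first =~ s/t/T/;                   # replaces only the first t
my $count = ( $all =~ s/t/T/g );    # replaces every t and returns how many were replaced

print "first=$first\n";
print "all=$all count=$count\n";
```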

Using Translate

• A similar operator is called tr (for "translate"). It is useful for translating characters from uppercase to lowercase, and vice versa.

– The tr operator allows you to specify a range of characters to translate from and a range of characters to translate to. The ranges are written without square brackets, because tr treats brackets as ordinary characters:

    $name = "smokeY";
    $name =~ tr/a-z/A-Z/;
    print "name=$name";

Would output the following

    name=SMOKEY
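A brief sketch of tr in both directions. It copies the string first so the original stays unchanged; the variable names are assumptions for the example.

```perl
use strict;
use warnings;

my $name = 'smokeY';

# Copy $name, then translate the copy in place.
( my $upper = $name ) =~ tr/a-z/A-Z/;
( my $lower = $name ) =~ tr/A-Z/a-z/;

print "original=$name upper=$upper lower=$lower\n";
```

For plain case conversion, Perl's built-in uc and lc functions do the same job; tr is the more general tool when the two character ranges are not simply upper and lower case.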

The Alternation Operator

• The alternation operator ("|") looks for alternative strings for matching within a pattern.

– (That is, you use it to indicate that the program should match one pattern OR the other.) For example, the pattern /com|edu/ matches any string that contains either com or edu.

Parentheses for Groupings

• You use parentheses within regular expressions to specify groupings. For example, the following matches a $name value of Dave or David.

• Match Statement:

    if ( $name =~ /Dav(e|id)/ ) {
        print "$name came home from school\n";
    }
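The following sketch exercises both alternation and grouping on a few sample strings. The test values 'www.mysite.org' and 'Davis' are assumptions added to show a non-match for each pattern.

```perl
use strict;
use warnings;

# Alternation: matches when the string contains com OR edu.
my @address_results;
foreach my $address ( 'www.mysite.com', 'www.mysite.org', 'Time for education' ) {
    my $hit = ( $address =~ /com|edu/ ) ? 1 : 0;
    push @address_results, $hit;
    print "'$address': ", ( $hit ? 'match' : 'no match' ), "\n";
}

# Grouping: Dav followed by either e or id.
my @name_results;
foreach my $name ( 'Dave', 'David', 'Davis' ) {
    my $hit = ( $name =~ /Dav(e|id)/ ) ? 1 : 0;
    push @name_results, $hit;
    print "'$name': ", ( $hit ? 'match' : 'no match' ), "\n";
}
```

Without the parentheses, /Dave|id/ would mean "Dave OR id", which matches any string containing id; the grouping limits the alternatives to the part after Dav.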

Example Alternation Operator

Match Statement                    Possible Matching String Values for $address

if ( $address =~ /com|edu/ ) {     "www.mysite.com", "Welcome to my site", "Time for education", "www.mysite.edu"

Using Regular Expressions

• Use regular expressions to enable programs to more completely match patterns.

– They actually make up a small language of special matching operators that can be employed to enhance the Perl string pattern matching.

Special Character Classes

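• Perl has a special set of character classes for shorthand pattern matching

• For example, consider these two statements. The first matches only a literal space character; the second uses the \s class to match any whitespace character:

    if ( $name =~ m/ / ) {
    if ( $name =~ m/\s/ ) {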

Special Character Classes

Character Class    Meaning

\s    Matches a single whitespace character (space, tab, newline, return, or formfeed). For example, the following matches "Apple Core", "Alle y", and "Here you go"; it does not match "Alone":
      if ( $name =~ m/e\s/ ) {

\S    Matches any character that is not a space, tab, newline, return, or formfeed. For example, the following matches "ZT", "YT", and ";T":
      if ( $part =~ m/\ST/ ) {

Special Character Classes - II

Character Class    Meaning

\w    Matches any word character (uppercase or lowercase letters, digits, or the underscore character). For example, the following matches "Apple", "Time", "Part time", "time_to_go", " Time", and "1234"; it does not match "#%^&":
      if ( $part =~ m/\w/ ) {

\W    Matches any nonword character (not uppercase or lowercase letters, digits, or the underscore character). For example, the following matches "A*B" and "A{B", but not "A**B", "AB*", "AB101", or "1234":
      if ( $part =~ m/A\WB/ ) {

Special Character Classes - III

Character Class    Meaning

\d    Matches any valid numerical digit (that is, any number 0–9). For example, the following matches "B12abc", "The B1 product is late", "I won bingo with a B9", and "Product B00121"; it does not match "B 0", "Product BX 111", or "Be late 1":
      if ( $part =~ m/B\d/ ) {

\D    Matches any non-numerical character (that is, any character not a digit 0–9). For example, the following matches "AB1234", "Product number 1111", "Number VG928321212", "The number_A1234", and "Product 1212"; it does not match "1212" or "PR12":
      if ( $part =~ m/\D\D\d\d\d\d/ ) {
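The tables above can be checked with a small test loop. This is only a sketch: the @checks list pairs patterns with strings taken from the tables, and it uses the qr// operator (not covered in these slides) to store each pattern in a variable.

```perl
use strict;
use warnings;

# Each entry is [ pattern, test string ]; the strings come from the tables above.
my @checks = (
    [ qr/e\s/,          'Apple Core' ],
    [ qr/e\s/,          'Alone' ],
    [ qr/A\WB/,         'A*B' ],
    [ qr/A\WB/,         'A**B' ],
    [ qr/B\d/,          'Product B00121' ],
    [ qr/B\d/,          'B 0' ],
    [ qr/\D\D\d\d\d\d/, 'AB1234' ],
    [ qr/\D\D\d\d\d\d/, 'PR12' ],
);

my @results;
foreach my $check (@checks) {
    my ( $pattern, $text ) = @$check;
    my $hit = ( $text =~ $pattern ) ? 1 : 0;
    push @results, $hit;
    print "'$text': ", ( $hit ? 'match' : 'no match' ), "\n";
}
```

Each pattern is listed twice, once with a string the table says should match and once with a string it says should not.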

Setting Specific Patterns w/ Quantifiers

• Character quantifiers let you look for very specific patterns (^ and $ are usually called anchors, because they match a position in the string rather than a repeated character)

• For example, use the dollar sign ("$") to match if a string ends with a specified pattern.

    if ( $Name =~ /Jones$/ ) {

• Matches "John Jones" and "The guilty party is Jones", but not "Jones is here".

Selected Perl Character Quantifiers

Character Quantifier    Meaning

^    Matches when the pattern that follows starts the string. For example, the following matches "Smith is OK", "Smithsonian", and "Smith, Black":
     if ( $name =~ m/^Smith/ ) {

$    Matches when the preceding pattern ends the string. For example, the following matches "the end", "Tend", and "Time to Bend":
     if ( $part =~ m/end$/ ) {

Selected Perl Character Quantifiers

Character Quantifier    Meaning

+    Matches one or more occurrences of the preceding character. For example, the following matches "AB101", "ABB101", and "ABBB101 is the right part":
     if ( $part =~ m/^AB+101/ ) {

*    Matches zero or more occurrences of the preceding character. For example, the following matches "AB101", "ABB101", "A101", and "A101 is broke":
     if ( $part =~ m/^AB*101/ ) {
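To see how + and * differ, and what the ^ and $ anchors add, the sketch below runs the patterns from the tables against a few strings. The value 'XAB101' is an assumption added to show the effect of ^.

```perl
use strict;
use warnings;

my @parts = ( 'AB101', 'ABBB101 is the right part', 'A101', 'XAB101' );

my ( @plus, @star );
foreach my $part (@parts) {
    my $p = ( $part =~ m/^AB+101/ ) ? 1 : 0;    # A, one or more B, then 101
    my $s = ( $part =~ m/^AB*101/ ) ? 1 : 0;    # A, zero or more B, then 101
    push @plus, $p;
    push @star, $s;
    print "'$part': plus=$p star=$s\n";
}

# $ ties the pattern to the end of the string.
my @ends;
foreach my $name ( 'John Jones', 'Jones is here' ) {
    my $e = ( $name =~ /Jones$/ ) ? 1 : 0;
    push @ends, $e;
    print "'$name': ends with Jones=$e\n";
}
```

'A101' is the case that separates the two quantifiers: it has no B at all, so only the * version accepts it.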

Building Regular Expressions that Work

1. Determine the precise field rules.

2. Get form and form-handling programs working

3. Start with the most specific term possible.

4. Anchor and refine. (Use ^ and $ when possible)

– if ( $date =~ m{^\d\d/\d\d/\d\d\d\d$} ) {

– ^\d\d starts with 2 digits; /\d\d/ has 2 digits in the middle; \d\d\d\d$ ends with 4 digits
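A minimal sketch of step 4, wrapping the anchored date pattern in a helper. The name looks_like_date and the sample inputs are assumptions for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: 1 when the input has the shape dd/dd/dddd, else 0.
sub looks_like_date {
    my ($date) = @_;
    return $date =~ m{^\d\d/\d\d/\d\d\d\d$} ? 1 : 0;
}

foreach my $date ( '12/25/2015', '1/5/2015', 'on 12/25/2015', '12/25/20155' ) {
    my $verdict = looks_like_date($date) ? 'accepted' : 'rejected';
    print "'$date': $verdict\n";
}
```

The pattern only checks the shape of the input, so a value such as 99/99/9999 would still be accepted; refining the digit ranges is what the "refine" half of step 4 is for.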

Regular Expression Special Variables

• Perl regexes set several special scalar variables:

– $& will be equal to the first matching text,

– $` will be the text before the match, and

– $' will be the text after the first match.

    $name = '*****Marty';
    if ( $name =~ m/\w/ ) {
        print "match at=$& ";
        print "B4=$` after=$'";
    } else {
        print "Not match";
    }

• Output: match at=M B4=***** after=arty
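As a further illustration, this sketch copies the three special variables into named variables right after a match. The sample string and variable names are assumptions; the pattern B\d+ combines the \d class with the + quantifier.

```perl
use strict;
use warnings;

my $part = 'Product B00121 shipped';
my ( $before, $match, $after ) = ( '', '', '' );

if ( $part =~ m/B\d+/ ) {
    # Copy the special variables right away; a later successful match overwrites them.
    ( $before, $match, $after ) = ( $`, $&, $' );
    print "before=[$before] match=[$match] after=[$after]\n";
}
```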

Drivedate4.pl Example Program

    #!/usr/bin/perl
    use CGI ':standard';
    print header, start_html('Date Check');
    $date = param('udate');
    # Two digits, a day starting with 0-3, and a four-digit year starting with 2.
    if ( $date =~ m{^\d\d/[0-3]\d/2\d\d\d$} ) {
        print 'Valid date=', $date;
    } else {
        print 'Invalid date=', $date;
    }
    print end_html;


A Pattern Matching Example

    #!/usr/bin/perl
    use CGI ':standard';
    print header, start_html('Command Search');
    @PartNums = ( 'XX1234', 'XX1892', 'XX9510' );
    $com   = param('command');
    $prod  = param('uprod');
    $found = 0;    # becomes 1 once the product number is found
    if ( $com eq "ORDER" || $com eq "RETURN" ) {
        $prod =~ s/xx/XX/g;    # switch xx to XX
        if ( $prod =~ /XX/ ) {
            # Compare the product number against each known part number.
            foreach $item (@PartNums) {
                if ( $item eq $prod ) {
                    print "VALIDATED command=$com prodnum=$prod";
                    $found = 1;
                }
            }
            if ( $found != 1 ) {
                print br, "Sorry Prod Num=$prod NOT FOUND";
            }
        } else {
            print br, "Sorry that prod num prodnum=$prod looks wrong";
        }
    } else {
        print br, "Invalid command=$com did not receive ORDER or RETURN";
    }
    print end_html;


The Split Function

• split() breaks a string into different pieces based on a field separator. It takes 2 arguments:

– a pattern to match (which can contain regular expressions)

– and a string variable to split (the string is cut at each place the pattern matches)

    @output = split( /\s+/, $names );

– @output is a list variable that will contain the resulting pieces.

– /\s+/ is the regular expression to match.

– $names is a string variable.

split() Example

    $line = "Please , pass thepepper";
    @result = split( /\s+/, $line );

• Sets list variable @result with the following:

    $result[0] = "Please";
    $result[1] = ",";
    $result[2] = "pass";
    $result[3] = "thepepper";

• /\s+/ matches 1 or more spaces, $line is the variable to split, and the results go into a list.

Another split() Example

• Another split() example:

    $line = "Baseball, hot dogs, apple pie";
    @newline = split( /,/, $line );
    print "newline= @newline";

• These lines will have the following output (the second and third pieces keep the space that followed each comma, so two spaces appear between items):

– newline= Baseball  hot dogs  apple pie

The Split Function

• When you know how many matches to expect:

    $line = "AA1234:Hammer:122:12";
    ($partno, $name, $id, $cost) = split( /:/, $line );
    print "Part#: $partno; Name: $name; ID: $id; Cost: $cost";

• Would output the following:

    Part#: AA1234; Name: Hammer; ID: 122; Cost: 12
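The three uses of split() above can be tried together in one short script. This is a sketch with sample strings based on the slides; the /,\s*/ pattern is a variation (an assumption, not from the slides) that also swallows the spaces after each comma.

```perl
use strict;
use warnings;

# Split on one or more whitespace characters.
my @words = split( /\s+/, 'Please ,  pass   thepepper' );
print scalar(@words), " words: @words\n";

# Split on a comma plus any spaces after it, so no piece starts with a space.
my @foods = split( /,\s*/, 'Baseball, hot dogs, apple pie' );
print "second food=$foods[1]\n";

# Assign the pieces directly to named variables.
my ( $partno, $name, $id, $cost ) = split( /:/, 'AA1234:Hammer:122:12' );
print "Part#: $partno; Name: $name; ID: $id; Cost: $cost\n";
```

Assigning to a list of named variables works well when the number of fields is fixed; when it can vary, splitting into an array and checking its length first is the safer choice.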

Summary

– Perl supports a set of operators and functions that are useful for working with string variables and verifying input data.

• The match operator, the substitute operator, the translate operator, the split function.

– Perl uses regular expressions to enable a program to look for specific characters (such as numbers, words, or letters) in specific places in any string.

• You can use them to verify form input, thereby providing a first line of defense against accidental or malicious input.

The End