short perl tutorial instructor: paul t arau , based on rada mihalcea’s original slides

34
Short Perl tutorial Instructor: Paul Tarau, based on Rada Mihalcea’s original slides Note: some of the material in this slide set was adapted from a Perl course taught at University of Antwerp

Upload: della

Post on 23-Feb-2016

60 views

Category:

Documents


0 download

DESCRIPTION

Short Perl tutorial Instructor: Paul T arau , based on Rada Mihalcea’s original slides Note: some of the material in this slide set was adapted from a Perl course taught at University of Antwerp. 1987 Larry Wall Develops PERL 1989 - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: Short Perl tutorial Instructor:  Paul  T arau , based on  Rada Mihalcea’s  original slides

Short Perl tutorialInstructor: Paul Tarau, based on Rada Mihalcea’s original slides

Note: some of the material in this slide set was adapted from a Perl course taught at University of Antwerp

Page 2: Short Perl tutorial Instructor:  Paul  T arau , based on  Rada Mihalcea’s  original slides

Slide 2

About Perl

1987Larry Wall Develops PERL

1989October 18 Perl 3.0 is

released under the GNU Protection License

1991March 21 Perl 4.0 is

released under the GPL and the new Perl Artistic License

Now Perl 5.14

PERL is not officially a Programming Language per se.

Wall’s original intent was to develop a scripting language more powerful than Unix Shell Scripting, but not as tedious as C.

PERL is an interpreted language. That means that there is no explicitly separate compilation step.

Rather, the processor reads the whole file, converts it to an internal form and executes it immediately.

P.E.R.L. = Practical Extraction and Report Language

Page 3: Short Perl tutorial Instructor:  Paul  T arau , based on  Rada Mihalcea’s  original slides

Slide 3

Variables

A variable is a name of a place where some information is stored. For example:

$yearOfBirth = 1976; $currentYear = 2011; $age = $currentYear-$yearOfBirth; print $age;

The variables in the example program can be identified as such because their names start with a dollar ($). Perl uses different prefix characters for structure names in programs. Here is an overview:

• $: variable containing scalar values such as a number or a string • @: variable containing a list with numeric keys • %: variable containing a list with strings as keys • &: subroutine • *: matches all structures with the associated name

Page 4: Short Perl tutorial Instructor:  Paul  T arau , based on  Rada Mihalcea’s  original slides

Slide 4

Operations on numbers

Perl contains the following arithmetic operators: • +: sum • -: subtraction • *: product • /: division • %: modulo division • **: exponent

Apart from these operators, Perl contains some built-in arithmetic functions. Some of these are mentioned in the following list:

• abs($x): absolute value • int($x): integer part • rand(): random number between 0 and 1 • sqrt($x): square root

Page 5: Short Perl tutorial Instructor:  Paul  T arau , based on  Rada Mihalcea’s  original slides

Slide 5

Input and output

# age calculator print "Please enter your birth year "; $yearOfBirth = <>; chomp($yearOfBirth); print "Your age is ",2011-$yearOfBirth,".\n";

# count the number of lines in a fileopen INPUTFILE, “<$myfile”;(-r INPUTFILE) || die “Could not open the file $myfile\n”;$count = 0;while($line = <INPUTFILE>) { $count++;}print “$count lines in file $myfile\n”;

# open for writingopen OUTPUTFILE, “>$myfile”;

Page 6: Short Perl tutorial Instructor:  Paul  T arau , based on  Rada Mihalcea’s  original slides

Slide 6

Conditional structures

# determine whether number is odd or even print "Enter number: "; $number = <>; chomp($number); if ($number-2*int($number/2) == 0) { print "$number is even\n"; } elsif (abs($number-2*int($number/2)) == 1) { print "$number is odd\n"; } else { print "Something strange has happened!\n"; }

Page 7: Short Perl tutorial Instructor:  Paul  T arau , based on  Rada Mihalcea’s  original slides

Slide 7

Numeric test operators

An overview of the numeric test operators: • ==: equal • !=: not equal • <: less than • <=: less than or equal to • >: greater than • >=: greater than or equal to

All these operators can be used for comparing two numeric values in an if condition.

Truth expressionsthree logical operators: • and: and (alternative: &&) • or: or (alternative: ||) • not: not (alternative: !)

Page 8: Short Perl tutorial Instructor:  Paul  T arau , based on  Rada Mihalcea’s  original slides

Slide 8

Iterative structures

#print numbers 1-10 in three different ways $i = 1; while ($i<=10) { print "$i\n"; $i++; } for ($i=1;$i<=10;$i++) { print "$i\n"; } foreach $i (1,2,3,4,5,6,7,8,9,10) { print "$i\n"; }

Stop a loop, or force continuation:last; # C breaknext; # C continue;

Exercise: Read ten numbers and print the largest, the smallest and a count representing how many of them are dividable by three. if (not(defined($largest)) or $number > $largest) { $largest = $number; }if ($number-3*int($number/3) == 0) { $count3++; }

Page 9: Short Perl tutorial Instructor:  Paul  T arau , based on  Rada Mihalcea’s  original slides

Slide 9

A paranthesisPERL philosophy(ies)

There is more than one way to do it

If you want to shoot yourself in the foot, who am I to stop you?

And a comment: DO write comments in your Perl programs!

Page 10: Short Perl tutorial Instructor:  Paul  T arau , based on  Rada Mihalcea’s  original slides

Slide 10

Basic string operations

- strings are stored in the same type of variables we used for storing numbers

- string values can be specified between double and single quotes- !!! in the first one variables will be evaluated, in the second one

they will not.

Comparison operators for strings - eq: equal - ne: not equal - lt: less than - le: less than or equal to - gt: greater than - ge: greater than or equal to

Examples:

if ($a eq $b) { ….}

Page 11: Short Perl tutorial Instructor:  Paul  T arau , based on  Rada Mihalcea’s  original slides

Slide 11

String substitution and string matchingThe power of Perl!

The s/// operator modifies sequences of characters (substitute)The tr/// operator changes individual characters. (translate)The m// operator checks for matching (or in short //) (match) - the first part between the first two slashes contains a search

pattern - the second part between the final two slashes contains the

replacement.- behind the final slash we can put characters to modify the

behavior of the commands.

By default s/// only replaces the first occurrence of the search pattern

- append a g to the operator to replace every occurrence. - append an i to the operator, to have the search case insensitive tr translates the characters in the first set of characters into the

characters of the second set- if the second set is shorter, the last character is multiplied- if the second set is longer, the exceeding characters are truncated

The tr/// operator allows the modification characters - c (replace the complement of the search class)- d (delete characters of the search class that are not replaced) - s (squeeze sequences of identical replaced characters to one

character)

Page 12: Short Perl tutorial Instructor:  Paul  T arau , based on  Rada Mihalcea’s  original slides

Slide 12

Examples

# replace first occurrence of "bug" $text =~ s/bug/feature/; # replace all occurrences of "bug" $text =~ s/bug/feature/g; # convert to lower case $text =~ tr/[A-Z]/[a-z]/; # delete vowels $text =~ tr/AEIOUaeiou//d; # replace nonnumber sequences with x $text =~ tr/[0-9]/x/cs; # replace all capital characters by CAPS $text =~ s/[A-Z]/CAPS/g; Simple example:Print all lines from a file that include a given sequence of characters[emulate grep behavior]

Page 13: Short Perl tutorial Instructor:  Paul  T arau , based on  Rada Mihalcea’s  original slides

Slide 13

Regular expressions

• \b: word boundaries • \d: digits • \n: newline • \r: carriage return • \s: white space characters • \t: tab • \w: alphanumeric characters • ^: beginning of string • $: end of string • .: any character • [bdkp]: characters b, d, k and p • [a-f]: characters a to f • [^a-f]: all characters except a to f • abc|def: string abc or string def • [:alpha:],[:punct:],[:digit:], … - use inside character class e.g.,

[[:alpha:]]

• *: zero or more times • +: one or more times • ?: zero or one time • {p,q}: at least p times and at most q times • {p,}: at least p times • {p}: exactly p times

Examples:1. Clean an HTML formatted text

2. Grab URLs from a Web page

3. Transform all lines from a file into lower case

Page 14: Short Perl tutorial Instructor:  Paul  T arau , based on  Rada Mihalcea’s  original slides

Slide 14

Lists and arrays

@a = (); # empty list @b = (1,2,3); # three numbers @c = ("Jan","Piet","Marie"); # three strings @d = ("Dirk",1.92,46,"20-03-1977"); # a mixed list Variables and sublists are interpolated in a list@b = ($a,$a+1,$a+2); # variable interpolation @c = ("Jan",("Piet","Marie")); # list interpolation @d = ("Dirk",1.92,46,(),"20-03-1977"); # empty list interpolation @e = ( @b, @c ); # same as (1,2,3,"Jan","Piet","Marie") Practical construction operators• ($x..$y) @x = (1..6) # same as (1, 2, 3, 4, 5, 6) @y = (1.2..4.2) # same as (1.2, 2.2, 3.2, 4.2, 5.2) @z = (2..5,8,11..13) # same as (2,3,4,5,8,11,12,13) • qw() ("quote word") function qw(Jan Piet Marie) is a shorter notation for ("Jan","Piet","Marie"). • split function

Page 15: Short Perl tutorial Instructor:  Paul  T arau , based on  Rada Mihalcea’s  original slides

Slide 15

Split function

$string = "Jan Piet\nMarie \tDirk"; @list = split /\s+/, $string; # yields ( "Jan","Piet","Marie","Dirk" ) $string = " Jan Piet\nMarie \tDirk\n"; # watch out, empty string at the

begin and end!!! @list = split /\s+/, $string; # yields ( "", "Jan","Piet","Marie","Dirk", "" ) $string = "Jan:Piet;Marie---Dirk"; # use any regular expression... @list = split /[:;]|---/, $string; # yields ( "Jan","Piet","Marie","Dirk" ) $string = "Jan Piet"; # use an empty regular expression to split on

letters @letters= split //, $string; # yields ( "J","a","n"," ","P","i","e","t")

Example:

1. Tokenize a text: separate simple punctuation (, . ; ! ? ( ) )

2. Add all the digits in a number

Page 16: Short Perl tutorial Instructor:  Paul  T arau , based on  Rada Mihalcea’s  original slides

Slide 16

More about arrays

@array = ("an","bert","cindy","dirk"); $length = @array; # $length now has the value 4 @array = ("an","bert","cindy","dirk"); $length = @array; print $length; # prints 4 print $#array; # prints 3 print $array[$#array] # prints "dirk" print scalar(@array) # prints 4

($a, $b) = ("one","two"); ($onething, @manythings) = (1,2,3,4,5,6) # now $onething equals

1 # and # @manythings = (2,3,4,5,6) ($array[0],$array[1]) = ($array[1],$array[0]); # swap the first two

Pay attention to the fact that assignment to a variable first evaluates the right hand-side of the expression, and then makes a copy of the result

@array = ("an","bert","cindy","dirk"); @copyarray = @array; # makes a copy $copyarray[2] = "XXXXX";

Page 17: Short Perl tutorial Instructor:  Paul  T arau , based on  Rada Mihalcea’s  original slides

Slide 17

Manipulating lists and their elements• push ARRAY LIST appends the list to the end of the array. if the second argument is a scalar rather than a list, it appends it

as the last item of the array. @array = ("an","bert","cindy","dirk"); @brray = ("evelien","frank"); push @array, @brray; # @array is

("an","bert","cindy","dirk","evelien","frank") push @brray, "gerben"; # @brray is ("evelien","frank","gerben")

• pop ARRAY does the opposite of push. it removes the last item of its argument list and returns it. if the list is empty it returns undef.

@array = ("an","bert","cindy","dirk"); $item = pop @array; # $item is "dirk" and @array is

( "an","bert","cindy") • shift ARRAY works on the left end of the list, but is otherwise the

same as pop. • unshift ARRAY LIST puts stuff on the left side of the list, just as

push does for the right side.

Page 18: Short Perl tutorial Instructor:  Paul  T arau , based on  Rada Mihalcea’s  original slides

Slide 18

Working with lists

Convert lists to strings@array = ("an","bert","cindy","dirk"); print "The array contains $array[0] $array[1] $array[2] $array[3]"; # interpolateprint "The array contains @array"; function join STRING LIST. $string = join ":", @array; # $string now has the value

"an:bert:cindy:dirk" $string = join "+", "", @array; # $string now has the value

"+an+bert+cindy+dirk" Iteration over listsfor( $i=0 ; $i<=$#array; $i++){ $item = $array[$i]; $item =~ tr/a-z/A-Z/; print "$item "; } foreach $item (@array){ $item =~ tr/a-z/A-Z/; print "$item "; # prints a capitalized version of each item }

Page 19: Short Perl tutorial Instructor:  Paul  T arau , based on  Rada Mihalcea’s  original slides

Slide 19

Grep and map

• grep CONDITION LIST returns a list of all items from list that satisfy some condition.

For example: @large = grep $_ > 10, (1,2,4,8,16,25); # returns (16,25) @i_names = grep /i/, @array; # returns ("cindy","dirk") Example: Print all lines from a file that include a given sequence of characters[emulate grep behavior]

• map OPERATION LIST is an extension of grep, and performs an arbitrary operation on

each element of a list.

For example: @more = map $_ + 3, (1,2,4,8,16,25); # returns (4,5,7,11,19,28) @initials = map substr($_,0,1), @array; # returns ("a","b","c","d")

Page 20: Short Perl tutorial Instructor:  Paul  T arau , based on  Rada Mihalcea’s  original slides

Slide 20

Hashes (Associative Arrays)

- associate keys with values- allows for almost instantaneous lookup of a value that is

associated with some particular key

Existing, Defined and true. - If the value for a key does not exist in the hash, the access to

it returns the undef value. - special test function exists(HASHENTRY), which returns true

if the hash key exists in the hash- if($hash{$key}){...}, or if(defined($hash{$key})){...} return

false if the key $key has no associated value

Page 21: Short Perl tutorial Instructor:  Paul  T arau , based on  Rada Mihalcea’s  original slides

Slide 21

Hashes (cont’d)

Examples$wordfrequency{"the"} = 12731; # creates key "the", value 12731$phonenumber{"An De Wilde"} = "+31-20-6777871"; $index{$word} = $nwords; $occurrences{$a}++; # if this is the first reference, # the value associated with $a will # be increased from 0 to 1

%birthdays = ("An","25-02-1975","Bert","12-10-1953","Cindy","23-05-1969","Dirk","01-04-1961"); # fill the hash

%birthdays = (An => "25-02-1975", Bert => "12-10-1953", Cindy => "23-05-1969", Dirk => "01-04-1961" ); # fill the hash; the same as above, but more explicit

@list = %birthdays; # make a list of the key/value pairs %copy_of_bdays = %birthdays; # copy a hash

Page 22: Short Perl tutorial Instructor:  Paul  T arau , based on  Rada Mihalcea’s  original slides

Slide 22

Operations on Hashes

- keys HASH returns a list with only the keys in the hash. As with any list, using it in a scalar context returns the number of keys in that list.

- values HASH returns a list with only the values in the hash, in the same order as the keys returned by keys.

foreach $key (sort keys %hash ){ push @sortedlist, ($key , $hash{$key} ); print "Key $key has value $hash{$key}\n"; }

Page 23: Short Perl tutorial Instructor:  Paul  T arau , based on  Rada Mihalcea’s  original slides

Slide 23

Operations on Hashes

reverse the direction of the mapping, i.e. construct a hash with keys and values swapped:

%backwards = reverse %forward; (if %forward has two identical values associated with different keys,

those will end up as only a single element in %backwards)- hash slice@birthdays{"An","Bert","Cindy","Dirk"} = ("25-02-1975","12-10-

1953","23-05-1969","01-04-1961"); - each( HASH ) – traverse a hash while (($name,$date) = each(%birthdays)) { print "$name's birthday is $date\n"; } # alternative: foreach $key (keys %birthdays)…

Page 24: Short Perl tutorial Instructor:  Paul  T arau , based on  Rada Mihalcea’s  original slides

Slide 24

Multidimensional data structures

- Perl does not really have multi-dimensional data structures, but a nice way of emulating them, using references

$matrix[$i][$j] = $x; $lexicon1{"word"}[1] = $partofspeech; $lexicon2{"word"}{"noun"} = $frequency; Array of arrays@matrix = ( # an array of references to anonymous

arrays [1, 2, 3], [4, 5, 6], [7, 8, 9] );

Page 25: Short Perl tutorial Instructor:  Paul  T arau , based on  Rada Mihalcea’s  original slides

Slide 25

Multidimensional structures

Hash of arrays%lexicon1 = ( # a hash from strings to

anonymous arrays the => [ "Det", 12731 ], man => [ "Noun", 658 ], with => [ "Prep", 3482 ] ); Hash of hashes%lexicon2 = ( # a hash from strings to anonymous

hashes of strings to numbers the => { Det => 12731 }, man => { Noun => 658 , Verb => 12 }, with => { Prep => 3482 } );

Page 26: Short Perl tutorial Instructor:  Paul  T arau , based on  Rada Mihalcea’s  original slides

Slide 26

Programming Example

A program that reads lines of text, gives a unique index number to each word and counts the word frequencies

#!/usr/local/bin/perl # read all lines in the input $nwords = 0; while(defined($line = <>)){ # cut off leading and trailing whitespace $line =~ s/^\s*//; $line =~ s/\s*$//; # and put the words in an array @words = split /\s+/, $line; if(!@words){ # there are no words? next; } # process each word... while($word = pop @words){ # if it's unknown assign a new index if(!exists($index{$word})){ $index{$word} = $nwords++; } # always update the frequency $frequency{$word}++; } } # now we print the words sorted foreach $word ( sort keys %index ){ print "$word has frequency $frequency{$word} and index $index{$word}\n"; }

Page 27: Short Perl tutorial Instructor:  Paul  T arau , based on  Rada Mihalcea’s  original slides

Slide 27

A note on sorting

If we would like to have the words sorted by their frequency instead of by alphabet, we need a construct that imposes a different sort order.

sort function can use any sort order that is provided as an expression.

- the usual alphabetical sort order: sort { $a cmp $b } @list; !! $a and $b are placeholders for the two items from the list that are

to be compared. Do not attempt to replace them with other variable names. Using $x and $y instead will not provide the same effect

- a numerical sort order sort { $a <=> $b } @list; - for a reverse sort, change the order of the arguments: sort { $b <=> $a } @list; - sort the keys of a hash by their value instead of by their own

identity, substitute the values for the arguments of sort: sort { $hash{$b} <=> $hash{$a} } ( keys %hash )

Page 28: Short Perl tutorial Instructor:  Paul  T arau , based on  Rada Mihalcea’s  original slides

Slide 28

Basics about Subroutines

Calls to subroutines can be recognized because subroutine names often start with the special character &.

sub askForInput { print "Please enter something: "; } # function call &askForInput(); Tip: put related subroutines in a file (usually with the extention

.pm = perl module) and include the file with the command require:

# files with subroutines are stored here use lib "C:\PERL\MYLIBS"; # we will use this file require "nlp";

Page 29: Short Perl tutorial Instructor:  Paul  T arau , based on  Rada Mihalcea’s  original slides

Slide 29

Variables Scope

A variable $a is used both in the subroutine and in the main part program of the program.

$a = 0; print "$a\n"; sub changeA { $a = 1; } print "$a\n"; &changeA(); print "$a\n"; The value of $a is printed three times. Can you guess what values are

printed?- $a is a global variable.

Page 30: Short Perl tutorial Instructor:  Paul  T arau , based on  Rada Mihalcea’s  original slides

Slide 30

Variables Scope

Hide variables from the rest of the program using “my”.

my $a = 0; print "$a\n"; sub changeA { my $a = 1; } print "$a\n"; &changeA(); print "$a\n"; What values are printed now?

Page 31: Short Perl tutorial Instructor:  Paul  T arau , based on  Rada Mihalcea’s  original slides

Slide 31

Communication between subroutines and programs

Provide the arguments of the subroutine call:

&doSomething(2,"a",$abc). - Perl converts all arguments to a flat list. This means that

&doSomething((2,"a"),$abc) will result in the same list of arguments as the earlier example.

Access the argument values inside the procedure with the special list @_.

E.g. my($number, $letter, $string) = @_; # reads the parameters from @_

- A tricky problem is passing two or more lists as arguments of a subroutine. &sub(@a,@b) the subroutine receives the two list as one big one and it will be unable to determine where the first ends and where the second starts.

- pass the lists as reference arguments: &sub(\@a,\@b).

Page 32: Short Perl tutorial Instructor:  Paul  T arau , based on  Rada Mihalcea’s  original slides

Slide 32

- Subroutines also use a list as output. # the return statement from a subroutine return(1,2); # or simply (1,2)

# read the return values from the subroutine ($a,$b) = &subr().

- Read the main program arguments using $ARGC and @ARGV (same as in C)

Page 33: Short Perl tutorial Instructor:  Paul  T arau , based on  Rada Mihalcea’s  original slides

Slide 33

More about file management

• open(INFILE,"myfile"): reading • open(OUTFILE,">myfile"): writing • open(OUTFILE,">>myfile"): appending • open(INFILE,"someprogram |"): reading from program • open(OUTFILE,"| someprogram"): writing to program • opendir(DIR,"mydirectory"): open directo

Operations on an open file handle• $a = <INFILE>: read a line from INFILE into $a • @a = <INFILE>: read all lines from INFILE into @a • $a = readdir(DIR): read a filename from DIR into $a • @a = readdir(DIR): read all filenames from DIR into @a • read(INFILE,$a,$length): read $length characters from INFILE

into $a • print OUTFILE "text": write some text in OUTFILE

Close files / directories• close(FILE): close a file • closedir(DIR): close a directory

Page 34: Short Perl tutorial Instructor:  Paul  T arau , based on  Rada Mihalcea’s  original slides

Slide 34

Other file management commands

• binmode(HANDLE): change file mode from text to binary • unlink("myfile"): delete file myfile • rename("file1","file2"): change name of file file1 to file2 • mkdir("mydir"): create directory mydir • rmdir("mydir"): delete directory mydir • chdir("mydir"): change the current directory to mydir • system("command"): execute command command • die("message"): exit program with message message • warn("message"): warn user about problem message

Exampleopen(INFILE,"myfile") or die("cannot open myfile!");OtherAbout $_Holds the content of the current variable Examples:while(<INFILE>) # $_ contains the current line readforeach (@array) # $_ contains the current element in @array