
Extensible Networking Platform - CSE 330 – Creative Programming and Rapid Prototyping

Module 4 – Python and Regular Expressions

• Module 4 contains only an individual assignment

• Due Monday July 6th

• Do not wait until the last minute to start on this module

• Read the wiki, along with a few Python tutorials, before starting

• Portions of today’s slides came from
  – Marc Conrad, University of Luton
  – Paul Prescod, Vancouver Python Users’ Group
  – James Casey, Opscode
  – Tim Finin, University of Maryland


What is Python?

• Python is an easy to learn, powerful programming language
  – Efficient high-level data structures
  – Simple approach to object-oriented programming
  – Elegant syntax and dynamic typing
  – Up-and-coming language in the open source world

• We are using Python version 3.4 or later in this course


Usability Features

• Very clear syntax
• Obvious way to do most things
• Huge amount of free code and libraries
• Interactive
• Only innovative where innovation is really necessary
  – Better to steal a good idea than invent a bad one!


Python "Hello world"

print("Hello, World")


Python Interpreter

• Just type python3 at the shell prompt:

Todds-MacBook-Air:~ todd$ python3
Python 3.6.1 (default, Apr 4 2017, 09:40:21)
[GCC 4.2.1 Compatible Apple LLVM 8.1.0 (clang-802.0.38)] on darwin
Type "help", "copyright", "credits" or "license" for more information.


Features of the Interpreter

• Lines start with “>>>”. You can recognize Python interpreter transcripts anywhere you see them.

• Expressions that return a value display the value.
>>> 5+3*4
17

• This saves you from excessive “print”ing


Interactive Interpreters

• Windows command line
• OS X
• Linux/Unix
• Graphical command lines: “IDLE”, “PythonWin”, “MacPython”, …
• Jython
• And many more…


Python scripts

• Sometimes you want to run the same program more than once!

• Make a file with Python statements in it:

foo.py:
print("hello world")

todd$ python3 foo.py
hello world
todd$ python3 foo.py
hello world


Python is dynamically typed

width = 20
print(width)
height = 5 * 9
print(height)
print(width * height)
width = "really wide"
print(width)


Experiment in the Interpreter

• Any Python variable can hold any value.

>>> width = 20
>>> height = 5 * 9
>>> print(width * height)
900
>>> width = "really wide"
>>> print(width)
really wide
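The rebinding above can also be observed with the built-in type(). This is a minimal sketch; the names first_type and second_type are introduced here for illustration only.

```python
# The same name can be bound to values of different types over time.
width = 20
first_type = type(width).__name__    # name of the type currently bound to width

width = "really wide"
second_type = type(width).__name__   # the type follows the value, not the name

print(first_type, second_type)
```

The type belongs to the value, so no declaration is needed when the name is reused.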


Dynamic Type Checking

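test_sqrt.py:

import math

def square_root(num):
    return math.sqrt(num)

def goodfunc():
    print(square_root(10))

def badfunc():
    print(square_root("10"))   # wrong type: a string, not a number

goodfunc()
badfunc()   # the error is reported only here, when the call actually runs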
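Because types are checked at run time, the bad call fails only when it is executed. The sketch below makes that visible by catching the error; the helper name try_square_root is hypothetical and not part of the slides.

```python
import math

def square_root(num):
    return math.sqrt(num)

def try_square_root(value):
    """Return the square root, or a message if the argument has the wrong type."""
    try:
        return square_root(value)
    except TypeError as exc:
        # math.sqrt rejects a str argument, but only once the call is executed
        return "TypeError: " + str(exc)

print(try_square_root(16))     # a number works
print(try_square_root("10"))   # a string fails at call time
```

This is why dynamically typed code benefits from tests that exercise every call path.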


Multiple statements on a line

• You can combine multiple simple statements on a line:

>>> a = 5;print (a); a = 6; print (a)

5

6


Indentation

• Python uses indentation to mark blocks of code (where many other languages use braces):

if this_function(that_variable):
    do_something()
else:
    do_something_else()


Indentation

• Tabs and spaces look the same in most editors.
• Python 3 rejects code whose indentation mixes tabs and spaces inconsistently (it raises TabError), so code that looks aligned in your editor may still fail to parse.
• Three easy solutions:
  1. Use only tabs or only spaces in a file: don’t mix them.
  2. Use an editor that knows about Python.
  3. Configure your editor to insert spaces when you press Tab; the PEP 8 style guide recommends 4 spaces per indentation level.


Compared to PHP/Javascript

• PHP and JavaScript are excellent for Web apps (PHP on the server, JavaScript on the client) but not much else.

• Python can be used for your Web apps, your complicated algorithms, your GUIs, your COM components, and as an extension language for Java programs.

• Even in Web apps, Python handles complexity better.


Compared to Java

• Java is more difficult for amateur programmers.

• Static type checking can be inconvenient and inflexible.

• Bottom line: Java can make projects harder than they need to be.


Python Limitations

• Not the fastest executing programming language:
  – C/C++ is naturally fast
  – Perl’s regular expressions and IO are a little faster
  – Some Java implementations have good JITs
  – But Python also has some speed advantages:
    • Fast implementations of built-in data structures
    • Pyrex compiles Python code to C

• Dynamic type checking requires more care in testing.

• Language changes (relatively) quickly: this is a strength and a weakness.


Objects All the Way Down

• Everything in Python is an object
• Integers are objects.
• Characters are objects.
• Complex numbers are objects.
• Booleans are objects.
• Functions are objects.
• Methods are objects.
• Modules are objects.


Object Type and Identity

• You can find out the type of any object:
>>> print(type(1))
<class 'int'>
>>> print(type(1.0))
<class 'float'>

• Every object also has a unique identifier (usually only for debugging purposes); the numbers you see will differ from run to run:
>>> print(id(1))
7629640
>>> print(id("1"))
7910560


None

• “None” represents the lack of a value.
• Like “NULL” in some languages or in databases.
• For instance:

>>> if y!=0:
...     fraction = x/y
... else:
...     fraction = None
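A caller usually has to check for None before using such a result. The sketch below wraps the slide's logic in a function; the name safe_fraction is hypothetical, chosen for illustration.

```python
def safe_fraction(x, y):
    """Return x / y, or None when y is zero."""
    if y != 0:
        return x / y
    return None

result = safe_fraction(1, 0)
if result is None:       # "is" is the usual way to test for None
    print("no value")
else:
    print(result)
```

Testing with "is None" rather than plain truthiness matters here, because a valid result of 0.0 would also count as false.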


File Objects

• Represent opened files:
>>> infile = open("catalog.txt", "r")
>>> data = infile.read()
>>> infile.close()
>>> outfile = open("catalog2.txt", "w")
>>> data = data + "more data"
>>> outfile.write(data)
>>> outfile.close()

• The built-in function “open” is used both to open existing files and, in "w" mode, to create new ones.
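File objects can also be used in a with statement, which closes the file automatically even if an error occurs. This is a minimal sketch, assuming a temporary directory so that no real catalog.txt is needed.

```python
import os
import tempfile

# Work in a temporary directory so the sketch does not touch real files.
with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "catalog.txt")

    # "w" mode creates the file; it is closed when the with block ends.
    with open(path, "w") as outfile:
        outfile.write("some data")

    # Re-open the same file for reading.
    with open(path, "r") as infile:
        data = infile.read()

print(data)
```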


Basic Flow Control

• if/elif/else (test condition)

• while (loop until condition changes)

• for (iterate over an iterable object)


if Statement

if j == "Hello":
    doSomething()
elif j == "World":
    doSomethingElse()
else:
    doTheRightThing()


while Statement

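line = ""                # named "line" so the built-in str is not shadowed
while line != "quit":
    line = input()       # Python 3: input() replaces Python 2's raw_input()
    print(line)
print("Done")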


for Statement

myList = ["a", "b", "c", "d", "e"]
for i in myList:
    print(i)

for i in range(10):
    print(i)

for i in range(len(myList)):
    if myList[i] == "c":
        myList[i] = None

• Can “break” out of for-loops.
• Can “continue” to next iteration.
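The two loop controls can be sketched as follows. The list contents and the name seen are chosen purely for illustration.

```python
my_list = ["a", "b", "c", "d", "e"]
seen = []

for item in my_list:
    if item == "b":
        continue      # skip "b" and move on to the next item
    if item == "d":
        break         # leave the loop entirely when "d" is reached
    seen.append(item)

print(seen)
```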


Python Modules


What is a Module?

- A file containing some Python code

OR

- A .dll (.so on Unix) containing compiled code which follows some guidelines

- A namespace


A Python Module

def hello_world():
    print("Hello world")

• Save this as “myModule.py”. Now we can use it:
>>> import myModule
>>> myModule.hello_world()

• Or:
>>> from myModule import hello_world
>>> hello_world()


Other Built-in Protocols

• FTP
• XML-RPC
• Telnet
• POP
• IMAP
• MIME
• NNTP
• HTTP
• SSL
• Sockets
• CGI
• Gopher
• URL Parsing
• Plus downloadable modules for every other protocol in the universe!


Regular Expressions


Regular Expressions

• Regular expressions are a powerful string manipulation tool

• All modern languages have similar library packages for regular expressions

• Use regular expressions to:
  – Search a string (search and match)
  – Replace parts of a string (sub)
  – Break strings into smaller pieces (split)


Regular Expression Syntax

• Most characters match themselves
  The regular expression "test" matches the string 'test', and only that string

• [x] matches any one of a list of characters
  "[abc]" matches 'a', 'b', or 'c'

• [^x] matches any one character that is not included in x
  "[^abc]" matches any single character except 'a', 'b', or 'c'


Regular Expression Syntax

• "." matches any single character (except a newline, by default)

• Parentheses can be used for grouping
  "(abc)+" matches 'abc', 'abcabc', 'abcabcabc', etc.

• x|y matches x or y
  "this|that" matches 'this' or 'that', but not 'thisthat'.


Regular Expression Syntax

• x* matches zero or more x’s
  "a*" matches '', 'a', 'aa', etc.

• x+ matches one or more x’s
  "a+" matches 'a', 'aa', 'aaa', etc.

• x? matches zero or one x
  "a?" matches '' or 'a'

• x{m,n} matches i x’s, where m ≤ i ≤ n (write it with no space after the comma)
  "a{2,3}" matches 'aa' or 'aaa'
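The quantifier rules above can be checked in the interpreter. This sketch uses re.fullmatch (available in Python 3.4 and later) so that the whole string must match; the helper name fully_matches is hypothetical.

```python
import re

def fully_matches(pattern, text):
    """Return True if the entire text matches the pattern."""
    return re.fullmatch(pattern, text) is not None

print(fully_matches(r"a*", ""))          # zero or more: the empty string is allowed
print(fully_matches(r"a+", ""))          # one or more: the empty string is rejected
print(fully_matches(r"a?", "aa"))        # zero or one: two is too many
print(fully_matches(r"a{2,3}", "aaa"))   # between 2 and 3 repetitions, inclusive
print(fully_matches(r"a{2,3}", "aaaa"))  # four repetitions is out of range
```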


Regular Expression Syntax

• "\d" matches any digit; "\D" any non-digit

• "\s" matches any whitespace character; "\S" any non-whitespace character

• "\w" matches any alphanumeric character or underscore; "\W" any other character

• "^" matches the beginning of the string; "$" the end of the string
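A short sketch of these character classes and anchors in use. The sample text and variable names are made up for illustration.

```python
import re

text = "Room 42, floor 7"

digits = re.findall(r"\d+", text)   # runs of digits
words = re.findall(r"\w+", text)    # runs of letters, digits, or underscore
pieces = re.split(r"\s+", text)     # split on runs of whitespace

starts_with_room = re.search(r"^Room", text) is not None   # anchored at the start
ends_with_digit = re.search(r"\d$", text) is not None      # anchored at the end

print(digits, words, pieces, starts_with_room, ends_with_digit)
```

The patterns are written as raw strings (r"...") so that Python passes the backslashes through to the re module unchanged.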


Debuggex Example


Search and Match in Python RegEx

• The two basic functions are re.search and re.match
  – Search looks for a pattern anywhere in a string
  – Match looks for a match starting at the beginning

• Both return None (logical false) if the pattern isn’t found and a “match object” instance if it is
>>> import re
>>> pat = "a*b"
>>> re.search(pat,"fooaaabcde")
<_sre.SRE_Match object; span=(3, 7), match='aaab'>
>>> re.match(pat,"fooaaabcde")
>>>
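Since None is false and a match object is true, the result can be tested directly in an if statement. A minimal sketch, reusing the slide's pattern and string; the variable names are illustrative.

```python
import re

pat = r"a*b"
text = "fooaaabcde"

found_anywhere = re.search(pat, text)   # scans the whole string
found_at_start = re.match(pat, text)    # only tries position 0

if found_anywhere:                      # a match object counts as true
    print("search found", found_anywhere.group())
if found_at_start is None:              # the string starts with "foo", so no match
    print("match found nothing")
```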


What’s a match object?

• An instance of the match class with the details of the match result

>>> r1 = re.search("a*b","fooaaabcde")
>>> r1.group()   # group returns string matched
'aaab'
>>> r1.start()   # index of the match start
3
>>> r1.end()     # index of the match end
7
>>> r1.span()    # tuple of (start, end)
(3, 7)


What got matched?

• Here’s a pattern to match simple email addresses
  \w+@(\w+\.)+(com|org|net|edu)

>>> pat1 = r"\w+@(\w+\.)+(com|org|net|edu)"
>>> r1 = re.match(pat1,"todd@arl.wustl.edu")
>>> r1.group()
'todd@arl.wustl.edu'

• We might want to extract the pattern parts, like the email name and host


What got matched?

• We can put parentheses around groups we want to be able to reference

>>> pat2 = r"(\w+)@((\w+\.)+(com|org|net|edu))"
>>> r2 = re.match(pat2,"todd@arl.wustl.edu")
>>> r2.group(1)
'todd'
>>> r2.group(2)
'arl.wustl.edu'
>>> r2.groups()
('todd', 'arl.wustl.edu', 'wustl.', 'edu')

• Note that the ‘groups’ are numbered in a preorder traversal, that is, in the order in which their opening parentheses appear


What got matched?

• We can ‘label’ the groups as well…
>>> pat3 = r"(?P<name>\w+)@(?P<host>(\w+\.)+(com|org|net|edu))"
>>> r3 = re.match(pat3,"todd@arl.wustl.edu")
>>> r3.group('name')
'todd'
>>> r3.group('host')
'arl.wustl.edu'

• And reference the matching parts by the labels
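All labelled parts can also be retrieved at once with the match object's groupdict() method. A minimal sketch; the address alice@mail.example.edu is a made-up placeholder, not from the slides.

```python
import re

pattern = r"(?P<name>\w+)@(?P<host>(\w+\.)+(com|org|net|edu))"

# Placeholder address used only for illustration.
m = re.match(pattern, "alice@mail.example.edu")

if m is not None:
    parts = m.groupdict()   # maps each label to the text it matched
    print(parts["name"], parts["host"])
```

Only the labelled groups appear in the dictionary; the unlabelled inner groups are still reachable by number.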


More re functions

• re.split() is like split but can use patterns
>>> re.split(r"\W+", "This... is a test, short and sweet, of split().")
['This', 'is', 'a', 'test', 'short', 'and', 'sweet', 'of', 'split', '']

• re.sub replaces each match of a pattern with a string
>>> re.sub('(blue|white|red)', 'black', 'blue socks and red shoes')
'black socks and black shoes'

• re.findall() finds all matches
>>> re.findall(r"\d+","12 dogs,11 cats, 1 egg")
['12', '11', '1']


Compiling regular expressions

• If you plan to use a re pattern more than once, compile it to a re object
• Python produces a special data structure that speeds up matching
>>> cpat3 = re.compile(pat3)
>>> cpat3
re.compile('(?P<name>\\w+)@(?P<host>(\\w+\\.)+(com|org|net|edu))')
>>> r3 = cpat3.search("todd@arl.wustl.edu")
>>> r3
<_sre.SRE_Match object; span=(0, 18), match='todd@arl.wustl.edu'>
>>> r3.group()
'todd@arl.wustl.edu'
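A compiled pattern is typically created once and then reused, for example across many lines of input. The sketch below assumes a small made-up list of lines; number_pattern and counts are illustrative names.

```python
import re

number_pattern = re.compile(r"\d+")   # compiled once, reused for every line

lines = ["12 dogs", "no numbers here", "11 cats, 1 egg"]

# The compiled object offers the same operations as the re module functions.
counts = [len(number_pattern.findall(line)) for line in lines]

print(counts)
```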
