python binder

327
Introduction to Python David M. Beazley http://www.dabeaz.com Edition: Thu May 28 19:33:40 2009 Copyright (C) 2009 David M Beazley All Rights Reserved

Upload: rajarajan-manickam

Post on 04-Mar-2015

550 views

Category:

Documents


21 download

TRANSCRIPT

Page 1: Python Binder

Introduction to PythonDavid M. Beazley

http://www.dabeaz.com

Edition: Thu May 28 19:33:40 2009

Copyright (C) 2009David M Beazley

All Rights Reserved

Page 2: Python Binder
Page 3: Python Binder

Introduction to Python : Table of Contents

Introduction to Python 0. Course Setup 1 1. Introduction 6 2. Working with Data 48 3. Program Organization and Functions 85

Objects and Software Development 4. Modules and Libraries 115 5. Classes 152 6. Inside the Python Object Model 169 7. Documentation, Testing, and Debugging 193

Python Systems Programming 8. Iterators and Generators 212 9. Working with Text 229 10. Binary Data Handling and File I/O 257 11. Working with Processes 272 12. Python Integration Primer 286

Edition: Thu May 28 19:33:40 2009

Page 4: Python Binder
Page 5: Python Binder

Copyright (C) 2008, http://www.dabeaz.com 0-

Course Setup

Section 0

1

Copyright (C) 2008, http://www.dabeaz.com 0-

Required Files

• Where to get Python (if not installed)

2

http://www.python.org

• Exercises for this class

http://www.dabeaz.com/python/pythonclass.zip

1

Page 6: Python Binder

Copyright (C) 2008, http://www.dabeaz.com 0-

Setting up Your Environment• Extract pythonclass.zip on your machine

3

• This folder is where you will do your work

Copyright (C) 2008, http://www.dabeaz.com 0-

Class Exercises

• Exercise descriptions are found in

4

PythonClass/Exercises/index.html

• All exercises have solution code

Look for the link at the bottom!

2

Page 7: Python Binder

Copyright (C) 2008, http://www.dabeaz.com 0-

Class Exercises

• Please follow the filename suggestions

5

This is the filename you should be using

• Using different names will break later exercises

Copyright (C) 2008, http://www.dabeaz.com 0-

General Tips

• We will be writing a lot of programs that access data files in PythonClass/Data

• Make sure you save your programs in the "PythonClass/" directory so that the names of these files are easy to access.

• Some exercises are more difficult than others. Please copy code from the solution and study it if necessary.

6

3

Page 8: Python Binder

Copyright (C) 2008, http://www.dabeaz.com 0-

Using IDLE

• For this class, we will be using IDLE.

• IDLE is an integrated development environment that comes prepackaged with Python

• It's not the most advanced tool, but it works

• Follow the instructions on the next two slides to start it in the correct environment

7

Copyright (C) 2008, http://www.dabeaz.com 0-

Running IDLE (Windows)• Find RunIDLE in the PythonClass/ folder

• Double-click to start the IDLE environment

8

4

Page 9: Python Binder

Copyright (C) 2008, http://www.dabeaz.com 0-

Running IDLE (Mac/Unix)

• Go into the PythonClass/ directory

• Type the following command in a command shell

9

% python RunIDLE.pyw

• Note: Typing 'idle' at the shell might also work.

5

Page 10: Python Binder

Copyright (C) 2008, http://www.dabeaz.com 1-

Introduction to Python

Section 1

1

Copyright (C) 2008, http://www.dabeaz.com 1-

Where to Get Python?

2

http://www.python.org

• Downloads

• Documentation and tutorial

• Community Links

• Third party packages

• News and more

6

Page 11: Python Binder

Copyright (C) 2008, http://www.dabeaz.com 1-

What is Python?

• An interpreted high-level programming language.

• Similar to Perl, Ruby, Tcl, and other so-called "scripting languages."

• Created by Guido van Rossum around 1990.

• Named in honor of Monty Python

3

Copyright (C) 2008, http://www.dabeaz.com 1-

Why was Python Created?

4

"My original motivation for creating Python was the perceived need for a higher level language in the Amoeba [Operating Systems] project. I realized that the development of system administration utilities in C was taking too long. Moreover, doing these things in the Bourne shell wouldn't work for a variety of reasons. ... So, there was a need for a language that would bridge the gap between C and the shell."

- Guido van Rossum

7

Page 12: Python Binder

Copyright (C) 2008, http://www.dabeaz.com 1-

Python Influences

• C (syntax, operators, etc.)

• ABC (syntax, core data types, simplicity)

• Unix ("Do one thing well")

• Shell programming (but not the syntax)

• Lisp, Haskell, and Smalltalk (later features)

5

Copyright (C) 2008, http://www.dabeaz.com 1-

Some Uses of Python

• Text processing/data processing

• Application scripting

• Systems administration/programming

• Internet programming

• Graphical user interfaces

• Testing

• Writing quick "throw-away" code

6

8

Page 13: Python Binder

Copyright (C) 2008, http://www.dabeaz.com 1-

Python Non-Uses

• Device drivers and low-level systems

• Computer graphics, visualization, and games

• Numerical algorithms/scientific computing

7

Comment : Python is still used in these application domains, but only as a high-level control language. Important computations are actually carried out in C, C++, Fortran, etc. For example, you would not implement matrix-multiplication in Python.

Copyright (C) 2008, http://www.dabeaz.com 1-

Getting Started

• In this section, we will cover the absolute basics of Python programming

• How to start Python

• Python's interactive mode

• Creating and running simple programs

• Basic calculations and file I/O.

8

9

Page 14: Python Binder

Copyright (C) 2008, http://www.dabeaz.com 1-

Running Python• Python programs run inside an interpreter

• The interpreter is a simple "console-based" application that normally starts from a command shell (e.g., the Unix shell)

shell % pythonPython 2.5.1 (r251:54869, Apr 18 2007, 22:08:04) [GCC 4.0.1 (Apple Computer, Inc. build 5367)] on darwinType "help", "copyright", "credits" or "license" >>>

9

• Expert programmers usually have no problem using the interpreter in this way, but it's not so user-friendly for beginners

Copyright (C) 2008, http://www.dabeaz.com 1-

IDLE

• Python includes a simple integrated development called IDLE (which is another Monty Python reference)

• It's not the most sophisticated environment but it's already installed and it works

• Most working Python programmers tend to use something else, but it is fine for this class.

10

10

Page 15: Python Binder

Copyright (C) 2008, http://www.dabeaz.com 1-

IDLE on Windows

• Look for it in the "Start" menu

11

Copyright (C) 2008, http://www.dabeaz.com 1-

IDLE on other Systems• Launch a terminal or command shell

shell % python -m idlelib.idle

12

• Type the following command to launch IDLE

11

Page 16: Python Binder

Copyright (C) 2008, http://www.dabeaz.com 1-

The Python Interpreter

• When you start Python, you get an "interactive" mode where you can experiment

• If you start typing statements, they will run immediately

• No edit/compile/run/debug cycle

• In fact, there is no "compiler"

13

Copyright (C) 2008, http://www.dabeaz.com 1-

Interactive Mode• The interpreter runs a "read-eval" loop

>>> print "hello world"hello world>>> 37*421554>>> for i in range(5):... print i...01234>>>

• Executes simple statements typed in directly

• Very useful for debugging, exploration

14

12

Page 17: Python Binder

Copyright (C) 2008, http://www.dabeaz.com 1-

Interactive Mode• Some notes on using the interactive shell

>>> print "hello world"hello world>>> 37*421554>>> for i in range(5):... print i...01234>>>

15

>>> is the interpreter prompt for starting a

new statement

... is the interpreter prompt for continuing a statement (it may be blank in some tools) Enter a blank line to

finish typing and to run

Copyright (C) 2008, http://www.dabeaz.com 1-

Interactive Mode in IDLE• Interactive shell plus added help features

syntax highlights usage information

16

13

Page 18: Python Binder

Copyright (C) 2008, http://www.dabeaz.com 1-

Getting Help• help(name) command

• Documentation at http://docs.python.org

17

>>> help(range)Help on built-in function range in module __builtin__:

range(...) range([start,] stop[, step]) -> list of integers Return a list containing an arithmetic progression of integers. range(i, j) returns [i, i+1, i+2, ..., j-1]; start (!) defaults to 0. ...>>>

• Type help() with no name for interactive help

Copyright (C) 2008, http://www.dabeaz.com 1-

Exercise 1.1

18

Time: 10 minutes

14

Page 19: Python Binder

Copyright (C) 2008, http://www.dabeaz.com 1-

Creating Programs

• Programs are put in .py files

# helloworld.pyprint "hello world"

• Source files are simple text files

• Create with your favorite editor (e.g., emacs)

• Note: May be special editing modes

• Can also edit programs with IDLE or other Python IDE (too many to list)

19

Copyright (C) 2008, http://www.dabeaz.com 1-

Creating Programs• Creating a new program in IDLE

20

15

Page 20: Python Binder

Copyright (C) 2008, http://www.dabeaz.com 1-

Creating Programs• Editing a new program in IDLE

21

Copyright (C) 2008, http://www.dabeaz.com 1-

Creating Programs• Saving a new Program in IDLE

22

16

Page 21: Python Binder

Copyright (C) 2008, http://www.dabeaz.com 1-

Running Programs (IDLE)• Select "Run Module" (F5)

• Will see output in IDLE shell window

23

Copyright (C) 2008, http://www.dabeaz.com 1-

Running Programs• In production environments, Python may be

run from command line or a script

• Command line (Unix)shell % python helloworld.pyhello worldshell %

• Command shell (Windows)C:\SomeFolder>helloworld.pyhello world

C:\SomeFolder>c:\python25\python helloworld.pyhello world

24

17

Page 22: Python Binder

Copyright (C) 2008, http://www.dabeaz.com 1-

A Sample Program• The Sears Tower Problem

You are given a standard sheet of paper which you fold in half. You then fold that in half and keep folding. How many folds do you have to make for the thickness of the folded paper to be taller than the Sears Tower? A sheet of paper is 0.1mm thick and the Sears Tower is 442 meters tall.

25

Copyright (C) 2008, http://www.dabeaz.com 1-

A Sample Program# sears.py# How many times do you have to fold a piece of paper# for it to be taller than the Sears Tower?

height = 442 # Metersthickness = 0.1*0.001 # Meters (0.1 millimeter)

numfolds = 0while thickness <= height: thickness = thickness * 2 numfolds = numfolds + 1 print numfolds, thickness

print numfolds, "folds required"print "final thickness is", thickness, "meters"

26

18

Page 23: Python Binder

Copyright (C) 2008, http://www.dabeaz.com 1-

A Sample Program

• Output% python sears.py1 0.00022 0.00043 0.00084 0.00165 0.0032...20 104.857621 209.715222 419.430423 838.860823 folds requiredfinal thickness is 838.8608 meters

27

Copyright (C) 2008, http://www.dabeaz.com 1-

Exercise 1.2

28

Time: 10 minutes

19

Page 24: Python Binder

Copyright (C) 2008, http://www.dabeaz.com 1-

An IDLE Caution• The first rule of using IDLE is that you will

always open files using the "File > Open" or "File > Recent Files" menu options

29

• Do not right click on .py files to open

Copyright (C) 2008, http://www.dabeaz.com 1-

An IDLE Caution• The second rule of using IDLE is that you will

always open files using the "File > Open" or "File > Recent Files" menu options

30

• Ignore this rule at your own peril (IDLE will operate strangely and you'll be frustrated)

20

Page 25: Python Binder

Copyright (C) 2008, http://www.dabeaz.com 1-

Python 101 : Statements

• A Python program is a sequence of statements

• Each statement is terminated by a newline

• Statements are executed one after the other until you reach the end of the file.

• When there are no more statements, the program stops

31

Copyright (C) 2008, http://www.dabeaz.com 1-

Python 101 : Comments

• Comments are denoted by #

# This is a commentheight = 442 # Meters

32

• Extend to the end of the line

• There are no block comments in Python (e.g., /* ... */).

21

Page 26: Python Binder

Copyright (C) 2008, http://www.dabeaz.com 1-

Python 101: Variables

• A variable is just a name for some value

• Variable names follow same rules as C [A-Za-z_][A-Za-z0-9_]*

• You do not declare types (int, float, etc.)

height = 442 # An integerheight = 442.0 # Floating pointheight = "Really tall" # A string

• Differs from C++/Java where variables have a fixed type that must be declared.

33

Copyright (C) 2008, http://www.dabeaz.com 1-

Python 101: Keywords

• Python has a basic set of language keywords

• Variables can not have one of these names

• These are mostly C-like and have the same meaning in most cases (later)

34

andassert

break

classcontinue

defdelelif

else

exceptexecfinallyfor

fromglobalif

importinislambdanotorpassprint

raisereturn

trywhile

withyield

22

Page 27: Python Binder

Copyright (C) 2008, http://www.dabeaz.com 1-

Python 101: Case Sensitivity• Python is case sensitive

• These are all different variables:

35

name = "Jake"Name = "Elwood"NAME = "Guido"

• Language statements are always lower-caseprint "Hello World" # OKPRINT "Hello World" # ERROR

while x < 0: # OKWHILE x < 0: # ERROR

• So, no shouting please...

Copyright (C) 2008, http://www.dabeaz.com 1-

Python 101: Looping

• The while statement executes a loop

• Executes the indented statements underneath while the condition is true

36

while thickness <= height: thickness = thickness * 2 numfolds = numfolds + 1 print numfolds, thickness

23

Page 28: Python Binder

Copyright (C) 2008, http://www.dabeaz.com 1-

Python 101 : Indentation

• Indentation used to denote blocks of code

• Indentation must be consistent

while thickness <= height: thickness = thickness * 2 numfolds = numfolds + 1 print numfolds, thickness

while thickness <= height:thickness = thickness * 2 numfolds = numfolds + 1

print numfolds, thickness

(ok) (error)

• Colon (:) always indicates start of new block

while thickness <= height:

37

Copyright (C) 2008, http://www.dabeaz.com 1-

Python 101 : Indentation

• There is a preferred indentation style

• Always use spaces

• Use 4 spaces per level

• Avoid tabs

• Always use a Python-aware editor

38

24

Page 29: Python Binder

Copyright (C) 2008, http://www.dabeaz.com 1-

Python 101 : Conditionals

• If-elseif a < b: print "Computer says no"else: print "Computer says yes"

• If-elif-elseif a == '+': op = PLUSelif a == '-': op = MINUSelif a == '*': op = TIMESelse: op = UNKNOWN

39

Copyright (C) 2008, http://www.dabeaz.com 1-

Python 101 : Relations

40

• Relational operators

< > <= >= == !=

• Boolean expressions (and, or, not)

if b >= a and b <= c: print "b is between a and c"

if not (b < a or b > c): print "b is still between a and c"

• Non-zero numbers, non-empty objects also evaluate as True.

x = 42if x: # x is nonzero

25

Page 30: Python Binder

Copyright (C) 2008, http://www.dabeaz.com 1-

Python 101 : Printing

• The print statement

print x print x,y,zprint "Your name is", nameprint x, # Omits newline

• Produces a single line of text

• Items are separated by spaces

• Always prints a newline unless a trailing comma is added after last item

41

Copyright (C) 2008, http://www.dabeaz.com 1-

Python 101 : pass statement

• Sometimes you will need to specify an empty block of code

if name in namelist: # Not implemented yet (or nothing) passelse: statements

42

• pass is a "no-op" statement

• It does nothing, but serves as a placeholder for statements (possibly to be added later)

26

Page 31: Python Binder

Copyright (C) 2008, http://www.dabeaz.com 1-

Python 101 : Long Lines

• Sometimes you get long statements that you want to break across multiple lines

• Use the line continuation character (\)

if product=="game" and type=="pirate memory" \ and age >= 4 and age <= 8: print "I'll take it!"

43

• However, not needed for code in (), [], or {}

if (product=="game" and type=="pirate memory" and age >= 4 and age <= 8): print "I'll take it!"

Copyright (C) 2008, http://www.dabeaz.com 1-

Exercise 1.3

44

Time: 15 minutes

27

Page 32: Python Binder

Copyright (C) 2007, http://www.dabeaz.com 1-

Basic Datatypes

• Python only has a few primitive types of data

• Numbers

• Strings (character text)

45

Copyright (C) 2007, http://www.dabeaz.com 1-

Numbers

• Python has 5 types of numbers

• Booleans

• Integers

• Long integers

• Floating point

• Complex (imaginary numbers)

46

28

Page 33: Python Binder

Copyright (C) 2007, http://www.dabeaz.com 1-

Booleans (bool)

• Two values: True, Falsea = Trueb = False

• Evaluated as integers with value 1,0

c = 4 + True # c = 5d = Falseif d == 0: print "d is False"

• Although doing that in practice would be odd

47

Copyright (C) 2007, http://www.dabeaz.com 1-

Integers (int)

• Signed integers up to machine precision

a = 37b = -299392993c = 0x7fa8 # Hexadecimald = 0253 # Octal

• Typically 32 bits

• Comparable to the C long type

48

29

Page 34: Python Binder

Copyright (C) 2007, http://www.dabeaz.com 1-

Long Integers (long)

• Arbitrary precision integersa = 37Lb = -126477288399477266376467L

• Integers that overflow promote to longs>>> 3 ** 7367585198634817523235520443624317923L>>> a = 72883988882883812>>> a72883988882883812L>>>

• Can almost always be used interchangeably with integers

49

Copyright (C) 2007, http://www.dabeaz.com 1-

Integer Operations+ Add- Subtract* Multiply/ Divide // Floor divide% Modulo** Power<< Bit shift left>> Bit shift right& Bit-wise AND| Bit-wise OR^ Bit-wise XOR~ Bit-wise NOTabs(x) Absolute valuepow(x,y[,z]) Power with optional modulo (x**y)%zdivmod(x,y) Division with remainder

50

30

Page 35: Python Binder

Copyright (C) 2007, http://www.dabeaz.com 1-

Integer Division• Classic division (/) - truncates

>>> 5/41>>>

• Floor division (//) - truncates (same)>>> 5//41>>>

• Future division (/) - Converts to float>>> from __future__ import division>>> 5/41.25

• Will change in some future Python version

• If truncation is intended, use //51

Copyright (C) 2007, http://www.dabeaz.com 1-

Floating point (float)

• Use a decimal or exponential notationa = 37.45b = 4e5c = -1.345e-10

• Represented as double precision using the native CPU representation (IEEE 754)

17 digits of precisionExponent from -308 to 308

• Same as the C double type

52

31

Page 36: Python Binder

Copyright (C) 2007, http://www.dabeaz.com 1-

Floating point• Be aware that floating point numbers are

inexact when representing decimal values.

>>> a = 3.4>>> a3.3999999999999999>>>

53

• This is not Python, but the underlying floating point hardware on the CPU.

• When inspecting data at the interactive prompt, you will always see the exact representation (print output may differ)

Copyright (C) 2007, http://www.dabeaz.com 1-

Floating Point Operators+ Add- Subtract* Multiply/ Divide % Modulo (remainder)** Powerpow(x,y [,z]) Power modulo (x**y)%zabs(x) Absolute valuedivmod(x,y) Division with remainder

• Additional functions are in the math module

import matha = math.sqrt(x)b = math.sin(x)c = math.cos(x)d = math.tan(x)e = math.log(x)

54

32

Page 37: Python Binder

Copyright (C) 2007, http://www.dabeaz.com 1-

Converting Numbers• Type name can be used to convert

a = int(x) # Convert x to integerb = long(x) # Convert x to longc = float(x) # Convert x to float

• Example:>>> a = 3.14159>>> int(a)3>>>

• Also work with strings containing numbers

>>> a = "3.14159">>> float(a)3.1415899999999999>>> int("0xff",16) # Optional integer base255

55

Copyright (C) 2008, http://www.dabeaz.com 1-

Strings• Written in programs with quotes

a = "Yeah but no but yeah but..."

b = 'computer says no'

c = '''Look into my eyes, look into my eyes,the eyes, the eyes, the eyes,not around the eyes, don't look around the eyes,look into my eyes, you're under.'''

• Standard escape characters work (e.g., '\n')

• Triple quotes capture all literal text enclosed

56

33

Page 38: Python Binder

Copyright (C) 2007, http://www.dabeaz.com 1-

String Escape Codes

'\n' Line feed'\r' Carriage return'\t' Tab'\xhh' Hexadecimal value'\”' Literal quote'\\' Backslash

• In quotes, standard escape codes work

• Raw strings (don’t interpret escape codes)a = r"c:\newdata\test" # String exactly as specified

57

Leading r

Copyright (C) 2007, http://www.dabeaz.com 1-

String Representation

• An ordered sequence of bytes (characters)

58

• Stores 8-bit data (ASCII)

• May contain binary data, control characters, etc.

• Strings are frequently used for both text and for raw-data of any kind

34

Page 39: Python Binder

Copyright (C) 2007, http://www.dabeaz.com 1-

String Representation

• Strings work like an array : s[n]a = "Hello world"b = a[0] # b = 'H'c = a[4] # c = 'o'd = a[-1] # d = 'd' (Taken from end of string)

• Slicing/substrings : s[start:end]d = a[:5] # d = "Hello"e = a[6:] # e = "world"f = a[3:8] # f = "lo wo"g = a[-5:] # g = "world"

• Concatenation (+)a = "Hello" + "World"b = "Say " + a

59

Copyright (C) 2007, http://www.dabeaz.com 1-

More String Operations• Length (len)

>>> s = "Hello">>> len(s)5>>>

• Membership test (in)>>> 'e' in sTrue>>> 'x' in sFalse>>> "ello" in sTrue

60

• Replication (s*n)>>> s = "Hello">>> s*5'HelloHelloHelloHelloHello'>>>

35

Page 40: Python Binder

Copyright (C) 2007, http://www.dabeaz.com 1-

String Methods

• Stripping any leading/trailing whitespace

t = s.strip()

• Case conversion

t = s.lower()t = s.upper()

• Replacing text

t = s.replace("Hello","Hallo")

61

• Strings have "methods" that perform various operations with the string data.

Copyright (C) 2007, http://www.dabeaz.com 1-

More String Methodss.endswith(suffix) # Check if string ends with suffixs.find(t) # First occurrence of t in ss.index(t) # First occurrence of t in ss.isalpha() # Check if characters are alphabetics.isdigit() # Check if characters are numerics.islower() # Check if characters are lower-cases.isupper() # Check if characters are upper-cases.join(slist) # Joins lists using s as delimeter s.lower() # Convert to lower cases.replace(old,new) # Replace texts.rfind(t) # Search for t from end of strings.rindex(t) # Search for t from end of strings.split([delim]) # Split string into list of substringss.startswith(prefix) # Check if string starts with prefixs.strip() # Strip leading/trailing spaces.upper() # Convert to upper case

62

36

Page 41: Python Binder

Copyright (C) 2007, http://www.dabeaz.com 1-

String Mutability

• Strings are "immutable" (read only)

• Once created, the value can't be changed

>>> s = "Hello World">>> s[1] = 'a'Traceback (most recent call last): File "<stdin>", line 1, in <module>TypeError: 'str' object does not support item assignment>>>

63

• All operations and methods that manipulate string data always create new strings

Copyright (C) 2007, http://www.dabeaz.com 1-

String Conversions

• To convert any object to string

• Produces the same text as print

s = str(obj)

• Actually, print uses str() for output

>>> x = 42>>> str(x)'42'>>>

64

37

Page 42: Python Binder

Copyright (C) 2007, http://www.dabeaz.com 1-

Exercise 1.4

65

Time: 10 minutes

Copyright (C) 2007, http://www.dabeaz.com 1-

String Splitting

• Strings often represent fields of data

• To work with each field, split into a list

>>> line = 'GOOG 100 490.10'>>> fields = line.split() >>> fields['GOOG', '100', '490.10']>>>

66

• Example: When reading data from a file, you might read each line and then split the line into columns.

38

Page 43: Python Binder

Copyright (C) 2007, http://www.dabeaz.com 1-

Lists• A array of arbitrary values

names = [ "Elwood", "Jake", "Curtis" ]nums = [ 39, 38, 42, 65, 111]

• Can contain mixed data types items = [ "Elwood", 39, 1.5 ]

• Adding new items (append, insert)

items.append("that") # Adds at enditems.insert(2,"this") # Inserts in middle

67

• Concatenation : s + ts = [1,2,3]t = ['a','b']

s + t [1,2,3,'a','b']

Copyright (C) 2007, http://www.dabeaz.com 1-

Lists (cont)

• Negative indices are from the end

names[-1] "Curtis"

68

• Lists are indexed by integers (starting at 0)names = [ "Elwood", "Jake", "Curtis" ]

names[0] "Elwood"names[1] "Jake"names[2] "Curtis"

• Changing one of the items

names[1] = "Joliet Jake"

39

Page 44: Python Binder

Copyright (C) 2007, http://www.dabeaz.com 1-

More List Operations• Length (len)

>>> s = ['Elwood','Jake','Curtis']>>> len(s)3>>>

• Membership test (in)>>> 'Elwood' in sTrue>>> 'Britney' in sFalse>>>

69

• Replication (s*n)>>> s = [1,2,3]>>> s*3[1,2,3,1,2,3,1,2,3]>>>

Copyright (C) 2007, http://www.dabeaz.com 1-

List Removal

• Removing an item

names.remove("Curtis")

del names[2]

• Deleting an item by index

70

• Removal results in items moving down to fill the space vacated (i.e., no "holes").

40

Page 45: Python Binder

Copyright (C) 2007, http://www.dabeaz.com 1-

Exercise 1.5

71

Time: 10 minutes

Copyright (C) 2007, http://www.dabeaz.com 1-

File Input and Output

• Opening a filef = open("foo.txt","r") # Open for readingg = open("bar.txt","w") # Open for writing

• To read dataline = f.readline() # Read a line of textdata = f.read() # Read all data

• To write text to a file

g.write("some text")

• To print to a fileprint >>g, "Your name is", name

72

41

Page 46: Python Binder

Copyright (C) 2007, http://www.dabeaz.com 1-

Looping over a file• Reading a file line by line

f = open("foo.txt","r") for line in f: # Process the line ...f.close()

• Alternativelyfor line in open("foo.txt","r"): # Process the line ...

• This reads all lines until you reach the end of the file

73

Copyright (C) 2008, http://www.dabeaz.com 1-

Exercise 1.6

74

Time: 10 minutes

42

Page 47: Python Binder

Copyright (C) 2007, http://www.dabeaz.com 1-

Type Conversion

• In Python, you must be careful about converting data to an appropriate type

x = '37' # Stringsy = '42'z = x + y # z = '3742' (concatenation)

x = 37y = 42z = x + y # z = 79 (integer +)

75

• This differs from Perl where "+" is assumed to be numeric arithmetic (even on strings)$x = '37';$y = '42';$z = $x + $y; # $z = 79

Copyright (C) 2007, http://www.dabeaz.com 1-

Simple Functions

• Use functions for code you want to reuse

def sumcount(n): total = 0 while n > 0: total += n n -= 1 return total

• Calling a function

a = sumcount(100)

76

• A function is just a series of statements that perform some task and return a result

43

Page 48: Python Binder

Copyright (C) 2007, http://www.dabeaz.com 1-

Library Functions

• Python comes with a large standard library

• Library modules accessed using import

import mathx = math.sqrt(10)

import urllibu = urllib.urlopen("http://www.python.org/index.html")data = u.read()

77

• Will cover in more detail later

Copyright (C) 2007, http://www.dabeaz.com 1-

Exception Handling

• Errors are reported as exceptions

• An exception causes the program to stop

>>> f = open("file.dat","r")Traceback (most recent call last): File "<stdin>", line 1, in <module>IOError: [Errno 2] No such file or directory: 'file.dat'>>>

78

• For debugging, message describes what happened, where the error occurred, along with a traceback.

44

Page 49: Python Binder

Copyright (C) 2008, http://www.dabeaz.com 1-

Exceptions

• To catch, use try-except statementtry: f = open(filename,"r")except IOError: print "Could not open", filename

79

• Exceptions can be caught and handled

>>> f = open("file.dat","r")Traceback (most recent call last): File "<stdin>", line 1, in <module>IOError: [Errno 2] No such file or directory: 'file.dat'>>>

Name must match the kind of error you're trying to catch

Copyright (C) 2008, http://www.dabeaz.com 1-

Exceptions

• To raise an exception, use the raise statement

raise RuntimeError("What a kerfuffle")

80

% python foo.pyTraceback (most recent call last): File "foo.py", line 21, in <module> raise RuntimeError("What a kerfuffle")RuntimeError: What a kerfuffle

• Will cause the program to abort with an exception traceback (unless caught by try-except)

45

Page 50: Python Binder

Copyright (C) 2008, http://www.dabeaz.com 1-

dir() function

• dir(obj) returns all names defined in obj

>>> import sys>>> dir(sys)['__displayhook__', '__doc__', '__excepthook__', '__name__', '__stderr__', '__stdin__', '__stdout__', '_current_frames', '_getframe', 'api_version', 'argv', 'builtin_module_names', 'byteorder', 'call_tracing', 'callstats', 'copyright', 'displayhook', 'exc_clear', 'exc_info', 'exc_type', 'excepthook', 'exec_prefix', 'executable', 'exit', 'getcheckinterval', ...'version_info', 'warnoptions']

• Useful for exploring, inspecting objects, etc.

81

Copyright (C) 2008, http://www.dabeaz.com 1-

dir() Example• dir() will also tell you what methods/

attributes are available on an object.>>> a = "Hello World">>> dir(a)['__add__','__class__', ..., 'capitalize', 'center', 'count', 'decode', 'encode', 'endswith', 'expandtabs', 'find', 'index', 'isalnum', 'isalpha', 'isdigit', 'islower', 'isspace', 'istitle', 'isupper', 'join', ... 'rstrip', 'split', 'splitlines', 'startswith', 'strip', 'swapcase', 'title', 'translate', 'upper', 'zfill']>>>

82

• Note: you may see a lot of names with leading/trailing underscores. These have special meaning to Python--ignore for now.

46

Page 51: Python Binder

Copyright (C) 2008, http://www.dabeaz.com 1-

Summary

• This has been an overview of simple Python

• Enough to write basic programs

• Just have to know the core datatypes and a few basics (loops, conditions, etc.)

83

Copyright (C) 2008, http://www.dabeaz.com 1-

Exercise 1.7

84

Time: 15 minutes

47

Page 52: Python Binder

Working with Data

Section 2

Copyright (C) 2008, http://www.dabeaz.com 2-

Overview

• Most programs work with data

• In this section, we look at how Python programmers represent and work with data

• Common programming idioms

• How to (not) shoot yourself in the foot

2

48

Page 53: Python Binder

Copyright (C) 2008, http://www.dabeaz.com 2-

Primitive Datatypes

• Python has a few primitive types of data

• Integers

• Floating point numbers

• Strings (text)

• Obviously, all programs use these

3

Copyright (C) 2008, http://www.dabeaz.com 2-

None type

• Nothing, nil, null, nada

logfile = None

• This is often used as a placeholder for optional program settings or values

4

if logfile: logfile.write("Some message")

• If you don't assign logfile to something, the above code would crash (undefined variable)

49

Page 54: Python Binder

Copyright (C) 2008, http://www.dabeaz.com 2-

Data Structures

• Real programs have more complex data

• Example: A holding of stock

100 shares of GOOG at $490.10

• An "object" with three parts

• Name ("GOOG", a string)

• Number of shares (100, an integer)

• Price (490.10, a float)

5

Copyright (C) 2008, http://www.dabeaz.com 2-

Tuples• A collection of values grouped together

• Example:

s = ('GOOG',100,490.10)

• Sometimes the () are omitted in syntax

s = 'GOOG', 100, 490.10

• Special cases (0-tuple, 1-tuple)

t = () # An empty tuple w = ('GOOG',) # A 1-item tuple

6

50

Page 55: Python Binder

Copyright (C) 2008, http://www.dabeaz.com 2-

Tuple Use

• Tuples are usually used to represent simple records and data structures

contact = ('David Beazley','[email protected]')stock = ('GOOG', 100, 490.10)host = ('www.python.org', 80)

• Basically, a simple "object" of multiple parts

7

Copyright (C) 2008, http://www.dabeaz.com 2-

Tuples (cont)

• Tuple contents are ordered (like an array)

s = ('GOOG',100, 490.10)name = s[0] # 'GOOG'shares = s[1] # 100price = s[2] # 490.10

• However, the contents can't be modified>>> s[1] = 75 TypeError: object does not support item assignment

• Although you can always create a new tuplet = (s[0], 75, s[2])

8

51

Page 56: Python Binder

Copyright (C) 2008, http://www.dabeaz.com 2-

Tuple Packing

• Tuples are really focused on packing and unpacking data into variables , not storing distinct items in a list

• Packing multiple values into a tuple

s = ('GOOG', 100, 490.10)

• The tuple is then easy to pass around to other parts of a program as a single object

9

Copyright (C) 2008, http://www.dabeaz.com 2-

Tuple Unpacking

• To use the tuple elsewhere, you typically unpack its parts into variables

• Unpacking values from a tuple

(name, shares, price) = s

print "Cost", shares*price

• Note: The () syntax is sometimes omitted

name, shares, price = s

10

52

Page 57: Python Binder

Copyright (C) 2008, http://www.dabeaz.com 2-

Tuple Commentary

• Tuples are a fundamental part of Python

• Used for critical parts of the interpreter

• Highly optimized

• Key point : Compound data, but not used to represent a list of distinct "objects"

11

Copyright (C) 2008, http://www.dabeaz.com 2-

Tuples vs. Lists

• Can't you just use a list instead of a tuple?

12

s = ['GOOG', 100, 490.10]

• Well, yes, but it's not quite as efficient

• The implementation of lists is optimized for growth using the append() method.

• There is often a little extra space at the end

• So using a list requires a bit more memory

53

Page 58: Python Binder

Copyright (C) 2008, http://www.dabeaz.com 2-

Dictionaries

• A hash table or associative array

• A collection of values indexed by "keys"

• The keys are like field names

• Example:

s = { 'name' : 'GOOG', 'shares' : 100, 'price' : 490.10}

13

Copyright (C) 2008, http://www.dabeaz.com 2-

Dictionaries• Getting values: Just use the key names

>>> print s['name'],s['shares']GOOG 100>>> s['price']490.10>>>

• Adding/modifying values : Assign to key names>>> s['shares'] = 75>>> s['date'] = '6/6/2007'>>>

• Deleting a value>>> del s['date']>>>

14

54

Page 59: Python Binder

Copyright (C) 2008, http://www.dabeaz.com 2-

Dictionaries

• When to use a dictionary as a data structure

• Data has many different parts

• The parts will be modified/manipulated

• Example: If you were reading data from a database and each row had 50 fields, a dictionary could store the contents of each row using descriptive field names.

15

Copyright (C) 2008, http://www.dabeaz.com 2-

Exercise 2.1

Time : 10 minutes

16

55

Page 60: Python Binder

Copyright (C) 2008, http://www.dabeaz.com 2-

Containers

• Programs often have to work many objects

• A portfolio of stocks

• Spreadsheets and matrices

• Three choices:

• Lists (ordered data)

• Dictionaries (unordered data)

• Sets (unordered collection)

17

Copyright (C) 2008, http://www.dabeaz.com 2-

Lists as a Container

• Use a list when the order of data matters

• Lists can hold any kind of object

• Example: A list of tuples

portfolio = [ ('GOOG',100,490.10), ('IBM',50,91.10), ('CAT',150,83.44)]

portfolio[0] ('GOOG',100,490.10)portfolio[1] ('IBM',50,91.10)

18

56

Page 61: Python Binder

Copyright (C) 2008, http://www.dabeaz.com 2-

Dicts as a Container• Dictionaries are useful if you want fast

random lookups (by key name)

• Example: A dictionary of stock prices

prices = { 'GOOG' : 513.25, 'CAT' : 87.22, 'IBM' : 93.37, 'MSFT' : 44.12 ...}

>>> prices['IBM']93.37>>> prices['GOOG']513.25>>>

19

Copyright (C) 2008, http://www.dabeaz.com 2-

Dict : Looking For Items

• To test for existence of a key

if key in d: # Yeselse: # No

20

• Looking up a value that might not exist

name = d.get(key,default)

• Example:

>>> prices.get('IBM',0.0)93.37>>> prices.get('SCOX',0.0)0.0>>>

57

Page 62: Python Binder

Copyright (C) 2008, http://www.dabeaz.com 2-

Dicts and Lists• You often get input data as a list of pairs

21

"GOOG",513.25"CAT",87.22"IBM",93.37"MSFT",44.12...

[ ('GOOG',513.25), ('CAT',87.22), ('IBM',93.37), ('MSFT',44.12),...]

file tuples

• dict() promotes a list of pairs to a dictionary

pricelist = [('GOOG',513.25),('CAT',87.22), ('IBM',93.37),('MSFT',44.12)]

prices = dict(pricelist)print prices['IBM']print prices['MSFT']

Copyright (C) 2008, http://www.dabeaz.com 2-

Sets• Sets

a = set([2,3,4])

• Holds collection of unordered items

• No duplicates, support common set ops

>>> a = set([2,3,4])>>> b = set([4,8,9])>>> a | b # Unionset([2,3,4,8,9])>>> a & b # Intersectionset([4])>>> a - b # Differenceset([2,3])>>>

22

58

Page 63: Python Binder

Copyright (C) 2008, http://www.dabeaz.com 2-

Exercise 2.2

Time : 20 minutes

23

Copyright (C) 2008, http://www.dabeaz.com 2-

Formatted Output

• When working with data, you often want to produce structured output (tables, etc.).

Name Shares Price ---------- ---------- ---------- AA 100 32.20 IBM 50 91.10 CAT 150 83.44 MSFT 200 51.23 GE 95 40.37 MSFT 50 65.10 IBM 100 70.44

24

59

Page 64: Python Binder

Copyright (C) 2008, http://www.dabeaz.com 2-

String Formatting

• Formatting operator (%)>>> "The value is %d" % 3'The value is 3'>>> "%5d %-5d %10d" % (3,4,5)' 3 4 5'>>> "%0.2f" % (3.1415926,)‘3.14’

• Requires single item or a tuple on right

• Commonly used with printprint "%d %0.2f %s" % (index,val,label)

• Format codes are same as with C printf()

25

Copyright (C) 2008, http://www.dabeaz.com 2-

Format Codes

%d Decimal integer%u Unsigned integer%x Hexadecimal integer%f Float as [-]m.dddddd%e Float as [-]m.dddddde+-xx%g Float, but selective use of E notation%s String%c Character%% Literal %

%10d Decimal in a 10-character field (right align)%-10d Decimal in a 10-character field (left align)%0.2f Float with 2 digit precision%40s String in a 40-character field (right align)%-40s String in a 40-character field (left align)

26

60

Page 65: Python Binder

Copyright (C) 2008, http://www.dabeaz.com 2-

String Formatting• Formatting with fields in a dictionary

>>> stock = {... 'name' : 'GOOG',... 'price' : 490.10,... 'shares' : 100}>>> print "%(name)8s %(shares)10d %(price)10.2f" % stock GOOG 100 490.10>>>

• Useful if performing replacements with a large number of named values

• This is Python's version of "string interpolation" ($variables in other langs)

27

Copyright (C) 2008, http://www.dabeaz.com 2-

Exercise 2.3

Time : 20 minutes

28

61

Page 66: Python Binder

Copyright (C) 2008, http://www.dabeaz.com 2-

Working with Sequences

• Python has three "sequence" datatypesa = "Hello" # Stringb = [1,4,5] # List c = ('GOOG',100,490.10) # Tuple

• Sequences are ordered : s[n]

a[0] 'H'b[-1] 5c[1] 100

• Sequences have a length : len(s)len(a) 5len(b) 3len(c) 3

29

Copyright (C) 2008, http://www.dabeaz.com 2-

Working with Sequences

• Sequences can be replicated : s * n>>> a = 'Hello' >>> a * 3'HelloHelloHello'>>> b = [1,2,3]>>> b * 2[1, 2, 3, 1, 2, 3]>>>

• Similar sequences can be concatenated : s + t>>> a = (1,2,3)>>> b = (4,5)>>> a + b(1,2,3,4,5)>>>

30

62

Page 67: Python Binder

Copyright (C) 2008, http://www.dabeaz.com 2-

Sequence Slicing• Slicing operator : s[start:end]

a = [0,1,2,3,4,5,6,7,8]

a[2:5] [2,3,4] a[-5:] [4,5,6,7,8]

a[:3] [0,1,2]

0 1 2 3 4 5 6 7 8

0 1 2 3 4 5 6 7 8

0 1 2 3 4 5 6 7 8

• Indices must be integers

• Slices do not include end value

• If indices are omitted, they default to the beginning or end of the list.

31

Copyright (C) 2008, http://www.dabeaz.com 2-

Extended Slices• Extended slicing: s[start:end:step]

a = [0,1,2,3,4,5,6,7,8]

a[:5] [0,1,2,3,4]

a[-5:] [4,5,6,7,8]

a[0:5:2] [0,2,4]

a[::-2] [8,6,4,2,0]

a[6:2:-1] [6,5,4,3]

• step indicates stride and direction

• end index is not included in result

0 1 2 3 4 5 6 7 8

0 1 2 3 4 5 6 7 8

0 1 2 3 4 5 6 7 8

0 1 2 3 4 5 6 7 8

0 1 2 3 4 5 6 7 8

32

63

Page 68: Python Binder

Copyright (C) 2008, http://www.dabeaz.com 2-

Sequence Reductions

• sum(s)

>>> s = [1, 2, 3, 4]>>> sum(s)10>>>

• min(s), max(s)

>>> min(s)1>>> max(s)4>>> max(t)'World'>>>

33

Copyright (C) 2008, http://www.dabeaz.com 2-

Iterating over a Sequence

• The for-loop iterates over sequence data

• On each iteration of the loop, you get new item of data to work with.

>>> s = [1, 4, 9, 16]>>> for i in s:... print i...14916>>>

34

64

Page 69: Python Binder

Copyright (C) 2008, http://www.dabeaz.com 2-

Iteration Variables• Each time through the loop, a new value is

placed into an iteration variable

for x in s: statements:

• Overwrites the previous value (if any)

iteration variable

x = 42

for x in s: # Overwrites any previous x statements

print x # Prints value from last iteration

• After the loop finishes, the variable has the value from the last iteration of the loop

35

Copyright (C) 2008, http://www.dabeaz.com 2-

break and continue

• Jumping to the start of the next iteration

• Breaking out of a loop (exiting)

for name in namelist: if name == username: break

for line in lines: if not line: continue # More statements ...

• These statements only apply to the inner-most loop that is active

36

65

Page 70: Python Binder

Copyright (C) 2008, http://www.dabeaz.com 2-

Looping over integers

• xrange([start,] end [,step])

for i in xrange(100): # i = 0,1,...,99

for j in xrange(10,20): # j = 10,11,..., 19

for k in xrange(10,50,2): # k = 10,12,...,48

• Note: The ending value is never included (this mirrors the behavior of slices)

• If you simply need to count, use xrange()

37

Copyright (C) 2008, http://www.dabeaz.com 2-

Caution with range()• range([start,] end [,step])

x = range(100) # x = [0,1,...,99]y = range(10,20) # y = [10,11,...,19]z = range(10,50,2) # z = [10,12,...,48]

• range() creates a list of integers

• You will sometimes see this code

for i in range(N): statements

• If you are only looping, use xrange() instead. It computes its values on demand instead of creating a list.

38

66

Page 71: Python Binder

Copyright (C) 2008, http://www.dabeaz.com 2-

enumerate() Function

• Provides a loop counter value

names = ["Elwood","Jake","Curtis"]for i,name in enumerate(names): # Loops with i = 0, name = 'Elwood' # i = 1, name = 'Jake' # i = 2, name = 'Curtis' ...

• Example: Keeping a line numberfor linenumber,line in enumerate(open(filename)): ...

39

Copyright (C) 2008, http://www.dabeaz.com 2-

enumerate() Function

• enumerate() is a nice shortcutfor i,x in enumerate(s): statements

• Compare to:i = 0for x in s: statements i += 1

• Less typing and enumerate() runs slightly faster

40

67

Page 72: Python Binder

Copyright (C) 2008, http://www.dabeaz.com 2-

for and tuples

• Looping with multiple iteration variablespoints = [ (1,4),(10,40),(23,14),(5,6),(7,8)]

for x,y in points: # Loops with x = 1, y = 4 # x = 10, y = 40 # x = 23, y = 14 # ...

• Here, each tuple is unpacked into a set of iteration variables.

tuples are expanded

41

Copyright (C) 2008, http://www.dabeaz.com 2-

zip() Function

• Combines multiple sequences into tuples

a = [1,4,9]b = ['Jake','Elwood','Curtis']

x = zip(a,b) # x = [(1,'Jake'),(4,'Elwood'), ...]

• One use: Looping over multiple listsa = [1,4,9]b = ['Jake','Elwood','Curtis']

for num,name in zip(a,b): # Statements ...

42

68

Page 73: Python Binder

Copyright (C) 2008, http://www.dabeaz.com 2-

zip() Function

• zip() always stops with shortest sequence

a = [1,2,3,4,5,6]b = ['Jake','Elwood']

x = zip(a,b) # x = [(1,'Jake'),(2,'Elwood')]

• You may combine as many sequences as needed

a = [1, 2, 3, 4, 5, 6]b = ['a', 'b', 'c']c = [10, 20, 30]

x = zip(a,b,c) # x = [(1,'a',10),(2,'b',20),...]y = zip(a,zip(b,c)) # y = [(1,('a',10)),(2,('b',20)),...]

43

Copyright (C) 2008, http://www.dabeaz.com 2-

Using zip()

• zip() is also a nice programming shortcutfor a,b in zip(s,t): statements

• Compare to:i = 0while i < len(s) and i < len(t): a = s[i] b = t[i] statements

• Caveat: zip() constructs a list of tuples. Be careful when working with large data

44

69

Page 74: Python Binder

Copyright (C) 2008, http://www.dabeaz.com 2-

Exercise 2.4

Time : 10 minutes

45

Copyright (C) 2008, http://www.dabeaz.com 2-

List Sorting

• Lists can be sorted "in-place" (sort method)

s = [10,1,7,3]s.sort() # s = [1,3,7,10]

• Sorting in reverse order

s = [10,1,7,3]s.sort(reverse=True) # s = [10,7,3,1]

• Sorting works with any ordered types = ["foo","bar","spam"]s.sort() # s = ["bar","foo","spam"]

46

70

Page 75: Python Binder

Copyright (C) 2008, http://www.dabeaz.com 2-

List Sorting

• Sometimes you need to perform extra processing while sorting

• Example: Case-insensitive string sort

>>> s = ["hello","WORLD","test"]>>> s.sort()>>> s['WORLD','hello','test']>>>

• Here, we might like to fix the order

47

Copyright (C) 2008, http://www.dabeaz.com 2-

List Sorting• Sorting with a key function:

>>> def tolower(x):... return x.lower()...>>> s = ["hello","WORLD","test"]>>> s.sort(key=tolower)>>> s['hello','test','WORLD']>>>

• The key function is a "callback function" that the sort() method applies to each item

• The value returned by the key function determines the sort order

48

71

Page 76: Python Binder

Copyright (C) 2008, http://www.dabeaz.com 2-

Sequence Sorting

• sorted() function

• Turns any sequence into a sorted list

>>> sorted("Python")['P', 'h', 'n', 'o', 't', 'y']>>> sorted((5,1,9,2))[1, 2, 5, 9]

49

• This is not an in-place sort--a new list is returned as a result.

• The same key and reverse options can be supplied if necessary

Copyright (C) 2008, http://www.dabeaz.com 2-

Exercise 2.5

Time : 10 Minutes

50

72

Page 77: Python Binder

Copyright (C) 2008, http://www.dabeaz.com 2-

List Processing• Working with lists is very common

• Python is very adept at processing lists

• Have already seen many examples:

>>> a = [1, 2, 3, 4, 5]>>> sum(a)15>>> a[0:3][1, 2, 3]>>> a * 2[1, 2, 3, 4, 5, 1, 2, 3, 4, 5]>>> min(a)1>>>

51

Copyright (C) 2008, http://www.dabeaz.com 2-

List Comprehensions

• Creates a new list by applying an operation to each element of a sequence.

>>> a = [1,2,3,4,5]>>> b = [2*x for x in a]>>> b[2,4,6,8,10]>>>

• Another example:>>> names = ['Elwood','Jake']>>> a = [name.lower() for name in names]>>> a['elwood','jake']>>>

52

73

Page 78: Python Binder

Copyright (C) 2008, http://www.dabeaz.com 2-

List Comprehensions

• A list comprehension can also filter

>>> f = open("stockreport","r")>>> goog = [line for line in f if 'GOOG' in line]>>>

>>> a = [1, -5, 4, 2, -2, 10]>>> b = [2*x for x in a if x > 0]>>> b[2,8,4,20]>>>

• Another example

53

Copyright (C) 2008, http://www.dabeaz.com 2-

List Comprehensions

• General syntax

[expression for x in s if condition]

• What it meansresult = []for x in s: if condition: result.append(expression)

• Can be used anywhere a sequence is expected>>> a = [1,2,3,4]>>> sum([x*x for x in a])30>>>

54

74

Page 79: Python Binder

Copyright (C) 2008, http://www.dabeaz.com 2-

List Comp: Examples

• List comprehensions are hugely useful

• Collecting the values of a specific field

stocknames = [s['name'] for s in stocks]

• Performing database-like queries

a = [s for s in stocks if s['price'] > 100 and s['shares'] > 50 ]

• Quick mathematics over sequences

cost = sum([s['shares']*s['price'] for s in stocks])

55

Copyright (C) 2008, http://www.dabeaz.com 2-

Historical Digression

• List comprehensions come from Haskell

a = [x*x for x in s if x > 0] # Python

a = [x*x | x <- s, x > 0] # Haskell

• And this is motivated by sets (from math)

a = { x2 | x ! s, x > 0 }

• But most Python programmers would probably just view this as a "cool shortcut"

56

75

Page 80: Python Binder

Copyright (C) 2008, http://www.dabeaz.com 2-

List Comp. and Awk

• For Unix hackers, there is a certain similarity between list comprehensions and short one-line awk commands

# A Python List Comprehensiontotalcost = sum([shares*price for name, shares, price in portfolio])

# A Unix awk commandtotalcost = `awk '{ total += $2*$3 } END { print total }' portfolio.dat`

57

• Applying an operation to every line of a file

Copyright (C) 2008, http://www.dabeaz.com 2-

Big Idea: Being Declarative

• List comprehensions encourage a more "declarative" style of programming when processing sequences of data.

• Data can be manipulated by simply "declaring" a series of statements that perform various operations on it.

58

76

Page 81: Python Binder

Copyright (C) 2008, http://www.dabeaz.com 2-

Exercise 2.6

Time : 15 Minutes

59

Copyright (C) 2008, http://www.dabeaz.com 2-

More details on objects

• So far: a tour of the most common types

• Have skipped some critical details

• Memory management

• Copying

• Type checking

60

77

Page 82: Python Binder

Copyright (C) 2008, http://www.dabeaz.com 2-

Variable Assignment

• Variables in Python are names for values

• A variable name does not represent a fixed memory location into which values are stored (like C, C++, Fortran, etc.)

• Assignment is just a naming operation

61

Copyright (C) 2008, http://www.dabeaz.com 2-

Variables and Values• At any time, a variable can be redefined to

refer to a new value

a = 42

...

a = "Hello"

42"a"

• Variables are not restricted to one data type

• Assignment doesn't overwrite the previous value (e.g., copy over it in memory)

• It just makes the name point elsewhere62

"Hello""a"

78

Page 83: Python Binder

Copyright (C) 2008, http://www.dabeaz.com 2-

Names, Values, Types• Names do not have a "type"--it's just a name

• However, values do have an underlying type

>>> a = 42>>> b = "Hello World">>> type(a)<type 'int'>>>> type(b)<type 'str'>

• type() function will tell you what it is

• The type name is usually a function that creates or converts a value to that type

>>> str(42)'42'

63

Copyright (C) 2008, http://www.dabeaz.com 2-

Reference Counting• Variable assignment never copies anything!

• Instead, it just updates a reference count

a = 42b = ac = [1,2]c.append(b)

42"a"

"b"

"c"

ref = 3

[x, x, x]

• So, different variables might be referring to the same object (check with the is operator)>>> a is bTrue>>> a is c[2]True

64

79

Page 84: Python Binder

Copyright (C) 2008, http://www.dabeaz.com 2-

Reference Counting• Reassignment never overwrites memory, so you

normally don't notice any of the sharing

a = 42b = a

42

"a" ref = 2

• When you reassign a variable, the name is just made to point to the new value.

a = 37 42

"a"

ref = 1

37

ref = 1

65

"b"

"b"

Copyright (C) 2008, http://www.dabeaz.com 2-

The Hidden Danger• "Copying" mutable objects such as lists and dicts

>>> a = [1,2,3,4]>>> b = a>>> b[2] = -10>>> a[1,2,-10,4]

[1,2,-10,4]"a"

"b"

• Changes affect both variables!

• Reason: Different variable names are referring to exactly the same object

• Yikes!

66

80

Page 85: Python Binder

Copyright (C) 2008, http://www.dabeaz.com 2-

Making a Copy• You have to take special steps to copy data

>>> a = [2,3,[100,101],4]>>> b = list(a) # Make a copy>>> a is bFalse

• It's a new list, but the list items are shared

>>> a[2].append(102)>>> b[2][100,101,102]>>> 100 101 1022 3 4

a

bThis inner list isstill being shared

67

• Known as a "shallow copy"

Copyright (C) 2008, http://www.dabeaz.com 2-

Deep Copying

• Use the copy module

>>> a = [2,3,[100,101],4]>>> import copy>>> b = copy.deepcopy(a)>>> a[2].append(102)>>> b[2][100,101]>>>

• Sometimes you need to makes a copy of an object and all objects contained within it

68

81

Page 86: Python Binder

Copyright (C) 2008, http://www.dabeaz.com 2-

Everything is an object

• Numbers, strings, lists, functions, exceptions, classes, instances, etc...

• All objects are said to be "first-class"

• Meaning: All objects that can be named can be passed around as data, placed in containers, etc., without any restrictions.

• There are no "special" kinds of objects

69

Copyright (C) 2008, http://www.dabeaz.com 2-

First Class Objects• A simple example:

>>> import math>>> items = [abs, math, ValueError ]>>> items[<built-in function abs>, <module 'math' (builtin)>, <type 'exceptions.ValueError'>]>>> items[0](-45)45>>> items[1].sqrt(2)1.4142135623730951>>> try: x = int("not a number") except items[2]: print "Failed!"

Failed!>>>

70

A list containing a function, a module, and an exception.

You can use items in the list in place of the

original names

82

Page 87: Python Binder

Copyright (C) 2008, http://www.dabeaz.com 2-

Type Checking

• How to tell if an object is a specific type

if type(a) is list: print "a is a list"

if isinstance(a,list): # Preferred print "a is a list"

• Checking for one of many types

if isinstance(a,(list,tuple)): print "a is a list or tuple"

71

Copyright (C) 2008, http://www.dabeaz.com 2-

Summary

• Have looked at basic principles of working with data in Python programs

• Brief look at part of the object-model

• A big part of understanding most Python programs.

72

83

Page 88: Python Binder

Copyright (C) 2008, http://www.dabeaz.com 2-

Exercise 2.7

Time : 15 Minutes

73

84

Page 89: Python Binder

Program Organization and Functions

Section 3

Copyright (C) 2008, http://www.dabeaz.com 3-

Overview

• How to organize larger programs

• More details on program execution

• Defining and working with functions

• Exceptions and Error Handling

2

85

Page 90: Python Binder

Copyright (C) 2008, http://www.dabeaz.com 3-

Observation

• A large number of Python programmers spend most of their time writing short "scripts"

• One-off problems, prototyping, testing, etc.

• Python is good at this!

• And it what draws many users to Python

3

Copyright (C) 2008, http://www.dabeaz.com 3-

What is a "Script?"

• A "script" is a program that simply runs a series of statements and stops

# program.py

statement1

statement2

statement3

...

4

• We've been writing scripts to this point

86

Page 91: Python Binder

Copyright (C) 2008, http://www.dabeaz.com 3-

Problem

• If you write a useful script, it will grow features

• You may apply it to other related problems

• Over time, it might become a critical application

• And it might turn into a huge tangled mess

• So, let's get organized...

5

Copyright (C) 2008, http://www.dabeaz.com 3-

Program Structure

• Recall: programs are a sequence of statements

height = 442 # Meters

thickness = 0.1*(0.001) # Meters (0.1 millimeter)

numfolds = 0

while thickness <= height:

thickness = thickness * 2

numfolds = numfolds + 1

print numfolds, thickness

print numfolds, "folds required"

print "final thickness is", thickness, "meters"

6

• Programs run by executing the statements in the same order that they are listed.

87

Page 92: Python Binder

Copyright (C) 2008, http://www.dabeaz.com 3-

Defining Things• You must always define things before they

get used later on in a program.

a = 42

b = a + 2 # Requires that a is already defined

def square(x):

return x*x

z = square(b) # Requires square to be defined

7

• The order is important

• You almost always put the definitions of variables and functions near the beginning

Copyright (C) 2008, http://www.dabeaz.com 3-

Defining Functions

• It is a good idea to put all of the code related to a single "task" all in one placedef read_prices(filename):

pricesDict = {}

for line in open(filename):

fields = line.split(',')

name = fields[0].strip('"')

pricesDict[name] = float(fields[1])

return pricesDict

8

• A function also simplifies repeated operations

oldprices = read_prices("oldprices.csv")

newprices = read_prices("newprices.csv")

88

Page 93: Python Binder

Copyright (C) 2008, http://www.dabeaz.com 3-

What is a function?

• A function is a sequence of statementsdef funcname(args):

statement

statement

...

statement

9

• Any Python statement can be used inside

def foo():

import math

print math.sqrt(2)

help(math)

• There are no "special" statements in Python

Copyright (C) 2008, http://www.dabeaz.com 3-

Function Definitions

• Functions can be defined in any order

def foo(x):

bar(x)

def bar(x):

statements

10

• Functions must only be defined before they are actually used during program execution

foo(3) # foo must be defined already

def bar(x)

statements

def foo(x):

bar(x)

• Stylistically, it is more common to see functions defined in a "bottom-up" fashion

89

Page 94: Python Binder

Copyright (C) 2008, http://www.dabeaz.com 3-

Bottom-up Style• Functions are treated as building blocks

• The smaller/simpler blocks go first

# myprogram.py

def foo(x):

...

def bar(x):

...

foo(x)

...

def spam(x):

...

bar(x)

...

spam(42) # Call spam() to do something

11

Later functions build upon earlier functions

Code that uses the functions appears at the end

Copyright (C) 2008, http://www.dabeaz.com 3-

A Definition Caution• Functions can be redefined!

def foo(x):

return 2*x

print foo(2) # Prints 4

def foo(x,y): # Redefine foo(). This replaces

return x*y # foo() above.

print foo(2,3) # Prints 6

print foo(2) # Error : foo takes two arguments

12

• A repeated function definition silently replaces the previous definition

• No overloaded functions (vs. C++, Java).

90

Page 95: Python Binder

Copyright (C) 2008, http://www.dabeaz.com 3-

Exercise 3.1

13

Time : 15 minutes

Copyright (C) 2008, http://www.dabeaz.com 3-

Function Design

• Functions should be easy to use

• The whole point of a function is to define code for repeated operations

• Some things to think about:

• Function inputs and output

• Side-effects

14

91

Page 96: Python Binder

Copyright (C) 2008, http://www.dabeaz.com 3-

Function Arguments

• Functions operate on passed arguments

def square(x):

return x*x

15

• Argument variables receive their values when the function is calleda = square(3)

argument

• The argument names are only visible inside the function body (are local to function)

Copyright (C) 2008, http://www.dabeaz.com 3-

Default Arguments

• Sometimes you want an optional argument

def read_prices(filename,delimiter=','):

...

d = read_prices("prices.csv")

e = read_prices("prices.dat",' ')

• If a default value is assigned, the argument is optional in function calls

16

• Note : Arguments with defaults must appear at the end of the argument list (all non-optional arguments go first)

92

Page 97: Python Binder

Copyright (C) 2008, http://www.dabeaz.com 3-

Calling a Function

• Consider a simple function

prices = read_prices("prices.csv",",")

• Calling with "positional" args

17

• This places values into the arguments in the same order as they are listed in the function definition (by position)

def read_prices(filename,delimiter):

...

Copyright (C) 2008, http://www.dabeaz.com 3-

Keyword Arguments

prices = read_prices(filename="price.csv",

delimiter=",")

• Calling with "keyword" arguments

18

• Here, you explicitly name and assign a value to each argument

• Common use case : functions with many optional features

def sort(data,reverse=False,key=None):

statements

sort(s,reverse=True)

93

Page 98: Python Binder

Copyright (C) 2008, http://www.dabeaz.com 3-

Mixed Arguments

read_prices("prices.csv",delimiter=',')

• Positional and keyword arguments can be mixed together in a function call

19

• The positional arguments must go first.

• Basic rules:

• All required arguments get values

• No duplicate argument values

Copyright (C) 2008, http://www.dabeaz.com 3-

Design Tip

• Always give short, but meaningful names to function arguments

• Someone using a function may want to use the keyword calling style

20

d = read_prices("prices.csv", delimiter=",")

• Python development tools will show the names in help features and documentation

94

Page 99: Python Binder

Copyright (C) 2008, http://www.dabeaz.com 3-

Design Tip

• Don't write functions that take a huge number of input parameters

• You are not going to remember how to call your 20-argument function (nor will you probably want to).

• Functions should only take a few inputs (fewer than 4 as a rule of thumb)

21

Copyright (C) 2008, http://www.dabeaz.com 3-

Design Tip• Don't transform function inputs

22

def read_data(filename):

f = open(filename)

for line in f:

...

• It limits the flexibility of the function

• Here is a better versiondef read_data(f):

for line in f:

...

# Sample use - pass in a different kind of file

import gzip

f = gzip.open("portfolio.gz")

portfolio = read_data(f)

Transforms filename into an open file

95

Page 100: Python Binder

Copyright (C) 2008, http://www.dabeaz.com 3-

Return Values

• return statement returns a value

def square(x):

return x*x

23

• If no return value, None is returneddef bar(x):

statements

return

a = bar(4) # a = None

• Return value is discarded if not assigned/usedsquare(4) # Calls square() but discards result

Copyright (C) 2008, http://www.dabeaz.com 3-

Multiple Return Values

• A function may return multiple values by returning a tupledef divide(a,b):

q = a // b # Quotient

r = a % b # Remainder

return q,r # Return a tuple

24

• Usage examples:

x = divide(37,5) # x = (7,2)

x,y = divide(37,5) # x = 7, y = 2

96

Page 101: Python Binder

Copyright (C) 2008, http://www.dabeaz.com 3-

Side Effects• When you call a function, the arguments

are simply names for passed values

• If mutable data types are passed (e.g., lists, dicts), they can be modified "in-place"

25

def square_list(s):

for i in xrange(len(s)):

s[i] = s[i]*s[i]

a = [1, 2, 3]

square_list(a)

print a # [1, 4, 9]

Modifies the input object

• Such changes are an example of "side effects"

Copyright (C) 2008, http://www.dabeaz.com 3-

Design Tip

• Don't modify function inputs

• When data moves around around in a program, it always does so by reference.

• If you modify something, it may affect other parts of the program you didn't expect

• And it will be really hard to debug

• Caveat: It depends on the problem

26

97

Page 102: Python Binder

Copyright (C) 2008, http://www.dabeaz.com 3-

Understanding Variables

• Programs assign values to variables

• Variable assignments occur outside and inside function definitions

• Variables defined outside are "global"

• Variables inside a function are "local"

27

x = value # Global variable

def foo():

y = value # Local variable

Copyright (C) 2008, http://www.dabeaz.com 3-

Local Variables• Variables inside functions are private

• Values not retained or accessible after return>>> stocks = read_portfolio("stocks.dat")

>>> fields

Traceback (most recent call last):

File "<stdin>", line 1, in ?

NameError: name 'fields' is not defined

>>>

28

def read_portfolio(filename):

portfolio = []

for line in open(filename):

fields = line.split()

s = (fields[0],int(fields[1]),float(fields[2]))

portfolio.append(s)

return portfolio

• Don't conflict with variables found elsewhere

98

Page 103: Python Binder

Copyright (C) 2008, http://www.dabeaz.com 3-

Global Variables• Functions can access the values of globals

• This is a useful technique for dealing with general program settings and other details

• However, you don't want to go overboard (otherwise the program becomes fragile)

29

delimiter = ','

def read_portfolio(filename):

...

for line in open(filename):

fields = line.split(delimiter)

...

Copyright (C) 2008, http://www.dabeaz.com 3-

Modifying Globals• One quirk: Functions can't modify globals

30

delimiter = ','

def set_delimiter(newdelimiter):

delimiter = newdelimiter

• Example:

>>> delimiter

','

>>> set_delimiter(':')

>>> delimiter

','

>>>Notice no change

• All assignments in functions are local

99

Page 104: Python Binder

Copyright (C) 2008, http://www.dabeaz.com 3-

Modifying Globals

• If you want to modify a global variable you must declare it as such in the function

delimiter = ','

def set_delimiter(newdelimiter):

global delimiter

delimiter = newdelimiter

• global declaration must appear before use

• Only necessary for globals that will be modified (globals are already readable)

31

Copyright (C) 2008, http://www.dabeaz.com 3-

Design Tip

• If you use the global declaration a lot, you are probably making things difficult

• It's generally a bad idea to write programs that modify lots of global variables

• Better off using a user defined class (later)

32

100

Page 105: Python Binder

Copyright (C) 2008, http://www.dabeaz.com 3-

Exercise 3.2

33

Time : 20 minutes

Copyright (C) 2008, http://www.dabeaz.com 3-

More on Functions

• There are additional details

• Variable argument functions

• Error checking

• Callback functions

• Anonymous functions

• Documentation strings

34

101

Page 106: Python Binder

Copyright (C) 2008, http://www.dabeaz.com 3-

Variable Arguments

• Function that accepts any number of args

def foo(x,*args):

...

• Here, the arguments get passed as a tuple

foo(1,2,3,4,5)

def foo(x,*args):

(2,3,4,5)

35

1

Copyright (C) 2008, http://www.dabeaz.com 3-

Variable Arguments

• Example:

def print_headers(*headers):

for h in headers:

print "%10s" % h,

print

print ("-"*10 + " ")*len(headers)

• Usage:

36

>>> print_headers('Name','Shares','Price')

Name Shares Price

---------- ---------- ----------

>>>

102

Page 107: Python Binder

Copyright (C) 2008, http://www.dabeaz.com 3-

Variable Arguments

• Function that accepts any keyword argsdef foo(x,y,**kwargs):

...

• Extra keywords get passed in a dict

foo(2,3,flag=True,mode="fast",header="debug")

def foo(x,y,**kwargs):

...

{ 'flag' : True,

'mode' : 'fast',

'header' : 'debug' }

37

Copyright (C) 2008, http://www.dabeaz.com 3-

Variable Keyword Use• Accepting any number of keyword args is

useful when there is a lot of complicated configuration featuresdef plot_data(data, **opts):

color = opts.get('color','black')

symbols = opts.get('symbol',None)

...

plot_data(x,

color="blue",

bgcolor="white",

title="Energy",

xmin=-4.0, xmax=4.0,

ymin=-10.0, ymax=10.0,

xaxis_title="T",

yaxis_title="KE",

...

)38

103

Page 108: Python Binder

Copyright (C) 2008, http://www.dabeaz.com 3-

Variable Arguments

• A function that takes any argumentsdef foo(*args,**kwargs):

statements

• This will accept any combination of positional or keyword arguments

• Sometimes used when writing wrappers or when you want to pass arguments through to another function

39

Copyright (C) 2008, http://www.dabeaz.com 3-

Passing Tuples and Dicts

• Tuples can be expand into function args

args = (2,3,4)

foo(1, *args) # Same as foo(1,2,3,4)

• Dictionaries can expand to keyword args

40

kwargs = {

'color' : 'red',

'delimiter' : ',',

'width' : 400 }

foo(data, **kwargs)

# Same as foo(data,color='red',delimiter=',',width=400)

• These are not commonly used except when writing library functions.

104

Page 109: Python Binder

Copyright (C) 2008, http://www.dabeaz.com 3-

Error Checking

• Python performs no checking or validation of function argument types or values

• A function will work on any data that is compatible with the statements in the function

def add(x,y):

return x + y

add(3,4) # 7

add("Hello","World") # "HelloWorld"

add([1,2],[3,4]) # [1,2,3,4]

• Example:

41

Copyright (C) 2008, http://www.dabeaz.com 3-

Error Checking• If there are errors in a function, they will

show up at run time (as an exception)

def add(x,y):

return x+y

>>> add(3,"hello")

Traceback (most recent call last):

...

TypeError: unsupported operand type(s) for +:

'int' and 'str'

>>>

• Example:

42

• To verify code, there is a strong emphasis on testing (covered later)

105

Page 110: Python Binder

Copyright (C) 2008, http://www.dabeaz.com 3-

Error Checking

• Python also performs no checking of function return values

• Inconsistent use does not result in an error

def foo(x,y):

if x:

return x + y

else:

return

• Example:

43

Inconsistent useof return

(not checked)

Copyright (C) 2008, http://www.dabeaz.com 3-

Callback Functions

• Sometimes functions are written to rely on a user-supplied function that performs some kind of special processing.

• Example: Case-insensitive sortdef lowerkey(s):

return s.lower()

names.sort(key=lowerkey)

• Here, the sort() operation calls the lowerkey function as part of its processing (a callback).

44

106

Page 111: Python Binder

Copyright (C) 2008, http://www.dabeaz.com 3-

Anonymous Functions

• lambda statement

names.sort(key=lambda s: s.lower())

• Creates a function that evaluates an expressionlowerkey = lambda s: s.lower()

# Same as

def lowerkey(s): return s.lower()

• Since callbacks are often short expressions, there is a shortcut syntax for it

45

• lambda is highly restricted. It can only be a single Python expression.

Copyright (C) 2008, http://www.dabeaz.com 3-

Documentation Strings• First line of function may be string

def divide(a,b):

"""Divides a by b and returns a quotient and

remainder. For example:

>>> divide(9/5)

(1,4)

>>> divide(15,4)

(3,3)

>>>

"""

q = a // b

r = a % b

return (q,r)

• Including a docstring is considered good style

• Why?

46

107

Page 112: Python Binder

Copyright (C) 2008, http://www.dabeaz.com 3-

Docstring Benefits

• Online help>>> help(divide)

Help on function divide in module foo:

divide(a, b)

Divides a by b and returns a quotient and

remainder. For example:

>>> divide(9/5)

(1,4)

>>> divide(15,4)

(3,3)

>>>

• Many IDEs/tools look at docstrings

47

Copyright (C) 2008, http://www.dabeaz.com 3-

Docstring Benefits

• Testing

• Testing tools look at interaction session

• Example: doctest module (later)

def divide(a,b):

"""Divides a by b and returns a quotient and

remainder. For example:

>>> divide(9/5)

(1,4)

>>> divide(15,4)

(3,3)

>>>

"""

48

108

Page 113: Python Binder

Copyright (C) 2008, http://www.dabeaz.com 3-

Exercise 3.3

49

Time : 10 minutes

Copyright (C) 2008, http://www.dabeaz.com 3-

Exceptions

• Used to signal errors

• Catching an exception (try)

• Raising an exception (raise)

if name not in names:

raise RuntimeError("Name not found")

try:

authenticate(username)

except RuntimeError,e:

print e

50

109

Page 114: Python Binder

Copyright (C) 2008, http://www.dabeaz.com 3-

Exceptions• Exceptions propagate to first matching except

def foo():

try:

bar()

except RuntimeError,e:

...

def bar():

try:

spam()

except RuntimeError,e:

...

def spam():

grok()

def grok():

...

raise RuntimeError("Whoa!")

51

Copyright (C) 2008, http://www.dabeaz.com 3-

Builtin-Exceptions• About two-dozen built-in exceptions

ArithmeticError

AssertionError

EnvironmentError

EOFError

ImportError

IndexError

KeyboardInterrupt

KeyError

MemoryError

NameError

ReferenceError

RuntimeError

SyntaxError

SystemError

TypeError

ValueError

52

• Consult reference

110

Page 115: Python Binder

Copyright (C) 2008, http://www.dabeaz.com 3-

Exception Values• Most exceptions have an associated value

• More information about what's wrongraise RuntimeError("Invalid user name")

• Passed to variable supplied in excepttry:

...

except RuntimeError,e:

...

• It's an instance of the exception type, but often looks like a string

except RuntimeError,e:

print "Failed : Reason", e

53

Copyright (C) 2008, http://www.dabeaz.com 3-

Catching Multiple Errors• Can catch different kinds of exceptions

try:

...

except LookupError,e:

...

except RuntimeError,e:

...

except IOError,e:

...

except KeyboardInterrupt,e:

...

• Alternatively, if handling is sametry:

...

except (IOError,LookupError,RuntimeError),e:

...

54

111

Page 116: Python Binder

Copyright (C) 2008, http://www.dabeaz.com 3-

Catching All Errors

• Catching any exception

try:

...

except Exception:

print "An error occurred"

• Ignoring an exception (pass)try:

...

except RuntimeError:

pass

55

Copyright (C) 2008, http://www.dabeaz.com 3-

Exploding Heads

• The wrong way to use exceptions:

try:

go_do_something()

except Exception:

print "Couldn't do something"

• This swallows all possible errors that might occur.

• May make it impossible to debug if code is failing for some reason you didn't expect at all (e.g., uninstalled Python module, etc.)

56

112

Page 117: Python Binder

Copyright (C) 2008, http://www.dabeaz.com 3-

A Better Approach

• This is a somewhat more sane approach

try:

go_do_something()

except Exception, e:

print "Couldn't do something. Reason : %s\n" % e

57

• Reports a specific reason for the failure

• It is almost always a good idea to have some mechanism for viewing/reporting errors if you are writing code that catches all possible exceptions

Copyright (C) 2008, http://www.dabeaz.com 3-

finally statement

• Specifies code that must run regardless of whether or not an exception occurs

lock = Lock()

...

lock.acquire()

try:

...

finally:

lock.release() # release the lock

• Commonly use to properly manage resources (especially locks, files, etc.)

58

113

Page 118: Python Binder

Copyright (C) 2008, http://www.dabeaz.com 3-

Program Exit• Program exit is handle through exceptions

raise SystemExit

raise SystemExit(exitcode)

• An alternative sometimes seenimport sys

sys.exit()

sys.exit(exitcode)

59

• Catching keyboard interrupt (Control-C)

try:

statements

except KeyboardInterrupt:

statements

Copyright (C) 2008, http://www.dabeaz.com 3-

Exercise 3.4

60

Time : 10 minutes

114

Page 119: Python Binder

Modules and LibrariesSection 4

Copyright (C) 2008, http://www.dabeaz.com 4-

Overview

• How to place code in a module

• Useful applications of modules

• Some common standard library modules

• Installing third party libraries

2

115

Page 120: Python Binder

Copyright (C) 2008, http://www.dabeaz.com 4-

Modules

• Any Python source file is a module

• import statement loads and executes a module

# foo.py

def grok(a):

...

def spam(b):

...

import foo

a = foo.grok(2)

b = foo.spam("Hello")

...

3

Copyright (C) 2008, http://www.dabeaz.com 4-

Namespaces

• A module is a collection of named values

• It's said to be a "namespace"

• The names correspond to global variables and functions defined in the source file

• You use the module name to access

>>> import foo

>>> foo.grok(2)

>>>

4

• Module name is tied to source (foo -> foo.py)

116

Page 121: Python Binder

Copyright (C) 2008, http://www.dabeaz.com 4-

Module Execution

• When a module is imported, all of the statements in the module execute one after another until the end of the file is reached

• If there are scripting statements that carry out tasks (printing, creating files, etc.), they will run on import

• The contents of the module namespace are all of the global names that are still defined at the end of this execution process

5

Copyright (C) 2008, http://www.dabeaz.com 4-

Globals Revisited

• Everything defined in the "global" scope is what populates the module namespace

# foo.py

x = 42

def grok(a):

...

6

• Different modules can use the same names and those names don't conflict with each other (modules are isolated)

# bar.py

x = 37

def spam(a):

...

These definitions of x are different

117

Page 122: Python Binder

Copyright (C) 2008, http://www.dabeaz.com 4-

Running as "Main"• Source files can run in two "modes"

7

• Sometimes you want to check

if __name__ == '__main__':

print "Running as the main program"

else:

print "Imported as a module using import"

shell % python foo.py # Running as main

import foo # Loaded as a module

• __name__ is the name of the module

• The main program (and interactive interpreter) module is __main__

Copyright (C) 2008, http://www.dabeaz.com 4-

Module Loading

• Each module loads and executes once

• Repeated imports just return a reference to the previously loaded module

• sys.modules is a dict of all loaded modules

8

>>> import sys

>>> sys.modules.keys()

['copy_reg', '__main__', 'site', '__builtin__',

'encodings', 'encodings.encodings', 'posixpath', ...]

>>>

118

Page 123: Python Binder

Copyright (C) 2008, http://www.dabeaz.com 4-

Locating Modules

• When looking for modules, Python first looks in the current working directory of the main program

• If a module can't be found there, an internal module search path is consulted

>>> import sys

>>> sys.path

['', '/Library/Frameworks/

Python.framework/Versions/2.5/lib/

python25.zip', '/Library/Frameworks/

Python.framework/Versions/2.5/lib/

python2.5', ...

]

9

Copyright (C) 2008, http://www.dabeaz.com 4-

Module Search Path

• sys.path contains search path

• Paths also added via environment variables

• Can manually adjust if you need toimport sys

sys.path.append("/project/foo/pyfiles")

% env PYTHONPATH=/project/foo/pyfiles python

Python 2.4.3 (#1, Apr 7 2006, 10:54:33)

[GCC 4.0.1 (Apple Computer, Inc. build 5250)] on darwin

>>> import sys

>>> sys.path

['','/project/foo/pyfiles', '/Library/Frameworks ...]

10

119

Page 124: Python Binder

Copyright (C) 2008, http://www.dabeaz.com 4-

Import Process

• Import looks for specific kinds of filesname.pyc # Compiled Python

name.pyo # Compiled Python (optimized)

name.py # Python source file

name.dll # Dynamic link library (C/C++)

• If module is a .py file, it is first compiled into bytecode and a .pyc/.pyo file is created

• .pyc/.pyo files regenerated as needed should the original source file change

11

Copyright (C) 2008, http://www.dabeaz.com 4-

Exercise 4.1

12

Time : 15 Minutes

120

Page 125: Python Binder

Copyright (C) 2008, http://www.dabeaz.com 4-

Modules as Objects

• When you import a module, the module itself is a kind of "object"

• You can assign it to variables, place it in lists, change it's name, and so forth

import math

m = math # Assign to a variable

x = m.sqrt(2) # Access through the variable

13

• You can even store new things in it

math.twopi = 2*math.pi

Copyright (C) 2008, http://www.dabeaz.com 4-

What is a Module?

• A module is just a thin-layer over a dictionary (which holds all of the contents)

>>> import foo

>>> foo.__dict__.keys()

['grok','spam','__builtins__','__file__',

'__name__', '__doc__']

>>> foo.__dict__['grok']

<function grok at 0x69230>

>>> foo.grok

<function grok at 0x69230>

>>>

14

• Any time you reference something in a module, it's just a dictionary lookup

121

Page 126: Python Binder

Copyright (C) 2008, http://www.dabeaz.com 4-

import as statement

• This is identical to import except that a different name is used for the module object

• The new name only applies to this one source file (other modules can still import the library using its original name)

• Changing the name of the loaded module

# bar.py

import fieldparse as fp

fields = fp.split(line)

15

Copyright (C) 2008, http://www.dabeaz.com 4-

import as statement• There are other practical uses of changing

the module name on import

• It makes it easy to load and seamlessly use different versions of a library module

if debug:

import foodebug as foo # Load debugging version

else:

import foo

foo.grok(2) # Uses whatever module was loaded

16

• You can make a program extensible by simply importing a plugin module as a common name

122

Page 127: Python Binder

Copyright (C) 2008, http://www.dabeaz.com 4-

from module import

• Selected symbols can be lifted out of a module and placed into the caller's namespace

# bar.py

from foo import grok

grok(2)

17

• This is useful if a name is used repeatedly and you want to reduce the amount of typing

• And, it also runs faster if it gets used a lot

Copyright (C) 2008, http://www.dabeaz.com 4-

from module import *• Takes all symbols from a module and places

them into the caller's namespace

# bar.py

from foo import *

grok(2)

spam("Hello")

...

18

• However, it only applies to names that don't start with an underscore (_)

• _name often used when defining non-imported values in a module.

123

Page 128: Python Binder

Copyright (C) 2008, http://www.dabeaz.com 4-

from module import *

• Using this form of import requires some care and attention

• It will overwrite any previously defined objects that share the same name as an imported symbol

• It may pollute the namespace with symbols that you don't really want (or expect)

19

Copyright (C) 2008, http://www.dabeaz.com 4-

Commentary

• As a general rule, it is better to make proper use of namespaces and to use the normal import statement

20

import foo

• It is more "pythonic" and reduces the amount of clutter in your code.

"Namespaces are one honking great idea -- let's do more of those!" - import this

124

Page 129: Python Binder

Copyright (C) 2008, http://www.dabeaz.com 4-

Exercise 4.2

21

Time : 15 Minutes

Copyright (C) 2008, http://www.dabeaz.com 4-

Standard Library

• Python includes a large standard library

• Several hundred modules

• System, networking, data formats, etc.

• All accessible via import

• A quick tour of most common modules

22

125

Page 130: Python Binder

Copyright (C) 2008, http://www.dabeaz.com 4-

Builtin Functions

• About 70 functions/objects

• Always available in global namespace

• Contained in a module __builtins__

• Have seen many of these functions already

23

Copyright (C) 2008, http://www.dabeaz.com 4-

Builtins: Math• Built-in mathematical functions

abs(x)

divmod(x,y)

max(s1, s2, ..., sn)

min(s1, s2, ..., sn)

pow(x,y [,z])

round(x, [,n])

sum(s [,initial])

• Examples>>> max(2,-10,40,7)

40

>>> min([2,-10,40,7])

-10

>>> sum([2,-10,40,7],0)

39

>>> round(3.141592654,2)

3.1400000000000001

>>>

24

126

Page 131: Python Binder

Copyright (C) 2008, http://www.dabeaz.com 4-

Builtins: Conversions• Conversion functions and types

str(s) # Convert to string

repr(s) # String representation

int(x [,base]) # Integer

long(x [,base]) # Long

float(x) # Float

complex(x) # Complex

hex(x) # Hex string

oct(x) # Octal string

chr(n) # Character from number

ord(c) # Character code

• Examples>>> hex(42)

'0x2a'

>>> ord('c')

99

>>> chr(65)

'A'

>>>

25

Copyright (C) 2008, http://www.dabeaz.com 4-

Builtins: Iteration• Functions commonly used with for statement

enumerate(s)

range([i,] j [,stride])

reversed(s)

sorted(s [,cmp [,key [,reverse]]])

xrange([i,] j [,stride])

zip(s1,s2,...)

• Examples>>> x = [1,-10,4]

>>> for i in reversed(x): print i,

...

4 -10 1

>>> for i,v in enumerate(x): print i,v

0 1

1 -10

2 4

>>> for i in sorted(x): print i,

...

-10 1 4

26

127

Page 132: Python Binder

Copyright (C) 2008, http://www.dabeaz.com 4-

sys module

• Information related to environment

• Version information

• System limits

• Command line options

• Module search paths

• Standard I/O streams

27

Copyright (C) 2008, http://www.dabeaz.com 4-

sys: Command Line Opts

• Parameters passed on Python command linesys.argv

# opt.py

import sys

print sys.argv

% python opt.py -v 3 "a test" foobar.txt

['opt.py', '-v', '3', 'a test', 'foobar.txt']

%

• Example:

28

128

Page 133: Python Binder

Copyright (C) 2008, http://www.dabeaz.com 4-

Command Line Args

• For simple programs, can manually process

29

import sys

if len(sys.argv) == 2:

filename = sys.argv[1]

out = open(filename,"w")

else:

out = sys.stdout

• For anything more complicated, processing becomes tediously annoying

• Soln: optparse module

Copyright (C) 2008, http://www.dabeaz.com 4-

optparse Module

30

• A whole framework for processing command line arguments

• Might be overkill, but implements most of the annoying code you would have to write yourself

• A relatively recent addition to Python

• Impossible to cover all details, will show example

129

Page 134: Python Binder

Copyright (C) 2008, http://www.dabeaz.com 4-

optparse Example

31

import optparse

p = optparse.OptionParser()

p.add_option("-d",action="store_true",dest="debugmode")

p.add_option("-o",action="store",type="string",dest="outfile")

p.add_option("--exclude",action="append",type="string",

dest="excluded")

p.set_defaults(debugmode=False,outfile=None,excluded=[])

opt, args = p.parse_args()

debugmode = opt.debugmode

outfile = opt.outfile

excluded = opt.excluded

Copyright (C) 2008, http://www.dabeaz.com 4-

sys: Standard I/O

• Standard I/O streams

sys.stdout

sys.stderr

sys.stdin

• By default, print is directed to sys.stdout

• Input read from sys.stdin

• Can redefine or use directly

sys.stdout = open("out.txt","w")

print >>sys.stderr, "Warning. Unable to connect"

32

130

Page 135: Python Binder

Copyright (C) 2008, http://www.dabeaz.com 4-

math module

• Contains common mathematical functionsmath.sqrt(x)

math.sin(x)

math.cos(x)

math.tan(x)

math.atan(x)

math.log(x)

...

math.pi

math.e

• Example:import math

c = 2*math.pi*math.sqrt((x1-x2)**2 + (y1-y2)**2)

33

Copyright (C) 2008, http://www.dabeaz.com 4-

random Module

• Generation of random numbersrandom.randint(a,b) # Random integer a<=x<b

random.random() # Random float 0<=x<1

random.uniform(a,b) # Random float a<=x<b

random.seed(x)

• Example:>>> import random

>>> random.randint(10,20)

16

>>> random.random()

0.53763273926379385

>>> random.uniform(10,20)

13.62148074612832

>>>

34

131

Page 136: Python Binder

Copyright (C) 2008, http://www.dabeaz.com 4-

copy Module• Use to make shallow/deep copies of objects

copy.copy(obj) # Make shallow copy

copy.deepcopy(obj) # Make deep copy

• Example:>>> a = [1,2,3]

>>> b = ['x',a]

>>> import copy

>>> c = copy.copy(b)

>>> c

['x', [1, 2, 3]]

>>> c[1] is a

True

>>> d = copy.deepcopy(b)

>>> d

['x', [1, 2, 3]]

>>> d[1] is a

False

a only copied by reference. b and c contain the same a.

a is also copied. b and d contain different lists

35

Copyright (C) 2008, http://www.dabeaz.com 4-

os Module

• Contains operating system functions

• Example: Executing a system command

>>> import os

>>> os.system("mkdir temp")

>>>

36

• Walking a directory tree>>> for path,dirs,files in os.walk("/home"):

... for name in files:

... print "%s/%s" % (path,name)

132

Page 137: Python Binder

Copyright (C) 2008, http://www.dabeaz.com 4-

Environment Variables

• os.environ dictionary contains valuesimport os

home = os.environ['HOME']

os.environ['HOME'] = '/home/user/guest'

• Changes are reflected in Python and any subprocesses created later

• Environment variables (typically set in shell)% setenv NAME dave

% setenv RSH ssh

% python prog.py

37

Copyright (C) 2008, http://www.dabeaz.com 4-

Getting a Directory Listing

• os.listdir() function

38

>>> files = os.listdir("/some/path")

>>> files

['foo','bar','spam']

>>>

• glob module

>>> txtfiles = glob.glob("*.txt")

>>> datfiles = glob.glob("Dat[0-5]*")

>>>

• glob understands Unix shell wildcards (on all systems)

133

Page 138: Python Binder

Copyright (C) 2008, http://www.dabeaz.com 4-

os.path Module

• Portable management of path names and files

• Examples:

>>> import os.path

>>> os.path.basename("/home/foo/bar.txt")

'bar.txt'

>>> os.path.dirname("/home/foo/bar.txt")

'/home/foo'

>>> os.path.join("home","foo","bar.txt")

'home/foo/bar.txt'

>>>

39

Copyright (C) 2008, http://www.dabeaz.com 4-

File Tests

• Testing if a filename is a regular file

40

>>> os.path.isfile("foo.txt")

True

>>> os.path.isfile("/usr")

False

>>>

• Testing if a filename is a directory>>> os.path.isdir("foo.txt")

False

>>> os.path.isdir("/usr")

True

>>>

• Testing if a file exists>>> os.path.exists("foo.txt")

True

>>>

134

Page 139: Python Binder

Copyright (C) 2008, http://www.dabeaz.com 4-

File Metadata

• Getting the last modification/access time

41

>>> os.path.getmtime("foo.txt")

1175769416.0

>>> os.path.getatime("foo.txt")

1175769491.0

>>>

• Note: To decode times, use time module

>>> time.ctime(os.path.getmtime("foo.txt"))

'Thu Apr 5 05:36:56 2007'

>>>

• Getting the file size>>> os.path.getsize("foo.txt")

1344L

>>>

Copyright (C) 2008, http://www.dabeaz.com 4-

Shell Operations (shutil)

• Copying a file

42

>>> shutil.copy("source","dest")

• Moving a file (renaming)

>>> shutil.move("old","new")

• Copying a directory tree>>> shutil.copytree("srcdir","destdir")

• Removing a directory tree

>>> shutil.rmtree("dir")

135

Page 140: Python Binder

Copyright (C) 2008, http://www.dabeaz.com 4-

Exercise 4.3

43

Time : 10 Minutes

Copyright (C) 2008, http://www.dabeaz.com 4-

pickle Module• A module for serializing objects

44

• Saving an object to a fileimport pickle

...

f = open("data","wb")

pickle.dump(someobj,f)

• Loading an object from a file

f = open("data","rb")

someobj = pickle.load(f)

• This works with almost all objects except for those involving system state (open files, network connections, threads, etc.)

136

Page 141: Python Binder

Copyright (C) 2008, http://www.dabeaz.com 4-

pickle Module

• Pickle can also turn objects into strings

import pickle

# Convert to a string

s = pickle.dumps(someobj)

...

# Load from a string

someobj = pickle.loads(s)

• Useful if you want to send an object elsewhere

• Examples: Send over network, store in a database, etc.

45

Copyright (C) 2008, http://www.dabeaz.com 4-

shelve module• Used to create a persistent dictionary

import shelve

s = shelve.open("data","c")

# Put an object in the shelf

s['foo'] = someobj

# Get an object out of the shelf

someobj = s['foo']

s.close()

• Supports many dictionary operationss[key] = obj # Store an object

data = s[key] # Get an object

del s[key] # Delete an object

s.has_key(key) # Test for key

s.keys() # Return list of keys

46

137

Page 142: Python Binder

Copyright (C) 2008, http://www.dabeaz.com 4-

shelve module

• Keys must be strings

s = shelve.open("data","w")

s['foo'] = 4 # okay

s[2] = 'Hello' # Error. bad key

• Shelve open() flagss = shelve.open("file",'c') # RW. Create if not exist

s = shelve.open("file",'r') # Read-only

s = shelve.open("file",'w') # Read-write

s = shelve.open("file",'n') # RW. Force creation

• Stored values may be any object compatible with the pickle module

47

Copyright (C) 2008, http://www.dabeaz.com 4-

ConfigParser Module• A module that can read parameters out of

configuration files based on the .INI format

48

# A Comment

; A Comment (alternative)

[section1]

var1 = 123

var2 = abc

[section2]

var1 = 456

var2 = def

...

• File consists of named sections each with a series of variable declarations

138

Page 143: Python Binder

Copyright (C) 2008, http://www.dabeaz.com 4-

Parsing INI Files

• Creating and parsing a configuration file

49

import ConfigParser

cfg = ConfigParser.ConfigParser()

cfg.read("config.ini")

• Retrieving the value of an option

v = cfg.get("section","varname")

• This returns the value as a string or raises an exception if not found.

Copyright (C) 2008, http://www.dabeaz.com 4-

Parsing Example• Sample .INI file

50

; simple.ini

[files]

infile = stocklog.dat

outfile = summary.dat

[web]

host = www.python.org

port = 80

• Interactive session>>> cfg = ConfigParser.ConfigParser()

>>> cfg.read("simple.ini")

['simple.ini']

>>> cfg.get("files","infile")

'stocklog.dat'

>>> cfg.get("web","host")

'www.python.org'

>>>

139

Page 144: Python Binder

Copyright (C) 2008, http://www.dabeaz.com 4-

Tkinter• A Library for building simple GUIs

>>> from Tkinter import *

>>> def pressed():

... print "You did it!"

>>> b = Button(text="Do it!",command=pressed)

>>> b.pack()

>>> b.mainloop()

51

• Clicking on the button....

You did it!

You did it!

...

Copyright (C) 2008, http://www.dabeaz.com 4-

More on Tkinter

• The only GUI packaged with Python itself

• Based on Tcl/Tk. Popular open-source scripting language/GUI widget set developed by John Ousterhout (90s)

• Tk used in a wide variety of other languages (Perl, Ruby, etc.)

• Cross-platform (Unix/Windows/MacOS)

• It's small (~25 basic widgets)

52

140

Page 145: Python Binder

Copyright (C) 2008, http://www.dabeaz.com 4-

Sample Tkinter Widgets

53

Copyright (C) 2008, http://www.dabeaz.com 4-

Tkinter Programming

• GUI programming is a big topic

• Frankly, it's mostly a matter of looking at the docs (google 'Tkinter' for details)

• Execution model is based entirely on an event loop with callback functions

>>> from Tkinter import *

>>> def pressed():

... print "You did it!"

>>> b = Button(text="Do it!",command=pressed)

>>> b.pack()

>>> b.mainloop()

54

Callback

Run the event loop

141

Page 146: Python Binder

Copyright (C) 2008, http://www.dabeaz.com 4-

Tkinter Commentary

• Your mileage with Tkinter will vary

• It comes with Python, but it's also dated

• And it's a lot lower-level than many like

• Some other GUI toolkits:

• wxPython

• PyQT

55

Copyright (C) 2008, http://www.dabeaz.com 4-

Exercise 4.4

56

Time : 30 Minutes

142

Page 147: Python Binder

Copyright (C) 2008, http://www.dabeaz.com 4-

Useful New Datatypes

• The standard library also contains new kinds of datatypes useful for certain kinds of problems

• Examples:

• Decimal

• datetime

• deque

• Will briefly illustrate

57

Copyright (C) 2008, http://www.dabeaz.com 4-

decimal module• A module that implements accurate base-10

decimals of arbitrary precision

58

>>> from decimal import Decimal

>>> x = Decimal('3.14')

>>> y = Decimal('5.2')

>>> x + y

Decimal('8.34')

>>> x / y

Decimal('0.6038461538461538461538461538')

>>>

• Controlling the precision>>> from decimal import getcontext

>>> getcontext().prec = 2

>>> x / y

Decimal('0.60')

>>>

143

Page 148: Python Binder

Copyright (C) 2008, http://www.dabeaz.com 4-

datetime module

• A module for representing and manipulating dates and times

59

>>> from datetime import datetime

>>> inauguration = datetime(2009,1,20)

>>> inauguration

datetime.datetime(2009, 1, 20, 0, 0)

>>> today = datetime.today()

>>> d = inauguration - today

>>> d

datetime.timedelta(92, 54189, 941608)

>>> d.days

92

>>>

• There are many more features not shown

Copyright (C) 2008, http://www.dabeaz.com 4-

collections module

• A module with a few useful data structures

• collections.deque - A doubled ended queue

60

>>> from collections import deque

>>> q = deque()

>>> q.append("this")

>>> q.appendleft("that")

>>> q

deque(['that', 'this'])

>>>

• A deque is like a list except that it's highly optimized for insertion/deletion on both ends

• Implemented in C and finely tuned

144

Page 149: Python Binder

Copyright (C) 2008, http://www.dabeaz.com 4-

Standardized APIs

• API (Application Programming Interface)

• Some parts of the Python library are focused on providing a common programming interface for accessing third-party systems

• Example : The Python Database API

• A standardized way to access relational database systems (Oracle, Sybase, MySQL, etc.)

61

Copyright (C) 2008, http://www.dabeaz.com 4-

Database Example• Connecting to a DB, executing a query

62

from somedbmodule import connect

connection = connect("somedb",

user="dave",password="12345")

cursor = connection.cursor()

cursor.execute("select name,address from people")

for row in cursor:

print row

connection.close()

• Output('Elwood','1060 W Addison')

('Jack','4402 N Broadway')

...

• It's actually not much more than this...

145

Page 150: Python Binder

Copyright (C) 2008, http://www.dabeaz.com 4-

Forming Queries

• There is one tricky bit concerning the formation of query strings

select address from people where name='Elwood'

63

• As a general rule, you should not use Python string operators to create queries

query = """select address from people where

name='%s'""" % (name,)

• Leaves code open to an SQL injection attack

Copyright (C) 2008, http://www.dabeaz.com 4-

Forming Queries

select address from people where name='Elwood'

64

(http://xkcd.com/327/)

146

Page 151: Python Binder

Copyright (C) 2008, http://www.dabeaz.com 4-

Value Substitutions• DB modules also provide their own

mechanism for substituting values in queries

65

cur.execute("select address from people where name=?",

(name,))

cur.execute("select address from people where name=?

and state=?",(name,state))

• This is somewhat similar to string formatting

• Unfortunately, the exact mechanism varies slightly by DB module (must consult docs)

Copyright (C) 2008, http://www.dabeaz.com 4-

Exercise 4.5

66

147

Page 152: Python Binder

Copyright (C) 2008, http://www.dabeaz.com 4-

Third Party Modules

• Python has a large library of built-in modules ("batteries included")

• There are even more third party modules

• Python Package Index (PyPi)

67

http://pypi.python.org/

• Google(Fill in with whatever you're looking for)

Copyright (C) 2008, http://www.dabeaz.com 4-

Installing Modules

• Installation of a module is likely to take three different forms (depends on the module)

• Platform-native installer

• Manual Installation

• Python Eggs

68

148

Page 153: Python Binder

Copyright (C) 2008, http://www.dabeaz.com 4-

Platform Native Install

• You downloaded a third party module as an .exe (PC) or .dmg (Mac) file

• Just run this file and follow installation instructions like you do for installing other software

• You are only likely to see these installers for the more major third-party extensions

69

Copyright (C) 2008, http://www.dabeaz.com 4-

Manual Installation

• You downloaded a Python module or package using a standard file format such as a .gz, .tar.gz, .tgz, .bz2, or .zip file

• Unpack the file and look for a setup.py file in the resulting folder

• Run Python on that setup.py file

70

% python setup.py install

Installation messages

...

%

149

Page 154: Python Binder

Copyright (C) 2008, http://www.dabeaz.com 4-

Python Eggs• An emerging packaging format for Python

modules based on setuptools

• Currently requires a third-party install

71

http://pypi.python.org/pypi/setuptools

• Adds a command-line tool easy_install

% easy_install packagename

(or)

% easy_install packagename-1.2.3-py2.5.egg

• In theory, this is just "automatic"

Copyright (C) 2008, http://www.dabeaz.com 4-

Commentary

• Installing third party modules is always a delicate matter

• More advanced modules may involve C/C++ code which has to be compiled to native code on your platform.

• May have dependencies on other modules

• In theory, Eggs are supposed to solve these problems.... in theory.

72

150

Page 155: Python Binder

Copyright (C) 2008, http://www.dabeaz.com 4-

Summary

• Have looked at module/package mechanism

• Some of the very basic built-in modules

• How to install third party modules

• We will focus on more of the built-in modules later in the course

73

151

Page 156: Python Binder

Classes and Objects

Section 5

Copyright (C) 2008, http://www.dabeaz.com 5-

Overview

• How to define new objects

• How to customize objects (inheritance)

• How to combine objects (composition)

• Python special methods and customization features

2

152

Page 157: Python Binder

Copyright (C) 2008, http://www.dabeaz.com 5-

OO in a Nutshell

• A programming technique where code is organized as a collection of "objects"

• An "object" consists of

• Data (attributes)

• Methods (functions applied to object)

• Example: A "Circle" object

• Data: radius

• Methods: area(), perimeter()

3

Copyright (C) 2008, http://www.dabeaz.com 5-

The class statement

• A definition for a new object

class Circle(object): def __init__(self,radius): self.radius = radius def area(self): return math.pi*(self.radius**2) def perimeter(self): return 2*math.pi*self.radius

• What is a class?

• It's a collection of functions that perform various operations on "instances"

4

153

Page 158: Python Binder

Copyright (C) 2008, http://www.dabeaz.com 5-

Instances• Created by calling the class as a function

• Each instance has its own data

>>> c.area()50.26548245743669>>> d.perimeter()31.415926535897931>>>

5

>>> c = Circle(4.0)>>> d = Circle(5.0)>>>

>>> c.radius4.0>>> d.radius5.0>>>

• You invoke methods on instances to do things

Copyright (C) 2008, http://www.dabeaz.com 5-

__init__ method

• This method initializes a new instance

• Called whenever a new object is created>>> c = Circle(4.0)

class Circle(object): def __init__(self,radius): self.radius = radius

newly created object

• __init__ is example of a "special method"

• Has special meaning to Python interpreter

6

154

Page 159: Python Binder

Copyright (C) 2008, http://www.dabeaz.com 5-

Instance Data

• Each instance has its own data (attributes)class Circle(object): def __init__(self,radius): self.radius = radius

• In other code, you just use the variable that you're using to name the instance

• Inside methods, you refer to this data using selfdef area(self): return math.pi*(self.radius**2)

>>> c = Circle(4.0)>>> c.radius4.0

7

Copyright (C) 2008, http://www.dabeaz.com 5-

Methods• Functions applied to instances of an object

class Circle(object): ... def area(self): return math.pi*self.radius**2

• By convention, the instance is called "self"

• The object is always passed as first argument

>>> c.area()

def area(self): ...

The name is unimportant---the object is always passed as the first argument. It is simply Python programming style to call this argument "self." C++ programmers might prefer to call it "this."

8

155

Page 160: Python Binder

Copyright (C) 2008, http://www.dabeaz.com 5-

Calling Other Methods

• Methods call other methods via self

class Circle(object): def area(self): return math.pi*self.radius**2 def print_area(self): print self.area()

9

• A caution : Code like this doesn't work

class Circle(object): ... def print_area(self): print area() # ! Error

• This merely calls a global function area()

Copyright (C) 2008, http://www.dabeaz.com 5-

Exercise 5.1

10

Time : 15 Minutes

156

Page 161: Python Binder

Copyright (C) 2008, http://www.dabeaz.com 5-

Inheritance

• A tool for specializing objects

class Parent(object): ...

class Child(Parent): ...

• New class called a derived class or subclass

• Parent known as base class or superclass

• Parent is specified in () after class name

11

Copyright (C) 2008, http://www.dabeaz.com 5-

Inheritance

• What do you mean by "specialize?"

• Take an existing class and ...

• Add new methods

• Redefine some of the existing methods

• Add new attributes to instances

12

157

Page 162: Python Binder

Copyright (C) 2008, http://www.dabeaz.com 5-

Inheritance Example

• In bill #246 of the 1897 Indiana General Assembly, there was text that dictated a new method for squaring a circle, which if adopted, would have equated ! to 3.2.

• Fortunately, it was never adopted because an observant mathematician took notice...

• But, let's make a special Indiana Circle anyways...

13

Copyright (C) 2008, http://www.dabeaz.com 5-

Inheritance Example• Specializing a class

class INCircle(Circle): def area(self): return 3.2*self.radius**2

• Using the specialized version>>> c = INCircle(4.0) # Calls Circle.__init__>>> c.radius4.0>>> c.area() # Calls INCircle.area51.20>>> c.perimeter() # Calls Circle.perimeter25.132741228718345>>>

14

• It's the same as Circle except for area()

158

Page 163: Python Binder

Copyright (C) 2008, http://www.dabeaz.com 5-

Using Inheritance

• Inheritance often used to organize objects

class Shape(object): ...

class Circle(Shape): ...

class Rectangle(Shape): ...

• Think of a logical hierarchy or taxonomy

15

Copyright (C) 2008, http://www.dabeaz.com 5-

object base class

• If a class has no parent, use object as base

class Foo(object): ...

• object is the parent of all objects in Python

• Note : Sometimes you will see code where classes are defined without any base class. That is an older style of Python coding that has been deprecated for almost 10 years. When defining a new class, you always inherit from something.

16

159

Page 164: Python Binder

Copyright (C) 2008, http://www.dabeaz.com 5-

Inheritance and __init__

• With inheritance, you must initialize parents

class Shape(object): def __init__(self): self.x = 0.0 self.y = 0.0 ...class Circle(Shape): def __init__(self,radius): Shape.__init__(self) # init base

self.radius = radius

17

Copyright (C) 2008, http://www.dabeaz.com 5-

Inheritance and methods

• Calling the same method in a parent

class Foo(object): def spam(self): ... ...class Bar(Foo): def spam(self): ... r = Foo.spam(self)

...

18

• Useful if subclass only makes slight change to method in base class (or wraps it)

160

Page 165: Python Binder

Copyright (C) 2008, http://www.dabeaz.com 5-

Calling Other Methods

• With inheritance, the correct method gets called if overridden (depends on the type of self)

class Circle(object): def area(self): return math.pi*self.radius**2 def print_area(self): print self.area()

class INCircle(Circle): def area(self): return 3.2*self.radius**2

19

if self is an instance of INCircle

• Example:>>> c = INCircle(4)>>> c.print_area()51.2>>>

Copyright (C) 2008, http://www.dabeaz.com 5-

Multiple Inheritance

• You can specifying multiple base classes

class Foo(object): ...class Bar(object): ...class Spam(Foo,Bar): ...

• The new class inherits features from both parents

• But there are some really tricky details (later)

• Rule of thumb : Avoid multiple inheritance

20

161

Page 166: Python Binder

Copyright (C) 2008, http://www.dabeaz.com 5-

Exercise 5.2

21

Time : 30 Minutes

Copyright (C) 2008, http://www.dabeaz.com 5-

Special Methods

• Classes may define special methods

• Have special meaning to Python interpreter

• Always preceded/followed by __class Foo(object): def __init__(self): ... def __del__(self): ...

• There are several dozen special methods

• Will show a few examples

22

162

Page 167: Python Binder

Copyright (C) 2008, http://www.dabeaz.com 5-

Methods: String Conv.• Converting object into string representation

str(x) __str__(x)repr(x) __repr__(x)

class Foo(object): def __str__(self): s = "some string for Foo" return s def __repr__(self): s = "Foo(args)"

• __str__ used by print statement

• __repr__ used by repr() and interactive mode

Note: The convention for __repr__() is to return a string that, when fed to eval() , will recreate the underlying object. If this is not possible, some kind of easily readable representation is used instead.

23

Copyright (C) 2008, http://www.dabeaz.com 5-

Methods: Item Access

• Methods used to implement containerslen(x) __len__(x)x[a] __getitem__(x,a)x[a] = v __setitem__(x,a,v)del x[a] __delitem__(x,a)

• Use in a classclass Foo(object): def __len__(self): ... def __getitem__(self,a): ... def __setitem__(self,a,v): ... def __delitem__(self,a): ...

24

163

Page 168: Python Binder

Copyright (C) 2008, http://www.dabeaz.com 5-

Methods: Containment

• Containment operatorsa in x __contains__(x,a)b not in x not __contains__(x,b)

• Example:class Foo(object): def __contains__(self,a): # if a is in self, return True # otherwise, return False)

25

Copyright (C) 2008, http://www.dabeaz.com 5-

Methods: Mathematics• Mathematical operators

a + b __add__(a,b)a - b __sub__(a,b)a * b __mul__(a,b)a / b __div__(a,b)a // b __floordiv__(a,b)a % b __mod__(a,b)a << b __lshift__(a,b)a >> b __rshift__(a,b)a & b __and__(a,b)a | b __or__(a,b)a ^ b __xor__(a,b)a ** b __pow__(a,b)-a __neg__(a)~a __invert__(a)abs(a) __abs__(a)

• Consult reference for further details

26

164

Page 169: Python Binder

Copyright (C) 2008, http://www.dabeaz.com 5-

Odds and Ends

• Defining new exceptions

• Bound and unbound methods

• Alternative attribute lookup

27

Copyright (C) 2008, http://www.dabeaz.com 5-

Defining Exceptions

• User-defined exceptions are defined by classes

• Exceptions always inherit from Exception

class NetworkError(Exception): pass

• For simple exceptions, you can just make an empty class as shown above

• If you are storing special attributes in the exception, there are additional details (consult a reference)

28

165

Page 170: Python Binder

Copyright (C) 2008, http://www.dabeaz.com 5-

Method Invocation

• Invoking a method is a two-step process

• Lookup: The . operator

• Method call: The () operator

class Foo(object): def bar(self,x): ...

>>> f = Foo()>>> b = f.bar>>> b<bound method Foo.bar of <__main__.Foo object at 0x590d0>>>>> b(2)

29

Lookup

Method call

Copyright (C) 2008, http://www.dabeaz.com 5-

Unbound Methods

• Methods can be accessed directly through the class

class Foo(object): def bar(self,a): ...>>> Foo.bar<unbound method Foo.bar>

• To use it, you just have to supply an instance

>>> f = Foo()>>> Foo.bar(f,2)

30

166

Page 171: Python Binder

Copyright (C) 2008, http://www.dabeaz.com 5-

Attribute Accesss• These functions may be used to manipulate

attributes given an attribute name stringgetattr(obj,"name") # Same as obj.namesetattr(obj,"name",value) # Same as obj.name = valuedelattr(obj,"name") # Same as del obj.namehasattr(obj,"name") # Tests if attribute exists

• Example: Probing for an optional attributeif hasattr(obj,"x"): x = getattr(obj,"x"):else: x = None

• Note: getattr() has a useful default value arg

x = getattr(obj,"x",None)

31

Copyright (C) 2008, http://www.dabeaz.com 5-

Summary

• A high-level overview of classes

• How to define classes

• How to define methods

• Creating and using objects

• Special python methods

• Overloading

32

167

Page 172: Python Binder

Copyright (C) 2008, http://www.dabeaz.com 5-

Exercise 5.3

33

Time : 15 Minutes

168

Page 173: Python Binder

Inside the Python Object Model

Section 6

Copyright (C) 2008, http://www.dabeaz.com 6-

Overview

• A few more details about how objects work

• How objects are represented

• Details of attribute access

• Data encapsulation

• Memory management

2

169

Page 174: Python Binder

Copyright (C) 2008, http://www.dabeaz.com 6-

Dictionaries Revisited

• A dictionary is a collection of named values

3

stock = {

'name' : 'GOOG',

'shares' : 100,

'price' : 490.10

}

• Dictionaries are commonly used for simple data structures (shown above)

• However, they are used for critical parts of the interpreter and may be the most important type of data in Python

Copyright (C) 2008, http://www.dabeaz.com 6-

Dicts and Functions• When a function executes, a dictionary

holds all of the local variables

4

def read_prices(filename):

prices = { }

for line in open(filename):

fields = line.split()

prices[fields[0]] = float(fields[1])

return prices

{

'filename' : 'prices.dat',

'line' : 'GOOG 523.12',

'prices' : { ... },

'fields' : ['GOOG', '523.12']

}

locals()

170

Page 175: Python Binder

Copyright (C) 2008, http://www.dabeaz.com 6-

Dicts and Modules• In a module, a dictionary holds all of the

global variables and functions

5

# foo.py

x = 42

def bar():

...

def spam():

...

{

'x' : 42,

'bar' : <function bar>,

'spam' : <function spam>

}

foo.__dict__ or globals()

Copyright (C) 2008, http://www.dabeaz.com 6-

Dicts and Objects

• User-defined objects also use dictionaries

• Instance data

• Class members

• In fact, the entire object system is mostly just an extra layer that's put on top of dictionaries

• Let's take a look...

6

171

Page 176: Python Binder

Copyright (C) 2008, http://www.dabeaz.com 6-

Dicts and Instances

• A dictionary holds instance data (__dict__)>>> s = Stock('GOOG',100,490.10)

>>> s.__dict__

{'name' : 'GOOG','shares' : 100, 'price': 490.10 }

• You populate this dict when assigning to selfclass Stock(object):

def __init__(self,name,shares,price):

self.name = name

self.shares = shares

self.price = price

self.__dict__

{

'name' : 'GOOG',

'shares' : 100,

'price' : 490.10

}

instance data 7

Copyright (C) 2008, http://www.dabeaz.com 6-

Dicts and Instances

• Critical point : Each instance gets its own private dictionary

s = Stock('GOOG',100,490.10)

t = Stock('AAPL',50,123.45)

8

{

'name' : 'GOOG',

'shares' : 100,

'price' : 490.10

}

{

'name' : 'AAPL',

'shares' : 50,

'price' : 123.45

}

• So, if you created 100 instances of some class, there are 100 dictionaries sitting around holding data

172

Page 177: Python Binder

Copyright (C) 2008, http://www.dabeaz.com 6-

Dicts and Classes• A dictionary holds the members of a class

class Stock(object):

def __init__(self,name,shares,price):

self.name = name

self.shares = shares

self.price = price

def cost(self):

return self.shares*self.price

def sell(self,nshares):

self.shares -= nshares

9

Stock.__dict__

{

'cost' : <function>,

'sell' : <function>,

'__init__' : <function>,

}

methods

Copyright (C) 2008, http://www.dabeaz.com 6-

Instances and Classes

• Instances and classes are linked together

>>> s = Stock('GOOG',100,490.10)

>>> s.__dict__

{'name':'GOOG','shares':100,'price':490.10 }

>>> s.__class__

<class '__main__.Stock'>

>>>

• __class__ attribute refers back to the class

10

• The instance dictionary holds data unique to each instance whereas the class dictionary holds data collectively shared by all instances

173

Page 178: Python Binder

Copyright (C) 2008, http://www.dabeaz.com 6-

Instances and Classes

.__dict__ {attrs}

.__class__

.__dict__ {attrs}

.__class__

.__dict__ {attrs}

.__class__

.__dict__ {methods}

instances

class

11

Copyright (C) 2008, http://www.dabeaz.com 6-

Attribute Access

• When you work with objects, you access data and methods using the (.) operator

12

• These operations are directly tied to the dictionaries sitting underneath the covers

x = obj.name # Getting

obj.name = value # Setting

del obj.name # Deleting

174

Page 179: Python Binder

Copyright (C) 2008, http://www.dabeaz.com 6-

Modifying Instances

• Operations that modify an object always update the underlying dictionary

>>> s = Stock('GOOG',100,490.10)

>>> s.__dict__

{'name':'GOOG', 'shares':100, 'price':490.10 }

>>> s.shares = 50

>>> s.date = "6/7/2007"

>>> s.__dict__

{ 'name':'GOOG', 'shares':50, 'price':490.10,

'date':'6/7/2007'}

>>> del s.shares

>>> s.__dict__

{ 'name':'GOOG', 'price':490.10, 'date':'6/7/2007'}

>>>

13

Copyright (C) 2008, http://www.dabeaz.com 6-

A Caution• Setting an instance attribute will silently

override class members with the same name

>>> s = Stock('GOOG',100,490.10)

>>> s.cost()

49010.0

>>> s.cost = "a lot"

>>> s.cost()

Traceback (most recent call last):

File "<stdin>", line 1, in ?

TypeError: 'str' object is not callable

>>> s.cost

'a lot'

>>>

• The value in the instance dictionary is hiding the one in the class dictionary

14

175

Page 180: Python Binder

Copyright (C) 2008, http://www.dabeaz.com 6-

Modifying Instances

• It may be surprising that instances can be extended after creation

• You can freely change attributes at any time

>>> s = Stock('GOOG',100,490.10)

>>> s.blah = "some new attribute"

>>> del s.name

>>>

• Again, you're just manipulating a dictionary

• Very different from C++/Java where the structure of an object is rigidly fixed

15

Copyright (C) 2008, http://www.dabeaz.com 6-

Reading Attributes

• Attribute may exist in two places

• Local instance dictionary

• Class dictionary

• So, both dictionaries may be checked

16

• Suppose you read an attribute on an instance

x = obj.name

176

Page 181: Python Binder

Copyright (C) 2008, http://www.dabeaz.com 6-

Reading Attributes

• First check in local __dict__

• If not found, look in __dict__ of class

>>> s = Stock(...)

>>> s.name

'GOOG'

>>> s.cost()

49010.0

>>>

s .__dict__

.__class__{'name': 'GOOG',

'shares': 100 }

Stock.__dict__ {'cost': <func>,

'sell':<func>,

'__init__':..}

1

2

17

• This lookup scheme is how the members of a class get shared by all instances

Copyright (C) 2008, http://www.dabeaz.com 6-

Exercise 6.1

18

Time : 10 Minutes

177

Page 182: Python Binder

Copyright (C) 2008, http://www.dabeaz.com 6-

How Inheritance Works

class A(B,C):

...

• Classes may inherit from other classes

• Bases are stored as a tuple in each class>>> A.__bases__

(<class '__main__.B'>,<class '__main__.C'>)

>>>

19

• This provides a link to parent classes

• This link simply extends the search process used to find attributes

Copyright (C) 2008, http://www.dabeaz.com 6-

Reading Attributes

• First check in local __dict__

• If not found, look in __dict__ of class

>>> s = Stock(...)

>>> s.name

'GOOG'

>>> s.cost()

49010.0

>>>

s .__dict__

.__class__{'name': 'GOOG',

'shares': 100 }

Stock.__dict__ {'cost': <func>,

'sell':<func>,

'__init__':..}

• If not found in class, look in base classes

.__bases__

look in __bases__

1

2

3

20

178

Page 183: Python Binder

Copyright (C) 2008, http://www.dabeaz.com 6-

Single Inheritance• In inheritance hierarchies, attributes are

found by walking up the inheritance tree

21

class A(object): pass

class B(A): pass

class C(A): pass

class D(B): pass

class E(D): pass

object

A

B C

D

E

e = E()

e.attre instance

• With single inheritance, there is a single path to the top

• You stop with the first match

Copyright (C) 2008, http://www.dabeaz.com 6-

Multiple Inheritance

class A(object): pass

class B(object): pass

class C(A,B): pass

class D(B): pass

class E(C,D): pass

• Consider this hierarchyobject

A B

C D

E• What happens here?e = E()

e.attr

22

• A similar search process is carried out, but there is an added complication in that there may be many possible search paths

179

Page 184: Python Binder

Copyright (C) 2008, http://www.dabeaz.com 6-

Multiple Inheritance• For multiple inheritance, Python determines

a "method resolution order" that sets the order in which base classes get checked

>>> E.__mro__

(<class '__main__.E'>, <class '__main__.C'>,

<class '__main__.A'>, <class '__main__.D'>,

<class '__main__.B'>, <type 'object'>)

>>>

• The MRO is described in a 20 page math paper (C3 Linearization Algorithm)

• Note: This complexity is one reason why multiple inheritance is often avoided

23

Copyright (C) 2008, http://www.dabeaz.com 6-

Classes and Encapsulation

• One of the primary roles of a class is to encapsulate data and internal implementation details of an object

• However, a class also defines a "public" interface that the outside world is supposed to use to manipulate the object

• This distinction between implementation details and the public interface is important

24

180

Page 185: Python Binder

Copyright (C) 2008, http://www.dabeaz.com 6-

A Problem• In Python, almost everything about classes

and objects is "open"

• You can easily inspect object internals

• You can change things at will

• There's no strong notion of access-control (i.e., private class members)

• If you're trying to cleanly separate the internal "implementation" from the "interface" this becomes an issue

25

Copyright (C) 2008, http://www.dabeaz.com 6-

Python Encapsulation

• Python relies on programming conventions to indicate the intended use of something

• Typically, this is based on naming

• There is a general attitude that it is up to the programmer to observe the rules as opposed to having the language enforce rules

26

181

Page 186: Python Binder

Copyright (C) 2008, http://www.dabeaz.com 6-

Private Attributes

• Any attribute name with leading __ is "private"

class Foo(object):

def __init__(self):

self.__x = 0

• Example>>> f = Foo()

>>> f.__x

AttributeError: 'Foo' object has no attribute '__x'

>>>

• This is actually just a name mangling trick>>> f = Foo()

>>> f._Foo__x

0

>>>

27

Copyright (C) 2008, http://www.dabeaz.com 6-

Private Methods

• Private naming also applies to methods

28

class Foo(object):

def __spam(self):

print "Foo.__spam"

def callspam(self):

self.__spam() # Uses Foo.__spam

• Example:>>> f = Foo()

>>> f.callspam()

Foo.__spam

>>> f.__spam()

AttributeError: 'Foo' object has no attribute '__spam'

>>> f._Foo__spam()

Foo.__spam

>>>

182

Page 187: Python Binder

Copyright (C) 2008, http://www.dabeaz.com 6-

Performance Commentary

• Python name mangling in classes occurs at the time a class is defined, not at run time

• Thus, there is no performance overhead for using this feature

• It's actually a somewhat ingenious approach since it avoids all runtime checks for "public" vs. "private" (so the interpreter runs faster overall)

29

Copyright (C) 2008, http://www.dabeaz.com 6-

Accessor Methods• Consider controlling access to internals

using method calls.

30

class Foo(object):

def __init__(self,name):

self.__name = name

def getName(self):

return self.__name

def setName(self,name):

if not isinstance(name,str):

raise TypeError("Expected a string")

self.__name = name

• Methods give you more flexibility and may become useful as your program grows

• May make it easier to integrate the object into a larger framework of objects

183

Page 188: Python Binder

Copyright (C) 2008, http://www.dabeaz.com 6-

Properties• Accessor methods can optionally be turned

into "property" attributes:

31

class Foo(object):

def __init__(self,name):

self.__name = name

def getName(self):

return self.__name

def setName(self,name):

if not isinstance(name,str):

raise TypeError("Expected a string")

self.__name = name

name = property(getName,setName)

• Properties look like normal attributes, but implicitly call the accessor methods.

Copyright (C) 2008, http://www.dabeaz.com 6-

Properties• Example use:

>>> f = Foo("Elwood")

>>> f.name # Calls f.getName()

'Elwood'

>>> f.name = 'Jake' # Calls f.setName('Jake')

>>> f.name = 45 # Calls f.setName(45)

TypeError: Expected a string

>>>

32

• Comment : With properties, you can hide extra processing behind access to data attributes--something that is useful in certain settings

• Example : Type checking

184

Page 189: Python Binder

Copyright (C) 2008, http://www.dabeaz.com 6-

Properties• Properties are also useful if you are creating

objects where you want to have a very consistent user interface

• Example : Computed data attributes

33

class Circle(object):

def __init__(self,radius):

self.radius = name

def area(self):

return math.pi*self.radius**2

area = property(area)

def perimeter(self):

return 2*math.pi*self.radius

perimeter = property(perimeter)

Copyright (C) 2008, http://www.dabeaz.com 6-

Properties

• Example use:

34

>>> c = Circle(4)

>>> c.radius

4

>>> c.area

50.26548245743669

>>> c.perimeter

25.132741228718345

• Commentary : Notice how there is no obvious difference between the attributes as seen by the user of the object

Instance Variable

Computed Properties

185

Page 190: Python Binder

Copyright (C) 2008, http://www.dabeaz.com 6-

Uniform Access

• The last example shows how to put a more uniform interface on an object. If you don't do this, an object might be confusing to use:

35

>>> c = Circle(4.0)

>>> a = c.area() # Method

>>> r = c.radius # Data attribute

>>>

• Why is the () required for the area, but not for the radius?

Copyright (C) 2008, http://www.dabeaz.com 6-

Properties• It is very common for a property to replace

a method of the same name

36

class Circle(object):

...

def area(self):

return math.pi*self.radius**2

area = property(area)

def perimeter(self):

return 2*math.pi*self.radius

perimeter = property(perimeter)

Notice the identical names

• When you define a property, you usually don't want the user to explicitly call the method

• So, this trick hides the method and only exposes the property attribute

186

Page 191: Python Binder

Copyright (C) 2008, http://www.dabeaz.com 6-

Decorators• There is an alternative way to make properties

37

class Circle(object):

...

def area(self):

return math.pi*self.radius**2

area = property(area)

...

• Use a "decorator"class Circle(object):

...

@property

def area(self):

return math.pi*self.radius**2

...

• An advanced topic, but a "decorator" is basically a modifier applied to functions

Copyright (C) 2008, http://www.dabeaz.com 6-

__slots__ Attribute

• You can restrict the set of attribute namesclass Foo(object):

__slots__ = ['x','y']

...

• Produces errors for other attributes>>> f = Foo()

>>> f.x = 3

>>> f.y = 20

>>> f.z = 1

Traceback (most recent call last):

File "<stdin>", line 1, in ?

AttributeError: 'Foo' object has no attribute 'z'

• Prevents errors, restricts usage of objects

38

187

Page 192: Python Binder

Copyright (C) 2008, http://www.dabeaz.com 6-

Exercise 6.2

39

Time : 10 Minutes

Copyright (C) 2008, http://www.dabeaz.com 6-

Memory Management

• Python has automatic memory management and garbage collection

• As a general rule, you do not have to worry about cleaning up when you define your own classes

• However, there are a couple of additional details to be aware of

40

188

Page 193: Python Binder

Copyright (C) 2008, http://www.dabeaz.com 6-

Instance Creation

• Instances are actually created in two steps

>>> a = Stock('GOOG',100,490.10)

• __new__() constructs the raw instance

• __init__() initializes it

a = Stock.__new__(Stock,'GOOG',100,490.10)

a.__init__('GOOG',100,490.10)

• Performs these steps

41

Copyright (C) 2008, http://www.dabeaz.com 6-

__new__() method

• Creates a new (empty) object

• If you see this defined in a class, it almost always indicates the presence of heavy wizardry

• Inheritance from an immutable type

• Definition of a metaclass

• Both topics are beyond the scope of this class

• Bottom line : You don't see this in "normal" code

42

189

Page 194: Python Binder

Copyright (C) 2008, http://www.dabeaz.com 6-

Instance Deletion

• All objects are reference counted

• Destroyed when refcount=0

• When an instance is destroyed, the reference count of all instance data is also decreased

• Normally, all of this just "works" and you don't worry about it

43

Copyright (C) 2008, http://www.dabeaz.com 6-

Object Deletion: Cycles

• There are some issues with cyclesclass A(object): pass

class B(object): pass

a = A()

b = B()

a.b = b

b.a = a

a b

ref=2 ref=2

.b .a

ref=1 ref=1

.b .a• Objects live, but no way to access

• Deletiondel a

del b

44

190

Page 195: Python Binder

Copyright (C) 2008, http://www.dabeaz.com 6-

Garbage Collection

• An extra garbage collector looks for cycles

• It runs automatically, but you can control it

import gc

gc.collect() # Run full garbage collection now

gc.disable() # Disable garbage collection

gc.enable() # Enable garbage collection

• gc also has some diagnostic functions

45

Copyright (C) 2008, http://www.dabeaz.com 6-

__del__ method

• Classes may define a destructor methodclass Stock(object):

...

def __del__(self):

# Cleanup

• Called when the reference count reaches 0

• A common confusion : __del__() is not necessarily triggered by the del operator.

46

s = Stock('GOOG',100,490.10)

t = s

del s # Does not call s.__del__()

t = 42 # Calls s.__del__() (refcount=0)

191

Page 196: Python Binder

Copyright (C) 2008, http://www.dabeaz.com 6-

__del__ method

• Don't define __del__ without a good reason

• Typical uses:

• Proper shutdown of system resources (e.g., network connections)

• Releasing locks (e.g., threading)

• Avoid defining it for any other purpose

47

Copyright (C) 2008, http://www.dabeaz.com 6-

Exercise 6.3

48

Optional

192

Page 197: Python Binder

Documentation, Testing, and Debugging

Section 7

Copyright (C) 2008, http://www.dabeaz.com 7-

Overview

• Documenting programs

• Testing (doctest, unittest)

• Error handling/diagnostics

• Debugging

• Profiling

2

193

Page 198: Python Binder

Copyright (C) 2008, http://www.dabeaz.com 7-

Documentation

• Python has both comments and docstrings

# Compute the greatest common divisor. Uses clever

# tuple hack to avoid an extra temporary variable.

def gcd(x,y):

"""Compute the greatest common divisor of x and y.

For example:

>>> gcd(40,16)

8

>>>

"""

while x > 0: x,y = y%x,x

return y

3

Copyright (C) 2008, http://www.dabeaz.com 7-

Documentation

• Comments should be used to provide notes to developers reading the source code

• Doc strings should be used to provide documentation to end-users.

• Emphasize: Documentation strings are considered good Python style.

4

194

Page 199: Python Binder

Copyright (C) 2008, http://www.dabeaz.com 7-

Uses of Docstrings

• Most Python IDEs use doc strings to provide user information and help

5

Copyright (C) 2008, http://www.dabeaz.com 7-

Uses of Docstrings

• Online help

6

195

Page 200: Python Binder

Copyright (C) 2008, http://www.dabeaz.com 7-

Testing: doctest module• A module that runs tests from docstrings

• Look at docstrings for interactive sessions

# Compute the greatest common divisor. Uses clever

# tuple hack to avoid an extra temporary variable.

def gcd(x,y):

"""Compute the greatest common divisor of x and y.

For example:

>>> gcd(40,16)

8

>>>

"""

while x > 0: x,y = y%x,x

return y

7

Copyright (C) 2008, http://www.dabeaz.com 7-

Using doctest

• Create a separate file that loads the module# testgcd.py

import gcd

import doctest

• Add the following code to test a module

doctest.testmod(gcd)

• To run the tests, do this% python testgcd.py

%

• If successful, will get no output

8

196

Page 201: Python Binder

Copyright (C) 2008, http://www.dabeaz.com 7-

Using doctest• Test failures produce a report

% python testgcd.py

********************************************************

File "/Users/beazley/pyfiles/gcd.py", line 7, in gcd.gcd

Failed example:

gcd(40,16)

Expected:

8

Got:

7

********************************************************

1 items had failures:

1 of 1 in gcd.gcd

***Test Failed*** 1 failures.

%

9

Copyright (C) 2008, http://www.dabeaz.com 7-

Self-testing

• Modules may be set up to test themselves# gcd.py

def gcd(x,y):

""" ...

"""

if __name__ == '__main__':

import doctest

doctest.testmod()

• Will run tests if executed as main program% python gcd.py

10

197

Page 202: Python Binder

Copyright (C) 2008, http://www.dabeaz.com 7-

Exercise 7.1

11

Time : 10 Minutes

Copyright (C) 2008, http://www.dabeaz.com 7-

Testing: unittest

• unittest module

• A more formal testing framework

• Class-based

• Better suited for exhaustive testing

• Can deal with more complex test cases

12

198

Page 203: Python Binder

Copyright (C) 2008, http://www.dabeaz.com 7-

Using unittest

• First, you create a separate file# testgcd.py

import gcd

import unittest

class TestGCDFunction(unittest.TestCase):

...

• Then you define a testing class

• Must inherit from unittest.TestCase

13

Copyright (C) 2008, http://www.dabeaz.com 7-

Using unittest

• Define testing methodsclass TestGCDFunction(unittest.TestCase):

def testsimple(self):

# Test with simple integer arguments

g = gcd.gcd(40,16)

self.assertEqual(g,8)

def testfloat(self):

# Test with floats. Should be error

self.assertRaises(TypeError,gcd.gcd,3.5,4.2)

def testlong(self):

# Test with long integers

g = gcd.gcd(23748928388L, 6723884L)

self.assertEqual(g,4L)

self.assert_(type(g) is long)

• Each method must start with "test..."

14

199

Page 204: Python Binder

Copyright (C) 2008, http://www.dabeaz.com 7-

Using unittest• Each test works through assertions

# Assert that expr is True

self.assert_(expr)

# Assert that x == y

self.assertEqual(x,y)

# Assert that x != y

self.assertNotEqual(x,y) # Assert x != y

# Assert that callable(*args,**kwargs) raises a given

# exception

self.assertRaises(exc,callable,*args,**kwargs)

• There are several other tests, but these are the main ones

15

Copyright (C) 2008, http://www.dabeaz.com 7-

Running unittests• To run tests, you add the following code

# testgcd.py

class TestGCDFunction(unittest.testcase):

...

unittest.main()

• Then run Python on the test file

% python testgcd.py

...

---------------------------------------------------

Ran 3 tests in 0.000s

OK

%

16

200

Page 205: Python Binder

Copyright (C) 2008, http://www.dabeaz.com 7-

Setup and Teardown• Two additional methods can be defined

# testgcd.py

class TestGCDFunction(unittest.testcase):

def setUp(self):

# Perform setup before running a test

...

def tearDown(self):

# Perform cleanup after running a test

...

• Can be used to set up environment prior to running a test

• Called before and after each test method

17

Copyright (C) 2008, http://www.dabeaz.com 7-

unittest comments

• There is an art to effective unit testing

• Can grow to be quite complicated for large applications

• The unittest module has a huge number of options related to test runners, collection of results, and other aspects of testing (consult documentation for details)

18

201

Page 206: Python Binder

Copyright (C) 2008, http://www.dabeaz.com 7-

Exercise 7.2

19

Time : 15 Minutes

Copyright (C) 2008, http://www.dabeaz.com 7-

Assertions

• assert statement

assert expr [, "diagnostic message" ]

• If expression is not true, raises AssertionError exception

• Should not be used to check user-input

• Use for internal program checking

20

202

Page 207: Python Binder

Copyright (C) 2008, http://www.dabeaz.com 7-

Contract Programming• Consider assertions on all inputs and outputs

21

def gcd(x,y):

assert x > 0 # Pre-assertions

assert y > 0

while x > 0: x,y = y%x,x

assert y > 0 # Post-assertion

return y

• Checking inputs will immediately catch callers who aren't using appropriate arguments

• Checking outputs may catch buggy algorithms

• Both approaches prevent errors from propagating to other parts of the program

Copyright (C) 2008, http://www.dabeaz.com 7-

Optimized mode

• Python has an optimized run mode

python -O foo.py

• This strips all assert statements

• Allows debug/release mode development

• Normal mode for full debugging

• Optimized mode for faster production runs

22

203

Page 208: Python Binder

Copyright (C) 2008, http://www.dabeaz.com 7-

__debug__ variable

• Global variable checked for debugging

if __debug__:

# Perform some kind of debugging code

...

• By default, __debug__ is True

• Set False in optimized mode (python -O)

• The implementation is efficient. The if statement is stripped in both cases and in -O mode, the debugging code is stripped entirely.

23

Copyright (C) 2008, http://www.dabeaz.com 7-

How to Handle Errors

• Uncaught exceptions% python blah.py

(most recent call last):

File "blah.py", line 13, in ?

foo()

File "blah.py", line 10, in foo

bar()

File "blah.py", line 7, in bar

spam()

File "blah.py", line 4, in spam

x.append(3)

AttributeError: 'int' object has no attribute 'append'

• Program prints traceback, exits

24

204

Page 209: Python Binder

Copyright (C) 2008, http://www.dabeaz.com 7-

Creating Tracebacks

• How to create a traceback yourselfimport traceback

try:

...

except:

traceback.print_exc()

• Sending a traceback to a file

25

traceback.print_exc(file=f)

• Getting traceback as a string

err = traceback.format_exc()

Copyright (C) 2008, http://www.dabeaz.com 7-

Error Handling• Keeping Python alive upon termination% python -i blah.py

Traceback (most recent call last):

File "blah.py", line 13, in ?

foo()

File "blah.py", line 10, in foo

bar()

File "blah.py", line 7, in bar

spam()

File "blah.py", line 4, in spam

x.append(3)

AttributeError: 'int' object has no attribute 'append'

>>>

• Python enters normal interactive mode

• Can use to examine global data, objects, etc.

26

205

Page 210: Python Binder

Copyright (C) 2008, http://www.dabeaz.com 7-

The Python Debugger• pdb module

• Entering the debugger after a crash% python -i blah.py

Traceback (most recent call last):

File "blah.py", line 13, in ?

foo()

File "blah.py", line 10, in foo

bar()

File "blah.py", line 7, in bar

spam()

File "blah.py", line 4, in spam

x.append(3)

AttributeError: 'int' object has no attribute 'append'

>>> import pdb

>>> pdb.pm()

> /Users/beazley/python/blah.py(4)spam()

-> x.append(3)

(Pdb)

27

Copyright (C) 2008, http://www.dabeaz.com 7-

The Python Debugger

• Launching the debugger inside a programimport pdb

def some_function():

statements

...

pdb.set_trace() # Enter the debugger

...

statements

28

• This starts the debugger at the point of the set_trace() call

206

Page 211: Python Binder

Copyright (C) 2008, http://www.dabeaz.com 7-

Python Debugger• Common debugger commands

(Pdb) help # Get help

(Pdb) w(here) # Print stack trace

(Pdb) d(own) # Move down one stack level

(Pdb) u(p) # Move up one stack level

(Pdb) b(reak) loc # Set a breakpoint

(Pdb) s(tep) # Execute one instruction

(Pdb) c(ontinue) # Continue execution

(Pdb) l(ist) # List source code

(Pdb) a(rgs) # Print args of current function

(Pdb) !statement # Execute statement

• For breakpoints, location is one of(Pdb) b 45 # Line 45 in current file

(Pdb) b file.py:45 # Line 34 in file.py

(Pdb) b foo # Function foo() in current file

(Pdb) b module.foo # Function foo() in a module

29

Copyright (C) 2008, http://www.dabeaz.com 7-

Debugging Example• Obtaining a stack trace

(Pdb) w

/Users/beazley/Teaching/python/blah.py(13)?()

-> foo()

/Users/beazley/Teaching/python/blah.py(10)foo()

-> bar()

/Users/beazley/Teaching/python/blah.py(7)bar()

-> spam()

> /Users/beazley/Teaching/python/blah.py(4)spam()

-> x.append(3)

(Pdb)

• Examing a variable(Pdb) print x

-1

(Pdb)

30

207

Page 212: Python Binder

Copyright (C) 2008, http://www.dabeaz.com 7-

Debugging Example• Getting a source listing

(Pdb) list

1 # Compute the greatest common divisor. Uses clever

2 # tuple hack to avoid an extra temporary variable

3 def gcd(x,y):

4 -> while x > 0: x,y = y%x,x

5 return y

[EOF]

(Pdb)

• Setting a breakpoint(Pdb) b 5

Breakpoint 1 at /Users/beazley/Teaching/gcd.py:5

(Pdb)

• Running until break or completion(Pdb) cont

> /Users/beazley/Teaching/NerdRanch/Testing/gcd.py(5)gcd()

-> return y

31

Copyright (C) 2008, http://www.dabeaz.com 7-

Debugging Example• Debugging a function call

>>> import gcd

>>> import pdb

>>> pdb.runcall(gcd.gcd,42,14)

-> while x > 0: x,y = y%x,x

(Pdb)

• Single stepping(Pdb) print x,y

42 14

(Pdb) s

> /Users/beazley/Teaching/python/gcd.py(4)gcd()

-> while x > 0: x,y = y%x,x

(Pdb) print x,y

14 42

(Pdb) s

32

208

Page 213: Python Binder

Copyright (C) 2008, http://www.dabeaz.com 7-

Python Debugger

• Running entire program under debugger

% python -m pdb someprogram.py

• Automatically enters the debugger before the first statement (allowing you to set breakpoints and change the configuration)

33

Copyright (C) 2008, http://www.dabeaz.com 7-

Profiling

• profile module

• Collects performance statistics and prints a reportimport profile

profile.run("command")

• Uses exec to execute the given command

• A command line alternative% python -m profile someprogram.py

34

209

Page 214: Python Binder

Copyright (C) 2008, http://www.dabeaz.com 7-

Profile Sample Output% python -m profile cparse.py

447981 function calls (446195 primitive calls) in

5.640 CPU seconds

Ordered by: standard name

ncalls tottime percall cumtime percall

filename:lineno(function)

2 0.000 0.000 0.000 0.000 :0(StringIO)

101599 0.470 0.000 0.470 0.000 :0(append)

56 0.000 0.000 0.000 0.000 :0(callable)

4 0.000 0.000 0.000 0.000 :0(close)

1028 0.010 0.000 0.010 0.000 :0(cmp)

4 0.000 0.000 0.000 0.000 :0(compile)

1 0.000 0.000 0.000 0.000 :0(digest)

2 0.000 0.000 0.000 0.000 :0(exc_info)

1 0.000 0.000 5.640 5.640 :0(execfile)

4 0.000 0.000 0.000 0.000 :0(extend)

50 0.000 0.000 0.000 0.000 :0(find)

83102 0.430 0.000 0.430 0.000 :0(get)

...

35

Copyright (C) 2008, http://www.dabeaz.com 7-

Summary

• Documentation strings

• Testing with doctest

• Testing with unittest

• Debugging (pdb)

• Profiling

36

210

Page 215: Python Binder

Copyright (C) 2008, http://www.dabeaz.com 7-

Exercise 7.3

37

Time : 10 Minutes

211

Page 216: Python Binder

Iterators and Generators Section 8

Copyright (C) 2008, http://www.dabeaz.com 8-

Overview

• Iteration protocol and iterators

• Generator functions

• Generator expressions

2

212

Page 217: Python Binder

Copyright (C) 2008, http://www.dabeaz.com 8-

Iteration

• A simple definition: Looping over items

a = [2,4,10,37,62]

# Iterate over a

for x in a:

...

• A very common pattern

• loops, list comprehensions, etc.

• Most programs do a huge amount of iteration

3

Copyright (C) 2008, http://www.dabeaz.com 8-

Iteration: Protocol• An inside look at the for statement

for x in obj:

# statements

• Underneath the covers_iter = obj.__iter__() # Get iterator object

while 1:

try:

x = _iter.next() # Get next item

except StopIteration: # No more items

break

# statements

...

• Any object that supports __iter__() and next() is said to be "iterable."

4

213

Page 218: Python Binder

Copyright (C) 2008, http://www.dabeaz.com 8-

Iteration: Protocol

• Manual iteration over a list>>> x = [1,2,3]

>>> it = x.__iter__()

>>> it

<listiterator object at 0x590b0>

>>> it.next()

1

>>> it.next()

2

>>> it.next()

3

>>> it.next()

Traceback (most recent call last):

File "<stdin>", line 1, in ?

StopIteration

>>>

5

Copyright (C) 2008, http://www.dabeaz.com 8-

Iterators Everywhere

• Most built in types support iterationa = "hello"

for c in a: # Loop over characters in a

...

b = { 'name': 'Dave', 'password':'foo'}

for k in b: # Loop over keys in dictionary

...

c = [1,2,3,4]

for i in c: # Loop over items in a list/tuple

...

f = open("foo.txt")

for x in f: # Loop over lines in a file

...

6

214

Page 219: Python Binder

Copyright (C) 2008, http://www.dabeaz.com 8-

Exercise 8.1

7

Time : 5 Minutes

Copyright (C) 2008, http://www.dabeaz.com 8-

Supporting Iteration

• User-defined objects can support iteration

• Example: Counting down...>>> for x in countdown(10):

... print x,

...

10 9 8 7 6 5 4 3 2 1

>>>

8

• To make this work with your own object, you just have to provide a few methods

215

Page 220: Python Binder

Copyright (C) 2008, http://www.dabeaz.com 8-

Supporting Iteration• Sample implementation

class countdown(object):

def __init__(self,start):

self.start = start

def __iter__(self):

return countdown_iter(self.start)

class countdown_iter(object):

def __init__(self,start):

self.count = start

def next(self):

if self.count <= 0:

raise StopIteration

r = self.count

self.count -= 1

return r

9

Copyright (C) 2008, http://www.dabeaz.com 8-

class countdown(object):

def __init__(self,start):

self.start = start

def __iter__(self):

return countdown_iter(self.start)

class countdown_iter(object):

def __init__(self,start):

self.count = start

def next(self):

if self.count <= 0:

raise StopIteration

r = self.count

self.count -= 1

return r

Supporting Iteration• Object that defines an iterator

Must define __iter__ which

creates an iterator object

10

216

Page 221: Python Binder

Copyright (C) 2008, http://www.dabeaz.com 8-

class countdown(object):

def __init__(self,start):

self.start = start

def __iter__(self):

return countdown_iter(self.start)

class countdown_iter(object):

def __init__(self,start):

self.count = start

def next(self):

if self.count <= 0:

raise StopIteration

r = self.count

self.count -= 1

return r

Supporting Iteration• Object that defines an iterator

Must define a class which is the iterator object

11

Copyright (C) 2008, http://www.dabeaz.com 8-

class countdown(object):

def __init__(self,start):

self.start = start

def __iter__(self):

return countdown_iter(self.start)

class countdown_iter(object):

def __init__(self,start):

self.count = start

def next(self):

if self.count <= 0:

raise StopIteration

r = self.count

self.count -= 1

return r

Supporting Iteration• Object that defines an iterator

Iterator must define next() to

return items

12

217

Page 222: Python Binder

Copyright (C) 2008, http://www.dabeaz.com 8-

Iteration Example• Example use:

>>> c = countdown(5)

>>> for i in c:

... print i,

...

5 4 3 2 1

>>> for i in c:

... for j in c:

... print "(%d,%d)" % (i,j)

...

(5, 5)

(5, 4)

(5, 3)

...

(1, 2)

(1, 1)

>>>

13

Copyright (C) 2008, http://www.dabeaz.com 8-

Iteration Example• Why two classes?

• Iterator is object that operates on another object (usually)

• May be iterating in more than one place (nested loops)

countdown.start = 5

countdown_iter.count = 4

countdown_iter.count = 2

__iter__()

14

218

Page 223: Python Binder

Copyright (C) 2008, http://www.dabeaz.com 8-

Exercise 8.2

15

Time : 15 Minutes

Copyright (C) 2008, http://www.dabeaz.com 8-

Defining Iterators

• Is there an easier way to do this?

• Iteration is extremely useful

• Annoying to have to define __iter__(), create iterator objects, and manage everything

16

219

Page 224: Python Binder

Copyright (C) 2008, http://www.dabeaz.com 8-

Generators

• A function that defines an iterator

def countdown(n):

while n > 0:

yield n

n -= 1

>>> for i in countdown(5):

... print i,

...

5 4 3 2 1

>>>

• Any function that uses yield is a generator

17

Copyright (C) 2008, http://www.dabeaz.com 8-

Generator Functions• Behavior is totally different than normal func

• Calling a generator function creates an generator object. It does not start running the function.

def countdown(n):

print "Counting down from", n

while n > 0:

yield n

n -= 1

>>> x = countdown(10)

>>> x

<generator object at 0x58490>

>>>

Notice that no output was produced

18

220

Page 225: Python Binder

Copyright (C) 2008, http://www.dabeaz.com 8-

Generator Functions

• Function only executes on next()>>> x = countdown(10)

>>> x

<generator object at 0x58490>

>>> x.next()

Counting down from 10

10

>>>

• yield produces a value, but suspends function

• Function resumes on next call to next()>>> x.next()

9

>>> x.next()

8

>>>

Function starts executing here

19

Copyright (C) 2008, http://www.dabeaz.com 8-

Generator Functions

• When the generator returns, iteration stops>>> x.next()

1

>>> x.next()

Traceback (most recent call last):

File "<stdin>", line 1, in ?

StopIteration

>>>

20

221

Page 226: Python Binder

Copyright (C) 2008, http://www.dabeaz.com 8-

Generator Example• Follow a file, yielding new lines as added

def follow(f)

f.seek(0,2) # Seek to end of a file

while True:

line = f.readline()

if not line:

time.sleep(0.1) # Sleep for a bit

continue # and try again

yield line

• Observe an active log file

21

f = open("access-log")

for line in follow(f):

# Process the line

print line

Copyright (C) 2008, http://www.dabeaz.com 8-

Generators vs. Iterators

• A generator function is slightly different than an object that supports iteration

• A generator is a one-time operation. You can iterate over the generated data once, but if you want to do it again, you have to call the generator function again.

• An object that supports iteration can be re-used over and over again (e.g., a list)

22

222

Page 227: Python Binder

Copyright (C) 2008, http://www.dabeaz.com 8-

Exercise 8.3

23

Time : 30 minutes

Copyright (C) 2008, http://www.dabeaz.com 8-

Generator Expressions• A generator version of a list comprehension

>>> a = [1,2,3,4]

>>> b = (2*x for x in a)

>>> b

<generator object at 0x58760>

>>> for i in b: print i,

...

2 4 6 8

>>>

• Important differences

• Does not construct a list.

• Only useful purpose is iteration

• Once consumed, can't be reused

24

223

Page 228: Python Binder

Copyright (C) 2008, http://www.dabeaz.com 8-

Generator Expressions• General syntax

(expression for i in s if conditional)

• Can also serve as a function argument

sum(x*x for x in a)

• Can be applied to any iterable

>>> a = [1,2,3,4]

>>> b = (x*x for x in a)

>>> c = (-x for x in b)

>>> for i in c: print i,

...

-1 -4 -9 -16

>>>

25

Copyright (C) 2008, http://www.dabeaz.com 8-

Generator Expressions

• Example: Sum a field in a large input file

f = open("datfile.txt")

# Strip all lines that start with a comment

lines = (line for line in f if not line.startswith('#'))

# Split the lines into fields

fields = (s.split() for s in lines)

# Sum up one of the fields

print sum(float(f[2]) for f in fields)

• Solution

26

823.1838823 233.128883 14.2883881 44.1787723

377.1772737 123.177277 143.288388 3884.78772

...

224

Page 229: Python Binder

Copyright (C) 2008, http://www.dabeaz.com 8-

Generator Expressions

• Solution

27

• Each generator expression only evaluates data as needed (lazy evaluation)

• Example: Running above on a 6GB input file only consumes about 60K of RAM

f = open("datfile.txt")

# Strip all lines that start with a comment

lines = (line for line in f if not line.startswith('#'))

# Split the lines into fields

fields = (s.split() for s in lines)

# Sum up one of the fields

print sum(float(f[2]) for f in fields)

Copyright (C) 2008, http://www.dabeaz.com 8-

Exercise 8.4

28

Time : 15 Minutes

225

Page 230: Python Binder

Copyright (C) 2008, http://www.dabeaz.com 8-

Why Use Generators?

• Many problems are much more clearly expressed in terms of iteration

• Looping over a collection of items and performing some kind of operation (searching, replacing, modifying, etc.)

• Generators allow you to create processing "pipelines"

29

Copyright (C) 2008, http://www.dabeaz.com 8-

Why Use Generators?

• Generators encourage code reuse

• Separate the "iteration" from code that uses the iteration.

• Means that various iteration patterns can be defined more generally.

30

226

Page 231: Python Binder

Copyright (C) 2008, http://www.dabeaz.com 8-

Why Use Generators?

• Better memory efficiency

• "Lazy" evaluation

• Only produce values when needed

• Contrast to constructing a big list of values first

• Can operate on infinite data streams

31

Copyright (C) 2008, http://www.dabeaz.com 8-

The itertools Module

32

• A library module with various functions designed to help with iterators/generators

itertools.chain(s1,s2)

itertools.count(n)

itertools.cycle(s)

itertools.dropwhile(predicate, s)

itertools.groupby(s)

itertools.ifilter(predicate, s)

itertools.imap(function, s1, ... sN)

itertools.repeat(s, n)

itertools.tee(s, ncopies)

itertools.izip(s1, ... , sN)

• All functions process data iteratively.

• Implement various kinds of iteration patterns

227

Page 232: Python Binder

Copyright (C) 2008, http://www.dabeaz.com 8-

More Information

33

http://www.dabeaz.com/generators

• "Generator Tricks for Systems Programmers" tutorial from PyCon'08

• More examples and more generator tricks

228

Page 233: Python Binder

Working With TextSection 9

Copyright (C) 2008, http://www.dabeaz.com 9-

Overview

• This sections expands upon text processing

• Some common programming idioms

• Simple text parsing

• Regular expression pattern matching

• Text generation

• Text I/O

2

229

Page 234: Python Binder

Copyright (C) 2008, http://www.dabeaz.com 9-

Text Parsing

• Problem : Converting text to data

• This is an almost universal problem. Programs need to process various sorts of file formats, extract data from files, etc.

• Example : Reading column-oriented fields

• Example : Extracting data from XML/HTML

3

Copyright (C) 2008, http://www.dabeaz.com 9-

Text Splitting• String Splitting

4

s.split([separator [, maxsplit]])s.rsplit([separator [, maxsplit]]

separator : Text separating elementsmaxsplit : Number of splits to perform

• Examples:

line = "foo/bar/spam"

line.split('/')

line.split('/',1)

line.rsplit('/',1)

['foo','bar','spam']

['foo','bar/spam']

['foo/bar','spam']

230

Page 235: Python Binder

Copyright (C) 2008, http://www.dabeaz.com 9-

Text Stripping

• The following methods string text from the beginning or end of a string

5

s.strip([chars]) # Strip begin/ends.lstrip([chars]) # Strip on lefts.rstrip([chars]) # Strip on right

• Examples:

s = "==Hello=="s.strip('=') "Hello"s.lstrip('=') "Hello=="s.rstrip('=') "==Hello"

Copyright (C) 2008, http://www.dabeaz.com 9-

Text Searching

• Searching : s.find(text [,start])

6

s = "<html><head><title>Hello World</title><head>..."

title_start = s.find("<title>")if title_start >= 0: title_end = s.find("</title>",title_start)

• Returns index of starting character or -1 if text wasn't found.

• Use a slice to extract text fragments

title_text = s[title_start+7:title_end]

231

Page 236: Python Binder

Copyright (C) 2008, http://www.dabeaz.com 9-

Text Replacement

• Replacement : s.replace(text, new [, count])

7

s = "Is Chicago not Chicago?"

>>> s.replace('Chicago','Peoria')'Is Peoria not Peoria?'

>>> s.replace('Chicago','Peoria',1)'Is Peoria not Chicago?'

• Reminder: This always returns a new string

Copyright (C) 2008, http://www.dabeaz.com 9-

Commentary

• Simple text parsing problems can often be solved by just using various combinations of string splitting/stripping/finding operations

• These low level operations are implemented in C in the Python interpreter

• Depending on what is being parsed, this kind of approach may the fastest implementation

8

232

Page 237: Python Binder

Copyright (C) 2008, http://www.dabeaz.com 9-

Exercise 9.1

9

Time : 15 Minutes

Copyright (C) 2008, http://www.dabeaz.com 9-

re Module

• Regular expression pattern matching

• Searching for text patterns

• Extracting text from a document

• Replacing text

• Example: Extracting URLs from text

10

Go to http://www.python.org for more information on Python

233

Page 238: Python Binder

Copyright (C) 2008, http://www.dabeaz.com 9-

re Module

• A quick review of regular expressions"foo" # Matches the text "foo""(foo|bar)" # Matches the text "foo" or "bar""(foo)*" # Match 0 or more repetitions of foo"(foo)+" # Match 1 or more repetitions of foo"(foo)?" # Match 0 or 1 repetitions of foo"[abcde]" # Match one of the letters a,b,c,d,e"[a-z]" # Match one letter from a,b,...,z"[^a-z]" # Match any character except a,b,...z"." # Match any character except newline"\*" # Match the * character"\+" # Match the + character"\d" # Match a digit"\s" # Match whitespace"\w" # Match alphanumeric character

11

Copyright (C) 2008, http://www.dabeaz.com 9-

re Module

• Patterns supplied as strings

• Usually specified using raw strings

pat = r'(http://[\w-]+(?:\.[\w-]+)*(?:/[\w?#=&.,%_-]*)*)'

• Example

• Raw strings don't interpret escapes (\)

12

234

Page 239: Python Binder

Copyright (C) 2008, http://www.dabeaz.com 9-

re Usage

• You start with some kind of pattern string

pat = r'<title>(.*?)</title>'

13

• You then compile the pattern string

patc = re.compile(pat,re.IGNORECASE)

• You then use the compiled pattern to perform various matching/searching operations

Copyright (C) 2008, http://www.dabeaz.com 9-

re: Matching

• How to match a string against a patternm = patc.match(somestring)if m: # Matchedelse: # No match

• Example>>> m = patc.match("<title>Python Introduction</title>")>>> m<_sre.SRE_Match object at 0x5da68>>>> m = patc.match("<h1>Python Introduction</h1>")>>> m>>>

14

235

Page 240: Python Binder

Copyright (C) 2008, http://www.dabeaz.com 9-

re: Searching

• How to search for a pattern in a stringm = patc.search(somestring)if m: # Foundelse: # Not found

• Example

>>> m = patc.search("<body><title>Python Introduction</title>...")

>>> m<_sre.SRE_Match object at 0x5da68>>>>

15

Copyright (C) 2008, http://www.dabeaz.com 9-

re: Match Objects• How to get the text that was matched

m = patc.search(s)if m: text = m.group()

• Example

first = m.start() # Index of pattern startlast = m.end() # Index of pattern endfirst,last = m.span() # Start,end indices together

• How to get the location of the text matched

>>> m = patc.search("<title>Python Introduction</title>")>>> m.group()'<title>Python Introduction</title>'>>> m.start()0>>> m.end()34>>>

16

236

Page 241: Python Binder

Copyright (C) 2008, http://www.dabeaz.com 9-

re: Groups

• Regular expressions may define groupspat = r'<title>(.*?)</title>'pat = r'([\w-]+):(.*)'

• Groups are assigned numbers

pat = r'<title>(.*?)</title>'pat = r'([\w-]+):(.*)'

1

1 2

• Number determined left-to-right

17

Copyright (C) 2008, http://www.dabeaz.com 9-

re: Groups

• When matching, groups can be extracted

>>> m = patc.match("<title>Python Introduction</title>")>>> m.group()'<title>Python Introduction</title>'>>> m.group(1)'Python Introduction'>>>

18

• Used to easily extract text for various parts of a regex pattern

237

Page 242: Python Binder

Copyright (C) 2008, http://www.dabeaz.com 9-

re: Search Example

• Find all occurrences of a pattern

matches = patc.finditer(s)for m in matches: print m.group()

19

• This returns a list of match objects for later processing

Copyright (C) 2008, http://www.dabeaz.com 9-

re: Pattern Replacement

• Find all patterns and replacedef make_link(m): url = m.group() return '<a href="%s">%s</a>' % (url,url)

t = patc.sub(make_link,s)

• Example:

20

>>> patc.sub(make_link,"Go to http://www.python.org")'Go to <a href="http://www.python.org">http://www.python.org</a>'>>>

238

Page 243: Python Binder

Copyright (C) 2008, http://www.dabeaz.com 9-

re: Comments• re module is very powerful

• I have only covered the essential basics

• Strongly influenced by Perl

• However, regexs are not an operator

• Reference:

Jeffrey Friedl, "Mastering Regular Expressions", O'Reilly & Associates, 2006.

21

Copyright (C) 2008, http://www.dabeaz.com 9-

Exercise 9.2

22

Time : 15 Minutes

239

Page 244: Python Binder

Copyright (C) 2008, http://www.dabeaz.com 9-

Generating Text

• Programs often need to generate text

• Reports

• HTML pages

• XML

• Endless possibilities

23

Copyright (C) 2008, http://www.dabeaz.com 9-

String Concatenation

• Strings can be concatenated using +

24

s = "Hello"t = "World"

a = s + t # a = "HelloWorld"

• Although (+) is fine for just a few strings, it has horrible performance if you are concatenating many small chunks together to create a large string

• Should not be used for generating output

240

Page 245: Python Binder

Copyright (C) 2008, http://www.dabeaz.com 9-

String Joining

• The fastest way to join many strings

25

chunks = ["chunk1","chunk2",..."chunkN"]

result = separator.join(chunks)

• Example:

chunks = ["Is","Chicago","Not","Chicago?"]

" ".join(chunks) "Is Chicago Not Chicago?"",".join(chunks) "Is,Chicago,Not,Chicago?""".join(chunks) "IsChicagoNotChicago?"

Copyright (C) 2008, http://www.dabeaz.com 9-

String Joining Example

• Don't do this:

26

s = ""for x in seq: ... s += "some text being produced" ...

• Better:

chunks = []for x in seq: ... chunks.append("some text being produced") ...s = "".join(chunks)

241

Page 246: Python Binder

Copyright (C) 2008, http://www.dabeaz.com 9-

Printing to a String

• StringIO module

• Provides a "file-like" object you can print to

27

import StringIO

out = StringIO.StringIO()for x in seq: ... print >>out, "some text being produced" ...s = out.getvalue()

• In certain situations, this object can be used in place of a file

Copyright (C) 2008, http://www.dabeaz.com 9-

String Interpolation

• In languages like Perl and Ruby, programmers are used to string interpolation features

28

$name = "Dave";$age = 39;

print "$name is $age years old\n";

• Python doesn't have a direct equivalent

• However, there are some alternatives

242

Page 247: Python Binder

Copyright (C) 2008, http://www.dabeaz.com 9-

Dictionary Formatting

• String formatting against a dictionary

29

fields = { 'name' : 'Dave', 'age' : 39}

print "%(name)s is %(age)s years old\n" % fields

• The above requires a dictionary, but you can easily get a dictionary of variables

name = "Dave"age = 39

print "%(name)s is %(age)s years old\n" % vars()

Copyright (C) 2008, http://www.dabeaz.com 9-

Template Strings

• A special string that supports $substitutions

30

import strings = string.Template("$name is $age years old\n")fields = { 'name' : 'Dave', 'age' : 39}

print s.substitute(fields)

• An alternative method for missing values

s.safe_substitute(fields)

243

Page 248: Python Binder

Copyright (C) 2008, http://www.dabeaz.com 9-

Advanced Formatting

• Python 2.6/3.0 have new string formatting

31

print "{0} is {1:d} years old".format("Dave",39)

print "{name} is {age:d} years old".format(name="Dave",age=39)

stock = { 'name' : 'GOOG', 'shares' : 100, 'price' : 490.10 }

print "{s[name]:10s} {s[shares]:10d} {s[price]:10.2f}"\ .format(s=stock)

• Allows much more flexible substitution and lookup than the % operator

Copyright (C) 2008, http://www.dabeaz.com 9-

Exercise 9.3

32

Time : 20 Minutes

244

Page 249: Python Binder

Copyright (C) 2008, http://www.dabeaz.com 9-

Text Input/Output

• You frequently read/write text from files

• Example: Reading line-by-line

33

f = open("something.txt","r")for line in f: ...

• Example: Writing a line of text

f = open("something.txt","w")f.write("Hello World\n")

print >>f, "Hello World\n"

• There are still a few issues to worry about

Copyright (C) 2008, http://www.dabeaz.com 9-

Line Handling

• Question: What is a text line?

• It's different on different operating systems

34

some characters .......\n (Unix)some characters .......\r\n (Windows)

• By default, Python uses the system's native line ending when writing text files

• However, it can get messy when reading text files (especially cross platform)

245

Page 250: Python Binder

Copyright (C) 2008, http://www.dabeaz.com 9-

Line Handling

35

• Example: Reading a Windows text file on Unix

>>> f = open("test.txt","r")>>> f.readlines()['Hello\r\n', 'World\r\n']>>>

• Notice how the lines include the extra Windows '\r' character

• This is a potential source of problems for programs that only expect '\n' line endings

Copyright (C) 2008, http://www.dabeaz.com 9-

Universal Newline

36

• Python has a special "Universal Newline" mode>>> f = open("test.txt","U")>>> f.read()'Hello World\n'>>>

• Converts all endings to standard '\n' character

• f.newlines records the actual newline character that was used in the file

>>> f.newlines'\r\n'>>>

246

Page 251: Python Binder

Copyright (C) 2008, http://www.dabeaz.com 9-

Universal Newline

37

• Example: Reading a Windows text file on Unix

>>> f = open("test.txt","r")>>> f.readlines()['Hello\r\n', 'World\r\n']

>>> f = open("test.txt","U")>>> f.readlines()['Hello\n', 'World\n']>>> f.newlines'\r\n'>>>

• Notice how non-native Windows newline '\r\n' is translated to standard '\n'

Copyright (C) 2008, http://www.dabeaz.com 9-

Text Encoding

38

• Question : What is a character?

• In Python 2, text consists of 8-bit characters

"Hello World" 48 65 6c 6c 6f 20 57 6f 72 6c 64

• Characters are usually encoded in ASCII

247

Page 252: Python Binder

Copyright (C) 2008, http://www.dabeaz.com 9-

International Characters

39

• Problem : How to deal with characters from international character sets?

"That's a spicy Jalapeño!"

• Question: What is the character encoding?

• Historically, everyone made a different encoding

ñ = 0x96 (MacRoman)ñ = 0xf1 (CP1252 - Windows)ñ = 0xa4 (CP437 - DOS)

• Bloody hell!

Copyright (C) 2008, http://www.dabeaz.com 9-

Unicode

• For international characters, use Unicode

• In Python, there is a special syntax for literals

40

t = u"That's a spicy Jalape\u00f1o!"

Unicode

• Unicode strings are just like regular strings except that they hold Unicode characters

• What is a Unicode character?

248

Page 253: Python Binder

Copyright (C) 2008, http://www.dabeaz.com 9-

Unicode Characters

• Unicode defines a standard numerical value for every character used in all languages (except for fictional ones such as Klingon)

• The numeric value is known as "code point"

• There are a lot of code points (>100,000)

41

ñ!

!

= U+00F1= U+03B5= U+0A87= U+3304

Copyright (C) 2008, http://www.dabeaz.com 9-

Unicode Charts• http://www.unicode.org/charts

42

249

Page 254: Python Binder

Copyright (C) 2008, http://www.dabeaz.com 9-

Using Unicode Charts

43

t = u"That's a spicy Jalape\u00f1o!"

• \uxxxx - Embeds a Unicode code point in a string

• Code points specified in hex by convention

Copyright (C) 2008, http://www.dabeaz.com 9-

Using Unicode Charts

44

t = u"Spicy Jalape\N{LATIN SMALL LETTER N WITH TILDE}o!"

• \N{name} - Embeds a named character

• All code points also have descriptive names

250

Page 255: Python Binder

Copyright (C) 2008, http://www.dabeaz.com 9-

Unicode Representation

• Internally, Unicode characters are 16-bits

45

t = u"Jalape\u00f1o"

004a 0061 006c 0061 0070 0065 00f1 006f

• Normally, you don't worry about this

• Except you have to perform I/O

u'J' --> 00 4a (Big Endian)u'J' --> 4a 00 (Little Endian)

• How do characters get encoded in the file?

Copyright (C) 2008, http://www.dabeaz.com 9-

Unicode I/O

• Unicode does not define a standard file encoding--it only defines character code values

• There are many different file encodings

46

• Examples: UTF-8, UTF-16, etc.

• Most popular: UTF-8 (ASCII is a subset)

• So, how do you deal with these encodings?

251

Page 256: Python Binder

Copyright (C) 2008, http://www.dabeaz.com 9-

Unicode File I/O

• Unicode I/O handled using codecs module

• codecs.open(filename,mode,encoding)

47

>>> f = codecs.open("data.txt","w","utf-8")>>> f.write(u"Hello World\n")>>> f.close()

>>> f = codecs.open("data.txt","w","utf-16")>>> f.write(data)>>>

• Several hundred character codecs are provided

• Consult documentation for details

Copyright (C) 2008, http://www.dabeaz.com 9-

Unicode Encoding

• Explicit encoding via strings

48

>>> a = u"Jalape\u00f1o">>> enc_a = a.encode("utf-8")>>>

• Explicit decoding from bytes

>>> enc_a = 'Jalape\xc3\xb1o'>>> a = enc_a.decode("utf-8")>>> au'Jalape\xf1o'>>>

252

Page 257: Python Binder

Copyright (C) 2008, http://www.dabeaz.com 9-

Encoding Errors

• Encoding/Decoding text is often messy

• May encounter broken/invalid data

• The default behavior is to raise an UnicodeError Exception

49

>>> a = u"Jalape\xf1o">>> b = a.encode("ascii")Traceback (most recent call last): File "<stdin>", line 1, in <module>UnicodeEncodeError: 'ascii' codec can't encode character u'\xf1' in position 6: ordinal not in range(128)>>>

Copyright (C) 2008, http://www.dabeaz.com 9-

Encoding Errors

• Encoding/Decoding can use an alternative error handling policy

50

s.decode("encoding",errors)s.encode("encoding",errors)

• Errors is one of'strict' Raise exception (the default)'ignore' Ignore errors'replace' Replace with replacement character'backslashreplace' Use escape code'xmlcharrefreplace' Use XML character reference

253

Page 258: Python Binder

Copyright (C) 2008, http://www.dabeaz.com 9-

Encoding Errors

• Example: Ignore bad characters

51

>>> a = u"Jalape\xf1o">>> a.encode("ascii",'ignore')'Jalapeo'>>>

• Example: Encode Unicode into ASCII

>>> a = u"Jalape\xf1o">>> b = a.encode("us-ascii","xmlcharrefreplace")'Jalape&#241;o'>>>

Copyright (C) 2008, http://www.dabeaz.com 9-

Finding the Encoding

• How do you determine the encoding of a file?

• Might be known in advance (in the manual)

• May be indicated in the file itself

52

<meta http-equiv="Content-Type" content="text/html; charset=UTF-8"/>

• Depends on the data source, application, etc.

254

Page 259: Python Binder

Copyright (C) 2008, http://www.dabeaz.com 9-

Unicode Everywhere

• Unicode is the modern standard for text

• In Python 3, all text is Unicode

• Here are some basic rules to remember:

• All text files are encoded (even ASCII)

• When you read text, you always decode

• When you write text, you always encode

53

Copyright (C) 2008, http://www.dabeaz.com 9-

A Caution• Unicode may sneak in when you don't expect it

• Database integration

• XML Parsing

• Unicode silently propagates through string-ops

54

s = "Spicy" # Standard 8-bit stringt = u"Jalape\u00f1o" # Unicode string

w = s + t # Unicode : u'SpicyJalape\u00f1o"

• This propagation may break your code if it's not expecting to receive Unicode text

255

Page 260: Python Binder

Copyright (C) 2008, http://www.dabeaz.com 9-

Exercise 9.4

55

Time : 10 Minutes

256

Page 261: Python Binder

Binary Data Handling and File I/O

Section 10

Copyright (C) 2008, http://www.dabeaz.com 10-

Introduction

• A major use of Python is to interface with foreign systems and software

• Example : Software written in C/C++

• In this section, we look at the problem of data interchange

• Using Python to decode/encode binary data encodings and other related topics

2

257

Page 262: Python Binder

Copyright (C) 2008, http://www.dabeaz.com 10-

Overview

• Representation of binary data

• Binary file I/O

• Structure packing/unpacking

• Binary structures and ctypes

• Low-level I/O interfaces

3

Copyright (C) 2008, http://www.dabeaz.com 10-

Binary Data

• Binary data - low-level machine data

• Examples : 16-bit integers, 32-bit integers, 64-bit double precision floats, packed strings, etc.

• Raw low-level data that you typically encounter with typed programming languages such as C, C++, Java, etc.

4

258

Page 263: Python Binder

Copyright (C) 2008, http://www.dabeaz.com 10-

Common Scenarios

• You might encounter binary encodings in a number of settings

• Special file formats (images, video, audio, etc.)

• Low-level network protocols

• Control of HW (e.g., over serial ports)

• Reading/writing data meant for use by software in other languages (C, C++, etc.)

5

Copyright (C) 2008, http://www.dabeaz.com 10-

Binary Data Representation

• To store binary data, you can use strings

• In Python 2, strings are just byte sequences

6

bytes = '\x89PNG\r\n\x1a\n\x00\x00\x00\rIHDR\x00\x00'

\xhh - Encodes an arbitrary byte (hh)

• All of the normal string operations work except that you may have to specify a lot of non-text characters using \xhh escape codes

259

Page 264: Python Binder

Copyright (C) 2008, http://www.dabeaz.com 10-

Binary Data Representation

• In Python 2.6 and newer, a special syntax should be used if you are writing byte literal strings in your program

7

header = b'\x89PNG\r\n'

byte string prefix

• This has been added to disambiguate text (Unicode) strings and raw byte strings

• A caution : This is optional in Python 2, but required in Python 3

Copyright (C) 2008, http://www.dabeaz.com 10-

byte arrays

• Python 2.6 introduces a new bytearray type

8

# Initialized from a byte string

b = bytearray(b"Hello World")

# Preinitialized to a specific size

buf = bytearray(1024)

• A bytearray supports almost all of the usual string operations, but it is mutable

• You can make in-place assignments to characters and slices

260

Page 265: Python Binder

Copyright (C) 2008, http://www.dabeaz.com 10-

Binary File I/O

9

• To obtain binary data, you'll typically read it from a file, pipe, network socket, etc.

• For files, there are special modes

f = open(filename,"rb") # Read, binary mode

f = open(filename,"wb") # Write, binary mode

f = open(filename,"ab") # Append, binary mode

• Disables all newline translation (reads/writes)

• Required for binary data on Windows

• Optional on Unix (a portability gotcha)

Copyright (C) 2008, http://www.dabeaz.com 10-

Binary Data Packing

• A common method for dealing with binary data is to pack or unpack values from byte strings

10

b'%\x00\x00\x00*\x00\x00\x00'37, 42

python raw byte stringpack

unpack

• Packing/unpacking is about type conversion

• Converting low-level data to/from built-in Python types such as ints, floats, strings, etc.

261

Page 266: Python Binder

Copyright (C) 2008, http://www.dabeaz.com 10-

struct module

• Packs/unpacks binary records and structures

• Often used when interfacing Python to foreign systems or when reading non-text files (images, audio, video, etc.)

import struct

# Unpack two raw 32-bit integers from a string

x,y = struct.unpack("ii",s)

# Pack a set of fields

r = struct.pack("8sif", "GOOG",100, 490.10)

11

Copyright (C) 2008, http://www.dabeaz.com 10-

struct module• Packing/unpacking codes (based on C)

'c' char (1 byte string)

'b' signed char (8-bit integer)

'B' unsigned char (8-bit integer)

'h' short (16-bit integer)

'H' unsigned short (16-bit integer)

'i' int (32-bit integer)

'I' unsigned int (32-bit integer)

'l' long (32 or 64 bit integer)

'L' unsigned long (32 or 64 bit integer)

'q' long long (64 bit integer)

'Q' unsigned long long (64 bit integer)

'f' float (32 bit)

'd' double (64 bit)

's' char[] (String)

'p' char[] (String with 8-bit length)

'P' void * (Pointer)

12

262

Page 267: Python Binder

Copyright (C) 2008, http://www.dabeaz.com 10-

struct module• Each code may be preceded by repetition

count'4i' 4 integers

'20s' 20-byte string

• Integer alignment modifiers'@' Native byte order and alignment

'=' Native byte order, standard alignment

'<' Little-endian, standard alignment

'>' Big-endian, standard alignment

'!' Network (big-endian), standard align

• Standard alignment

Data is aligned to start on a multiple of the size. For example, an integer (4 bytes) is aligned to start on a 4-byte boundary.

13

Copyright (C) 2008, http://www.dabeaz.com 10-

struct Example• An IP packet header has this structure

• To unpack in Python, might do this:

ver hlen tos length

ident fragment

ttl proto checksum

sourceaddr

destaddr

32-bits

(vhlen,tos,length,

ident,fragment,

ttl,proto,checksum,

sourceaddr,

destaddr) = struct.unpack("!BBHHHBBHII", pkt)

ver = (vhlen & 0xf0) >> 4

hlen = vhlen & 0x0f

14

263

Page 268: Python Binder

Copyright (C) 2008, http://www.dabeaz.com 10-

Exercise 10.1

15

Time : 15 Minutes

Copyright (C) 2008, http://www.dabeaz.com 10-

Binary Type Objects

• Instead of unpacking/packing, an alternative approach is to define new kinds of Python objects that internally represent low-level binary data in its native format

• These objects then provide an interface for manipulating the internal data from Python

• Instead of converting data, this is just "wrapping" binary data with a Python layer

16

264

Page 269: Python Binder

Copyright (C) 2008, http://www.dabeaz.com 10-

ctypes library

• A library that has objects for defining low-level C types, structures and arrays

• Can create objects that look a lot like normal Python objects except that they are represented using a C memory layout

• A companion/alternative to struct

17

Copyright (C) 2008, http://www.dabeaz.com 10-

ctypes types• There are a predefined set of type objects

18

ctypes type C Datatype

------------------ ---------------------------

c_byte signed char

c_char char

c_char_p char *

c_double double

c_float float

c_int int

c_long long

c_longlong long long

c_short short

c_uint unsigned int

c_ulong unsigned long

c_ushort unsigned short

c_void_p void *

265

Page 270: Python Binder

Copyright (C) 2008, http://www.dabeaz.com 10-

ctypes types• Types are objects that can be instantiated

19

>>> from ctypes import *

>>> x = c_long(42)

>>> x

c_long(42)

>>> print x.value

42

>>> x.value = 23

>>> x

c_long(23)

>>>

• Unlike many Python types, the values are mutable (access/modify through .value)

• Accessing the value is touching raw memory

Copyright (C) 2008, http://www.dabeaz.com 10-

ctypes arrays

• Defining a C array type

20

>>> long4 = c_long * 4

>>> long4

<class '__main__.c_long_Array_4'>

>>>

• Creating and using an instance of this type>>> a = long4(1,2,3,4)

>>> len(a)

4

>>> a[0]

1

>>> a[1]

2

>>> for x in a: print a,

1 2 3 4

>>>

266

Page 271: Python Binder

Copyright (C) 2008, http://www.dabeaz.com 10-

ctypes structures

• Defining a C structure

21

>>> class Point(Structure):

_fields_ = [ ('x', c_double),

('y', c_double) ]

>>> p = Point(2,3)

>>> p.x

2.0

>>> p.y

3.0

>>>

• Looks kind of like a normal Python object, but is really just a layer over a C struct

Copyright (C) 2008, http://www.dabeaz.com 10-

Direct I/O

• Few Python programmers know this, but ctypes objects support direct I/O

• Can directly write ctypes objects onto files

22

p = Point(2,3)

f = open("somefile","wb")

f.write(p)

...

• Can directly read into existing ctypes objectsf = open("somefile","rb")

p = Point() # Create an empty point

f.readinto(p) # Read into it

...

267

Page 272: Python Binder

Copyright (C) 2008, http://www.dabeaz.com 10-

Exercise 10.2

23

Time : 10 Minutes

Copyright (C) 2008, http://www.dabeaz.com 10-

Binary Arrays

• array module used to define binary arrays

24

>>> from array import array

>>> a = array('i',[1,2,3,4,5]) # Integer array

>>> a.append(6)

>>> a

array('i',[1,2,3,4,5,6])

>>> a.append(37.5)

TypeError: an integer is required

>>>

• An array is a like a list except that all elements are constrained to a single type

a.typecode # Array type code

a.itemsize # Item size in bytes

268

Page 273: Python Binder

Copyright (C) 2008, http://www.dabeaz.com 10-

Array Initialization• array(typecode [, initializer])

25

Typecode Meaning

-------- -----------------

'c' 8-bit character

'b' 8-bit integer (signed)

'B' 8-bit integer (unsigned)

'h' 16-bit integer (signed)

'H' 16-bit integer (unsigned)

'i' Integer (int)

'I' Unsigned integer (unsigned int)

'l' Long integer (long)

'L' Unsigned long integer (unsigned long)

'f' 32-bit float (single precision)

'd' 64-bit float (double precision)

• Initializer is just a sequence of input values

Copyright (C) 2008, http://www.dabeaz.com 10-

Array Operations

• Arrays work a lot like lists

26

a.append(item) # Append an item

a.index(item) # Index of item

a.remove(item) # Remove item

a.count(item) # Count occurrences

a.insert(index,item) # Insert item

• Key difference is the uniform data type

• Underneath the covers, data is stored in a contiguous memory region--just like an array in C/C++

269

Page 274: Python Binder

Copyright (C) 2008, http://www.dabeaz.com 10-

Array Operations

• Arrays also have I/O functions

27

a.fromfile(f,n) # Read and append n items

# from file f

a.tofile(f) # Write all items to f

• Arrays have string packing functions

a.fromstring(s) # Appends items from packed

# binary string s

a.tostring() # Convert to a packed string

Copyright (C) 2008, http://www.dabeaz.com 10-

Some Facts about Arrays

• Arrays are more space efficient

28

1000000 integers in a list : 16 Mbytes

1000000 integers in an array : 4 Mbytes

• List item access is about 70% faster. Arrays store values as native C datatypes so every access involves creating a Python object (int, float) to view the value

• Arrays are not mathematical vectors or matrices. The (+) operator concatenates.

270

Page 275: Python Binder

Copyright (C) 2008, http://www.dabeaz.com 10-

numpy arrays

• For mathematics, consider the numpy library

• It provides arrays that behave like vectors and matrices (element-wise operations)

29

>>> from numpy import array

>>> a = array([1,2,3,4])

>>> b = array([5,6,7,8])

>>> a + b

array([6, 8, 10, 12])

>>>

• Many scientific packages build upon this

271

Page 276: Python Binder

Working with ProcessesSection 11

Copyright (C) 2008, http://www.dabeaz.com 11-

Overview

• Using Python to launch other processes, control their environment, and collect their output

• Detailed coverage of subprocess module

2

272

Page 277: Python Binder

Copyright (C) 2008, http://www.dabeaz.com 11-

Python Interpreter

• Python comes from the Unix/C tradition

• Interpreter is command-line/console based

3

% python foo.py

• Input/output is from a terminal (tty)

• Programs may optionally use command line arguments and environment variables

• Python can control other programs that use the same Unix conventions

Copyright (C) 2008, http://www.dabeaz.com 11-

Subprocesses

• A program can create a new process

• This is called a "subprocess"

• The subprocess often runs under the control of the original process (which is known as the "parent" process)

• Parent often wants to collect output or the status result of the subprocess

4

273

Page 278: Python Binder

Copyright (C) 2008, http://www.dabeaz.com 11-

Simple Subprocesses

• There are a number of functions for simple subprocess control

• Executing a system shell command

5

os.system("rm -rf /tmpdata")

• Capturing the output of a command (Unix)

import commands

s = commands.getoutput("ls -l")

• This would be the closest Python equivalent to backticks (`ls -l`) in the shell, Perl, etc.

Copyright (C) 2008, http://www.dabeaz.com 11-

Advanced Subprocesses

• Python has several modules for subprocesses

• os, popen2, subprocess

6

• Historically, this has been a bit of a moving target. In older Python code, you will probably see extensive use of the popen2 and os modules

• Currently, the official "party line" is to use the subprocess module

274

Page 279: Python Binder

Copyright (C) 2008, http://www.dabeaz.com 11-

subprocess Module

• A high-level module for launching subprocesses

• Cross-platform (Unix/Windows)

• Tries to consolidate the functionality of a wide-assortment of low-level system calls (system, popen(), exec(), spawn(), etc.)

• Will illustrate with some common use cases

7

Copyright (C) 2008, http://www.dabeaz.com 11-

Executing Commands

Problem: You want to execute a simple command or run a separate program. You don't care about capturing its output.

import subprocess

p = subprocess.Popen(['mkdir','temp'])

q = subprocess.Popen(['rm','-f','tempdata'])

• Executes a command string

• Returns a Popen object (more in a minute)

8

275

Page 280: Python Binder

Copyright (C) 2008, http://www.dabeaz.com 11-

Specifying the Command

subprocess.Popen(['rm','-f','tempdata'])

• Popen() accepts a list of command args

9

• These are the same as the args in the shellshell % rm -f tempdata

• Note: Each "argument" is a separate itemsubprocess.Popen(['rm','-f','tempdata']) # Good

subprocess.Popen(['rm','-f tempdata']) # Bad

Don't merge multiple arguments into a single string like this.

Copyright (C) 2008, http://www.dabeaz.com 11-

Environment Vars

env_vars = { 'NAME1' : 'VALUE1',

'NAME2' : 'VALUE2',

...

}

p = subprocess.Popen(['cmd','arg1',...,'argn'],

env=env_vars)

• How to set up environment variables in child

10

• Note : If this is supplied and there is a PATH environment variable, it will be used to search for the command (Unix)

276

Page 281: Python Binder

Copyright (C) 2008, http://www.dabeaz.com 11-

Current Directory

p = subprocess.Popen(['cmd','arg1',...,'argn'],

cwd='/some/directory')

• If you need to change the working directory

11

• Note: This changes the working directory for the subprocess, but does not affect how Popen() searches for the command

Copyright (C) 2008, http://www.dabeaz.com 11-

Collecting Status Codes

• When subprocess terminates, it returns a status

• An integer code of some kind

12

C:

exit(status);

Java:

System.exit(status);

Python:

raise SystemExit(status)

• Convention is for 0 to indicate "success." Anything else is an error.

277

Page 282: Python Binder

Copyright (C) 2008, http://www.dabeaz.com 11-

Collecting Status Codes

p = subprocess.Popen(['cmd','arg1',...,'argn'])

...

status = p.wait()

• When you launch a subprocess, it runs independently from the parent

• To wait and collect status, use wait()

13

• Status will be the integer return code (which is also stored)

p.returncode # Exit status of subprocess

Copyright (C) 2008, http://www.dabeaz.com 11-

Polling a Subprocess

p = subprocess.Popen(['cmd','arg1',...,'argn'])

...

if p.poll() is None:

# Process is still running

else:

status = p.returncode # Get the return code

• poll() - Checks status of subprocess

14

• Returns None if the process is still running, otherwise the returncode is returned

278

Page 283: Python Binder

Copyright (C) 2008, http://www.dabeaz.com 11-

Killing a Subprocess

p = subprocess.Popen(['cmd','arg1',...,'argn'])

...

p.terminate()

• In Python 2.6 or newer, use terminate()

15

• In older versions, you have to hack it yourself

# Unix

import os

os.kill(p.pid)

# Windows

import win32api

win32api.TerminateProcess(int(p._handle),-1)

Copyright (C) 2008, http://www.dabeaz.com 11-

Exercise 11.1

16

Time : 20 Minutes

279

Page 284: Python Binder

Copyright (C) 2008, http://www.dabeaz.com 11-

Capturing OutputProblem: You want to execute another program and capture its output

• Use additional options to Popen()import subprocess

p = subprocess.Popen(['cmd'],

stdout=subprocess.PIPE)

data = p.stdout.read()

• This works with both Unix and Windows

• Captures any output printed to stdout

17

Copyright (C) 2008, http://www.dabeaz.com 11-

Sending/Receiving DataProblem: You want to execute a program, send it some input data, and capture its output

• Set up pipes using Popen()

p = subprocess.Popen(['cmd'],

stdin = subprocess.PIPE,

stdout = subprocess.PIPE)

p.stdin.write(data) # Send data

p.stdin.close() # No more input

result = p.stdout.read() # Read output

python cmdp.stdout

p.stdin stdin

stdout

18

280

Page 285: Python Binder

Copyright (C) 2008, http://www.dabeaz.com 11-

Sending/Receiving DataProblem: You want to execute a program, send it some input data, and capture its output

• Set up pipes using Popen()

p = subprocess.Popen(['cmd'],

stdin = subprocess.PIPE,

stdout = subprocess.PIPE)

p.stdin.write(data) # Send datap.stdin.close() # No more input

result = p.stdout.read() # Read output

python cmdp.stdout

p.stdin stdin

stdout

19

Pair of files that areare hooked up to the

subprocess

Copyright (C) 2008, http://www.dabeaz.com 11-

Sending/Receiving Data• How to capture stderr

p = subprocess.Popen(['cmd'],

stdin=subprocess.PIPE,

stdout=subprocess.PIPE,

stderr=subprocess.PIPE)

python cmdp.stdout

p.stdin stdin

stdout

20

p.stderr stderr

• Note: stdout/stderr can also be merged

p = subprocess.Popen(['cmd'],

stdin=subprocess.PIPE,

stdout=subprocess.PIPE,

stderr=subprocess.STDOUT)

281

Page 286: Python Binder

Copyright (C) 2008, http://www.dabeaz.com 11-

I/O Redirection

• Connecting input to a file

f_in = open("somefile","r")

p = subprocess.Popen(['cmd'], stdin=f_in)

21

• Connecting the output to a file

f_out = open("somefile","w")

p = subprocess.Popen(['cmd'], stdout=f_out)

• Basically, stdin and stdout can be connected to any open file object

• Note : Must be a real file in the OS

Copyright (C) 2008, http://www.dabeaz.com 11-

Subprocess I/O

• Subprocess module can be used to set up fairly complex I/O patterns

22

import subprocess

p1 = subprocess.Popen("ls -l", shell=True,

stdout=subprocess.PIPE)

p2 = subprocess.Popen("wc",shell=True,

stdin=p1.stdout,

stdout=subprocess.PIPE)

out = p2.stdout.read()

• Note: this is the same as this in the shell

shell % ls -l | wc

282

Page 287: Python Binder

Copyright (C) 2008, http://www.dabeaz.com 11-

I/O Issues

• subprocess module does not work well for controlling interactive processes

• Buffering behavior is often wrong (may hang)

• Pipes don't properly emulate terminals

• Subprocess may not operate correctly

• Does not work for sending keyboard input typed into any kind of GUI.

23

Copyright (C) 2008, http://www.dabeaz.com 11-

Interactive SubprocessesProblem: You want to launch a subprocess, but it involves interactive console-based user input

• Must launch subprocess and make its input emulate an interactive terminal (TTY)

• Best bet: Use a third-party "Expect" library (e.g., pexpect)

• Modeled after Unix "expect" tool (Don Libes)

24

283

Page 288: Python Binder

Copyright (C) 2008, http://www.dabeaz.com 11-

pexpect Example• http://pexpect.sourceforge.net

• Sample of controlling an interactive session

25

import pexpect

child = pexpect.spawn("ftp ftp.gnu.org")

child.expect('Name .*:')

child.sendline('anonymous')

child.expect('ftp> ')

child.sendline('cd gnu/emacs')

child.expect('ftp> ')

child.sendline('ls')

child.expect('ftp> ')

print child.before

child.sendline('quit')

Expected output

Send responses

Copyright (C) 2008, http://www.dabeaz.com 11-

Odds and Ends• Python a large number of other modules and

functions related to process management

• os module (fork, wait, exec, etc.)

• signal module (signal handling)

• time module (system time functions)

• resource (system resource limits)

• locale (internationalization)

• _winreg (Windows Registry)

26

284

Page 289: Python Binder

Copyright (C) 2008, http://www.dabeaz.com 11-

Exercise 11.2

27

Time : 10 Minutes

285

Page 290: Python Binder

Copyright (C) 2008, http://www.dabeaz.com 12-

Python Integration Primer

Section 12

1

Copyright (C) 2008, http://www.dabeaz.com 12-

Python Integration

• People don't use Python in isolation

• They use it to interact with other software

• Software not necessarily written in Python

• This is one of Python's greatest strengths!

2

286

Page 291: Python Binder

Copyright (C) 2008, http://www.dabeaz.com 12-

Overview

• A brief tour of how Python integrates with the outside world

• Support for common data formats

• Network programming

• Accessing C libraries

• COM Extensions

• Jython (Java) and IronPython (.NET)

3

Copyright (C) 2008, http://www.dabeaz.com 12-

Data Interchange

• Python is adept at exchanging data with other programs using standard data formats

• Example : Processing XML

4

287

Page 292: Python Binder

Copyright (C) 2008, http://www.dabeaz.com 12-

XML Overview

• XML documents use structured markup<contact>

<name>Elwood Blues</name>

<address>1060 W Addison</address>

<city>Chicago</city>

<zip>60616</zip>

</contact>

• Documents made up of elements<name>Elwood Blues</name>

• Elements have starting/ending tags

• May contain text and other elements

5

Copyright (C) 2008, http://www.dabeaz.com 12-

XML Example<?xml version="1.0" encoding="iso-8859-1"?>

<recipe>

<title>Famous Guacamole</title>

<description>

A southwest favorite!

</description>

<ingredients>

<item num="2">Large avocados, chopped</item>

<item num="1">Tomato, chopped</item>

<item num="1/2" units="C">White onion, chopped</item>

<item num="1" units="tbl">Fresh squeezed lemon juice</item>

<item num="1">Jalapeno pepper, diced</item>

<item num="1" units="tbl">Fresh cilantro, minced</item>

<item num="3" units="tsp">Sea Salt</item>

<item num="6" units="bottles">Ice-cold beer</item>

</ingredients>

<directions>

Combine all ingredients and hand whisk to desired consistency.

Serve and enjoy with ice-cold beers.

</directions>

</recipe>

6

288

Page 293: Python Binder

Copyright (C) 2008, http://www.dabeaz.com 12-

XML Parsing

• Parsing XML documents is easy

• Use the xml.etree.ElementTree module

7

Copyright (C) 2008, http://www.dabeaz.com 12-

etree Parsing

• How to parse an XML document

>>> import xml.etree.ElementTree

>>> doc = xml.etree.ElementTree.parse("recipe.xml")

• This creates an ElementTree object>>> doc

<xml.etree.ElementTree.ElementTree instance at 0x6e1e8>

• Supports many high-level operations

8

289

Page 294: Python Binder

Copyright (C) 2008, http://www.dabeaz.com 12-

etree Parsing

• Obtaining selected elements

>>> title = doc.find("title")

>>> firstitem = doc.find("ingredients/item")

>>> firstitem2 = doc.find("*/item")

>>> anyitem = doc.find(".//item")

• find() always returns the first element found

• Note: Element selection syntax above

9

Copyright (C) 2008, http://www.dabeaz.com 12-

etree Parsing

• To loop over all matching elements

>>> for i in doc.findall("ingredients/item"):

... print i

...

<Element item at 7a6c10>

<Element item at 7a6be8>

<Element item at 7a6d50>

<Element item at 7a6d28>

<Element item at 7a6b20>

<Element item at 7a6d00>

<Element item at 7a6af8>

<Element item at 7a6d78>

10

290

Page 295: Python Binder

Copyright (C) 2008, http://www.dabeaz.com 12-

etree Parsing• Obtaining the text from an element

>>> doc.findtext("title")

'Famous Guacamole'

>>> doc.findtext("directions")

'\n Combine all ingredients and hand whisk to

desired consistency.\n Serve and enjoy with ice-cold

beers.\n '

>>> doc.findtext(".//item")

'Large avocados, chopped'

>>>

11

• Searching and text extraction combined

>>> e = doc.find("title")

>>> e.text

'Famous Guacamole'

>>>

Copyright (C) 2008, http://www.dabeaz.com 12-

etree Parsing

• To obtain the element tag

>>> elem.tag

'item'

>>>

12

• To obtain element attributes>>> item.get('num')

'6'

>>> item.get('units')

'bottles'

291

Page 296: Python Binder

Copyright (C) 2008, http://www.dabeaz.com 12-

etree Example

• Print out recipe ingredientsfor i in doc.findall("ingredients/item"):

unitstr = "%s %s" % (i.get("num"),i.get("units",""))

print "%-10s %s" % (unitstr,i.text)

• Output2 Large avocados, chopped

1 Tomato, chopped

1/2 C White onion, chopped

1 tbl Fresh squeezed lemon juice

1 Jalapeno pepper, diced

1 tbl Fresh cilantro, minced

3 tsp Sea Salt

6 bottles Ice-cold beer

13

Copyright (C) 2008, http://www.dabeaz.com 12-

Exercise 12.1

14

Time : 10 Minutes

292

Page 297: Python Binder

Copyright (C) 2008, http://www.dabeaz.com 12-

Network Programming

• Python has very strong support for network programming applications

• Low-level socket programming

• Server side/client side modules

• High-level application protocols

• Will look at just a few simple examples

15

Copyright (C) 2008, http://www.dabeaz.com 12-

Low-level Sockets• A simple TCP server (Hello World)

16

from socket import *

s = socket(AF_INET,SOCK_STREAM)

s.bind(("",9000))

s.listen(5)

while True:

c,a = s.accept()

print "Received connection from", a

c.send("Hello %s\n" % a[0])

c.close()

• socket module provides low-level networking

• Programming API is almost identical to socket programming in C, Java, etc.

293

Page 298: Python Binder

Copyright (C) 2008, http://www.dabeaz.com 12-

High-Level Sockets

• A simple TCP server (Hello World)

17

import SocketServer

class HelloHandler(SocketServer.BaseRequestHandler):

def handle(self):

print "Connection from", self.client_address

self.request.sendall("Hello World\n")

serv = SocketServer.TCPServer(("",8000),HelloHandler)

serv.serve_forever()

• SocketServer module hides details

• Makes it easy to implement network servers

Copyright (C) 2008, http://www.dabeaz.com 12-

A Simple Web Server

• Serve files from a directory

18

from BaseHTTPServer import HTTPServer

from SimpleHTTPServer import SimpleHTTPRequestHandler

import os

os.chdir("/home/docs/html")

serv = HTTPServer(("",8080),SimpleHTTPRequestHandler)

serv.serve_forever()

• This creates a minimal web server

• Connect with a browser and try it out

294

Page 299: Python Binder

Copyright (C) 2008, http://www.dabeaz.com 12-

XML-RPC

• Remote Procedure Call

• Uses HTTP as a transport protocol

• Parameters/Results encoded in XML

• An RPC standard that is supported across a variety of different programming languages (Java, Javascript, C++, etc.)

19

Copyright (C) 2008, http://www.dabeaz.com 12-

Simple XML-RPC• How to create a stand-alone server

20

from SimpleXMLRPCServer import SimpleXMLRPCServer

def add(x,y):

return x+y

s = SimpleXMLRPCServer(("",8080))

s.register_function(add)

s.serve_forever()

• How to test it (xmlrpclib)>>> import xmlrpclib

>>> s = xmlrpclib.ServerProxy("http://localhost:8080")

>>> s.add(3,5)

8

>>> s.add("Hello","World")

"HelloWorld"

>>>

295

Page 300: Python Binder

Copyright (C) 2008, http://www.dabeaz.com 12-

Simple XML-RPC• Adding multiple functions

21

from SimpleXMLRPCServer import SimpleXMLRPCServer

s = SimpleXMLRPCServer(("",8080))

s.register_function(add)

s.register_function(foo)

s.register_function(bar)

s.serve_forever()

• Registering an instance (exposes all methods)from SimpleXMLRPCServer import SimpleXMLRPCServer

s = SimpleXMLRPCServer(("",8080))

obj = SomeObject()

s.register_instance(obj)

s.serve_forever()

Copyright (C) 2008, http://www.dabeaz.com 12-

Exercise 12.2

22

Time : 15 Minutes

296

Page 301: Python Binder

Copyright (C) 2008, http://www.dabeaz.com 12-

ctypes Module

• A library module that allows C functions to be executed in arbitrary shared libraries/DLLs

• One of several approaches that can be used to access foreign C functions

23

Copyright (C) 2008, http://www.dabeaz.com 12-

ctypes Example• Consider this C code:

24

int fact(int n) {

if (n <= 0) return 1;

return n*fact(n-1);

}

int cmp(char *s, char *t) {

return strcmp(s,t);

}

double half(double x) {

return 0.5*x;

}

• Suppose it was compiled into a shared lib

% cc -shared example.c -o libexample.so

297

Page 302: Python Binder

Copyright (C) 2008, http://www.dabeaz.com 12-

ctypes Example

• Using C types

25

>>> import ctypes

>>> ex = ctypes.cdll.LoadLibrary("./libexample.so")

>>> ex.fact(4)

24

>>> ex.cmp("Hello","World")

-1

>>> ex.cmp("Foo","Foo")

0

>>>

• It just works (heavy wizardry)

• However, there is a catch...

Copyright (C) 2008, http://www.dabeaz.com 12-

ctypes Caution

• C libraries don't contain type information

• So, ctypes has to guess...

26

>>> import ctypes

>>> ex = ctypes.cdll.LoadLibrary("./libexample.so")

>>> ex.fact("Howdy")

1

>>> ex.cmp(4,5)

Segmentation Fault

• And unfortunately, it usually gets it wrong

• However, you can help it out.

298

Page 303: Python Binder

Copyright (C) 2008, http://www.dabeaz.com 12-

ctypes Types

• You just have to provide type signatures

27

>>> ex.half.argtypes = (ctypes.c_double,)

>>> ex.half.restype = ctypes.c_double

>>> ex.half(5.0)

2.5

>>>

• Creates a minimal prototype

.argtypes # Tuple of argument types

.restype # Return type of a function

Copyright (C) 2008, http://www.dabeaz.com 12-

ctypes Types• Sampling of datatypes available

28

ctypes type C Datatype

------------------ ---------------------------

c_byte signed char

c_char char

c_char_p char *

c_double double

c_float float

c_int int

c_long long

c_longlong long long

c_short short

c_uint unsigned int

c_ulong unsigned long

c_ushort unsigned short

c_void_p void *

299

Page 304: Python Binder

Copyright (C) 2008, http://www.dabeaz.com 12-

ctypes Cautions

• Requires detailed knowledge of underlying C library and how it operates

• Function names

• Argument types and return types

• Data structures

• Side effects/Semantics

• Memory management

29

Copyright (C) 2008, http://www.dabeaz.com 12-

ctypes and C++

• Not really supported

• This is more the fault of C++

• C++ creates libraries that aren't easy to work with (non-portable name mangling, vtables, etc.)

• C++ programs may use features not easily mapped to ctypes (e.g., templates, operator overloading, smart pointers, RTTI, etc.)

30

300

Page 305: Python Binder

Copyright (C) 2008, http://www.dabeaz.com 12-

Extension Commentary

• There are more advanced ways of extended Python with C and C++ code

• Low-level extension API

• Code generator tools (e.g., Swig, Boost, etc.)

• More details in an advanced course

31

Copyright (C) 2008, http://www.dabeaz.com 12-

Exercise 12.3

32

Time : 20 minutes

301

Page 306: Python Binder

Copyright (C) 2008, http://www.dabeaz.com 12-

COM Extensions

• On Windows, Python can interact with COM

• Allows Python to script applications on Windows (e.g., Microsoft Office)

33

Copyright (C) 2008, http://www.dabeaz.com 12-

Pythonwin and COM

• To work with COM, use Pythonwin extensions

• An extension to Python that includes additional modules for Windows

• A separate download

34

http://sourceforge.net/projects/pywin32

• The book: "Python Programming on Win32", by Mark Hammond and Andy Robinson (O'Reilly)

302

Page 307: Python Binder

Copyright (C) 2008, http://www.dabeaz.com 12-

Disclaimer

• Covering all of COM is impossible here

• There are a lot of tricky details

• Will provide a general idea of how it works in Python

• Consult a reference for gory details

35

Copyright (C) 2008, http://www.dabeaz.com 12-

Python COM Client

• Example : Control Microsoft Word

• First step: Get a COM object for Word

>>> import win32com.client

>>> w = win32com.client.Dispatch("Word.Application")

>>> w

<COMObject Word.Application>

>>>

36

303

Page 308: Python Binder

Copyright (C) 2008, http://www.dabeaz.com 12-

Python as a client>>> import win32com.client

>>> w = win32com.client.Dispatch("Word.Application")

>>> w

<COMObject Word.Application>

>>>

37

This refers to anobject in the registry

Copyright (C) 2008, http://www.dabeaz.com 12-

Using a COM object

• Once you have a COM object, you access its attributes to do things

• Example:

>>> doc = w.Documents.Add()

>>> doc.Range(0,0).Select()

>>> sel = w.Selection

>>> sel.InsertAfter("Hello World\n")

>>> sel.Style = "Heading 1"

>>> sel.Collapse(0)

>>> sel.InsertAfter("This is a test\n")

>>> sel.Style = "Normal"

>>> doc.SaveAs("HelloWorld")

>>> w.Visible = True

>>>

38

304

Page 309: Python Binder

Copyright (C) 2008, http://www.dabeaz.com 12-

Sample Output

39

Copyright (C) 2008, http://www.dabeaz.com 12-

Commentary

• Obviously, there are many more details

• Must know method names/arguments of objects you're using

• May require Microsoft documentation

• Python IDEs (IDLE) may not help much

40

305

Page 310: Python Binder

Copyright (C) 2008, http://www.dabeaz.com 12-

Jython

• A pure Java implementation of Python

• Can be used to write Python scripts that interact with Java classes and libraries

• Official Location:

41

http://www.jython.org

Copyright (C) 2008, http://www.dabeaz.com 12-

Jython Example

• Jython runs like normal Python

42

% jython

Jython 2.2.1 on java1.5.0_13

Type "copyright", "credits" or "license" for more

information.

>>>

• And it works like normal Python

>>> print "Hello World"

Hello World

>>> for i in xrange(5):

... print i,

...

0 1 2 3 4

>>>

306

Page 311: Python Binder

Copyright (C) 2008, http://www.dabeaz.com 12-

Jython Example

• Many standard library modules work

43

>>> import urllib

>>> u = urllib.urlopen("http://www.python.org")

>>> page = u.read()

>>>

• And you get access to Java libraries

>>> import java.util

>>> d = java.util.Date()

>>> d

Sat Jul 26 13:31:49 CDT 2008

>>> dir(d)

['UTC', '__init__', 'after', 'before', 'class',

'clone', 'compareTo', 'date', 'day', 'equals',

'getClass', 'getDate', ... ]

>>>

Copyright (C) 2008, http://www.dabeaz.com 12-

Jython Example• A Java Class

44

public class Stock {

public String name;

public int shares;

public double price;

public Stock(String n, int s, double p) {

! name = n;

! shares = s;

! price = p;

}

public double cost() { return shares*price; }

};

• In Jython>>> import Stock

>>> s = Stock('GOOG',100,490.10)

>>> s.cost()

49010.0

>>>

307

Page 312: Python Binder

Copyright (C) 2008, http://www.dabeaz.com 12-

IronPython

• Python implemented in C#/.NET

• Can be used to write Python scripts that control components in .NET

• Official Location:

45

http://www.codeplex.com/IronPython

Copyright (C) 2008, http://www.dabeaz.com 12-

IronPython Example

• IronPython runs like normal Python

46

% ipy

IronPython 1.1.2 (1.1.2) on .NET 2.0.50727.42

Copyright (c) Microsoft Corporation. All rights

reserved.

>>>

• And it also works like normal Python>>> print "Hello World"

Hello World

>>> for i in xrange(5):

... print i,

...

0 1 2 3 4

>>>

308

Page 313: Python Binder

Copyright (C) 2008, http://www.dabeaz.com 12-

IronPython Example

• You get access to .NET libraries

47

>>> import System.Math

>>> dir(System.Math)

['Abs', 'Acos', 'Asin', 'Atan', 'Atan2', 'BigMul',

'Ceiling', 'Cos', 'Cosh', ...]

>>> System.Math.Cos(3)

-0.9899924966

>>>

• Can script classes written in C#

• Same idea as with Jython

Copyright (C) 2008, http://www.dabeaz.com 12-

Commentary

• Jython and IronPython have limitations

• Features lag behind those found in the main Python distribution (for example, Jython doesn't have generators)

• Small set of standard library modules work (many of the standard libraries depend on C code which isn't supported)

• No reason to use unless you are working with Java or .NET

48

309

Page 314: Python Binder

Copyright (C) 2008, http://www.dabeaz.com 12-

Some Final Words

• Integration is the real reason why Python is used by smart programmers and why they consider it to be a "secret weapon"

• With Python, you don't give up your code in C, C++, Java, etc.

• Instead, Python makes that code better

• Python makes difficult problems easier and impossible problems more feasible.

49

Copyright (C) 2008, http://www.dabeaz.com 12-

Intro Course Wrap-up

• First 12 sections of this class are just the start

• Basics of the Python language

• How to work with data

• How to organize Python software

• How Python interacts with the environment

• How it interacts with other programs

50

310

Page 315: Python Binder

Copyright (C) 2008, http://www.dabeaz.com 12-

Further Topics

• Network/Internet Programming (Day 4)

• Graphical User Interfaces

• Frameworks and Components

• Functional Programming

• Metaprogramming

• Extension programming

51

311

Page 316: Python Binder

Python Slide IndexSymbols and Numbers

!= operator, 1-40#, program comment, 1-32%, floating point modulo operator, 1-54%, integer modulo operator, 1-50%, string formatting operator, 2-25&, bit-wise and operator, 1-50&, set intersection operator, 2-22() operator, 5-29(), tuple, 2-6**, power operator, 1-54*, list replication, 1-69*, multiply operator, 1-50, 1-54*, sequence replication operator, 2-30*, string replication operator, 1-60+, add operator, 1-50, 1-54+, list concatenation, 1-67+, String concatenation, 1-59, 9-24-, set difference operator, 2-22-, subtract operator, 1-50, 1-54-i option, Python interpreter, 7-26-O option, Python interpreter, 7-22, 7-23. operator, 5-29... interpreter prompt, 1-15.NET framework, and IronPython, 12-45.pyc files, 4-11.pyo files, 4-11/ division operator, 1-54/, division operator, 1-50//, floor division operator, 1-50, 1-51< operator, 1-40<<, left shift operator, 1-50<= operator, 1-40== operator, 1-40> operator, 1-40>= operator, 1-40>>, right shift operator, 1-50>>> interpreter prompt, 1-15[:] string slicing,, 1-59[:], slicing operator, 2-31[], sequence indexing, 2-29[], string indexing, 1-59\ Line continuation character, 1-43^, bit-wise xor operator, 1-50

{}, dictionary, 2-13|, bit-wise or operator, 1-50|, set union operator, 2-22~, bit-wise negation operator, 1-50

A

ABC language, 1-5abs() function, 1-50, 1-54, 4-24Absolute value function, 1-50, 1-54__abs__() method, 5-26Accessor methods, 6-30Adding items to dictionary, 2-14__add__() method, 5-26Advanced String Formatting, 9-31and operator, 1-40__and__() method, 5-26Anonymous functions, 3-45append() method, of lists, 1-67Argument naming style, 3-20Arguments, passing mutable objects, 3-25Arguments, transforming in function, 3-22argv variable, sys module, 4-28array module, 10-24array module typecodes, 10-25array object I/O functions, 10-27array object operations, 10-26arrays vs. lists, 10-28arrays, creating with ctypes, 10-20assert statement, 7-20assert statement, stripping in -O mode, 7-22assertEqual() method, unittest module, 7-15AssertionError exception, 7-20Assertions, and unit testing, 7-15assertNotEqual() method, unittest module, 7-15assertRaises() method, unittest module, 7-15assert_() method, unittest module, 7-15Assignment, copying, 2-65Assignment, reference counting, 2-62, 2-64, 2-65Associative array, 2-13Attribute access functions, 5-31Attribute binding, 6-12Attribute lookup, 6-17, 6-20Attribute, definition of, 5-3Attributes, computed using properties, 6-33, 6-36, 6-37Attributes, modifying values, 6-13Attributes, private, 6-27Awk, and list compreheneions, 2-57

Page 317: Python Binder

B

Base class, 5-11base-10 decimals, 4-58__bases__ attribute of classes, 6-19binary arrays, 10-24binary data, 10-4binary data representation, 10-6Binary files, 10-9binary type objects, 10-16Binding of attributes in objects, 6-12Block comments, 1-32Boolean type, 1-46, 1-47Booleans, and integers, 1-47Boost Python, 12-31Bottom up programming style, 3-11break statement, 2-36Breakpoint, debugger, 7-29Breakpoint, setting in debugger, 7-31Built-in exceptions, 3-52__builtins__ module, 4-23bytearray type, 10-8bytes literals, 10-7

C

C extension, example with ctypes, 12-25C extensions, accessing with ctypes module, 12-23C Extensions, other tools, 12-31C3 Linearization Algorithm, 6-23Callback functions, 3-44Calling a function, 1-76Calling other methods in the same class, 5-9Capturing output of a subprocess, 11-17Case conversion, 1-61Catching exceptions, 1-79Catching multiple exceptions, 3-54chr() function, 4-25Class implementation chart, 6-11class statement, 5-4class statement, defining methods, 5-8Class,, 6-9Class, representation of, 6-9Class, __slots__ attribute of, 6-38__class__ attribute of instances, 6-10close() function, shelve module, 4-46Code blocks and indentation, 1-37Code formatting, 1-43

Code reuse, and generators, 8-30collect() function, gc module, 6-45collections module, 4-60Colon, and indentation, 1-37COM, 12-33, 12-34COM, example of controlling Microsoft Word, 12-38COM, launching Microsoft Word, 12-36Command line arguments, manual parsing, 4-29Command line options, 4-28Command line options, parsing with optparse, 4-30Command line, running Python, 1-24Comments, 1-32, 7-3Comments, vs. documentation strings, 7-3Community links, 1-2Compiling regular expressions, 9-13Complex type, 1-46complex() function, 4-25Computed attributes, 6-33, 6-36, 6-37Concatenation of strings, 1-59Concatenation, lists, 1-67Concatenation, of sequences, 2-30Conditionals, 1-39ConfigParser module, 4-48Conformance checking of functions, 3-41Container, 2-17Containers, dictionary, 2-19Containers, list, 2-18__contains__() method, 5-25continue statement, 2-36Contract programming, 7-21Conversion of numbers, 1-55Converting to strings, 1-64copy module, 2-68, 4-35copy() function, copy module, 4-35copy() function, shutil module, 4-42Copying and moving files, 4-42copytree() function, shutil module, 4-42cos() function, math module, 1-54cPickle module, 4-44Creating new objects, 5-4Creating programs, 1-19ctime() function, time module, 4-41ctypes direct I/O support, 10-22ctypes library, 10-17ctypes module, and C++, 12-30ctypes module, example of, 12-25ctypes module, limitations of, 12-29ctypes module, specifying type signatures, 12-27ctypes module, supported datatypes, 12-28

Page 318: Python Binder

ctypes modules, 12-23ctypes, creating a C datatype instance, 10-19ctypes, creating arrays, 10-20ctypes, list of datatypes, 10-18ctypes, structures, 10-21

D

Data interchange, 12-4Data structure, dictionary, 6-3Data structures, 2-5Database like queries on lists, 2-55Database, interface to, 4-61Database, SQL injection attack risk, 4-63Datatypes, in library modules, 4-57Date and time manipulation, 4-59datetime module, 4-59Debugger, 7-27Debugger, breakpoint, 7-29, 7-31Debugger, commands, 7-29Debugger, launching inside a program, 7-28Debugger, listing source code, 7-31Debugger, running at command line, 7-33Debugger, running functions, 7-32Debugger, single step execution, 7-32Debugger, stack trace, 7-30__debug__ variable, 7-23decimal module, 4-58Declarative programming, 2-58Deep copies, 2-68deepcopy() function, copy module, 2-68, 4-35def statement, 1-76, 3-8Defining a function, 3-8Defining new functions, 1-76Defining new objects, 5-4Definition order, 3-7del operator, lists, 1-70delattr() function, 5-31Deleting items from dictionary, 2-14__delitem__() method, 5-24__del__() method, 6-46, 6-47deque object, collections module, 4-60Derived class, 5-11Design by contract, 7-21Destruction of objects, 6-43Dictionary, 2-13Dictionary, and class representation, 6-9Dictionary, and local function variables, 6-4Dictionary, and module namespace, 6-5

Dictionary, and object representation, 6-6Dictionary, creating from list of tuples, 2-21Dictionary, persistent, 4-46Dictionary, testing for keys, 2-20Dictionary, updating and deleting, 2-14Dictionary, use as a container, 2-19Dictionary, use as data structure, 6-3Dictionary, using as function keyword arguments, 3-40Dictionary, when to use, 2-15__dict__ attribute, of instances, 6-7, 6-8__dict__ variable, of modules, 4-14dir() function, 1-81, 1-82direct I/O, with ctypes objects', 10-22Directory listing, 4-38disable() function, gc module, 6-45divmod() function, 1-50, 1-54, 4-24__div__() method, 5-26doctest module, 3-48, 7-7, 7-8doctest module, self-testing, 7-10Documentation, 1-17Documentation strings, 3-46, 7-3Documentation strings, and help() command, 7-6Documentation strings, and help() function, 3-47Documentation strings, and IDEs, 7-5Documentation strings, and testing, 3-48, 7-7Documentation strings, vs. comments, 7-3Double precision float, 1-52Double-quoted string, 1-56Downloads, 1-2dumps() function, pickle module, 4-45Duplicating containers, 2-66

E

easy_install command, 4-71Edit,compile,debug cycle, 1-13Eggs, Python package format, 4-71ElementTree module, xml.etree package, 12-7elif statement, 1-39else statement, 1-39Embedded nulls in strings, 1-58empty code blocks, 1-42enable() function, gc module, 6-45Enabling future features, 1-51Encapsulation, 6-24Encapsulation, and accessor methods, 6-30Encapsulation, and properties, 6-31Encapsulation, challenges of, 6-25Encapsulation, uniform access principle, 6-35

Page 319: Python Binder

end() method, of Match objects, 9-16endswith() method, strings, 1-62enumerate() function, 2-39, 2-40, 4-26environ variable, os module, 4-37Environment variables, 4-37Error reporting strategy in exceptions, 3-57Escape codes, strings, 1-57Event loop, and GUI programming, 4-54except statement, 1-79, 3-50Exception base class, 5-28Exception, defining new, 5-28Exception, printing tracebacks, 7-25Exception, uncaught, 7-24Exceptions, 1-78, 3-50Exceptions, catching, 1-79Exceptions, catching any, 3-55Exceptions, catching multiple, 3-54Exceptions, caution on use, 3-56Exceptions, finally statement, 3-58Exceptions, how to report errors, 3-57Exceptions, ignoring, 3-55Exceptions, list of built-in, 3-52Exceptions, passed value, 3-53Exceptions, propagation of, 3-51Executing system commands, 11-5, 11-8Execution model, 1-31Execution of modules, 4-5exists() function, os.path module, 4-40exit() function, sys module, 3-59Exploding heads, and exceptions, 3-56Exponential notation, 1-52Extended slicing of sequences, 2-32

F

False Value, 1-47File globbing, 4-38File system, copying and moving files, 4-42File system, getting a directory listing, 4-38File tests, 4-40File, binary, 10-9File, obtaining metadata, 4-41Files, and for statement, 1-73Files, opening, 1-72Files, reading line by line, 1-73__file__ attribute, of modules, 4-7Filtering sequence data, 2-53finally statement, 3-58find() method of strings, 9-6

find() method, strings, 1-62Finding a substring, 9-6finditer() method, of regular expressions, 9-19First class objects, 2-69Float type, 1-46, 1-52float() function, 1-55, 4-25Floating point numbers, 1-52Floating point, accuracy, 1-53Floor division operator, 1-51__floordiv__() method, 5-26For loop, and tuples, 2-41For loop, keeping a loop counter, 2-39for statement, 2-34for statement, and files, 1-73for statement, and generators, 8-17for statement, and iteration, 8-3for statement, internal operation of, 8-4for statement, iteration variable, 2-35Format codes, string formatting, 2-26Format codes, struct module, 10-12format() method of strings, 9-31Formatted output, 2-24format_exc() function, traceback module, 7-25from module import *, 4-18from statement, 4-17from __future__ import, 1-51Function, and generators, 8-17Function, running in debugger, 7-32Functions, 3-8, 3-9Functions, accepting any combination of arguments, 3-39Functions, anonymous with lambda, 3-45Functions, argument passing, 3-15Functions, benefits of using, 3-8Functions, Bottom-up style, 3-11Functions, calling with positional arguments, 3-17Functions, checking of return statement, 3-43Functions, conformance checking, 3-41Functions, default arguments, 3-16Functions, defining, 1-76Functions, definition order, 3-10Functions, design of, 3-14Functions, design of and global variables, 3-32Functions, design of and side effects, 3-26Functions, design of and transformation of inputs, 3-22Functions, design of argument names, 3-20Functions, design of input arguments, 3-21Functions, documentation strings, 3-46Functions, global variables, 3-29, 3-30

Page 320: Python Binder

Functions, keyword arguments, 3-18Functions, local variables, 3-28Functions, mixing positional and keyword arguments, 3-19Functions, multiple return values, 3-24Functions, overloading (lack of), 3-12Functions, side effects, 3-25Functions, tuple and dictionary expansion, 3-40Functions, variable number of arguments, 3-35, 3-36, 3-37, 3-38

G

Garbage collection, and cycles, 6-44, 6-45Garbage collection, and __del__() method, 6-46, 6-47Garbage collection, reference counting, 6-43gc module, 6-45Generating text, 9-23Generator, 8-17, 8-18Generator expression, 8-24, 8-25Generator expression, efficiency of, 8-27Generator tricks presentation, 8-33Generator vs. iterator, 8-22Generator, and code reuse, 8-30Generator, and StopIteration exception, 8-20Generator, example of following a file, 8-21Generator, use of, 8-29getatime() function,, 4-41getattr() function, 5-31__getitem__() method, 5-24getmtime() function, os.path module, 4-41getoutput() function, commands module, 11-5getsize() function, os.path module, 4-41Getting a list of symbols, 1-81Getting started, 1-8glob module, 4-38global statement, 3-31Global variables, 3-27, 4-6Global variables, accessing in functions, 3-29Global variables, modifying inside a function, 3-30group() method, of Match objects, 9-16Groups, extracting text from in regular expressions, 9-18Groups, in regular expressions, 9-17GUI programming, event loop model, 4-54GUI programming, with Tkinter, 4-51Guido van Rossum, 1-3

H

hasattr() function, 5-31Hash table, 2-13Haskell, 1-5Haskell, list comprehension, 2-56has_key() method, of dictionaries, 2-20Heavy wizardry, 6-42help() command, 1-17help() command, and documentation strings, 7-6help() function, and documentation strings, 3-47hex() function, 4-25HTTP, simple web server, 12-18

I

I/O redirection, subprocess module, 11-21Identifiers, 1-33IDLE, 1-10, 1-11, 1-16IDLE, creating new program, 1-20, 1-21IDLE, on Mac or Unix, 1-12IDLE, running programs, 1-23IDLE, saving programs, 1-22IEEE 754, 1-52if statement, 1-39Ignoring an exception, 3-55Immutable objects, 1-63import statement, 1-77, 4-3import statement, creation of .pyc and .pyo files, 4-11import statement, from modifier, 4-17import statement, importing all symbols, 4-18import statement, proper use of namespaces, 4-20import statement, repeated, 4-8import statement, search path, 4-9, 4-10import statement, supported file types, 4-11import, as modifier, 4-15import, use in extensible programs, 4-16importing different versions of a library, 4-16in operator, 5-25in operator, dictionary, 2-20in operator, lists, 1-69in operator, strings, 1-60Indentation, 1-36, 1-37Indentation style, 1-38index() method, strings, 1-62Indexing of lists, 1-68Infinite data streams, 8-31Inheritance, 5-11, 5-14

Page 321: Python Binder

Inheritance example, 5-13Inheritance, and object base, 5-16Inheritance, and polymorphism, 5-19Inheritance, and __init__() method, 5-17Inheritance, implementation of, 6-19, 6-21Inheritance, multiple, 5-20, 6-23Inheritance, multiple inheritance, 6-22Inheritance, organization of objects, 5-15Inheritance, redefining methods, 5-18Inheritance, uses of, 5-12INI files, parsing of, 4-48Initialization of objects, 5-6__init__() method, 6-41__init__() method in classes, 5-6__init__() method, and inheritance, 5-17insert() method, of lists, 1-67Inspecting modules, 1-81Inspecting objects, 1-82Instance data, 5-7Instances, and __class__ attribute, 6-10Instances, creating new, 5-5Instances, modifying after creation, 6-15Instances, representation of, 6-7, 6-8int() function, 1-55, 4-25Integer division, 1-51Integer type, 1-46Integer type, operations, 1-50Integer type, precision of, 1-48Integer type, promotion to long, 1-49Interactive mode, 1-13, 1-14, 1-15Interactive subprocesses, 11-24Interpreter execution, 11-3Interpreter prompts, 1-15__invert__() method, 5-26IronPython, 12-45is operator, 2-62, 2-64isalpha() method, strings, 1-62isdigit() method, strings, 1-62isdir() function, os.path module, 4-39, 4-40isfile() function, os.path module, 4-39, 4-40isinstance() function, 2-71islower() method, strings, 1-62Item access methods, 5-24Iterating over a sequence, 2-34Iteration, 8-3Iteration protocol, 8-4Iteration variable, for loop, 2-35Iteration, design of iterator classes, 8-14Iteration, user defined, 8-8, 8-9

Iterator vs. generator, 8-22itertools module, 8-32__iter__() method, 8-4

J

Java, and Jython, 12-41join() function, os.path module, 4-39join() method, of strings, 9-25join() method, strings, 1-62Jython, 12-41

K

key argument of sort() method, 2-48KeyboardInterrupt exception, 3-59Keys, dictionary, 2-13Keyword arguments, 3-18Keywords, 1-34, 1-35kill()function, os module, 11-15

L

lambda statement, 3-45Lazy evaluation, 8-31len() function, 2-29, 5-24len() function, lists, 1-69len() function, strings, 1-60__len__() method, 5-24Library modules, 1-77Life cycle of objects, 6-40Line continuation, 1-43Lisp, 1-5List, 2-29List comprehension, 2-52, 2-53, 2-54List comprehension uses, 2-55List comprehensions and awk, 2-57List concatenation, 1-67List processing, 2-51List replication, 1-69List type, 1-67List vs. Tuple, 2-12list() function, 2-66, 2-67List, extra memory overhead, 2-12List, Looping over items, 2-34List, sorting, 2-46List, use as a container, 2-18listdir() function, os module, 4-38

Page 322: Python Binder

lists vs. arrays, 10-28Lists, changing elements, 1-68Lists, indexing, 1-68Lists, removing items, 1-70Lists, searching, 1-69loads() function, pickle module, 4-45Local variables, 3-27Local variables in functions, 3-28log() function, math module, 1-54Long type, 1-46, 1-49Long type, use with integers, 1-49long() function, 1-55, 4-25Looping over integers, 2-37Looping over items in a sequence, 2-34Looping over multiple sequences, 2-42lower() method, strings, 1-61, 1-62__lshift__() method, 5-26lstrip() method, of strings, 9-5

M

Main program, 4-7main() function, unittest module, 7-16__main__, 4-7__main__ module, 7-10Match objects, re module, 9-16match() method, of regular expression patterns, 9-14Matching a regular expression, 9-14math module, 1-54, 1-77, 4-33Math operators, 5-26Math operators, floating point, 1-54Math operators, integer, 1-50max() function, 2-33, 4-24Memory efficiency, and generators, 8-31Memory management, reference counting, 2-62, 2-64Memory use of tuple and list, 2-12Method invocation, 5-29Method, definition of, 5-3Methods, calling other methods in the same class, 5-9Methods, in classes, 5-8Methods, private, 6-28min() function, 2-33, 4-24Modules, 4-3modules variable, sys module, 4-8Modules, as object, 4-13Modules, dictionary, 4-14Modules, execution of, 4-5Modules, loading of, 4-8Modules, namespaces, 4-4

Modules, search path, 4-9, 4-10Modules, self-testing with doctest, 7-10__mod__() method, 5-26move() function, shutil module, 4-42__mro__ attribute, of classes, 6-23Multiple inheritance, 5-20, 6-22, 6-23__mul__() method, 5-26

N

Namespace, 4-14Namespaces, 4-4, 4-20__name__ attribute, of modules, 4-7Naming conventions, Python's reliance upon, 6-26Negative indices, lists, 1-68__neg__() method, 5-26Network programming introduction, 12-15Network programming, example of sockets, 12-16__new__() method, 6-41, 6-42next() method, and iteration, 8-4next() method, of generators, 8-19no-op statement, 1-42None type, 2-4None type, returned by functions, 3-23not operator, 1-40Null value, 2-4Numeric conversion, 1-55Numeric datatypes, 1-46numpy, 10-29

O

object base class, 5-16Object oriented programming, 5-3Object oriented programming, and encapsulation, 6-24Objects, attribute binding, 6-17, 6-20Objects, attributes of, 5-7Objects, creating containers, 5-24Objects, creating new instances, 5-5Objects, creating private attributes, 6-27Objects, creation steps, 6-41Objects, defining new, 5-4Objects, first class behavior, 2-69Objects, inheritance, 5-11Objects, invoking methods, 5-5Objects, life cycle, 6-40Objects, making a deep copy of, 2-68Objects, memory management of, 6-43

Page 323: Python Binder

Objects, method invocation, 5-29Objects, modifying attributes of instances, 6-13Objects, modifying instances, 6-15Objects, multiple inheritance, 6-22Objects, reading attributes, 6-16Objects, representation of, 6-11Objects, representation of instances, 6-7, 6-8Objects, representation with dictionary, 6-6Objects, saving with pickle, 4-44Objects, serializing into a string, 4-45Objects, single inheritance, 6-21Objects, special methods, 5-22Objects, type checking, 2-71Objects, type of, 2-63oct() function, 4-25Old-style classes, 5-16Online help, 1-17open() function, 1-72open() function, shelve module, 4-46, 4-47Optimized mode, 7-23Optimized mode (-O), 7-22Optional features, defining function with, 3-18Optional function arguments, 3-16optparse module, 4-30or operator, 1-40ord() function, 4-25__or__() method, 5-26os module, 4-36, 11-6os.path module, 4-39, 4-41Output, print statement, 1-41Overloading, lack of with functions, 3-12

P

pack() function, struct module, 10-11Packing binary structures, 10-10, 10-11Packing values into a tuple, 2-9Parallel iteration, 2-42Parsing, 9-3pass statement, 1-42path variable, sys module, 4-9, 4-10Pattern syntax, regular expressions, 9-11pdb module, 7-27pdb module, commands, 7-29Performance of string operations, 9-8Performance statistics, profile module, 7-34Perl, difference in string handling, 1-75Perl, regular expressions and Python, 9-21Perl, string interpolation and Python, 9-28

Persistent dictionary, 4-46pexpect library, 11-24pickle module, 4-44pickle module, and strings, 4-45Pipelines, and generators, 8-29Pipes, subprocess module, 11-18poll() method, Popen objects, 11-14Polymorphism, and inheritance, 5-19Popen() function, subprocess module, 11-8popen2 module, 11-6Positional function arguments, 3-17Post-assertion, 7-21pow() function, 1-50, 1-54, 4-24Powers of numbers, 1-50__pow__() method, 5-26Pre-assertion, 7-21Primitive datatypes, 2-3print statement, 1-41, 4-32, 5-23print statement, and files, 1-72print statement, and str(), 1-64print statement, trailing comma, 1-41print, formatted output, 2-25print_exc() function, traceback module, 7-25Private attributes, 6-27Private attributes, performance of name mangling, 6-29Private methods, 6-28profile module, 7-34Profiling, 7-34Program exit, 3-59Program structure, 3-6Program structure, definition order, 3-7Propagation of exceptions, 3-51Properties, and encapsulation, 6-31py files, 1-19Python Documentation, 1-17Python eggs packages, 4-71Python influences, 1-5Python interpreter, 1-13Python interpreter, keeping alive after execution, 7-26Python interpreter, optimized mode, 7-22, 7-23Python package index, 4-67Python, extending with ctypes, 12-23Python, reason created, 1-4Python, running on command line, 1-24Python, source files, 1-19Python, starting on Mac, 1-12Python, starting on Unix, 1-12Python, starting on Windows, 1-11Python, starting the interpreter, 1-9

Page 324: Python Binder

Python, statement execution, 1-31Python, uses of, 1-6, 1-7Python, year created, 1-3python.org website, 1-2Pythonwin extension, 12-33, 12-34

R

raise Statement, 1-80raise statement, 3-50Raising exceptions, 1-80random module, 4-34Random numbers, 4-34range() function, 2-38, 4-26range() vs. xrange(), 2-38Raw strings, 1-57Raw strings, and regular expressions, 9-12re module, 9-10re module, compile() function, 9-13re module, find all occurrences of a pattern, 9-19re module, pattern syntax, 9-11Read-eval loop, 1-14Reading attributes on objects, 6-16readline() method, files, 1-72Redefining methods with inheritance, 5-18Redefining output file, 4-32Redirecting print to a file, 1-72Reference counting, 2-62, 2-64, 2-67, 6-43Reference counting, containers, 2-66register_function() method, of SimpleXMLRPCServer, 12-21register_instance() method, of SimpleXMLRPCServer, 12-21Regular expression syntax, 9-11Regular expressions (see re), 9-10Regular expressions, and Perl, 9-21Regular expressions, compiling patterns, 9-13Regular expressions, extracting text from numbered groups, 9-18Regular expressions, Match objects, 9-16Regular expressions, matching, 9-14Regular expressions, numbered groups, 9-17Regular expressions, obtaining matching text, 9-16Regular expressions, pattern replacement, 9-20Regular expressions, searching, 9-15Relational database, interface to, 4-61Relational operators, 1-40remove() method, lists, 1-70Repeated function definitions, 3-12

Repeated imports, 4-8replace() method of strings, 9-7replace() method, strings, 1-61, 1-62Replacing a substring, 9-7Replacing text, 1-61Replication of sequences, 2-30repr() function, 4-25, 5-23Representation of strings, 1-58representing binary data, 10-6__repr__() method, 5-23Reserved names, 1-34, 1-35return statement, 3-23return statement, multiple values, 3-24returncode attribute, Popen objects, 11-13reversed() function, 4-26rfind() method, strings, 1-62rindex() method, strings, 1-62rmtree() function, shutil module, 4-42round() function, 4-24Rounding errors, floating point, 1-53__rshift__() method, 5-26rsplit() method of strings, 9-4rstrip() method, of strings, 9-5Ruby, string interpolation and Python, 9-28run() function, pdb module, 7-33runcall() function, pdb module, 7-33runeval() function, pdb module, 7-33Running Python, 1-9Runtime error vs. compile-time error, 3-42

S

safe_substitute() method, of Template objects, 9-30Sample Python program, 1-26Scope of iteration variable in loops, 2-35Scripting, 3-3Scripting language, 1-3Scripting, defined, 3-4Scripting, problem with, 3-5search() method, of regular expression patterns, 9-15Searching for a regular expression, 9-15self parameter of methods, 5-7, 5-8Sequence, 2-29Sequence sorting, 2-49Sequence, concatenation, 2-30Sequence, extended slicing, 2-32Sequence, indexing, 2-29Sequence, length, 2-29Sequence, looping over items, 2-34

Page 325: Python Binder

Sequence, replication, 2-30Sequence, slicing, 2-31Sequence, string, 1-58Serializing objects into strings, 4-45Set theory, list comprehension, 2-56Set type, 2-22set() function, 2-22setattr() function, 5-31__setitem__() method, 5-24setUp() method, unittest module, 7-17setup.py file, third party modules, 4-70setuptools module, 4-71set_trace() function, pdb module, 7-28Shallow copy, 2-67Shell operations, 4-42shelve module, 4-46shutil module, 4-42Side effects, 3-25SimpleHTTPServer module, 12-18SimpleXMLRPCServer module, 12-20sin() function, math module, 1-54Single-quoted string, 1-56Single-step execution, 7-32Slicing operator, 2-31__slots__ attribute of classes, 6-38socket module, example of, 12-16SocketServer module, example of, 12-17sort() method of lists, 3-44sort() method, of lists, 2-46, 2-48sorted() function, 2-49, 4-26Sorting lists, 2-46Sorting with a key function, 2-48Sorting, of any sequence, 2-49Source files, 1-19Source files, and modules, 4-3Source listing, in debugger, 7-31span) method, of Match objects, 9-16Special methods, 1-82, 5-22split() method, of strings, 9-4split() method, strings, 1-62, 1-66Splitting strings, 9-4Splitting text, 1-66SQL queries , how to form, 4-63SQL queries, injection attack risk, 4-63SQL queries, value substitutions in, 4-65sqlite3 module, 4-62sqrt() function, math module, 1-54Stack trace, in debugger, 7-30Standard I/O streams, 4-32

Standard library, 4-22start() method, of Match objects, 9-16startswith() method, strings, 1-62Statements, 1-31, 3-6Status codes, subprocesses, 11-12stderr variable, sys module, 4-32stdin variable, sys module, 4-32stdout variable, sys module, 4-32StopIteration exception, 8-5StopIteration exception, and generators, 8-20str() function, 1-64, 4-25, 5-23String concatenation, performance of, 9-24String format codes, 2-26String formatting, 2-25String formatting, with dictionary, 2-27, 9-29String interpolation, 2-27, 9-28String joining, 9-25String joining vs. concatenation, 9-26String manipulation, performance of, 9-8String replacement, 9-7String searching, 9-6String splitting, 9-4String stripping, 9-5String templates, 9-30String type, 2-29StringIO objects, 9-27Strings, concatenation, 1-59Strings, conversion to, 1-64Strings, conversion to numbers, 1-55, 1-75Strings, escape codes, 1-56, 1-57Strings, immutability, 1-63Strings, indexing, 1-59Strings, length of, 1-60Strings, literals, 1-56Strings, methods, 1-61, 1-62Strings, raw strings, 1-57Strings, redirecting I/O to, 9-27Strings, replication, 1-60Strings, representation, 1-58Strings, searching for substring, 1-60Strings, slicing, 1-59Strings, splitting, 1-66Strings, triple-quoted, 1-56Strings, use of raw strings, 9-12strip() method, of strings, 9-5strip() method, strings, 1-61, 1-62Stripping characters, 1-61Stripping characters from strings, 9-5struct module, 10-11

Page 326: Python Binder

struct module, format modifiers, 10-13struct module, packing codes, 10-12structures, creating with ctypes, 10-21sub() method of regular expression patterns, 9-20Subclass, 5-11Subprocess, 11-4subprocess module, 11-6, 11-7subprocess module, capturing output, 11-17subprocess module, capturing stderr, 11-20subprocess module, changing the working directory, 11-11subprocess module, collecting return codes, 11-13subprocess module, I/O redirection, 11-21subprocess module, interactive subprocesses, 11-24subprocess module, pipes, 11-18subprocess module, polling, 11-14subprocess module, sending/receiving data, 11-18subprocess module, setting environment variables, 11-10Subprocess status code, 11-12subprocess, killing with a signal, 11-15substitute() method, of Template objects, 9-30__sub__() method, 5-26sum() function, 2-33, 4-24Superclass, 5-11Swig, 12-31sys module, 4-27System commands, executing as subprocess, 11-8system() function, os module, 4-36, 11-5SystemExit exception, 3-59

T

tan() function, math module, 1-54TCP, server example, 12-16tearDown() method, unittest module, 7-17Template strings, 9-30TestCase class, of unittest module, 7-13Testing files, 4-40testmod() function, doctest module, 7-8Text parsing, 9-3Text replacement, 1-61Text strings, 1-58Third party modules, 4-67, 4-68Third party modules, and C/C++ code, 4-72Third party modules, eggs format, 4-71Third party modules, native installer, 4-69Third party modules, setup.py file, 4-70time module, 4-41

Tkinter module, 4-51Tkinter module, sample widgets, 4-53traceback module, 7-25Traceback, uncaught exception, 7-24Tracebacks, 1-80Triple-quoted string, 1-56True value, 1-47Truncation, integer division, 1-51Truth values, 1-40try statement, 1-79, 3-50Tuple, 2-6, 2-29Tuple vs. List, 2-12Tuple, immutability of, 2-8Tuple, packing values, 2-9Tuple, unpacking in for-loop, 2-41Tuple, unpacking values, 2-10Tuple, use of, 2-7Tuple, using as function arguments, 3-40Type checking, 2-71Type conversion, 1-75Type of objects, 2-63type() function, 2-63, 2-71typecodes, array module, 10-25Types, 1-33Types, numeric, 1-46Types, primitive, 2-3

U

Unbound method, 5-30Uniform access principle, 6-35Unit testing, 7-12unittest module, 7-12, 7-13unittest module, example of, 7-14unittest module, running tests, 7-16unittest module, setup and teardown, 7-17unpack() function, struct module, 10-11Unpacking binary structures, 10-10, 10-11Unpacking values from tuple, 2-10upper() method, strings, 1-61, 1-62urllib module, 1-77urlopen() function, urllib module, 1-77User-defined exceptions, 5-28

V

Value of exceptions, 3-53Variable assignment, 2-61, 2-65

Page 327: Python Binder

Variable assignment, assignment of globals in function, 3-31Variable assignment, global vs. local, 3-27Variable number of function arguments, 3-35, 3-36, 3-37, 3-38Variables, and modules, 4-6Variables, examining in debugger, 7-30Variables, names of, 1-33Variables, reference counting, 2-62, 2-64Variables, scope in functions, 3-28, 3-29Variables, type of, 1-33vars() function, 9-29

W

wait() method, Popen objects, 11-13walk() function, os module, 4-36Walking a directory of files, 4-36while statement, 1-36Widgets, and Tkinter, 4-53Windows, binary files, 10-9Windows, killing a subprocess, 11-15Windows, starting Python, 1-11write() method, files, 1-72

X

XML overview, 12-5XML parsing, extracting document elements, 12-9XML parsing, extracting element text, 12-11XML parsing, getting element attributes, 12-12XML, ElementTree module, 12-7XML, parsing with ElementTree module, 12-8XML-RPC, 12-19XML-RPC, server example, 12-20xmlrpclib module, 12-20__xor__() method, 5-26xrange() function, 2-37, 4-26xrange() vs. range(), 2-38

Z

zip() function, 2-42, 2-43, 2-44, 4-26