Introduction to Python
David M. Beazley
http://www.dabeaz.com
Edition: Thu May 28 19:33:40 2009
Copyright (C) 2009 David M Beazley
All Rights Reserved
Introduction to Python : Table of Contents

Introduction to Python
0. Course Setup
1. Introduction
2. Working with Data
3. Program Organization and Functions

Objects and Software Development
4. Modules and Libraries
5. Classes
6. Inside the Python Object Model
7. Documentation, Testing, and Debugging

Python Systems Programming
8. Iterators and Generators
9. Working with Text
10. Binary Data Handling and File I/O
11. Working with Processes
12. Python Integration Primer
Copyright (C) 2008, http://www.dabeaz.com 0-
Course Setup
Section 0
Required Files
• Where to get Python (if not installed)
    http://www.python.org
• Exercises for this class
    http://www.dabeaz.com/python/pythonclass.zip
Setting up Your Environment
• Extract pythonclass.zip on your machine
• This folder is where you will do your work
Class Exercises
• Exercise descriptions are found in PythonClass/Exercises/index.html
• All exercises have solution code
    Look for the link at the bottom!
Class Exercises
• Please follow the filename suggestions
    (Slide callout: "This is the filename you should be using")
• Using different names will break later exercises
General Tips
• We will be writing a lot of programs that access data files in PythonClass/Data
• Make sure you save your programs in the "PythonClass/" directory so that the names of these files are easy to access.
• Some exercises are more difficult than others. Please copy code from the solution and study it if necessary.
Using IDLE
• For this class, we will be using IDLE.
• IDLE is an integrated development environment that comes prepackaged with Python
• It's not the most advanced tool, but it works
• Follow the instructions on the next two slides to start it in the correct environment
Running IDLE (Windows)
• Find RunIDLE in the PythonClass/ folder
• Double-click to start the IDLE environment
Running IDLE (Mac/Unix)
• Go into the PythonClass/ directory
• Type the following command in a command shell
    % python RunIDLE.pyw
• Note: Typing 'idle' at the shell might also work.
Introduction to Python
Section 1
Where to Get Python?
http://www.python.org
• Downloads
• Documentation and tutorial
• Community Links
• Third party packages
• News and more
What is Python?
• An interpreted high-level programming language.
• Similar to Perl, Ruby, Tcl, and other so-called "scripting languages."
• Created by Guido van Rossum around 1990.
• Named in honor of Monty Python
Why was Python Created?
"My original motivation for creating Python was the perceived need for a higher level language in the Amoeba [Operating Systems] project. I realized that the development of system administration utilities in C was taking too long. Moreover, doing these things in the Bourne shell wouldn't work for a variety of reasons. ... So, there was a need for a language that would bridge the gap between C and the shell."
- Guido van Rossum
Python Influences
• C (syntax, operators, etc.)
• ABC (syntax, core data types, simplicity)
• Unix ("Do one thing well")
• Shell programming (but not the syntax)
• Lisp, Haskell, and Smalltalk (later features)
Some Uses of Python
• Text processing/data processing
• Application scripting
• Systems administration/programming
• Internet programming
• Graphical user interfaces
• Testing
• Writing quick "throw-away" code
Python Non-Uses
• Device drivers and low-level systems
• Computer graphics, visualization, and games
• Numerical algorithms/scientific computing
Comment: Python is still used in these application domains, but only as a high-level control language. Important computations are actually carried out in C, C++, Fortran, etc. For example, you would not implement matrix multiplication in Python.
Getting Started
• In this section, we will cover the absolute basics of Python programming
• How to start Python
• Python's interactive mode
• Creating and running simple programs
• Basic calculations and file I/O.
Running Python
• Python programs run inside an interpreter
• The interpreter is a simple "console-based" application that normally starts from a command shell (e.g., the Unix shell)
    shell % python
    Python 2.5.1 (r251:54869, Apr 18 2007, 22:08:04)
    [GCC 4.0.1 (Apple Computer, Inc. build 5367)] on darwin
    Type "help", "copyright", "credits" or "license"
    >>>
• Expert programmers usually have no problem using the interpreter in this way, but it's not so user-friendly for beginners
IDLE
• Python includes a simple integrated development environment called IDLE (which is another Monty Python reference)
• It's not the most sophisticated environment but it's already installed and it works
• Most working Python programmers tend to use something else, but it is fine for this class.
IDLE on Windows
• Look for it in the "Start" menu
IDLE on other Systems
• Launch a terminal or command shell
• Type the following command to launch IDLE
    shell % python -m idlelib.idle
The Python Interpreter
• When you start Python, you get an "interactive" mode where you can experiment
• If you start typing statements, they will run immediately
• No edit/compile/run/debug cycle
• In fact, there is no "compiler"
Interactive Mode
• The interpreter runs a "read-eval" loop
    >>> print "hello world"
    hello world
    >>> 37*42
    1554
    >>> for i in range(5):
    ...     print i
    ...
    0
    1
    2
    3
    4
    >>>
• Executes simple statements typed in directly
• Very useful for debugging, exploration
Interactive Mode
• Some notes on using the interactive shell
• >>> is the interpreter prompt for starting a new statement
• ... is the interpreter prompt for continuing a statement (it may be blank in some tools)
• Enter a blank line to finish typing and to run
Interactive Mode in IDLE
• Interactive shell plus added help features (syntax highlighting, usage information)
Getting Help
• help(name) command
    >>> help(range)
    Help on built-in function range in module __builtin__:

    range(...)
        range([start,] stop[, step]) -> list of integers

        Return a list containing an arithmetic progression of integers.
        range(i, j) returns [i, i+1, i+2, ..., j-1]; start (!) defaults to 0.
        ...
    >>>
• Type help() with no name for interactive help
• Documentation at http://docs.python.org
Exercise 1.1
Time: 10 minutes
Creating Programs
• Programs are put in .py files
    # helloworld.py
    print "hello world"
• Source files are simple text files
• Create with your favorite editor (e.g., emacs)
• Note: Your editor may provide a special Python editing mode
• Can also edit programs with IDLE or another Python IDE (too many to list)
Creating Programs
• Creating a new program in IDLE
• Editing a new program in IDLE
• Saving a new program in IDLE
Running Programs (IDLE)
• Select "Run Module" (F5)
• Will see output in IDLE shell window
Running Programs
• In production environments, Python may be run from the command line or a script
• Command line (Unix)
    shell % python helloworld.py
    hello world
    shell %
• Command shell (Windows)
    C:\SomeFolder>helloworld.py
    hello world
    C:\SomeFolder>c:\python25\python helloworld.py
    hello world
A Sample Program
• The Sears Tower Problem
You are given a standard sheet of paper which you fold in half. You then fold that in half and keep folding. How many folds do you have to make for the thickness of the folded paper to be taller than the Sears Tower? A sheet of paper is 0.1mm thick and the Sears Tower is 442 meters tall.
A Sample Program
    # sears.py
    # How many times do you have to fold a piece of paper
    # for it to be taller than the Sears Tower?

    height = 442                # Meters
    thickness = 0.1*0.001       # Meters (0.1 millimeter)

    numfolds = 0
    while thickness <= height:
        thickness = thickness * 2
        numfolds = numfolds + 1
        print numfolds, thickness

    print numfolds, "folds required"
    print "final thickness is", thickness, "meters"
A Sample Program
• Output
    % python sears.py
    1 0.0002
    2 0.0004
    3 0.0008
    4 0.0016
    5 0.0032
    ...
    20 104.8576
    21 209.7152
    22 419.4304
    23 838.8608
    23 folds required
    final thickness is 838.8608 meters
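As a cross-check on the loop in sears.py, the fold count can also be worked out in closed form. After n folds the thickness is thickness * 2**n, so the first n that exceeds the height is floor(log2(height/thickness)) + 1. The sketch below is not part of the course files. It reuses the slide's variable names and adds one hypothetical name, predicted. It uses print(...) with a single argument so it should behave the same on the Python 2.5 interpreter shown in these slides and on later versions.

```python
import math

height = 442.0              # Meters
thickness = 0.1 * 0.001     # Meters (0.1 millimeter)

# Closed form: smallest n with thickness * 2**n > height
# ("predicted" is a name introduced here for illustration)
predicted = int(math.floor(math.log(height / thickness, 2))) + 1

# Same loop as sears.py, without the per-fold printing
numfolds = 0
while thickness <= height:
    thickness = thickness * 2
    numfolds = numfolds + 1

print(numfolds == predicted)
```

Both approaches give 23 folds, matching the program output above.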
Exercise 1.2
Time: 10 minutes
An IDLE Caution
• The first rule of using IDLE is that you will always open files using the "File > Open" or "File > Recent Files" menu options
• Do not right click on .py files to open
An IDLE Caution
• The second rule of using IDLE is that you will always open files using the "File > Open" or "File > Recent Files" menu options
• Ignore this rule at your own peril (IDLE will operate strangely and you'll be frustrated)
Python 101 : Statements
• A Python program is a sequence of statements
• Each statement is terminated by a newline
• Statements are executed one after the other until you reach the end of the file.
• When there are no more statements, the program stops
Python 101 : Comments
• Comments are denoted by #
    # This is a comment
    height = 442       # Meters
• Extend to the end of the line
• There are no block comments in Python (e.g., /* ... */).
Python 101: Variables
• A variable is just a name for some value
• Variable names follow the same rules as C: [A-Za-z_][A-Za-z0-9_]*
• You do not declare types (int, float, etc.)
    height = 442             # An integer
    height = 442.0           # Floating point
    height = "Really tall"   # A string
• Differs from C++/Java where variables have a fixed type that must be declared.
Python 101: Keywords
• Python has a basic set of language keywords
    and       assert    break     class     continue  def
    del       elif      else      except    exec      finally
    for       from      global    if        import    in
    is        lambda    not       or        pass      print
    raise     return    try       while     with      yield
• Variables can not have one of these names
• These are mostly C-like and have the same meaning in most cases (later)
Python 101: Case Sensitivity
• Python is case sensitive
• These are all different variables:
    name = "Jake"
    Name = "Elwood"
    NAME = "Guido"
• Language statements are always lower-case
    print "Hello World"     # OK
    PRINT "Hello World"     # ERROR

    while x < 0:            # OK
    WHILE x < 0:            # ERROR
• So, no shouting please...
Python 101: Looping
• The while statement executes a loop
• Executes the indented statements underneath while the condition is true
    while thickness <= height:
        thickness = thickness * 2
        numfolds = numfolds + 1
        print numfolds, thickness
Python 101 : Indentation
• Indentation is used to denote blocks of code
• Indentation must be consistent
    (ok)
    while thickness <= height:
        thickness = thickness * 2
        numfolds = numfolds + 1
        print numfolds, thickness

    (error: inconsistent indentation)
    while thickness <= height:
    thickness = thickness * 2
        numfolds = numfolds + 1
      print numfolds, thickness
• Colon (:) always indicates start of new block
    while thickness <= height:
Python 101 : Indentation
• There is a preferred indentation style
• Always use spaces
• Use 4 spaces per level
• Avoid tabs
• Always use a Python-aware editor
Python 101 : Conditionals
• If-else
    if a < b:
        print "Computer says no"
    else:
        print "Computer says yes"
• If-elif-else
    if a == '+':
        op = PLUS
    elif a == '-':
        op = MINUS
    elif a == '*':
        op = TIMES
    else:
        op = UNKNOWN
Python 101 : Relations
• Relational operators
    <   >   <=   >=   ==   !=
• Boolean expressions (and, or, not)
    if b >= a and b <= c:
        print "b is between a and c"

    if not (b < a or b > c):
        print "b is still between a and c"
• Non-zero numbers, non-empty objects also evaluate as True.
    x = 42
    if x:
        print "x is nonzero"
Python 101 : Printing
• The print statement
    print x
    print x,y,z
    print "Your name is", name
    print x,                 # Omits newline
• Produces a single line of text
• Items are separated by spaces
• Always prints a newline unless a trailing comma is added after last item
Python 101 : pass statement
• Sometimes you will need to specify an empty block of code
    if name in namelist:
        # Not implemented yet (or nothing)
        pass
    else:
        statements
• pass is a "no-op" statement
• It does nothing, but serves as a placeholder for statements (possibly to be added later)
Python 101 : Long Lines
• Sometimes you get long statements that you want to break across multiple lines
• Use the line continuation character (\)
    if product=="game" and type=="pirate memory" \
                       and age >= 4 and age <= 8:
        print "I'll take it!"
• However, not needed for code in (), [], or {}
    if (product=="game" and type=="pirate memory"
                        and age >= 4 and age <= 8):
        print "I'll take it!"
Exercise 1.3
Time: 15 minutes
Basic Datatypes
• Python only has a few primitive types of data
• Numbers
• Strings (character text)
Numbers
• Python has 5 types of numbers
• Booleans
• Integers
• Long integers
• Floating point
• Complex (imaginary numbers)
Booleans (bool)
• Two values: True, False
    a = True
    b = False
• Evaluated as integers with value 1,0
    c = 4 + True       # c = 5
    d = False
    if d == 0:
        print "d is False"
• Although doing that in practice would be odd
Integers (int)
• Signed integers up to machine precision
    a = 37
    b = -299392993
    c = 0x7fa8         # Hexadecimal
    d = 0253           # Octal
• Typically 32 bits
• Comparable to the C long type
Long Integers (long)
• Arbitrary precision integers
    a = 37L
    b = -126477288399477266376467L
• Integers that overflow promote to longs
    >>> 3 ** 73
    67585198634817523235520443624317923L
    >>> a = 72883988882883812
    >>> a
    72883988882883812L
    >>>
• Can almost always be used interchangeably with integers
Integer Operations
    +               Add
    -               Subtract
    *               Multiply
    /               Divide
    //              Floor divide
    %               Modulo
    **              Power
    <<              Bit shift left
    >>              Bit shift right
    &               Bit-wise AND
    |               Bit-wise OR
    ^               Bit-wise XOR
    ~               Bit-wise NOT
    abs(x)          Absolute value
    pow(x,y[,z])    Power with optional modulo (x**y)%z
    divmod(x,y)     Division with remainder
Integer Division
• Classic division (/) - truncates
    >>> 5/4
    1
    >>>
• Floor division (//) - truncates (same)
    >>> 5//4
    1
    >>>
• Future division (/) - Converts to float
    >>> from __future__ import division
    >>> 5/4
    1.25
• Will change in some future Python version
• If truncation is intended, use //
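A short sketch of the advice to use // when truncation is intended. It sticks to operators from the Integer Operations slide, and every variable name is made up for illustration. float() is used for the exact quotient so the result does not depend on which meaning of / is in effect. One detail worth knowing: with a negative operand, // rounds down (toward negative infinity) rather than toward zero.

```python
quotient = 7 // 2            # 3  (floor division)
remainder = 7 % 2            # 1
both = divmod(7, 2)          # (3, 1), quotient and remainder together
exact = float(7) / 2         # 3.5 (convert first to get a fractional result)
negative = -7 // 2           # -4 (rounds down, not toward zero)

# quotient and remainder always recombine to the original value
recombined = quotient * 2 + remainder    # 7
```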
Floating point (float)
• Use a decimal or exponential notation
    a = 37.45
    b = 4e5
    c = -1.345e-10
• Represented as double precision using the native CPU representation (IEEE 754)
    17 digits of precision
    Exponent from -308 to 308
• Same as the C double type
Floating point
• Be aware that floating point numbers are inexact when representing decimal values.
    >>> a = 3.4
    >>> a
    3.3999999999999999
    >>>
• This is not Python, but the underlying floating point hardware on the CPU.
• When inspecting data at the interactive prompt, you will always see the exact representation (print output may differ)
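One practical consequence of inexact representation is that comparing floats with == can give surprising results. A minimal sketch, assuming a tolerance of 1e-9 is acceptable for the values involved (both the tolerance and the variable names are illustrative choices, not something the course prescribes):

```python
a = 0.1 + 0.2

exact_match = (a == 0.3)               # False: the two values differ in the last bits
close_enough = abs(a - 0.3) < 1e-9     # True: compare against a small tolerance instead
```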
Floating Point Operators
    +               Add
    -               Subtract
    *               Multiply
    /               Divide
    %               Modulo (remainder)
    **              Power
    pow(x,y [,z])   Power modulo (x**y)%z
    abs(x)          Absolute value
    divmod(x,y)     Division with remainder
• Additional functions are in the math module
    import math
    a = math.sqrt(x)
    b = math.sin(x)
    c = math.cos(x)
    d = math.tan(x)
    e = math.log(x)
Converting Numbers
• Type name can be used to convert
    a = int(x)       # Convert x to integer
    b = long(x)      # Convert x to long
    c = float(x)     # Convert x to float
• Example:
    >>> a = 3.14159
    >>> int(a)
    3
    >>>
• Also work with strings containing numbers
    >>> a = "3.14159"
    >>> float(a)
    3.1415899999999999
    >>> int("0xff",16)     # Optional integer base
    255
Strings
• Written in programs with quotes
    a = "Yeah but no but yeah but..."

    b = 'computer says no'

    c = '''Look into my eyes, look into my eyes,
    the eyes, the eyes, the eyes,
    not around the eyes, don't look around the eyes,
    look into my eyes, you're under.'''
• Standard escape characters work (e.g., '\n')
• Triple quotes capture all literal text enclosed
String Escape Codes
• In quotes, standard escape codes work
    '\n'      Line feed
    '\r'      Carriage return
    '\t'      Tab
    '\xhh'    Hexadecimal value
    '\"'      Literal quote
    '\\'      Backslash
• Raw strings (don't interpret escape codes) are written with a leading r
    a = r"c:\newdata\test"     # String exactly as specified
String Representation
• An ordered sequence of bytes (characters)
• Stores 8-bit data (ASCII)
• May contain binary data, control characters, etc.
• Strings are frequently used for both text and for raw-data of any kind
String Representation
• Strings work like an array : s[n]
    a = "Hello world"
    b = a[0]       # b = 'H'
    c = a[4]       # c = 'o'
    d = a[-1]      # d = 'd' (Taken from end of string)
• Slicing/substrings : s[start:end]
    d = a[:5]      # d = "Hello"
    e = a[6:]      # e = "world"
    f = a[3:8]     # f = "lo wo"
    g = a[-5:]     # g = "world"
• Concatenation (+)
    a = "Hello" + "World"
    b = "Say " + a
More String Operations
• Length (len)
    >>> s = "Hello"
    >>> len(s)
    5
    >>>
• Membership test (in)
    >>> 'e' in s
    True
    >>> 'x' in s
    False
    >>> "ello" in s
    True
• Replication (s*n)
    >>> s = "Hello"
    >>> s*5
    'HelloHelloHelloHelloHello'
    >>>
String Methods
• Strings have "methods" that perform various operations with the string data.
• Stripping any leading/trailing whitespace
    t = s.strip()
• Case conversion
    t = s.lower()
    t = s.upper()
• Replacing text
    t = s.replace("Hello","Hallo")
More String Methods
    s.endswith(suffix)     # Check if string ends with suffix
    s.find(t)              # First occurrence of t in s
    s.index(t)             # First occurrence of t in s
    s.isalpha()            # Check if characters are alphabetic
    s.isdigit()            # Check if characters are numeric
    s.islower()            # Check if characters are lower-case
    s.isupper()            # Check if characters are upper-case
    s.join(slist)          # Joins the items of slist using s as delimiter
    s.lower()              # Convert to lower case
    s.replace(old,new)     # Replace text
    s.rfind(t)             # Search for t from end of string
    s.rindex(t)            # Search for t from end of string
    s.split([delim])       # Split string into list of substrings
    s.startswith(prefix)   # Check if string starts with prefix
    s.strip()              # Strip leading/trailing space
    s.upper()              # Convert to upper case
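A few of the methods above, chained together on one line of made-up input. The sample text and all variable names are hypothetical. The table gives find() and index() the same description; they differ only when the text is missing, where find() returns -1 and index() raises ValueError.

```python
line = "  goog,100,490.10  \n"          # hypothetical input line

clean = line.strip()                     # 'goog,100,490.10'
fields = clean.split(",")                # ['goog', '100', '490.10']
symbol = fields[0].upper()               # 'GOOG'
shares_ok = fields[1].isdigit()          # True
rejoined = " ".join(fields)              # 'goog 100 490.10'

pos = clean.find("100")                  # 5  (offset of first match)
missing = clean.find("xyz")              # -1 (not found)
```

Note that line itself is unchanged by any of these calls; each method returns a new string.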
String Mutability
• Strings are "immutable" (read only)
• Once created, the value can't be changed
    >>> s = "Hello World"
    >>> s[1] = 'a'
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: 'str' object does not support item assignment
    >>>
• All operations and methods that manipulate string data always create new strings
String Conversions
• To convert any object to string
    s = str(obj)
• Produces the same text as print
• Actually, print uses str() for output
    >>> x = 42
    >>> str(x)
    '42'
    >>>
Exercise 1.4
Time: 10 minutes
String Splitting
• Strings often represent fields of data
• To work with each field, split into a list
    >>> line = 'GOOG 100 490.10'
    >>> fields = line.split()
    >>> fields
    ['GOOG', '100', '490.10']
    >>>
• Example: When reading data from a file, you might read each line and then split the line into columns.
Lists
• An array of arbitrary values
    names = [ "Elwood", "Jake", "Curtis" ]
    nums = [ 39, 38, 42, 65, 111]
• Can contain mixed data types
    items = [ "Elwood", 39, 1.5 ]
• Adding new items (append, insert)
    items.append("that")       # Adds at end
    items.insert(2,"this")     # Inserts in middle
• Concatenation : s + t
    s = [1,2,3]
    t = ['a','b']
    s + t                      # [1,2,3,'a','b']
Lists (cont)
• Lists are indexed by integers (starting at 0)
    names = [ "Elwood", "Jake", "Curtis" ]

    names[0]      # "Elwood"
    names[1]      # "Jake"
    names[2]      # "Curtis"
• Negative indices are from the end
    names[-1]     # "Curtis"
• Changing one of the items
    names[1] = "Joliet Jake"
More List Operations
• Length (len)
    >>> s = ['Elwood','Jake','Curtis']
    >>> len(s)
    3
    >>>
• Membership test (in)
    >>> 'Elwood' in s
    True
    >>> 'Britney' in s
    False
    >>>
• Replication (s*n)
    >>> s = [1,2,3]
    >>> s*3
    [1,2,3,1,2,3,1,2,3]
    >>>
List Removal
• Removing an item
    names.remove("Curtis")
• Deleting an item by index
    del names[2]
• Removal results in items moving down to fill the space vacated (i.e., no "holes").
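The two removal styles side by side, as a small sketch. The list contents come from the earlier Lists slides; after_remove is a hypothetical name used only to capture the intermediate state.

```python
names = ["Elwood", "Jake", "Curtis"]

names.remove("Jake")          # remove by value
after_remove = list(names)    # copy of the list at this point: ['Elwood', 'Curtis']

del names[0]                  # delete by index; 'Curtis' moves down to index 0
first = names[0]              # 'Curtis'
```

Because the remaining items shift down, indices computed before a removal may no longer point at the same items afterwards.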
Exercise 1.5
Time: 10 minutes
File Input and Output
• Opening a file
    f = open("foo.txt","r")     # Open for reading
    g = open("bar.txt","w")     # Open for writing
• To read data
    line = f.readline()         # Read a line of text
    data = f.read()             # Read all data
• To write text to a file
    g.write("some text")
• To print to a file
    print >>g, "Your name is", name
Looping over a file
• Reading a file line by line
    f = open("foo.txt","r")
    for line in f:
        # Process the line
        ...
    f.close()
• Alternatively
    for line in open("foo.txt","r"):
        # Process the line
        ...
• This reads all lines until you reach the end of the file
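The "Process the line" step usually combines the file loop with split() and a numeric conversion. Below is a self-contained sketch under stated assumptions: the file name portfolio.dat, its two lines of content, and the variable names are all invented for illustration (they are not the course's data files), and the file is written to a temporary directory first so the example can run anywhere.

```python
import os
import tempfile

# Create a small sample file (hypothetical name and contents)
path = os.path.join(tempfile.mkdtemp(), "portfolio.dat")
g = open(path, "w")
g.write("GOOG 100 490.10\n")
g.write("IBM 50 91.10\n")
g.close()

# Read it back line by line, splitting each line into columns
total = 0.0
f = open(path, "r")
for line in f:
    fields = line.split()            # e.g. ['GOOG', '100', '490.10']
    shares = int(fields[1])          # text -> integer
    price = float(fields[2])         # text -> float
    total += shares * price
f.close()

os.remove(path)                      # clean up the sample file
```

With this sample data, total comes out at roughly 53565 (subject to the usual floating point rounding).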
Exercise 1.6
Time: 10 minutes
Type Conversion
• In Python, you must be careful about converting data to an appropriate type
    x = '37'         # Strings
    y = '42'
    z = x + y        # z = '3742' (concatenation)

    x = 37
    y = 42
    z = x + y        # z = 79 (integer +)
• This differs from Perl where "+" is assumed to be numeric arithmetic (even on strings)
    $x = '37';
    $y = '42';
    $z = $x + $y;    # $z = 79
Simple Functions
• Use functions for code you want to reuse
    def sumcount(n):
        total = 0
        while n > 0:
            total += n
            n -= 1
        return total
• Calling a function
    a = sumcount(100)
• A function is just a series of statements that perform some task and return a result
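One way to convince yourself that sumcount() is correct is to compare it with the well-known closed form n*(n+1)/2 for the sum 1 + 2 + ... + n. The function is repeated from the slide so the sketch runs on its own; the names a and check are illustrative.

```python
def sumcount(n):
    total = 0
    while n > 0:
        total += n
        n -= 1
    return total

a = sumcount(100)                # 5050
check = 100 * (100 + 1) // 2     # closed form for 1 + 2 + ... + 100, also 5050
```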
Library Functions
• Python comes with a large standard library
• Library modules accessed using import
    import math
    x = math.sqrt(10)

    import urllib
    u = urllib.urlopen("http://www.python.org/index.html")
    data = u.read()
• Will cover in more detail later
Exception Handling
• Errors are reported as exceptions
• An exception causes the program to stop
    >>> f = open("file.dat","r")
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    IOError: [Errno 2] No such file or directory: 'file.dat'
    >>>
• For debugging, message describes what happened, where the error occurred, along with a traceback.
Exceptions
• Exceptions can be caught and handled
• To catch, use try-except statement
    try:
        f = open(filename,"r")
    except IOError:
        print "Could not open", filename
• The name after except must match the kind of error you're trying to catch (here, the IOError reported in the traceback)
    >>> f = open("file.dat","r")
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    IOError: [Errno 2] No such file or directory: 'file.dat'
    >>>
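A sketch of try-except inside a small function, so that a missing file produces a fallback value instead of stopping the program. The function name read_first_line, the choice to return None on failure, and the file name used in the call are all assumptions made for illustration.

```python
def read_first_line(filename):
    # Return the first line of the file, or None if it can't be opened
    try:
        f = open(filename, "r")
    except IOError:
        return None
    line = f.readline()
    f.close()
    return line

# A file that is assumed not to exist on this machine
result = read_first_line("/no_such_directory_xyz/file.dat")   # None
```

Only IOError is caught here. Any other kind of error would still stop the program with a traceback, which is usually what you want while debugging.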
Exceptions
• To raise an exception, use the raise statement
    raise RuntimeError("What a kerfuffle")
• Will cause the program to abort with an exception traceback (unless caught by try-except)
    % python foo.py
    Traceback (most recent call last):
      File "foo.py", line 21, in <module>
        raise RuntimeError("What a kerfuffle")
    RuntimeError: What a kerfuffle
dir() function
• dir(obj) returns all names defined in obj
    >>> import sys
    >>> dir(sys)
    ['__displayhook__', '__doc__', '__excepthook__',
     '__name__', '__stderr__', '__stdin__', '__stdout__',
     '_current_frames', '_getframe', 'api_version', 'argv',
     'builtin_module_names', 'byteorder', 'call_tracing',
     'callstats', 'copyright', 'displayhook', 'exc_clear',
     'exc_info', 'exc_type', 'excepthook', 'exec_prefix',
     'executable', 'exit', 'getcheckinterval', ...
     'version_info', 'warnoptions']
• Useful for exploring, inspecting objects, etc.
dir() Example
• dir() will also tell you what methods/attributes are available on an object.
    >>> a = "Hello World"
    >>> dir(a)
    ['__add__','__class__', ..., 'capitalize', 'center',
     'count', 'decode', 'encode', 'endswith', 'expandtabs',
     'find', 'index', 'isalnum', 'isalpha', 'isdigit',
     'islower', 'isspace', 'istitle', 'isupper', 'join',
     ... 'rstrip', 'split', 'splitlines', 'startswith',
     'strip', 'swapcase', 'title', 'translate', 'upper',
     'zfill']
    >>>
• Note: you may see a lot of names with leading/trailing underscores. These have special meaning to Python--ignore for now.
Summary
• This has been an overview of simple Python
• Enough to write basic programs
• Just have to know the core datatypes and a few basics (loops, conditions, etc.)
Exercise 1.7
Time: 15 minutes
Working with Data
Section 2
Overview
• Most programs work with data
• In this section, we look at how Python programmers represent and work with data
• Common programming idioms
• How to (not) shoot yourself in the foot
Primitive Datatypes
• Python has a few primitive types of data
• Integers
• Floating point numbers
• Strings (text)
• Obviously, all programs use these
None type
• Nothing, nil, null, nada
    logfile = None
• This is often used as a placeholder for optional program settings or values
    if logfile:
        logfile.write("Some message")
• If you don't assign logfile to something, the above code would crash (undefined variable)
Data Structures
• Real programs have more complex data
• Example: A holding of stock
100 shares of GOOG at $490.10
• An "object" with three parts
• Name ("GOOG", a string)
• Number of shares (100, an integer)
• Price (490.10, a float)
Tuples
• A collection of values grouped together
• Example:
    s = ('GOOG',100,490.10)
• Sometimes the () are omitted in syntax
    s = 'GOOG', 100, 490.10
• Special cases (0-tuple, 1-tuple)
    t = ()            # An empty tuple
    w = ('GOOG',)     # A 1-item tuple
Tuple Use
• Tuples are usually used to represent simple records and data structures
    contact = ('David Beazley','[email protected]')
    stock = ('GOOG', 100, 490.10)
    host = ('www.python.org', 80)
• Basically, a simple "object" of multiple parts
Tuples (cont)
• Tuple contents are ordered (like an array)
    s = ('GOOG',100, 490.10)
    name = s[0]       # 'GOOG'
    shares = s[1]     # 100
    price = s[2]      # 490.10
• However, the contents can't be modified
    >>> s[1] = 75
    TypeError: object does not support item assignment
• Although you can always create a new tuple
    t = (s[0], 75, s[2])
Tuple Packing
• Tuples are really focused on packing and unpacking data into variables, not storing distinct items in a list
• Packing multiple values into a tuple
s = ('GOOG', 100, 490.10)
• The tuple is then easy to pass around to other parts of a program as a single object
Tuple Unpacking
• To use the tuple elsewhere, you typically unpack its parts into variables
• Unpacking values from a tuple
(name, shares, price) = s
print "Cost", shares*price
• Note: The () syntax is sometimes omitted
name, shares, price = s
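Packing and unpacking together, as a brief sketch. The stock record is the one from the slides; cost, a, and b are names made up for illustration. The last two lines show a common idiom built on the same mechanism: the right-hand side is packed into a tuple, then unpacked into the names on the left.

```python
s = ('GOOG', 100, 490.10)        # pack three values into one object

name, shares, price = s          # unpack into three variables
cost = shares * price            # roughly 49010

a, b = 1, 2
a, b = b, a                      # swap: now a is 2 and b is 1
```

The number of names on the left must match the number of items in the tuple; otherwise Python raises a ValueError.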
Tuple Commentary
• Tuples are a fundamental part of Python
• Used for critical parts of the interpreter
• Highly optimized
• Key point : Compound data, but not used to represent a list of distinct "objects"
Tuples vs. Lists
• Can't you just use a list instead of a tuple?
    s = ['GOOG', 100, 490.10]
• Well, yes, but it's not quite as efficient
• The implementation of lists is optimized for growth using the append() method.
• There is often a little extra space at the end
• So using a list requires a bit more memory
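The memory difference can be observed directly, with caveats. The sketch below assumes the standard CPython interpreter and uses sys.getsizeof(), which was added in Python 2.6 and so is not available on the Python 2.5 interpreter shown earlier in these slides. The exact byte counts vary by version and platform, so none are quoted here; the variable names are illustrative.

```python
import sys

as_tuple = ('GOOG', 100, 490.10)
as_list = ['GOOG', 100, 490.10]

# Size of the container object itself (not the items it refers to)
tuple_bytes = sys.getsizeof(as_tuple)
list_bytes = sys.getsizeof(as_list)

print(tuple_bytes < list_bytes)
```

For a single record the saving is small; it matters mostly when a program holds a very large number of such records.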
Dictionaries
• A hash table or associative array
• A collection of values indexed by "keys"
• The keys are like field names
• Example:
s = { 'name' : 'GOOG', 'shares' : 100, 'price' : 490.10}
Dictionaries
• Getting values: Just use the key names
    >>> print s['name'],s['shares']
    GOOG 100
    >>> s['price']
    490.10
    >>>
• Adding/modifying values : Assign to key names
    >>> s['shares'] = 75
    >>> s['date'] = '6/6/2007'
    >>>
• Deleting a value
    >>> del s['date']
    >>>
Dictionaries
• When to use a dictionary as a data structure
• Data has many different parts
• The parts will be modified/manipulated
• Example: If you were reading data from a database and each row had 50 fields, a dictionary could store the contents of each row using descriptive field names.
Exercise 2.1
Time : 10 minutes
Containers
• Programs often have to work with many objects
• A portfolio of stocks
• Spreadsheets and matrices
• Three choices:
• Lists (ordered data)
• Dictionaries (unordered data)
• Sets (unordered collection)
Lists as a Container
• Use a list when the order of data matters
• Lists can hold any kind of object
• Example: A list of tuples
portfolio = [
    ('GOOG',100,490.10),
    ('IBM',50,91.10),
    ('CAT',150,83.44)
]
portfolio[0]      # ('GOOG',100,490.10)
portfolio[1]      # ('IBM',50,91.10)
Dicts as a Container
• Dictionaries are useful if you want fast random lookups (by key name)
• Example: A dictionary of stock prices
prices = {
    'GOOG' : 513.25,
    'CAT' : 87.22,
    'IBM' : 93.37,
    'MSFT' : 44.12
    ...
}
>>> prices['IBM']
93.37
>>> prices['GOOG']
513.25
>>>
Dict : Looking For Items
• To test for existence of a key
if key in d:
    # Yes
else:
    # No
• Looking up a value that might not exist
name = d.get(key,default)
• Example:
>>> prices.get('IBM',0.0)
93.37
>>> prices.get('SCOX',0.0)
0.0
>>>
Dicts and Lists
• You often get input data as a list of pairs
File:
"GOOG",513.25
"CAT",87.22
"IBM",93.37
"MSFT",44.12
...
Tuples:
[ ('GOOG',513.25), ('CAT',87.22), ('IBM',93.37), ('MSFT',44.12), ... ]
• dict() promotes a list of pairs to a dictionary
pricelist = [('GOOG',513.25),('CAT',87.22),
             ('IBM',93.37),('MSFT',44.12)]
prices = dict(pricelist)
print prices['IBM']
print prices['MSFT']
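One way the file-to-dictionary step could look, as a minimal sketch. The list named lines is a hypothetical stand-in for the lines of a price file, so nothing needs to exist on disk; print() is used as a function so the code also runs on Python 3.

```python
# Hypothetical file contents, one string per line of the file
lines = ['"GOOG",513.25', '"CAT",87.22', '"IBM",93.37', '"MSFT",44.12']

pricelist = []
for line in lines:
    fields = line.split(',')            # ['"GOOG"', '513.25']
    name = fields[0].strip('"')         # remove the surrounding quotes
    pricelist.append((name, float(fields[1])))

prices = dict(pricelist)                # list of pairs -> dictionary

has_ibm = 'IBM' in prices               # existence test
missing = prices.get('SCOX', 0.0)       # lookup with a default value
print(prices['IBM'])
```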
Sets
• Sets
a = set([2,3,4])
• A set holds an unordered collection of items
• No duplicates; supports common set operations
>>> a = set([2,3,4])
>>> b = set([4,8,9])
>>> a | b          # Union
set([2,3,4,8,9])
>>> a & b          # Intersection
set([4])
>>> a - b          # Difference
set([2,3])
>>>
Exercise 2.2
Time : 20 minutes
Formatted Output
• When working with data, you often want to produce structured output (tables, etc.).
      Name     Shares      Price
---------- ---------- ----------
        AA        100      32.20
       IBM         50      91.10
       CAT        150      83.44
      MSFT        200      51.23
        GE         95      40.37
      MSFT         50      65.10
       IBM        100      70.44
String Formatting
• Formatting operator (%)
>>> "The value is %d" % 3
'The value is 3'
>>> "%5d %-5d %10d" % (3,4,5)
'    3 4              5'
>>> "%0.2f" % (3.1415926,)
'3.14'
• Requires single item or a tuple on right
• Commonly used with print
print "%d %0.2f %s" % (index,val,label)
• Format codes are same as with C printf()
Format Codes
%d      Decimal integer
%u      Unsigned integer
%x      Hexadecimal integer
%f      Float as [-]m.dddddd
%e      Float as [-]m.dddddde+-xx
%g      Float, but selective use of E notation
%s      String
%c      Character
%%      Literal %

%10d    Decimal in a 10-character field (right align)
%-10d   Decimal in a 10-character field (left align)
%0.2f   Float with 2 digit precision
%40s    String in a 40-character field (right align)
%-40s   String in a 40-character field (left align)
String Formatting
• Formatting with fields in a dictionary
>>> stock = {
...     'name' : 'GOOG',
...     'price' : 490.10,
...     'shares' : 100}
>>> print "%(name)8s %(shares)10d %(price)10.2f" % stock
    GOOG        100     490.10
>>>
• Useful if performing replacements with a large number of named values
• This is Python's version of "string interpolation" ($variables in other langs)
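A minimal sketch that puts both forms of the % operator together to build a small report. The portfolio data and the variable names (lines, record, dict_line) are made up for illustration; print() is used as a function so the code also runs on Python 3.

```python
portfolio = [('AA', 100, 32.20), ('IBM', 50, 91.10), ('CAT', 150, 83.44)]

lines = []
lines.append("%10s %10s %10s" % ('Name', 'Shares', 'Price'))   # header row
lines.append(("-" * 10 + " ") * 3)                              # separator row
for stock in portfolio:
    # A 3-tuple on the right supplies the three format codes in order
    lines.append("%10s %10d %10.2f" % stock)

# Dictionary form: each format code names the key it pulls from
record = {'name': 'GOOG', 'shares': 100, 'price': 490.10}
dict_line = "%(name)8s %(shares)10d %(price)10.2f" % record

print("\n".join(lines))
print(dict_line)
```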
Exercise 2.3
Time : 20 minutes
Working with Sequences
• Python has three "sequence" datatypes
a = "Hello"                # String
b = [1,4,5]                # List
c = ('GOOG',100,490.10)    # Tuple
• Sequences are ordered : s[n]
a[0]       # 'H'
b[-1]      # 5
c[1]       # 100
• Sequences have a length : len(s)
len(a)     # 5
len(b)     # 3
len(c)     # 3
Working with Sequences
• Sequences can be replicated : s * n
>>> a = 'Hello'
>>> a * 3
'HelloHelloHello'
>>> b = [1,2,3]
>>> b * 2
[1, 2, 3, 1, 2, 3]
>>>
• Similar sequences can be concatenated : s + t
>>> a = (1,2,3)
>>> b = (4,5)
>>> a + b
(1,2,3,4,5)
>>>
Sequence Slicing
• Slicing operator : s[start:end]
a = [0,1,2,3,4,5,6,7,8]
a[2:5]      # [2,3,4]
a[-5:]      # [4,5,6,7,8]
a[:3]       # [0,1,2]
• Indices must be integers
• Slices do not include end value
• If indices are omitted, they default to the beginning or end of the list.
Extended Slices
• Extended slicing: s[start:end:step]
a = [0,1,2,3,4,5,6,7,8]
a[:5]        # [0,1,2,3,4]
a[-5:]       # [4,5,6,7,8]
a[0:5:2]     # [0,2,4]
a[::-2]      # [8,6,4,2,0]
a[6:2:-1]    # [6,5,4,3]
• step indicates stride and direction
• end index is not included in result
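The slicing rules can be tried out directly. A small sketch follows; the variable names are only for illustration, and the same operations work on strings and tuples.

```python
a = [0, 1, 2, 3, 4, 5, 6, 7, 8]

first_three = a[:3]        # omitted start defaults to the beginning
middle = a[2:5]            # the end index (5) is not included
last_five = a[-5:]         # negative index counts from the end
every_other = a[0:5:2]     # step of 2
backwards = a[6:2:-1]      # negative step walks right to left
reversed_evens = a[::-2]   # whole list, backwards, every second item

word = "Hello"
inner = word[1:4]          # slicing works on any sequence type
```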
Sequence Reductions
• sum(s)
>>> s = [1, 2, 3, 4]
>>> sum(s)
10
>>>
• min(s), max(s)
>>> min(s)
1
>>> max(s)
4
>>> t = ['Hello','World']
>>> max(t)
'World'
>>>
Iterating over a Sequence
• The for-loop iterates over sequence data
• On each iteration of the loop, you get a new item of data to work with.
>>> s = [1, 4, 9, 16]
>>> for i in s:
...     print i
...
1
4
9
16
>>>
Iteration Variables
• Each time through the loop, a new value is placed into an iteration variable
for x in s:          # x is the iteration variable
    statements
• Overwrites the previous value (if any)
x = 42
for x in s:          # Overwrites any previous x
    statements
print x              # Prints value from last iteration
• After the loop finishes, the variable has the value from the last iteration of the loop
break and continue
• Breaking out of a loop (exiting)
for name in namelist:
    if name == username:
        break
• Jumping to the start of the next iteration
for line in lines:
    if not line:
        continue
    # More statements
    ...
• These statements only apply to the inner-most loop that is active
Looping over integers
• xrange([start,] end [,step])
for i in xrange(100): # i = 0,1,...,99
for j in xrange(10,20): # j = 10,11,..., 19
for k in xrange(10,50,2): # k = 10,12,...,48
• Note: The ending value is never included (this mirrors the behavior of slices)
• If you simply need to count, use xrange()
Caution with range()
• range([start,] end [,step])
x = range(100)         # x = [0,1,...,99]
y = range(10,20)       # y = [10,11,...,19]
z = range(10,50,2)     # z = [10,12,...,48]
• range() creates a list of integers
• You will sometimes see this code
for i in range(N):
    statements
• If you are only looping, use xrange() instead. It computes its values on demand instead of creating a list.
enumerate() Function
• Provides a loop counter value
names = ["Elwood","Jake","Curtis"]
for i,name in enumerate(names):
    # Loops with i = 0, name = 'Elwood'
    #            i = 1, name = 'Jake'
    #            i = 2, name = 'Curtis'
    ...
• Example: Keeping a line number
for linenumber,line in enumerate(open(filename)):
    ...
enumerate() Function
• enumerate() is a nice shortcut
for i,x in enumerate(s):
    statements
• Compare to:
i = 0
for x in s:
    statements
    i += 1
• Less typing and enumerate() runs slightly faster
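The two loop styles produce the same (counter, item) pairs. A minimal sketch, with result lists named only for illustration:

```python
names = ["Elwood", "Jake", "Curtis"]

# Using enumerate(): the counter comes for free
with_enumerate = []
for i, name in enumerate(names):
    with_enumerate.append((i, name))

# Manual counter: same result, more bookkeeping
manual = []
i = 0
for name in names:
    manual.append((i, name))
    i += 1
```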
for and tuples
• Looping with multiple iteration variables
points = [
    (1,4),(10,40),(23,14),(5,6),(7,8)
]
for x,y in points:
    # Loops with x = 1, y = 4
    #            x = 10, y = 40
    #            x = 23, y = 14
    #            ...
• Here, each tuple is unpacked (expanded) into a set of iteration variables.
zip() Function
• Combines multiple sequences into tuples
a = [1,4,9]
b = ['Jake','Elwood','Curtis']
x = zip(a,b)     # x = [(1,'Jake'),(4,'Elwood'), ...]
• One use: Looping over multiple lists
a = [1,4,9]
b = ['Jake','Elwood','Curtis']
for num,name in zip(a,b):
    # Statements
    ...
zip() Function
• zip() always stops with shortest sequence
a = [1,2,3,4,5,6]
b = ['Jake','Elwood']
x = zip(a,b)     # x = [(1,'Jake'),(2,'Elwood')]
• You may combine as many sequences as needed
a = [1, 2, 3, 4, 5, 6]
b = ['a', 'b', 'c']
c = [10, 20, 30]
x = zip(a,b,c)         # x = [(1,'a',10),(2,'b',20),...]
y = zip(a,zip(b,c))    # y = [(1,('a',10)),(2,('b',20)),...]
Using zip()
• zip() is also a nice programming shortcut
for a,b in zip(s,t):
    statements
• Compare to:
i = 0
while i < len(s) and i < len(t):
    a = s[i]
    b = t[i]
    statements
    i += 1
• Caveat: zip() constructs a list of tuples. Be careful when working with large data
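A minimal sketch comparing zip() with the manual index loop. The call to list() is an addition: it changes nothing on Python 2, where zip() already returns a list, and it is needed on Python 3, where zip() produces its pairs on demand.

```python
a = [1, 2, 3, 4, 5, 6]
b = ['Jake', 'Elwood']

# zip() stops as soon as the shortest input runs out
pairs = list(zip(a, b))

# The equivalent manual loop over both sequences
manual = []
i = 0
while i < len(a) and i < len(b):
    manual.append((a[i], b[i]))
    i += 1
```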
Exercise 2.4
Time : 10 minutes
List Sorting
• Lists can be sorted "in-place" (sort method)
s = [10,1,7,3]
s.sort()                  # s = [1,3,7,10]
• Sorting in reverse order
s = [10,1,7,3]
s.sort(reverse=True)      # s = [10,7,3,1]
• Sorting works with any ordered type
s = ["foo","bar","spam"]
s.sort()                  # s = ["bar","foo","spam"]
List Sorting
• Sometimes you need to perform extra processing while sorting
• Example: Case-insensitive string sort
>>> s = ["hello","WORLD","test"]
>>> s.sort()
>>> s
['WORLD','hello','test']
>>>
• Here, we might like to fix the order
List Sorting
• Sorting with a key function:
>>> def tolower(x):
...     return x.lower()
...
>>> s = ["hello","WORLD","test"]
>>> s.sort(key=tolower)
>>> s
['hello','test','WORLD']
>>>
• The key function is a "callback function" that the sort() method applies to each item
• The value returned by the key function determines the sort order
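A key function is not limited to strings. The sketch below combines key with reverse and also sorts a list of tuples by one field; price_of is a hypothetical helper introduced only for this illustration.

```python
def tolower(x):
    return x.lower()

def price_of(stock):
    # Hypothetical helper: the price is the third field of each tuple
    return stock[2]

s = ["hello", "WORLD", "test"]
s.sort(key=tolower, reverse=True)     # case-insensitive, descending

portfolio = [('GOOG', 100, 490.10), ('IBM', 50, 91.10), ('CAT', 150, 83.44)]
portfolio.sort(key=price_of)          # cheapest stock first

names_by_price = []
for stock in portfolio:
    names_by_price.append(stock[0])
```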
Sequence Sorting
• sorted() function
• Turns any sequence into a sorted list
>>> sorted("Python")
['P', 'h', 'n', 'o', 't', 'y']
>>> sorted((5,1,9,2))
[1, 2, 5, 9]
• This is not an in-place sort--a new list is returned as a result.
• The same key and reverse options can be supplied if necessary
Exercise 2.5
Time : 10 Minutes
List Processing
• Working with lists is very common
• Python is very adept at processing lists
• Have already seen many examples:
>>> a = [1, 2, 3, 4, 5]
>>> sum(a)
15
>>> a[0:3]
[1, 2, 3]
>>> a * 2
[1, 2, 3, 4, 5, 1, 2, 3, 4, 5]
>>> min(a)
1
>>>
List Comprehensions
• Creates a new list by applying an operation to each element of a sequence.
>>> a = [1,2,3,4,5]
>>> b = [2*x for x in a]
>>> b
[2,4,6,8,10]
>>>
• Another example:
>>> names = ['Elwood','Jake']
>>> a = [name.lower() for name in names]
>>> a
['elwood','jake']
>>>
List Comprehensions
• A list comprehension can also filter
>>> a = [1, -5, 4, 2, -2, 10]
>>> b = [2*x for x in a if x > 0]
>>> b
[2,8,4,20]
>>>
• Another example
>>> f = open("stockreport","r")
>>> goog = [line for line in f if 'GOOG' in line]
>>>
List Comprehensions
• General syntax
[expression for x in s if condition]
• What it means
result = []
for x in s:
    if condition:
        result.append(expression)
• Can be used anywhere a sequence is expected
>>> a = [1,2,3,4]
>>> sum([x*x for x in a])
30
>>>
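To make the equivalence concrete, here is a short sketch that builds the same list both ways, using the filtering example from the earlier slide:

```python
a = [1, -5, 4, 2, -2, 10]

# List comprehension: expression, loop, and filter in one line
comp = [2*x for x in a if x > 0]

# The same thing written out as an explicit loop
result = []
for x in a:
    if x > 0:
        result.append(2*x)
```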
List Comp: Examples
• List comprehensions are hugely useful
• Collecting the values of a specific field
stocknames = [s['name'] for s in stocks]
• Performing database-like queries
a = [s for s in stocks if s['price'] > 100 and s['shares'] > 50 ]
• Quick mathematics over sequences
cost = sum([s['shares']*s['price'] for s in stocks])
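The three patterns above can be run against a small data set. In this sketch, stocks is assumed to be a list of dictionaries with 'name', 'shares', and 'price' keys; the values are made up for illustration.

```python
stocks = [
    {'name': 'GOOG', 'shares': 100, 'price': 490.10},
    {'name': 'IBM',  'shares': 50,  'price': 91.10},
    {'name': 'CAT',  'shares': 150, 'price': 83.44},
]

# Collect one field from every record
stocknames = [s['name'] for s in stocks]

# Database-like query: keep only the records matching a condition
a = [s for s in stocks if s['price'] > 100 and s['shares'] > 50]

# Quick arithmetic over the whole sequence
cost = sum([s['shares']*s['price'] for s in stocks])
```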
Historical Digression
• List comprehensions come from Haskell
a = [x*x for x in s if x > 0] # Python
a = [x*x | x <- s, x > 0] # Haskell
• And this is motivated by sets (from math)
a = { x^2 | x ∈ s, x > 0 }
• But most Python programmers would probably just view this as a "cool shortcut"
List Comp. and Awk
• For Unix hackers, there is a certain similarity between list comprehensions and short one-line awk commands
• Applying an operation to every line of a file
# A Python List Comprehension
totalcost = sum([shares*price for name, shares, price in portfolio])
# A Unix awk command
totalcost = `awk '{ total += $2*$3 } END { print total }' portfolio.dat`
Big Idea: Being Declarative
• List comprehensions encourage a more "declarative" style of programming when processing sequences of data.
• Data can be manipulated by simply "declaring" a series of statements that perform various operations on it.
Exercise 2.6
Time : 15 Minutes
More details on objects
• So far: a tour of the most common types
• Have skipped some critical details
• Memory management
• Copying
• Type checking
Variable Assignment
• Variables in Python are names for values
• A variable name does not represent a fixed memory location into which values are stored (unlike C, C++, Fortran, etc.)
• Assignment is just a naming operation
Variables and Values
• At any time, a variable can be redefined to refer to a new value
a = 42           # "a" refers to 42
...
a = "Hello"      # "a" now refers to "Hello"
• Variables are not restricted to one data type
• Assignment doesn't overwrite the previous value (e.g., copy over it in memory)
• It just makes the name point elsewhere
Names, Values, Types
• Names do not have a "type"--a name is just a name
• However, values do have an underlying type
>>> a = 42
>>> b = "Hello World"
>>> type(a)
<type 'int'>
>>> type(b)
<type 'str'>
• The type() function tells you the type of a value
• The type name is usually a function that creates or converts a value to that type
>>> str(42)
'42'
Reference Counting
• Variable assignment never copies anything!
• Instead, it just updates a reference count
a = 42
b = a
c = [1,2]
c.append(b)
[Diagram: the names "a" and "b" and the last item of the list "c" all refer to the same object 42 (ref = 3)]
• So, different variables might be referring to the same object (check with the is operator)
>>> a is b
True
>>> a is c[2]
True
Reference Counting
• Reassignment never overwrites memory, so you normally don't notice any of the sharing
a = 42
b = a
[Diagram: "a" and "b" both refer to 42 (ref = 2)]
• When you reassign a variable, the name is just made to point to the new value.
a = 37
[Diagram: "a" refers to 37 (ref = 1); "b" still refers to 42 (ref = 1)]
The Hidden Danger
• "Copying" mutable objects such as lists and dicts
>>> a = [1,2,3,4]
>>> b = a
>>> b[2] = -10
>>> a
[1,2,-10,4]
[Diagram: "a" and "b" both refer to the same list [1,2,-10,4]]
• Changes affect both variables!
• Reason: Different variable names are referring to exactly the same object
• Yikes!
Making a Copy
• You have to take special steps to copy data
>>> a = [2,3,[100,101],4]
>>> b = list(a)          # Make a copy
>>> a is b
False
• It's a new list, but the list items are shared
>>> a[2].append(102)
>>> b[2]
[100,101,102]
>>>
[Diagram: a and b are separate lists, but both refer to the same items 2, 3, 4 and the same inner list [100,101,102]. This inner list is still being shared.]
• Known as a "shallow copy"
Deep Copying
• Sometimes you need to make a copy of an object and all objects contained within it
• Use the copy module
>>> a = [2,3,[100,101],4]
>>> import copy
>>> b = copy.deepcopy(a)
>>> a[2].append(102)
>>> b[2]
[100,101]
>>>
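The three cases (plain assignment, shallow copy, deep copy) can be compared side by side. A minimal sketch; the names alias, shallow, and deep are chosen only for illustration.

```python
import copy

a = [2, 3, [100, 101], 4]

alias = a                  # no copy: a second name for the same list
shallow = list(a)          # new outer list, inner list still shared
deep = copy.deepcopy(a)    # new outer list and new inner list

a[0] = -1                  # change the outer list
a[2].append(102)           # change the inner list
```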
Everything is an object
• Numbers, strings, lists, functions, exceptions, classes, instances, etc...
• All objects are said to be "first-class"
• Meaning: All objects that can be named can be passed around as data, placed in containers, etc., without any restrictions.
• There are no "special" kinds of objects
First Class Objects
• A simple example: a list containing a function, a module, and an exception
>>> import math
>>> items = [abs, math, ValueError ]
>>> items
[<built-in function abs>, <module 'math' (builtin)>, <type 'exceptions.ValueError'>]
>>> items[0](-45)
45
>>> items[1].sqrt(2)
1.4142135623730951
>>> try:
...     x = int("not a number")
... except items[2]:
...     print "Failed!"
...
Failed!
>>>
• You can use items in the list in place of the original names
Type Checking
• How to tell if an object is a specific type
if type(a) is list:
    print "a is a list"
if isinstance(a,list):          # Preferred
    print "a is a list"
• Checking for one of many types
if isinstance(a,(list,tuple)):
    print "a is a list or tuple"
Summary
• Have looked at basic principles of working with data in Python programs
• Brief look at part of the object-model
• A big part of understanding most Python programs.
Exercise 2.7
Time : 15 Minutes
Program Organization and Functions
Section 3
Overview
• How to organize larger programs
• More details on program execution
• Defining and working with functions
• Exceptions and Error Handling
Observation
• A large number of Python programmers spend most of their time writing short "scripts"
• One-off problems, prototyping, testing, etc.
• Python is good at this!
• And it is what draws many users to Python
What is a "Script?"
• A "script" is a program that simply runs a series of statements and stops
# program.py
statement1
statement2
statement3
...
• We've been writing scripts to this point
Problem
• If you write a useful script, it will grow features
• You may apply it to other related problems
• Over time, it might become a critical application
• And it might turn into a huge tangled mess
• So, let's get organized...
Program Structure
• Recall: programs are a sequence of statements
height = 442 # Meters
thickness = 0.1*(0.001) # Meters (0.1 millimeter)
numfolds = 0
while thickness <= height:
    thickness = thickness * 2
    numfolds = numfolds + 1
    print numfolds, thickness
print numfolds, "folds required"
print "final thickness is", thickness, "meters"
• Programs run by executing the statements in the same order that they are listed.
Defining Things
• You must always define things before they get used later on in a program.
a = 42
b = a + 2          # Requires that a is already defined
def square(x):
    return x*x
z = square(b)      # Requires square to be defined
• The order is important
• You almost always put the definitions of variables and functions near the beginning
Defining Functions
• It is a good idea to put all of the code related to a single "task" in one place
def read_prices(filename):
    pricesDict = {}
    for line in open(filename):
        fields = line.split(',')
        name = fields[0].strip('"')
        pricesDict[name] = float(fields[1])
    return pricesDict
• A function also simplifies repeated operations
oldprices = read_prices("oldprices.csv")
newprices = read_prices("newprices.csv")
What is a function?
• A function is a sequence of statements
def funcname(args):
    statement
    statement
    ...
    statement
• Any Python statement can be used inside
def foo():
    import math
    print math.sqrt(2)
    help(math)
• There are no "special" statements in Python
Function Definitions
• Functions can be defined in any order
def foo(x):
    bar(x)
def bar(x):
    statements

def bar(x):
    statements
def foo(x):
    bar(x)
• Functions must only be defined before they are actually used during program execution
foo(3)        # foo must be defined already
• Stylistically, it is more common to see functions defined in a "bottom-up" fashion
Bottom-up Style
• Functions are treated as building blocks
• The smaller/simpler blocks go first
# myprogram.py
def foo(x):
    ...
def bar(x):
    ...
    foo(x)
    ...
def spam(x):
    ...
    bar(x)
    ...
spam(42)      # Call spam() to do something
• Later functions build upon earlier functions
• Code that uses the functions appears at the end
A Definition Caution
• Functions can be redefined!
def foo(x):
    return 2*x
print foo(2)        # Prints 4
def foo(x,y):       # Redefine foo(). This replaces
    return x*y      # foo() above.
print foo(2,3)      # Prints 6
print foo(2)        # Error : foo takes two arguments
• A repeated function definition silently replaces the previous definition
• No overloaded functions (vs. C++, Java).
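The redefinition behavior can be observed by catching the resulting error. This sketch uses the "except ... as e" spelling, which Python 2.6 and later accept alongside the comma form used in these slides, so that it also runs on Python 3.

```python
def foo(x):
    return 2*x

first = foo(2)            # uses the one-argument version

def foo(x, y):            # silently replaces the definition above
    return x*y

second = foo(2, 3)        # uses the two-argument version

try:
    foo(2)                # the one-argument version no longer exists
    error = None
except TypeError as e:
    error = e
```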
Exercise 3.1
Time : 15 minutes
Function Design
• Functions should be easy to use
• The whole point of a function is to define code for repeated operations
• Some things to think about:
• Function inputs and output
• Side-effects
Function Arguments
• Functions operate on passed arguments
def square(x):
    return x*x
• Argument variables receive their values when the function is called
a = square(3)        # 3 is the argument
• The argument names are only visible inside the function body (are local to function)
Default Arguments
• Sometimes you want an optional argument
def read_prices(filename,delimiter=','):
    ...
d = read_prices("prices.csv")
e = read_prices("prices.dat",' ')
• If a default value is assigned, the argument is optional in function calls
• Note : Arguments with defaults must appear at the end of the argument list (all non-optional arguments go first)
Calling a Function
• Consider a simple function
def read_prices(filename,delimiter):
    ...
• Calling with "positional" args
prices = read_prices("prices.csv",",")
• This places values into the arguments in the same order as they are listed in the function definition (by position)
Keyword Arguments
• Calling with "keyword" arguments
prices = read_prices(filename="price.csv",
                     delimiter=",")
• Here, you explicitly name and assign a value to each argument
• Common use case : functions with many optional features
def sort(data,reverse=False,key=None):
    statements
sort(s,reverse=True)
Mixed Arguments
• Positional and keyword arguments can be mixed together in a function call
read_prices("prices.csv",delimiter=',')
• The positional arguments must go first.
• Basic rules:
• All required arguments get values
• No duplicate argument values
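A minimal sketch of the calling styles. To keep it self-contained, this variant of read_prices takes a list of lines rather than a filename; that change, and the sample data, are assumptions made for illustration.

```python
def read_prices(lines, delimiter=','):
    # Variant for illustration: parses a list of strings instead of a file
    prices = {}
    for line in lines:
        fields = line.split(delimiter)
        prices[fields[0].strip('"')] = float(fields[1])
    return prices

d = read_prices(['"GOOG",513.25', '"IBM",93.37'])           # default delimiter
e = read_prices(['GOOG 513.25', 'IBM 93.37'], ' ')          # positional
k = read_prices(['GOOG:513.25'], delimiter=':')             # mixed
m = read_prices(delimiter=';', lines=['CAT;87.22'])         # all keywords

try:
    read_prices(['GOOG,513.25'], ',', delimiter=',')        # delimiter given twice
    duplicate_error = None
except TypeError as err:
    duplicate_error = err
```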
Design Tip
• Always give short, but meaningful names to function arguments
• Someone using a function may want to use the keyword calling style
d = read_prices("prices.csv", delimiter=",")
• Python development tools will show the names in help features and documentation
Design Tip
• Don't write functions that take a huge number of input parameters
• You are not going to remember how to call your 20-argument function (nor will you probably want to).
• Functions should only take a few inputs (fewer than 4 as a rule of thumb)
Design Tip
• Don't transform function inputs
def read_data(filename):
    f = open(filename)      # Transforms filename into an open file
    for line in f:
        ...
• It limits the flexibility of the function
• Here is a better version
def read_data(f):
    for line in f:
        ...
# Sample use - pass in a different kind of file
import gzip
f = gzip.open("portfolio.gz")
portfolio = read_data(f)
Return Values
• return statement returns a value
def square(x):
    return x*x
• If no return value, None is returned
def bar(x):
    statements
    return
a = bar(4)      # a = None
• Return value is discarded if not assigned/used
square(4)       # Calls square() but discards result
Multiple Return Values
• A function may return multiple values by returning a tuple
def divide(a,b):
    q = a // b      # Quotient
    r = a % b       # Remainder
    return q,r      # Return a tuple
• Usage examples:
x = divide(37,5) # x = (7,2)
x,y = divide(37,5) # x = 7, y = 2
Side Effects
• When you call a function, the arguments are simply names for passed values
• If mutable data types are passed (e.g., lists, dicts), they can be modified "in-place"
def square_list(s):
    for i in xrange(len(s)):
        s[i] = s[i]*s[i]      # Modifies the input object
a = [1, 2, 3]
square_list(a)
print a                       # [1, 4, 9]
• Such changes are an example of "side effects"
Design Tip
• Don't modify function inputs
• When data moves around in a program, it always does so by reference.
• If you modify something, it may affect other parts of the program you didn't expect
• And it will be really hard to debug
• Caveat: It depends on the problem
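One way to follow this tip is to return a new object instead of changing the argument. The sketch below contrasts the two; squared is a hypothetical name, and range() stands in for xrange() so the code also runs on Python 3.

```python
def square_list(s):
    # In-place version: changes the caller's list (a side effect)
    for i in range(len(s)):
        s[i] = s[i]*s[i]

def squared(s):
    # Side-effect-free version: builds and returns a new list
    return [x*x for x in s]

a = [1, 2, 3]
b = squared(a)
a_after_squared = list(a)     # snapshot: a has not been touched yet

square_list(a)                # now a itself is modified
```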
Understanding Variables
• Programs assign values to variables
• Variable assignments occur outside and inside function definitions
• Variables defined outside are "global"
• Variables inside a function are "local"
x = value         # Global variable
def foo():
    y = value     # Local variable
Local Variables
• Variables inside functions are private
def read_portfolio(filename):
    portfolio = []
    for line in open(filename):
        fields = line.split()
        s = (fields[0],int(fields[1]),float(fields[2]))
        portfolio.append(s)
    return portfolio
• Values not retained or accessible after return
>>> stocks = read_portfolio("stocks.dat")
>>> fields
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
NameError: name 'fields' is not defined
>>>
• Don't conflict with variables found elsewhere
Global Variables
• Functions can access the values of globals
delimiter = ','
def read_portfolio(filename):
    ...
    for line in open(filename):
        fields = line.split(delimiter)
        ...
• This is a useful technique for dealing with general program settings and other details
• However, you don't want to go overboard (otherwise the program becomes fragile)
Modifying Globals
• One quirk: Assigning to a global's name inside a function doesn't modify the global
delimiter = ','
def set_delimiter(newdelimiter):
    delimiter = newdelimiter
• Example:
>>> delimiter
','
>>> set_delimiter(':')
>>> delimiter              # Notice no change
','
>>>
• All assignments in functions are local
Modifying Globals
• If you want to modify a global variable you must declare it as such in the function
delimiter = ','
def set_delimiter(newdelimiter):
    global delimiter
    delimiter = newdelimiter
• global declaration must appear before use
• Only necessary for globals that will be modified (globals are already readable)
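The difference between the two versions can be seen by running them back to back. A minimal sketch, meant to be run as a script at module level; set_delimiter_local is a hypothetical name for the version without the global declaration.

```python
delimiter = ','

def set_delimiter_local(newdelimiter):
    delimiter = newdelimiter      # creates a local variable; global untouched

def set_delimiter(newdelimiter):
    global delimiter              # declare intent to rebind the global
    delimiter = newdelimiter

set_delimiter_local(':')
after_local = delimiter           # still ','

set_delimiter(':')
after_global = delimiter          # now ':'
```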
Design Tip
• If you use the global declaration a lot, you are probably making things difficult
• It's generally a bad idea to write programs that modify lots of global variables
• Better off using a user defined class (later)
Exercise 3.2
Time : 20 minutes
More on Functions
• There are additional details
• Variable argument functions
• Error checking
• Callback functions
• Anonymous functions
• Documentation strings
Variable Arguments
• Function that accepts any number of args
def foo(x,*args):
    ...
• Here, the extra arguments get passed as a tuple
foo(1,2,3,4,5)
def foo(x,*args):       # x = 1, args = (2,3,4,5)
    ...
Variable Arguments
• Example:
def print_headers(*headers):
    for h in headers:
        print "%10s" % h,
    print                       # End the header line
    print ("-"*10 + " ")*len(headers)
• Usage:
>>> print_headers('Name','Shares','Price')
      Name     Shares      Price
---------- ---------- ----------
>>>
Variable Arguments
• Function that accepts any keyword args
def foo(x,y,**kwargs):
    ...
• Extra keywords get passed in a dict
foo(2,3,flag=True,mode="fast",header="debug")
def foo(x,y,**kwargs):      # kwargs = { 'flag' : True,
    ...                     #            'mode' : 'fast',
                            #            'header' : 'debug' }
Variable Keyword Use
• Accepting any number of keyword args is useful when there are a lot of complicated configuration features
def plot_data(data, **opts):
    color = opts.get('color','black')
    symbols = opts.get('symbol',None)
    ...
plot_data(x,
          color="blue",
          bgcolor="white",
          title="Energy",
          xmin=-4.0, xmax=4.0,
          ymin=-10.0, ymax=10.0,
          xaxis_title="T",
          yaxis_title="KE",
          ...
          )
Variable Arguments
• A function that takes any arguments
def foo(*args,**kwargs):
    statements
• This will accept any combination of positional or keyword arguments
• Sometimes used when writing wrappers or when you want to pass arguments through to another function
Passing Tuples and Dicts
• Tuples can be expanded into function args
args = (2,3,4)
foo(1, *args)        # Same as foo(1,2,3,4)
• Dictionaries can expand to keyword args
kwargs = {
    'color' : 'red',
    'delimiter' : ',',
    'width' : 400 }
foo(data, **kwargs)
# Same as foo(data,color='red',delimiter=',',width=400)
• These are not commonly used except when writing library functions.
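A typical library-style use is a wrapper that accepts any arguments and passes them through unchanged. In this sketch, add, logged, and calls are hypothetical names invented for illustration.

```python
calls = []                          # record of every call made via logged()

def add(x, y=10):
    return x + y

def logged(func, *args, **kwargs):
    # Collect whatever was passed, then forward it to func untouched
    calls.append((args, kwargs))
    return func(*args, **kwargs)

r1 = logged(add, 1, 2)              # positional arguments pass through
r2 = logged(add, 1, y=5)            # keyword arguments pass through
r3 = logged(add, *(3,), **{'y': 4}) # tuple and dict expanded at the call
```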
Error Checking
• Python performs no checking or validation of function argument types or values
• A function will work on any data that is compatible with the statements in the function
• Example:
def add(x,y):
    return x + y
add(3,4)                  # 7
add("Hello","World")      # "HelloWorld"
add([1,2],[3,4])          # [1,2,3,4]
Error Checking
• If there are errors in a function, they will show up at run time (as an exception)
• Example:
def add(x,y):
    return x+y
>>> add(3,"hello")
Traceback (most recent call last):
...
TypeError: unsupported operand type(s) for +: 'int' and 'str'
>>>
• To verify code, there is a strong emphasis on testing (covered later)
Error Checking
• Python also performs no checking of function return values
• Inconsistent use does not result in an error
• Example:
def foo(x,y):
    if x:
        return x + y
    else:
        return          # Inconsistent use of return (not checked)
Callback Functions
• Sometimes functions are written to rely on a user-supplied function that performs some kind of special processing.
• Example: Case-insensitive sort
def lowerkey(s):
    return s.lower()
names.sort(key=lowerkey)
• Here, the sort() operation calls the lowerkey function as part of its processing (a callback).
Anonymous Functions
• Since callbacks are often short expressions, there is a shortcut syntax for them
• lambda expression
names.sort(key=lambda s: s.lower())
• Creates a function that evaluates an expression
lowerkey = lambda s: s.lower()
# Same as
def lowerkey(s): return s.lower()
• lambda is highly restricted. It can only be a single Python expression.
Documentation Strings
• First statement of a function may be a string
def divide(a,b):
    """Divides a by b and returns a quotient and
    remainder. For example:
    >>> divide(9,5)
    (1, 4)
    >>> divide(15,4)
    (3, 3)
    >>>
    """
    q = a // b
    r = a % b
    return (q,r)
• Including a docstring is considered good style
• Why?
Docstring Benefits
• Online help
>>> help(divide)
Help on function divide in module foo:

divide(a, b)
    Divides a by b and returns a quotient and
    remainder. For example:
    >>> divide(9,5)
    (1, 4)
    >>> divide(15,4)
    (3, 3)
    >>>
• Many IDEs/tools look at docstrings
47
Docstring Benefits
• Testing
• Testing tools look at the interactive session embedded in the docstring
• Example: doctest module (later)
def divide(a,b):
    """Divides a by b and returns a quotient and
    remainder. For example:
    >>> divide(9,5)
    (1, 4)
    >>> divide(15,4)
    (3, 3)
    >>>
    """
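A sketch of how the doctest module might re-run the examples embedded in a docstring. DocTestParser and DocTestRunner are standard doctest classes; the variable names here are illustrative, and the course covers its own usage later.

```python
import doctest

def divide(a, b):
    """Divides a by b and returns a quotient and remainder.

    >>> divide(9, 5)
    (1, 4)
    >>> divide(15, 4)
    (3, 3)
    """
    q = a // b
    r = a % b
    return (q, r)

# Pull the interactive examples out of the docstring and execute them
parser = doctest.DocTestParser()
test = parser.get_doctest(divide.__doc__, {"divide": divide}, "divide", None, 0)
runner = doctest.DocTestRunner(verbose=False)
results = runner.run(test)      # results.failed counts mismatched examples
```

Note that the expected output must match what the interpreter prints exactly, including the space in (1, 4).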
Exercise 3.3
Time : 10 minutes
Exceptions
• Used to signal errors
• Raising an exception (raise)
if name not in names:
    raise RuntimeError("Name not found")
• Catching an exception (try)
try:
    authenticate(username)
except RuntimeError,e:
    print e
Exceptions
• Exceptions propagate to first matching except
def foo():
    try:
        bar()
    except RuntimeError,e:
        ...
def bar():
    try:
        spam()
    except RuntimeError,e:
        ...
def spam():
    grok()
def grok():
    ...
    raise RuntimeError("Whoa!")
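A runnable sketch of the propagation chain above. The trace list is an addition for illustration, and the handlers use the "except ... as e" spelling (Python 2.6 and later) so the code also runs on Python 3.

```python
trace = []                      # records which handler ran (illustrative only)

def grok():
    raise RuntimeError("Whoa!")

def spam():
    grok()                      # no try/except here: the exception passes through

def bar():
    try:
        spam()
    except RuntimeError as e:   # first matching except on the way up
        trace.append("bar caught: %s" % e)

def foo():
    try:
        bar()
    except RuntimeError as e:   # not reached: bar() already handled the error
        trace.append("foo caught: %s" % e)

foo()
```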
Built-in Exceptions
• About two-dozen built-in exceptions
ArithmeticError
AssertionError
EnvironmentError
EOFError
ImportError
IndexError
KeyboardInterrupt
KeyError
MemoryError
NameError
ReferenceError
RuntimeError
SyntaxError
SystemError
TypeError
ValueError
• Consult reference
Exception Values
• Most exceptions have an associated value
• More information about what's wrong
raise RuntimeError("Invalid user name")
• Passed to variable supplied in except
try:
    ...
except RuntimeError,e:
    ...
• It's an instance of the exception type, but often looks like a string
except RuntimeError,e:
    print "Failed : Reason", e
Catching Multiple Errors
• Can catch different kinds of exceptions
try:
    ...
except LookupError,e:
    ...
except RuntimeError,e:
    ...
except IOError,e:
    ...
except KeyboardInterrupt,e:
    ...
• Alternatively, if handling is same
try:
    ...
except (IOError,LookupError,RuntimeError),e:
    ...
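A small sketch of both forms: a tuple of exception types sharing one handler, and a separate clause for a different error. The lookup() function is hypothetical, and "as e" is used in place of the Python 2 comma form so the code runs on either version.

```python
def lookup(table, key):
    # Hypothetical helper used only for this illustration
    try:
        return 1.0 / table[key]
    except (KeyError, IndexError) as e:     # same handling for both lookup errors
        return "lookup failed: %r" % (e.args[0],)
    except ZeroDivisionError:               # a different kind of error
        return "division by zero"

ok = lookup({"a": 4}, "a")
bad_key = lookup({"a": 4}, "b")
bad_index = lookup([2.0], 5)
zero = lookup({"a": 0}, "a")
```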
Catching All Errors
• Catching any exception
try:
    ...
except Exception:
    print "An error occurred"
• Ignoring an exception (pass)
try:
    ...
except RuntimeError:
    pass
Exploding Heads
• The wrong way to use exceptions:
try:
    go_do_something()
except Exception:
    print "Couldn't do something"
• This swallows all possible errors that might occur.
• May make it impossible to debug if code is failing for some reason you didn't expect at all (e.g., uninstalled Python module, etc.)
A Better Approach
• This is a somewhat more sane approach
try:
    go_do_something()
except Exception, e:
    print "Couldn't do something. Reason : %s\n" % e
• Reports a specific reason for the failure
• It is almost always a good idea to have some mechanism for viewing/reporting errors if you are writing code that catches all possible exceptions
finally statement
• Specifies code that must run regardless of whether or not an exception occurs
lock = Lock()
...
lock.acquire()
try:
    ...
finally:
    lock.release()    # release the lock
• Commonly used to properly manage resources (especially locks, files, etc.)
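A sketch showing that the finally block runs on both the normal path and the exception path. The balance dictionary and withdraw() function are made up for the example; threading.Lock stands in for the Lock() on the slide.

```python
import threading

lock = threading.Lock()
balance = {"value": 100}        # hypothetical shared state

def withdraw(amount):
    lock.acquire()
    try:
        if amount > balance["value"]:
            raise ValueError("insufficient funds")
        balance["value"] -= amount
    finally:
        lock.release()          # runs on normal return and on exception

withdraw(30)
try:
    withdraw(500)               # raises, but the lock is still released
except ValueError:
    pass

lock_is_free = lock.acquire(False)   # non-blocking acquire succeeds only if released
if lock_is_free:
    lock.release()
```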
Program Exit
• Program exit is handled through exceptions
raise SystemExit
raise SystemExit(exitcode)
• An alternative sometimes seen
import sys
sys.exit()
sys.exit(exitcode)
• Catching keyboard interrupt (Control-C)
try:
    statements
except KeyboardInterrupt:
    statements
Exercise 3.4
Time : 10 minutes
Modules and Libraries
Section 4
Overview
• How to place code in a module
• Useful applications of modules
• Some common standard library modules
• Installing third party libraries
Modules
• Any Python source file is a module
• import statement loads and executes a module
# foo.py
def grok(a):
    ...
def spam(b):
    ...

import foo
a = foo.grok(2)
b = foo.spam("Hello")
...
Namespaces
• A module is a collection of named values
• It's said to be a "namespace"
• The names correspond to global variables and functions defined in the source file
• You use the module name to access
>>> import foo
>>> foo.grok(2)
>>>
• Module name is tied to source (foo -> foo.py)
Module Execution
• When a module is imported, all of the statements in the module execute one after another until the end of the file is reached
• If there are scripting statements that carry out tasks (printing, creating files, etc.), they will run on import
• The contents of the module namespace are all of the global names that are still defined at the end of this execution process
Globals Revisited
• Everything defined in the "global" scope is what populates the module namespace
# foo.py
x = 42
def grok(a):
    ...
• Different modules can use the same names and those names don't conflict with each other (modules are isolated)
# bar.py
x = 37
def spam(a):
    ...
• These definitions of x are different
Running as "Main"
• Source files can run in two "modes"
shell % python foo.py    # Running as main
import foo               # Loaded as a module
• Sometimes you want to check
if __name__ == '__main__':
    print "Running as the main program"
else:
    print "Imported as a module using import"
• __name__ is the name of the module
• The main program (and interactive interpreter) module is __main__
Module Loading
• Each module loads and executes once
• Repeated imports just return a reference to the previously loaded module
• sys.modules is a dict of all loaded modules
>>> import sys
>>> sys.modules.keys()
['copy_reg', '__main__', 'site', '__builtin__',
'encodings', 'encodings.encodings', 'posixpath', ...]
>>>
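A brief sketch of the load-once behaviour, using the standard math module as a stand-in for any module. The second import simply hands back the object already cached in sys.modules.

```python
import sys
import math
import math as m2               # repeated import: no reload, same module object

same_object = (math is m2)
cached = sys.modules["math"]    # the import cache, keyed by module name
```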
Locating Modules
• When looking for modules, Python first looks in the current working directory of the main program
• If a module can't be found there, an internal module search path is consulted
>>> import sys
>>> sys.path
['', '/Library/Frameworks/Python.framework/Versions/2.5/lib/python25.zip',
 '/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5', ...
]
Module Search Path
• sys.path contains search path
• Can manually adjust if you need to
import sys
sys.path.append("/project/foo/pyfiles")
• Paths also added via environment variables
% env PYTHONPATH=/project/foo/pyfiles python
Python 2.4.3 (#1, Apr 7 2006, 10:54:33)
[GCC 4.0.1 (Apple Computer, Inc. build 5250)] on darwin
>>> import sys
>>> sys.path
['','/project/foo/pyfiles', '/Library/Frameworks ...]
Import Process
• Import looks for specific kinds of files
name.pyc    # Compiled Python
name.pyo    # Compiled Python (optimized)
name.py     # Python source file
name.dll    # Dynamic link library (C/C++)
• If module is a .py file, it is first compiled into bytecode and a .pyc/.pyo file is created
• .pyc/.pyo files are regenerated as needed should the original source file change
Exercise 4.1
Time : 15 Minutes
Modules as Objects
• When you import a module, the module itself is a kind of "object"
• You can assign it to variables, place it in lists, change its name, and so forth
import math
m = math # Assign to a variable
x = m.sqrt(2) # Access through the variable
• You can even store new things in it
math.twopi = 2*math.pi
What is a Module?
• A module is just a thin layer over a dictionary (which holds all of the contents)
>>> import foo
>>> foo.__dict__.keys()
['grok','spam','__builtins__','__file__',
'__name__', '__doc__']
>>> foo.__dict__['grok']
<function grok at 0x69230>
>>> foo.grok
<function grok at 0x69230>
>>>
• Any time you reference something in a module, it's just a dictionary lookup
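A sketch of the dictionary underneath a module. The demo module is a hypothetical empty module created with types.ModuleType so that no real library gets modified.

```python
import math
import types

via_dict = math.__dict__["sqrt"]      # explicit dictionary lookup
via_attr = math.sqrt                  # attribute access reaches the same object

demo = types.ModuleType("demo")       # hypothetical empty module, not a real library
demo.x = 42                           # attribute assignment ...
stored = demo.__dict__["x"]           # ... lands in the module's dictionary
demo.__dict__["y"] = 37               # and a dictionary store ...
fetched = demo.y                      # ... is visible as an attribute
```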
import as statement
• Changing the name of the loaded module
# bar.py
import fieldparse as fp
fields = fp.split(line)
• This is identical to import except that a different name is used for the module object
• The new name only applies to this one source file (other modules can still import the library using its original name)
import as statement
• There are other practical uses of changing the module name on import
• It makes it easy to load and seamlessly use different versions of a library module
if debug:
    import foodebug as foo    # Load debugging version
else:
    import foo
foo.grok(2)    # Uses whatever module was loaded
• You can make a program extensible by simply importing a plugin module as a common name
from module import
• Selected symbols can be lifted out of a module and placed into the caller's namespace
# bar.py
from foo import grok
grok(2)
• This is useful if a name is used repeatedly and you want to reduce the amount of typing
• And, it also runs faster if it gets used a lot
from module import *
• Takes all symbols from a module and places them into the caller's namespace
# bar.py
from foo import *
grok(2)
spam("Hello")
...
• However, it only applies to names that don't start with an underscore (_)
• _name often used when defining non-imported values in a module.
from module import *
• Using this form of import requires some care and attention
• It will overwrite any previously defined objects that share the same name as an imported symbol
• It may pollute the namespace with symbols that you don't really want (or expect)
Commentary
• As a general rule, it is better to make proper use of namespaces and to use the normal import statement
import foo
• It is more "pythonic" and reduces the amount of clutter in your code.
"Namespaces are one honking great idea -- let's do more of those!" - import this
Exercise 4.2
Time : 15 Minutes
Standard Library
• Python includes a large standard library
• Several hundred modules
• System, networking, data formats, etc.
• All accessible via import
• A quick tour of most common modules
Builtin Functions
• About 70 functions/objects
• Always available in global namespace
• Contained in the module __builtin__ (also reachable through the name __builtins__)
• Have seen many of these functions already
Builtins: Math
• Built-in mathematical functions
abs(x)
divmod(x,y)
max(s1, s2, ..., sn)
min(s1, s2, ..., sn)
pow(x,y [,z])
round(x [,n])
sum(s [,initial])
• Examples
>>> max(2,-10,40,7)
40
>>> min([2,-10,40,7])
-10
>>> sum([2,-10,40,7],0)
39
>>> round(3.141592654,2)
3.1400000000000001
>>>
Builtins: Conversions
• Conversion functions and types
str(s) # Convert to string
repr(s) # String representation
int(x [,base]) # Integer
long(x [,base]) # Long
float(x) # Float
complex(x) # Complex
hex(x) # Hex string
oct(x) # Octal string
chr(n) # Character from number
ord(c) # Character code
• Examples
>>> hex(42)
'0x2a'
>>> ord('c')
99
>>> chr(65)
'A'
>>>
Builtins: Iteration
• Functions commonly used with for statement
enumerate(s)
range([i,] j [,stride])
reversed(s)
sorted(s [,cmp [,key [,reverse]]])
xrange([i,] j [,stride])
zip(s1,s2,...)
• Examples
>>> x = [1,-10,4]
>>> for i in reversed(x): print i,
...
4 -10 1
>>> for i,v in enumerate(x): print i,v
0 1
1 -10
2 4
>>> for i in sorted(x): print i,
...
-10 1 4
sys module
• Information related to environment
• Version information
• System limits
• Command line options
• Module search paths
• Standard I/O streams
sys: Command Line Opts
• Parameters passed on Python command line
sys.argv
• Example:
# opt.py
import sys
print sys.argv

% python opt.py -v 3 "a test" foobar.txt
['opt.py', '-v', '3', 'a test', 'foobar.txt']
%
Command Line Args
• For simple programs, can manually process
import sys
if len(sys.argv) == 2:
    filename = sys.argv[1]
    out = open(filename,"w")
else:
    out = sys.stdout
• For anything more complicated, processing becomes tediously annoying
• Solution: optparse module
optparse Module
• A whole framework for processing command line arguments
• Might be overkill, but implements most of the annoying code you would have to write yourself
• A relatively recent addition to Python
• Impossible to cover all details, will show example
optparse Example
import optparse
p = optparse.OptionParser()
p.add_option("-d",action="store_true",dest="debugmode")
p.add_option("-o",action="store",type="string",dest="outfile")
p.add_option("--exclude",action="append",type="string",
             dest="excluded")
p.set_defaults(debugmode=False,outfile=None,excluded=[])
opt, args = p.parse_args()
debugmode = opt.debugmode
outfile = opt.outfile
excluded = opt.excluded
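A sketch of what the parser above returns for one command line. The argv list is a hypothetical example passed explicitly to parse_args(), which otherwise reads sys.argv[1:].

```python
import optparse

p = optparse.OptionParser()
p.add_option("-d", action="store_true", dest="debugmode")
p.add_option("-o", action="store", type="string", dest="outfile")
p.add_option("--exclude", action="append", type="string", dest="excluded")
p.set_defaults(debugmode=False, outfile=None, excluded=[])

# Hypothetical command line, supplied as a list instead of sys.argv
argv = ["-d", "-o", "out.txt", "--exclude", "a", "--exclude", "b", "data.csv"]
opt, args = p.parse_args(argv)   # opt holds the options, args the leftovers
```

Options end up as attributes on opt; anything that is not an option is returned in args.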
sys: Standard I/O
• Standard I/O streams
sys.stdout
sys.stderr
sys.stdin
• By default, print is directed to sys.stdout
• Input read from sys.stdin
• Can redefine or use directly
sys.stdout = open("out.txt","w")
print >>sys.stderr, "Warning. Unable to connect"
math module
• Contains common mathematical functions
math.sqrt(x)
math.sin(x)
math.cos(x)
math.tan(x)
math.atan(x)
math.log(x)
...
math.pi
math.e
• Example:
import math
c = 2*math.pi*math.sqrt((x1-x2)**2 + (y1-y2)**2)
random Module
• Generation of random numbers
random.randint(a,b) # Random integer a<=x<=b
random.random() # Random float 0<=x<1
random.uniform(a,b) # Random float a<=x<b
random.seed(x)
• Example:
>>> import random
>>> random.randint(10,20)
16
>>> random.random()
0.53763273926379385
>>> random.uniform(10,20)
13.62148074612832
>>>
copy Module
• Used to make shallow/deep copies of objects
copy.copy(obj)       # Make shallow copy
copy.deepcopy(obj)   # Make deep copy
• Example:
>>> a = [1,2,3]
>>> b = ['x',a]
>>> import copy
>>> c = copy.copy(b)
>>> c
['x', [1, 2, 3]]
>>> c[1] is a        # a only copied by reference. b and c contain the same a.
True
>>> d = copy.deepcopy(b)
>>> d
['x', [1, 2, 3]]
>>> d[1] is a        # a is also copied. b and d contain different lists
False
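A sketch of why the distinction matters: a later change to the inner list shows up through the shallow copy but not through the deep copy. The variable names follow the slide.

```python
import copy

a = [1, 2, 3]
b = ["x", a]

c = copy.copy(b)         # shallow: new outer list, same inner list
d = copy.deepcopy(b)     # deep: the inner list is duplicated too

a.append(4)              # mutate the original inner list
```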
os Module
• Contains operating system functions
• Example: Executing a system command
>>> import os
>>> os.system("mkdir temp")
>>>
• Walking a directory tree
>>> for path,dirs,files in os.walk("/home"):
...     for name in files:
...         print "%s/%s" % (path,name)
Environment Variables
• Environment variables (typically set in shell)
% setenv NAME dave
% setenv RSH ssh
% python prog.py
• os.environ dictionary contains values
import os
home = os.environ['HOME']
os.environ['HOME'] = '/home/user/guest'
• Changes are reflected in Python and any subprocesses created later
Getting a Directory Listing
• os.listdir() function
>>> files = os.listdir("/some/path")
>>> files
['foo','bar','spam']
>>>
• glob module
>>> txtfiles = glob.glob("*.txt")
>>> datfiles = glob.glob("Dat[0-5]*")
>>>
• glob understands Unix shell wildcards (on all systems)
os.path Module
• Portable management of path names and files
• Examples:
>>> import os.path
>>> os.path.basename("/home/foo/bar.txt")
'bar.txt'
>>> os.path.dirname("/home/foo/bar.txt")
'/home/foo'
>>> os.path.join("home","foo","bar.txt")
'home/foo/bar.txt'
>>>
File Tests
• Testing if a filename is a regular file
>>> os.path.isfile("foo.txt")
True
>>> os.path.isfile("/usr")
False
>>>
• Testing if a filename is a directory
>>> os.path.isdir("foo.txt")
False
>>> os.path.isdir("/usr")
True
>>>
• Testing if a file exists
>>> os.path.exists("foo.txt")
True
>>>
File Metadata
• Getting the last modification/access time
>>> os.path.getmtime("foo.txt")
1175769416.0
>>> os.path.getatime("foo.txt")
1175769491.0
>>>
• Note: To decode times, use time module
>>> time.ctime(os.path.getmtime("foo.txt"))
'Thu Apr 5 05:36:56 2007'
>>>
• Getting the file size
>>> os.path.getsize("foo.txt")
1344L
>>>
Shell Operations (shutil)
• Copying a file
>>> shutil.copy("source","dest")
• Moving a file (renaming)
>>> shutil.move("old","new")
• Copying a directory tree
>>> shutil.copytree("srcdir","destdir")
• Removing a directory tree
>>> shutil.rmtree("dir")
Exercise 4.3
Time : 10 Minutes
pickle Module
• A module for serializing objects
• Saving an object to a file
import pickle
...
f = open("data","wb")
pickle.dump(someobj,f)
• Loading an object from a file
f = open("data","rb")
someobj = pickle.load(f)
• This works with almost all objects except for those involving system state (open files, network connections, threads, etc.)
pickle Module
• Pickle can also turn objects into strings
import pickle
# Convert to a string
s = pickle.dumps(someobj)
...
# Load from a string
someobj = pickle.loads(s)
• Useful if you want to send an object elsewhere
• Examples: Send over network, store in a database, etc.
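A minimal round-trip sketch with dumps() and loads(). The record dictionary is made-up sample data.

```python
import pickle

# Sample data (assumed for illustration)
record = {"name": "Elwood", "address": "1060 W Addison", "visits": [1, 2, 3]}

s = pickle.dumps(record)        # serialize to a byte string
restored = pickle.loads(s)      # rebuild an equal but independent object
```

One caution when sending pickles elsewhere: loading pickle data from an untrusted source can run arbitrary code, so it suits only data you control.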
shelve module
• Used to create a persistent dictionary
import shelve
s = shelve.open("data","c")
# Put an object in the shelf
s['foo'] = someobj
# Get an object out of the shelf
someobj = s['foo']
s.close()
• Supports many dictionary operations
s[key] = obj # Store an object
data = s[key] # Get an object
del s[key] # Delete an object
s.has_key(key) # Test for key
s.keys() # Return list of keys
shelve module
• Keys must be strings
s = shelve.open("data","w")
s['foo'] = 4 # okay
s[2] = 'Hello' # Error. bad key
• Shelve open() flags
s = shelve.open("file",'c') # RW. Create if not exist
s = shelve.open("file",'r') # Read-only
s = shelve.open("file",'w') # Read-write
s = shelve.open("file",'n') # RW. Force creation
• Stored values may be any object compatible with the pickle module
ConfigParser Module
• A module that can read parameters out of configuration files based on the .INI format
# A Comment
; A Comment (alternative)
[section1]
var1 = 123
var2 = abc
[section2]
var1 = 456
var2 = def
...
• File consists of named sections each with a series of variable declarations
Parsing INI Files
• Creating and parsing a configuration file
import ConfigParser
cfg = ConfigParser.ConfigParser()
cfg.read("config.ini")
• Retrieving the value of an option
v = cfg.get("section","varname")
• This returns the value as a string or raises an exception if not found.
Parsing Example
• Sample .INI file
; simple.ini
[files]
infile = stocklog.dat
outfile = summary.dat
[web]
host = www.python.org
port = 80
• Interactive session
>>> cfg = ConfigParser.ConfigParser()
>>> cfg.read("simple.ini")
['simple.ini']
>>> cfg.get("files","infile")
'stocklog.dat'
>>> cfg.get("web","host")
'www.python.org'
>>>
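A self-contained sketch of the same session, writing a copy of simple.ini to a temporary file first. The try/except import is an addition so the code runs where the module is spelled configparser (Python 3); the "timeout" option is a hypothetical missing name used to show the exception mentioned earlier.

```python
import os
import tempfile

try:
    import ConfigParser as configparser   # Python 2 name, as used in the course
except ImportError:
    import configparser                   # the same module under its Python 3 name

# Write a small INI file to a temporary location (contents mirror simple.ini)
ini_text = "[files]\ninfile = stocklog.dat\n\n[web]\nhost = www.python.org\nport = 80\n"
fd, path = tempfile.mkstemp(suffix=".ini")
with os.fdopen(fd, "w") as f:
    f.write(ini_text)

cfg = configparser.ConfigParser()
cfg.read(path)
infile = cfg.get("files", "infile")        # values come back as strings
port = cfg.getint("web", "port")           # getint() converts the string

try:
    cfg.get("web", "timeout")              # option not present in the file
    missing = None
except configparser.NoOptionError:
    missing = "timeout"

os.remove(path)                            # clean up the temporary file
```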
Tkinter
• A Library for building simple GUIs
>>> from Tkinter import *
>>> def pressed():
...     print "You did it!"
>>> b = Button(text="Do it!",command=pressed)
>>> b.pack()
>>> b.mainloop()
• Clicking on the button....
You did it!
You did it!
...
More on Tkinter
• The only GUI packaged with Python itself
• Based on Tcl/Tk. Popular open-source scripting language/GUI widget set developed by John Ousterhout (90s)
• Tk used in a wide variety of other languages (Perl, Ruby, etc.)
• Cross-platform (Unix/Windows/MacOS)
• It's small (~25 basic widgets)
Sample Tkinter Widgets
Tkinter Programming
• GUI programming is a big topic
• Frankly, it's mostly a matter of looking at the docs (google 'Tkinter' for details)
• Execution model is based entirely on an event loop with callback functions
>>> from Tkinter import *
>>> def pressed():                # Callback
...     print "You did it!"
>>> b = Button(text="Do it!",command=pressed)
>>> b.pack()
>>> b.mainloop()                  # Run the event loop
Tkinter Commentary
• Your mileage with Tkinter will vary
• It comes with Python, but it's also dated
• And it's a lot lower-level than many like
• Some other GUI toolkits:
• wxPython
• PyQt
Exercise 4.4
Time : 30 Minutes
Useful New Datatypes
• The standard library also contains new kinds of datatypes useful for certain kinds of problems
• Examples:
• Decimal
• datetime
• deque
• Will briefly illustrate
decimal module
• A module that implements accurate base-10 decimals of arbitrary precision
>>> from decimal import Decimal
>>> x = Decimal('3.14')
>>> y = Decimal('5.2')
>>> x + y
Decimal('8.34')
>>> x / y
Decimal('0.6038461538461538461538461538')
>>>
• Controlling the precision
>>> from decimal import getcontext
>>> getcontext().prec = 2
>>> x / y
Decimal('0.60')
>>>
datetime module
• A module for representing and manipulating dates and times
>>> from datetime import datetime
>>> inauguration = datetime(2009,1,20)
>>> inauguration
datetime.datetime(2009, 1, 20, 0, 0)
>>> today = datetime.today()
>>> d = inauguration - today
>>> d
datetime.timedelta(92, 54189, 941608)
>>> d.days
92
>>>
• There are many more features not shown
collections module
• A module with a few useful data structures
• collections.deque - A double-ended queue
>>> from collections import deque
>>> q = deque()
>>> q.append("this")
>>> q.appendleft("that")
>>> q
deque(['that', 'this'])
>>>
• A deque is like a list except that it's highly optimized for insertion/deletion on both ends
• Implemented in C and finely tuned
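A short sketch of working at both ends of a deque. The third item "other" is an extra value added for the example.

```python
from collections import deque

q = deque()
q.append("this")          # add on the right
q.appendleft("that")      # add on the left
q.append("other")         # extra item, added for illustration

first = q.popleft()       # remove from the left end
last = q.pop()            # remove from the right end
remaining = list(q)
```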
Standardized APIs
• API (Application Programming Interface)
• Some parts of the Python library are focused on providing a common programming interface for accessing third-party systems
• Example : The Python Database API
• A standardized way to access relational database systems (Oracle, Sybase, MySQL, etc.)
Database Example
• Connecting to a DB, executing a query
from somedbmodule import connect
connection = connect("somedb",
                     user="dave",password="12345")
cursor = connection.cursor()
cursor.execute("select name,address from people")
for row in cursor:
    print row
connection.close()
• Output
('Elwood','1060 W Addison')
('Jack','4402 N Broadway')
...
• It's actually not much more than this...
Forming Queries
• There is one tricky bit concerning the formation of query strings
select address from people where name='Elwood'
• As a general rule, you should not use Python string operators to create queries
query = """select address from people where
           name='%s'""" % (name,)
• Leaves code open to an SQL injection attack
Forming Queries
select address from people where name='Elwood'
(http://xkcd.com/327/)
Value Substitutions
• DB modules also provide their own mechanism for substituting values in queries
cur.execute("select address from people where name=?",
            (name,))
cur.execute("select address from people where name=? and state=?",
            (name,state))
• This is somewhat similar to string formatting
• Unfortunately, the exact mechanism varies slightly by DB module (must consult docs)
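A sketch contrasting string formatting with value substitution, using the standard sqlite3 module (which uses the ? placeholder) and an in-memory database. The table contents and the find_address() helper are assumptions made for the example; other DB modules may use a different placeholder style.

```python
import sqlite3

conn = sqlite3.connect(":memory:")          # throwaway in-memory database
cur = conn.cursor()
cur.execute("create table people (name text, address text)")
cur.executemany("insert into people values (?, ?)",
                [("Elwood", "1060 W Addison"), ("Jack", "4402 N Broadway")])

def find_address(name):
    # The driver substitutes the value; it is never spliced into the SQL text
    cur.execute("select address from people where name=?", (name,))
    return [row[0] for row in cur.fetchall()]

hostile = "x' or '1'='1"                    # classic injection attempt

# What string formatting would have produced (shown only for contrast)
unsafe_query = "select address from people where name='%s'" % (hostile,)
unsafe_rows = cur.execute(unsafe_query).fetchall()

safe_rows = find_address(hostile)           # treated as a literal name
elwood = find_address("Elwood")
conn.close()
```

With formatting, the hostile input rewrites the where clause and every row comes back; with substitution it is just a name that matches nothing.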
Exercise 4.5
Third Party Modules
• Python has a large library of built-in modules ("batteries included")
• There are even more third party modules
• Python Package Index (PyPI)
http://pypi.python.org/
• Google (fill in with whatever you're looking for)
Installing Modules
• Installation of a module is likely to take one of three forms (depends on the module)
• Platform-native installer
• Manual Installation
• Python Eggs
Platform Native Install
• You downloaded a third party module as an .exe (PC) or .dmg (Mac) file
• Just run this file and follow installation instructions like you do for installing other software
• You are only likely to see these installers for the more major third-party extensions
Manual Installation
• You downloaded a Python module or package using a standard file format such as a .gz, .tar.gz, .tgz, .bz2, or .zip file
• Unpack the file and look for a setup.py file in the resulting folder
• Run Python on that setup.py file
% python setup.py install
Installation messages
...
%
Python Eggs
• An emerging packaging format for Python modules based on setuptools
• Currently requires a third-party install
http://pypi.python.org/pypi/setuptools
• Adds a command-line tool easy_install
% easy_install packagename
(or)
% easy_install packagename-1.2.3-py2.5.egg
• In theory, this is just "automatic"
Commentary
• Installing third party modules is always a delicate matter
• More advanced modules may involve C/C++ code which has to be compiled to native code on your platform.
• May have dependencies on other modules
• In theory, Eggs are supposed to solve these problems.... in theory.
Summary
• Have looked at module/package mechanism
• Some of the very basic built-in modules
• How to install third party modules
• We will focus on more of the built-in modules later in the course
Classes and Objects
Section 5
Overview
• How to define new objects
• How to customize objects (inheritance)
• How to combine objects (composition)
• Python special methods and customization features
OO in a Nutshell
• A programming technique where code is organized as a collection of "objects"
• An "object" consists of
• Data (attributes)
• Methods (functions applied to object)
• Example: A "Circle" object
• Data: radius
• Methods: area(), perimeter()
The class statement
• A definition for a new object
class Circle(object):
    def __init__(self,radius):
        self.radius = radius
    def area(self):
        return math.pi*(self.radius**2)
    def perimeter(self):
        return 2*math.pi*self.radius
• What is a class?
• It's a collection of functions that perform various operations on "instances"
Instances
• Created by calling the class as a function
>>> c = Circle(4.0)
>>> d = Circle(5.0)
>>>
• Each instance has its own data
>>> c.radius
4.0
>>> d.radius
5.0
>>>
• You invoke methods on instances to do things
>>> c.area()
50.26548245743669
>>> d.perimeter()
31.415926535897931
>>>
__init__ method
• This method initializes a new instance
• Called whenever a new object is created
>>> c = Circle(4.0)
class Circle(object):
    def __init__(self,radius):    # self is the newly created object
        self.radius = radius
• __init__ is example of a "special method"
• Has special meaning to Python interpreter
Instance Data
• Each instance has its own data (attributes)
class Circle(object):
    def __init__(self,radius):
        self.radius = radius
• Inside methods, you refer to this data using self
def area(self):
    return math.pi*(self.radius**2)
• In other code, you just use the variable that you're using to name the instance
>>> c = Circle(4.0)
>>> c.radius
4.0
Methods
• Functions applied to instances of an object
class Circle(object):
    ...
    def area(self):
        return math.pi*self.radius**2
• The object is always passed as first argument
>>> c.area()
def area(self):
    ...
• By convention, the instance is called "self"
The name is unimportant---the object is always passed as the first argument. It is simply Python programming style to call this argument "self." C++ programmers might prefer to call it "this."
Calling Other Methods
• Methods call other methods via self
class Circle(object):
    def area(self):
        return math.pi*self.radius**2
    def print_area(self):
        print self.area()
• A caution : Code like this doesn't work
class Circle(object):
    ...
    def print_area(self):
        print area()    # ! Error
• This merely calls a global function area()
Exercise 5.1
Time : 15 Minutes
Inheritance
• A tool for specializing objects
class Parent(object):
    ...
class Child(Parent):
    ...
• New class called a derived class or subclass
• Parent known as base class or superclass
• Parent is specified in () after class name
Inheritance
• What do you mean by "specialize?"
• Take an existing class and ...
• Add new methods
• Redefine some of the existing methods
• Add new attributes to instances
Inheritance Example
• In bill #246 of the 1897 Indiana General Assembly, there was text that dictated a new method for squaring a circle, which, if adopted, would have equated π to 3.2.
• Fortunately, it was never adopted because an observant mathematician took notice...
• But, let's make a special Indiana Circle anyway...
Inheritance Example
• Specializing a class
class INCircle(Circle):
    def area(self):
        return 3.2*self.radius**2
• Using the specialized version
>>> c = INCircle(4.0)    # Calls Circle.__init__
>>> c.radius
4.0
>>> c.area()             # Calls INCircle.area
51.20
>>> c.perimeter()        # Calls Circle.perimeter
25.132741228718345
>>>
• It's the same as Circle except for area()
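A self-contained sketch of the two classes side by side. The describe() method is a hypothetical addition showing that an inherited method picks up the overridden area() based on the actual type of self.

```python
import math

class Circle(object):
    def __init__(self, radius):
        self.radius = radius
    def area(self):
        return math.pi * self.radius ** 2
    def perimeter(self):
        return 2 * math.pi * self.radius
    def describe(self):
        # Hypothetical method: calls whichever area() belongs to type(self)
        return "%s with area %.2f" % (type(self).__name__, self.area())

class INCircle(Circle):
    def area(self):                       # overrides Circle.area only
        return 3.2 * self.radius ** 2

c = Circle(4.0)
d = INCircle(4.0)                         # Circle.__init__ is inherited
```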
Using Inheritance
• Inheritance often used to organize objects
class Shape(object):
    ...
class Circle(Shape):
    ...
class Rectangle(Shape):
    ...
• Think of a logical hierarchy or taxonomy
object base class
• If a class has no parent, use object as base
class Foo(object):
    ...
• object is the parent of all objects in Python
• Note : Sometimes you will see code where classes are defined without any base class. That is an older style of Python coding that has been deprecated for almost 10 years. When defining a new class, you always inherit from something.
Inheritance and __init__
• With inheritance, you must initialize parents
class Shape(object):
    def __init__(self):
        self.x = 0.0
        self.y = 0.0
    ...
class Circle(Shape):
    def __init__(self,radius):
        Shape.__init__(self)    # init base
        self.radius = radius
17
Copyright (C) 2008, http://www.dabeaz.com 5-
Inheritance and methods
• Calling the same method in a parent
class Foo(object):
    def spam(self):
        ...
    ...
class Bar(Foo):
    def spam(self):
        ...
        r = Foo.spam(self)
        ...
• Useful if subclass only makes slight change to method in base class (or wraps it)
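As a minimal sketch of the wrapping idea, the subclass below calls the parent's method and then adjusts the result. The class names Logger and LoudLogger are made up for illustration and are not part of the course files.

```python
class Logger(object):
    def format(self, msg):
        return "LOG: " + msg

class LoudLogger(Logger):
    def format(self, msg):
        # Call the parent's version first, then make a small change
        r = Logger.format(self, msg)
        return r.upper()

result = LoudLogger().format("disk full")
```

Only the extra step lives in the subclass; the base class keeps ownership of the main logic.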
160
Copyright (C) 2008, http://www.dabeaz.com 5-
Calling Other Methods
• With inheritance, the correct method gets called if overridden (depends on the type of self)
class Circle(object):
    def area(self):
        return math.pi*self.radius**2
    def print_area(self):
        print self.area()    # Calls INCircle.area if self is an instance of INCircle
class INCircle(Circle):
    def area(self):
        return 3.2*self.radius**2
• Example:
>>> c = INCircle(4)
>>> c.print_area()
51.2
>>>
Copyright (C) 2008, http://www.dabeaz.com 5-
Multiple Inheritance
• You can specify multiple base classes
class Foo(object):
    ...
class Bar(object):
    ...
class Spam(Foo,Bar):
    ...
• The new class inherits features from both parents
• But there are some really tricky details (later)
• Rule of thumb : Avoid multiple inheritance
20
161
Copyright (C) 2008, http://www.dabeaz.com 5-
Exercise 5.2
21
Time : 30 Minutes
Copyright (C) 2008, http://www.dabeaz.com 5-
Special Methods
• Classes may define special methods
• Have special meaning to Python interpreter
• Always preceded/followed by __
class Foo(object):
    def __init__(self):
        ...
    def __del__(self):
        ...
• There are several dozen special methods
• Will show a few examples
22
162
Copyright (C) 2008, http://www.dabeaz.com 5-
Methods: String Conv.
• Converting object into string representation
str(x)     __str__(x)
repr(x)    __repr__(x)
class Foo(object):
    def __str__(self):
        s = "some string for Foo"
        return s
    def __repr__(self):
        s = "Foo(args)"
        return s
• __str__ used by print statement
• __repr__ used by repr() and interactive mode
Note: The convention for __repr__() is to return a string that, when fed to eval() , will recreate the underlying object. If this is not possible, some kind of easily readable representation is used instead.
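To make the eval() convention concrete, here is a small sketch. The Point class is a hypothetical example, not something from the exercises.

```python
class Point(object):
    def __init__(self, x, y):
        self.x = x
        self.y = y
    def __str__(self):
        # Friendly form, used by print and str()
        return "(%s, %s)" % (self.x, self.y)
    def __repr__(self):
        # Looks like the code that would create the object
        return "Point(%r, %r)" % (self.x, self.y)

p = Point(2, 3)
text = str(p)
code = repr(p)
q = eval(code)      # Recreates an equivalent Point from the repr string
```

Using %r inside __repr__() means nested values (such as strings) are also shown in their repr form.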
23
Copyright (C) 2008, http://www.dabeaz.com 5-
Methods: Item Access
• Methods used to implement containers
len(x)      __len__(x)
x[a]        __getitem__(x,a)
x[a] = v    __setitem__(x,a,v)
del x[a]    __delitem__(x,a)
• Use in a class
class Foo(object):
    def __len__(self):
        ...
    def __getitem__(self,a):
        ...
    def __setitem__(self,a,v):
        ...
    def __delitem__(self,a):
        ...
24
163
Copyright (C) 2008, http://www.dabeaz.com 5-
Methods: Containment
• Containment operators
a in x        __contains__(x,a)
b not in x    not __contains__(x,b)
• Example:
class Foo(object):
    def __contains__(self,a):
        # if a is in self, return True
        # otherwise, return False
        ...
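The item access and containment methods can be combined into a small container. This is only a sketch: the Portfolio class and its _holdings dictionary are assumptions made for the example.

```python
class Portfolio(object):
    def __init__(self):
        self._holdings = {}              # name -> number of shares
    def __len__(self):
        return len(self._holdings)
    def __getitem__(self, name):
        return self._holdings[name]
    def __setitem__(self, name, shares):
        self._holdings[name] = shares
    def __delitem__(self, name):
        del self._holdings[name]
    def __contains__(self, name):
        return name in self._holdings

p = Portfolio()
p['GOOG'] = 100      # __setitem__
p['AAPL'] = 50
del p['AAPL']        # __delitem__
count = len(p)       # __len__
```

Each method simply delegates to the dictionary, which is a common way to build a custom container.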
25
Copyright (C) 2008, http://www.dabeaz.com 5-
Methods: Mathematics
• Mathematical operators
a + b      __add__(a,b)
a - b      __sub__(a,b)
a * b      __mul__(a,b)
a / b      __div__(a,b)
a // b     __floordiv__(a,b)
a % b      __mod__(a,b)
a << b     __lshift__(a,b)
a >> b     __rshift__(a,b)
a & b      __and__(a,b)
a | b      __or__(a,b)
a ^ b      __xor__(a,b)
a ** b     __pow__(a,b)
-a         __neg__(a)
~a         __invert__(a)
abs(a)     __abs__(a)
• Consult reference for further details
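A short sketch of a few of these operators on a hypothetical two-dimensional Vector class (the class is an assumption for illustration). Note that in Python 3 the / operator maps to __truediv__() rather than __div__(), so the example sticks to operators that are named the same in both versions.

```python
class Vector(object):
    def __init__(self, x, y):
        self.x = x
        self.y = y
    def __add__(self, other):
        return Vector(self.x + other.x, self.y + other.y)
    def __sub__(self, other):
        return Vector(self.x - other.x, self.y - other.y)
    def __mul__(self, k):
        # Scale by a plain number
        return Vector(self.x * k, self.y * k)
    def __neg__(self):
        return Vector(-self.x, -self.y)
    def __abs__(self):
        # Length of the vector
        return (self.x ** 2 + self.y ** 2) ** 0.5

a = Vector(1, 2)
b = Vector(2, 2)
c = a + b            # Vector.__add__(a, b)
d = -(c * 2)         # __mul__, then __neg__
length = abs(c)      # __abs__
```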
26
164
Copyright (C) 2008, http://www.dabeaz.com 5-
Odds and Ends
• Defining new exceptions
• Bound and unbound methods
• Alternative attribute lookup
27
Copyright (C) 2008, http://www.dabeaz.com 5-
Defining Exceptions
• User-defined exceptions are defined by classes
• Exceptions always inherit from Exception
class NetworkError(Exception): pass
• For simple exceptions, you can just make an empty class as shown above
• If you are storing special attributes in the exception, there are additional details (consult a reference)
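One of those additional details is that a custom __init__() should still initialize the base Exception. The sketch below assumes two made-up attributes, host and reason, and a placeholder host name.

```python
class NetworkError(Exception):
    def __init__(self, host, reason):
        # Pass the values up so that e.args and str(e) still work normally
        Exception.__init__(self, host, reason)
        self.host = host
        self.reason = reason

try:
    raise NetworkError("example.invalid", "connection refused")
except NetworkError as e:
    caught = (e.host, e.reason, e.args)
```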
28
165
Copyright (C) 2008, http://www.dabeaz.com 5-
Method Invocation
• Invoking a method is a two-step process
• Lookup: The . operator
• Method call: The () operator
class Foo(object):
    def bar(self,x):
        ...
>>> f = Foo()
>>> b = f.bar      # Lookup
>>> b
<bound method Foo.bar of <__main__.Foo object at 0x590d0>>
>>> b(2)           # Method call
Copyright (C) 2008, http://www.dabeaz.com 5-
Unbound Methods
• Methods can be accessed directly through the class
class Foo(object):
    def bar(self,a):
        ...
>>> Foo.bar
<unbound method Foo.bar>
• To use it, you just have to supply an instance
>>> f = Foo()
>>> Foo.bar(f,2)
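The two-step nature of a method call can be sketched as follows; the doubling body of bar() is an assumption added so there is a result to look at. In Python 3, Foo.bar is a plain function rather than an "unbound method", but calling it with an explicit instance works the same way.

```python
class Foo(object):
    def bar(self, x):
        return x * 2

f = Foo()
b = f.bar             # Lookup only: b remembers the instance f
r1 = b(2)             # The call happens later
r2 = Foo.bar(f, 2)    # Through the class: the instance is supplied by hand
owner = b.__self__    # The instance the bound method is attached to
```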
30
166
Copyright (C) 2008, http://www.dabeaz.com 5-
Attribute Access
• These functions may be used to manipulate attributes given an attribute name string
getattr(obj,"name")          # Same as obj.name
setattr(obj,"name",value)    # Same as obj.name = value
delattr(obj,"name")          # Same as del obj.name
hasattr(obj,"name")          # Tests if attribute exists
• Example: Probing for an optional attribute
if hasattr(obj,"x"):
    x = getattr(obj,"x")
else:
    x = None
• Note: getattr() has a useful default value arg
x = getattr(obj,"x",None)
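A brief sketch of all four functions working on attribute names held in strings. The Config class and the attribute names are invented for the example.

```python
class Config(object):
    def __init__(self):
        self.host = "localhost"

cfg = Config()
host = getattr(cfg, "host", None)         # Attribute exists: its value is returned
port = getattr(cfg, "port", 8080)         # Attribute missing: the default is returned
setattr(cfg, "port", port)                # Same as cfg.port = port
had_timeout = hasattr(cfg, "timeout")     # No such attribute
delattr(cfg, "host")                      # Same as del cfg.host
```

This style is handy when the attribute name comes from data, such as a column header or a configuration key.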
31
Copyright (C) 2008, http://www.dabeaz.com 5-
Summary
• A high-level overview of classes
• How to define classes
• How to define methods
• Creating and using objects
• Special python methods
• Overloading
32
167
Copyright (C) 2008, http://www.dabeaz.com 5-
Exercise 5.3
33
Time : 15 Minutes
168
Inside the Python Object Model
Section 6
Copyright (C) 2008, http://www.dabeaz.com 6-
Overview
• A few more details about how objects work
• How objects are represented
• Details of attribute access
• Data encapsulation
• Memory management
2
169
Copyright (C) 2008, http://www.dabeaz.com 6-
Dictionaries Revisited
• A dictionary is a collection of named values
3
stock = {
'name' : 'GOOG',
'shares' : 100,
'price' : 490.10
}
• Dictionaries are commonly used for simple data structures (shown above)
• However, they are used for critical parts of the interpreter and may be the most important type of data in Python
Copyright (C) 2008, http://www.dabeaz.com 6-
Dicts and Functions
• When a function executes, a dictionary holds all of the local variables
def read_prices(filename):
    prices = { }
    for line in open(filename):
        fields = line.split()
        prices[fields[0]] = float(fields[1])
    return prices
locals()
{
'filename' : 'prices.dat',
'line' : 'GOOG 523.12',
'prices' : { ... },
'fields' : ['GOOG', '523.12']
}
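The local dictionary can be inspected directly with locals(). In this sketch, parse_line() is a hypothetical helper that returns a copy of its own local variables.

```python
def parse_line(line):
    fields = line.split()
    name = fields[0]
    price = float(fields[1])
    # Copy the local variables as they stand at this point
    return dict(locals())

snapshot = parse_line("GOOG 523.12")
names = sorted(snapshot)
```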
170
Copyright (C) 2008, http://www.dabeaz.com 6-
Dicts and Modules
• In a module, a dictionary holds all of the global variables and functions
# foo.py
x = 42
def bar():
    ...
def spam():
    ...
foo.__dict__ or globals()
{
'x' : 42,
'bar' : <function bar>,
'spam' : <function spam>
}
Copyright (C) 2008, http://www.dabeaz.com 6-
Dicts and Objects
• User-defined objects also use dictionaries
• Instance data
• Class members
• In fact, the entire object system is mostly just an extra layer that's put on top of dictionaries
• Let's take a look...
6
171
Copyright (C) 2008, http://www.dabeaz.com 6-
Dicts and Instances
• A dictionary holds instance data (__dict__)
>>> s = Stock('GOOG',100,490.10)
>>> s.__dict__
{'name' : 'GOOG','shares' : 100, 'price': 490.10 }
• You populate this dict when assigning to self
class Stock(object):
    def __init__(self,name,shares,price):
        self.name = name
        self.shares = shares
        self.price = price
self.__dict__ (instance data)
{
'name' : 'GOOG',
'shares' : 100,
'price' : 490.10
}
Copyright (C) 2008, http://www.dabeaz.com 6-
Dicts and Instances
• Critical point : Each instance gets its own private dictionary
s = Stock('GOOG',100,490.10)
t = Stock('AAPL',50,123.45)
8
{
'name' : 'GOOG',
'shares' : 100,
'price' : 490.10
}
{
'name' : 'AAPL',
'shares' : 50,
'price' : 123.45
}
• So, if you created 100 instances of some class, there are 100 dictionaries sitting around holding data
172
Copyright (C) 2008, http://www.dabeaz.com 6-
Dicts and Classes
• A dictionary holds the members of a class
class Stock(object):
    def __init__(self,name,shares,price):
        self.name = name
        self.shares = shares
        self.price = price
    def cost(self):
        return self.shares*self.price
    def sell(self,nshares):
        self.shares -= nshares
Stock.__dict__ (methods)
{
'cost' : <function>,
'sell' : <function>,
'__init__' : <function>,
}
Copyright (C) 2008, http://www.dabeaz.com 6-
Instances and Classes
• Instances and classes are linked together
>>> s = Stock('GOOG',100,490.10)
>>> s.__dict__
{'name':'GOOG','shares':100,'price':490.10 }
>>> s.__class__
<class '__main__.Stock'>
>>>
• __class__ attribute refers back to the class
10
• The instance dictionary holds data unique to each instance whereas the class dictionary holds data collectively shared by all instances
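The split between per-instance data and shared class members can be checked directly. This sketch reuses a cut-down version of the Stock class from the slides.

```python
class Stock(object):
    def __init__(self, name, shares, price):
        self.name = name
        self.shares = shares
        self.price = price
    def cost(self):
        return self.shares * self.price

s = Stock('GOOG', 100, 490.10)
t = Stock('AAPL', 50, 123.45)

instance_keys = sorted(s.__dict__)              # Data only, no methods
cost_in_instance = 'cost' in s.__dict__
cost_in_class = 'cost' in Stock.__dict__
separate_dicts = s.__dict__ is not t.__dict__   # One dictionary per instance
same_class = s.__class__ is t.__class__         # Both link back to Stock
```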
173
Copyright (C) 2008, http://www.dabeaz.com 6-
Instances and Classes
[Figure: three instances, each with its own .__dict__ {attrs} and a .__class__ link pointing to a single class whose .__dict__ holds {methods}]
Copyright (C) 2008, http://www.dabeaz.com 6-
Attribute Access
• When you work with objects, you access data and methods using the (.) operator
x = obj.name          # Getting
obj.name = value      # Setting
del obj.name          # Deleting
• These operations are directly tied to the dictionaries sitting underneath the covers
174
Copyright (C) 2008, http://www.dabeaz.com 6-
Modifying Instances
• Operations that modify an object always update the underlying dictionary
>>> s = Stock('GOOG',100,490.10)
>>> s.__dict__
{'name':'GOOG', 'shares':100, 'price':490.10 }
>>> s.shares = 50
>>> s.date = "6/7/2007"
>>> s.__dict__
{ 'name':'GOOG', 'shares':50, 'price':490.10,
'date':'6/7/2007'}
>>> del s.shares
>>> s.__dict__
{ 'name':'GOOG', 'price':490.10, 'date':'6/7/2007'}
>>>
13
Copyright (C) 2008, http://www.dabeaz.com 6-
A Caution
• Setting an instance attribute will silently override class members with the same name
>>> s = Stock('GOOG',100,490.10)
>>> s.cost()
49010.0
>>> s.cost = "a lot"
>>> s.cost()
Traceback (most recent call last):
File "<stdin>", line 1, in ?
TypeError: 'str' object is not callable
>>> s.cost
'a lot'
>>>
• The value in the instance dictionary is hiding the one in the class dictionary
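Because the class member is only hidden, not replaced, deleting the instance attribute brings the method back. A small sketch, using a cut-down Stock class and a price chosen so the arithmetic is exact:

```python
class Stock(object):
    def __init__(self, name, shares, price):
        self.name = name
        self.shares = shares
        self.price = price
    def cost(self):
        return self.shares * self.price

s = Stock('GOOG', 100, 490.5)
before = s.cost()
s.cost = "a lot"                       # Stored in s.__dict__, hides Stock.cost
shadowed = s.cost
still_in_class = 'cost' in Stock.__dict__
del s.cost                             # Removes only the instance entry
after = s.cost()                       # The class method is visible again
```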
14
175
Copyright (C) 2008, http://www.dabeaz.com 6-
Modifying Instances
• It may be surprising that instances can be extended after creation
• You can freely change attributes at any time
>>> s = Stock('GOOG',100,490.10)
>>> s.blah = "some new attribute"
>>> del s.name
>>>
• Again, you're just manipulating a dictionary
• Very different from C++/Java where the structure of an object is rigidly fixed
15
Copyright (C) 2008, http://www.dabeaz.com 6-
Reading Attributes
• Suppose you read an attribute on an instance
x = obj.name
• Attribute may exist in two places
• Local instance dictionary
• Class dictionary
• So, both dictionaries may be checked
176
Copyright (C) 2008, http://www.dabeaz.com 6-
Reading Attributes
• First check in local __dict__
• If not found, look in __dict__ of class
>>> s = Stock(...)
>>> s.name
'GOOG'
>>> s.cost()
49010.0
>>>
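[Figure: lookup order for s.name and s.cost(): (1) the instance dictionary s.__dict__ {'name': 'GOOG', 'shares': 100}, then (2) via s.__class__, the class dictionary Stock.__dict__ {'cost': <func>, 'sell': <func>, '__init__': ...}]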
• This lookup scheme is how the members of a class get shared by all instances
Copyright (C) 2008, http://www.dabeaz.com 6-
Exercise 6.1
18
Time : 10 Minutes
177
Copyright (C) 2008, http://www.dabeaz.com 6-
How Inheritance Works
• Classes may inherit from other classes
class A(B,C):
    ...
• Bases are stored as a tuple in each class
>>> A.__bases__
(<class '__main__.B'>,<class '__main__.C'>)
>>>
• This provides a link to parent classes
• This link simply extends the search process used to find attributes
Copyright (C) 2008, http://www.dabeaz.com 6-
Reading Attributes
• First check in local __dict__
• If not found, look in __dict__ of class
>>> s = Stock(...)
>>> s.name
'GOOG'
>>> s.cost()
49010.0
>>>
• If not found in class, look in base classes
[Figure: lookup order: (1) the instance dictionary s.__dict__ {'name': 'GOOG', 'shares': 100}, (2) via s.__class__, the class dictionary Stock.__dict__ {'cost': <func>, 'sell': <func>, '__init__': ...}, (3) the base classes listed in .__bases__]
178
Copyright (C) 2008, http://www.dabeaz.com 6-
Single Inheritance
• In inheritance hierarchies, attributes are found by walking up the inheritance tree
class A(object): pass
class B(A): pass
class C(A): pass
class D(B): pass
class E(D): pass
e = E()
e.attr
[Figure: inheritance tree with object at the top, A below it, B and C below A, D below B, and E below D; e is an instance of E]
• With single inheritance, there is a single path to the top
• You stop with the first match
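The search described above can be approximated in a few lines. This is only a sketch for single inheritance: find_attr() is a hypothetical helper, and it returns the raw dictionary entry without the extra steps the real interpreter performs (such as binding methods to the instance).

```python
def find_attr(obj, name):
    # 1. The instance dictionary
    if name in obj.__dict__:
        return obj.__dict__[name]
    # 2. The class, then 3. each parent in turn (single inheritance only)
    cls = obj.__class__
    while cls is not None:
        if name in cls.__dict__:
            return cls.__dict__[name]      # Stop with the first match
        cls = cls.__bases__[0] if cls.__bases__ else None
    raise AttributeError(name)

class A(object):
    attr = 'from A'
    other = 'only in A'
class B(A): pass
class D(B):
    attr = 'from D'
class E(D): pass

e = E()
e.x = 1
found = (find_attr(e, 'x'), find_attr(e, 'attr'), find_attr(e, 'other'))
```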
Copyright (C) 2008, http://www.dabeaz.com 6-
Multiple Inheritance
• Consider this hierarchy
class A(object): pass
class B(object): pass
class C(A,B): pass
class D(B): pass
class E(C,D): pass
[Figure: object at the top; A and B below it; C (inherits from A and B) and D (inherits from B) below them; E (inherits from C and D) at the bottom]
• What happens here?
e = E()
e.attr
• A similar search process is carried out, but there is an added complication in that there may be many possible search paths
179
Copyright (C) 2008, http://www.dabeaz.com 6-
Multiple Inheritance
• For multiple inheritance, Python determines a "method resolution order" that sets the order in which base classes get checked
>>> E.__mro__
(<class '__main__.E'>, <class '__main__.C'>,
<class '__main__.A'>, <class '__main__.D'>,
<class '__main__.B'>, <type 'object'>)
>>>
• The MRO is described in a 20 page math paper (C3 Linearization Algorithm)
• Note: This complexity is one reason why multiple inheritance is often avoided
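The resolution order for the hierarchy above can be inspected and then observed in an attribute lookup. The attr class variables are an addition made for this sketch.

```python
class A(object): pass
class B(object):
    attr = 'B'
class C(A, B): pass
class D(B):
    attr = 'D'
class E(C, D): pass

# Names of the classes in the order they are searched
order = [cls.__name__ for cls in E.__mro__]

# D comes before B in the order, so D's value is found first
value = E().attr
```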
23
Copyright (C) 2008, http://www.dabeaz.com 6-
Classes and Encapsulation
• One of the primary roles of a class is to encapsulate data and internal implementation details of an object
• However, a class also defines a "public" interface that the outside world is supposed to use to manipulate the object
• This distinction between implementation details and the public interface is important
24
180
Copyright (C) 2008, http://www.dabeaz.com 6-
A Problem
• In Python, almost everything about classes and objects is "open"
• You can easily inspect object internals
• You can change things at will
• There's no strong notion of access-control (i.e., private class members)
• If you're trying to cleanly separate the internal "implementation" from the "interface" this becomes an issue
25
Copyright (C) 2008, http://www.dabeaz.com 6-
Python Encapsulation
• Python relies on programming conventions to indicate the intended use of something
• Typically, this is based on naming
• There is a general attitude that it is up to the programmer to observe the rules as opposed to having the language enforce rules
26
181
Copyright (C) 2008, http://www.dabeaz.com 6-
Private Attributes
• Any attribute name with leading __ is "private"
class Foo(object):
    def __init__(self):
        self.__x = 0
• Example
>>> f = Foo()
>>> f.__x
AttributeError: 'Foo' object has no attribute '__x'
>>>
• This is actually just a name mangling trick
>>> f = Foo()
>>> f._Foo__x
0
>>>
27
Copyright (C) 2008, http://www.dabeaz.com 6-
Private Methods
• Private naming also applies to methods
class Foo(object):
    def __spam(self):
        print "Foo.__spam"
    def callspam(self):
        self.__spam()      # Uses Foo.__spam
• Example:
>>> f = Foo()
>>> f.callspam()
Foo.__spam
>>> f.__spam()
AttributeError: 'Foo' object has no attribute '__spam'
>>> f._Foo__spam()
Foo.__spam
>>>
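One practical effect of name mangling is that a subclass can use the same private name without clobbering the parent's attribute. A sketch with two hypothetical classes:

```python
class Foo(object):
    def __init__(self):
        self.__x = 0            # Stored as _Foo__x
    def getx(self):
        return self.__x         # Always refers to _Foo__x

class Bar(Foo):
    def __init__(self):
        Foo.__init__(self)
        self.__x = 42           # Stored as _Bar__x, a different attribute

b = Bar()
names = sorted(b.__dict__)
parent_value = b.getx()
plain_name_exists = hasattr(b, '__x')
```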
182
Copyright (C) 2008, http://www.dabeaz.com 6-
Performance Commentary
• Python name mangling in classes occurs at the time a class is defined, not at run time
• Thus, there is no performance overhead for using this feature
• It's actually a somewhat ingenious approach since it avoids all runtime checks for "public" vs. "private" (so the interpreter runs faster overall)
29
Copyright (C) 2008, http://www.dabeaz.com 6-
Accessor Methods
• Consider controlling access to internals using method calls.
class Foo(object):
    def __init__(self,name):
        self.__name = name
    def getName(self):
        return self.__name
    def setName(self,name):
        if not isinstance(name,str):
            raise TypeError("Expected a string")
        self.__name = name
• Methods give you more flexibility and may become useful as your program grows
• May make it easier to integrate the object into a larger framework of objects
183
Copyright (C) 2008, http://www.dabeaz.com 6-
Properties
• Accessor methods can optionally be turned into "property" attributes:
class Foo(object):
    def __init__(self,name):
        self.__name = name
    def getName(self):
        return self.__name
    def setName(self,name):
        if not isinstance(name,str):
            raise TypeError("Expected a string")
        self.__name = name
    name = property(getName,setName)
• Properties look like normal attributes, but implicitly call the accessor methods.
Copyright (C) 2008, http://www.dabeaz.com 6-
Properties
• Example use:
>>> f = Foo("Elwood")
>>> f.name # Calls f.getName()
'Elwood'
>>> f.name = 'Jake' # Calls f.setName('Jake')
>>> f.name = 45 # Calls f.setName(45)
TypeError: Expected a string
>>>
32
• Comment : With properties, you can hide extra processing behind access to data attributes--something that is useful in certain settings
• Example : Type checking
184
Copyright (C) 2008, http://www.dabeaz.com 6-
Properties
• Properties are also useful if you are creating objects where you want to have a very consistent user interface
• Example : Computed data attributes
import math
class Circle(object):
    def __init__(self,radius):
        self.radius = radius
    def area(self):
        return math.pi*self.radius**2
    area = property(area)
    def perimeter(self):
        return 2*math.pi*self.radius
    perimeter = property(perimeter)
Copyright (C) 2008, http://www.dabeaz.com 6-
Properties
• Example use:
>>> c = Circle(4)
>>> c.radius               # Instance variable
4
>>> c.area                 # Computed property
50.26548245743669
>>> c.perimeter            # Computed property
25.132741228718345
• Commentary : Notice how there is no obvious difference between the attributes as seen by the user of the object
185
Copyright (C) 2008, http://www.dabeaz.com 6-
Uniform Access
• The last example shows how to put a more uniform interface on an object. If you don't do this, an object might be confusing to use:
35
>>> c = Circle(4.0)
>>> a = c.area() # Method
>>> r = c.radius # Data attribute
>>>
• Why is the () required for the area, but not for the radius?
Copyright (C) 2008, http://www.dabeaz.com 6-
Properties
• It is very common for a property to replace a method of the same name
class Circle(object):
    ...
    def area(self):
        return math.pi*self.radius**2
    area = property(area)              # Notice the identical names
    def perimeter(self):
        return 2*math.pi*self.radius
    perimeter = property(perimeter)    # Notice the identical names
• When you define a property, you usually don't want the user to explicitly call the method
• So, this trick hides the method and only exposes the property attribute
186
Copyright (C) 2008, http://www.dabeaz.com 6-
Decorators
• There is an alternative way to make properties
class Circle(object):
    ...
    def area(self):
        return math.pi*self.radius**2
    area = property(area)
    ...
• Use a "decorator"
class Circle(object):
    ...
    @property
    def area(self):
        return math.pi*self.radius**2
    ...
• An advanced topic, but a "decorator" is basically a modifier applied to functions
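The decorator form can also cover the setter. The sketch below rewrites the earlier Foo example using @property together with the @name.setter form, which is available in Python 2.6 and later; this goes slightly beyond what the slides show.

```python
class Foo(object):
    def __init__(self, name):
        self.name = name              # Goes through the setter below

    @property
    def name(self):
        return self.__name

    @name.setter
    def name(self, value):
        if not isinstance(value, str):
            raise TypeError("Expected a string")
        self.__name = value

f = Foo("Elwood")
f.name = "Jake"                       # Calls the setter
try:
    f.name = 45                       # Rejected by the type check
    rejected = False
except TypeError:
    rejected = True
```

Both functions share the name of the property, so no separate getName()/setName() methods are left exposed.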
Copyright (C) 2008, http://www.dabeaz.com 6-
__slots__ Attribute
• You can restrict the set of attribute names
class Foo(object):
    __slots__ = ['x','y']
    ...
• Produces errors for other attributes
>>> f = Foo()
>>> f.x = 3
>>> f.y = 20
>>> f.z = 1
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
AttributeError: 'Foo' object has no attribute 'z'
• Prevents errors, restricts usage of objects
38
187
Copyright (C) 2008, http://www.dabeaz.com 6-
Exercise 6.2
39
Time : 10 Minutes
Copyright (C) 2008, http://www.dabeaz.com 6-
Memory Management
• Python has automatic memory management and garbage collection
• As a general rule, you do not have to worry about cleaning up when you define your own classes
• However, there are a couple of additional details to be aware of
40
188
Copyright (C) 2008, http://www.dabeaz.com 6-
Instance Creation
• Instances are actually created in two steps
>>> a = Stock('GOOG',100,490.10)
• Performs these steps
a = Stock.__new__(Stock,'GOOG',100,490.10)
a.__init__('GOOG',100,490.10)
• __new__() constructs the raw instance
• __init__() initializes it
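The two steps can be carried out by hand to see the raw instance before initialization. This sketch uses a cut-down Stock class and calls __new__() with only the class argument, which is enough when __new__() itself has not been redefined.

```python
class Stock(object):
    def __init__(self, name, shares, price):
        self.name = name
        self.shares = shares
        self.price = price

a = Stock.__new__(Stock)            # Step 1: raw instance, __init__ has not run
empty = dict(a.__dict__)            # No attributes yet
a.__init__('GOOG', 100, 490.10)     # Step 2: initialize it
filled = sorted(a.__dict__)
```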
41
Copyright (C) 2008, http://www.dabeaz.com 6-
__new__() method
• Creates a new (empty) object
• If you see this defined in a class, it almost always indicates the presence of heavy wizardry
• Inheritance from an immutable type
• Definition of a metaclass
• Both topics are beyond the scope of this class
• Bottom line : You don't see this in "normal" code
42
189
Copyright (C) 2008, http://www.dabeaz.com 6-
Instance Deletion
• All objects are reference counted
• Destroyed when refcount=0
• When an instance is destroyed, the reference count of all instance data is also decreased
• Normally, all of this just "works" and you don't worry about it
43
Copyright (C) 2008, http://www.dabeaz.com 6-
Object Deletion: Cycles
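• There are some issues with cycles
class A(object): pass
class B(object): pass
a = A()
b = B()
a.b = b
b.a = a
[Figure: a and b each have ref=2: one reference from the variable name and one from the other object's attribute (.b and .a)]
• Deletion
del a
del b
[Figure: after deletion each object still has ref=1, held by the other object's attribute]
• Objects live, but no way to access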
44
190
Copyright (C) 2008, http://www.dabeaz.com 6-
Garbage Collection
• An extra garbage collector looks for cycles
• It runs automatically, but you can control it
import gc
gc.collect() # Run full garbage collection now
gc.disable() # Disable garbage collection
gc.enable() # Enable garbage collection
• gc also has some diagnostic functions
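The following sketch shows a cycle surviving deletion of its names and then being reclaimed by the collector. It assumes the standard CPython interpreter (other implementations manage memory differently), and the Node class and the weak reference used as a probe are additions for the example.

```python
import gc
import weakref

class Node(object):
    pass

gc.disable()                    # Keep the cycle collector from running on its own
a = Node()
b = Node()
a.b = b
b.a = a
probe = weakref.ref(a)          # A weak reference does not keep the object alive

del a
del b
alive_after_del = probe() is not None       # The cycle keeps both objects alive

gc.collect()                                # The cycle detector reclaims them
alive_after_collect = probe() is not None
gc.enable()
```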
45
Copyright (C) 2008, http://www.dabeaz.com 6-
__del__ method
• Classes may define a destructor method
class Stock(object):
    ...
    def __del__(self):
        # Cleanup
        ...
• Called when the reference count reaches 0
• A common confusion : __del__() is not necessarily triggered by the del operator.
s = Stock('GOOG',100,490.10)
t = s
del s       # Does not call s.__del__()
t = 42      # Calls s.__del__() (refcount=0)
191
Copyright (C) 2008, http://www.dabeaz.com 6-
__del__ method
• Don't define __del__ without a good reason
• Typical uses:
• Proper shutdown of system resources (e.g., network connections)
• Releasing locks (e.g., threading)
• Avoid defining it for any other purpose
47
Copyright (C) 2008, http://www.dabeaz.com 6-
Exercise 6.3
48
Optional
192
Documentation, Testing, and Debugging
Section 7
Copyright (C) 2008, http://www.dabeaz.com 7-
Overview
• Documenting programs
• Testing (doctest, unittest)
• Error handling/diagnostics
• Debugging
• Profiling
2
193
Copyright (C) 2008, http://www.dabeaz.com 7-
Documentation
• Python has both comments and docstrings
# Compute the greatest common divisor. Uses clever
# tuple hack to avoid an extra temporary variable.
def gcd(x,y):
    """Compute the greatest common divisor of x and y.
    For example:
    >>> gcd(40,16)
    8
    >>>
    """
    while x > 0: x,y = y%x,x
    return y
3
Copyright (C) 2008, http://www.dabeaz.com 7-
Documentation
• Comments should be used to provide notes to developers reading the source code
• Doc strings should be used to provide documentation to end-users.
• Emphasize: Documentation strings are considered good Python style.
4
194
Copyright (C) 2008, http://www.dabeaz.com 7-
Uses of Docstrings
• Most Python IDEs use doc strings to provide user information and help
5
Copyright (C) 2008, http://www.dabeaz.com 7-
Uses of Docstrings
• Online help
6
195
Copyright (C) 2008, http://www.dabeaz.com 7-
Testing: doctest module
• A module that runs tests from docstrings
• Look at docstrings for interactive sessions
# Compute the greatest common divisor. Uses clever
# tuple hack to avoid an extra temporary variable.
def gcd(x,y):
    """Compute the greatest common divisor of x and y.
    For example:
    >>> gcd(40,16)
    8
    >>>
    """
    while x > 0: x,y = y%x,x
    return y
7
Copyright (C) 2008, http://www.dabeaz.com 7-
Using doctest
• Create a separate file that loads the module
# testgcd.py
import gcd
import doctest
• Add the following code to test a module
doctest.testmod(gcd)
• To run the tests, do this
% python testgcd.py
%
• If successful, will get no output
8
196
Copyright (C) 2008, http://www.dabeaz.com 7-
Using doctest
• Test failures produce a report
% python testgcd.py
********************************************************
File "/Users/beazley/pyfiles/gcd.py", line 7, in gcd.gcd
Failed example:
gcd(40,16)
Expected:
8
Got:
7
********************************************************
1 items had failures:
1 of 1 in gcd.gcd
***Test Failed*** 1 failures.
%
9
Copyright (C) 2008, http://www.dabeaz.com 7-
Self-testing
• Modules may be set up to test themselves
# gcd.py
def gcd(x,y):
    """ ...
    """
if __name__ == '__main__':
    import doctest
    doctest.testmod()
• Will run tests if executed as main program
% python gcd.py
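To see what doctest does with a docstring, the examples of a single function can be extracted and run by hand. This is a sketch rather than the usual workflow (doctest.testmod() is normally all you need); it uses the module's DocTestParser and DocTestRunner classes and inlines a copy of gcd() so that it is self-contained.

```python
import doctest

def gcd(x, y):
    """Compute the greatest common divisor of x and y.

    >>> gcd(40, 16)
    8
    >>> gcd(7, 3)
    1
    """
    while x > 0:
        x, y = y % x, x
    return y

# Pull the interactive examples out of the docstring
parser = doctest.DocTestParser()
test = parser.get_doctest(gcd.__doc__, {'gcd': gcd}, 'gcd', None, 0)

# Run each example and compare the actual output with the expected output
runner = doctest.DocTestRunner(verbose=False)
results = runner.run(test)
failed, attempted = results.failed, results.attempted
```

As with testmod(), nothing is printed when every example passes.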
10
197
Copyright (C) 2008, http://www.dabeaz.com 7-
Exercise 7.1
11
Time : 10 Minutes
Copyright (C) 2008, http://www.dabeaz.com 7-
Testing: unittest
• unittest module
• A more formal testing framework
• Class-based
• Better suited for exhaustive testing
• Can deal with more complex test cases
12
198
Copyright (C) 2008, http://www.dabeaz.com 7-
Using unittest
• First, you create a separate file
# testgcd.py
import gcd
import unittest
• Then you define a testing class
class TestGCDFunction(unittest.TestCase):
    ...
• Must inherit from unittest.TestCase
13
Copyright (C) 2008, http://www.dabeaz.com 7-
Using unittest
• Define testing methods
class TestGCDFunction(unittest.TestCase):
    def testsimple(self):
        # Test with simple integer arguments
        g = gcd.gcd(40,16)
        self.assertEqual(g,8)
    def testfloat(self):
        # Test with floats. Should be error
        self.assertRaises(TypeError,gcd.gcd,3.5,4.2)
    def testlong(self):
        # Test with long integers
        g = gcd.gcd(23748928388L, 6723884L)
        self.assertEqual(g,4L)
        self.assert_(type(g) is long)
• Each method must start with "test..."
14
199
Copyright (C) 2008, http://www.dabeaz.com 7-
Using unittest
• Each test works through assertions
# Assert that expr is True
self.assert_(expr)
# Assert that x == y
self.assertEqual(x,y)
# Assert that x != y
self.assertNotEqual(x,y)
# Assert that callable(*args,**kwargs) raises a given
# exception
self.assertRaises(exc,callable,*args,**kwargs)
• There are several other tests, but these are the main ones
15
Copyright (C) 2008, http://www.dabeaz.com 7-
Running unittests
• To run tests, you add the following code
# testgcd.py
class TestGCDFunction(unittest.TestCase):
    ...
unittest.main()
• Then run Python on the test file
% python testgcd.py
...
---------------------------------------------------
Ran 3 tests in 0.000s
OK
%
16
200
Copyright (C) 2008, http://www.dabeaz.com 7-
Setup and Teardown
• Two additional methods can be defined
# testgcd.py
class TestGCDFunction(unittest.TestCase):
    def setUp(self):
        # Perform setup before running a test
        ...
    def tearDown(self):
        # Perform cleanup after running a test
        ...
• Can be used to set up environment prior to running a test
• Called before and after each test method
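Here is a compact, self-contained sketch that puts these pieces together. It inlines gcd() instead of importing it, the table of test values is invented for the example, and it drives the tests with a loader and runner instead of unittest.main() so that the interpreter does not exit afterwards.

```python
import unittest

def gcd(x, y):
    while x > 0:
        x, y = y % x, x
    return y

class TestGCDFunction(unittest.TestCase):
    def setUp(self):
        # Runs before every test method
        self.pairs = [(40, 16, 8), (7, 3, 1), (12, 18, 6)]

    def tearDown(self):
        # Runs after every test method
        self.pairs = None

    def testsimple(self):
        self.assertEqual(gcd(40, 16), 8)

    def testtable(self):
        for x, y, expected in self.pairs:
            self.assertEqual(gcd(x, y), expected)

suite = unittest.TestLoader().loadTestsFromTestCase(TestGCDFunction)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

The returned result object records how many tests ran and whether any failed.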
17
Copyright (C) 2008, http://www.dabeaz.com 7-
unittest comments
• There is an art to effective unit testing
• Can grow to be quite complicated for large applications
• The unittest module has a huge number of options related to test runners, collection of results, and other aspects of testing (consult documentation for details)
18
201
Copyright (C) 2008, http://www.dabeaz.com 7-
Exercise 7.2
19
Time : 15 Minutes
Copyright (C) 2008, http://www.dabeaz.com 7-
Assertions
• assert statement
assert expr [, "diagnostic message" ]
• If expression is not true, raises AssertionError exception
• Should not be used to check user-input
• Use for internal program checking
20
202
Copyright (C) 2008, http://www.dabeaz.com 7-
Contract Programming
• Consider assertions on all inputs and outputs
def gcd(x,y):
    assert x > 0        # Pre-assertions
    assert y > 0
    while x > 0: x,y = y%x,x
    assert y > 0        # Post-assertion
    return y
• Checking inputs will immediately catch callers who aren't using appropriate arguments
• Checking outputs may catch buggy algorithms
• Both approaches prevent errors from propagating to other parts of the program
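A quick sketch of a pre-assertion catching a bad caller. The diagnostic messages are additions for the example, and the behavior shown assumes Python is running in its normal mode, where assert statements are active.

```python
def gcd(x, y):
    assert x > 0, "x must be positive"     # Pre-assertions
    assert y > 0, "y must be positive"
    while x > 0:
        x, y = y % x, x
    assert y > 0                           # Post-assertion
    return y

good = gcd(42, 14)

try:
    gcd(-5, 10)                            # Violates the contract
    caught = None
except AssertionError as e:
    caught = str(e)
```

The failure is reported at the call that broke the contract instead of surfacing later as a confusing result.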
Copyright (C) 2008, http://www.dabeaz.com 7-
Optimized mode
• Python has an optimized run mode
python -O foo.py
• This strips all assert statements
• Allows debug/release mode development
• Normal mode for full debugging
• Optimized mode for faster production runs
22
203
Copyright (C) 2008, http://www.dabeaz.com 7-
__debug__ variable
• Global variable checked for debugging
if __debug__:
    # Perform some kind of debugging code
    ...
• By default, __debug__ is True
• Set False in optimized mode (python -O)
• The implementation is efficient. The if statement is stripped in both cases and in -O mode, the debugging code is stripped entirely.
23
Copyright (C) 2008, http://www.dabeaz.com 7-
How to Handle Errors
• Uncaught exceptions
% python blah.py
Traceback (most recent call last):
  File "blah.py", line 13, in ?
    foo()
  File "blah.py", line 10, in foo
    bar()
  File "blah.py", line 7, in bar
    spam()
  File "blah.py", line 4, in spam
    x.append(3)
AttributeError: 'int' object has no attribute 'append'
• Program prints traceback, exits
24
204
Copyright (C) 2008, http://www.dabeaz.com 7-
Creating Tracebacks
• How to create a traceback yourself
import traceback
try:
    ...
except:
    traceback.print_exc()
• Sending a traceback to a file
traceback.print_exc(file=f)
• Getting traceback as a string
err = traceback.format_exc()
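A small sketch of capturing a traceback as a string so that the program can log it and keep running. The failing spam() function mirrors the error shown on the earlier slide.

```python
import traceback

def spam():
    x = 3
    x.append(3)          # Fails: integers have no append() method

try:
    spam()
except Exception:
    err = traceback.format_exc()     # Same text that would have been printed

# The final line names the exception type and message
last_line = err.strip().splitlines()[-1]
```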
Copyright (C) 2008, http://www.dabeaz.com 7-
Error Handling
• Keeping Python alive upon termination
% python -i blah.py
Traceback (most recent call last):
File "blah.py", line 13, in ?
foo()
File "blah.py", line 10, in foo
bar()
File "blah.py", line 7, in bar
spam()
File "blah.py", line 4, in spam
x.append(3)
AttributeError: 'int' object has no attribute 'append'
>>>
• Python enters normal interactive mode
• Can use to examine global data, objects, etc.
26
205
Copyright (C) 2008, http://www.dabeaz.com 7-
The Python Debugger
• pdb module
• Entering the debugger after a crash
% python -i blah.py
Traceback (most recent call last):
File "blah.py", line 13, in ?
foo()
File "blah.py", line 10, in foo
bar()
File "blah.py", line 7, in bar
spam()
File "blah.py", line 4, in spam
x.append(3)
AttributeError: 'int' object has no attribute 'append'
>>> import pdb
>>> pdb.pm()
> /Users/beazley/python/blah.py(4)spam()
-> x.append(3)
(Pdb)
27
Copyright (C) 2008, http://www.dabeaz.com 7-
The Python Debugger
• Launching the debugger inside a program
import pdb
def some_function():
    statements
    ...
    pdb.set_trace()      # Enter the debugger
    ...
    statements
• This starts the debugger at the point of the set_trace() call
206
Copyright (C) 2008, http://www.dabeaz.com 7-
Python Debugger
• Common debugger commands
(Pdb) help            # Get help
(Pdb) w(here)         # Print stack trace
(Pdb) d(own)          # Move down one stack level
(Pdb) u(p)            # Move up one stack level
(Pdb) b(reak) loc     # Set a breakpoint
(Pdb) s(tep)          # Execute one line (steps into function calls)
(Pdb) c(ontinue)      # Continue execution
(Pdb) l(ist)          # List source code
(Pdb) a(rgs)          # Print args of current function
(Pdb) !statement      # Execute statement
• For breakpoints, location is one of
(Pdb) b 45            # Line 45 in current file
(Pdb) b file.py:45    # Line 45 in file.py
(Pdb) b foo           # Function foo() in current file
(Pdb) b module.foo    # Function foo() in a module
29
Copyright (C) 2008, http://www.dabeaz.com 7-
Debugging Example
• Obtaining a stack trace
(Pdb) w
/Users/beazley/Teaching/python/blah.py(13)?()
-> foo()
/Users/beazley/Teaching/python/blah.py(10)foo()
-> bar()
/Users/beazley/Teaching/python/blah.py(7)bar()
-> spam()
> /Users/beazley/Teaching/python/blah.py(4)spam()
-> x.append(3)
(Pdb)
• Examining a variable
(Pdb) print x
-1
(Pdb)
30
207
Copyright (C) 2008, http://www.dabeaz.com 7-
Debugging Example
• Getting a source listing
(Pdb) list
1 # Compute the greatest common divisor. Uses clever
2 # tuple hack to avoid an extra temporary variable
3 def gcd(x,y):
4 -> while x > 0: x,y = y%x,x
5 return y
[EOF]
(Pdb)
• Setting a breakpoint
(Pdb) b 5
Breakpoint 1 at /Users/beazley/Teaching/gcd.py:5
(Pdb)
• Running until break or completion
(Pdb) cont
> /Users/beazley/Teaching/NerdRanch/Testing/gcd.py(5)gcd()
-> return y
31
Copyright (C) 2008, http://www.dabeaz.com 7-
Debugging Example
• Debugging a function call
>>> import gcd
>>> import pdb
>>> pdb.runcall(gcd.gcd,42,14)
-> while x > 0: x,y = y%x,x
(Pdb)
• Single stepping
(Pdb) print x,y
42 14
(Pdb) s
> /Users/beazley/Teaching/python/gcd.py(4)gcd()
-> while x > 0: x,y = y%x,x
(Pdb) print x,y
14 42
(Pdb) s
32
208
Copyright (C) 2008, http://www.dabeaz.com 7-
Python Debugger
• Running entire program under debugger
% python -m pdb someprogram.py
• Automatically enters the debugger before the first statement (allowing you to set breakpoints and change the configuration)
Profiling
• profile module
• Collects performance statistics and prints a report
import profile
profile.run("command")
• Uses exec to execute the given command
• A command line alternative
% python -m profile someprogram.py
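A minimal sketch of driving the profiler from code instead of the command line. It assumes Python 3 and uses cProfile (the C-accelerated module with the same interface as profile) together with pstats to sort the report. The functions slow_sum and square are made-up workloads, not part of the course files.

```python
import cProfile
import io
import pstats

def square(x):
    return x * x

def slow_sum(n):
    # Hypothetical workload: sum of squares, one function call per item
    total = 0
    for i in range(n):
        total += square(i)
    return total

profiler = cProfile.Profile()
result = profiler.runcall(slow_sum, 10000)   # Profile a single function call

# Write the statistics to a string, sorted by cumulative time
report = io.StringIO()
stats = pstats.Stats(profiler, stream=report)
stats.sort_stats("cumulative").print_stats(5)
print(report.getvalue())
```

Sorting by cumulative time tends to put the top-level entry points first, which is usually a better starting point than the default "standard name" ordering shown in the sample output.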
Profile Sample Output
% python -m profile cparse.py
447981 function calls (446195 primitive calls) in 5.640 CPU seconds
Ordered by: standard name
ncalls tottime percall cumtime percall filename:lineno(function)
2 0.000 0.000 0.000 0.000 :0(StringIO)
101599 0.470 0.000 0.470 0.000 :0(append)
56 0.000 0.000 0.000 0.000 :0(callable)
4 0.000 0.000 0.000 0.000 :0(close)
1028 0.010 0.000 0.010 0.000 :0(cmp)
4 0.000 0.000 0.000 0.000 :0(compile)
1 0.000 0.000 0.000 0.000 :0(digest)
2 0.000 0.000 0.000 0.000 :0(exc_info)
1 0.000 0.000 5.640 5.640 :0(execfile)
4 0.000 0.000 0.000 0.000 :0(extend)
50 0.000 0.000 0.000 0.000 :0(find)
83102 0.430 0.000 0.430 0.000 :0(get)
...
Summary
• Documentation strings
• Testing with doctest
• Testing with unittest
• Debugging (pdb)
• Profiling
Exercise 7.3
Time : 10 Minutes
Iterators and Generators Section 8
Overview
• Iteration protocol and iterators
• Generator functions
• Generator expressions
Iteration
• A simple definition: Looping over items
a = [2,4,10,37,62]
# Iterate over a
for x in a:
    ...
• A very common pattern
• loops, list comprehensions, etc.
• Most programs do a huge amount of iteration
Iteration: Protocol
• An inside look at the for statement
for x in obj:
    # statements
• Underneath the covers
_iter = obj.__iter__() # Get iterator object
while 1:
    try:
        x = _iter.next() # Get next item
    except StopIteration: # No more items
        break
    # statements
    ...
• Any object that supports __iter__() is said to be "iterable." The iterator it returns must support next().
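The expansion above can be tried by hand. This sketch uses the built-in iter() and next() functions (available from Python 2.6 onward), so it does not depend on whether the method is spelled next() or __next__(). The list contents are arbitrary.

```python
items = [2, 4, 10]

# What "for x in items: collected.append(x)" does behind the scenes
collected = []
it = iter(items)            # Calls items.__iter__()
while True:
    try:
        x = next(it)        # Calls it.next() (Python 2) or it.__next__() (Python 3)
    except StopIteration:   # No more items
        break
    collected.append(x)

print(collected)
```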
Iteration: Protocol
• Manual iteration over a list
>>> x = [1,2,3]
>>> it = x.__iter__()
>>> it
<listiterator object at 0x590b0>
>>> it.next()
1
>>> it.next()
2
>>> it.next()
3
>>> it.next()
Traceback (most recent call last):
File "<stdin>", line 1, in ?
StopIteration
>>>
Iterators Everywhere
• Most built in types support iteration
a = "hello"
for c in a: # Loop over characters in a
    ...
b = { 'name': 'Dave', 'password':'foo'}
for k in b: # Loop over keys in dictionary
    ...
c = [1,2,3,4]
for i in c: # Loop over items in a list/tuple
    ...
f = open("foo.txt")
for x in f: # Loop over lines in a file
    ...
Exercise 8.1
Time : 5 Minutes
Supporting Iteration
• User-defined objects can support iteration
• Example: Counting down...
>>> for x in countdown(10):
...     print x,
...
10 9 8 7 6 5 4 3 2 1
>>>
• To make this work with your own object, you just have to provide a few methods
Supporting Iteration
• Sample implementation
class countdown(object):
    def __init__(self,start):
        self.start = start
    def __iter__(self):
        return countdown_iter(self.start)
class countdown_iter(object):
    def __init__(self,start):
        self.count = start
    def next(self):
        if self.count <= 0:
            raise StopIteration
        r = self.count
        self.count -= 1
        return r
Supporting Iteration
• Object that defines an iterator
Must define __iter__ which creates an iterator object
Must define a class which is the iterator object
Iterator must define next() to return items
Iteration Example
• Example use:
>>> c = countdown(5)
>>> for i in c:
...     print i,
...
5 4 3 2 1
>>> for i in c:
...     for j in c:
...         print "(%d,%d)" % (i,j)
...
(5,5)
(5,4)
(5,3)
...
(1,2)
(1,1)
>>>
Iteration Example
• Why two classes?
• An iterator is an object that operates on another object (usually)
• May be iterating in more than one place (nested loops)
[Diagram: one countdown object (start = 5) hands out two independent countdown_iter objects through __iter__(), one with count = 4 and one with count = 2]
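A sketch of the two-class design that can be run to see why the iterator state lives in a separate object. It follows the slide code, with two additions that are assumptions on my part: the iterator also defines __next__() so the same classes work on Python 3, and it returns itself from __iter__(), which is the usual convention for iterators.

```python
class countdown(object):
    def __init__(self, start):
        self.start = start
    def __iter__(self):
        # Each call hands out a fresh, independent iterator
        return countdown_iter(self.start)

class countdown_iter(object):
    def __init__(self, start):
        self.count = start
    def __iter__(self):
        # Iterators conventionally return themselves
        return self
    def __next__(self):
        if self.count <= 0:
            raise StopIteration
        r = self.count
        self.count -= 1
        return r
    next = __next__          # Python 2 spelling of the same method

c = countdown(3)
pairs = [(i, j) for i in c for j in c]   # Nested loops over the same object
print(pairs)
```

Because every loop gets its own countdown_iter, the inner loop does not disturb the position of the outer one, and c can be looped over again afterwards.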
Exercise 8.2
Time : 15 Minutes
Defining Iterators
• Is there an easier way to do this?
• Iteration is extremely useful
• Annoying to have to define __iter__(), create iterator objects, and manage everything
Generators
• A function that defines an iterator
def countdown(n):
    while n > 0:
        yield n
        n -= 1
>>> for i in countdown(5):
...     print i,
...
5 4 3 2 1
>>>
• Any function that uses yield is a generator
Generator Functions
• Behavior is totally different than a normal function
• Calling a generator function creates a generator object. It does not start running the function.
def countdown(n):
    print "Counting down from", n
    while n > 0:
        yield n
        n -= 1
>>> x = countdown(10) # Notice that no output was produced
>>> x
<generator object at 0x58490>
>>>
Generator Functions
• Function only executes on next()
>>> x = countdown(10)
>>> x
<generator object at 0x58490>
>>> x.next() # Function starts executing here
Counting down from 10
10
>>>
• yield produces a value, but suspends function
• Function resumes on next call to next()
>>> x.next()
9
>>> x.next()
8
>>>
Generator Functions
• When the generator returns, iteration stops
>>> x.next()
1
>>> x.next()
Traceback (most recent call last):
File "<stdin>", line 1, in ?
StopIteration
>>>
Generator Example
• Follow a file, yielding new lines as added
import time
def follow(f):
    f.seek(0,2) # Seek to end of a file
    while True:
        line = f.readline()
        if not line:
            time.sleep(0.1) # Sleep for a bit
            continue # and try again
        yield line
• Observe an active log file
f = open("access-log")
for line in follow(f):
    # Process the line
    print line
Generators vs. Iterators
• A generator function is slightly different than an object that supports iteration
• A generator is a one-time operation. You can iterate over the generated data once, but if you want to do it again, you have to call the generator function again.
• An object that supports iteration can be re-used over and over again (e.g., a list)
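The one-shot behavior is easy to check. This small sketch reuses the countdown generator from the earlier slide and compares it with a plain list; the variable names are only for illustration.

```python
def countdown(n):
    while n > 0:
        yield n
        n -= 1

gen = countdown(3)
first_pass = list(gen)     # Consumes the generator
second_pass = list(gen)    # Nothing left: the generator is exhausted

nums = [3, 2, 1]
list_first = list(nums)    # A list can be iterated again and again
list_second = list(nums)

print(first_pass, second_pass)
print(list_first, list_second)
```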
Exercise 8.3
Time : 30 minutes
Generator Expressions
• A generator version of a list comprehension
>>> a = [1,2,3,4]
>>> b = (2*x for x in a)
>>> b
<generator object at 0x58760>
>>> for i in b: print i,
...
2 4 6 8
>>>
• Important differences
• Does not construct a list.
• Only useful purpose is iteration
• Once consumed, can't be reused
Generator Expressions
• General syntax
(expression for i in s if conditional)
• Can also serve as a function argument
sum(x*x for x in a)
• Can be applied to any iterable
>>> a = [1,2,3,4]
>>> b = (x*x for x in a)
>>> c = (-x for x in b)
>>> for i in c: print i,
...
-1 -4 -9 -16
>>>
Generator Expressions
• Example: Sum a field in a large input file
823.1838823 233.128883 14.2883881 44.1787723
377.1772737 123.177277 143.288388 3884.78772
...
• Solution
f = open("datfile.txt")
# Strip all lines that start with a comment
lines = (line for line in f if not line.startswith('#'))
# Split the lines into fields
fields = (s.split() for s in lines)
# Sum up one of the fields
print sum(float(f[2]) for f in fields)
Generator Expressions
• Each generator expression in the solution only evaluates data as needed (lazy evaluation)
• Example: Running the solution on a 6GB input file only consumes about 60K of RAM
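To experiment with the pipeline without a real data file, an in-memory file object can stand in for open("datfile.txt"). This is a sketch in Python 3 syntax; the two data rows are the ones shown on the slide, and the leading comment line is made up so that the filter has something to skip.

```python
import io

# Stand-in for open("datfile.txt"); the comment line is invented for illustration
data = io.StringIO(
    u"# comment line\n"
    u"823.1838823 233.128883 14.2883881 44.1787723\n"
    u"377.1772737 123.177277 143.288388 3884.78772\n"
)

lines = (line for line in data if not line.startswith('#'))
fields = (s.split() for s in lines)
total = sum(float(f[2]) for f in fields)   # Nothing is read until sum() pulls on the pipeline
print(total)
```

Each stage holds only one line at a time, which is why memory use stays flat no matter how large the input is.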
Exercise 8.4
Time : 15 Minutes
Why Use Generators?
• Many problems are much more clearly expressed in terms of iteration
• Looping over a collection of items and performing some kind of operation (searching, replacing, modifying, etc.)
• Generators allow you to create processing "pipelines"
Why Use Generators?
• Generators encourage code reuse
• Separate the "iteration" from code that uses the iteration.
• Means that various iteration patterns can be defined more generally.
Why Use Generators?
• Better memory efficiency
• "Lazy" evaluation
• Only produce values when needed
• Contrast to constructing a big list of values first
• Can operate on infinite data streams
The itertools Module
• A library module with various functions designed to help with iterators/generators
itertools.chain(s1,s2)
itertools.count(n)
itertools.cycle(s)
itertools.dropwhile(predicate, s)
itertools.groupby(s)
itertools.ifilter(predicate, s)
itertools.imap(function, s1, ... sN)
itertools.repeat(s, n)
itertools.tee(s, ncopies)
itertools.izip(s1, ... , sN)
• All functions process data iteratively.
• Implement various kinds of iteration patterns
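A short tour of a few of these functions. The sketch sticks to names that exist in both Python 2 and Python 3; ifilter, imap and izip are Python 2 only (in Python 3 the built-in filter, map and zip are already lazy). It also uses itertools.islice, which is not in the list above, to cut off an infinite stream.

```python
import itertools

# chain: iterate over several sequences as if they were one
chained = list(itertools.chain([1, 2], [3, 4]))

# count + islice: take the first few items of an infinite stream
first_five = list(itertools.islice(itertools.count(10), 5))

# dropwhile: skip leading items while the predicate is true
tail = list(itertools.dropwhile(lambda x: x < 3, [1, 2, 3, 1, 5]))

# groupby: group consecutive items that share a key
groups = [(key, list(grp)) for key, grp in itertools.groupby("aabbbc")]

# tee: split one iterator into independent copies
a, b = itertools.tee(iter([1, 2, 3]), 2)
pairs = list(zip(a, b))

print(chained, first_five, tail, groups, pairs)
```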
More Information
http://www.dabeaz.com/generators
• "Generator Tricks for Systems Programmers" tutorial from PyCon'08
• More examples and more generator tricks
Working With Text Section 9
Overview
• This section expands upon text processing
• Some common programming idioms
• Simple text parsing
• Regular expression pattern matching
• Text generation
• Text I/O
Text Parsing
• Problem : Converting text to data
• This is an almost universal problem. Programs need to process various sorts of file formats, extract data from files, etc.
• Example : Reading column-oriented fields
• Example : Extracting data from XML/HTML
Text Splitting
• String Splitting
s.split([separator [, maxsplit]])
s.rsplit([separator [, maxsplit]])
separator : Text separating elements
maxsplit : Number of splits to perform
• Examples:
line = "foo/bar/spam"
line.split('/') # ['foo','bar','spam']
line.split('/',1) # ['foo','bar/spam']
line.rsplit('/',1) # ['foo/bar','spam']
Text Stripping
• The following methods strip text from the beginning or end of a string
s.strip([chars]) # Strip begin/end
s.lstrip([chars]) # Strip on left
s.rstrip([chars]) # Strip on right
• Examples:
s = "==Hello=="
s.strip('=') # "Hello"
s.lstrip('=') # "Hello=="
s.rstrip('=') # "==Hello"
Text Searching
• Searching : s.find(text [,start])
s = "<html><head><title>Hello World</title></head>..."
title_start = s.find("<title>")
if title_start >= 0:
    title_end = s.find("</title>",title_start)
• Returns index of starting character or -1 if text wasn't found.
• Use a slice to extract text fragments
title_text = s[title_start+7:title_end]
Text Replacement
• Replacement : s.replace(text, new [, count])
s = "Is Chicago not Chicago?"
>>> s.replace('Chicago','Peoria')
'Is Peoria not Peoria?'
>>> s.replace('Chicago','Peoria',1)
'Is Peoria not Chicago?'
• Reminder: This always returns a new string
Commentary
• Simple text parsing problems can often be solved by just using various combinations of string splitting/stripping/finding operations
• These low level operations are implemented in C in the Python interpreter
• Depending on what is being parsed, this kind of approach may be the fastest implementation
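As an illustration of combining these operations, here is a sketch that parses "name = value" settings with nothing but split() and strip(). The function parse_setting and the sample lines are hypothetical, not part of the course exercises.

```python
def parse_setting(line):
    # Hypothetical helper: turn "name = value  # comment" into (name, value)
    line = line.split('#', 1)[0]          # Drop any trailing comment
    if '=' not in line:
        return None
    name, value = line.split('=', 1)      # Split on the first '=' only
    return name.strip(), value.strip()

print(parse_setting("host = example.com   # primary server"))
print(parse_setting("# just a comment"))
```

Limiting the split with maxsplit=1 keeps any later '=' characters inside the value.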
Exercise 9.1
Time : 15 Minutes
re Module
• Regular expression pattern matching
• Searching for text patterns
• Extracting text from a document
• Replacing text
• Example: Extracting URLs from text
Go to http://www.python.org for more information on Python
re Module
• A quick review of regular expressions
"foo" # Matches the text "foo"
"(foo|bar)" # Matches the text "foo" or "bar"
"(foo)*" # Match 0 or more repetitions of foo
"(foo)+" # Match 1 or more repetitions of foo
"(foo)?" # Match 0 or 1 repetitions of foo
"[abcde]" # Match one of the letters a,b,c,d,e
"[a-z]" # Match one letter from a,b,...,z
"[^a-z]" # Match any character except a,b,...z
"." # Match any character except newline
"\*" # Match the * character
"\+" # Match the + character
"\d" # Match a digit
"\s" # Match whitespace
"\w" # Match alphanumeric character
re Module
• Patterns supplied as strings
• Usually specified using raw strings
• Example
pat = r'(http://[\w-]+(?:\.[\w-]+)*(?:/[\w?#=&.,%_-]*)*)'
• Raw strings don't interpret escapes (\)
re Usage
• You start with some kind of pattern string
pat = r'<title>(.*?)</title>'
• You then compile the pattern string
patc = re.compile(pat,re.IGNORECASE)
• You then use the compiled pattern to perform various matching/searching operations
re: Matching
• How to match a string against a pattern
m = patc.match(somestring)
if m:
    # Matched
else:
    # No match
• Example
>>> m = patc.match("<title>Python Introduction</title>")
>>> m
<_sre.SRE_Match object at 0x5da68>
>>> m = patc.match("<h1>Python Introduction</h1>")
>>> m
>>>
re: Searching
• How to search for a pattern in a string
m = patc.search(somestring)
if m:
    # Found
else:
    # Not found
• Example
>>> m = patc.search("<body><title>Python Introduction</title>...")
>>> m
<_sre.SRE_Match object at 0x5da68>
>>>
re: Match Objects
• How to get the text that was matched
m = patc.search(s)
if m:
    text = m.group()
• How to get the location of the text matched
first = m.start() # Index of pattern start
last = m.end() # Index of pattern end
first,last = m.span() # Start,end indices together
• Example
>>> m = patc.search("<title>Python Introduction</title>")
>>> m.group()
'<title>Python Introduction</title>'
>>> m.start()
0
>>> m.end()
34
>>>
re: Groups
• Regular expressions may define groups
pat = r'<title>(.*?)</title>'
pat = r'([\w-]+):(.*)'
• Groups are assigned numbers
pat = r'<title>(.*?)</title>' # Group 1 is (.*?)
pat = r'([\w-]+):(.*)' # Group 1 is ([\w-]+), group 2 is (.*)
• Number determined left-to-right
re: Groups
• When matching, groups can be extracted
>>> m = patc.match("<title>Python Introduction</title>")
>>> m.group()
'<title>Python Introduction</title>'
>>> m.group(1)
'Python Introduction'
>>>
• Used to easily extract text for various parts of a regex pattern
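Putting compile, search, match and groups together in one runnable sketch. The pattern is the title pattern from the slides; the sample HTML string is made up. Note how match() fails here because the title is not at the start of the string, while search() finds it.

```python
import re

patc = re.compile(r'<title>(.*?)</title>', re.IGNORECASE)

text = "<html><head><TITLE>Python Introduction</TITLE></head>"
m = patc.search(text)           # search() scans the whole string
if m:
    whole = m.group()           # Entire match, tags included
    title = m.group(1)          # Text captured by the first group
    start, end = m.span()       # Location of the match within text
    print(whole, title, start, end)

no_match = patc.match(text)     # match() only tries at the start of the string
print(no_match)
```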
re: Search Example
• Find all occurrences of a pattern
matches = patc.finditer(s)
for m in matches:
    print m.group()
• This returns an iterator that produces match objects one at a time
re: Pattern Replacement
• Find all patterns and replace
def make_link(m):
    url = m.group()
    return '<a href="%s">%s</a>' % (url,url)
t = patc.sub(make_link,s)
• Example:
>>> patc.sub(make_link,"Go to http://www.python.org")
'Go to <a href="http://www.python.org">http://www.python.org</a>'
>>>
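A self-contained version of the replacement example, assuming that the compiled pattern is the URL pattern from the earlier "re Module" slide (the name url_pat is mine). It also shows finditer() on the same input; the second URL in the sample string is added for illustration.

```python
import re

# URL pattern from the "re Module" slide
url_pat = re.compile(r'(http://[\w-]+(?:\.[\w-]+)*(?:/[\w?#=&.,%_-]*)*)')

def make_link(m):
    url = m.group()
    return '<a href="%s">%s</a>' % (url, url)

s = "Go to http://www.python.org or http://www.dabeaz.com/generators"

urls = [m.group() for m in url_pat.finditer(s)]   # finditer() yields match objects lazily
linked = url_pat.sub(make_link, s)                # sub() calls make_link for each match
print(urls)
print(linked)
```

Passing a function to sub() lets the replacement text depend on what was matched, which a plain replacement string cannot do as flexibly.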
re: Comments
• re module is very powerful
• I have only covered the essential basics
• Strongly influenced by Perl
• However, regexes are not a built-in operator (you call functions and methods in the re module)
• Reference:
Jeffrey Friedl, "Mastering Regular Expressions", O'Reilly & Associates, 2006.
Exercise 9.2
Time : 15 Minutes
Generating Text
• Programs often need to generate text
• Reports
• HTML pages
• XML
• Endless possibilities
String Concatenation
• Strings can be concatenated using +
s = "Hello"
t = "World"
a = s + t # a = "HelloWorld"
• Although (+) is fine for just a few strings, it has horrible performance if you are concatenating many small chunks together to create a large string, because each + builds a new string and copies both operands
• Should not be used for generating output
String Joining
• The fastest way to join many strings
chunks = ["chunk1","chunk2", ..., "chunkN"]
result = separator.join(chunks)
• Example:
chunks = ["Is","Chicago","Not","Chicago?"]
" ".join(chunks) # "Is Chicago Not Chicago?"
",".join(chunks) # "Is,Chicago,Not,Chicago?"
"".join(chunks) # "IsChicagoNotChicago?"
String Joining Example
• Don't do this:
s = ""
for x in seq:
    ...
    s += "some text being produced"
    ...
• Better:
chunks = []
for x in seq:
    ...
    chunks.append("some text being produced")
    ...
s = "".join(chunks)
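Both approaches produce the same text, which a small runnable sketch can confirm. The word list is made up; for timing comparisons on realistic sizes, the timeit module would be the tool to reach for.

```python
words = ["chunk%d" % i for i in range(5)]

# Repeated += builds a new string on every pass through the loop
s1 = ""
for w in words:
    s1 += w + "\n"

# Collecting the pieces and joining once produces the same text
chunks = []
for w in words:
    chunks.append(w + "\n")
s2 = "".join(chunks)

print(s1 == s2)
```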
Printing to a String
• StringIO module
• Provides a "file-like" object you can print to
import StringIO
out = StringIO.StringIO()
for x in seq:
    ...
    print >>out, "some text being produced"
    ...
s = out.getvalue()
• In certain situations, this object can be used in place of a file
String Interpolation
• In languages like Perl and Ruby, programmers are used to string interpolation features
$name = "Dave";
$age = 39;
print "$name is $age years old\n";
• Python doesn't have a direct equivalent
• However, there are some alternatives
Dictionary Formatting
• String formatting against a dictionary
fields = { 'name' : 'Dave', 'age' : 39}
print "%(name)s is %(age)s years old\n" % fields
• The above requires a dictionary, but you can easily get a dictionary of variables
name = "Dave"
age = 39
print "%(name)s is %(age)s years old\n" % vars()
Template Strings
• A special string that supports $substitutions
import string
s = string.Template("$name is $age years old\n")
fields = { 'name' : 'Dave', 'age' : 39}
print s.substitute(fields)
• An alternative method for missing values
s.safe_substitute(fields)
Advanced Formatting
• Python 2.6/3.0 have new string formatting
print "{0} is {1:d} years old".format("Dave",39)
print "{name} is {age:d} years old".format(name="Dave",age=39)
stock = { 'name' : 'GOOG', 'shares' : 100, 'price' : 490.10 }
print "{s[name]:10s} {s[shares]:10d} {s[price]:10.2f}"\
      .format(s=stock)
• Allows much more flexible substitution and lookup than the % operator
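The three substitution styles from the last few slides can be compared side by side. This sketch builds the strings rather than printing them directly, so it runs unchanged on Python 2.6+ and Python 3; the $city placeholder is made up to show what safe_substitute() does with a missing value.

```python
import string

fields = {'name': 'Dave', 'age': 39}

a = "%(name)s is %(age)s years old" % fields                        # Dictionary formatting
b = string.Template("$name is $age years old").substitute(fields)   # Template string
c = "{name} is {age:d} years old".format(**fields)                  # format() method

# safe_substitute() leaves unknown placeholders in place instead of raising KeyError
d = string.Template("$name lives in $city").safe_substitute(fields)

print(a)
print(b)
print(c)
print(d)
```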
Exercise 9.3
Time : 20 Minutes
Text Input/Output
• You frequently read/write text from files
• Example: Reading line-by-line
f = open("something.txt","r")
for line in f:
    ...
• Example: Writing a line of text
f = open("something.txt","w")
f.write("Hello World\n")
print >>f, "Hello World" # print adds the newline itself
• There are still a few issues to worry about
Line Handling
• Question: What is a text line?
• It's different on different operating systems
some characters .......\n (Unix)
some characters .......\r\n (Windows)
• By default, Python uses the system's native line ending when writing text files
• However, it can get messy when reading text files (especially cross platform)
Line Handling
• Example: Reading a Windows text file on Unix
>>> f = open("test.txt","r")
>>> f.readlines()
['Hello\r\n', 'World\r\n']
>>>
• Notice how the lines include the extra Windows '\r' character
• This is a potential source of problems for programs that only expect '\n' line endings
Universal Newline
• Python has a special "Universal Newline" mode
>>> f = open("test.txt","U")
>>> f.read()
'Hello World\n'
>>>
• Converts all endings to standard '\n' character
• f.newlines records the actual newline character that was used in the file
>>> f.newlines
'\r\n'
>>>
Universal Newline
• Example: Reading a Windows text file on Unix
>>> f = open("test.txt","r")
>>> f.readlines()
['Hello\r\n', 'World\r\n']
>>> f = open("test.txt","U")
>>> f.readlines()
['Hello\n', 'World\n']
>>> f.newlines
'\r\n'
>>>
• Notice how non-native Windows newline '\r\n' is translated to standard '\n'
Text Encoding
• Question : What is a character?
• In Python 2, text consists of 8-bit characters
"Hello World" 48 65 6c 6c 6f 20 57 6f 72 6c 64
• Characters are usually encoded in ASCII
International Characters
• Problem : How to deal with characters from international character sets?
"That's a spicy Jalapeño!"
• Question: What is the character encoding?
• Historically, everyone made a different encoding
ñ = 0x96 (MacRoman)
ñ = 0xf1 (CP1252 - Windows)
ñ = 0xa4 (CP437 - DOS)
• Bloody hell!
Unicode
• For international characters, use Unicode
• In Python, there is a special syntax for literals
t = u"That's a spicy Jalape\u00f1o!" # Unicode string literal
• Unicode strings are just like regular strings except that they hold Unicode characters
• What is a Unicode character?
Unicode Characters
• Unicode defines a standard numerical value for every character used in all languages (except for fictional ones such as Klingon)
• The numeric value is known as "code point"
• There are a lot of code points (>100,000)
ñ = U+00F1
ε = U+03B5
ઇ = U+0A87
㌄ = U+3304
Unicode Charts
• http://www.unicode.org/charts
Using Unicode Charts
t = u"That's a spicy Jalape\u00f1o!"
• \uxxxx - Embeds a Unicode code point in a string
• Code points specified in hex by convention
Using Unicode Charts
t = u"Spicy Jalape\N{LATIN SMALL LETTER N WITH TILDE}o!"
• \N{name} - Embeds a named character
• All code points also have descriptive names
Unicode Representation
• Internally, Unicode characters are 16-bit values (on "narrow" Python 2 builds; "wide" builds use 32 bits)
t = u"Jalape\u00f1o"
004a 0061 006c 0061 0070 0065 00f1 006f
• Normally, you don't worry about this
• Except when you have to perform I/O
• How do characters get encoded in the file?
u'J' --> 00 4a (Big Endian)
u'J' --> 4a 00 (Little Endian)
Unicode I/O
• Unicode does not define a standard file encoding--it only defines character code values
• There are many different file encodings
• Examples: UTF-8, UTF-16, etc.
• Most popular: UTF-8 (ASCII is a subset)
• So, how do you deal with these encodings?
Unicode File I/O
• Unicode I/O handled using codecs module
• codecs.open(filename,mode,encoding)
>>> import codecs
>>> f = codecs.open("data.txt","w","utf-8")
>>> f.write(u"Hello World\n")
>>> f.close()
>>> f = codecs.open("data.txt","w","utf-16")
>>> f.write(data)
>>>
• Several hundred character codecs are provided
• Consult documentation for details
Unicode Encoding
• Explicit encoding via strings
>>> a = u"Jalape\u00f1o"
>>> enc_a = a.encode("utf-8")
>>>
• Explicit decoding from bytes
>>> enc_a = 'Jalape\xc3\xb1o'
>>> a = enc_a.decode("utf-8")
>>> a
u'Jalape\xf1o'
>>>
Encoding Errors
• Encoding/Decoding text is often messy
• May encounter broken/invalid data
• The default behavior is to raise a UnicodeError exception
>>> a = u"Jalape\xf1o"
>>> b = a.encode("ascii")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\xf1' in position 6: ordinal not in range(128)
>>>
Encoding Errors
• Encoding/Decoding can use an alternative error handling policy
s.decode("encoding",errors)
s.encode("encoding",errors)
• Errors is one of
'strict' Raise exception (the default)
'ignore' Ignore errors
'replace' Replace with replacement character
'backslashreplace' Use escape code
'xmlcharrefreplace' Use XML character reference
Encoding Errors
• Example: Ignore bad characters
>>> a = u"Jalape\xf1o"
>>> a.encode("ascii",'ignore')
'Jalapeo'
>>>
• Example: Encode Unicode into ASCII
>>> a = u"Jalape\xf1o"
>>> a.encode("us-ascii","xmlcharrefreplace")
'Jalape&#241;o'
>>>
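The error policies can be compared on the same string. This sketch assumes Python 3 (or 2.6+), where the encoded results are byte strings written with the b prefix; the variable names are only for illustration.

```python
text = u"Jalape\u00f1o"

utf8 = text.encode("utf-8")                          # Two bytes for the n-with-tilde
round_trip = utf8.decode("utf-8")                    # Back to the original text

ignored = text.encode("ascii", "ignore")             # Drops the character
replaced = text.encode("ascii", "replace")           # Substitutes '?'
escaped = text.encode("ascii", "backslashreplace")   # Uses a backslash escape
xml = text.encode("ascii", "xmlcharrefreplace")      # Uses an XML character reference

error = None
try:
    text.encode("ascii")                             # 'strict' is the default policy
except UnicodeEncodeError as e:
    error = e

print(utf8, ignored, replaced, escaped, xml)
```

Which policy is appropriate depends on where the output is going: 'ignore' and 'replace' lose information, while the two escaping policies keep enough to reconstruct the original character.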
Finding the Encoding
• How do you determine the encoding of a file?
• Might be known in advance (in the manual)
• May be indicated in the file itself
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8"/>
• Depends on the data source, application, etc.
Unicode Everywhere
• Unicode is the modern standard for text
• In Python 3, all text is Unicode
• Here are some basic rules to remember:
• All text files are encoded (even ASCII)
• When you read text, you always decode
• When you write text, you always encode
A Caution
• Unicode may sneak in when you don't expect it
• Database integration
• XML Parsing
• Unicode silently propagates through string-ops
s = "Spicy" # Standard 8-bit string
t = u"Jalape\u00f1o" # Unicode string
w = s + t # Unicode : u'SpicyJalape\u00f1o'
• This propagation may break your code if it's not expecting to receive Unicode text
Exercise 9.4
Time : 10 Minutes
Binary Data Handling and File I/O
Section 10
Introduction
• A major use of Python is to interface with foreign systems and software
• Example : Software written in C/C++
• In this section, we look at the problem of data interchange
• Using Python to decode/encode binary data encodings and other related topics
Overview
• Representation of binary data
• Binary file I/O
• Structure packing/unpacking
• Binary structures and ctypes
• Low-level I/O interfaces
Binary Data
• Binary data - low-level machine data
• Examples : 16-bit integers, 32-bit integers, 64-bit double precision floats, packed strings, etc.
• Raw low-level data that you typically encounter with typed programming languages such as C, C++, Java, etc.
Common Scenarios
• You might encounter binary encodings in a number of settings
• Special file formats (images, video, audio, etc.)
• Low-level network protocols
• Control of HW (e.g., over serial ports)
• Reading/writing data meant for use by software in other languages (C, C++, etc.)
Binary Data Representation
• To store binary data, you can use strings
• In Python 2, strings are just byte sequences
bytes = '\x89PNG\r\n\x1a\n\x00\x00\x00\rIHDR\x00\x00'
\xhh - Encodes an arbitrary byte (hh)
• All of the normal string operations work except that you may have to specify a lot of non-text characters using \xhh escape codes
Binary Data Representation
• In Python 2.6 and newer, a special syntax should be used if you are writing byte literal strings in your program
header = b'\x89PNG\r\n' # b is the byte string prefix
• This has been added to disambiguate text (Unicode) strings and raw byte strings
• A caution : This is optional in Python 2, but required in Python 3
byte arrays
• Python 2.6 introduces a new bytearray type
# Initialized from a byte string
b = bytearray(b"Hello World")
# Preinitialized to a specific size
buf = bytearray(1024)
• A bytearray supports almost all of the usual string operations, but it is mutable
• You can make in-place assignments to characters and slices
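A short sketch of the in-place operations. It is written for Python 3, where indexing a bytearray gives an integer, so a single byte is assigned with ord(); the contents are arbitrary.

```python
buf = bytearray(b"Hello World")

buf[0] = ord("J")            # Assign a single byte (as an integer)
buf[6:11] = b"There"         # Replace a slice in place
buf.extend(b"!")             # Grow the array

zeros = bytearray(4)         # Four bytes, all initialized to zero
print(buf, zeros)
```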
Binary File I/O
• To obtain binary data, you'll typically read it from a file, pipe, network socket, etc.
• For files, there are special modes
f = open(filename,"rb") # Read, binary mode
f = open(filename,"wb") # Write, binary mode
f = open(filename,"ab") # Append, binary mode
• Disables all newline translation (reads/writes)
• Required for binary data on Windows
• Optional on Unix (a portability gotcha)
Binary Data Packing
• A common method for dealing with binary data is to pack or unpack values from byte strings
[Diagram: pack converts the Python values 37, 42 into the raw byte string b'%\x00\x00\x00*\x00\x00\x00'; unpack converts the byte string back into 37, 42]
• Packing/unpacking is about type conversion
• Converting low-level data to/from built-in Python types such as ints, floats, strings, etc.
struct module
• Packs/unpacks binary records and structures
• Often used when interfacing Python to foreign systems or when reading non-text files (images, audio, video, etc.)
import struct
# Unpack two raw 32-bit integers from a string
x,y = struct.unpack("ii",s)
# Pack a set of fields
r = struct.pack("8sif", "GOOG",100, 490.10)
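A round trip through pack and unpack, as a sketch. It differs from the slide in two labeled ways: the format strings carry an explicit "<" (little-endian, no padding) so the byte layout is the same on every machine, and the string field is a bytes literal so the code also runs on Python 3.

```python
import struct

# Pack two 32-bit integers in little-endian order, then unpack them again
raw = struct.pack("<ii", 37, 42)
x, y = struct.unpack("<ii", raw)

# A record with an 8-byte string, an int, and a 32-bit float
record = struct.pack("<8sif", b"GOOG", 100, 490.10)
name, shares, price = struct.unpack("<8sif", record)
name = name.rstrip(b"\x00")          # 's' pads short strings with null bytes

print(repr(raw), x, y)
print(name, shares, round(price, 2), struct.calcsize("<8sif"))
```

The price comes back only approximately equal to 490.10 because 'f' stores a 32-bit float; 'd' would keep full double precision at the cost of four more bytes.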
struct module
• Packing/unpacking codes (based on C)
'c' char (1 byte string)
'b' signed char (8-bit integer)
'B' unsigned char (8-bit integer)
'h' short (16-bit integer)
'H' unsigned short (16-bit integer)
'i' int (32-bit integer)
'I' unsigned int (32-bit integer)
'l' long (32 or 64 bit integer)
'L' unsigned long (32 or 64 bit integer)
'q' long long (64 bit integer)
'Q' unsigned long long (64 bit integer)
'f' float (32 bit)
'd' double (64 bit)
's' char[] (String)
'p' char[] (String with 8-bit length)
'P' void * (Pointer)
struct module
• Each code may be preceded by repetition count
'4i' 4 integers
'20s' 20-byte string
• Integer alignment modifiers
'@' Native byte order and alignment
'=' Native byte order, standard alignment
'<' Little-endian, standard alignment
'>' Big-endian, standard alignment
'!' Network (big-endian), standard align
• Native alignment
Data is aligned to start on a multiple of the size. For example, an integer (4 bytes) is aligned to start on a 4-byte boundary. Standard alignment adds no padding bytes at all.
13
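The effect of the modifiers on padding can be checked with struct.calcsize(). A small sketch for Python 3; the native result depends on the platform and compiler, so it is printed without a stated value.

```python
import struct

# Standard-size modes never insert padding: 1 byte (char) + 4 bytes (int)
print(struct.calcsize("=ci"))   # 5
print(struct.calcsize("<ci"))   # 5

# Native mode may pad after the char so the int starts on an aligned boundary.
# The result is platform dependent (commonly 8).
native = struct.calcsize("@ci")
print(native)
```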
struct Example
• An IP packet header has this structure (each row is 32 bits)
ver | hlen | tos | length
ident | fragment
ttl | proto | checksum
sourceaddr
destaddr
• To unpack in Python, might do this:
(vhlen,tos,length,
ident,fragment,
ttl,proto,checksum,
sourceaddr,
destaddr) = struct.unpack("!BBHHHBBHII", pkt)
ver = (vhlen & 0xf0) >> 4
hlen = vhlen & 0x0f
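To try the unpacking code without a real network capture, a header can be packed by hand first. This is a sketch for Python 3; every field value below is made up for illustration.

```python
import struct

# Build a made-up 20-byte header (version 4, header length 5 words)
pkt = struct.pack("!BBHHHBBHII",
                  (4 << 4) | 5,   # version and header length share one byte
                  0,              # tos
                  40,             # total length
                  1234,           # ident
                  0,              # fragment
                  64,             # ttl
                  6,              # proto (6 is TCP)
                  0,              # checksum (left as 0 in this sketch)
                  0x7F000001,     # source address 127.0.0.1
                  0x7F000001)     # destination address 127.0.0.1

fields = struct.unpack("!BBHHHBBHII", pkt)
vhlen = fields[0]
ver = (vhlen & 0xf0) >> 4         # high 4 bits
hlen = vhlen & 0x0f               # low 4 bits
print(len(pkt), ver, hlen, fields[5])   # 20 4 5 64
```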
Exercise 10.1
Time : 15 Minutes
Binary Type Objects
• Instead of unpacking/packing, an alternative approach is to define new kinds of Python objects that internally represent low-level binary data in its native format
• These objects then provide an interface for manipulating the internal data from Python
• Instead of converting data, this is just "wrapping" binary data with a Python layer
ctypes library
• A library that has objects for defining low-level C types, structures and arrays
• Can create objects that look a lot like normal Python objects except that they are represented using a C memory layout
• A companion/alternative to struct
ctypes types
• There is a predefined set of type objects
ctypes type C Datatype
------------------ ---------------------------
c_byte signed char
c_char char
c_char_p char *
c_double double
c_float float
c_int int
c_long long
c_longlong long long
c_short short
c_uint unsigned int
c_ulong unsigned long
c_ushort unsigned short
c_void_p void *
ctypes types
• Types are objects that can be instantiated
>>> from ctypes import *
>>> x = c_long(42)
>>> x
c_long(42)
>>> print x.value
42
>>> x.value = 23
>>> x
c_long(23)
>>>
• Unlike many Python types, the values are mutable (access/modify through .value)
• Accessing the value is touching raw memory
ctypes arrays
• Defining a C array type
>>> long4 = c_long * 4
>>> long4
<class '__main__.c_long_Array_4'>
>>>
• Creating and using an instance of this type
>>> a = long4(1,2,3,4)
>>> len(a)
4
>>> a[0]
1
>>> a[1]
2
>>> for x in a: print x,
1 2 3 4
>>>
ctypes structures
• Defining a C structure
>>> class Point(Structure):
...     _fields_ = [ ('x', c_double),
...                  ('y', c_double) ]
...
>>> p = Point(2,3)
>>> p.x
2.0
>>> p.y
3.0
>>>
• Looks kind of like a normal Python object, but is really just a layer over a C struct
Direct I/O
• Few Python programmers know this, but ctypes objects support direct I/O
• Can directly write ctypes objects onto files
p = Point(2,3)
f = open("somefile","wb")
f.write(p)
...
• Can directly read into existing ctypes objects
f = open("somefile","rb")
p = Point() # Create an empty point
f.readinto(p) # Read into it
...
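The write and readinto() steps can be combined into one round trip. A sketch for Python 3; a temporary file stands in for "somefile", which is an assumption made so the example cleans up after itself.

```python
import os
import tempfile
from ctypes import Structure, c_double, sizeof

class Point(Structure):
    _fields_ = [('x', c_double), ('y', c_double)]

# Temporary file used in place of "somefile" (assumption for this sketch)
fd, path = tempfile.mkstemp()
os.close(fd)

with open(path, "wb") as f:
    f.write(Point(2, 3))          # writes the raw bytes of the structure

q = Point()                       # empty point: x = y = 0.0
with open(path, "rb") as f:
    nbytes = f.readinto(q)        # fills q's memory directly from the file

os.remove(path)
print(nbytes == sizeof(Point), q.x, q.y)   # True 2.0 3.0
```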
Exercise 10.2
Time : 10 Minutes
Binary Arrays
• array module used to define binary arrays
>>> from array import array
>>> a = array('i',[1,2,3,4,5]) # Integer array
>>> a.append(6)
>>> a
array('i',[1,2,3,4,5,6])
>>> a.append(37.5)
TypeError: an integer is required
>>>
• An array is like a list except that all elements are constrained to a single type
a.typecode # Array type code
a.itemsize # Item size in bytes
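The single-type constraint can be seen directly. A short sketch for Python 3 (the exact TypeError message differs from the Python 2 message shown above).

```python
from array import array

a = array('i', [1, 2, 3, 4, 5])
a.append(6)
print(a.typecode, len(a))   # i 6
# a.itemsize is the size of a C int on this platform (commonly 4)

try:
    a.append(37.5)          # a float does not fit in an integer array
except TypeError as e:
    rejected = True
    print("TypeError:", e)
else:
    rejected = False
```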
Array Initialization
• array(typecode [, initializer])
Typecode Meaning
-------- -----------------
'c' 8-bit character
'b' 8-bit integer (signed)
'B' 8-bit integer (unsigned)
'h' 16-bit integer (signed)
'H' 16-bit integer (unsigned)
'i' Integer (int)
'I' Unsigned integer (unsigned int)
'l' Long integer (long)
'L' Unsigned long integer (unsigned long)
'f' 32-bit float (single precision)
'd' 64-bit float (double precision)
• Initializer is just a sequence of input values
Array Operations
• Arrays work a lot like lists
a.append(item) # Append an item
a.index(item) # Index of item
a.remove(item) # Remove item
a.count(item) # Count occurrences
a.insert(index,item) # Insert item
• Key difference is the uniform data type
• Underneath the covers, data is stored in a contiguous memory region--just like an array in C/C++
Array Operations
• Arrays also have I/O functions
a.fromfile(f,n) # Read and append n items
# from file f
a.tofile(f) # Write all items to f
• Arrays have string packing functions
a.fromstring(s) # Appends items from packed
# binary string s
a.tostring() # Convert to a packed string
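In Python 3 the packing methods are named tobytes() and frombytes(); the tostring()/fromstring() names shown above were removed in Python 3.9. A sketch of the same round trip under that assumption:

```python
from array import array

a = array('i', [1, 2, 3])
s = a.tobytes()               # packed binary data, len(a) * a.itemsize bytes

b = array('i')
b.frombytes(s)                # appends the items decoded from s
print(b.tolist())             # [1, 2, 3]
```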
Some Facts about Arrays
• Arrays are more space efficient
1000000 integers in a list : 16 Mbytes
1000000 integers in an array : 4 Mbytes
• List item access is about 70% faster than array item access. Arrays store values as native C datatypes, so every access involves creating a Python object (int, float) to view the value
• Arrays are not mathematical vectors or matrices. The (+) operator concatenates.
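A quick illustration of the last point, written for Python 3: (+) joins two arrays end to end, so an element-wise sum has to be written out by hand.

```python
from array import array

a = array('i', [1, 2, 3, 4])
b = array('i', [5, 6, 7, 8])

c = a + b                                      # concatenation, as with lists
print(c.tolist())                              # [1, 2, 3, 4, 5, 6, 7, 8]

# An element-wise sum must be spelled out explicitly
d = array('i', [x + y for x, y in zip(a, b)])
print(d.tolist())                              # [6, 8, 10, 12]
```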
numpy arrays
• For mathematics, consider the numpy library
• It provides arrays that behave like vectors and matrices (element-wise operations)
>>> from numpy import array
>>> a = array([1,2,3,4])
>>> b = array([5,6,7,8])
>>> a + b
array([6, 8, 10, 12])
>>>
• Many scientific packages build upon this
Working with Processes
Section 11
Overview
• Using Python to launch other processes, control their environment, and collect their output
• Detailed coverage of subprocess module
Python Interpreter
• Python comes from the Unix/C tradition
• Interpreter is command-line/console based
% python foo.py
• Input/output is from a terminal (tty)
• Programs may optionally use command line arguments and environment variables
• Python can control other programs that use the same Unix conventions
Subprocesses
• A program can create a new process
• This is called a "subprocess"
• The subprocess often runs under the control of the original process (which is known as the "parent" process)
• Parent often wants to collect output or the status result of the subprocess
Simple Subprocesses
• There are a number of functions for simple subprocess control
• Executing a system shell command
os.system("rm -rf /tmpdata")
• Capturing the output of a command (Unix)
import commands
s = commands.getoutput("ls -l")
• This would be the closest Python equivalent to backticks (`ls -l`) in the shell, Perl, etc.
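The commands module exists only in Python 2; it was removed in Python 3. One possible substitute is subprocess.check_output(). In this sketch the running interpreter (sys.executable) stands in for a command such as "ls -l" so that it behaves the same on any platform.

```python
import subprocess
import sys

# sys.executable is used as a portable stand-in for a real command
out = subprocess.check_output([sys.executable, "-c", "print('hello')"])
text = out.decode().strip()     # output arrives as bytes in Python 3
print(text)                     # hello
```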
Advanced Subprocesses
• Python has several modules for subprocesses
• os, popen2, subprocess
• Historically, this has been a bit of a moving target. In older Python code, you will probably see extensive use of the popen2 and os modules
• Currently, the official "party line" is to use the subprocess module
subprocess Module
• A high-level module for launching subprocesses
• Cross-platform (Unix/Windows)
• Tries to consolidate the functionality of a wide assortment of low-level system calls (system(), popen(), exec(), spawn(), etc.)
• Will illustrate with some common use cases
Executing Commands
Problem: You want to execute a simple command or run a separate program. You don't care about capturing its output.
import subprocess
p = subprocess.Popen(['mkdir','temp'])
q = subprocess.Popen(['rm','-f','tempdata'])
• Executes a command (given as a list of arguments)
• Returns a Popen object (more in a minute)
Specifying the Command
subprocess.Popen(['rm','-f','tempdata'])
• Popen() accepts a list of command args
• These are the same as the args in the shell
shell % rm -f tempdata
• Note: Each "argument" is a separate item
subprocess.Popen(['rm','-f','tempdata']) # Good
subprocess.Popen(['rm','-f tempdata']) # Bad
Don't merge multiple arguments into a single string like this.
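If the command starts out as a single shell-style string, the standard shlex module can split it into the list form that Popen() expects. A small sketch for Python 3:

```python
import shlex

args = shlex.split("rm -f tempdata")
print(args)                                # ['rm', '-f', 'tempdata']

# Quoting is honored, so a filename containing a space stays one argument
quoted = shlex.split("rm -f 'temp data'")
print(quoted)                              # ['rm', '-f', 'temp data']
```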
Environment Vars
env_vars = { 'NAME1' : 'VALUE1',
'NAME2' : 'VALUE2',
...
}
p = subprocess.Popen(['cmd','arg1',...,'argn'],
env=env_vars)
• How to set up environment variables in child
• Note : If this is supplied and there is a PATH environment variable, it will be used to search for the command (Unix)
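The env argument replaces the child's whole environment, so a common approach is to copy os.environ and add to it. A sketch for Python 3; the child here is the Python interpreter itself, used only so the example runs anywhere.

```python
import os
import subprocess
import sys

# Copy the parent's environment and add one variable
# (NAME1/VALUE1 are the placeholder names from the slide)
env_vars = dict(os.environ)
env_vars['NAME1'] = 'VALUE1'

child_code = "import os; print(os.environ['NAME1'])"
p = subprocess.Popen([sys.executable, "-c", child_code],
                     env=env_vars, stdout=subprocess.PIPE)
out, _ = p.communicate()
value = out.decode().strip()
print(value)    # VALUE1
```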
Current Directory
p = subprocess.Popen(['cmd','arg1',...,'argn'],
cwd='/some/directory')
• If you need to change the working directory
• Note: This changes the working directory for the subprocess, but does not affect how Popen() searches for the command
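A sketch of cwd in action, for Python 3. The working directory is a freshly created temporary directory (an assumption for the example), and the child simply reports where it is running.

```python
import os
import subprocess
import sys
import tempfile

workdir = tempfile.mkdtemp()     # hypothetical working directory for the child
p = subprocess.Popen([sys.executable, "-c", "import os; print(os.getcwd())"],
                     cwd=workdir, stdout=subprocess.PIPE)
out, _ = p.communicate()
child_cwd = out.decode().strip()

# realpath() smooths over symlinked temporary directories
same = os.path.realpath(child_cwd) == os.path.realpath(workdir)
os.rmdir(workdir)
print(same)     # True
```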
Collecting Status Codes
• When subprocess terminates, it returns a status
• An integer code of some kind
C:
exit(status);
Java:
System.exit(status);
Python:
raise SystemExit(status)
• Convention is for 0 to indicate "success." Anything else is an error.
Collecting Status Codes
p = subprocess.Popen(['cmd','arg1',...,'argn'])
...
status = p.wait()
• When you launch a subprocess, it runs independently from the parent
• To wait and collect status, use wait()
• Status will be the integer return code (which is also stored)
p.returncode # Exit status of subprocess
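A short sketch for Python 3 that ties the two status slides together: the child exits with raise SystemExit(3) and the parent collects that value with wait().

```python
import subprocess
import sys

# The child is a tiny Python program that exits with status 3
p = subprocess.Popen([sys.executable, "-c", "raise SystemExit(3)"])
status = p.wait()                 # blocks until the child terminates
print(status, p.returncode)       # 3 3
```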
Polling a Subprocess
p = subprocess.Popen(['cmd','arg1',...,'argn'])
...
if p.poll() is None:
    pass # Process is still running
else:
    status = p.returncode # Get the return code
• poll() - Checks status of subprocess
• Returns None if the process is still running, otherwise the returncode is returned
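Polling usually happens in a loop while the parent does other work. A sketch for Python 3; the child just sleeps briefly, and the sleep intervals are arbitrary values chosen for the example.

```python
import subprocess
import sys
import time

p = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(0.3)"])

polls = 0
while p.poll() is None:      # None means the child is still running
    polls += 1
    time.sleep(0.05)         # the parent could do useful work here

status = p.returncode
print(status)                # 0
```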
Killing a Subprocess
p = subprocess.Popen(['cmd','arg1',...,'argn'])
...
p.terminate()
• In Python 2.6 or newer, use terminate()
• In older versions, you have to hack it yourself
# Unix
import os, signal
os.kill(p.pid, signal.SIGTERM)
# Windows
import win32api
win32api.TerminateProcess(int(p._handle),-1)
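A sketch of terminate() for Python 2.6 or newer, written in Python 3 syntax. The child sleeps for a long time so that it is certain to still be running when it is killed.

```python
import subprocess
import sys

p = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
p.terminate()           # SIGTERM on Unix, TerminateProcess() on Windows
status = p.wait()       # collect the status so no zombie is left behind
print(status)           # nonzero; on Unix, the negative signal number (-15)
```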
Exercise 11.1
Time : 20 Minutes
Capturing Output
Problem: You want to execute another program and capture its output
• Use additional options to Popen()
import subprocess
p = subprocess.Popen(['cmd'],
stdout=subprocess.PIPE)
data = p.stdout.read()
• This works with both Unix and Windows
• Captures any output printed to stdout
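A runnable variant for Python 3, where the captured data is a bytes object. The child command is the Python interpreter itself, which is an assumption made only to keep the sketch portable.

```python
import subprocess
import sys

child_code = "print('line 1'); print('line 2')"
p = subprocess.Popen([sys.executable, "-c", child_code],
                     stdout=subprocess.PIPE)
data = p.stdout.read()      # read until the child closes its stdout
p.stdout.close()
p.wait()

lines = data.decode().splitlines()
print(lines)                # ['line 1', 'line 2']
```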
Sending/Receiving Data
Problem: You want to execute a program, send it some input data, and capture its output
• Set up pipes using Popen()
p = subprocess.Popen(['cmd'],
                     stdin = subprocess.PIPE,
                     stdout = subprocess.PIPE)
p.stdin.write(data) # Send data
p.stdin.close() # No more input
result = p.stdout.read() # Read output
• p.stdin and p.stdout are a pair of files that are hooked up to the subprocess
python: p.stdin -> cmd: stdin
python: p.stdout <- cmd: stdout
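Writing all the input before reading any output can stall if the pipe buffers fill up with large data. Popen.communicate() is one way to avoid that, since it feeds stdin and drains stdout together. A sketch for Python 3; the child is a made-up one-liner that upper-cases its input.

```python
import subprocess
import sys

# Hypothetical child program: echoes stdin back in upper case
child_code = "import sys; sys.stdout.write(sys.stdin.read().upper())"
p = subprocess.Popen([sys.executable, "-c", child_code],
                     stdin=subprocess.PIPE,
                     stdout=subprocess.PIPE)

# communicate() sends the data, closes stdin, reads stdout, and waits
result, _ = p.communicate(b"hello world\n")
print(result.strip())       # b'HELLO WORLD'
```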
Sending/Receiving Data
• How to capture stderr
p = subprocess.Popen(['cmd'],
                     stdin=subprocess.PIPE,
                     stdout=subprocess.PIPE,
                     stderr=subprocess.PIPE)
python: p.stdin -> cmd: stdin
python: p.stdout <- cmd: stdout
python: p.stderr <- cmd: stderr
• Note: stdout/stderr can also be merged
p = subprocess.Popen(['cmd'],
                     stdin=subprocess.PIPE,
                     stdout=subprocess.PIPE,
                     stderr=subprocess.STDOUT)
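Both variants can be compared side by side. A sketch for Python 3; the child is a made-up one-liner that writes one line to each stream.

```python
import subprocess
import sys

child_code = ("import sys; sys.stdout.write('out\\n'); sys.stdout.flush(); "
              "sys.stderr.write('err\\n')")
cmd = [sys.executable, "-c", child_code]

# Separate pipes: stdout and stderr arrive independently
p = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
out, err = p.communicate()

# Merged: stderr is redirected into the stdout pipe
p2 = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
merged, nothing = p2.communicate()

print(out.strip(), err.strip())     # b'out' b'err'
print(nothing)                      # None (no separate stderr pipe)
```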
I/O Redirection
• Connecting input to a file
f_in = open("somefile","r")
p = subprocess.Popen(['cmd'], stdin=f_in)
• Connecting the output to a file
f_out = open("somefile","w")
p = subprocess.Popen(['cmd'], stdout=f_out)
• Basically, stdin and stdout can be connected to any open file object
• Note : Must be a real file in the OS
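A sketch of output redirection for Python 3. "Real file" here means an object backed by an OS file descriptor, so an in-memory object such as io.StringIO would not work. The temporary file name is an assumption for the example.

```python
import os
import subprocess
import sys
import tempfile

fd, path = tempfile.mkstemp()       # temporary stand-in for "somefile"
os.close(fd)

with open(path, "w") as f_out:
    p = subprocess.Popen([sys.executable, "-c", "print('saved')"],
                         stdout=f_out)
    p.wait()                        # child writes straight into the file

with open(path) as f:
    contents = f.read()
os.remove(path)
print(contents.strip())             # saved
```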
Subprocess I/O
• Subprocess module can be used to set up fairly complex I/O patterns
import subprocess
p1 = subprocess.Popen("ls -l", shell=True,
stdout=subprocess.PIPE)
p2 = subprocess.Popen("wc",shell=True,
stdin=p1.stdout,
stdout=subprocess.PIPE)
out = p2.stdout.read()
• Note: this is the same as this in the shell
shell % ls -l | wc
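The same two-stage pipeline can be built without shell=True. In this Python 3 sketch, two small Python one-liners stand in for ls and wc (an assumption so that it runs on any platform).

```python
import subprocess
import sys

producer = "print('a'); print('b'); print('c')"           # plays the role of ls
counter = "import sys; print(len(sys.stdin.readlines()))"  # plays the role of wc -l

p1 = subprocess.Popen([sys.executable, "-c", producer],
                      stdout=subprocess.PIPE)
p2 = subprocess.Popen([sys.executable, "-c", counter],
                      stdin=p1.stdout, stdout=subprocess.PIPE)
p1.stdout.close()       # parent drops its copy so p1 sees a closed pipe if p2 exits
out, _ = p2.communicate()
p1.wait()

count = int(out.decode())
print(count)            # 3
```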
I/O Issues
• subprocess module does not work well for controlling interactive processes
• Buffering behavior is often wrong (may hang)
• Pipes don't properly emulate terminals
• Subprocess may not operate correctly
• Does not work for sending keyboard input typed into any kind of GUI.
Interactive Subprocesses
Problem: You want to launch a subprocess, but it involves interactive console-based user input
• Must launch subprocess and make its input emulate an interactive terminal (TTY)
• Best bet: Use a third-party "Expect" library (e.g., pexpect)
• Modeled after Unix "expect" tool (Don Libes)
pexpect Example
• http://pexpect.sourceforge.net
• Sample of controlling an interactive session
import pexpect
child = pexpect.spawn("ftp ftp.gnu.org")
child.expect('Name .*:')        # Expected output
child.sendline('anonymous')     # Send responses
child.expect('ftp> ')
child.sendline('cd gnu/emacs')
child.expect('ftp> ')
child.sendline('ls')
child.expect('ftp> ')
print child.before
child.sendline('quit')
Odds and Ends
• Python has a large number of other modules and functions related to process management
• os module (fork, wait, exec, etc.)
• signal module (signal handling)
• time module (system time functions)
• resource (system resource limits)
• locale (internationalization)
• _winreg (Windows Registry)
Exercise 11.2
Time : 10 Minutes
Python Integration Primer
Section 12
Python Integration
• People don't use Python in isolation
• They use it to interact with other software
• Software not necessarily written in Python
• This is one of Python's greatest strengths!
Overview
• A brief tour of how Python integrates with the outside world
• Support for common data formats
• Network programming
• Accessing C libraries
• COM Extensions
• Jython (Java) and IronPython (.NET)
Data Interchange
• Python is adept at exchanging data with other programs using standard data formats
• Example : Processing XML
XML Overview
• XML documents use structured markup
<contact>
  <name>Elwood Blues</name>
  <address>1060 W Addison</address>
  <city>Chicago</city>
  <zip>60616</zip>
</contact>
• Documents made up of elements
<name>Elwood Blues</name>
• Elements have starting/ending tags
• May contain text and other elements
XML Example
<?xml version="1.0" encoding="iso-8859-1"?>
<recipe>
<title>Famous Guacamole</title>
<description>
A southwest favorite!
</description>
<ingredients>
<item num="2">Large avocados, chopped</item>
<item num="1">Tomato, chopped</item>
<item num="1/2" units="C">White onion, chopped</item>
<item num="1" units="tbl">Fresh squeezed lemon juice</item>
<item num="1">Jalapeno pepper, diced</item>
<item num="1" units="tbl">Fresh cilantro, minced</item>
<item num="3" units="tsp">Sea Salt</item>
<item num="6" units="bottles">Ice-cold beer</item>
</ingredients>
<directions>
Combine all ingredients and hand whisk to desired consistency.
Serve and enjoy with ice-cold beers.
</directions>
</recipe>
XML Parsing
• Parsing XML documents is easy
• Use the xml.etree.ElementTree module
etree Parsing
• How to parse an XML document
>>> import xml.etree.ElementTree
>>> doc = xml.etree.ElementTree.parse("recipe.xml")
• This creates an ElementTree object
>>> doc
<xml.etree.ElementTree.ElementTree instance at 0x6e1e8>
• Supports many high-level operations
etree Parsing
• Obtaining selected elements
>>> title = doc.find("title")
>>> firstitem = doc.find("ingredients/item")
>>> firstitem2 = doc.find("*/item")
>>> anyitem = doc.find(".//item")
• find() always returns the first element found
• Note: Element selection syntax above
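The selection syntax can be tried without a file by parsing a string. A sketch for Python 3; the document below is a shortened copy of the recipe, inlined as an assumption so the example is self-contained.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the recipe, parsed from a string instead of "recipe.xml"
doc = ET.ElementTree(ET.fromstring("""
<recipe>
  <title>Famous Guacamole</title>
  <ingredients>
    <item num="2">Large avocados, chopped</item>
    <item num="1">Tomato, chopped</item>
  </ingredients>
</recipe>"""))

print(doc.find("title").text)              # Famous Guacamole
print(doc.find("ingredients/item").text)   # Large avocados, chopped
print(doc.find("*/item").text)             # Large avocados, chopped
print(doc.find(".//item").text)            # Large avocados, chopped
print(doc.find("nosuchtag"))               # None (no match)
```

"name" selects a direct child, "a/b" follows a path, "*" matches any tag at one level, and ".//" searches all descendants.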
etree Parsing
• To loop over all matching elements
>>> for i in doc.findall("ingredients/item"):
... print i
...
<Element item at 7a6c10>
<Element item at 7a6be8>
<Element item at 7a6d50>
<Element item at 7a6d28>
<Element item at 7a6b20>
<Element item at 7a6d00>
<Element item at 7a6af8>
<Element item at 7a6d78>
etree Parsing
• Obtaining the text from an element
>>> doc.findtext("title")
'Famous Guacamole'
>>> doc.findtext("directions")
'\n Combine all ingredients and hand whisk to
desired consistency.\n Serve and enjoy with ice-cold
beers.\n '
>>> doc.findtext(".//item")
'Large avocados, chopped'
>>>
• Searching and text extraction combined
>>> e = doc.find("title")
>>> e.text
'Famous Guacamole'
>>>
etree Parsing
• To obtain the element tag
>>> elem.tag
'item'
>>>
• To obtain element attributes
>>> item.get('num')
'6'
>>> item.get('units')
'bottles'
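A self-contained sketch of tag and attribute access for Python 3, using a single element from the recipe. The 'brand' attribute is hypothetical and is only there to show what happens when an attribute is missing.

```python
import xml.etree.ElementTree as ET

item = ET.fromstring('<item num="6" units="bottles">Ice-cold beer</item>')

print(item.tag)                    # item
print(item.get('num'))             # 6  (attribute values are always strings)
print(item.get('units'))           # bottles
print(item.get('brand'))           # None (attribute not present)
print(item.get('brand', 'any'))    # any  (explicit default)
```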
etree Example
• Print out recipe ingredients
for i in doc.findall("ingredients/item"):
    unitstr = "%s %s" % (i.get("num"),i.get("units",""))
    print "%-10s %s" % (unitstr,i.text)
• Output
2          Large avocados, chopped
1          Tomato, chopped
1/2 C      White onion, chopped
1 tbl      Fresh squeezed lemon juice
1          Jalapeno pepper, diced
1 tbl      Fresh cilantro, minced
3 tsp      Sea Salt
6 bottles  Ice-cold beer
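The same loop in Python 3 form, where print is a function. Only three of the ingredients are inlined here, as an assumption to keep the sketch short.

```python
import xml.etree.ElementTree as ET

# Three of the recipe's ingredients, inlined so the sketch is self-contained
root = ET.fromstring("""
<recipe>
  <ingredients>
    <item num="2">Large avocados, chopped</item>
    <item num="1/2" units="C">White onion, chopped</item>
    <item num="6" units="bottles">Ice-cold beer</item>
  </ingredients>
</recipe>""")

lines = []
for i in root.findall("ingredients/item"):
    # A missing "units" attribute falls back to an empty string
    unitstr = "%s %s" % (i.get("num"), i.get("units", ""))
    lines.append("%-10s %s" % (unitstr, i.text))
print("\n".join(lines))
```

The "%-10s" code left-justifies the quantity in a 10-character column, which is what lines up the ingredient names.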
Exercise 12.1
Time : 10 Minutes
Network Programming
• Python has very strong support for network programming applications
• Low-level socket programming
• Server side/client side modules
• High-level application protocols
• Will look at just a few simple examples
Low-level Sockets
• A simple TCP server (Hello World)
from socket import *
s = socket(AF_INET,SOCK_STREAM)
s.bind(("",9000))
s.listen(5)
while True:
    c,a = s.accept()
    print "Received connection from", a
    c.send("Hello %s\n" % a[0])
    c.close()
• socket module provides low-level networking
• Programming API is almost identical to socket programming in C, Java, etc.
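To see the server respond, a client is needed as well. This Python 3 sketch runs a one-connection version of the server in a background thread and connects to it from the same script. Binding to 127.0.0.1 with port 0 (let the OS choose a free port) is an assumption made to avoid clashing with anything already using port 9000.

```python
import socket
import threading

server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))       # port 0: the OS picks a free port
server.listen(5)
port = server.getsockname()[1]

def serve_one():
    # Handle exactly one connection, then return
    c, a = server.accept()
    c.sendall(("Hello %s\n" % a[0]).encode("ascii"))   # sockets carry bytes
    c.close()

t = threading.Thread(target=serve_one)
t.start()

client = socket.create_connection(("127.0.0.1", port))
reply = client.recv(1024)
client.close()
t.join()
server.close()
print(reply)    # b'Hello 127.0.0.1\n'
```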
High-Level Sockets
• A simple TCP server (Hello World)
import SocketServer
class HelloHandler(SocketServer.BaseRequestHandler):
    def handle(self):
        print "Connection from", self.client_address
        self.request.sendall("Hello World\n")
serv = SocketServer.TCPServer(("",8000),HelloHandler)
serv.serve_forever()
• SocketServer module hides details
• Makes it easy to implement network servers
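In Python 3 the module is named socketserver (lowercase). The sketch below runs the same handler in a background thread, connects once, and then shuts the server down. The loopback address and OS-chosen port are assumptions made for the example.

```python
import socket
import socketserver
import threading

class HelloHandler(socketserver.BaseRequestHandler):
    def handle(self):
        # self.request is the connected client socket
        self.request.sendall(b"Hello World\n")

serv = socketserver.TCPServer(("127.0.0.1", 0), HelloHandler)
port = serv.server_address[1]
t = threading.Thread(target=serv.serve_forever)
t.start()

client = socket.create_connection(("127.0.0.1", port))
reply = client.recv(1024)
client.close()

serv.shutdown()         # stops serve_forever()
t.join()
serv.server_close()
print(reply)            # b'Hello World\n'
```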
A Simple Web Server
• Serve files from a directory
from BaseHTTPServer import HTTPServer
from SimpleHTTPServer import SimpleHTTPRequestHandler
import os
os.chdir("/home/docs/html")
serv = HTTPServer(("",8080),SimpleHTTPRequestHandler)
serv.serve_forever()
• This creates a minimal web server
• Connect with a browser and try it out
XML-RPC
• Remote Procedure Call
• Uses HTTP as a transport protocol
• Parameters/Results encoded in XML
• An RPC standard that is supported across a variety of different programming languages (Java, Javascript, C++, etc.)
Simple XML-RPC
• How to create a stand-alone server
from SimpleXMLRPCServer import SimpleXMLRPCServer
def add(x,y):
    return x+y
s = SimpleXMLRPCServer(("",8080))
s.register_function(add)
s.serve_forever()
• How to test it (xmlrpclib)
>>> import xmlrpclib
>>> s = xmlrpclib.ServerProxy("http://localhost:8080")
>>> s.add(3,5)
8
>>> s.add("Hello","World")
"HelloWorld"
>>>
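In Python 3 the two modules are xmlrpc.server and xmlrpc.client. This sketch puts the server and the test client in one script by running the server in a background thread. The loopback address, the OS-chosen port, and logRequests=False are choices made for the example.

```python
import threading
from xmlrpc.server import SimpleXMLRPCServer
from xmlrpc.client import ServerProxy

def add(x, y):
    return x + y

server = SimpleXMLRPCServer(("127.0.0.1", 0), logRequests=False)
server.register_function(add)
port = server.server_address[1]
t = threading.Thread(target=server.serve_forever)
t.start()

proxy = ServerProxy("http://127.0.0.1:%d" % port)
total = proxy.add(3, 5)                 # arguments travel as XML over HTTP
joined = proxy.add("Hello", "World")

server.shutdown()
t.join()
server.server_close()
print(total, joined)                    # 8 HelloWorld
```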
Simple XML-RPC
• Adding multiple functions
from SimpleXMLRPCServer import SimpleXMLRPCServer
s = SimpleXMLRPCServer(("",8080))
s.register_function(add)
s.register_function(foo)
s.register_function(bar)
s.serve_forever()
• Registering an instance (exposes all methods)
from SimpleXMLRPCServer import SimpleXMLRPCServer
s = SimpleXMLRPCServer(("",8080))
obj = SomeObject()
s.register_instance(obj)
s.serve_forever()
Exercise 12.2
Time : 15 Minutes
ctypes Module
• A library module that allows C functions in arbitrary shared libraries/DLLs to be called from Python
• One of several approaches that can be used to access foreign C functions
ctypes Example
• Consider this C code:
#include <string.h>
int fact(int n) {
    if (n <= 0) return 1;
    return n*fact(n-1);
}
int cmp(char *s, char *t) {
    return strcmp(s,t);
}
double half(double x) {
    return 0.5*x;
}
• Suppose it was compiled into a shared lib
% cc -shared example.c -o libexample.so
ctypes Example
• Using C types
>>> import ctypes
>>> ex = ctypes.cdll.LoadLibrary("./libexample.so")
>>> ex.fact(4)
24
>>> ex.cmp("Hello","World")
-1
>>> ex.cmp("Foo","Foo")
0
>>>
• It just works (heavy wizardry)
• However, there is a catch...
ctypes Caution
• C libraries don't contain type information
• So, ctypes has to guess...
>>> import ctypes
>>> ex = ctypes.cdll.LoadLibrary("./libexample.so")
>>> ex.fact("Howdy")
1
>>> ex.cmp(4,5)
Segmentation Fault
• And unfortunately, it usually gets it wrong
• However, you can help it out.
ctypes Types
• You just have to provide type signatures
>>> ex.half.argtypes = (ctypes.c_double,)
>>> ex.half.restype = ctypes.c_double
>>> ex.half(5.0)
2.5
>>>
• Creates a minimal prototype
.argtypes # Tuple of argument types
.restype # Return type of a function
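The libexample.so from the slides is not available here, so this sketch applies the same idea to two functions from the standard C library, atof() and strlen(). It is written for Python 3 and assumes a POSIX system, where ctypes.CDLL(None) gives access to the symbols already loaded into the process.

```python
import ctypes

# Assumption: POSIX system; CDLL(None) exposes the C library's symbols
libc = ctypes.CDLL(None)

# double atof(const char *)
libc.atof.argtypes = (ctypes.c_char_p,)
libc.atof.restype = ctypes.c_double      # without this, ctypes assumes int
value = libc.atof(b"2.5")
print(value)    # 2.5

# size_t strlen(const char *)
libc.strlen.argtypes = (ctypes.c_char_p,)
libc.strlen.restype = ctypes.c_size_t
n = libc.strlen(b"Hello")
print(n)        # 5
```

Once argtypes is set, ctypes checks the arguments on each call and raises an exception for an incompatible value instead of passing it through to C.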
ctypes Types
• Sampling of datatypes available
ctypes type C Datatype
------------------ ---------------------------
c_byte signed char
c_char char
c_char_p char *
c_double double
c_float float
c_int int
c_long long
c_longlong long long
c_short short
c_uint unsigned int
c_ulong unsigned long
c_ushort unsigned short
c_void_p void *
ctypes Cautions
• Requires detailed knowledge of underlying C library and how it operates
• Function names
• Argument types and return types
• Data structures
• Side effects/Semantics
• Memory management
ctypes and C++
• Not really supported
• This is more the fault of C++
• C++ creates libraries that aren't easy to work with (non-portable name mangling, vtables, etc.)
• C++ programs may use features not easily mapped to ctypes (e.g., templates, operator overloading, smart pointers, RTTI, etc.)
Extension Commentary
• There are more advanced ways of extending Python with C and C++ code
• Low-level extension API
• Code generator tools (e.g., Swig, Boost, etc.)
• More details in an advanced course
Exercise 12.3
Time : 20 minutes
COM Extensions
• On Windows, Python can interact with COM
• Allows Python to script applications on Windows (e.g., Microsoft Office)
Pythonwin and COM
• To work with COM, use Pythonwin extensions
• An extension to Python that includes additional modules for Windows
• A separate download
http://sourceforge.net/projects/pywin32
• The book: "Python Programming on Win32", by Mark Hammond and Andy Robinson (O'Reilly)
Disclaimer
• Covering all of COM is impossible here
• There are a lot of tricky details
• Will provide a general idea of how it works in Python
• Consult a reference for gory details
Python COM Client
• Example : Control Microsoft Word
• First step: Get a COM object for Word
>>> import win32com.client
>>> w = win32com.client.Dispatch("Word.Application")
>>> w
<COMObject Word.Application>
>>>
• Note: "Word.Application" refers to an object in the registry
Using a COM object
• Once you have a COM object, you access its attributes to do things
• Example:
>>> doc = w.Documents.Add()
>>> doc.Range(0,0).Select()
>>> sel = w.Selection
>>> sel.InsertAfter("Hello World\n")
>>> sel.Style = "Heading 1"
>>> sel.Collapse(0)
>>> sel.InsertAfter("This is a test\n")
>>> sel.Style = "Normal"
>>> doc.SaveAs("HelloWorld")
>>> w.Visible = True
>>>
Sample Output
(Screenshot of the resulting Word document; image not available in this transcript)
Commentary
• Obviously, there are many more details
• Must know method names/arguments of objects you're using
• May require Microsoft documentation
• Python IDEs (IDLE) may not help much
Jython
• A pure Java implementation of Python
• Can be used to write Python scripts that interact with Java classes and libraries
• Official Location:
http://www.jython.org
Jython Example
• Jython runs like normal Python
% jython
Jython 2.2.1 on java1.5.0_13
Type "copyright", "credits" or "license" for more
information.
>>>
• And it works like normal Python
>>> print "Hello World"
Hello World
>>> for i in xrange(5):
... print i,
...
0 1 2 3 4
>>>
Jython Example
• Many standard library modules work
>>> import urllib
>>> u = urllib.urlopen("http://www.python.org")
>>> page = u.read()
>>>
• And you get access to Java libraries
>>> import java.util
>>> d = java.util.Date()
>>> d
Sat Jul 26 13:31:49 CDT 2008
>>> dir(d)
['UTC', '__init__', 'after', 'before', 'class',
'clone', 'compareTo', 'date', 'day', 'equals',
'getClass', 'getDate', ... ]
>>>
Jython Example
• A Java Class
public class Stock {
    public String name;
    public int shares;
    public double price;
    public Stock(String n, int s, double p) {
        name = n;
        shares = s;
        price = p;
    }
    public double cost() { return shares*price; }
}
• In Jython
>>> import Stock
>>> s = Stock('GOOG',100,490.10)
>>> s.cost()
49010.0
>>>
IronPython
• Python implemented in C#/.NET
• Can be used to write Python scripts that control components in .NET
• Official Location:
http://www.codeplex.com/IronPython
IronPython Example
• IronPython runs like normal Python
% ipy
IronPython 1.1.2 (1.1.2) on .NET 2.0.50727.42
Copyright (c) Microsoft Corporation. All rights
reserved.
>>>
• And it also works like normal Python
>>> print "Hello World"
Hello World
>>> for i in xrange(5):
... print i,
...
0 1 2 3 4
>>>
IronPython Example
• You get access to .NET libraries
>>> import System.Math
>>> dir(System.Math)
['Abs', 'Acos', 'Asin', 'Atan', 'Atan2', 'BigMul',
'Ceiling', 'Cos', 'Cosh', ...]
>>> System.Math.Cos(3)
-0.9899924966
>>>
• Can script classes written in C#
• Same idea as with Jython
Commentary
• Jython and IronPython have limitations
• Features lag behind those found in the main Python distribution (for example, Jython doesn't have generators)
• Small set of standard library modules work (many of the standard libraries depend on C code which isn't supported)
• No reason to use unless you are working with Java or .NET
Some Final Words
• Integration is the real reason why Python is used by smart programmers and why they consider it to be a "secret weapon"
• With Python, you don't give up your code in C, C++, Java, etc.
• Instead, Python makes that code better
• Python makes difficult problems easier and impossible problems more feasible.
Intro Course Wrap-up
• First 12 sections of this class are just the start
• Basics of the Python language
• How to work with data
• How to organize Python software
• How Python interacts with the environment
• How it interacts with other programs
Further Topics
• Network/Internet Programming (Day 4)
• Graphical User Interfaces
• Frameworks and Components
• Functional Programming
• Metaprogramming
• Extension programming
Python Slide Index
Symbols and Numbers
!= operator, 1-40#, program comment, 1-32%, floating point modulo operator, 1-54%, integer modulo operator, 1-50%, string formatting operator, 2-25&, bit-wise and operator, 1-50&, set intersection operator, 2-22() operator, 5-29(), tuple, 2-6**, power operator, 1-54*, list replication, 1-69*, multiply operator, 1-50, 1-54*, sequence replication operator, 2-30*, string replication operator, 1-60+, add operator, 1-50, 1-54+, list concatenation, 1-67+, String concatenation, 1-59, 9-24-, set difference operator, 2-22-, subtract operator, 1-50, 1-54-i option, Python interpreter, 7-26-O option, Python interpreter, 7-22, 7-23. operator, 5-29... interpreter prompt, 1-15.NET framework, and IronPython, 12-45.pyc files, 4-11.pyo files, 4-11/ division operator, 1-54/, division operator, 1-50//, floor division operator, 1-50, 1-51< operator, 1-40<<, left shift operator, 1-50<= operator, 1-40== operator, 1-40> operator, 1-40>= operator, 1-40>>, right shift operator, 1-50>>> interpreter prompt, 1-15[:] string slicing,, 1-59[:], slicing operator, 2-31[], sequence indexing, 2-29[], string indexing, 1-59\ Line continuation character, 1-43^, bit-wise xor operator, 1-50
{}, dictionary, 2-13|, bit-wise or operator, 1-50|, set union operator, 2-22~, bit-wise negation operator, 1-50
A
ABC language, 1-5abs() function, 1-50, 1-54, 4-24Absolute value function, 1-50, 1-54__abs__() method, 5-26Accessor methods, 6-30Adding items to dictionary, 2-14__add__() method, 5-26Advanced String Formatting, 9-31and operator, 1-40__and__() method, 5-26Anonymous functions, 3-45append() method, of lists, 1-67Argument naming style, 3-20Arguments, passing mutable objects, 3-25Arguments, transforming in function, 3-22argv variable, sys module, 4-28array module, 10-24array module typecodes, 10-25array object I/O functions, 10-27array object operations, 10-26arrays vs. lists, 10-28arrays, creating with ctypes, 10-20assert statement, 7-20assert statement, stripping in -O mode, 7-22assertEqual() method, unittest module, 7-15AssertionError exception, 7-20Assertions, and unit testing, 7-15assertNotEqual() method, unittest module, 7-15assertRaises() method, unittest module, 7-15assert_() method, unittest module, 7-15Assignment, copying, 2-65Assignment, reference counting, 2-62, 2-64, 2-65Associative array, 2-13Attribute access functions, 5-31Attribute binding, 6-12Attribute lookup, 6-17, 6-20Attribute, definition of, 5-3Attributes, computed using properties, 6-33, 6-36, 6-37Attributes, modifying values, 6-13Attributes, private, 6-27Awk, and list compreheneions, 2-57
B
Base class, 5-11
base-10 decimals, 4-58
__bases__ attribute of classes, 6-19
binary arrays, 10-24
binary data, 10-4
binary data representation, 10-6
Binary files, 10-9
binary type objects, 10-16
Binding of attributes in objects, 6-12
Block comments, 1-32
Boolean type, 1-46, 1-47
Booleans, and integers, 1-47
Boost Python, 12-31
Bottom up programming style, 3-11
break statement, 2-36
Breakpoint, debugger, 7-29
Breakpoint, setting in debugger, 7-31
Built-in exceptions, 3-52
__builtins__ module, 4-23
bytearray type, 10-8
bytes literals, 10-7
C
C extension, example with ctypes, 12-25
C extensions, accessing with ctypes module, 12-23
C Extensions, other tools, 12-31
C3 Linearization Algorithm, 6-23
Callback functions, 3-44
Calling a function, 1-76
Calling other methods in the same class, 5-9
Capturing output of a subprocess, 11-17
Case conversion, 1-61
Catching exceptions, 1-79
Catching multiple exceptions, 3-54
chr() function, 4-25
Class implementation chart, 6-11
class statement, 5-4
class statement, defining methods, 5-8
Class, 6-9
Class, representation of, 6-9
Class, __slots__ attribute of, 6-38
__class__ attribute of instances, 6-10
close() function, shelve module, 4-46
Code blocks and indentation, 1-37
Code formatting, 1-43
Code reuse, and generators, 8-30
collect() function, gc module, 6-45
collections module, 4-60
Colon, and indentation, 1-37
COM, 12-33, 12-34
COM, example of controlling Microsoft Word, 12-38
COM, launching Microsoft Word, 12-36
Command line arguments, manual parsing, 4-29
Command line options, 4-28
Command line options, parsing with optparse, 4-30
Command line, running Python, 1-24
Comments, 1-32, 7-3
Comments, vs. documentation strings, 7-3
Community links, 1-2
Compiling regular expressions, 9-13
Complex type, 1-46
complex() function, 4-25
Computed attributes, 6-33, 6-36, 6-37
Concatenation of strings, 1-59
Concatenation, lists, 1-67
Concatenation, of sequences, 2-30
Conditionals, 1-39
ConfigParser module, 4-48
Conformance checking of functions, 3-41
Container, 2-17
Containers, dictionary, 2-19
Containers, list, 2-18
__contains__() method, 5-25
continue statement, 2-36
Contract programming, 7-21
Conversion of numbers, 1-55
Converting to strings, 1-64
copy module, 2-68, 4-35
copy() function, copy module, 4-35
copy() function, shutil module, 4-42
Copying and moving files, 4-42
copytree() function, shutil module, 4-42
cos() function, math module, 1-54
cPickle module, 4-44
Creating new objects, 5-4
Creating programs, 1-19
ctime() function, time module, 4-41
ctypes direct I/O support, 10-22
ctypes library, 10-17
ctypes module, and C++, 12-30
ctypes module, example of, 12-25
ctypes module, limitations of, 12-29
ctypes module, specifying type signatures, 12-27
ctypes module, supported datatypes, 12-28
ctypes modules, 12-23
ctypes, creating a C datatype instance, 10-19
ctypes, creating arrays, 10-20
ctypes, list of datatypes, 10-18
ctypes, structures, 10-21
D
Data interchange, 12-4
Data structure, dictionary, 6-3
Data structures, 2-5
Database like queries on lists, 2-55
Database, interface to, 4-61
Database, SQL injection attack risk, 4-63
Datatypes, in library modules, 4-57
Date and time manipulation, 4-59
datetime module, 4-59
Debugger, 7-27
Debugger, breakpoint, 7-29, 7-31
Debugger, commands, 7-29
Debugger, launching inside a program, 7-28
Debugger, listing source code, 7-31
Debugger, running at command line, 7-33
Debugger, running functions, 7-32
Debugger, single step execution, 7-32
Debugger, stack trace, 7-30
__debug__ variable, 7-23
decimal module, 4-58
Declarative programming, 2-58
Deep copies, 2-68
deepcopy() function, copy module, 2-68, 4-35
def statement, 1-76, 3-8
Defining a function, 3-8
Defining new functions, 1-76
Defining new objects, 5-4
Definition order, 3-7
del operator, lists, 1-70
delattr() function, 5-31
Deleting items from dictionary, 2-14
__delitem__() method, 5-24
__del__() method, 6-46, 6-47
deque object, collections module, 4-60
Derived class, 5-11
Design by contract, 7-21
Destruction of objects, 6-43
Dictionary, 2-13
Dictionary, and class representation, 6-9
Dictionary, and local function variables, 6-4
Dictionary, and module namespace, 6-5
Dictionary, and object representation, 6-6
Dictionary, creating from list of tuples, 2-21
Dictionary, persistent, 4-46
Dictionary, testing for keys, 2-20
Dictionary, updating and deleting, 2-14
Dictionary, use as a container, 2-19
Dictionary, use as data structure, 6-3
Dictionary, using as function keyword arguments, 3-40
Dictionary, when to use, 2-15
__dict__ attribute, of instances, 6-7, 6-8
__dict__ variable, of modules, 4-14
dir() function, 1-81, 1-82
direct I/O, with ctypes objects, 10-22
Directory listing, 4-38
disable() function, gc module, 6-45
divmod() function, 1-50, 1-54, 4-24
__div__() method, 5-26
doctest module, 3-48, 7-7, 7-8
doctest module, self-testing, 7-10
Documentation, 1-17
Documentation strings, 3-46, 7-3
Documentation strings, and help() command, 7-6
Documentation strings, and help() function, 3-47
Documentation strings, and IDEs, 7-5
Documentation strings, and testing, 3-48, 7-7
Documentation strings, vs. comments, 7-3
Double precision float, 1-52
Double-quoted string, 1-56
Downloads, 1-2
dumps() function, pickle module, 4-45
Duplicating containers, 2-66
E
easy_install command, 4-71
Edit, compile, debug cycle, 1-13
Eggs, Python package format, 4-71
ElementTree module, xml.etree package, 12-7
elif statement, 1-39
else statement, 1-39
Embedded nulls in strings, 1-58
empty code blocks, 1-42
enable() function, gc module, 6-45
Enabling future features, 1-51
Encapsulation, 6-24
Encapsulation, and accessor methods, 6-30
Encapsulation, and properties, 6-31
Encapsulation, challenges of, 6-25
Encapsulation, uniform access principle, 6-35
end() method, of Match objects, 9-16
endswith() method, strings, 1-62
enumerate() function, 2-39, 2-40, 4-26
environ variable, os module, 4-37
Environment variables, 4-37
Error reporting strategy in exceptions, 3-57
Escape codes, strings, 1-57
Event loop, and GUI programming, 4-54
except statement, 1-79, 3-50
Exception base class, 5-28
Exception, defining new, 5-28
Exception, printing tracebacks, 7-25
Exception, uncaught, 7-24
Exceptions, 1-78, 3-50
Exceptions, catching, 1-79
Exceptions, catching any, 3-55
Exceptions, catching multiple, 3-54
Exceptions, caution on use, 3-56
Exceptions, finally statement, 3-58
Exceptions, how to report errors, 3-57
Exceptions, ignoring, 3-55
Exceptions, list of built-in, 3-52
Exceptions, passed value, 3-53
Exceptions, propagation of, 3-51
Executing system commands, 11-5, 11-8
Execution model, 1-31
Execution of modules, 4-5
exists() function, os.path module, 4-40
exit() function, sys module, 3-59
Exploding heads, and exceptions, 3-56
Exponential notation, 1-52
Extended slicing of sequences, 2-32
F
False Value, 1-47
File globbing, 4-38
File system, copying and moving files, 4-42
File system, getting a directory listing, 4-38
File tests, 4-40
File, binary, 10-9
File, obtaining metadata, 4-41
Files, and for statement, 1-73
Files, opening, 1-72
Files, reading line by line, 1-73
__file__ attribute, of modules, 4-7
Filtering sequence data, 2-53
finally statement, 3-58
find() method of strings, 9-6
find() method, strings, 1-62
Finding a substring, 9-6
finditer() method, of regular expressions, 9-19
First class objects, 2-69
Float type, 1-46, 1-52
float() function, 1-55, 4-25
Floating point numbers, 1-52
Floating point, accuracy, 1-53
Floor division operator, 1-51
__floordiv__() method, 5-26
For loop, and tuples, 2-41
For loop, keeping a loop counter, 2-39
for statement, 2-34
for statement, and files, 1-73
for statement, and generators, 8-17
for statement, and iteration, 8-3
for statement, internal operation of, 8-4
for statement, iteration variable, 2-35
Format codes, string formatting, 2-26
Format codes, struct module, 10-12
format() method of strings, 9-31
Formatted output, 2-24
format_exc() function, traceback module, 7-25
from module import *, 4-18
from statement, 4-17
from __future__ import, 1-51
Function, and generators, 8-17
Function, running in debugger, 7-32
Functions, 3-8, 3-9
Functions, accepting any combination of arguments, 3-39
Functions, anonymous with lambda, 3-45
Functions, argument passing, 3-15
Functions, benefits of using, 3-8
Functions, Bottom-up style, 3-11
Functions, calling with positional arguments, 3-17
Functions, checking of return statement, 3-43
Functions, conformance checking, 3-41
Functions, default arguments, 3-16
Functions, defining, 1-76
Functions, definition order, 3-10
Functions, design of, 3-14
Functions, design of and global variables, 3-32
Functions, design of and side effects, 3-26
Functions, design of and transformation of inputs, 3-22
Functions, design of argument names, 3-20
Functions, design of input arguments, 3-21
Functions, documentation strings, 3-46
Functions, global variables, 3-29, 3-30
Functions, keyword arguments, 3-18
Functions, local variables, 3-28
Functions, mixing positional and keyword arguments, 3-19
Functions, multiple return values, 3-24
Functions, overloading (lack of), 3-12
Functions, side effects, 3-25
Functions, tuple and dictionary expansion, 3-40
Functions, variable number of arguments, 3-35, 3-36, 3-37, 3-38
G
Garbage collection, and cycles, 6-44, 6-45
Garbage collection, and __del__() method, 6-46, 6-47
Garbage collection, reference counting, 6-43
gc module, 6-45
Generating text, 9-23
Generator, 8-17, 8-18
Generator expression, 8-24, 8-25
Generator expression, efficiency of, 8-27
Generator tricks presentation, 8-33
Generator vs. iterator, 8-22
Generator, and code reuse, 8-30
Generator, and StopIteration exception, 8-20
Generator, example of following a file, 8-21
Generator, use of, 8-29
getatime() function, os.path module, 4-41
getattr() function, 5-31
__getitem__() method, 5-24
getmtime() function, os.path module, 4-41
getoutput() function, commands module, 11-5
getsize() function, os.path module, 4-41
Getting a list of symbols, 1-81
Getting started, 1-8
glob module, 4-38
global statement, 3-31
Global variables, 3-27, 4-6
Global variables, accessing in functions, 3-29
Global variables, modifying inside a function, 3-30
group() method, of Match objects, 9-16
Groups, extracting text from in regular expressions, 9-18
Groups, in regular expressions, 9-17
GUI programming, event loop model, 4-54
GUI programming, with Tkinter, 4-51
Guido van Rossum, 1-3
H
hasattr() function, 5-31
Hash table, 2-13
Haskell, 1-5
Haskell, list comprehension, 2-56
has_key() method, of dictionaries, 2-20
Heavy wizardry, 6-42
help() command, 1-17
help() command, and documentation strings, 7-6
help() function, and documentation strings, 3-47
hex() function, 4-25
HTTP, simple web server, 12-18
I
I/O redirection, subprocess module, 11-21
Identifiers, 1-33
IDLE, 1-10, 1-11, 1-16
IDLE, creating new program, 1-20, 1-21
IDLE, on Mac or Unix, 1-12
IDLE, running programs, 1-23
IDLE, saving programs, 1-22
IEEE 754, 1-52
if statement, 1-39
Ignoring an exception, 3-55
Immutable objects, 1-63
import statement, 1-77, 4-3
import statement, creation of .pyc and .pyo files, 4-11
import statement, from modifier, 4-17
import statement, importing all symbols, 4-18
import statement, proper use of namespaces, 4-20
import statement, repeated, 4-8
import statement, search path, 4-9, 4-10
import statement, supported file types, 4-11
import, as modifier, 4-15
import, use in extensible programs, 4-16
importing different versions of a library, 4-16
in operator, 5-25
in operator, dictionary, 2-20
in operator, lists, 1-69
in operator, strings, 1-60
Indentation, 1-36, 1-37
Indentation style, 1-38
index() method, strings, 1-62
Indexing of lists, 1-68
Infinite data streams, 8-31
Inheritance, 5-11, 5-14
Inheritance example, 5-13
Inheritance, and object base, 5-16
Inheritance, and polymorphism, 5-19
Inheritance, and __init__() method, 5-17
Inheritance, implementation of, 6-19, 6-21
Inheritance, multiple, 5-20, 6-23
Inheritance, multiple inheritance, 6-22
Inheritance, organization of objects, 5-15
Inheritance, redefining methods, 5-18
Inheritance, uses of, 5-12
INI files, parsing of, 4-48
Initialization of objects, 5-6
__init__() method, 6-41
__init__() method in classes, 5-6
__init__() method, and inheritance, 5-17
insert() method, of lists, 1-67
Inspecting modules, 1-81
Inspecting objects, 1-82
Instance data, 5-7
Instances, and __class__ attribute, 6-10
Instances, creating new, 5-5
Instances, modifying after creation, 6-15
Instances, representation of, 6-7, 6-8
int() function, 1-55, 4-25
Integer division, 1-51
Integer type, 1-46
Integer type, operations, 1-50
Integer type, precision of, 1-48
Integer type, promotion to long, 1-49
Interactive mode, 1-13, 1-14, 1-15
Interactive subprocesses, 11-24
Interpreter execution, 11-3
Interpreter prompts, 1-15
__invert__() method, 5-26
IronPython, 12-45
is operator, 2-62, 2-64
isalpha() method, strings, 1-62
isdigit() method, strings, 1-62
isdir() function, os.path module, 4-39, 4-40
isfile() function, os.path module, 4-39, 4-40
isinstance() function, 2-71
islower() method, strings, 1-62
Item access methods, 5-24
Iterating over a sequence, 2-34
Iteration, 8-3
Iteration protocol, 8-4
Iteration variable, for loop, 2-35
Iteration, design of iterator classes, 8-14
Iteration, user defined, 8-8, 8-9
Iterator vs. generator, 8-22
itertools module, 8-32
__iter__() method, 8-4
J
Java, and Jython, 12-41
join() function, os.path module, 4-39
join() method, of strings, 9-25
join() method, strings, 1-62
Jython, 12-41
K
key argument of sort() method, 2-48
KeyboardInterrupt exception, 3-59
Keys, dictionary, 2-13
Keyword arguments, 3-18
Keywords, 1-34, 1-35
kill() function, os module, 11-15
L
lambda statement, 3-45
Lazy evaluation, 8-31
len() function, 2-29, 5-24
len() function, lists, 1-69
len() function, strings, 1-60
__len__() method, 5-24
Library modules, 1-77
Life cycle of objects, 6-40
Line continuation, 1-43
Lisp, 1-5
List, 2-29
List comprehension, 2-52, 2-53, 2-54
List comprehension uses, 2-55
List comprehensions and awk, 2-57
List concatenation, 1-67
List processing, 2-51
List replication, 1-69
List type, 1-67
List vs. Tuple, 2-12
list() function, 2-66, 2-67
List, extra memory overhead, 2-12
List, Looping over items, 2-34
List, sorting, 2-46
List, use as a container, 2-18
listdir() function, os module, 4-38
lists vs. arrays, 10-28
Lists, changing elements, 1-68
Lists, indexing, 1-68
Lists, removing items, 1-70
Lists, searching, 1-69
loads() function, pickle module, 4-45
Local variables, 3-27
Local variables in functions, 3-28
log() function, math module, 1-54
Long type, 1-46, 1-49
Long type, use with integers, 1-49
long() function, 1-55, 4-25
Looping over integers, 2-37
Looping over items in a sequence, 2-34
Looping over multiple sequences, 2-42
lower() method, strings, 1-61, 1-62
__lshift__() method, 5-26
lstrip() method, of strings, 9-5
M
Main program, 4-7
main() function, unittest module, 7-16
__main__, 4-7
__main__ module, 7-10
Match objects, re module, 9-16
match() method, of regular expression patterns, 9-14
Matching a regular expression, 9-14
math module, 1-54, 1-77, 4-33
Math operators, 5-26
Math operators, floating point, 1-54
Math operators, integer, 1-50
max() function, 2-33, 4-24
Memory efficiency, and generators, 8-31
Memory management, reference counting, 2-62, 2-64
Memory use of tuple and list, 2-12
Method invocation, 5-29
Method, definition of, 5-3
Methods, calling other methods in the same class, 5-9
Methods, in classes, 5-8
Methods, private, 6-28
min() function, 2-33, 4-24
Modules, 4-3
modules variable, sys module, 4-8
Modules, as object, 4-13
Modules, dictionary, 4-14
Modules, execution of, 4-5
Modules, loading of, 4-8
Modules, namespaces, 4-4
Modules, search path, 4-9, 4-10
Modules, self-testing with doctest, 7-10
__mod__() method, 5-26
move() function, shutil module, 4-42
__mro__ attribute, of classes, 6-23
Multiple inheritance, 5-20, 6-22, 6-23
__mul__() method, 5-26
N
Namespace, 4-14
Namespaces, 4-4, 4-20
__name__ attribute, of modules, 4-7
Naming conventions, Python's reliance upon, 6-26
Negative indices, lists, 1-68
__neg__() method, 5-26
Network programming introduction, 12-15
Network programming, example of sockets, 12-16
__new__() method, 6-41, 6-42
next() method, and iteration, 8-4
next() method, of generators, 8-19
no-op statement, 1-42
None type, 2-4
None type, returned by functions, 3-23
not operator, 1-40
Null value, 2-4
Numeric conversion, 1-55
Numeric datatypes, 1-46
numpy, 10-29
O
object base class, 5-16
Object oriented programming, 5-3
Object oriented programming, and encapsulation, 6-24
Objects, attribute binding, 6-17, 6-20
Objects, attributes of, 5-7
Objects, creating containers, 5-24
Objects, creating new instances, 5-5
Objects, creating private attributes, 6-27
Objects, creation steps, 6-41
Objects, defining new, 5-4
Objects, first class behavior, 2-69
Objects, inheritance, 5-11
Objects, invoking methods, 5-5
Objects, life cycle, 6-40
Objects, making a deep copy of, 2-68
Objects, memory management of, 6-43
Objects, method invocation, 5-29
Objects, modifying attributes of instances, 6-13
Objects, modifying instances, 6-15
Objects, multiple inheritance, 6-22
Objects, reading attributes, 6-16
Objects, representation of, 6-11
Objects, representation of instances, 6-7, 6-8
Objects, representation with dictionary, 6-6
Objects, saving with pickle, 4-44
Objects, serializing into a string, 4-45
Objects, single inheritance, 6-21
Objects, special methods, 5-22
Objects, type checking, 2-71
Objects, type of, 2-63
oct() function, 4-25
Old-style classes, 5-16
Online help, 1-17
open() function, 1-72
open() function, shelve module, 4-46, 4-47
Optimized mode, 7-23
Optimized mode (-O), 7-22
Optional features, defining function with, 3-18
Optional function arguments, 3-16
optparse module, 4-30
or operator, 1-40
ord() function, 4-25
__or__() method, 5-26
os module, 4-36, 11-6
os.path module, 4-39, 4-41
Output, print statement, 1-41
Overloading, lack of with functions, 3-12
P
pack() function, struct module, 10-11
Packing binary structures, 10-10, 10-11
Packing values into a tuple, 2-9
Parallel iteration, 2-42
Parsing, 9-3
pass statement, 1-42
path variable, sys module, 4-9, 4-10
Pattern syntax, regular expressions, 9-11
pdb module, 7-27
pdb module, commands, 7-29
Performance of string operations, 9-8
Performance statistics, profile module, 7-34
Perl, difference in string handling, 1-75
Perl, regular expressions and Python, 9-21
Perl, string interpolation and Python, 9-28
Persistent dictionary, 4-46
pexpect library, 11-24
pickle module, 4-44
pickle module, and strings, 4-45
Pipelines, and generators, 8-29
Pipes, subprocess module, 11-18
poll() method, Popen objects, 11-14
Polymorphism, and inheritance, 5-19
Popen() function, subprocess module, 11-8
popen2 module, 11-6
Positional function arguments, 3-17
Post-assertion, 7-21
pow() function, 1-50, 1-54, 4-24
Powers of numbers, 1-50
__pow__() method, 5-26
Pre-assertion, 7-21
Primitive datatypes, 2-3
print statement, 1-41, 4-32, 5-23
print statement, and files, 1-72
print statement, and str(), 1-64
print statement, trailing comma, 1-41
print, formatted output, 2-25
print_exc() function, traceback module, 7-25
Private attributes, 6-27
Private attributes, performance of name mangling, 6-29
Private methods, 6-28
profile module, 7-34
Profiling, 7-34
Program exit, 3-59
Program structure, 3-6
Program structure, definition order, 3-7
Propagation of exceptions, 3-51
Properties, and encapsulation, 6-31
py files, 1-19
Python Documentation, 1-17
Python eggs packages, 4-71
Python influences, 1-5
Python interpreter, 1-13
Python interpreter, keeping alive after execution, 7-26
Python interpreter, optimized mode, 7-22, 7-23
Python package index, 4-67
Python, extending with ctypes, 12-23
Python, reason created, 1-4
Python, running on command line, 1-24
Python, source files, 1-19
Python, starting on Mac, 1-12
Python, starting on Unix, 1-12
Python, starting on Windows, 1-11
Python, starting the interpreter, 1-9
Python, statement execution, 1-31
Python, uses of, 1-6, 1-7
Python, year created, 1-3
python.org website, 1-2
Pythonwin extension, 12-33, 12-34
R
raise Statement, 1-80
raise statement, 3-50
Raising exceptions, 1-80
random module, 4-34
Random numbers, 4-34
range() function, 2-38, 4-26
range() vs. xrange(), 2-38
Raw strings, 1-57
Raw strings, and regular expressions, 9-12
re module, 9-10
re module, compile() function, 9-13
re module, find all occurrences of a pattern, 9-19
re module, pattern syntax, 9-11
Read-eval loop, 1-14
Reading attributes on objects, 6-16
readline() method, files, 1-72
Redefining methods with inheritance, 5-18
Redefining output file, 4-32
Redirecting print to a file, 1-72
Reference counting, 2-62, 2-64, 2-67, 6-43
Reference counting, containers, 2-66
register_function() method, of SimpleXMLRPCServer, 12-21
register_instance() method, of SimpleXMLRPCServer, 12-21
Regular expression syntax, 9-11
Regular expressions (see re), 9-10
Regular expressions, and Perl, 9-21
Regular expressions, compiling patterns, 9-13
Regular expressions, extracting text from numbered groups, 9-18
Regular expressions, Match objects, 9-16
Regular expressions, matching, 9-14
Regular expressions, numbered groups, 9-17
Regular expressions, obtaining matching text, 9-16
Regular expressions, pattern replacement, 9-20
Regular expressions, searching, 9-15
Relational database, interface to, 4-61
Relational operators, 1-40
remove() method, lists, 1-70
Repeated function definitions, 3-12
Repeated imports, 4-8
replace() method of strings, 9-7
replace() method, strings, 1-61, 1-62
Replacing a substring, 9-7
Replacing text, 1-61
Replication of sequences, 2-30
repr() function, 4-25, 5-23
Representation of strings, 1-58
representing binary data, 10-6
__repr__() method, 5-23
Reserved names, 1-34, 1-35
return statement, 3-23
return statement, multiple values, 3-24
returncode attribute, Popen objects, 11-13
reversed() function, 4-26
rfind() method, strings, 1-62
rindex() method, strings, 1-62
rmtree() function, shutil module, 4-42
round() function, 4-24
Rounding errors, floating point, 1-53
__rshift__() method, 5-26
rsplit() method of strings, 9-4
rstrip() method, of strings, 9-5
Ruby, string interpolation and Python, 9-28
run() function, pdb module, 7-33
runcall() function, pdb module, 7-33
runeval() function, pdb module, 7-33
Running Python, 1-9
Runtime error vs. compile-time error, 3-42
S
safe_substitute() method, of Template objects, 9-30
Sample Python program, 1-26
Scope of iteration variable in loops, 2-35
Scripting, 3-3
Scripting language, 1-3
Scripting, defined, 3-4
Scripting, problem with, 3-5
search() method, of regular expression patterns, 9-15
Searching for a regular expression, 9-15
self parameter of methods, 5-7, 5-8
Sequence, 2-29
Sequence sorting, 2-49
Sequence, concatenation, 2-30
Sequence, extended slicing, 2-32
Sequence, indexing, 2-29
Sequence, length, 2-29
Sequence, looping over items, 2-34
Sequence, replication, 2-30
Sequence, slicing, 2-31
Sequence, string, 1-58
Serializing objects into strings, 4-45
Set theory, list comprehension, 2-56
Set type, 2-22
set() function, 2-22
setattr() function, 5-31
__setitem__() method, 5-24
setUp() method, unittest module, 7-17
setup.py file, third party modules, 4-70
setuptools module, 4-71
set_trace() function, pdb module, 7-28
Shallow copy, 2-67
Shell operations, 4-42
shelve module, 4-46
shutil module, 4-42
Side effects, 3-25
SimpleHTTPServer module, 12-18
SimpleXMLRPCServer module, 12-20
sin() function, math module, 1-54
Single-quoted string, 1-56
Single-step execution, 7-32
Slicing operator, 2-31
__slots__ attribute of classes, 6-38
socket module, example of, 12-16
SocketServer module, example of, 12-17
sort() method of lists, 3-44
sort() method, of lists, 2-46, 2-48
sorted() function, 2-49, 4-26
Sorting lists, 2-46
Sorting with a key function, 2-48
Sorting, of any sequence, 2-49
Source files, 1-19
Source files, and modules, 4-3
Source listing, in debugger, 7-31
span() method, of Match objects, 9-16
Special methods, 1-82, 5-22
split() method, of strings, 9-4
split() method, strings, 1-62, 1-66
Splitting strings, 9-4
Splitting text, 1-66
SQL queries, how to form, 4-63
SQL queries, injection attack risk, 4-63
SQL queries, value substitutions in, 4-65
sqlite3 module, 4-62
sqrt() function, math module, 1-54
Stack trace, in debugger, 7-30
Standard I/O streams, 4-32
Standard library, 4-22
start() method, of Match objects, 9-16
startswith() method, strings, 1-62
Statements, 1-31, 3-6
Status codes, subprocesses, 11-12
stderr variable, sys module, 4-32
stdin variable, sys module, 4-32
stdout variable, sys module, 4-32
StopIteration exception, 8-5
StopIteration exception, and generators, 8-20
str() function, 1-64, 4-25, 5-23
String concatenation, performance of, 9-24
String format codes, 2-26
String formatting, 2-25
String formatting, with dictionary, 2-27, 9-29
String interpolation, 2-27, 9-28
String joining, 9-25
String joining vs. concatenation, 9-26
String manipulation, performance of, 9-8
String replacement, 9-7
String searching, 9-6
String splitting, 9-4
String stripping, 9-5
String templates, 9-30
String type, 2-29
StringIO objects, 9-27
Strings, concatenation, 1-59
Strings, conversion to, 1-64
Strings, conversion to numbers, 1-55, 1-75
Strings, escape codes, 1-56, 1-57
Strings, immutability, 1-63
Strings, indexing, 1-59
Strings, length of, 1-60
Strings, literals, 1-56
Strings, methods, 1-61, 1-62
Strings, raw strings, 1-57
Strings, redirecting I/O to, 9-27
Strings, replication, 1-60
Strings, representation, 1-58
Strings, searching for substring, 1-60
Strings, slicing, 1-59
Strings, splitting, 1-66
Strings, triple-quoted, 1-56
Strings, use of raw strings, 9-12
strip() method, of strings, 9-5
strip() method, strings, 1-61, 1-62
Stripping characters, 1-61
Stripping characters from strings, 9-5
struct module, 10-11
struct module, format modifiers, 10-13
struct module, packing codes, 10-12
structures, creating with ctypes, 10-21
sub() method of regular expression patterns, 9-20
Subclass, 5-11
Subprocess, 11-4
subprocess module, 11-6, 11-7
subprocess module, capturing output, 11-17
subprocess module, capturing stderr, 11-20
subprocess module, changing the working directory, 11-11
subprocess module, collecting return codes, 11-13
subprocess module, I/O redirection, 11-21
subprocess module, interactive subprocesses, 11-24
subprocess module, pipes, 11-18
subprocess module, polling, 11-14
subprocess module, sending/receiving data, 11-18
subprocess module, setting environment variables, 11-10
Subprocess status code, 11-12
subprocess, killing with a signal, 11-15
substitute() method, of Template objects, 9-30
__sub__() method, 5-26
sum() function, 2-33, 4-24
Superclass, 5-11
Swig, 12-31
sys module, 4-27
System commands, executing as subprocess, 11-8
system() function, os module, 4-36, 11-5
SystemExit exception, 3-59
T
tan() function, math module, 1-54
TCP, server example, 12-16
tearDown() method, unittest module, 7-17
Template strings, 9-30
TestCase class, of unittest module, 7-13
Testing files, 4-40
testmod() function, doctest module, 7-8
Text parsing, 9-3
Text replacement, 1-61
Text strings, 1-58
Third party modules, 4-67, 4-68
Third party modules, and C/C++ code, 4-72
Third party modules, eggs format, 4-71
Third party modules, native installer, 4-69
Third party modules, setup.py file, 4-70
time module, 4-41
Tkinter module, 4-51
Tkinter module, sample widgets, 4-53
traceback module, 7-25
Traceback, uncaught exception, 7-24
Tracebacks, 1-80
Triple-quoted string, 1-56
True value, 1-47
Truncation, integer division, 1-51
Truth values, 1-40
try statement, 1-79, 3-50
Tuple, 2-6, 2-29
Tuple vs. List, 2-12
Tuple, immutability of, 2-8
Tuple, packing values, 2-9
Tuple, unpacking in for-loop, 2-41
Tuple, unpacking values, 2-10
Tuple, use of, 2-7
Tuple, using as function arguments, 3-40
Type checking, 2-71
Type conversion, 1-75
Type of objects, 2-63
type() function, 2-63, 2-71
typecodes, array module, 10-25
Types, 1-33
Types, numeric, 1-46
Types, primitive, 2-3
U
Unbound method, 5-30
Uniform access principle, 6-35
Unit testing, 7-12
unittest module, 7-12, 7-13
unittest module, example of, 7-14
unittest module, running tests, 7-16
unittest module, setup and teardown, 7-17
unpack() function, struct module, 10-11
Unpacking binary structures, 10-10, 10-11
Unpacking values from tuple, 2-10
upper() method, strings, 1-61, 1-62
urllib module, 1-77
urlopen() function, urllib module, 1-77
User-defined exceptions, 5-28
V
Value of exceptions, 3-53
Variable assignment, 2-61, 2-65
Variable assignment, assignment of globals in function, 3-31
Variable assignment, global vs. local, 3-27
Variable number of function arguments, 3-35, 3-36, 3-37, 3-38
Variables, and modules, 4-6
Variables, examining in debugger, 7-30
Variables, names of, 1-33
Variables, reference counting, 2-62, 2-64
Variables, scope in functions, 3-28, 3-29
Variables, type of, 1-33
vars() function, 9-29
W
wait() method, Popen objects, 11-13
walk() function, os module, 4-36
Walking a directory of files, 4-36
while statement, 1-36
Widgets, and Tkinter, 4-53
Windows, binary files, 10-9
Windows, killing a subprocess, 11-15
Windows, starting Python, 1-11
write() method, files, 1-72
X
XML overview, 12-5
XML parsing, extracting document elements, 12-9
XML parsing, extracting element text, 12-11
XML parsing, getting element attributes, 12-12
XML, ElementTree module, 12-7
XML, parsing with ElementTree module, 12-8
XML-RPC, 12-19
XML-RPC, server example, 12-20
xmlrpclib module, 12-20
__xor__() method, 5-26
xrange() function, 2-37, 4-26
xrange() vs. range(), 2-38
Z
zip() function, 2-42, 2-43, 2-44, 4-26