Lesson 7: How to Use Help, List and Dictionary Methods
Fundamentals of Text Processing for Linguists
Na-Rae Han
Objectives
Learning on your own
dir(), help()
Python IDLE tooltips
Using online references
List methods
Dictionary methods
2/19/2014 2
Teaching yourself new tricks
Python built-in helper functions
dir()
help()
Python IDLE tool tips: type >>> range( and a tip showing the arguments appears
Online references
Python 2.7 Quick Reference: http://rgruet.free.fr/PQR27/PQR2.7.html
dir() and help()
dir(obj) returns a list of attributes (__xyz__) and methods that are available for the given object.
>>> dir(str)
['__add__', '__class__', '__contains__', '__delattr__', ... '__subclasshook__',
 '_formatter_field_name_split', 'capitalize', 'center', 'count', 'decode',
 'encode', 'endswith', 'expandtabs', 'find', 'format', 'index', 'isalnum',
 'isalpha', 'isdigit', 'islower', 'isspace', 'istitle', 'isupper', 'join',
 'ljust', 'lower', 'lstrip', 'partition', 'replace', 'rfind', 'rindex',
 'rjust', 'rpartition', 'rsplit', 'rstrip', 'split', 'splitlines',
 'startswith', 'strip', 'swapcase', 'title', 'translate', 'upper', 'zfill']
help(obj.method) prints out information on the object's method.
>>> help(str.find)
Help on method_descriptor:

find(...)
    S.find(sub [,start [,end]]) -> int

    Return the lowest index in S where substring sub is found,
    such that sub is contained within S[start:end].  Optional
    arguments start and end are interpreted as in slice notation.

    Return -1 on failure.
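Because dir() returns an ordinary list of names, it can be filtered like any other list. The following is a minimal sketch, not part of the original slides; the name public_methods is made up for illustration, and the code is written so that it also runs under Python 3.

```python
# dir() returns a plain list of strings, so we can loop over it.
# 'public_methods' is a hypothetical name used only in this sketch.
public_methods = []
for name in dir(str):
    if not name.startswith('_'):      # skip __xyz__ and _xyz names
        public_methods.append(name)

print(public_methods)

# The text that help(str.find) displays comes from the method's docstring.
print(str.find.__doc__)
```

The exact names in the list depend on the Python version, so no expected output is shown here.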
Self-learn!
Using the various sources, find out what the string methods listed by dir(str) do (5 minutes).
Try help(str.strip)
Additional string operations (1)
.isalpha() returns True only if all characters are alphabetic
.isalnum() returns True only if all characters are letters or digits
.isdigit() returns True only if all characters are digits
.isspace() returns True only if all characters are whitespace characters
>>> 'co-operate'.isalpha()
False
>>> 'Exercise2'.isalnum()
True
>>> '2013'.isdigit()
True
>>> ' \n\t'.isspace()
True
Additional string operations (2)
.strip() returns a string stripped of whitespace on both edges
>>> ' green ideas \n'.strip()
'green ideas'
.find() searches for the given string within str and returns the first index where it begins. Returns -1 if not found.
>>> 'green ideas'.find('e')
2
>>> 'green ideas'.find('ea')
8
>>> 'green ideas'.find('t')
-1
.count() searches for the given string and returns the total number of (non-overlapping) occurrences
>>> 'green ideas'.count('e')
3
>>> 'green ideas sleep'.count('ee')
2
>>> 'The thirty-three thieves thought that'.count('th')
5
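The string methods above are often used together. Here is a small sketch that combines them; the input line is made up for illustration, and .split() (used later in this lesson) breaks the string on whitespace.

```python
# Hypothetical input line, used only for this sketch.
line = '  Colorless green ideas sleep furiously \n'

clean = line.strip()       # whitespace removed from both edges
words = clean.split()      # list of whitespace-separated words

for w in words:
    if w.isalpha():        # True only if every character is a letter
        # .count() gives the number of 'e's, .find() the first index (-1 if none)
        print(w + ': ' + str(w.count('e')) + ' e(s), first at index ' + str(w.find('e')))
```

Note that .find() reports -1 for 'furiously', which contains no 'e'.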
List operations
List methods
Functions that are defined on the list datatype
They are called on a list object, with this syntax:
listobj.method()
Lists are mutable, which means many list methods modify the caller object (the list) in place.
>>> li = [8, 'abc', 4.5, 11]
>>> li[2]
4.5
>>> li[2] = 1000
>>> li
[8, 'abc', 1000, 11]
Lists are mutable
We can change individual list elements.
These elements are changed in place: the rest of the list is not affected.
The list name 'li' still refers to the same list object in memory when we're done.
This flexibility has a cost: lists can be slightly slower and take more memory than tuples.
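One consequence of in-place change is that every name referring to the list sees the modification. A minimal sketch, assuming a second name (other, made up for this illustration) is bound to the same list:

```python
li = [8, 'abc', 4.5, 11]
other = li             # a second name for the SAME list object, not a copy

li[2] = 1000           # change one element in place

print(other)           # [8, 'abc', 1000, 11]
print(other is li)     # True: both names refer to one object
```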
Tuples are immutable
You can't change a tuple.
>>> tu = ('Spring', 'Summer', 'Fall', 'Winter')
>>> tu[2]
'Fall'
>>> tu[2] = 'Autumn'
Traceback (most recent call last):
  File "<pyshell#20>", line 1, in <module>
    tu[2] = 'Autumn'
TypeError: 'tuple' object does not support item assignment
Instead, what you should do is make a fresh new tuple and reassign the name.
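One possible way to "make a fresh new tuple" is to combine slices of the old tuple with the replacement item. This is a sketch of that idea, not necessarily the approach shown in class:

```python
tu = ('Spring', 'Summer', 'Fall', 'Winter')

# Build a new tuple from slices of the old one plus the replacement.
# ('Autumn',) is a one-element tuple; the trailing comma is required.
tu = tu[:2] + ('Autumn',) + tu[3:]

print(tu)    # ('Spring', 'Summer', 'Autumn', 'Winter')
```

The original tuple is never modified; the name tu is simply rebound to the new one.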
Adding to a list
>>> li = [1,2,3]
>>> li
[1, 2, 3]
>>> li.append(4)
>>> li
[1, 2, 3, 4]
>>> li.extend([5,6,7])
>>> li
[1, 2, 3, 4, 5, 6, 7]
>>> li.insert(1, 1.5)
>>> li
[1, 1.5, 2, 3, 4, 5, 6, 7]
Try dir(li) and help()!
>>> li = [3, 9, 'ab', 3.5]
>>> li.append('a')
>>> li
[3, 9, 'ab', 3.5, 'a']
>>> li.extend([9, 11, 'c'])
>>> li
[3, 9, 'ab', 3.5, 'a', 9, 11, 'c']
>>> li.insert(2, 'x')
>>> li
[3, 9, 'x', 'ab', 3.5, 'a', 9, 11, 'c']
List methods
.append(x) adds a single item at the end
.extend(list) adds a list of items at the end
.insert(i,x) inserts an item at index i
>>> li = [1, 2]
>>> li.append(3)
>>> li
[1, 2, 3]
>>> li.extend([4,5])
>>> li
[1, 2, 3, 4, 5]
>>> li.append([6,7])
>>> li
[1, 2, 3, 4, 5, [6, 7]]
>>> len(li)
6
.append() vs. .extend()
List inside a list! [6,7] is appended as a single element, so li has length 6.
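The difference matters when collecting words from several sentences. A minimal sketch with made-up data (sentences, flat and nested are names invented for this illustration):

```python
sentences = ['green ideas', 'sleep furiously']   # hypothetical data

flat = []
nested = []
for s in sentences:
    flat.extend(s.split())      # adds each word as its own item
    nested.append(s.split())    # adds each word list as ONE item

print(flat)      # ['green', 'ideas', 'sleep', 'furiously']
print(nested)    # [['green', 'ideas'], ['sleep', 'furiously']]
```

Here flat has four items while nested has only two, each of them a list.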
.extend() vs. +
+ also extends a list, but it creates and returns
a NEW list. li is NOT affected.
>>> li = [1, 2, 3]
>>> li.extend([4, 5, 6])
>>> li
[1, 2, 3, 4, 5, 6]
>>> li + [7, 8, 9]
[1, 2, 3, 4, 5, 6, 7, 8, 9]
>>> li
[1, 2, 3, 4, 5, 6]
>>> li = li + [7, 8, 9]
>>> li
[1, 2, 3, 4, 5, 6, 7, 8, 9]
>>> li += [10, 11]
>>> li
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
To update li itself using +, reassign the name li to the new, returned list. The shorthand li += [...] extends li in place, like .extend().
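The difference between changing a list in place and building a new one becomes visible when a second name refers to the same list. A sketch, with alias as a name made up for this illustration:

```python
li = [1, 2, 3]
alias = li               # second name for the same list object

li.extend([4])           # in place: alias sees the change
print(alias)             # [1, 2, 3, 4]

li = li + [5]            # builds a NEW list and rebinds the name li
print(alias)             # [1, 2, 3, 4]   (the old list, unchanged)
print(li)                # [1, 2, 3, 4, 5]
print(alias is li)       # False: two different list objects now

alias = li
li += [6]                # += on a list extends it in place
print(alias is li)       # True: still one object
```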
List methods based on item value
.index(x) index of first occurrence
.count(x) number of occurrences
.remove(x) remove first occurrence only
>>> li = ['a', 'b', 'c', 'b']
>>> li.index('b')
1
>>> li.count('b')
2
>>> li.remove('b')
>>> li
['a', 'c', 'b']
Careful: .index() and .remove() raise a ValueError if 'b' is not found in the list (.count() simply returns 0).
Use them in conjunction with the in operator:
if 'b' in li : li.remove('b')
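Since .remove() deletes only the first occurrence, the in test can also drive a loop that removes every occurrence. A minimal sketch; the helper name remove_all is hypothetical and not a built-in list method:

```python
def remove_all(items, x):
    """Remove every occurrence of x from the list items, in place."""
    while x in items:        # stops as soon as no x is left
        items.remove(x)      # removes the first remaining occurrence

li = ['a', 'b', 'c', 'b']
remove_all(li, 'b')
print(li)                    # ['a', 'c']

remove_all(li, 'z')          # 'z' is absent: nothing happens, no error
print(li)                    # ['a', 'c']
```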
.pop()
>>> li = ['a', 'b', 'c', 'd', 'e']
>>> li.pop()
'e'
>>> li
['a', 'b', 'c', 'd']
>>> li.pop(2)
'c'
>>> li
['a', 'b', 'd']
.pop() removes the last item from the list and returns it
.pop(i) removes the item at index i and returns it
.pop() removes an item from the list; the list no longer contains the item.
Because the popped item 'c' is returned, you can assign a name to it, e.g.,
x = li.pop(2)
x's value is 'c'.
.pop() vs. .append()
>>> li = ['a', 'b', 'c']
>>> li.append('x')
>>> li
['a', 'b', 'c', 'x']
>>> li.pop()
'x'
>>> li
['a', 'b', 'c']
.append('x') & .pop() undo each other
.pop() vs. .insert()
>>> li = ['a', 'b', 'c']
>>> li.insert(2, 'x')
>>> li
['a', 'b', 'x', 'c']
>>> li.pop(2)
'x'
>>> li
['a', 'b', 'c']
.insert(i,'x') & .pop(i) undo each other
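Because .append() and .pop() both work at the end of the list, a list can serve as a stack: the last item added is the first one taken off. A small sketch under that assumption, with made-up data and names:

```python
words = ['colorless', 'green', 'ideas']   # hypothetical data

stack = []
for w in words:
    stack.append(w)                       # push each word onto the end

reversed_words = []
while stack:                              # an empty list counts as False
    reversed_words.append(stack.pop())    # take the most recent word off

print(reversed_words)                     # ['ideas', 'green', 'colorless']
print(stack)                              # []
```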
Practice
add 'thou' to the list
change 'i' to 'I'
add 'we' and 'they'
remove 'thou'
add pron2 to pron
add 'yinz' between 'we' and 'they'
Try it yourself
2 minutes
dict: a dictionary data type
Dictionaries store a mapping between a set of keys and a set of values.
Keys can be any immutable type: string, integer, tuple
Values can be any type (can also be mixed types)
There is no inherent order (unlike lists and tuples)
You can define, modify, view, look up, and delete the key-value pairs in the dictionary.
A dictionary of the Simpson family members' ages:
{'Homer':36, 'Marge':36, 'Bart':10, 'Lisa':8, 'Maggie':1}
A dictionary of verb past tense forms:
{'go':'went', 'eat':'ate', 'see':'saw', 'say':'said'}
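The restriction on key types can be seen directly: a tuple works as a key, while a list (which is mutable) is rejected. A minimal sketch with made-up word-pair counts; bigram_count and list_key_failed are names invented for this illustration:

```python
# Tuples are immutable, so they can serve as dictionary keys.
bigram_count = {('green', 'ideas'): 2, ('ideas', 'sleep'): 1}   # hypothetical data
print(bigram_count[('green', 'ideas')])    # 2

# Lists are mutable, so using one as a key raises a TypeError.
list_key_failed = False
try:
    bad = {['green', 'ideas']: 2}
except TypeError as err:
    list_key_failed = True
    print('TypeError: ' + str(err))
```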
Looking up a dictionary
>>> en2es = {'cat':'gato', 'dog':'perro', 'tiger':'tigre'}
>>> en2es['cat']
'gato'
>>> en2es['dog']
'perro'
>>> en2es['gato']
Traceback (most recent call last):
File "<pyshell#2>", line 1, in <module>
en2es['gato']
KeyError: 'gato'
A dictionary is one-way: you cannot look up a key based on its value. The mapping can be many-to-one.
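If lookups in the other direction are needed, one option is to build a second, reversed dictionary. This is a sketch of that idea; the name es2en is made up here, and it assumes every value is unique (with a many-to-one mapping, later entries would overwrite earlier ones).

```python
en2es = {'cat': 'gato', 'dog': 'perro', 'tiger': 'tigre'}

# Build the reverse mapping: each value becomes a key, each key a value.
es2en = {}
for en in en2es:
    es2en[en2es[en]] = en

print(es2en['gato'])     # cat
print(es2en['perro'])    # dog
```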
Adding and deleting an entry
>>> en2es = {'cat':'gato', 'dog':'perro'}
>>> en2es['cat']
'gato'
>>> en2es['tiger'] = 'tigre'
>>> en2es
{'tiger': 'tigre', 'dog': 'perro', 'cat': 'gato'}
>>> del en2es['dog']
>>> en2es
{'tiger': 'tigre', 'cat': 'gato'}
en2es['tiger'] = 'tigre' creates a new key and maps it to a value.
del deletes a key and its value from a dictionary.
There is no order in a dictionary!
Checking if something's in a dictionary
>>> en2es
{'tiger': 'tigre', 'dog': 'perro', 'cat': 'gato'}
>>> 'fox' in en2es
False
>>> 'cat' in en2es
True
>>> 'gato' in en2es
False
"in" tests if a key is in a dictionary.
"in" does not check the values: 'gato' is a value, not a key, so the result is False.
Finding out what's in
>>> en2es
{'tiger': 'tigre', 'wolf': 'lobo', 'cat': 'gato'}
>>> en2es.keys()
['tiger', 'wolf', 'cat']
>>> en2es.values()
['tigre', 'lobo', 'gato']
>>> en2es.items()
[('tiger', 'tigre'), ('wolf', 'lobo'), ('cat', 'gato')]
.keys() returns a list of keys, .values() returns a list of values. The orders match!
.items() returns a list of (key, value) TUPLES ('pairs').
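Since .values() hands back the values, the in operator can be applied to that result to test for a value rather than a key. A small sketch, wrapped in list() so that it behaves the same way under Python 3:

```python
en2es = {'tiger': 'tigre', 'wolf': 'lobo', 'cat': 'gato'}

print('gato' in en2es)             # False: 'gato' is not a key
print('gato' in en2es.values())    # True: 'gato' is one of the values

# The orders of .keys() and .values() match, position by position.
keys = list(en2es.keys())
values = list(en2es.values())
print(en2es[keys[0]] == values[0])     # True
```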
Iterating through dictionary
>>> en2es
{'tiger': 'tigre', 'wolf': 'lobo', 'cat': 'gato'}
>>> 'tiger' in en2es
True
>>> for k in en2es :
        print k, 'is', en2es[k]

tiger is tigre
wolf is lobo
cat is gato
for k in en2es.keys() also works.
"in" tests if a key is in a dictionary; in a for loop, it steps through the keys.
Iterating through key:value tuples
>>> en2es
{'tiger': 'tigre', 'wolf': 'lobo', 'cat': 'gato'}
>>> en2es.items()
[('tiger', 'tigre'), ('wolf', 'lobo'), ('cat', 'gato')]
>>> for (k,v) in en2es.items() :
        print k, 'is', v

tiger is tigre
wolf is lobo
cat is gato
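Because a dictionary has no inherent order, the loop above prints the entries in whatever order the dictionary happens to store them. When a predictable order is wanted, one option is to loop over the sorted keys using the built-in sorted(). A minimal sketch, written with print() so it also runs under Python 3:

```python
en2es = {'tiger': 'tigre', 'wolf': 'lobo', 'cat': 'gato'}

# sorted() returns a new list of the keys in alphabetical order.
for k in sorted(en2es):
    print(k + ' is ' + en2es[k])

# Prints:
# cat is gato
# tiger is tigre
# wolf is lobo
```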
Common dict application: counting
>>> tally = {'gold':1, 'bronze':3}
>>> tally['bronze']
3
>>> medals = ['bronze', 'silver', 'gold', 'gold', 'silver']
>>> for m in medals:
        tally[m] += 1

Traceback (most recent call last):
  File "<pyshell#80>", line 2, in <module>
    tally[m] += 1
KeyError: 'silver'
Error: 'silver' is not yet in the dictionary as a key.
Common dict application: counting
>>> tally = {'gold':1, 'bronze':3}
>>> tally['bronze']
3
>>> medals = ['bronze', 'silver', 'gold', 'gold', 'silver']
>>> for m in medals:
        if m not in tally :
            tally[m] = 1
        else :
            tally[m] += 1

>>> tally
{'bronze': 4, 'gold': 3, 'silver': 2}
Make sure to account for the initial key creation & assignment.
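The same counting can be written more compactly with the dictionary method .get(key, default), which returns the default instead of raising a KeyError when the key is missing. This is an alternative sketch, not the version shown on the slide:

```python
tally = {'gold': 1, 'bronze': 3}
medals = ['bronze', 'silver', 'gold', 'gold', 'silver']

for m in medals:
    # .get(m, 0) is the current count, or 0 if m is not yet a key
    tally[m] = tally.get(m, 0) + 1

print(tally == {'bronze': 4, 'gold': 3, 'silver': 2})    # True
```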
Word count
sent = 'Rose is a rose is a rose is a rose.'
words = sent.split()
counts = {}
for w in words :
    if w in counts :
        counts[w] += 1
    else :
        counts[w] = 1
print counts

>>>
{'a': 3, 'Rose': 1, 'is': 3, 'rose.': 1, 'rose': 2}
>>>
'Rose' and 'rose' are counted separately: fold case.
'rose.' is counted separately too: we really must start tokenizing punctuation.
Word count and tokenization
sent = 'Rose is a rose is a rose is a rose.'
words = sent.lower().replace('.',' .').split()
counts = {}
for w in words :
    if w in counts :
        counts[w] += 1
    else :
        counts[w] = 1
print counts

>>>
{'a': 3, 'rose': 4, 'is': 3, '.': 1}
>>>
'.' is its own word. 4 tokens of 'rose'.
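The script above handles only the period. A possible generalization, sketched here under the assumption that a handful of punctuation marks should each become their own token; the function name count_words and its punct parameter are made up for this illustration:

```python
def count_words(sent, punct='.,!?'):
    """Return a dictionary of token counts for the string sent."""
    text = sent.lower()                          # fold case
    for p in punct:
        text = text.replace(p, ' ' + p + ' ')    # set each mark off with spaces
    counts = {}
    for w in text.split():
        if w in counts:
            counts[w] += 1
        else:
            counts[w] = 1
    return counts

counts = count_words('Rose is a rose is a rose is a rose.')
print(counts == {'a': 3, 'rose': 4, 'is': 3, '.': 1})    # True
```

This simple replacement strategy also splits marks that sit inside a token (for example, the period in '3.5'), so it is only a starting point for real tokenization.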
2 minutes
Wrap-up
Next class
Sorting
File IO: reading from and writing to a file
Exercise5
Due Tuesday midnight
Not yet online – will be up tonight