CSC1015F – Chapter 5, Strings and Input
Michelle Kuttel
The String Data Type
Used for operating on textual information.
Think of a string as a sequence of characters.
To create string literals, enclose them in single, double, or triple quotes as follows:
a = "Hello World"
b = 'Python is groovy'
c = """Computer says 'Noooo'"""
2
Comments and docstrings
It is common practice for the first statement of a function to be a documentation string describing its usage. For example:
def hello():
    """Hello World function"""
    print("Hello")
    print("I love CSC1015F")
This is called a "docstring" and can be printed thus:
print(hello.__doc__)
3
Comments and docstrings
Try printing the docstring for functions you have been using, e.g.:
print(input.__doc__)
print(eval.__doc__)
4
Checkpoint Str1: Strings and loops.
What does the following function do?
def oneAtATime(word):
    for c in word:
        print("give us a '",c,"' ... ",c,"!", sep='')
    print("What do you have? -",word)
5
Checkpoint Str1a: Indexing examples
What does this function do?
def str1a(word):
    for i in word:
        if i in "aeiou":
            continue
        print(i,end='')
6
Index positions in the string "Hello Bob" (index 5 is the space):
0 1 2 3 4 5 6 7 8
H e l l o   B o b
Some BUILT IN String functions/methods
s.capitalize()  Capitalizes the first character.
s.count(sub)  Counts the number of occurrences of sub in s.
s.isalnum()  Checks whether all characters are alphanumeric.
s.isalpha()  Checks whether all characters are alphabetic.
s.isdigit()  Checks whether all characters are digits.
s.islower()  Checks whether all characters are lowercase.
s.isspace()  Checks whether all characters are whitespace.
7
Some BUILT IN String functions/methods
s.istitle()  Checks whether the string is a title-cased string (first letter of each word capitalized).
s.isupper()  Checks whether all characters are uppercase.
s.join(t)  Joins the strings in sequence t with s as a separator.
s.lower()  Converts to lowercase.
s.lstrip([chrs])  Removes leading whitespace or characters supplied in chrs.
s.upper()  Converts a string to uppercase.
8
Some BUILT IN String functions/methods
s.replace(oldsub,newsub)  Replaces all occurrences of oldsub in s with newsub.
s.find(sub)  Finds the first occurrence of sub in s.
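A few of the methods listed on these slides can be tried out in a short script. This is only a sketch; the sample strings are made up for illustration.

```python
# Sample string chosen only for illustration.
s = "hello bob"

capitalized = s.capitalize()         # first character becomes uppercase
count_o = s.count("o")               # how many times "o" occurs in s
first_b = s.find("b")                # index of the first "b"
replaced = s.replace("o", "0")       # every "o" replaced by "0"
joined = "-".join(["a", "b", "c"])   # "-" is the separator
stripped = "   padded".lstrip()      # leading whitespace removed

print(capitalized, count_o, first_b, replaced, joined, stripped)
print("abc123".isalnum(), "abc123".isalpha(), "123".isdigit())
print("Hello Bob".istitle(), s.islower(), s.upper())
```

Note that none of these methods changes s itself; each one returns a new value.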
9
BUILT IN String functions/methods
Try printing the doc string for str functions:
print(str.isdigit.__doc__)
10
The String Data Type
As a string is a sequence of characters, we can access individual characters. This is called indexing.
form: <string>[<expr>]
Indexing starts at 0, so the last character in a string of n characters has index n-1.
11
String functions: len
len tells you how many characters there are in a string:
len("Jabberwocky")
len("Twas brillig and the slithy toves did gyre and gimble in the wabe")
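Indexing and len work together: the length tells you the largest valid index. A minimal sketch, using the "Hello Bob" string from the slides:

```python
greet = "Hello Bob"      # example string; index 5 is the space
n = len(greet)           # number of characters, including the space

first = greet[0]         # indexing starts at 0
last = greet[n - 1]      # the last character has index n-1

print(n, first, last)
```

Asking for greet[n] would raise an IndexError, because the valid indexes stop at n-1.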
12
Checkpoint Str2: Indexing examples
What does this function do?
def str2(word):
    for i in range(0,len(word),2):
        print(word[i],end='')
13
More Indexing examples - indexing from the end
What is the output of these lines?
greet = "Hello Bob"
greet[-1]
greet[-2]
greet[-3]
14
Checkpoint Str3
What is the output of these lines?
def str3(word):
    for i in range(len(word)-1,-1,-1):
        print(word[i],end='')
15
Chopping strings into pieces: slicing
The previous examples can be done much more simply.
Slicing indexes a range: <string>[<start>:<end>] returns a substring, starting at the first position and running up to, but not including, the last position.
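The "up to, but not including" rule can be seen in a short sketch. The string is the one from the string-literal slide; the variable names are chosen here for illustration.

```python
b = "Python is groovy"

start = b[0:6]     # characters at indexes 0..5
middle = b[7:9]    # characters at indexes 7 and 8
end = b[10:]       # from index 10 to the end of the string

print(start, middle, end)
print(len(b[7:9]))  # a slice [i:j] within the string has j - i characters
```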
16
Examples - slicing
What is the output of these lines?
greet = "Hello Bob"
greet[0:3]
greet[5:9]
greet[:5]
greet[5:]
greet[:]
17
Checkpoint Str4: Strings and loops.
What does the following function do?
def sTree(word):
    for i in range(len(word)):
        print(word[0:i+1])
18
Checkpoint Str5: Strings and loops.
What does the following code output?
def sTree2(word):
    step=len(word)//3
    for i in range(step,step*3+1,step):
        for j in range(i):
            print(word[0:j+1])
    print("**\n**\n")
sTree2("strawberries")
19
More info on slicing
The slicing operator may be given an optional stride, s[i:j:stride], that causes the slice to skip elements.
Then, i is the starting index; j is the ending index; and the produced subsequence is the elements s[i], s[i+stride], s[i+2*stride], and so forth until index j is reached (which is not included).
The stride may also be negative.
If the starting index is omitted, it is set to the beginning of the sequence if stride is positive or the end of the sequence if stride is negative.
If the ending index j is omitted, it is set to the end of the sequence if stride is positive or the beginning of the sequence if stride is negative.
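One way to picture a stride is to build the slice by hand with range. The helper below, slice_by_hand, is a hypothetical name invented for this sketch, and it assumes i and j are ordinary non-negative indexes inside the string (it does not handle omitted or negative indexes).

```python
def slice_by_hand(s, i, j, stride):
    """Build s[i:j:stride] one character at a time (sketch, in-range indexes only)."""
    result = ""
    # range(i, j, stride) visits i, i+stride, i+2*stride, ... and stops before j
    for k in range(i, j, stride):
        result += s[k]
    return result

a = "Jabberwocky"
print(slice_by_hand(a, 0, 5, 2), a[0:5:2])     # both give the same substring
print(slice_by_hand(a, 5, 0, -2), a[5:0:-2])   # negative stride walks backwards
```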
20
More on slicing
Here are some examples with strides:
a = "Jabberwocky"
b = a[::2] # b = 'Jbewcy'
c = a[::-2] # c = 'ycwebJ'
d = a[0:5:2] # d = 'Jbe'
e = a[5:0:-2] # e = 'rba'
f = a[:5:1] # f = 'Jabbe'
g = a[:5:-1] # g = 'ykcow'
h = a[5::1] # h = 'rwocky'
i = a[5::-1] # i = 'rebbaJ'
j = a[5:0:-1] # j = 'rebba'
21
Checkpoint Str6: strides
What is the output of these lines?
greet = "Hello Bob"
greet[8:5:-1]
22
Checkpoint Str7: Slicing with strides
How would you write this function in one line with no loops?
def str2(word):
    for i in range(0,len(word),2):
        print(word[i],end='')
23
Checkpoint Str8: What does this code display?
#checkpointStr8.py
def crunch(s):
    m=len(s)//2
    print(s[0],s[m],s[-1],sep='+')
crunch("omelette")
crunch("bug")
24
Example: filters
Pirate, Elmer Fudd and Swedish Chef filters produce parodies of English speech.
How would you write one in Python?
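One possible approach is to swap whole words using a lookup table. The sketch below is only an illustration: the function name pirate and the word list PIRATE_WORDS are invented here, and a real filter would need many more rules (punctuation, capital letters, and so on).

```python
# Hypothetical word list, invented for this sketch.
PIRATE_WORDS = {
    "hello": "ahoy",
    "my": "me",
    "friend": "matey",
    "yes": "aye",
    "you": "ye",
}

def pirate(sentence):
    """Return sentence with known words replaced by pirate words (lowercased)."""
    translated = []
    for word in sentence.lower().split():
        # Use the pirate word if there is one, otherwise keep the word unchanged.
        translated.append(PIRATE_WORDS.get(word, word))
    return " ".join(translated)

print(pirate("Hello my friend"))
```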
25
Example: Genetic Algorithms (GAs)
GAs attempt to mimic the process of natural evolution in a population of individuals.
They use the principles of selection and evolution to produce several solutions to a given problem.
They use biologically-derived techniques such as inheritance, mutation, natural selection, and recombination.
A GA is a computer simulation in which a population of abstract representations (called chromosomes) of candidate solutions (called individuals) to an optimization problem evolves toward better solutions.
Over time, those genetic changes which enhance the viability of an organism tend to predominate.
Bioinformatics Example: Crossover (recombination)
Evolution works at the chromosome level through the reproductive process.
Portions of the genetic information of each parent are combined to generate the chromosomes of the offspring.
This is called crossover.
Crossover Methods
Single-Point Crossover
A randomly-located cut is made at the pth bit of each parent and crossover occurs.
This produces 2 different offspring.
Gene splicing example (for genetic algorithms) We can now do a cross-over!
Crossover3.py
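The file Crossover3.py is not reproduced in this transcript. As a rough sketch of single-point crossover using string slicing, assuming both parents are strings of the same length, something like the following could work. The function name and the optional parameter p are chosen here for illustration.

```python
import random

def single_point_crossover(parent1, parent2, p=None):
    """Cut both parents at position p and swap the tails (sketch)."""
    if len(parent1) != len(parent2):
        raise ValueError("parents must have the same length")
    if p is None:
        # Pick a random cut point that leaves at least one character on each side.
        p = random.randint(1, len(parent1) - 1)
    child1 = parent1[:p] + parent2[p:]
    child2 = parent2[:p] + parent1[p:]
    return child1, child2

print(single_point_crossover("11111111", "00000000", 3))
```

Passing p explicitly makes the result repeatable, which is convenient when testing; leaving it out gives the random cut described on the slide.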
29
Example: palindrome program
palindrome |ˈpalɪndrəʊm| noun
a word, phrase, or sequence that reads the same backward as forward, e.g., madam or nurses run
In Python, write a program to check whether a word is a palindrome.
You don’t need to use loops…
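One loop-free approach is to compare the text with its reverse, using a slice with stride -1. This is a sketch of one possible solution, not necessarily the one intended in the course; it ignores spaces and letter case so that phrases such as "nurses run" are accepted.

```python
def is_palindrome(text):
    """Return True if text reads the same backward as forward (spaces and case ignored)."""
    letters = text.replace(" ", "").lower()
    return letters == letters[::-1]   # [::-1] gives the reversed string

print(is_palindrome("madam"), is_palindrome("nurses run"), is_palindrome("python"))
```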
30
String representation and message encoding
On the computer hardware, strings are also represented as zeros and ones.
Computers represent characters as numeric codes, a unique code for each character.
An entire string is stored by translating each character to its equivalent code and then storing the whole thing as a sequence of binary numbers in computer memory.
There used to be a number of different codes for storing characters, which caused serious headaches!
31
ASCII (American Standard Code for Information Interchange)
An important character encoding standard.
The codes 0-127 are used to represent the characters found on a typical (American) computer keyboard, as well as some special control codes used for sending and receiving information.
A-Z uses values in range 65-90
a-z uses values in range 97-122
In use for a long time: developed for teletypes.
American-centric; Extended ASCII codes have been developed.
32
33
Unicode
A character set that includes all the ASCII characters plus many more exotic characters.
http://www.unicode.org
34
Python supports the Unicode standard
ord returns the numeric code of a character
chr returns the character corresponding to a code
[Figure: Unicodes for Cuneiform]
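A short sketch of ord and chr in action; the sample characters are chosen here for illustration.

```python
code = ord("A")                      # numeric code of "A"
letter = chr(97)                     # character whose code is 97

codes = [ord(c) for c in "Bob"]      # translate each character to its code
back = "".join(chr(n) for n in codes)  # and translate the codes back again

print(code, letter, codes, back)
```

Converting a string to codes and back again returns the original string, which is the basis of simple message encoding schemes.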
Characters in memory
The smallest addressable piece of memory is usually 8 bits, or a byte.
How many characters can be represented by a byte? 256 different values (2^8).
Is this enough? 256 different values is enough for ASCII (only a 7-bit code), but not enough for Unicode, with 100 000+ possible characters.
Unicode uses different schemes for packing Unicode characters into sequences of bytes.
UTF-8 is the most common: it uses a single byte for ASCII characters and up to 4 bytes for more exotic characters.
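The variable length of UTF-8 can be checked with the str.encode method. In this sketch the four sample characters (a Latin letter, an accented letter, the euro sign, and a cuneiform sign) are chosen for illustration and written as escape codes so the file itself stays plain ASCII.

```python
samples = ["A", "\u00e9", "\u20ac", "\U00012000"]

lengths = []
for ch in samples:
    encoded = ch.encode("utf-8")     # the bytes used to store this character
    lengths.append(len(encoded))
    print(hex(ord(ch)), len(encoded), "byte(s)")
```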
37
Comparing strings
Conditions may compare numbers or strings.
When strings are compared, the order is lexicographic: strings are put into order based on their Unicode values, e.g.
"Bbbb" < "bbbb"
"B" < "a"
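Because uppercase letters have smaller codes than lowercase letters, they sort first. A brief sketch; the word list is made up for illustration.

```python
upper_first = "Bbbb" < "bbbb"        # compares ord("B") with ord("b")
b_before_a = "B" < "a"               # any uppercase letter comes before any lowercase one

words = sorted(["pear", "Zebra", "apple"])   # sorted uses the same ordering

print(ord("B"), ord("b"), upper_first, b_before_a, words)
```

To compare words without regard to case, one common approach is to convert both to lowercase first with s.lower().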
38
The min function…
min(iterable[, key=func]) -> value
min(a, b, c, ...[, key=func]) -> value
With a single iterable argument, return its smallest item.
With two or more arguments, return the smallest argument.
39
Checkpoint: What do these statements evaluate as?
min("hello")
min("983456")
min("Peanut")
40
Example 2: DNA Reverse Complement Algorithm
A DNA molecule consists of two strands of nucleotides. Each nucleotide is one of the four molecules adenine, guanine, thymine, or cytosine. Adenine always pairs with thymine and guanine always pairs with cytosine.
A pair of matched nucleotides is called a base pair.
Task: write a Python program to calculate the reverse complement of any DNA strand
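A minimal sketch of one way to approach the task, using the standard pairing (A with T, G with C): replace each base by its partner, then reverse the result with a slice. The function name is chosen here, and the sketch assumes the strand contains only the letters A, C, G and T.

```python
def reverse_complement(strand):
    """Return the reverse complement of a DNA strand written with A, C, G, T."""
    complement = ""
    for base in strand.upper():
        if base == "A":
            complement += "T"
        elif base == "T":
            complement += "A"
        elif base == "G":
            complement += "C"
        elif base == "C":
            complement += "G"
        else:
            raise ValueError("unexpected base: " + base)
    return complement[::-1]   # reverse the complemented strand

print(reverse_complement("ATGC"))
```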
41
Scrabble letter scores
Different languages should have different scores for the letters.
How do you work this out? What is the algorithm?
42
Related Example: Calculating character (base) frequency
DNA has the alphabet ACGT
BaseFrequency.py
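BaseFrequency.py itself is not included in this transcript. One simple way to count bases, sketched here under the assumption that the strand is given as a string, is to call s.count once for each letter of the alphabet ACGT. The function name is hypothetical.

```python
def base_frequency(dna):
    """Return a dictionary mapping each base in ACGT to its count in dna."""
    dna = dna.upper()
    counts = {}
    for base in "ACGT":
        counts[base] = dna.count(base)   # occurrences of this base in the strand
    return counts

print(base_frequency("AACGTT"))
```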
43
Why would you want to do this?
You can calculate the melting temperature of DNA from its base pair percentage.
References:
Breslauer et al. Proc. Natl. Acad. Sci. USA 83, 3746-3750
Baldino et al. Methods in Enzymol. 168, 761-777
44
Input/Output as string manipulation
eval evaluates a string as a Python expression.
Very general and can be used to turn strings into nearly any other Python data type.
The "Swiss army knife" of string conversion:
eval("3+4")
Can also use Python numeric type conversion functions:
int("4")
float("4")
But the string must be a numeric literal of the appropriate form, or you will get an error.
Can also convert numbers to strings with the str function.
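The conversions above can be compared side by side. In this sketch the sample values are made up, and the try/except shows the error raised when a string is not a numeric literal.

```python
total = eval("3+4")      # evaluates the expression inside the string
whole = int("4")         # string to integer
real = float("4")        # string to floating-point number
text = str(3.5)          # number back to a string

try:
    int("four")          # not a numeric literal, so int raises ValueError
    message = "converted"
except ValueError:
    message = "not a numeric literal"

print(total, whole, real, text, message)
```

Since eval runs whatever expression it is given, it is generally safer to use int or float when the input comes from someone you do not trust.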
45
String formatting with format
The built-in s.format() method is used to perform string formatting.
The {} are slots that show where the values will go.
You can "name" the values, or access them by their position (counting from zero).
>>> a = "Your name is {0} and your age is {age}"
>>> a.format("Mike", age=40)
'Your name is Mike and your age is 40'
46
Example 4: Better output for Calculating character (base) frequency BaseFrequency2.py
47
More on format
You can add an optional format specifier to each placeholder using a colon (:) to specify column widths, decimal places, and alignment.
The general format is: [[fill]align][sign][0][width][.precision][type]
where each part enclosed in [] is optional.
The width specifier specifies the minimum field width to use.
The align specifier is one of '<', '>', or '^' for left, right, and centered alignment within the field.
An optional fill character fill is used to pad the space.
48
More on format
For example:
name = "Elwood"
r = "{0:<10}".format(name) # r = 'Elwood    '
r = "{0:>10}".format(name) # r = '    Elwood'
r = "{0:^10}".format(name) # r = '  Elwood  '
r = "{0:=^10}".format(name) # r = '==Elwood=='
49
format: type specifier indicates the type of data.
50
More on format
The precision part supplies the number of digits of accuracy to use for decimals.
If a leading '0' is added to the field width for numbers, numeric values are padded with leading 0s to fill the space.
x = 42
r = '{0:10d}'.format(x) # r = '        42'
r = '{0:10x}'.format(x) # r = '        2a'
r = '{0:10b}'.format(x) # r = '    101010'
r = '{0:010b}'.format(x) # r = '0000101010'
y = 3.1415926
r = '{0:10.2f}'.format(y) # r = '      3.14'
r = '{0:10.2e}'.format(y) # r = '  3.14e+00'
r = '{0:+10.2f}'.format(y) # r = '     +3.14'
r = '{0:+010.2f}'.format(y) # r = '+000003.14'
r = '{0:+10.2%}'.format(y) # r = '  +314.16%'
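Width, alignment and precision are most useful together, for example to line up a small table. The sketch below formats base counts for a DNA string in columns; the function name and column widths are chosen here for illustration, it is not the course's BaseFrequency2.py, and it assumes the string is not empty.

```python
def frequency_table(dna):
    """Return a column-aligned table of base counts and percentages (sketch)."""
    dna = dna.upper()
    lines = ["{0:<6}{1:>6}{2:>9}".format("Base", "Count", "Percent")]
    for base in "ACGT":
        n = dna.count(base)
        # <6: left-aligned label, >6d: right-aligned integer, >9.1%: percentage with 1 decimal
        lines.append("{0:<6}{1:>6d}{2:>9.1%}".format(base, n, n / len(dna)))
    return "\n".join(lines)

print(frequency_table("AACGTTAA"))
```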
51
Example: FormatEg.py
52
Checkpoint: Write down the exact output for the following code
txt="{name}-{0}*{y}+{1}"
print(txt.format("cat","dog",name="hat",y="rat"))
print(txt.format(1,0,name=2,y=3))
print(txt.format(2,3))
53
Format to improve formatting BaseFrequency2.py
restuarant2.py
54