Upload: charlotte-brooks

Post on 12-Jan-2016


Announcements
All groups have been assigned.

Homework: By this evening, email everyone in your group and set up a meeting time to discuss Project 4.

Project 4 will be released tomorrow. You will have roughly 3 weeks to work on it.

How do I work in a team?
Communication: Teams that do not communicate well do poorly on the project.
Understanding the assignment: Teams that sit down and go over the assignment together do well.
Battle plan: Outline the project in your own English text.
Code together: Difficult parts of the project are best done together.

Parsing Text
The vast majority of the information on the internet is in text form: data, webpages, etc.

We want to transform the data into a more usable form. Examples we have seen thus far:
• Encoding of a matrix
• Encoding of a tree
• Project 3, changing text (encrypting and decrypting)

Example: Finding a nucleotide sequence

We can find DNA sequences of parasites on the internet (typically in databases).

Problem: we want to know whether a sequence of nucleotides is in a particular parasite. We want to know not only "yes" or "no" but also which parasite.

What the data looks like

>Schisto unique AA825099

gcttagatgtcagattgagcacgatgatcgattgaccgtgagatcgacga

gatgcgcagatcgagatctgcatacagatgatgaccatagtgtacg

>Schisto unique mancons0736

ttctcgctcacactagaagcaagacaatttacactattattattattatt

accattattattattattattactattattattattattactattattta

ctacgtcgctttttcactccctttattctcaaattgtgtatccttccttt

How are we going to do it?
First, we get the sequences in a big string.

Next, we find where the small subsequence is in the big string.

From there, we work backwards until we find ">", which marks the beginning of the line with the sequence name.

From there, we work forwards to the end of the line. The text from ">" to the end of the line is the name of the sequence. Yes, this is hard to get right.

Let's Review Some Python
string.find(sub) – returns the lowest index where the substring sub is found, or -1 if it is not found

string.find(sub, start) – same as above, except it only searches the slice string[start:]; the index returned still counts from the beginning of string

string.find(sub, start, end) – same as above, except it only searches the slice string[start:end]
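A quick check of the three forms on a made-up string. Note that the bounded searches report positions in the whole string, not positions inside the slice:

```python
text = "gattaca gattaca"

print(text.find("tac"))         # 3: first match in the whole string
print(text.find("tac", 4))      # 11: searches text[4:], reports a whole-string index
print(text.find("tac", 4, 10))  # -1: "tac" is not inside text[4:10]
print(text.find("xyz"))         # -1: no match anywhere
```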

Let's Review Some Python
string.rfind(sub) – returns the highest index where the substring sub is found, or -1 if it is not found

string.rfind(sub, start) – same as above, except it only searches the slice string[start:]; the index returned still counts from the beginning of string

string.rfind(sub, start, end) – same as above, except it only searches the slice string[start:end]
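A small made-up two-record string shows why the bounded form matters for parsing. Without an end bound, rfind finds the last ">" in the whole text, which may be the header of a later record:

```python
record = ">seqA\nacgt\n>seqB\nttgacc\n"

hit = record.find("cgt")          # 7: "cgt" sits in the first record's sequence
print(record.rfind(">"))          # 11: last ">" anywhere, the header of the wrong record
print(record.rfind(">", 0, hit))  # 0: last ">" before the hit, the right header
```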

Clicker Question: are these programs equivalent?

String = "two plus two is four"

String.find("two")    String.rfind("two")

A: yes
B: no
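One way to check an answer is simply to run both calls. The variable is renamed s here; the string is the one from the question:

```python
s = "two plus two is four"

print(s.find("two"))   # 0: the leftmost "two"
print(s.rfind("two"))  # 9: the rightmost "two"
```

The two calls return different indexes, so the programs are not equivalent.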

Let's solve the problem!

def findSequence(seq):
    sequencesFile = "parasites.txt"
    file = open(sequencesFile, "r")
    sequences = file.read()
    file.close()
    # Find where the subsequence starts in the big string
    seqloc = sequences.find(seq)
    if seqloc != -1:
        # Now, find the ">" with the name of the sequence
        nameloc = sequences.rfind(">", 0, seqloc)  # using rfind() here!!
        # The name runs from ">" up to the end of that line
        endline = sequences.find("\n", nameloc)
        print("Found in", sequences[nameloc:endline])
    else:
        print("Not found")
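To try the same steps without a parasites.txt file, here is a variant that takes the text as a parameter and returns the name instead of printing it. The name find_sequence_name is hypothetical, and it adds two guards the slide code skips: a match that comes before any ">" header, and a header on the very last line with no trailing "\n".

```python
def find_sequence_name(sequences, seq):
    """Return the ">" header line of the record that contains seq, or None."""
    seqloc = sequences.find(seq)
    if seqloc == -1:
        return None                          # seq does not occur at all
    nameloc = sequences.rfind(">", 0, seqloc)
    if nameloc == -1:
        return None                          # seq occurs before any ">" header
    endline = sequences.find("\n", nameloc)
    if endline == -1:
        endline = len(sequences)             # header is the last line, no "\n"
    return sequences[nameloc:endline]


# First lines of the sample records shown on the "What the data looks like" slide
sample = (">Schisto unique AA825099\n"
          "gcttagatgtcagattgagcacgatgatcgattgaccgtgagatcgacga\n"
          "gatgcgcagatcgagatctgcatacagatgatgaccatagtgtacg\n"
          ">Schisto unique mancons0736\n"
          "ttctcgctcacactagaagcaagacaatttacactattattattattatt\n")

print(find_sequence_name(sample, "tagatgtcagattgagcacgatgatcgattgacc"))  # >Schisto unique AA825099
print(find_sequence_name(sample, "acactattatt"))                         # >Schisto unique mancons0736
print(find_sequence_name(sample, "gggggggg"))                            # None
```

One limitation this shares with findSequence: the sequence lines keep their "\n" characters, so a subsequence that wraps onto the next line is not found.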

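Why -1?
If .find or .rfind don't find something, they return -1. If they return 0 or more, that is the index where the search string was found. A real match can start at index 0, so 0 can't mean "not found"; -1 is never returned as a match position, so it can. This is also why the code tests seqloc != -1 rather than just if seqloc:.

Note: last week we saw the urllib module. It contains a function that lets you download a file from the internet. How might you modify your program to first download the file from the internet before opening it?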
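One possible answer, sketched with Python 3's urllib.request.urlopen. The helper name download_text and the URL in the comment are hypothetical, and the sketch assumes the file is UTF-8 (or plain ASCII) text:

```python
from urllib.request import urlopen


def download_text(url):
    """Download url and return its contents as a str (assumes UTF-8 text)."""
    with urlopen(url) as response:
        return response.read().decode("utf-8")


# Hypothetical usage inside findSequence, replacing the open/read/close lines:
#     sequences = download_text("https://example.com/parasites.txt")
```

Downloading straight into a string skips the intermediate file; saving a local copy first (as the slide suggests) has the advantage that later runs don't need the network.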

Running the program

>>> findSequence("tagatgtcagattgagcacgatgatcgattgacc")

Found in >Schisto unique AA825099

>>> findSequence("agtcactgtctggttgaaagtgaatgcttccaccgatt")

Found in >Schisto unique mancons0736

One More Note on Parsing
We saw how to read a file as a string or a list of strings.

We saw how to leverage the way the data was structured to find the specific information we were interested in.

What if there are many pieces we want to extract?

Revisiting Split
String.split(delimiter) breaks the string String into parts, separated by the delimiter:
print("a b c d".split(" "))

Would print: ['a', 'b', 'c', 'd']

• Some quirky cases for string.split()
  • Explained in pre lab 10
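A few cases that often surprise people. These are illustrative and may not be the exact ones pre lab 10 covers:

```python
print("a b c d".split(" "))   # ['a', 'b', 'c', 'd']
print("a  b".split(" "))      # ['a', '', 'b']: two spaces in a row give an empty string
print(" a b ".split(" "))     # ['', 'a', 'b', '']: leading/trailing delimiters do too
print("a  b".split())         # ['a', 'b']: no argument splits on any run of whitespace
print("a,b,,c".split(","))    # ['a', 'b', '', 'c']
```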

Why is this useful?
When reading in a file, we may have many interesting data items on a given line (or in the file).

Example: Lab 10

How to glue everything together

Step 1) get some interesting data

Step 2) open the file

Step 3) read the data from the file, either as one large string or a list of strings

Step 4) break this string (or list of strings) into the data we want (rfind, find, split)
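A minimal sketch of steps 2 through 4, assuming a space-separated text file. As a stand-in for step 1, it writes a tiny made-up data file to a temporary location; the function name read_items and the file contents are hypothetical.

```python
import os
import tempfile


def read_items(path):
    """Steps 2-4: open a file, read it as one string, split it into items."""
    with open(path, "r") as f:            # Step 2: open the file
        text = f.read()                   # Step 3: read it as one big string
    rows = []
    for line in text.split("\n"):         # Step 4: split into lines...
        if line:                          # skip the "" left after a final "\n"
            rows.append(line.split(" "))  # ...and each line into items
    return rows


# Step 1 stand-in: a tiny made-up data file in a temporary location
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("Atlanta 54 F\nBoston 41 F\n")
    path = tmp.name

items = read_items(path)
print(items)   # [['Atlanta', '54', 'F'], ['Boston', '41', 'F']]
os.remove(path)
```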

Abstract Example: Getting values from a text file

text = file.read()
lines = text.split('\n')          # list of strings
for element in lines:
    items = element.split(' ')    # list of strings

Concrete Example

foo = "bab cad eag"
elem = foo.split(" ")
for i in elem:
    print(i.split("a"))

Prints:
['b', 'b']
['c', 'd']
['e', 'g']

CQ: How can I parse all the words in a file?

Assume we have read the file in as one big string (we used file.read()) and the file contains no punctuation.

A) First split on "\n", and for each element in the result, split on " "

B) Only split on " "
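Running both options on a two-line string ("This is" and "a file", the same text used in the clicker example that follows) shows the difference. The no-argument split() at the end is an extra shortcut, not one of the answer choices:

```python
content = "This is\na file"   # what file.read() would return for a two-line file

# Option A: split on "\n", then split each line on " "
words_a = []
for line in content.split("\n"):
    words_a.extend(line.split(" "))
print(words_a)                # ['This', 'is', 'a', 'file']

# Option B: split only on " ": the newline stays glued inside one item
words_b = content.split(" ")
print(words_b)                # ['This', 'is\na', 'file']

# Extra: split() with no argument splits on any run of whitespace, newlines included
print(content.split())        # ['This', 'is', 'a', 'file']
```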

Concrete Clicker Example

file = open("text.txt", "r")
content = file.read()
file.close()
line = content.split("\n")
for i in line:
    print(i.split(" "))

text.txt contains:
This is
a file

Prints:
['This', 'is']
['a', 'file']

Example: Get the temperature

The weather is always available on the Internet.

Can we write a function that takes the current temperature out of a source like http://www.ajc.com/weather or http://www.weather.com?

The Internet is mostly text
Web pages are actually text in the format called HTML (HyperText Markup Language).

HTML isn't a programming language; it's an encoding language. It defines a set of meanings for certain characters, but one can't program in it.

We can ignore the HTML meanings for now and just look at patterns in the text.

Where's the temperature?
The word "temperature" doesn't really show up. But the temperature always follows the word "Currently", and always comes before the "<b>&deg;</b>":

<td ><img src="/shared-local/weather/images/ps.gif" width="48" height="48" border="0"><font size=-2><br></font><font size="-1" face="Arial, Helvetica, sans-serif"><b>Currently</b><br>
Partly sunny<br>
<font size="+2">54<b>&deg;</b></font><font face="Arial, Helvetica, sans-serif" size="+1">F</font></font></td>
</tr>

We can use the same algorithm we've seen previously:

Grab the content out of a file in a big string. (We've saved the HTML page previously; we've also seen how to grab it directly.)

Find the starting indicator ("Currently").

Find the ending indicator ("<b>&deg;").

Read the characters just before the ending indicator, back to the nearest ">".

def findTemperature():
    weatherFile = "ajc-weather.html"
    file = open(weatherFile, "r")
    weather = file.read()
    file.close()
    # Find the Temperature
    curloc = weather.find("Currently")
    if curloc != -1:
        # Now, find the "<b>&deg;" following the temp
        temploc = weather.find("<b>&deg;", curloc)
        # The temperature starts right after the last ">" before temploc
        tempstart = weather.rfind(">", 0, temploc)
        print("Current temperature:", weather[tempstart+1:temploc])
    else:
        print("Can't find the temp")
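To try the same steps without ajc-weather.html, here is a variant that takes the page as a string and returns the temperature instead of printing it. The name extract_temperature is hypothetical. It adds one check the slide code skips: if "<b>&deg;" never appears after "Currently", find returns -1 and the slice would be meaningless.

```python
def extract_temperature(weather):
    """Return the temperature text from an HTML string, or None if not found."""
    curloc = weather.find("Currently")
    if curloc == -1:
        return None
    temploc = weather.find("<b>&deg;", curloc)
    if temploc == -1:        # extra guard: no degree marker after "Currently"
        return None
    # The number starts right after the last ">" before the degree marker
    tempstart = weather.rfind(">", 0, temploc)
    return weather[tempstart + 1:temploc]


# A fragment of the HTML shown on the "Where's the temperature?" slide
snippet = ('<b>Currently</b><br>\n'
           'Partly sunny<br>\n'
           '<font size="+2">54<b>&deg;</b></font>')
print(extract_temperature(snippet))   # 54
```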

Homework
Email your group members.

Read through the Project 4 description when it becomes available.

Announcements

Dictionaries in Python
Useful analogy: an actual dictionary!

English dictionaries provide an association between a word and a definition. We use the word to look up the definition. Given a definition, it would be very hard to look up the word.

Dictionaries in Python
Much like a dictionary for the English language, Python dictionaries create an association between a key and a value.
The key corresponds to a word in our analogy.
The value corresponds to a definition.

Dictionary Syntax
A dictionary is a collection of elements. Each element is a key/value pair:

key : value

Just like a list is defined by [ ], a dictionary is defined by { }:
{'key1': value1, 'key2': value2, 'key3': value3}

Keys
A key can be any immutable type (we will consider two types: strings and integers).

Much like [index] is used to select an element from a list, for a dictionary we use [key]:
A = {'key1': value1, 'key2': value2, 'key3': value3}
print(A['key2'])

Example: Simple Phone Book

phoneBook = {'Luke': '123 4567',
             'Dr. Martino': '456 7890'}

Names are keys, phone numbers are values.

def lookup(key):
    return phoneBook[key]

lookup('Dr. Martino')
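The lookup function above raises a KeyError when the name isn't in the book. A hedged variant using dict.get, with 'not listed' as an arbitrary default and 'Leia' as a made-up missing name. The last two lines echo the dictionary analogy: in searches the keys (words), not the values (definitions).

```python
phoneBook = {'Luke': '123 4567',
             'Dr. Martino': '456 7890'}


def lookup(key):
    # phoneBook[key] raises KeyError for an unknown name;
    # .get(key, default) returns the default instead
    return phoneBook.get(key, 'not listed')


print(lookup('Dr. Martino'))     # 456 7890
print(lookup('Leia'))            # not listed
print('Luke' in phoneBook)       # True: "in" checks the keys
print('123 4567' in phoneBook)   # False: values are not searched
```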

Clicker Question: are these programs equivalent?

A = ['mike', 'mary', 'marty']
print(A[1])

A = {0: 'mike', 1: 'mary', 2: 'marty'}
print(A[1])

A: yes
B: no

Clicker Question: are these programs equivalent?

A = ['mike', 'mary', 'marty']
print(A[1])

A = {1: 'mary', 2: 'marty', 0: 'mike'}
print(A[1])

A: yes
B: no
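Both clicker questions can be checked directly. The list and both dictionaries give the same answer for [1], and the order of keys in a dictionary literal doesn't matter. They are still not interchangeable in general: a list accepts negative indexes, while a dictionary only knows the keys it was given.

```python
A_list = ['mike', 'mary', 'marty']
A_dict1 = {0: 'mike', 1: 'mary', 2: 'marty'}
A_dict2 = {1: 'mary', 2: 'marty', 0: 'mike'}

print(A_list[1], A_dict1[1], A_dict2[1])   # mary mary mary
print(A_dict1 == A_dict2)                  # True: key order in the literal doesn't matter

print(A_list[-1])                          # marty: lists accept negative indexes
print(-1 in A_dict1)                       # False: -1 was never used as a key
```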

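Key Differences from Lists
Lists are ordered: the index is implicit, based on the list ordering.

Dictionaries are unordered: keys are specified explicitly and do not depend on order. (Since Python 3.7, dictionaries remember the order in which keys were inserted, but you still select elements by key, never by position.)

Lists are useful for storing ordered data; dictionaries are useful for storing relational data. Motivating example from the book: databases!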

Updating a Dictionary
Much like a list, we can assign to a dictionary.

Abstract:
dictionary[key] = newValue

Concrete Example:
A = {0: 'mike', 1: 'mary', 2: 'marty'}
print(A[1])
A[1] = 'alex'
print(A[1])

Adding to a Dictionary
Much like appending to a list, we can add a new element to a dictionary by assigning to a key that isn't there yet.

Abstract:
dictionary[newKey] = newValue

Concrete Example:
A = {0: 'mike', 1: 'mary', 2: 'marty'}
print(A[1])
A[3] = 'alex'
print(A)

The last line prints: {0: 'mike', 1: 'mary', 2: 'marty', 3: 'alex'}
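The same assignment syntax does both jobs: whether it updates or adds depends only on whether the key already exists. A short sketch combining the two slides (the printed order assumes Python 3.7 or later):

```python
A = {0: 'mike', 1: 'mary', 2: 'marty'}

A[1] = 'alex'   # key 1 exists: its value is replaced (update)
A[3] = 'alex'   # key 3 is new: a key/value pair is added
print(A)        # {0: 'mike', 1: 'alex', 2: 'marty', 3: 'alex'}
print(len(A))   # 4
```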

Clicker Question: What is the output of this code?

A = {0: 'mike', 1: 'mary', 2: 'marty', 'marty': 2, 'mike': 0, 'mary': 1}
A[3] = 'mary'
A['mary'] = 5
A[2] = A[0] + A[1]
print(A)

A: {'mike': 0, 'marty': 2, 3: 'mary', 'mary': 5, 2: 'mikemary', 1: 'mary', 0: 'mike'}

B: {'mike': 0, 'marty': 2, 'mary': 3, 'mary': 5, 2: 'mikemary', 1: 'mary', 0: 'mike'}

C: {'mike': 0, 'marty': 2, 'mary': 3, 'mary': 5, 2: 1, 1: 'mary', 0: 'mike'}
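Running the clicker code settles it. Dictionary equality ignores key order, so comparing against option A works regardless of how the result is printed. Options B and C list the key 'mary' twice, which a single dictionary can't hold.

```python
A = {0: 'mike', 1: 'mary', 2: 'marty', 'marty': 2, 'mike': 0, 'mary': 1}
A[3] = 'mary'        # 3 is a new key: adds 3: 'mary'
A['mary'] = 5        # 'mary' already exists: its value 1 becomes 5
A[2] = A[0] + A[1]   # 'mike' + 'mary' -> 'mikemary' replaces 'marty'

option_a = {'mike': 0, 'marty': 2, 3: 'mary', 'mary': 5,
            2: 'mikemary', 1: 'mary', 0: 'mike'}
print(A == option_a)   # True
```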

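Printing a Dictionary

A = {0: 'mike', 1: 'mary', 2: 'marty'}
for k, v in A.items():
    print(k, ":", v)
Prints each key/value pair, for example: 2 : marty  1 : mary  0 : mike

A = {0: 'mike', 1: 'mary', 2: 'marty'}
for k in A:
    print(k)
Prints each key, for example: 2 1 0

The order depends on the Python version: Python 3.7 and later always print in insertion order (0, 1, 2). (In Python 2, A.iteritems() is an alternative spelling; Python 3 has only A.items().)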
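When the printing order matters, it's safer not to rely on the dictionary's own order. A small sketch: list(A) and A.values() expose the keys and values, and sorted() gives a predictable order on any Python version (the expected outputs in the first two comments assume Python 3.7 or later):

```python
A = {0: 'mike', 1: 'mary', 2: 'marty'}

print(list(A))            # [0, 1, 2]  (keys, in insertion order)
print(list(A.values()))   # ['mike', 'mary', 'marty']

# sorted() returns the keys in a fixed order, independent of the dict itself
for k in sorted(A, reverse=True):
    print(k, ":", A[k])   # 2 : marty, then 1 : mary, then 0 : mike
```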

Project 4: Frequency Analysis

Intuition: We can leverage a dictionary to calculate the number of times a particular letter occurs in a message.

We can use characters as the keys.

The number of times that character occurs is the value.

Increment the value each time we see a character. Initially, the value starts at 0.
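A minimal sketch of the counting idea, not the Project 4 specification: it counts every character, including spaces and punctuation, and treats upper- and lowercase letters as different keys. The function name letter_counts is hypothetical. (The standard library's collections.Counter packages the same pattern.)

```python
def letter_counts(message):
    """Return a dictionary mapping each character in message to its count."""
    counts = {}
    for ch in message:
        if ch not in counts:
            counts[ch] = 0   # first time we see ch: the value starts at 0
        counts[ch] += 1      # increment each time we see ch
    return counts


print(letter_counts("banana"))   # {'b': 1, 'a': 3, 'n': 2}
```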

Some Additional Notation: Pairs in Python

We can create pairs in Python. Example: pair = ('name', 3)
Such pairs are called tuples (see page 291).

Tuples support [] for selecting their elements.

Tuples are immutable (like strings).

Further reading (section 5.3): http://docs.python.org/tutorial/datastructures.html#tuples-and-sequences

Tuples
We can think of tuples as immutable lists: they do not support assignment.

Example:
A = ('me', 5, 32, 'joe')
print(A[0])
print(A[3])
A[2] = 4   <--- this throws an error
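The error in the last line is a TypeError, which can be caught to see the message. To "change" a tuple you build a new one, for example from slices of the old one:

```python
A = ('me', 5, 32, 'joe')
print(A[0], A[3])             # me joe

try:
    A[2] = 4                  # raises TypeError: tuples don't support item assignment
except TypeError as err:
    print("Error:", err)

# Build a new tuple from slices of the old one instead
B = A[:2] + (4,) + A[3:]
print(B)                      # ('me', 5, 4, 'joe')
```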

Creating a dictionary from a list

Python provides the dict function to create a dictionary out of a list of pairs.
Example: dict([(0, 'mike'), (1, 'mary'), (2, 'marty')])

Why do I care? We can leverage list-creation shortcuts to populate dictionaries!

Example: dict([(x, x**2) for x in range(10)])
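Both slide examples, run and checked. The last two lines show related shortcuts that go beyond the slide: zip() pairs up two lists, and a dict comprehension builds the dictionary without the intermediate list.

```python
names = dict([(0, 'mike'), (1, 'mary'), (2, 'marty')])
print(names[1])                         # mary

squares = dict([(x, x**2) for x in range(10)])
print(squares[7])                       # 49
print(len(squares))                     # 10

# Two related shortcuts (not on the slide):
print(dict(zip(['a', 'b'], [1, 2])))    # {'a': 1, 'b': 2}
print({x: x**2 for x in range(4)})      # {0: 0, 1: 1, 2: 4, 3: 9}
```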