making connections

1 CTA Tables
   general transit feed specification
   stop names and stop times
   storing the connections in a dictionary
2 CTA Schedules
   finding connections between stops
   sparse matrices in SciPy
   visualizing a matrix
3 Adjacency Matrices
   matrices as dictionaries of dictionaries
   searching the adjacency matrix

MCS 275 Lecture 40
Programming Tools and File Management
Jan Verschelde, 19 April 2017
Programming Tools (MCS 275) making connections L-40 19 April 2017 1 / 41
1 CTA Tables: general transit feed specification
GTFS of our CTA
We can download the schedules of the CTA: http://www.transitchicago.com/developers/gtfs.aspx
GTFS, the General Transit Feed Specification, is an open format for packaging scheduled service data.
A GTFS feed is a series of text files whose lines hold data separated by commas (CSV format).
Each file is a table in a relational database.
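Because each file is a CSV table whose first line names the columns, the csv module of the Python standard library can read it and remove the quotes around text fields. A minimal sketch: the header for stop_times.txt below is an assumption (it follows the GTFS field names; check it against the first line of the downloaded file), and the two data lines are the ones shown later in this lecture.

```python
import csv
import io

# Assumed header of stop_times.txt, followed by two data lines.
SAMPLE = """trip_id,arrival_time,departure_time,stop_id,stop_sequence,stop_headsign,pickup_type,shape_dist_traveled
22043803629,07:38:30,07:38:30,30085,22,"UIC",0,98455
22043803629,07:40:30,07:40:30,30069,23,"UIC",0,100813
"""


def read_table(textfile):
    """Returns the lines of a CSV table as a list of dictionaries,
    keyed by the column names on the first line."""
    return list(csv.DictReader(textfile))


rows = read_table(io.StringIO(SAMPLE))
for row in rows:
    # csv removed the quotes around "UIC"; all values are still strings
    print(row['stop_id'], row['arrival_time'], row['stop_headsign'])
```

Every value comes back as a string, so stop ids still need int() before they are compared with numbers.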
some tables
stops.txt: stop locations for buses and trains
routes.txt: list of routes, with unique identifiers
trips.txt: information about each trip made by a vehicle
stop_times.txt: scheduled arrival and departure times for each stop on each trip
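The tables share identifiers: the stop_id on a line of stop_times.txt refers to a line of stops.txt. A minimal sketch of that relational join with a dictionary; the two-column tables below are hypothetical, trimmed versions of the real files, filled with the stop ids, names, and head sign quoted later in this lecture, and join_stop_names is a made-up helper name.

```python
import csv
import io

# Hypothetical, trimmed tables: the real files have more columns.
STOPS = """stop_id,stop_name
30085,"Clinton-Blue"
30069,"UIC-Halsted"
"""
STOP_TIMES = """trip_id,arrival_time,stop_id,stop_headsign
22043803629,07:38:30,30085,"UIC"
22043803629,07:40:30,30069,"UIC"
"""


def join_stop_names(stops_file, times_file):
    """Pairs each line of stop_times with the name of its stop,
    as a join of the two tables on stop_id would."""
    names = {row['stop_id']: row['stop_name']
             for row in csv.DictReader(stops_file)}
    return [(row['arrival_time'], names.get(row['stop_id'], '?'),
             row['stop_headsign'])
            for row in csv.DictReader(times_file)]


for item in join_stop_names(io.StringIO(STOPS), io.StringIO(STOP_TIMES)):
    print(item)
```

Building the dictionary names once makes each lookup constant time, instead of rescanning stops.txt for every line of stop_times.txt.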
1 CTA Tables: stop names and stop times
finding a stop name
$ python3 ctastopname.py
opening CTA/stops.txt ...
give a stop id : 3021
skipping line 0
3021 has name "California & Augusta"
The first line of stops.txt holds the column names, so line 0 is skipped. The script looks for the line
3021,3021,"California & Augusta",41.89939053,-87.69688045,0,,1
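ctastopname.py

FILENAME = 'CTA/stops.txt'
print('opening', FILENAME, '...')
DATAFILE = open(FILENAME, 'r')
STOPID = int(input('give a stop id : '))  # input returns a string
COUNT = 0
STOPNAME = None
while True:
    LINE = DATAFILE.readline()
    if LINE == '':                # end of file
        break
    L = LINE.split(',')
    try:
        if int(L[0]) == STOPID:   # column 0 is the stop id
            STOPNAME = L[2]       # column 2 is the stop name
            break
    except (IndexError, ValueError):   # for example the header line
        print('skipping line', COUNT)
    COUNT = COUNT + 1
DATAFILE.close()
print(STOPID, 'has name', STOPNAME)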
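Splitting on commas keeps the double quotes around the name, as the output above shows, and would shift the columns for any name that contains a comma. A sketch of the same lookup with csv.reader, which handles quoted fields; stop_name is a hypothetical helper, the data line is the one quoted above, and its header line is an assumption that only serves to be skipped.

```python
import csv
import io

# Assumed header line, followed by the data line quoted above.
SAMPLE = '''stop_id,stop_code,stop_name,stop_lat,stop_lon,location_type,parent_station,wheelchair_boarding
3021,3021,"California & Augusta",41.89939053,-87.69688045,0,,1
'''


def stop_name(datafile, stopid):
    """Returns the name of the stop with id stopid in the CSV
    table datafile, or None if the id does not occur."""
    reader = csv.reader(datafile)
    next(reader)                  # skip the header line
    for fields in reader:
        if fields and int(fields[0]) == stopid:
            return fields[2]      # csv has already removed the quotes
    return None


print(stop_name(io.StringIO(SAMPLE), 3021))   # California & Augusta
```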
finding head signs
Given the id of a stop, we look for the head signs of all CTA vehicles that stop there.
$ python3 ctastoptimes.py
opening CTA/stop_times.txt ...
give a stop id : 3021
skipping line 0
adding "63rd Pl/Kedzie"
adding "Kedzie/Van Buren"
['"63rd Pl/Kedzie"', '"Kedzie/Van Buren"']
We scan the lines of stop_times.txt for the given stop id and collect the distinct head signs on those lines.
ctastoptimes.py

FILENAME = 'CTA/stop_times.txt'
print('opening', FILENAME, '...')
DATAFILE = open(FILENAME, 'r')
STOPID = int(input('give a stop id : '))
COUNT = 0
TIMES = []      # distinct head signs seen at the stop
while True:
    LINE = DATAFILE.readline()
    if LINE == '':
        break
    L = LINE.split(',')
    try:
        if int(L[3]) == STOPID:        # column 3 is the stop id
            if L[5] not in TIMES:      # column 5 is the head sign
                print('adding', L[5])
                TIMES.append(L[5])
    except (IndexError, ValueError):   # for example the header line
        print('skipping line', COUNT)
    COUNT = COUNT + 1
DATAFILE.close()
print(TIMES)
1 CTA Tables: storing the connections in a dictionary
finding connections
The file stop_times.txt has lines such as
22043803629,07:38:30,07:38:30,30085,22,"UIC",0,98455
22043803629,07:40:30,07:40:30,30069,23,"UIC",0,100813
Stops 30085 ("Clinton-Blue") and 30069 ("UIC-Halsted") are connected via the stop head sign "UIC".
In a dictionary D we store D[(30085,30069)] = "UIC".
ctaconnections.py
The initialization and start of the loop:
FILENAME = 'CTA/stop_times.txt'
print('opening', FILENAME, '...')
DATAFILE = open(FILENAME, 'r')
COUNT = 0
PREV_STOP = -1    # stop id at the start of the current run
PREV_HEAD = ''    # head sign of the current run
D = {}
while True:
    LINE = DATAFILE.readline()
    if LINE == '':
        break
    L = LINE.split(',')
ctaconnections.py
Updating the dictionary D with L:
    try:
        (STOP, HEAD) = (int(L[3]), L[5])
        if PREV_STOP == -1:
            (PREV_STOP, PREV_HEAD) = (STOP, HEAD)
        else:
            if PREV_HEAD == HEAD:
                D[(PREV_STOP, STOP)] = HEAD
            else:
                (PREV_STOP, PREV_HEAD) = (STOP, HEAD)
    except (IndexError, ValueError):   # for example the header line
        print('skipping line', COUNT)
    COUNT = COUNT + 1
DATAFILE.close()
print(D, len(D))
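The loop above connects stops by comparing the head signs on consecutive lines. A different grouping, named plainly as a variant and not the lecture's method: connect two stops only when they appear on consecutive lines of the same trip_id (column 0). The sketch assumes, as in the two lines shown earlier, that the lines of each trip are consecutive and ordered by stop_sequence; trip_connections is a hypothetical name and the header line is assumed.

```python
import csv
import io

# Assumed header line, followed by the two lines shown earlier.
SAMPLE = '''trip_id,arrival_time,departure_time,stop_id,stop_sequence,stop_headsign,pickup_type,shape_dist_traveled
22043803629,07:38:30,07:38:30,30085,22,"UIC",0,98455
22043803629,07:40:30,07:40:30,30069,23,"UIC",0,100813
'''


def trip_connections(datafile):
    """Returns a dictionary D with D[(i, j)] = head sign whenever
    stop j directly follows stop i on the same trip.
    Assumes the lines of each trip are consecutive and sorted."""
    reader = csv.reader(datafile)
    next(reader)                           # skip the header line
    result = {}
    (prev_trip, prev_stop) = (None, None)
    for fields in reader:
        (trip, stop, head) = (fields[0], int(fields[3]), fields[5])
        if trip == prev_trip:              # same trip: consecutive stops
            result[(prev_stop, stop)] = head
        (prev_trip, prev_stop) = (trip, stop)
    return result


print(trip_connections(io.StringIO(SAMPLE)))   # {(30085, 30069): 'UIC'}
```

Keying on the trip keeps two different trips that happen to share a head sign from being chained together.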
a sparse matrix
There are 11430 lines in stops.txt. Except for the first line, which holds the column names, every line is a stop. Viewing each stop as a node in a graph, there are 11429 nodes. The adjacency matrix has 11,429 rows and 11,429 columns, or 130,622,041 elements. The dictionary stores 583,279 elements, less than 0.5% of the 11,429 × 11,429 possible elements.
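The size and the density follow from the two counts above:

```latex
11429 \times 11429 = 130\,622\,041,
\qquad
\frac{583\,279}{130\,622\,041} \approx 0.00447 = 0.447\,\%.
```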
2 CTA Schedules: finding connections between stops
connecting the stops
$ python3 ctaconnectstops.py
opening CTA/stop_times.txt ...
loading a big file, be patient ...
skipping line 0
573036 connections
give start stop id : 30085
 give end stop id : 30069
30085 and 30069 are connected by "UIC"
the function stopdict
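FILENAME = 'CTA/stop_times.txt'

def stopdict(name):
    """
    Opens the file with the given name.
    The file contains scheduled arrival and departure
    times for each stop on each trip.  Returns a dictionary D
    with keys (i, j), where i and j are stop ids:
    (i, j) is a key only if i and j are connected by a trip,
    and then D[(i, j)] is the head sign of that trip.
    """
    print('opening', name, '...')
    print('loading a big file, be patient ...')
    result = {}
    (prev_stop, prev_head) = (-1, '')
    with open(name, 'r') as datafile:
        for (count, line) in enumerate(datafile):
            fields = line.split(',')
            try:
                (stop, head) = (int(fields[3]), fields[5])
            except (IndexError, ValueError):
                print('skipping line', count)
                continue
            if prev_stop != -1 and prev_head == head:
                result[(prev_stop, stop)] = head
            else:
                (prev_stop, prev_head) = (stop, head)
    return result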
the function main()
def main():
    """
    Creates a dictionary from the file
    stop_times.txt and prompts the user
    for a start and end stop id.
    The result of the dictionary query
    tells whether the stops are connected.
    """
    conn = stopdict(FILENAME)
    print(len(conn), 'connections')
    i = int(input('give start stop id : '))
    j = int(input(' give end stop id : '))
    outs = str(i) + ' and ' + str(j)
    if (i, j) not in conn:
        print(outs + ' are not connected')
    else:
        print(outs + ' are connected by ' + conn[(i, j)])

if __name__ == '__main__':
    main()
2 CTA Schedules: sparse matrices in SciPy
sparse matrices
>>> from scipy import sparse
To store an adjacency matrix like the dictionary D[(i,j)], we use the COOrdinate (COO) format:
>>> import numpy as np
>>> from scipy.sparse import coo_matrix
>>> row = np.array([0, 3, 1, 0])
>>> col = np.array([0, 3, 1, 2])
>>> data = np.array([4, 5, 7, 9])
>>> A = coo_matrix((data, (row, col)), shape=(4, 4))
>>> A.todense()
matrix([[4, 0, 9, 0],
        [0, 7, 0, 0],
        [0, 0, 0, 0],
        [0, 0, 0, 5]])
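The coordinate format stores a matrix as three parallel arrays: entry k has value data[k] at position (row[k], col[k]). A minimal pure-Python sketch of what todense() computes from them; coo_to_dense is a hypothetical name, not part of SciPy. It adds up values given for the same position, which matches how SciPy treats duplicate entries of a coo_matrix when converting it.

```python
def coo_to_dense(data, row, col, shape):
    """Returns, as a list of lists, the dense matrix of the given
    shape whose entry (row[k], col[k]) is data[k].
    Values given for the same position are added up."""
    (nrows, ncols) = shape
    dense = [[0] * ncols for _ in range(nrows)]
    for (value, i, j) in zip(data, row, col):
        dense[i][j] += value
    return dense


for line in coo_to_dense([4, 5, 7, 9], [0, 3, 1, 0], [0, 3, 1, 2], (4, 4)):
    print(line)
```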
SciPy session continued
>>> B = A*A
>>> B.todense()
matrix([[16,  0, 36,  0],
        [ 0, 49,  0,  0],
        [ 0,  0,  0,  0],
        [ 0,  0,  0, 25]])
Property of adjacency matrices A: if $(A^k)_{i,j} \neq 0$, then nodes i and j are connected by a path of length k. For a matrix of zeroes and ones, $(A^k)_{i,j}$ counts these paths (nodes may repeat).
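A product of sparse matrices stays cheap because only nonzero entries meet: entry (i, k) of the left factor combines only with the nonzeros in row k of the right factor. A minimal pure-Python sketch on a dictionary of keys, as in D[(i,j)]; sparse_product is a hypothetical name, and the input is the matrix A of the session above.

```python
def sparse_product(left, right):
    """Returns the product of two sparse matrices stored as
    dictionaries of keys: left[(i, k)] holds a nonzero entry."""
    # group the nonzeros of right by their row index
    rows_of_right = {}
    for ((k, j), value) in right.items():
        rows_of_right.setdefault(k, []).append((j, value))
    product = {}
    for ((i, k), value) in left.items():
        # only the nonzeros in row k of right meet entry (i, k) of left
        for (j, other) in rows_of_right.get(k, []):
            product[(i, j)] = product.get((i, j), 0) + value * other
    return product


A = {(0, 0): 4, (3, 3): 5, (1, 1): 7, (0, 2): 9}   # the matrix A above
print(sorted(sparse_product(A, A).items()))
# [((0, 0), 16), ((0, 2), 36), ((1, 1), 49), ((3, 3), 25)]
```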
dictionary of keys sparse matrices
dok_matrix is a sparse matrix based on a dictionary of keys:
it allows efficient access to individual elements,
and it can be efficiently converted to a coo_matrix.
>>> from scipy import sparse
>>> A = sparse.dok_matrix((4,4))
>>> A[1,2] = 1
>>> B = sparse.coo_matrix(A)
>>> B.todense()
matrix([[ 0., 0., 0., 0.],
        [ 0., 0., 1., 0.],
        [ 0., 0., 0., 0.],
        [ 0., 0., 0., 0.]])
session continued
>>> B.row
array([1], dtype=int32)
>>> B.col
array([2], dtype=int32)
>>> B.data
array([ 1.])
>>> B.nnz
1
The attributes row, col, data, and nnz hold, respectively, the row indices, the column indices, the corresponding values, and the number of stored nonzeros.
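Converting a dictionary of keys to coordinate format amounts to listing its keys and values. A pure-Python sketch mirroring the session above; dok_to_coo is a hypothetical name, and sorting the keys by row and column is a choice of this sketch, not a statement about the order SciPy uses.

```python
def dok_to_coo(entries):
    """Converts a dictionary of keys, entries[(i, j)] = value,
    into the coordinate lists row, col, and data,
    sorted by row index and then by column index."""
    (row, col, data) = ([], [], [])
    for (i, j) in sorted(entries):
        row.append(i)
        col.append(j)
        data.append(entries[(i, j)])
    return (row, col, data)


(row, col, data) = dok_to_coo({(1, 2): 1.0})
print(row, col, data, len(data))   # [1] [2] [1.0] 1, where len(data) plays the role of nnz
```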
2 CTA Schedules: visualizing a matrix
the script spy_matrixplot.py
import numpy as np
import matplotlib.pyplot as plt
from scipy import sparse

r = 0.1  # ratio of nonzeroes
n = 100  # dimension of the matrix
A = (np.random.rand(n, n) < r).astype(int)  # each entry is 1 with probability r
S = sparse.coo_matrix(A)
x = S.row; y = S.col
fig = plt.figure()
ax = fig.add_subplot(111)
ax.plot(x, y, '.')
plt.show()
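For small matrices, the sparsity pattern can also be checked in a terminal. A minimal sketch with a hypothetical helper text_spy, which takes the nonzero positions as a set of (row, column) pairs, such as set(zip(S.row, S.col)) would give, and draws row i as line i from the top. The example uses the positions of the 4-by-4 coo_matrix from the SciPy session.

```python
def text_spy(nonzeros, nrows, ncols):
    """Returns the lines of a character plot of a sparse matrix:
    '*' marks a position (i, j) in the set nonzeros, '.' a zero.
    Row i is drawn as line i, counted from the top."""
    return [''.join('*' if (i, j) in nonzeros else '.' for j in range(ncols))
            for i in range(nrows)]


for line in text_spy({(0, 0), (0, 2), (1, 1), (3, 3)}, 4, 4):
    print(line)
```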
the matrix plot for the CTA
the script ctamatrixplot.py
# L-40 MCS 275 Wed 20 Apr 2016 : ctamatrixplot.py

# This script creates a sparse matrix A,
# which is the adjacency matrix of the stops:
# A[i,j] = 1 if stops i and j are connected.

from scipy import sparse
import matplotlib.pyplot as plt

filename = 'CTA/stop_times.txt'
print('opening', filename, '...')
datafile = open(filename, 'r')

# The stop ids (such as 30085) serve directly as indices,
# so the dimension n is known only after the scan.
pairs = set()

the script continued

prev_id = -1; prev_hd = ''
for line in datafile:
    L = line.split(',')
    try:
        stop_id = int(L[3]); hd = L[5]
    except (IndexError, ValueError):
        continue      # for example the header line
    if prev_id == -1:
        (prev_id, prev_hd) = (stop_id, hd)
    elif prev_hd == hd:
        pairs.add((prev_id, stop_id))
    else:
        (prev_id, prev_hd) = (stop_id, hd)
datafile.close()

n = 1 + max(max(pair) for pair in pairs)
A = sparse.dok_matrix((n, n))
for (i, j) in pairs:
    A[i, j] = 1

making the plot

B = sparse.coo_matrix(A)
x = B.row; y = B.col
fig = plt.figure()
ax = fig.add_subplot(111)
ax.set_xlim(-1, n)
ax.set_ylim(-1, n)
ax.plot(x, y, 'b.')
plt.show()
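Using the stop ids directly as indices makes the dimension of the matrix at least as large as the largest stop id, while the rows of unused ids stay empty. A sketch of the alternative of numbering the stops consecutively, so that the dimension equals the number of distinct stops; relabel is a hypothetical helper, the small ids in the example are made up, and the list ids keeps the way back from an index to a stop id.

```python
def relabel(pairs):
    """Numbers the ids that occur in the set of pairs as
    0, 1, 2, ... in increasing order.  Returns the pairs of
    new indices and the sorted list ids, so that ids[k]
    is the original id of index k."""
    ids = sorted({node for pair in pairs for node in pair})
    index = {node: k for (k, node) in enumerate(ids)}
    return ({(index[i], index[j]) for (i, j) in pairs}, ids)


(small, ids) = relabel({(500, 20), (20, 7)})   # made-up ids
print(sorted(small), ids)   # [(1, 0), (2, 1)] [7, 20, 500]
```

With the CTA data, the dimension would then be len(ids) instead of one more than the largest stop id.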
3 Adjacency Matrices: matrices as dictionaries of dictionaries
adjacency matrix
An adjacency matrix A is a matrix of zeroes and ones:
A[row][column] = 1: row and column are connected,
A[row][column] = 0: row and column are not connected.
For example:
1 0 1 0 0
0 1 1 0 1
0 0 0 0 0
1 0 1 0 1
0 1 1 1 0
a random adjacency matrix
from random import randint
def random_adjacencies(dim):
    """
    Returns D, a dictionary of dictionaries to
    represent a square matrix of dimension dim.
    D[row][column] is a random bit.
    """
    result = {}
    for row in range(dim):
        result[row] = {}
        for column in range(dim):
            result[row][column] = randint(0, 1)
    return result
writing the matrix
def write(dim, mat):
    """
    Writes the square matrix of dimension dim
    represented by the dictionary mat.
    """
    for row in range(dim):
        for column in range(dim):
            print(' %d' % mat[row][column], end='')
        print('')
3 Adjacency Matrices: searching the adjacency matrix
searching the adjacency matrix
Consider again the example:
1 0 1 0 0
0 1 1 0 1
0 0 0 0 0
1 0 1 0 1
0 1 1 1 0
Observe, numbering the rows and columns from 0:
There is no direct path from 1 to 3.
We can go from 1 to 4 and from 4 to 3.
matrix-matrix multiplication
>>> import numpy as np
>>> A = np.matrix([[1, 0, 1, 0, 0],
...                [0, 1, 1, 0, 1],
...                [0, 0, 0, 0, 0],
...                [1, 0, 1, 0, 1],
...                [0, 1, 1, 1, 0]])
>>> A*A
matrix([[1, 0, 1, 0, 0],
        [0, 2, 2, 1, 1],
        [0, 0, 0, 0, 0],
        [1, 1, 2, 1, 0],
        [1, 1, 2, 0, 2]])
>>> _[1, 3]
1
$(A^2)_{i,j}$ counts the paths from i to j with one intermediate stop: $(A^2)_{1,3} = 1$ because the only such path from 1 to 3 goes through 4.
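The same computation works on the dictionary-of-dictionaries representation used in this section. A sketch with hypothetical helper names that multiplies by A until $(A^k)_{i,j}$ becomes nonzero; the smallest such k is the fewest number of edges from i to j, that is, k - 1 intermediate stops. The example is the 5-by-5 matrix above.

```python
def mat_mult(dim, left, right):
    """Returns the product of two square matrices of dimension dim,
    stored as dictionaries of dictionaries like D[row][column]."""
    return {row: {column: sum(left[row][k] * right[k][column]
                              for k in range(dim))
                  for column in range(dim)}
            for row in range(dim)}


def fewest_steps(dim, mat, source, destination, maxpower):
    """Returns the smallest k <= maxpower for which entry
    (source, destination) of mat**k is nonzero, or None."""
    power = mat
    for k in range(1, maxpower + 1):
        if power[source][destination] != 0:
            return k
        power = mat_mult(dim, power, mat)
    return None


ROWS = [[1, 0, 1, 0, 0], [0, 1, 1, 0, 1], [0, 0, 0, 0, 0],
        [1, 0, 1, 0, 1], [0, 1, 1, 1, 0]]
A = {row: dict(enumerate(values)) for (row, values) in enumerate(ROWS)}
print(fewest_steps(5, A, 1, 3, 4))   # 2: the path 1 -> 4 -> 3
```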
the main program
def main():
    """
    Prompts the user for the dimension and shows a random
    adjacency matrix.  Then prompts for a source, a destination,
    and a maximum number of steps, and searches for a path.
    """
    dim = int(input('Give the dimension : '))
    mtx = random_adjacencies(dim)
    write(dim, mtx)
    src = int(input('Give the source : '))
    dst = int(input('Give the destination : '))
    mxt = int(input('Give the maximum number of steps : '))
    pth = search(dim, mtx, dst, 0, mxt, [src])
    print('the path :', pth)
the specification and base case
def search(dim, mat, destination, level, maxsteps, accu):
    """
    Searches the matrix mat of dimension dim
    for a path between source and destination with
    no more than maxsteps intermediate stops.
    The path is accumulated in accu,
    initialized with source.
    """
    source = accu[-1]
    if mat[source][destination] == 1:
        return accu + [destination]
    else:
        ...
the rest of the definition
        if level < maxsteps:
            for k in range(dim):
                if k not in accu:
                    if mat[source][k] == 1:
                        path = search(dim, mat, destination,
                                      level+1, maxsteps, accu + [k])
                        if path[-1] == destination:
                            return path
        return accu
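The recursive search returns the first path it finds within maxsteps intermediate stops, which need not have the fewest stops. A different technique, named plainly: breadth-first search visits the nodes in order of their distance from the source, so the first time it reaches the destination it holds a path with the fewest edges. A sketch on the same dictionary-of-dictionaries representation, with a hypothetical function name shortest_path and the example matrix above.

```python
from collections import deque


def shortest_path(dim, mat, source, destination):
    """Breadth-first search in the adjacency matrix mat, a dictionary
    of dictionaries of dimension dim.  Returns a path
    [source, ..., destination] with the fewest edges, or None."""
    previous = {source: None}      # how each visited node was reached
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == destination:    # walk back along previous
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for k in range(dim):
            if mat[node][k] == 1 and k not in previous:
                previous[k] = node
                queue.append(k)
    return None


ROWS = [[1, 0, 1, 0, 0], [0, 1, 1, 0, 1], [0, 0, 0, 0, 0],
        [1, 0, 1, 0, 1], [0, 1, 1, 1, 0]]
A = {row: dict(enumerate(values)) for (row, values) in enumerate(ROWS)}
print(shortest_path(5, A, 1, 3))   # [1, 4, 3]
```

Unlike search, this sketch needs no bound on the number of steps: every node enters the queue at most once.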
Summary + Exercises
Dictionaries are well suited to process data stored in files.
1 Modify ctastopname.py so the user is prompted for a string instead of a number. The modified script prints all ids and corresponding names that contain the given string as a substring.
2 Instead of using numpy and scipy, use turtle to draw the spy plot of a matrix.
3 Instead of using numpy and scipy, use the canvas widget of tkinter to draw the spy plot of a matrix.
4 Apply the search to the adjacency matrix of the data obtained for the CTA.