Guest lecture at Coding Culture, Utrecht

Posted on 15 January 2015


DESCRIPTION

A talk I gave at Coding Culture, an initiative of graduate students from the "New Media and Digital Culture" track at Utrecht University.

TRANSCRIPT

Why should I learn Python? Why should I learn Python? Explaining a basic Python script Your turn Questions?

Python in the Social Sciences

A brief introduction by means of real-life examples

Damian Trilling

d.c.trilling@uva.nl, @damian0604

www.damiantrilling.net

Afdeling Communicatiewetenschap (Department of Communication Science), Universiteit van Amsterdam

Coding Culture, Utrecht, 5 March 2014

Python in the Social Sciences Damian Trilling


What I won’t do today

I won’t give you a structured introduction to

• variables
• commands
• data types
• ...

and all the other technical stuff.

You’ll do that yourself in the coming weeks. I’ll give you some examples of what you can do with the knowledge you’re going to acquire.


Why should I learn Python?

Some examples


A recent bachelor thesis

Tone in tweets

Imagine you want to know something about someone’s behavior on Twitter, or how a specific topic is discussed on Twitter. Do you really want to go through thousands of tweets by hand?


So you’d better think about automating your coding

Finding out how negative or positive politicians are towards their opponents

The student took lists of positive and negative words and made additional lists of a politician’s opponents. She used a Python script to check which type of words was used to refer to opponents. For further analysis, the results were imported into SPSS.

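Schut, L. (2013). Verenigde Staten vs. Verenigd Koninkrijk: Een automatische inhoudsanalyse naar verklarende factoren voor het gebruik van positive campaigning en negative campaigning door vooraanstaande politici en politieke partijen op Twitter [United States vs. United Kingdom: An automated content analysis of explanatory factors for the use of positive campaigning and negative campaigning by prominent politicians and political parties on Twitter]. Bachelor’s thesis, Universiteit van Amsterdam.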
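The student’s script is not shown in the talk. As a rough illustration of the dictionary approach, here is a minimal Python 3 sketch (the scripts later in this talk are Python 2). The word lists, the opponent names and the rule "count positive and negative words in every tweet that mentions an opponent" are hypothetical assumptions, not the thesis’s actual method.

```python
import re

# Hypothetical word lists and opponent names, for illustration only.
POSITIVE = {"great", "strong", "honest", "good"}
NEGATIVE = {"weak", "failed", "dishonest", "bad"}
OPPONENTS = {"smith", "jones"}


def tone_towards_opponents(tweet):
    """Return (positive, negative) word counts for a tweet that mentions
    an opponent, or None if no opponent is mentioned (assumed rule)."""
    words = re.findall(r"\w+", tweet.lower())
    if not OPPONENTS.intersection(words):
        return None
    positive = sum(1 for w in words if w in POSITIVE)
    negative = sum(1 for w in words if w in NEGATIVE)
    return positive, negative


for tweet in ["Smith failed the middle class. Bad plan, weak leadership.",
              "Great turnout tonight!"]:
    print(tone_towards_opponents(tweet))  # (0, 3), then None
```

Per-tweet counts like these could then be written to a CSV file for further analysis in SPSS.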


Frame adoption on Twitter

Which phrases used by Merkel and Steinbrück on TV make it into the #tvduell discussion on Twitter? As part of the project, I wrote a Python script to identify word co-occurrences on Twitter. The script produced not only lists with word counts, but also a GDF file that could be used for visualization.


```python
#!/Library/Frameworks/Python.framework/Versions/2.7/bin/python2.7
# -*- coding: utf-8 -*-
# Python 2.7. unicsv is a separate helper module, not part of the standard library.
from itertools import combinations
from collections import defaultdict, Counter
from unicsv import CsvUnicodeReader
import codecs, sys

gdfbestand="resultaten/netwerk.gdf"          # output file (bestand = file)
wordsplitbestand="resultaten/wordsplit.csv"  # input file; column 10 holds the tweet

minedgeweight=20            # keep only word pairs that co-occur at least 20 times
cooc=defaultdict(int)       # (word, later word in the same tweet) -> count
tweets=[]

print "\nReading tweet nr. "
reader=CsvUnicodeReader(open(wordsplitbestand,"r"))
i=0
for row in reader:
    i=i+1
    # skip first row, as it contains column headers
    if i>1:
        print "\r",str(i)," ",
        sys.stdout.flush()
        tweets.append(row[9])

print "Counting word frequencies..."
# Count directly in memory. The original detour via a temporary file
# read the file back before closing it, so counts could be incomplete.
c=Counter()
for tweet in tweets:
    c.update(tweet.split())

for tweet in tweets:
    words=tweet.split()
    for a,b in combinations(words,2):
        if a!=b:
            cooc[(a,b)]+=1

# Remove weak edges; collect them first, because a dict
# must not change size while we iterate over it.
verwijderen=[k for k in cooc if cooc[k]<minedgeweight]   # "to be removed"
for k in verwijderen:
    del cooc[k]

f = codecs.open(gdfbestand, "wb", encoding="utf-8")
# Node section: every word in a remaining edge, with its frequency as width
f.write("nodedef>name VARCHAR, width DOUBLE\n")
algenoemd=set()             # words already written ("already mentioned")
for k in cooc:
    for word in k:
        if word not in algenoemd:
            f.write(word+","+str(c[word])+"\n")
            algenoemd.add(word)
# Edge section: word pairs with their co-occurrence count as weight
f.write("edgedef>node1 VARCHAR,node2 VARCHAR, weight DOUBLE\n")
for k, v in cooc.iteritems():
    regel= ",".join(k)+","+str(v)    # regel = line
    f.write(regel+"\n")
f.close()
```
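The script above is Python 2. A Python 3 sketch of the same idea (word co-occurrence counts written to a GDF file with a node and an edge section) could look like this; it is not the project’s script. Assumptions: tweets are given as whitespace-separated strings, and, unlike the original, each unordered word pair is counted at most once per tweet (hence `sorted(set(words))`). The hypothetical `min_edge_weight` parameter defaults to the threshold of 20 used above.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_gdf(tweets, path, min_edge_weight=20):
    """Write a GDF network of words that co-occur in the same tweet.

    Returns the dict of edges (word pair -> weight) that were written.
    """
    word_freq = Counter()
    pair_freq = Counter()
    for tweet in tweets:
        words = tweet.split()
        word_freq.update(words)
        # set(): each word once per tweet; sorted(): (a, b) and (b, a) become one pair
        for a, b in combinations(sorted(set(words)), 2):
            pair_freq[(a, b)] += 1

    edges = {pair: n for pair, n in pair_freq.items() if n >= min_edge_weight}
    nodes = sorted({word for pair in edges for word in pair})

    with open(path, "w", encoding="utf-8") as f:
        f.write("nodedef>name VARCHAR, width DOUBLE\n")
        for word in nodes:
            f.write(f"{word},{word_freq[word]}\n")
        f.write("edgedef>node1 VARCHAR,node2 VARCHAR, weight DOUBLE\n")
        for (a, b), n in edges.items():
            f.write(f"{a},{b},{n}\n")
    return edges
```

For example, `cooccurrence_gdf(tweets, "resultaten/netwerk.gdf")` writes the network file and returns the edges that passed the threshold.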


Why should I learn Python?

Summing up what you can use it for


One tool to rule them all?

Of course, there are ready-made tools for some of the questions we want to answer. But for many, there aren’t. Python offers us the possibility to build exactly the tool we need. And it’s fun!


1st group of tasks

Highly repetitive tasks

Simple tasks (counting things, comparing texts, ...) that can be described in a formalized way. Automating them saves time even with few cases, and there is virtually no size limit.

Example: Retweets start with "RT", optionally followed by a space, and some letters. So it is very easy to identify them automatically.
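A minimal Python 3 sketch of this rule. It assumes that "some letters" means the @-name of the retweeted user, as in "RT @greenami1: ..."; requiring the "@" also keeps tweets that merely start with a word like "RTL" from being counted. Other retweet conventions (for instance quoted tweets) are not covered.

```python
import re

# "RT" at the very start, an optional space, then an @-name (assumed format)
RETWEET = re.compile(r"RT ?@(\w+)")


def retweeted_user(tweet):
    """Return the name of the retweeted user, or None if it is not a retweet."""
    match = RETWEET.match(tweet)  # match() only looks at the start of the string
    return match.group(1) if match else None


print(retweeted_user("RT @greenami1: De winnaars en verliezers"))  # greenami1
print(retweeted_user("De winnaars en verliezers"))                 # None
```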


2nd group of tasks

Tasks for which specific Python modules exist

There are thousands of modules suitable for text analysis. You basically only have to write code for data input and output.

Example: Sentiment analysis


3rd group of tasks

APIs, RSS, web scraping ...

You can use Python if you want to collect and store information.

Example: Collecting bios of Twitter users, scraping the web (data journalism!), downloading Facebook data
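To illustrate collecting and storing information without depending on a live website, this Python 3 sketch parses an RSS 2.0 feed with the standard library’s `xml.etree.ElementTree` and stores the title and link of every item in a CSV file. The feed text and the function names are made up for the example; in practice you would first download the feed (for instance with `urllib.request.urlopen`) and check the site’s terms of use.

```python
import csv
import xml.etree.ElementTree as ET

# A made-up RSS 2.0 feed; in practice this text would be downloaded first.
FEED = """<rss version="2.0"><channel>
  <title>Example news</title>
  <item><title>Climate summit opens in Warsaw</title><link>http://example.com/1</link></item>
  <item><title>Ministers meet</title><link>http://example.com/2</link></item>
</channel></rss>"""


def feed_items(xml_text):
    """Return (title, link) tuples for every <item> in an RSS 2.0 feed."""
    root = ET.fromstring(xml_text)
    return [(item.findtext("title"), item.findtext("link"))
            for item in root.iter("item")]


def save_items(items, path):
    """Store the items in a CSV file with a header row."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["title", "link"])
        writer.writerows(items)
```

`save_items(feed_items(FEED), "items.csv")` writes a header row plus one row per item.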


Why we should use Python in the social sciences

It is a programming language

• It is flexible. You can use it for (in principle) any kind of data
• There are virtually no limits regarding the amount of data to process

• You can run it on every platform

• And yet it is easy to learn!

It is widely used for content analysis

• Many online resources and toolkits
• Books about NLP and web scraping with Python


Think of the following task

RQ: What are the differences in terms of actors mentioned between Israeli and Palestinian news coverage?

1. The data structure: You have a folder with articles.
2. The desired output: You want a table with the file names and a column per actor, counting how often each actor is mentioned.
3. A typical task for a short Python script!


You need something like this:

```text
for every file in folder:
    read the file
    count actors
    add new row to table with filename and actor counts
save table
```

(Such a notation is called pseudocode.)


And in Python, it’s not that different!


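```python
# Python 2, like the other scripts in this talk
import re, csv
from os import listdir
from os.path import isfile, join

# raw string: otherwise the backslashes in the Windows path may be read as escapes
mypath = r"C:\Users\Ricarda\Documents\Artikelen"
# "Israel..." followed, on the same line, by minister, politician or authorit(y/ies).
# The alternatives need a group (?:...); square brackets would match just one character.
# .*? (non-greedy) lets a line with two such mentions count twice.
regex54 = re.compile(r'Israel.*?(?:minister|politician|[Aa]uthorit)')
filename_list=[]
matchcount54_list=[]
onlyfiles = [ f for f in listdir(mypath) if isfile(join(mypath,f)) ]
for f in onlyfiles:
    matchcount54=0
    with open(join(mypath,f),"r") as artikel:
        for line in artikel:
            # findall returns a list with one entry per match
            matchcount54 += len(regex54.findall(line))
    filename_list.append(f)
    matchcount54_list.append(matchcount54)
output=zip(filename_list,matchcount54_list)
# one row per file: file name, number of matches
with open("overzichtstabel.csv", "wb") as outfile:
    writer = csv.writer(outfile)
    writer.writerows(output)
```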
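The script above counts a single actor, while the task asks for one column per actor. A Python 3 sketch that generalizes it might look like this; the column names and the second pattern are hypothetical placeholders, and the files are assumed to be UTF-8 plain text. Because `.` does not match a newline, a match still cannot span two lines, just as in the line-by-line loop.

```python
import csv
import re
from pathlib import Path

# Hypothetical actor patterns; each one becomes a column in the output table.
ACTORS = {
    "israeli_politicians": re.compile(r"Israel.*?(?:minister|politician|[Aa]uthorit)"),
    "palestinian_politicians": re.compile(r"Palestin.*?(?:minister|politician|[Aa]uthorit)"),
}


def count_actors(folder, output_csv):
    """Write one row per file in folder, with a match count per actor."""
    with open(output_csv, "w", newline="", encoding="utf-8") as out:
        writer = csv.writer(out)
        writer.writerow(["filename"] + list(ACTORS))
        for path in sorted(Path(folder).iterdir()):
            if not path.is_file():
                continue
            text = path.read_text(encoding="utf-8")
            counts = [len(pattern.findall(text)) for pattern in ACTORS.values()]
            writer.writerow([path.name] + counts)
```

`count_actors("Artikelen", "overzichtstabel.csv")` would produce the table. Write the output file outside the article folder; otherwise it is counted as an article the next time the script runs.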


Explaining a basic Python script: Pseudocode


We collected tweets on the UNFCCC conference with yourTwapperkeeper.

Our task: Identify all tweets that include a reference to Poland.

Let’s start with some pseudocode!

```text
open csv-table
for each line:
    append column 1 to a list of tweets
    append column 3 to a list of corresponding users
    look for searchstring in column 1
    append search result to a list of results
save lists to a new csv-file
```


Explaining a basic Python script: Python code


```python
#!/usr/bin/python
from unicsv import CsvUnicodeReader
from unicsv import CsvUnicodeWriter
import re
inputfilename="mytweets.csv"
outputfilename="myoutput.csv"
user_list=[]
tweet_list=[]
search_list=[]
searchstring1 = re.compile(r'[Pp]olen|[Pp]ool|[Ww]arschau|[Ww]arszawa')
print "Opening "+inputfilename
reader=CsvUnicodeReader(open(inputfilename,"r"))
for row in reader:
    tweet_list.append(row[0])
    user_list.append(row[2])
    matches1 = searchstring1.findall(row[0])
    matchcount1=0
    for word in matches1:
        matchcount1=matchcount1+1
    search_list.append(matchcount1)
print "Constructing data matrix"
outputdata=zip(tweet_list,user_list,search_list)
headers=zip(["tweet"],["user"],["how often is Poland mentioned?"])
print "Write data matrix to ",outputfilename
writer=CsvUnicodeWriter(open(outputfilename,"wb"))
writer.writerows(headers)
writer.writerows(outputdata)
```


```python
#!/usr/bin/python
# We start by importing some modules:
from unicsv import CsvUnicodeReader
from unicsv import CsvUnicodeWriter
import re

# Let us define two variables that contain
# the names of the files we want to use
inputfilename="mytweets.csv"
outputfilename="myoutput.csv"
```


```python
# We create some empty lists that we will use later on.
# A list can contain several values
# and is denoted by square brackets.
user_list=[]
tweet_list=[]
search_list=[]
```


```python
# What do we want to look for?
searchstring1 = re.compile(r'[Pp]olen|[Pp]ool|[Ww]arschau|[Ww]arszawa')

# Enough preparation, let the program begin!
# We tell the user what is going on...
print "Opening "+inputfilename

# ... and call the module that reads the input file.
reader=CsvUnicodeReader(open(inputfilename,"r"))
```


```python
# Now we read the file line by line.
# The indented block is repeated for each row
# (thus, each tweet)
for row in reader:
    # append data from the current row to our lists.
    # Note that Python starts counting at 0: row[0] is the first column.
    tweet_list.append(row[0])
    user_list.append(row[2])

    # Let us count how often our searchstring is used
    # in this tweet
    matches1 = searchstring1.findall(row[0])
    matchcount1=0
    for word in matches1:
        matchcount1=matchcount1+1
    search_list.append(matchcount1)
```
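To see what `findall` returns, this Python 3 sketch applies the same pattern to the beginning of one of the collected tweets: a list with one entry per match, so `len()` gives the same number as the counting loop. It also illustrates a caveat the talk does not discuss: the pattern matches inside longer words, such as "pool" in "Liverpool". Putting a word boundary `\b` in front, as in the hypothetical `searchstring_b`, is one possible remedy; whether it suits your data is something to check.

```python
import re

# Same pattern as in the script
searchstring1 = re.compile(r'[Pp]olen|[Pp]ool|[Ww]arschau|[Ww]arszawa')
# Hypothetical variant: only match at the start of a word
searchstring_b = re.compile(r'\b(?:[Pp]olen|[Pp]ool|[Ww]arschau|[Ww]arszawa)')

tweet = "Wat zijn de resulaten vd #klimaattop in #Warschau waard?"
print(searchstring1.findall(tweet))       # ['Warschau']
print(len(searchstring1.findall(tweet)))  # 1

print(searchstring1.findall("Liverpool wint"))       # ['pool']
print(searchstring_b.findall("Liverpool wint"))      # []
print(searchstring_b.findall("De Poolse regering"))  # ['Pool']
```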


```python
# Time to put all the data in one container
# and save it:

print "Constructing data matrix"
outputdata=zip(tweet_list,user_list,search_list)
headers=zip(["tweet"],["user"],["how often is Poland mentioned?"])
print "Write data matrix to ",outputfilename
writer=CsvUnicodeWriter(open(outputfilename,"wb"))
writer.writerows(headers)
writer.writerows(outputdata)
```
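The scripts in this talk are Python 2 and use the separate unicsv module for Unicode CSV files. In Python 3 the standard `csv` module handles Unicode itself, so a port could look like the sketch below. Like the original, it assumes the tweet is in the first column and the user in the third, and it treats every row (including a possible header row) as a tweet; the UTF-8 encoding and the function name `count_poland` are assumptions.

```python
import csv
import re

SEARCHSTRING = re.compile(r'[Pp]olen|[Pp]ool|[Ww]arschau|[Ww]arszawa')


def count_poland(inputfilename, outputfilename):
    """Copy tweet and user to a new CSV and add how often Poland is mentioned."""
    print("Opening " + inputfilename)
    with open(inputfilename, newline="", encoding="utf-8") as infile, \
            open(outputfilename, "w", newline="", encoding="utf-8") as outfile:
        reader = csv.reader(infile)
        writer = csv.writer(outfile)
        writer.writerow(["tweet", "user", "how often is Poland mentioned?"])
        for row in reader:
            # column 1 (row[0]) holds the tweet, column 3 (row[2]) the user
            tweet, user = row[0], row[2]
            writer.writerow([tweet, user, len(SEARCHSTRING.findall(tweet))])
    print("Wrote data matrix to " + outputfilename)
```

Calling `count_poland("mytweets.csv", "myoutput.csv")` produces the same three columns as the original script; writing rows as they are read avoids keeping three parallel lists in memory.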


Explaining a basic Python script: The output (myoutput.csv)


```text
tweet,user,how often is Poland mentioned?
:-) #Lectrr #wereldleiders #uitspraken #Wikileaks #klimaattop http://t.co/Udjpk48EIB,henklbr,0
Wat zijn de resulaten vd #klimaattop in #Warschau waard? @EP_Environment ontmoet voorzitter klimaattop @MarcinKorolec http://t.co/4Lmiaopf60,Europarl_NL,1
RT @greenami1: De winnaars en verliezers van de lachwekkende #klimaattop in #Warschau (interview): http://t.co/DEYqnqXHdy #Misserfolg #Kli...,LarsMoratis,1
De winnaars en verliezers van de lachwekkende #klimaattop in #Warschau (interview): http://t.co/DEYqnqXHdy #Misserfolg #Klimaschutz #FAZ,greenami1,1
```


Try it yourself!


Will you join in?


Questions or comments?

Damian Trilling

d.c.trilling@uva.nl, @damian0604

www.damiantrilling.net
