ECPR Methods Summer School: Automated Collection of Web and Social Data
Pablo Barberá
London School of Economics
pablobarbera.com
Course website:
pablobarbera.com/ECPR-SC104
How can we collect web and social data to answer social science questions?
Course outline
1 & 2. Scraping data from the web
  - Key tools for web scraping
  - Scraping tables
  - Scraping web data in unstructured format
  - Parsing RSS feeds
3. Working with APIs
  - How to build an HTTP request
  - Interacting with newspapers’ APIs
4. Collecting social media data
  - Twitter’s Streaming API
  - Twitter’s REST API
5. Advanced topics
  - Parsing data in PDF format
  - Text encoding
Hello!
About me: Pablo Barberá
- Assistant Professor of Computational Social Science at the London School of Economics
- Previously Assistant Prof. at Univ. of Southern California
- PhD in Politics, New York University (2015)
- Data Science Fellow at NYU, 2015–2016
- My research:
  - Social media and politics, comparative electoral behavior
  - Text as data methods, social network analysis, Bayesian statistics
  - Author of R packages to analyze data from social media
- Contact:
  - [email protected]
  - www.pablobarbera.com
  - @p_barbera
About me: Tom Paskhalis
- PhD candidate in Social Research Methods at the London School of Economics
- My research:
  - Interest groups and political parties
  - Text as data, record linkage, Bayesian statistics
  - Author/contributor to R packages to scrape websites and PDF documents
- Contact:
  - [email protected]
  - tom.paskhal.is
  - @tpaskhalis
About me: Alberto Stefanelli
- Prospective PhD candidate at KU Leuven
- Previously Master’s student at Central European University
- Vice president of the Populism Research Group at Central European University and member of the survey and experimental teams of Team Populism
- External consultant and data analyst for the ECPR Methods Schools and the Intellectual Theme Initiative project Text Analysis across Disciplines
- My research:
  - Electoral behavior, public opinion, political communication, party finance
  - Graphical causal models, machine learning, text analysis, and big data
- Contact:
  - [email protected]
  - alberto-stefanelli.netlify.com
  - @sergsagara
Your turn!
1. Name?
2. Affiliation?
3. Research interests?
4. Previous experience with R?
5. Why are you interested in this course?
Course philosophy
How to learn the techniques in this course?
- Lecture approach: not ideal for learning how to code
- You can only learn by doing.
  → We will cover each concept three times during each session:
  1. Introduction to the topic (20-30 minutes)
  2. Guided coding session (30-40 minutes)
  3. Coding challenges (30 minutes)
- You’re encouraged to continue working on the coding challenges after class. Solutions will be posted the following day.
- Additional questions? We can arrange one-on-one meetings after class.
Course logistics
ECTS credits:
- Attendance: 2 credits (pass/fail grade)
- Submission of at least 3 coding challenges: +1 credit
  - Due before the beginning of the following class, via email to Tom or Alberto
  - Only applies to challenge 2 of the day
  - Graded on a 100-point scale
- Submission of class project: +1 credit
  - Due by August 20th
  - Goal: collect and analyze data from the web or social media
  - 5 pages max (including code) in R Markdown format
  - Graded on a 100-point scale
If you wish to obtain more than 2 credits, please indicate so on the attendance sheet.
Social event
Save the date: Wednesday Aug. 1st, 6pm
Location TBA
Why we’re using R
- Becoming the lingua franca of statistical analysis in academia
- What employers in the private sector demand
- It’s free and open-source
- Flexible and extensible through packages (over 10,000 and counting!)
- Powerful tool to conduct automated text analysis, social network analysis, and data visualization, with packages such as quanteda, igraph or ggplot2.
- Command-line interface and scripts favor reproducibility.
- Excellent documentation and online help resources.
R is also a full programming language; once you understand how to use it, you can learn other languages too.
RStudio Server
Course website
pablobarbera.com/ECPR-SC104