15. political discourseinthenewskb
Political Discourse in the News
(and other studies)
Antske Fokkens, VU University Amsterdam
Political Discourse in the News is joint work with: Ellis Aizenberg, Wouter van Atteveldt, Carlotta Cassimassima, Franz-Xaver Geiger, Laura Hollink, Annick van der Peet, Chantal van Son
Overview
• Introduction
• Interdisciplinary research & research questions
• Text analysis
• From basic to complex: possibilities and challenges
• Methodological issues
• Conclusion
Introduction
• Interdisciplinary research:
• Social Science: manual annotation, research questions
• Humanities: research questions
• Computer Science: modeling, visualization
• Computational Linguistics: text analysis
Introduction
• Research questions:
• Has personalization increased in political news?
• What trends do we see in reported political conflicts?
• How does news reporting relate to the parliamentary debates?
• What perspectives are expressed by news (explicitly and implicitly)?
Approaches
• Manual annotations:
• Expert (communication science researchers and Master students)
• Crowd (crowdsourcing)
• Automatic annotation:
• Basic as well as advanced NLP approaches
Text analysis
• AmCAT (Wouter van Atteveldt):
• Open source infrastructure that facilitates large-scale analysis and manual content analysis of text
• BiographyNet/NewsReader pipeline (Piek Vossen’s cltl group):
• NLP modules for event (and event relation) extraction & named entity recognition and disambiguation
• OpeNER tools (Piek Vossen’s cltl group):
• Sentiment analysis and opinion mining
Basic methods
• Counting:
• occurrences of names in text
• identifying words from word lists (e.g. sentiment words)
• Topic modeling (e.g. LDA)
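The counting methods listed above can be sketched in a few lines. This is only an illustration, not the code used in the studies: the example sentence, the names and the word list are hypothetical, and real word lists and tokenization would be language-specific.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())

def count_terms(text, terms):
    """Count how often each single-word term occurs as a token in the text."""
    wanted = {t.lower() for t in terms}
    counts = Counter(tok for tok in tokenize(text) if tok in wanted)
    return {t: counts[t.lower()] for t in terms}

# Hypothetical example data, not taken from the corpus.
article = "Jansen attacks Bakker. Bakker calls the plan bad; Jansen calls it good."
names = ["Jansen", "Bakker", "Visser"]
negative_words = ["bad", "attacks"]

name_counts = count_terms(article, names)
negative_hits = sum(count_terms(article, negative_words).values())
print(name_counts, negative_hits)
```

Matching on whole tokens keeps the method transparent, but it also means multi-word names and inflected forms are missed unless they are added to the list explicitly.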
Basic methods
• Can easily be run on large datasets
• Can address research questions (e.g. Aizenberg (2014) shows an increase in personalization)
• Limited to overall trends and tendencies
• For some tasks, high risk of unreliable results:
• e.g. the Dutch word "erg" is listed with ‘negative sentiment’, although it is often used as a neutral intensifier ("very")
More advanced analyses
• Can provide more detailed insight into the content of the text
• Scalability becomes an issue (several complex language models)
• To illustrate:
• roughly 5 minutes per article (regular university cluster)
• 11 days for 1.3 million articles on Hadoop cluster at SURFsara
• Accuracy can be low, both because some tasks are difficult and because errors ‘pile up’ across the steps of a pipeline
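The figures above can be put side by side with a short calculation. The per-article time, corpus size and run time come from the slide; the per-step accuracies are hypothetical, and the independence of errors between steps is a simplifying assumption, so this is a rough sketch rather than a measurement.

```python
import math

MINUTES_PER_ARTICLE = 5      # approximate figure from the slide
N_ARTICLES = 1_300_000       # corpus size from the slide
CLUSTER_DAYS = 11            # reported Hadoop run time from the slide

# Time needed if articles were processed one after another.
sequential_days = MINUTES_PER_ARTICLE * N_ARTICLES / (60 * 24)
# Rough degree of parallelism implied by the 11-day run (assumes the same
# per-article cost on both clusters, which the slides do not state).
implied_speedup = sequential_days / CLUSTER_DAYS

def pipeline_accuracy(step_accuracies):
    """Chance that every step is right, assuming steps fail independently."""
    return math.prod(step_accuracies)

# Hypothetical per-step accuracies, for illustration only.
steps = [0.95, 0.90, 0.85, 0.80]

print(f"sequential: {sequential_days:.0f} days (~{sequential_days / 365:.1f} years)")
print(f"implied speedup: ~{implied_speedup:.0f}x")
print(f"end-to-end accuracy: {pipeline_accuracy(steps):.2f}")
```

Under these assumptions, sequential processing would take roughly 4,500 days (about 12 years), the 11-day run corresponds to a speedup of around 410 times, and four reasonably accurate steps already drop to about 0.58 end-to-end accuracy.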
Methodological issues
• Data interpretation
• Biases
• Example: OCR
Data interpretation
• Basic methods:
• results from counts are clear, but what do they say?
• More advanced methods:
• attempt to provide semantic interpretations, but what is the accuracy of the tools?
Biases
• One way to deal with errors is to assume that they are just noise in a large pile of data
• This assumption works if errors are equally distributed across the classes or information that matter for the research question
• For instance, when counting sentiment-related terms:
• are the lists of negative and positive terms of comparable quality?
• does one of the lists contain more ambiguous terms than the other?
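A small calculation shows why the distribution of errors matters. The counts and recall values below are hypothetical, and precision is ignored to keep the sketch short: when both word lists miss the same share of terms, the positive/negative ratio is preserved, but when one list is worse than the other, the ratio shifts even though the underlying text has not changed.

```python
def observed_ratio(true_pos, true_neg, recall_pos, recall_neg):
    """Expected positive/negative ratio when each list finds only a share
    (its recall) of the terms that are really there."""
    return (true_pos * recall_pos) / (true_neg * recall_neg)

# Hypothetical true counts: equally many positive and negative terms.
true_pos, true_neg = 1000, 1000

# Both lists miss 20% of their terms: errors are "just noise".
balanced = observed_ratio(true_pos, true_neg, recall_pos=0.8, recall_neg=0.8)
# The negative list is of lower quality: errors are biased.
skewed = observed_ratio(true_pos, true_neg, recall_pos=0.9, recall_neg=0.6)

print(balanced, skewed)
```

In the balanced case the ratio stays at 1, while in the skewed case the text appears to be 1.5 times more positive than negative purely because of list quality.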
Bias example OCR
• Data from the KB still have some issues with OCR
• There tend to be more issues with older data
• Imagine we investigate whether emotional expressions in text increased over time: Does worse OCR lead to a lower percentage of identification in older text?
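The OCR question can be made concrete with a simple model. Everything here is an assumption for illustration: the character error rates per period are invented, a word is taken to be found only if all of its characters were recognized correctly, and character errors are treated as independent.

```python
def word_survival(char_error_rate, word_length):
    """Probability that a word of the given length has no OCR errors."""
    return (1 - char_error_rate) ** word_length

# Hypothetical: emotion words make up 2% of tokens in every period.
TRUE_SHARE = 0.02
WORD_LENGTH = 7  # assumed typical length of an emotion word

# Hypothetical OCR character error rates, worse for older material.
char_error_rates = {"1900s": 0.10, "1950s": 0.04, "2000s": 0.005}

observed = {
    period: TRUE_SHARE * word_survival(rate, WORD_LENGTH)
    for period, rate in char_error_rates.items()
}

for period, share in observed.items():
    print(f"{period}: observed share {share:.4f} (true share {TRUE_SHARE})")
```

In this sketch the observed share of emotion words rises over time although the true share is constant, which is exactly the kind of spurious trend the slide warns about.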
Dealing with biases
• We cannot exclude the risk of biases completely
• We can:
• try to make sure researchers using output are aware of the details of the method (raise awareness of possible biases)
• carry out both intrinsic and extrinsic evaluation, i.e. explicitly investigate the influence of a bias on overall results
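One possible way to combine the two kinds of evaluation is sketched below. The annotated samples and corpus counts are hypothetical: precision and recall are measured per period against manual annotations (intrinsic), and the corpus-level counts are then corrected to see whether the overall conclusion changes (extrinsic).

```python
def precision_recall(gold, predicted):
    """Precision and recall of predicted items against gold annotations."""
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall

# Hypothetical annotated samples: sets of token positions marked as emotional.
samples = {
    "old": {"gold": {1, 4, 7, 9, 12}, "pred": {1, 4, 20}},
    "new": {"gold": {2, 3, 8, 10, 15}, "pred": {2, 3, 8, 10, 30}},
}
# Hypothetical automatic counts on the full corpus per period.
raw_counts = {"old": 300, "new": 500}

corrected = {}
for period, sample in samples.items():
    precision, recall = precision_recall(sample["gold"], sample["pred"])
    # Estimated true count: keep the correct share of hits, then scale up
    # for the items the tool missed.
    corrected[period] = raw_counts[period] * precision / recall

print(raw_counts, corrected)
```

With these invented numbers the raw counts suggest an increase from 300 to 500, while the corrected estimates are about 500 for both periods, so the apparent trend would be an artifact of tool quality. Such corrections are only as reliable as the annotated samples they are based on.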
Conclusion
• Several research directions where technology (including NLP, linked data, visualizations) is used to support research in Humanities and Social Sciences
• NLP approaches vary from basic methods to complex pipelines carrying out several steps
• Basic approaches can easily be applied to large datasets and are transparent, but do not say much
• More advanced approaches provide detailed information, but cannot easily be applied to large datasets and are less transparent
• Insight into how data was processed and both intrinsic and extrinsic evaluation are needed to raise awareness about (or even avoid?) biases
Thank you!
References
• AmCAT: http://vanatteveldt.com/amcat/
• BiographyNet/NewsReader pipeline:
• Rodrigo Agerri et al. (2015). Event Detection version 2.2. NewsReader Deliverable 4.2.2. http://www.newsreader-project.eu/files/2012/12/NWR-D4-2-2.pdf
• Methodological issues:
• Antske Fokkens, Serge ter Braake, Niels Ockeloen, Piek Vossen, Susan Legêne and Guus Schreiber. 2014. BiographyNet: Methodological issues when NLP supports historical research. Proceedings of LREC 2014. http://www.lrec-conf.org/proceedings/lrec2014/pdf/1103_Paper.pdf
• Niels Ockeloen, Antske Fokkens, Serge ter Braake, Piek Vossen, Victor de Boer, Guus Schreiber, and Susan Legêne. 2013. BiographyNet: Managing Provenance at multiple levels and from different perspectives. Proceedings of Linked Science 2013. http://linkedscience.org/wp-content/uploads/2013/04/paper7.pdf