15. political discourseinthenewskb

Political Discourse in the News (and other studies) Antske Fokkens, VU University Amsterdam Political discourse in the News is joint work with: Ellis Aizenberg, Wouter van Atteveldt, Carlotta Cassimassima, Franz-Xaver Geiger, Laura Hollink, Annick van der Peet, Chantal van Son

Post on 18-Jul-2015

Political Discourse in the News

(and other studies)

Antske Fokkens, VU University Amsterdam

Political Discourse in the News is joint work with: Ellis Aizenberg, Wouter van Atteveldt, Carlotta Cassimassima, Franz-Xaver Geiger, Laura Hollink, Annick van der Peet, Chantal van Son

Overview

• Introduction

• Interdisciplinary research & research questions

• Text analysis

• From basic to complex: possibilities and challenges

• Methodological issues

• Conclusion

Introduction

• Interdisciplinary research:

  • Social Science: manual annotation, research questions

  • Humanities: research questions

  • Computer Science: modeling, visualization

  • Computational Linguistics: text analysis

Introduction

• Research questions:

  • Has personalization increased in political news?

  • What trends do we see in reported political conflicts?

  • How does news reporting relate to the parliamentary debates?

  • What perspectives are expressed by news (explicitly and implicitly)?

Approaches

• Manual annotations:

  • Expert (communication science researchers and Master students)

  • Crowd (crowdsourcing)

• Automatic annotation:

  • Basic as well as advanced NLP approaches

Text analysis

• AmCAT (Wouter van Atteveldt):

  • Open source infrastructure that facilitates large-scale analysis and manual content analysis of text

• BiographyNet/NewsReader pipeline (Piek Vossen’s CLTL group):

  • NLP modules for event (and event relation) extraction & named entity recognition and disambiguation

• OpeNER tools (Piek Vossen’s CLTL group):

  • Sentiment analysis and opinion mining

Basic methods

• Counting:

  • occurrences of names in text

  • identifying words from word lists (e.g. sentiment words)

• Topic modeling (e.g. LDA)
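
The two counting methods listed above can be sketched in a few lines of Python. This is only a minimal illustration, not the project's code: the names, the word list, and the example sentence are made up for the purpose, and the tokenizer is deliberately naive.

```python
import re
from collections import Counter

# Hypothetical inputs: names and word list are illustrative only.
POLITICIANS = ["Rutte", "Wilders", "Samsom"]
NEGATIVE_WORDS = {"crisis", "conflict", "ruzie"}


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def count_names(text, names):
    """Count how often each (single-token) name occurs in the text."""
    tokens = Counter(tokenize(text))
    return {name: tokens[name.lower()] for name in names}


def count_listed_words(text, word_list):
    """Count the tokens that appear in a word list (e.g. sentiment words)."""
    return sum(1 for token in tokenize(text) if token in word_list)


# Invented example sentence, not taken from the news corpus.
article = "Rutte en Wilders in conflict: ruzie over de crisis. Rutte reageert."
name_counts = count_names(article, POLITICIANS)
negative_hits = count_listed_words(article, NEGATIVE_WORDS)
```

Because nothing here goes beyond tokenizing and looking up strings, this kind of method scales to very large collections, but it also knows nothing about context.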

Basic methods

• Can easily be run on large datasets

• Can address research questions (e.g. Aizenberg (2014) shows an increase in personalization)

• Limited to overall trends and tendencies

• For some tasks, high risk of unreliable results:

  • e.g. the Dutch word "erg" is listed with ‘negative sentiment’, although it is also used as the intensifier "very"
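
The "erg" problem is easy to reproduce with a toy lexicon lookup. The sketch below assumes a two-word negative list and two invented sentences; it is not the lexicon used in the study.

```python
import re

# Toy lexicon for illustration; "erg" is listed as negative, as on the slide.
NEGATIVE_WORDS = {"erg", "slecht"}


def flagged_negative(sentence):
    """Return the tokens of a sentence that the word list marks as negative."""
    tokens = re.findall(r"\w+", sentence.lower())
    return [token for token in tokens if token in NEGATIVE_WORDS]


positive_sentence = "Het debat was erg goed"  # "The debate was very good"
negative_sentence = "Dat is heel erg"         # "That is really bad"

# Both sentences yield one hit for "erg", although only the second
# uses the word in its negative sense.
hits_positive = flagged_negative(positive_sentence)
hits_negative = flagged_negative(negative_sentence)
```

A plain count cannot tell the two uses apart, so frequent intensifier use inflates the negative score.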

More advanced analyses

• Can provide more detailed insight into the content of the text

• Scalability becomes an issue (several complex language models)

  • To illustrate:

    • approx. 5 minutes per article (regular university cluster)

    • 11 days for 1.3 million articles on the Hadoop cluster at SURFsara

• Accuracy can be low for difficult tasks and because errors ‘pile up’
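
A back-of-the-envelope calculation puts the figures above in perspective. It assumes, purely for illustration, that the 5 minutes per article applies uniformly to all 1.3 million articles; the second function assumes that pipeline steps fail independently and that an item is only correct if every step is correct, which is a simplification of how errors ‘pile up’.

```python
MINUTES_PER_ARTICLE = 5      # "approx. 5 minutes per article"
ARTICLES = 1_300_000         # "1.3 million articles"
CLUSTER_DAYS = 11            # reported run time on the Hadoop cluster
MINUTES_PER_DAY = 24 * 60

# Time needed if the articles were processed one after another.
sequential_minutes = ARTICLES * MINUTES_PER_ARTICLE
sequential_days = sequential_minutes / MINUTES_PER_DAY
sequential_years = sequential_days / 365

# Effective speed-up implied by finishing in 11 days instead.
implied_parallelism = sequential_days / CLUSTER_DAYS


def pipeline_accuracy(step_accuracies):
    """Accuracy of a pipeline whose steps err independently of each other."""
    result = 1.0
    for accuracy in step_accuracies:
        result *= accuracy
    return result


# Hypothetical pipeline: three steps that are each 90% accurate.
three_step_accuracy = pipeline_accuracy([0.9, 0.9, 0.9])
```

Under these assumptions sequential processing would take roughly twelve years, the 11-day run corresponds to a speed-up of about 410, and three steps at 90% each leave only about 73% of the items fully correct.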

Methodological issues

• Data interpretation

• Biases

  • Example: OCR

Data interpretation

• Basic methods:

  • results from counts are clear, but what do they say?

• More advanced methods:

  • attempt to provide semantic interpretations, but what is the accuracy of the tools?

Biases

• One way to deal with errors is to assume that they are just noise in a large pile of data

• This assumption works if errors are equally distributed across the classes/information that matter for the research question

• For instance, when counting sentiment-related terms:

  • are the lists of negative and positive terms of comparable quality?

  • does one of the lists contain more ambiguous terms than the other?
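
One way to check whether two word lists are of comparable quality is to score each against a small hand-annotated sample. The sketch below assumes that annotated tokens are represented by integer ids and uses invented sets; the numbers are not results from the study.

```python
def precision_recall(predicted, gold):
    """Precision and recall of a set of predicted items against gold items."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical token ids: what annotators marked vs. what each list matched.
gold_negative = {1, 2, 3, 4, 5}
pred_negative = {1, 2, 3, 8, 9, 10}   # three correct hits, three false alarms
gold_positive = {11, 12, 13, 14, 15}
pred_positive = {11, 12, 13, 14}      # four correct hits, no false alarms

negative_scores = precision_recall(pred_negative, gold_negative)
positive_scores = precision_recall(pred_positive, gold_positive)
```

In this invented sample the negative list has a precision of 0.5 and the positive list a precision of 1.0, so errors are not equally distributed across the two classes and the "just noise" assumption would not hold.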

Bias example: OCR

• Data from the KB still have some issues with OCR

• There tend to be more issues with older data

• Imagine we investigate whether emotional expressions in text increased over time: Does worse OCR lead to a lower percentage of identification in older text?
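
The effect can be illustrated with a simple model. All numbers below are hypothetical: the true rate of emotional words is held constant, each character is assumed to be misrecognised independently, a word is only found if every character is correct, and the total token count is assumed to be unaffected by OCR errors.

```python
def word_survival(char_error_rate, word_length):
    """Probability that every character of a word is recognised correctly."""
    return (1 - char_error_rate) ** word_length


TRUE_RATE = 0.05    # hypothetical: 5% emotional words in every period
WORD_LENGTH = 8     # hypothetical average length of the listed words

# Hypothetical character error rates, higher for older material.
ocr_error_by_period = {1900: 0.05, 1950: 0.02, 2000: 0.005}

# Share of emotional words that a word-list lookup would still find.
observed_rate = {
    period: TRUE_RATE * word_survival(error_rate, WORD_LENGTH)
    for period, error_rate in ocr_error_by_period.items()
}
```

Although the true rate never changes in this model, the observed rate rises from period to period, which could be mistaken for an increase in emotional language.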

Dealing with biases

• We cannot exclude the risk of biases completely

• We can:

  • try to make sure researchers using output are aware of the details of the method (raise awareness of possible biases)

  • carry out both intrinsic and extrinsic evaluation, i.e. explicitly investigate the influence of a bias on overall results
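
A minimal sketch of such a check, under stated assumptions: recall per period has been estimated on a hand-checked sample (the intrinsic part), and the observed counts are then corrected to see whether the overall conclusion changes (the extrinsic part). All counts and recall values are invented for illustration.

```python
def corrected_counts(observed, recall):
    """Scale observed counts by the estimated recall for each period."""
    return {period: observed[period] / recall[period] for period in observed}


def trend(series):
    """Difference between the last and the first period of a series."""
    periods = sorted(series)
    return series[periods[-1]] - series[periods[0]]


# Hypothetical counts of emotional expressions per period.
observed = {1900: 33, 1950: 43, 2000: 48}

# Hypothetical recall estimates from a manually checked sample.
estimated_recall = {1900: 0.66, 1950: 0.86, 2000: 0.96}

corrected = corrected_counts(observed, estimated_recall)
raw_trend = trend(observed)
corrected_trend = trend(corrected)
```

In this example the raw counts suggest an increase of 15, while the corrected counts are flat, so the apparent trend would be an artefact of the bias rather than a finding.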

Conclusion

• Several research directions where technology (including NLP, linked data, visualizations) is used to support research in the Humanities and Social Sciences

• NLP approaches vary from basic methods to complex pipelines carrying out several steps

• Basic approaches can easily be applied to large datasets and are transparent, but do not say much

• More advanced approaches provide detailed information, but cannot easily be applied to large datasets and are less transparent

• Insight into how data was processed, together with both intrinsic and extrinsic evaluation, is needed to raise awareness about (or even avoid?) biases

Thank you!

References

• AmCAT: http://vanatteveldt.com/amcat/

• BiographyNet/NewsReader pipeline:

  • Rodrigo Agerri et al. 2015. Event Detection version 2.2. NewsReader Deliverable 4.2.2. http://www.newsreader-project.eu/files/2012/12/NWR-D4-2-2.pdf

• Methodological issues:

  • Antske Fokkens, Serge ter Braake, Niels Ockeloen, Piek Vossen, Susan Legêne and Guus Schreiber. 2014. BiographyNet: Methodological issues when NLP supports historical research. Proceedings of LREC 2014. http://www.lrec-conf.org/proceedings/lrec2014/pdf/1103_Paper.pdf

  • Niels Ockeloen, Antske Fokkens, Serge ter Braake, Piek Vossen, Victor de Boer, Guus Schreiber and Susan Legêne. 2013. BiographyNet: Managing Provenance at multiple levels and from different perspectives. Proceedings of Linked Science 2013. http://linkedscience.org/wp-content/uploads/2013/04/paper7.pdf