Clare Llewellyn Lasiuk July 5th 2013

Clare Llewellyn University of Edinburgh Argumentation on the web - always vulgar and often convincing?

Uploaded by: clare-llewellyn

Posted on 09-May-2015


Category: Technology



DESCRIPTION

Using argument analysis to structure user generated content.

TRANSCRIPT

Page 1

Clare Llewellyn University of Edinburgh

Argumentation on the web - always vulgar and often convincing?

Page 2

User Generated Content

Page 3

Page 4

Various Conversations

Page 5

Various Conversations

Main points of discussion:

RM (Rupert Murdoch) is bad / old / Australian / has power over politicians / owns newspapers

RM does / doesn’t understand the internet

Free content is good / bad

The joke belongs to Tim Vine or Stuart Francis

Wider context discussion – PIPA / SOPA, Leveson Inquiry, phone hacking, TVShack

Page 6

The Problem

Can we somehow structure this data so we can read it and add to it at the most relevant point?

Page 7

Solutions?

Page 8

Argumentation

A participant makes a claim that represents their position

The participant backs up that claim with evidence

A counter claim challenges the position

The composer of the original claim may evaluate their position.

Page 9

Claim

Counter Claim

Evidence

Counter Evidence

Evaluation
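To make the five moves concrete, they can be held as a reply tree in which each move points at the moves that respond to it. A minimal Python sketch, assuming a hypothetical `Move` dataclass and `MoveType` enum (neither comes from the original work); the example thread is illustrative and loosely based on the "free content" discussion point:

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Iterator, List, Tuple


class MoveType(Enum):
    # The five argumentative moves shown on the slide
    CLAIM = "claim"
    COUNTER_CLAIM = "counter claim"
    EVIDENCE = "evidence"
    COUNTER_EVIDENCE = "counter evidence"
    EVALUATION = "evaluation"


@dataclass
class Move:
    # Hypothetical node type: one span of user-generated text and its role
    author: str
    text: str
    kind: MoveType
    replies: List["Move"] = field(default_factory=list)

    def reply(self, author: str, text: str, kind: MoveType) -> "Move":
        # Attach a new move as a response to this one and return it
        child = Move(author, text, kind)
        self.replies.append(child)
        return child


def walk(move: Move, depth: int = 0) -> Iterator[Tuple[int, Move]]:
    # Depth-first traversal yielding (depth, move) pairs, e.g. for indented display
    yield depth, move
    for child in move.replies:
        yield from walk(child, depth + 1)


# Illustrative thread (made-up text based on one discussion point above)
root = Move("A", "Free content is bad", MoveType.CLAIM)
root.reply("A", "Newspapers need revenue to pay journalists", MoveType.EVIDENCE)
counter = root.reply("B", "Free content is good", MoveType.COUNTER_CLAIM)
counter.reply("A", "Perhaps some content should stay free", MoveType.EVALUATION)

for depth, move in walk(root):
    print("  " * depth + f"[{move.kind.value}] {move.author}: {move.text}")
```

Keeping replies as children means a new contribution can be attached at the most relevant point in the thread rather than appended to the end.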

Page 10

Macro / Micro Argumentation

Micro-level:
• Simple claim
• Qualified claim
• Grounded claim
• Grounded and qualified claim
• Non-argumentative moves

Macro-level:
• Argument
• Counter argument
• Integration (reply)
• Non-argumentative moves

Weinberger and Fischer (2006)
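The micro-level categories differ only in whether a claim carries grounds, a qualifier, both or neither, so they can be read as a small decision rule. The sketch below assumes that reading of Weinberger and Fischer (2006); `micro_category` and its boolean inputs are hypothetical, and in practice those flags would have to come from annotation or a classifier:

```python
from enum import Enum


class MicroMove(Enum):
    # Micro-level categories listed on the slide (Weinberger and Fischer, 2006)
    SIMPLE_CLAIM = "simple claim"
    QUALIFIED_CLAIM = "qualified claim"
    GROUNDED_CLAIM = "grounded claim"
    GROUNDED_AND_QUALIFIED_CLAIM = "grounded and qualified claim"
    NON_ARGUMENTATIVE = "non-argumentative move"


def micro_category(has_claim: bool, has_grounds: bool, has_qualifier: bool) -> MicroMove:
    # Assumed reading: "grounded" = the claim is backed by reasons or evidence,
    # "qualified" = the claim's certainty or scope is limited ("probably", "in some cases").
    if not has_claim:
        return MicroMove.NON_ARGUMENTATIVE
    if has_grounds and has_qualifier:
        return MicroMove.GROUNDED_AND_QUALIFIED_CLAIM
    if has_grounds:
        return MicroMove.GROUNDED_CLAIM
    if has_qualifier:
        return MicroMove.QUALIFIED_CLAIM
    return MicroMove.SIMPLE_CLAIM


print(micro_category(True, True, False).value)  # → grounded claim
```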

Page 11

Methodology*

* Adapted from Bal & Saint-Dizier (2009) and Mochales & Moens (2009, 2011)

1. Identify discussions on different topics

2. Identify spans of text that represent the core points in the discussion

3. Classify into a structure so as to define the relationships between spans of text

4. Present this information to users
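The four steps compose into a pipeline in which the output of one step feeds the next. A minimal sketch, assuming hypothetical function names (`run_pipeline`, `present`) and toy stand-ins for each step; it shows only the data flow, not the methods used in the work:

```python
from typing import Callable, Dict, List, Tuple

Post = str
Span = str
Label = str
Structured = Dict[int, List[Tuple[Span, Label]]]


def run_pipeline(
    posts: List[Post],
    identify_topics: Callable[[List[Post]], List[List[Post]]],
    identify_spans: Callable[[Post], List[Span]],
    classify_span: Callable[[Span], Label],
) -> Structured:
    """Steps 1-3: group posts by topic, pull out spans, label each span."""
    structured: Structured = {}
    for topic_id, topic_posts in enumerate(identify_topics(posts)):
        structured[topic_id] = [
            (span, classify_span(span))
            for post in topic_posts
            for span in identify_spans(post)
        ]
    return structured


def present(structured: Structured) -> str:
    """Step 4: a plain-text view grouped by topic."""
    lines = []
    for topic_id, items in structured.items():
        lines.append(f"Topic {topic_id}")
        lines.extend(f"  [{label}] {span}" for span, label in items)
    return "\n".join(lines)


# Toy stand-ins for each step (hypothetical, for illustration only)
posts = ["free content is good", "i don't think free content is good"]
result = run_pipeline(
    posts,
    identify_topics=lambda ps: [ps],   # everything in one topic
    identify_spans=lambda p: [p],      # the whole post is one span
    classify_span=lambda s: "counter claim" if "i don't" in s else "claim",
)
print(present(result))
```

Passing each step in as a function keeps the stages swappable, so a clustering method, a rule set or a trained classifier can replace a stand-in without changing the rest.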

Page 12

Data Sets

Hand-annotated corpus of tweets from the London Riots (7729), www.analysingsocialmedia.org

Comments from the Guardian newspaper (partially hand-annotated for topic)

Tweets with the #OR2012 hashtag (5416)

Page 13

1. Topic Identification

• Extract individual discussions

• Unsupervised clustering – very objective

• Selection of algorithm:
  – Unigram / bigram frequency
  – Incremental clustering
  – K-means
  – Topic modelling

Possible tools:
  – NLTK (nltk.org)
  – Weka (www.cs.waikato.ac.nz/ml/weka/)
  – Mallet (mallet.cs.umass.edu)
  – Twitter Workbench (www.analysingsocialmedia.org/projects)
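As an illustration of the first two options on the slide, the sketch below builds unigram and bigram frequency vectors and groups texts with single-pass incremental clustering: each text joins the most similar existing cluster if cosine similarity reaches a threshold, otherwise it starts a new one. The tokeniser, cosine similarity and the 0.3 threshold are assumptions, not settings from the original work; for K-means and topic modelling, the listed tools (for example Weka's clusterers or Mallet's topic models) are the more practical route. The example texts are made up:

```python
import math
import re
from collections import Counter
from typing import List


def ngram_features(text: str) -> Counter:
    # Unigram and bigram counts over lower-cased word tokens
    tokens = re.findall(r"[a-z0-9#@']+", text.lower())
    bigrams = [f"{a} {b}" for a, b in zip(tokens, tokens[1:])]
    return Counter(tokens + bigrams)


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(count * b[term] for term, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def incremental_cluster(texts: List[str], threshold: float = 0.3) -> List[List[int]]:
    """Single-pass clustering. Each text is compared with every cluster's summed
    feature vector; it joins the best match if similarity >= threshold,
    otherwise it starts a new cluster. Returns lists of text indices."""
    centroids: List[Counter] = []
    clusters: List[List[int]] = []
    for i, text in enumerate(texts):
        vec = ngram_features(text)
        best, best_sim = -1, 0.0
        for c, centroid in enumerate(centroids):
            sim = cosine(vec, centroid)
            if sim > best_sim:
                best, best_sim = c, sim
        if best >= 0 and best_sim >= threshold:
            clusters[best].append(i)
            centroids[best].update(vec)
        else:
            clusters.append([i])
            centroids.append(Counter(vec))
    return clusters


texts = [
    "murdoch does not understand the internet",
    "free content is good",
    "murdoch really does not understand the internet",
    "free content is bad for newspapers",
]
print(incremental_cluster(texts))  # → [[0, 2], [1, 3]]
```

The result depends on the order in which texts arrive and on the threshold, which is one reason to compare it against K-means or topic modelling.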

Page 14

Example Clusters

Figure: example clusters from topic modelling and from incremental clustering

Page 15

Evaluation

Are you doing what a human would do?

Results for comments data:
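One way to ask whether clustering does what a human would do is to compare the clusters with the hand-annotated topics. The sketch below uses cluster purity, which is an assumption (the slide does not say which measure was used), and skips items with no human label because the comments data was only partially annotated; the example clusters and labels are made up:

```python
from collections import Counter
from typing import Dict, List


def purity(clusters: List[List[int]], human_labels: Dict[int, str]) -> float:
    """Fraction of labelled items that share their cluster's majority human label.
    Items without a human label are ignored."""
    matched = 0
    total = 0
    for cluster in clusters:
        labels = [human_labels[i] for i in cluster if i in human_labels]
        if not labels:
            continue
        matched += Counter(labels).most_common(1)[0][1]
        total += len(labels)
    return matched / total if total else 0.0


example_clusters = [[0, 2], [1, 3, 4]]
example_labels = {0: "murdoch", 2: "murdoch", 1: "free content", 3: "free content", 4: "murdoch"}
print(purity(example_clusters, example_labels))  # → 0.8
```

Purity rewards many small clusters, so it is usually reported alongside the number of clusters or a second measure.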

Page 16

2. Text Span Identification

Define a set of rules that allows the extraction of macro-level argumentation

With annotated text, you can use machine learning

With non-annotated text, you can define rules: is there something specific in the language that indicates a claim or a counter claim?

Claim

Counter Claim

Page 17

Rules production

Method:
– Rules are a generalisation from a large amount of data (14000 quotes)
– Use words / POS / negation / symbols
– Use the rules to find these patterns where they are not explicitly mentioned in the text

Examples:
– Before:
  • @USERNAME:
– After:
  • i don't
  • i think you
  • PRP VBP RB (personal pronoun, verb non-3rd-person singular present, adverb)
– Both:
  • START X i 'm not

Tools: LT-TTT2 (www.ltg.ed.ac.uk/software/)
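The example rules can be read as token patterns over tagged text. A minimal sketch, assuming that `X` stands for any single token, `START` anchors a pattern at the beginning of the tweet, `@USERNAME` matches any user mention and upper-case elements are Penn Treebank tags; these readings, and all helper names, are assumptions. The sketch only reports which patterns fire and does not model the Before / After / Both distinction, which is kept as comments. Input is assumed to be already tokenised and tagged (for example by LT-TTT2), and the example tweet is made up:

```python
from typing import List, Sequence, Tuple

Token = Tuple[str, str]  # (word, Penn Treebank POS tag)

# POS tags that may appear in patterns (only those used on the slide; extend as needed)
POS_TAGS = {"PRP", "VBP", "RB"}

# Example cue patterns transcribed from the slide
CUE_PATTERNS = [
    "@USERNAME :",       # "Before" example
    "i don't",           # "After" examples
    "i think you",
    "PRP VBP RB",
    "START X i 'm not",  # "Both" example
]


def element_matches(element: str, token: Token) -> bool:
    word, tag = token
    if element == "X":
        return True                   # wildcard: any single token
    if element == "@USERNAME":
        return word.startswith("@")   # any user mention
    if element in POS_TAGS:
        return tag == element         # match on the POS tag
    return word.lower() == element    # match on the literal word


def matches(pattern: str, tokens: Sequence[Token]) -> bool:
    elements = pattern.split()
    anchored = elements[0] == "START"  # START anchors the pattern at the first token
    if anchored:
        elements = elements[1:]
    starts = [0] if anchored else range(len(tokens) - len(elements) + 1)
    for start in starts:
        window = tokens[start:start + len(elements)]
        if len(window) == len(elements) and all(
            element_matches(e, t) for e, t in zip(elements, window)
        ):
            return True
    return False


def find_cues(tokens: Sequence[Token]) -> List[str]:
    # Every cue pattern that fires on this (already tokenised and tagged) tweet
    return [p for p in CUE_PATTERNS if matches(p, tokens)]


tweet = [("lol", "UH"), ("i", "PRP"), ("'m", "VBP"), ("not", "RB"), ("convinced", "JJ")]
print(find_cues(tweet))  # → ['PRP VBP RB', "START X i 'm not"]
```

Mixing word and tag elements in one pattern language lets a single rule generalise ("PRP VBP RB") or stay lexical ("i think you"), which mirrors the Words / POS / Negation / Symbols features listed above.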

Page 18

3. Classify into a structure

Method

Based on Rosé et al. (2008)

Use supervised machine learning to classify tweets into an argument structure

Using the TagHelper tool kit (based on Weka, www.cs.cmu.edu/~cprose/TagHelper.html) and LightSide (lightsidelabs.com):
– Decide on a machine learning algorithm
– Define feature sets
– Train and test
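The "train and test" step can be illustrated with k-fold cross-validation. TagHelper and LightSide provide this through their own interfaces; the sketch below is a plain-Python stand-in, assuming hypothetical helpers `cross_validate` and `majority_baseline` (a learner that always predicts the most frequent training label) and made-up example tweets. Folds are assigned by index rather than shuffled, so the result is deterministic:

```python
from collections import Counter
from typing import Callable, List, Sequence, Tuple

Example = Tuple[str, str]          # (tweet text, argument label)
Classifier = Callable[[str], str]  # maps a tweet to a predicted label


def majority_baseline(train: Sequence[Example]) -> Classifier:
    # Trivial stand-in learner: always predict the most frequent training label
    label = Counter(lbl for _, lbl in train).most_common(1)[0][0]
    return lambda text: label


def cross_validate(
    data: Sequence[Example],
    learner: Callable[[Sequence[Example]], Classifier],
    k: int = 10,
) -> float:
    """k-fold cross-validation with fold = index % k; returns mean accuracy."""
    scores: List[float] = []
    for fold in range(k):
        train = [ex for i, ex in enumerate(data) if i % k != fold]
        test = [ex for i, ex in enumerate(data) if i % k == fold]
        if not train or not test:
            continue
        classify = learner(train)
        correct = sum(classify(text) == label for text, label in test)
        scores.append(correct / len(test))
    return sum(scores) / len(scores) if scores else 0.0


data = [
    ("police are in the high street", "claim"),
    ("shops on fire in the centre", "claim"),
    ("the bus depot is burning", "claim"),
    ("no, that is not true", "claim"),
    ("army on the streets", "claim"),
    ("stay safe everyone", "comment"),
]
print(round(cross_validate(data, majority_baseline, k=2), 3))  # → 0.833
```

Any learner with the same shape (a function from training examples to a classifier) can be dropped in, so the majority-class baseline gives a floor against which real feature sets and algorithms can be compared.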

Page 19

Data Set: Tweets

Coded with the classification system:

1. Claim without evidence
2. Claim with evidence
3. Counter-claim without evidence
4. Counter-claim with evidence
5. Implicit request for verification
6. Explicit request for verification
7. Comment
8. Other
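The eight codes map naturally onto an enumeration, which keeps annotated data and classifier output consistent. A minimal sketch; the `has_evidence` property follows directly from the code names, while the `coarse` grouping into four classes is a hypothetical simplification that is not part of the original scheme:

```python
from enum import IntEnum


class ArgCode(IntEnum):
    # The eight codes used to annotate the tweet data set
    CLAIM_WITHOUT_EVIDENCE = 1
    CLAIM_WITH_EVIDENCE = 2
    COUNTER_CLAIM_WITHOUT_EVIDENCE = 3
    COUNTER_CLAIM_WITH_EVIDENCE = 4
    IMPLICIT_REQUEST_FOR_VERIFICATION = 5
    EXPLICIT_REQUEST_FOR_VERIFICATION = 6
    COMMENT = 7
    OTHER = 8

    @property
    def has_evidence(self) -> bool:
        return self in (ArgCode.CLAIM_WITH_EVIDENCE, ArgCode.COUNTER_CLAIM_WITH_EVIDENCE)

    @property
    def coarse(self) -> str:
        # Hypothetical four-way grouping (not part of the original scheme)
        if self <= 2:
            return "claim"
        if self <= 4:
            return "counter-claim"
        if self <= 6:
            return "request for verification"
        return "comment or other"


code = ArgCode(4)  # e.g. parsed from an annotated line
print(code.name, code.has_evidence, code.coarse)  # → COUNTER_CLAIM_WITH_EVIDENCE True counter-claim
```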

Page 20

Classification – Feature Selection

Features:
• Unigrams
• + line length
• + POS bigrams
• + bigrams
• + punctuation
• + stemming
• + no stemming
• + rare words
• + line length, punctuation and rare words
• + no stop list
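The "+" notation reads as additions to a unigram baseline, which suggests a single extractor with switches. A minimal sketch, assuming a tiny illustrative stop list, a simple regular-expression tokeniser and an optional `stem` callable (for example, NLTK's `PorterStemmer().stem` could be passed in); POS bigrams and rare-word features are left out because they need a tagger and corpus statistics. All names here are hypothetical:

```python
import re
import string
from collections import Counter
from typing import Callable, Dict, Optional

# Small illustrative stop list (an assumption; real systems use a fuller list)
STOP_WORDS = {"the", "a", "an", "is", "are", "to", "of", "and"}


def extract_features(
    text: str,
    bigrams: bool = False,
    line_length: bool = False,
    punctuation: bool = False,
    use_stop_list: bool = True,
    stem: Optional[Callable[[str], str]] = None,
) -> Dict[str, float]:
    tokens = re.findall(r"[a-z0-9@#']+", text.lower())
    if use_stop_list:
        tokens = [t for t in tokens if t not in STOP_WORDS]
    if stem is not None:
        tokens = [stem(t) for t in tokens]
    # Unigram counts are always included (the baseline feature set)
    features: Dict[str, float] = {f"uni={t}": n for t, n in Counter(tokens).items()}
    if bigrams:
        for (a, b), n in Counter(zip(tokens, tokens[1:])).items():
            features[f"bi={a}_{b}"] = n
    if line_length:
        features["len"] = len(text)  # characters in the raw text
    if punctuation:
        for ch, n in Counter(c for c in text if c in string.punctuation).items():
            features[f"punct={ch}"] = n
    return features


feats = extract_features("Free content is NOT good!!", bigrams=True, line_length=True, punctuation=True)
print(feats["len"], feats["punct=!"])  # → 26 2
```

Because bigrams are built after stop-word removal, "content is not" yields the bigram `content_not`; building them before removal is an equally reasonable choice.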

Algorithms:
• Support Vector Machine
• Decision Tree
• Naive Bayes
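Of the three algorithms, Naive Bayes is short enough to sketch from scratch. The code below is a multinomial Naive Bayes with add-one (Laplace) smoothing over token lists; it is an illustration rather than the TagHelper or LightSide implementation, and the training tweets are made up. For support vector machines and decision trees, a library such as scikit-learn (`LinearSVC`, `DecisionTreeClassifier`) would be the usual choice:

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List, Sequence


class NaiveBayes:
    """Multinomial Naive Bayes over token counts with add-one smoothing."""

    def fit(self, docs: Sequence[List[str]], labels: Sequence[str]) -> "NaiveBayes":
        self.class_counts = Counter(labels)
        self.word_counts: Dict[str, Counter] = defaultdict(Counter)
        for tokens, label in zip(docs, labels):
            self.word_counts[label].update(tokens)
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        self.total_words = {c: sum(wc.values()) for c, wc in self.word_counts.items()}
        return self

    def predict(self, tokens: List[str]) -> str:
        n_docs = sum(self.class_counts.values())
        v = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            score = math.log(count / n_docs)  # log prior
            for tok in tokens:
                if tok not in self.vocab:
                    continue  # ignore words never seen in training
                # log P(token | label) with add-one smoothing
                score += math.log((self.word_counts[label][tok] + 1) / (self.total_words[label] + v))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


docs = [["i", "think", "so"], ["this", "is", "true"],
        ["i", "don't", "think", "so"], ["no", "that", "is", "wrong"]]
labels = ["claim", "claim", "counter-claim", "counter-claim"]
nb = NaiveBayes().fit(docs, labels)
print(nb.predict(["that", "is", "wrong"]))  # → counter-claim
print(nb.predict(["this", "is", "true"]))   # → claim
```

Working in log space avoids underflow when many small probabilities are multiplied, which matters even for short texts such as tweets once the vocabulary grows.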

Page 21

QUESTIONS?

Clare Llewellyn University of Edinburgh
