argumentation framework
Analyzing Arguments during a Debate using Natural Language Processing in Python - ABHINAV GUPTA
How a debate may proceed
This new movie ‘Superman vs. Batman’ is so cool! The winner has got to be Superman, with his mighty Kryptonian abilities and people’s support. What do you think?!
I agree! Superman is definitely more capable than Batman.
What are you saying? Batman is so much technologically advanced!
Both, Batman and Superman, are powerful in their own ways. It will be a draw.
Ben Affleck is so HOT!
What will we discuss?
Basic Natural Language Processing (NLP) techniques
Implementation of NLP in Python using NLTK
Stepwise workflow for processing arguments in a debate to:
  Determine the polarity of an argument
  Determine the quality of an argument and score it
  Determine the winner of the debate
A complete debating framework built from various Python modules
Why Natural Language Toolkit (NLTK)?
Platform for implementing Natural Language Processing through Python programs
Huge database of corpora and lexical resources with an easy interface
Built-in libraries of several text processing algorithms
Open Source!
Starting with the Basics
“I do not feel very good about Monday mornings.”
Tokenization
[‘I’, ‘do’, ‘not’, ‘feel’, ‘very’, ‘good’, ‘about’, ‘Monday’, ‘mornings’]
Parts of Speech Tagging
‘I’ – Personal Pronoun
‘do’ – Verb
‘not’ – Adverb
‘feel’ – Verb
‘very’ – Adverb
‘good’ – Adjective
‘about’ – Preposition
‘Monday’ – Proper Noun
‘mornings’ – Plural Noun
Basics with NLTK
Tokens [‘I’, ‘do’, ‘not’, ‘feel’, ‘very’, ‘good’, ‘about’, ‘Monday’, ‘mornings’]
Removal of Stop Words [‘I’, ‘feel’, ‘good’, ‘Monday’, ‘mornings’]
Stemmed Words [‘I’, ‘feel’, ‘good’, ‘Monday’, ‘morn’]
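The three steps above can be sketched with NLTK roughly as follows. This is a minimal sketch, not the talk's own code: it uses the rule-based TreebankWordTokenizer and the PorterStemmer because neither needs downloaded data, and STOP_WORDS is a tiny hand-picked stand-in (an assumption) for a full stop-word list.

```python
from nltk.stem import PorterStemmer
from nltk.tokenize import TreebankWordTokenizer

sentence = "I do not feel very good about Monday mornings."

# Tokenize with the rule-based Treebank tokenizer (needs no downloaded data).
tokens = TreebankWordTokenizer().tokenize(sentence)
# Drop punctuation-only tokens such as the final full stop.
tokens = [t for t in tokens if any(ch.isalnum() for ch in t)]

# Assumption: a tiny hand-picked stop list standing in for a full one.
STOP_WORDS = {"do", "not", "very", "about"}
content_words = [t for t in tokens if t not in STOP_WORDS]

# Porter stemming strips suffixes, e.g. "mornings" becomes "morn".
stemmer = PorterStemmer()
stems = [stemmer.stem(t) for t in content_words]

print(tokens)
print(content_words)
print(stems)
```

Depending on the NLTK version, the stemmer may also lowercase its output, so the stems can differ in case from the list shown above.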
What do we look for in an argument?
What is the stance taken by the debater in this argument?
Has the debater changed stance from the previous arguments?
Is the argument related to the debate or irrelevant?
Is the argument good enough?
Analysis of an Argument
• Is the argument related to the debate?
SEMANTIC SIMILARITY
• What is the polarity of the argument?
SENTIMENT ANALYSIS
• Is the argument good enough?
SCORING
• Has the debater changed stance?
BACKTRACK
Semantic Similarity
Semantic Distance between words in context is the distance between their underlying senses or lexical concepts.
d(festival, celebration) < d(school, circus)
Semantic Similarity is how close the lexical concepts of two units (word, sentence, paragraph) of language are.
d(Mangoes and bananas are fruits, Mangoes are sweeter than bananas) < d(Raj has a job at the hospital, Hospitals have a huge staff of doctors and nurses)
Lexical databases like WordNet group English words into sets of synonyms, each expressing a distinct concept, and are used for calculating semantic similarity.
WordNet-based Similarity
Such a network forms the basis of several distance formulae to calculate semantic similarity
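To illustrate how a distance can be read off such a network, here is a toy sketch. The PARENT hierarchy is hypothetical and hand-made (it is not WordNet data), and turning the distance into a score with 1 / (1 + distance) is just one simple choice among the several formulae mentioned.

```python
# Hypothetical is-a hierarchy (child -> parent); not actual WordNet data.
PARENT = {
    "festival": "celebration",
    "celebration": "event",
    "circus": "show",
    "show": "event",
    "event": "entity",
    "school": "institution",
    "institution": "entity",
}


def path_to_root(word):
    """Return the chain [word, parent, grandparent, ..., root]."""
    chain = [word]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def distance(a, b):
    """Count the edges joining a and b through their lowest common ancestor."""
    chain_a, chain_b = path_to_root(a), path_to_root(b)
    for steps_a, node in enumerate(chain_a):
        if node in chain_b:
            return steps_a + chain_b.index(node)
    raise ValueError("words share no common ancestor")


def similarity(a, b):
    """Map a distance to a score in (0, 1]; 1.0 means the same concept."""
    return 1.0 / (1.0 + distance(a, b))


print(distance("festival", "celebration"), distance("school", "circus"))  # 1 5
```

In this toy network d(festival, celebration) < d(school, circus), matching the ordering given earlier.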
Similarity between Sentences
A new NASA initiative will help lead the search for signs of life beyond our solar system
The Nexus for Exoplanet System Science, or NExSS, will take a multidisciplinary approach to the hunt for alien life
Joint Word Set
[new, NASA, Initiative, Help, Lead, Search, Signs, Life, Beyond, Solar, System, Nexus, Exoplanet, Alien, Science, NExSS, Take, multidisciplinary, Approach, Hunt]
Sentence 1: 11111111111000000000
Sentence 2: 00000001001111111111
Similarity between Sentences
The simplest similarity score is the cosine similarity between the two vectors:
sim(v1, v2) = (v1 · v2) / (|v1| |v2|)
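A minimal sketch of this score for the two example sentences follows. The word lists are assumed to be the sentences after lowercasing and stop-word removal, and the joint word set is ordered by first appearance (which places "alien" last rather than where the slide has it, without changing the vectors).

```python
import math

# Content words of the two example sentences (lowercased, stop words removed).
s1 = ["new", "nasa", "initiative", "help", "lead", "search",
      "signs", "life", "beyond", "solar", "system"]
s2 = ["nexus", "exoplanet", "system", "science", "nexss", "take",
      "multidisciplinary", "approach", "hunt", "alien", "life"]

# Joint word set: every distinct word, in order of first appearance.
joint = list(dict.fromkeys(s1 + s2))


def binary_vector(words, joint):
    """1 where the joint word occurs in the sentence, 0 otherwise."""
    present = set(words)
    return [1 if w in present else 0 for w in joint]


def cosine(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


v1 = binary_vector(s1, joint)
v2 = binary_vector(s2, joint)
score = cosine(v1, v2)

print("".join(map(str, v1)))  # 11111111111000000000
print("".join(map(str, v2)))  # 00000001001111111111
print(round(score, 3))        # 0.182
```

Only "life" and "system" are shared, so the score is 2 / 11.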
More sophisticated formulae identify similar pairs of words and assign decimal values depending on the semantic distance. For example, in our word set:
d(Search, Hunt) = 0.8
d(Solar, Exoplanet) = 0.4
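One way such pair values could enter the score is sketched below, under stated assumptions: each vector entry becomes the best match between a joint word and any word of the sentence, PAIR_SCORES holds only the two values quoted above, and every other pair of distinct words is assumed to score 0. This is an illustration, not necessarily the formula used in the talk.

```python
import math

s1 = ["new", "nasa", "initiative", "help", "lead", "search",
      "signs", "life", "beyond", "solar", "system"]
s2 = ["nexus", "exoplanet", "system", "science", "nexss", "take",
      "multidisciplinary", "approach", "hunt", "alien", "life"]
joint = list(dict.fromkeys(s1 + s2))

# The two pair values quoted above; all other distinct pairs are assumed to be 0.
PAIR_SCORES = {
    frozenset(("search", "hunt")): 0.8,
    frozenset(("solar", "exoplanet")): 0.4,
}


def word_score(a, b):
    """1.0 for identical words, otherwise the table value (default 0.0)."""
    if a == b:
        return 1.0
    return PAIR_SCORES.get(frozenset((a, b)), 0.0)


def semantic_vector(words, joint):
    """For each joint word, the best score against any word in the sentence."""
    return [max(word_score(j, w) for w in words) for j in joint]


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


soft_score = cosine(semantic_vector(s1, joint), semantic_vector(s2, joint))
print(round(soft_score, 3))  # 0.373
```

With the related pairs counted, the score roughly doubles compared with the plain 0/1 vectors (about 0.182).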
Sometimes the order in which the words appear in the sentence also makes a difference, so Order Similarity is also considered.
India defeated Pakistan
Pakistan defeated India
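A sketch of one order-similarity formulation from the literature, 1 - |r1 - r2| / |r1 + r2|, where r1 and r2 hold the position of each joint word in the two sentences. Whether the talk uses this exact formula is an assumption; using 0 for a word missing from a sentence is a simplification.

```python
import math


def order_vector(words, joint):
    """1-based position of each joint word in the sentence; 0 if absent."""
    position = {w: i for i, w in enumerate(words, start=1)}
    return [position.get(w, 0) for w in joint]


def order_similarity(words_a, words_b):
    """1.0 for identical word order, lower as the positions diverge."""
    joint = list(dict.fromkeys(words_a + words_b))
    r1 = order_vector(words_a, joint)
    r2 = order_vector(words_b, joint)
    diff = math.sqrt(sum((a - b) ** 2 for a, b in zip(r1, r2)))
    total = math.sqrt(sum((a + b) ** 2 for a, b in zip(r1, r2)))
    return 1.0 - diff / total if total else 1.0


a = ["india", "defeated", "pakistan"]
b = ["pakistan", "defeated", "india"]

print(order_similarity(a, a))            # 1.0
print(round(order_similarity(a, b), 3))  # 0.592
```

The two sentences contain exactly the same words, so a bag-of-words cosine score would be 1.0; only the order term tells them apart.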
Sentiment Analysis
Sentiment Analysis (or opinion mining) is the process of detecting the contextual polarity of text
NLP techniques, statistics and machine learning are used to identify the sentiment content in a text
It finds application in movie reviews, blogs, customer feedback, Twitter and other microblogging sources
The most popular classifier used for Sentiment Analysis is the Naïve Bayes Classifier, available as a module in NLTK and in TextBlob, a Python library for textual data
Sentiment Analysis using Naïve Bayes
[Flowchart labels: Training Corpus, Polarity Lexicon, Naïve Bayes Classifier, Test Data, Neutral? (Yes), Positive/Negative]
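To make the mechanics concrete, here is a from-scratch sketch of a word-count Naïve Bayes classifier with add-one smoothing. It is not NLTK's or TextBlob's implementation. TRAIN and LEXICON are hypothetical toy data, and the neutral check (no lexicon word present means neutral) is only one possible reading of the flowchart.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy training corpus of (text, label) pairs.
TRAIN = [
    ("superman is strong and capable", "pos"),
    ("i love this great movie", "pos"),
    ("batman is weak and slow", "neg"),
    ("i hate this boring movie", "neg"),
]

# Hypothetical polarity lexicon: words assumed to carry sentiment.
LEXICON = {"strong", "capable", "love", "great",
           "weak", "slow", "hate", "boring"}


def train(examples):
    """Count words per label and how often each label occurs."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    for text, label in examples:
        label_counts[label] += 1
        word_counts[label].update(text.lower().split())
    vocab = {w for counts in word_counts.values() for w in counts}
    return word_counts, label_counts, vocab


def classify(text, model):
    """Return 'neutral', or the label with the highest log-probability."""
    word_counts, label_counts, vocab = model
    tokens = text.lower().split()
    # Assumed neutral check: no word from the polarity lexicon is present.
    if not any(t in LEXICON for t in tokens):
        return "neutral"
    best_label, best_logp = None, -math.inf
    for label, count in label_counts.items():
        total = sum(word_counts[label].values())
        logp = math.log(count / sum(label_counts.values()))
        for t in tokens:
            if t in vocab:  # words never seen in training are skipped
                logp += math.log((word_counts[label][t] + 1) / (total + len(vocab)))
        if logp > best_logp:
            best_label, best_logp = label, logp
    return best_label


model = train(TRAIN)
print(classify("a great and strong hero", model))  # pos
print(classify("boring and slow", model))          # neg
print(classify("ben affleck is an actor", model))  # neutral
```

A real system would train on a much larger labelled corpus; the toy data here only shows how word counts drive the decision.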
Thank You!
Please keep watching this space for Part II