CELI @ CLIC 2014: Geometric and Statistical Analysis of Topics and Emotions in Corpora
DESCRIPTION
Our second presentation at CLIC 2014: "Geometric and Statistical Analysis of Topics and Emotions in Corpora", for which Francesco Tarasconi received the Distinguished Young Paper award, given to the 8 best papers of the conference with a young author.
TRANSCRIPT
Geometric and Statistical Analysis of Topics and Emotions in Corpora
Francesco Tarasconi
Vittorio Di Tomaso
Pisa, 9/12/2014
Introduction: Analysis of Emotions
Francesco Tarasconi and Vittorio Di Tomaso 2
NLP: topic detection, sentiment analysis, emotion detection
Many, potentially correlated, variables
Role of Data Analysis: define, visualize and understand emotional similarities
Focus of the present work: background, methodology, examples
BACKGROUND
A Model of Emotions in Social Networks
Primary emotions according to Ekman (1972):
Anger
Disgust
Fear
Joy
Sadness
Surprise
Plus:
Love
Like
Dislike
© Paul Ekman. All rights reserved
Social TV, the “Second Screen”
Sharing of experiences (and emotions!) between viewers of the same program
Source: Blogmeter, www.blogmeter.it
Emotional profiles of audiences and, by extension, of whole shows / episodes
METHODOLOGY
Vector Space Model Representations
DOC_i = { topic A, topic B, ... , emotion x, emotion y, ... }
Annotated documents as vectors in an (n_topic + n_emotion)-dimensional space
Document-annotation indicator matrix D
TOPIC_i = [ frequency 1, frequency 2, ... , frequency n_emotion ]
Topics as vectors in an n_emotion-dimensional space
Topic-emotion frequency matrix T
IMPRESSION_i = { topic A, emotion x }
Impressions as vectors in an (n_topic + n_emotion)-dimensional space
Impression-annotation indicator matrix J
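The three representations above can be sketched in a few lines of NumPy. This is only an illustration: the documents, topic names and emotion labels are made up, and the rule used to derive impressions (one impression for every topic-emotion pair that co-occurs in a document) is an assumption, since the slide does not spell it out.

```python
import numpy as np

# Hypothetical annotated documents (invented for illustration).
docs = [
    {"topics": {"judge", "contestant"}, "emotions": {"joy"}},
    {"topics": {"contestant"}, "emotions": {"anger", "sadness"}},
    {"topics": {"judge"}, "emotions": {"joy", "love"}},
]

topics = sorted({t for d in docs for t in d["topics"]})
emotions = sorted({e for d in docs for e in d["emotions"]})
labels = topics + emotions                      # topic columns first, then emotions
col = {name: k for k, name in enumerate(labels)}
n_topic = len(topics)

# D: document-annotation indicator matrix (one row per document).
D = np.zeros((len(docs), len(labels)), dtype=int)
for i, d in enumerate(docs):
    for name in d["topics"] | d["emotions"]:
        D[i, col[name]] = 1

# Assumption: an impression is a (topic, emotion) pair co-occurring in a document.
impressions = [
    (t, e)
    for d in docs
    for t in sorted(d["topics"])
    for e in sorted(d["emotions"])
]

# J: impression-annotation indicator matrix (one topic and one emotion per row).
J = np.zeros((len(impressions), len(labels)), dtype=int)
for i, (t, e) in enumerate(impressions):
    J[i, col[t]] = 1
    J[i, col[e]] = 1

# T: topic-emotion frequency matrix, obtained by cross-tabulating impressions.
T = J[:, :n_topic].T @ J[:, n_topic:]

print(labels)
print(T)
```

Under this assumption T is simply the off-diagonal block of J^T J, so the frequency table and the indicator matrix carry the same co-occurrence counts.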
Emotional Distances Between Topics
Key elements: 1) High variance in topic absolute frequencies
2) High variance in emotion absolute frequencies
3) A graphical representation is required
4) Why are two topics similar?
A graphical representation can be obtained by dimension reduction.
Simple and Multiple Correspondence Analysis
Strong link with PCA: dimension reduction, eigenvalue methods
CA (Hirschfeld, 1935) of contingency table T
SVD of standardized residual matrix
Principal coordinates and symmetric map
Inertia and quality of the representation
MCA of indicator matrix J or Burt matrix J^T J
Analysis of surveys (Benzécri, 1960s – 1970s)
As a geometric method (Le Roux and Rouanet, 2004)
Adjustment of inertia (Greenacre, 2006)
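The simple CA steps listed above (standardized residuals, SVD, principal coordinates, inertia) can be sketched with NumPy as follows. This is a minimal textbook-style sketch, not the authors' implementation; the function name `correspondence_analysis` and the counts in `T` are hypothetical, and every row and column of the table is assumed to have a nonzero total.

```python
import numpy as np

def correspondence_analysis(T):
    """Simple CA of a contingency table (assumes no empty rows or columns)."""
    T = np.asarray(T, dtype=float)
    P = T / T.sum()                          # correspondence matrix
    r = P.sum(axis=1)                        # row masses (topics)
    c = P.sum(axis=0)                        # column masses (emotions)
    expected = np.outer(r, c)
    S = (P - expected) / np.sqrt(expected)   # standardized residuals
    U, sv, Vt = np.linalg.svd(S, full_matrices=False)
    inertia = sv ** 2                        # principal inertias
    F = (U * sv) / np.sqrt(r)[:, None]       # row principal coordinates
    G = (Vt.T * sv) / np.sqrt(c)[:, None]    # column principal coordinates
    return F, G, inertia

# Hypothetical topic-by-emotion counts (3 topics, 4 emotions).
T = np.array([[30, 5, 10, 2],
              [4, 25, 3, 8],
              [12, 6, 20, 15]])

F, G, inertia = correspondence_analysis(T)

# Share of total inertia displayed by the first two axes of the symmetric map.
quality_2d = inertia[:2].sum() / inertia.sum()
print(F[:, :2])
print(G[:, :2])
print(quality_2d)
```

Plotting the first two columns of `F` and `G` on the same axes gives the symmetric map. The total inertia equals the chi-square statistic of the table divided by the number of observations, which is why the map reflects the shape of the data rather than raw volumes.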
Why MCA
1) It accounts for different volumes in the original variables (masses), but focuses on the shape of data (residuals)
2) Graphical method
3) Symmetric treatment of topics and emotions
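As an illustration of the symmetric treatment of topics and emotions, the sketch below runs MCA on an impression indicator matrix J and applies the inertia adjustment in the form usually attributed to Greenacre. It assumes each impression carries exactly one topic and one emotion (Q = 2 variables); in that case the adjusted inertias should coincide with the principal inertias of simple CA on T. The counts and the helper name `principal_inertias` are hypothetical.

```python
import numpy as np

def principal_inertias(N):
    """Squared singular values of the standardized residuals of table N."""
    P = np.asarray(N, dtype=float) / np.sum(N)
    r, c = P.sum(axis=1), P.sum(axis=0)
    expected = np.outer(r, c)
    S = (P - expected) / np.sqrt(expected)
    return np.linalg.svd(S, compute_uv=False) ** 2

# Hypothetical topic-by-emotion counts (3 topics, 4 emotions).
T = np.array([[30, 5, 10, 2],
              [4, 25, 3, 8],
              [12, 6, 20, 15]])
n_topic, n_emotion = T.shape

# Expand T into the indicator matrix J: one row per impression,
# with a 1 in its topic column and a 1 in its emotion column.
ti, ej = np.nonzero(T)
counts = T[ti, ej]
topic_idx = np.repeat(ti, counts)
emotion_idx = np.repeat(ej, counts)
n = int(counts.sum())
J = np.zeros((n, n_topic + n_emotion))
J[np.arange(n), topic_idx] = 1
J[np.arange(n), n_topic + emotion_idx] = 1

# The Burt matrix J^T J holds T in its off-diagonal block.
burt = J.T @ J

# MCA on the indicator matrix, then the adjustment for Q variables:
# keep inertias above 1/Q and rescale them.
Q = 2
mu = principal_inertias(J)
kept = mu[mu > 1.0 / Q + 1e-12]
adjusted = (Q / (Q - 1)) ** 2 * (kept - 1.0 / Q) ** 2

print(adjusted)
print(principal_inertias(T)[: len(adjusted)])
```

Topics and emotions enter J as columns of the same kind, so neither set is privileged. The unadjusted inertias of J are inflated by the diagonal blocks of the Burt matrix, which is what the adjustment corrects for.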
EXAMPLES
Social TV Emotional Landscape
X Factor’s Emotional Phases
MasterChef’s Quirks
X-Factor vs MasterChef
Conclusions and Further Research
We have shown how to represent and highlight important emotional relations between topics using carefully chosen multivariate techniques.
In the future we would like to:
add information about the authors to our analysis;
study in greater detail the clouds of impressions, documents and authors.
We would like to thank:
V. Cosenza and S. Monotti Graziadei for stimulating this research;
the ISI-CRT foundation and CELI S.R.L. for the support provided through the Lagrange Project;
A. Bolioli for the essential help and supervision in the preparation of this paper.
Thank you for your attention!