The Mass Media bias: Analysing and comparing the time series of polls and news articles during the 2016 USA presidential election.

Federico Albanese ([email protected])

Director: Pablo Balenzuela
Codirector: Viktoriya Semeshenko

Departamento de Física, FCEyN-UBA




Objectives

1) Do the mass media influence society?

2) Does negative propaganda have a positive or a negative effect on a candidate?

3) Is there a bias in the mass media?


Polls

- 263 polls (an average of 2.7 polls per day)

- Made by: NBC, New York Times, LA Times, CBS, Fox News, Gravis, ABC, IBD (among others)

[Figure: ∆(Clinton - Trump) in the polls, percentage [%] versus time [month].]
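With several polls per day, one way to obtain a single ∆(Clinton - Trump) value per date is to average the per-poll differences. The sketch below assumes that approach; the slides do not say how the polls were aggregated, and the records and the `daily_difference` helper are hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical poll records: (date, Clinton %, Trump %). Not the real data.
polls = [
    ("2016-09-01", 45.0, 41.0),
    ("2016-09-01", 44.0, 42.0),
    ("2016-09-02", 46.0, 41.0),
]

def daily_difference(records):
    """Average Clinton-minus-Trump difference (percentage points) per day."""
    by_day = defaultdict(list)
    for day, clinton, trump in records:
        by_day[day].append(clinton - trump)
    # Sort by date string so the result reads as a time series.
    return {day: mean(diffs) for day, diffs in sorted(by_day.items())}

series = daily_difference(polls)
print(series)
```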


Media

New York Times
- The most consumed and most googled newspaper in the USA [1].

Fox News
- The most Republican media outlet, according to a study made at Berkeley University (2013) [2].

Breitbart
- Fox News is more conservative, whereas Breitbart has been exclusively pro-Trump from the very first day.
- [Image: an article by A. J. Delgado, Oct. 22, 2015]

[1] Google Trends in the USA among the most important newspapers
[2] https://datascience.berkeley.edu/data-media-map-bitly/


First look into the data

[Figure: number of mentions per article in the New York Times; panels: Clinton, Trump.]


Clinton was mentioned fewer than 5 times in most of the articles. In contrast, Trump was mentioned more than 80 times in some articles.
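A minimal sketch of how mentions per article could be counted, assuming a mention is a whole-word, case-insensitive match of the candidate's surname. The slides do not state the matching rule, and the sample article and the `count_mentions` helper are hypothetical.

```python
import re

def count_mentions(text, name):
    """Count whole-word, case-insensitive occurrences of `name` in `text`."""
    pattern = r"\b" + re.escape(name) + r"\b"
    return len(re.findall(pattern, text, flags=re.IGNORECASE))

# Hypothetical article text, for illustration only.
article = "Trump spoke in Ohio. Clinton answered Trump the next day. Trump's rally grew."
mentions = {name: count_mentions(article, name) for name in ("Clinton", "Trump")}
print(mentions)
```

Applying `count_mentions` to every article and collecting the results gives the per-article distribution for each candidate.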


Sentiment Analysis

Stanford NLP: The algorithm builds a binary tree from each sentence, taking into account the semantic composition.

Example: (There are slow and repetitive parts, but it has just enough spice to keep it interesting)

Going from the children to the root, a sentiment value (positive, negative or neutral) is assigned to each node.

Socher, R., Perelygin, A., Wu, J., Chuang, J., Manning, C. D., Ng, A., & Potts, C. (2013). Recursive deep models for semantic compositionality over a sentiment treebank. In Proceedings of the 2013 conference on empirical methods in natural language processing (pp. 1631-1642).
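The bottom-up pass over the binary tree can be illustrated with a toy sketch. This is not the Stanford model: the recursive neural network of Socher et al. learns how to compose child vectors, whereas the sketch below assumes a tiny hand-made lexicon and a sum rule for combining children. The `LEXICON`, the `Node` class and the bracketing of the phrase are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

# Tiny hypothetical lexicon; the real model learns its scores from a treebank.
LEXICON = {"slow": -1, "repetitive": -1, "interesting": 1, "spice": 1}

@dataclass
class Node:
    word: Optional[str] = None      # set for leaves only
    left: Optional["Node"] = None   # set for internal nodes only
    right: Optional["Node"] = None
    score: int = 0                  # filled in by annotate()

def annotate(node):
    """Assign a score to every node, children first, then the parent."""
    if node.word is not None:
        node.score = LEXICON.get(node.word, 0)
    else:
        # Toy composition rule (an assumption): a parent sums its children.
        node.score = annotate(node.left) + annotate(node.right)
    return node.score

def label(score):
    """Map a numeric score to one of the three sentiment classes."""
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"

# Hypothetical bracketing: ((slow repetitive) (but (spice interesting)))
tree = Node(
    left=Node(left=Node(word="slow"), right=Node(word="repetitive")),
    right=Node(
        left=Node(word="but"),
        right=Node(left=Node(word="spice"), right=Node(word="interesting")),
    ),
)
annotate(tree)
print(label(tree.left.score), label(tree.right.score), label(tree.score))
```

Every internal node ends up with its own label, which is what allows the sentiment of sub-phrases and of the whole sentence to differ.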


Sentiment Analysis

Time series:

(1) Republican National Convention
(2) First Debate
(3) Election Day

[Figure: number of phrases per date for Clinton and for Trump, showing # positive phrases, # neutral phrases, # negative phrases and # total phrases. The markers (1), (2) and (3) indicate the three events listed above.]
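The per-date counts of positive, neutral, negative and total phrases can be built by grouping the classified phrases by day. The sketch below assumes each phrase has already been labelled and attributed to one candidate; the records and the `sentiment_series` helper are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical classified phrases: (date, candidate, sentiment label).
phrases = [
    ("2016-07-18", "Trump", "negative"),
    ("2016-07-18", "Trump", "neutral"),
    ("2016-07-18", "Clinton", "positive"),
    ("2016-07-19", "Trump", "negative"),
]

def sentiment_series(records, candidate):
    """Map each date to a Counter of sentiment labels for one candidate."""
    series = defaultdict(Counter)
    for day, who, sentiment in records:
        if who == candidate:
            series[day][sentiment] += 1
            series[day]["total"] += 1
    return dict(sorted(series.items()))

trump = sentiment_series(phrases, "Trump")
print(trump)
```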


Linear Correlation

Linear correlation with a 14-day lag

Mentions | Coefficient | p-value | Coefficient | p-value | Coefficient | p-value
Clinton’s positive mentions | 0.485 | 3.43e-6 | -0.213 | 0.05 | 0.060 | 0.590
Clinton’s negative mentions | 0.394 | 2.24e-4 | -0.682 | 1.29e-12 | -0.319 | 0.3
Clinton’s total mentions | 0.453 | 1.70e-5 | -0.616 | 5.54e-10 | -0.174 | 0.116
Trump’s positive mentions | 0.554 | 5.64e-8 | -0.395 | 2.20e-4 | 0.160 | 0.149
Trump’s negative mentions | 0.476 | 5.39e-6 | -0.470 | 7.54e-6 | -0.021 | 0.853
Trump’s total mentions | 0.518 | 5.31e-7 | -0.437 | 3.62e-5 | 0.082 | 0.460
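A lagged linear correlation can be sketched as a Pearson coefficient between the media series at day t and the poll series at day t + lag. The direction of the lag (media leading the polls) is an assumption, since the slides only state a 14-day lag. The data are hypothetical, and the p-values are not reproduced here.

```python
from math import sqrt

def pearson(x, y):
    """Pearson linear correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def lagged_pearson(media, polls, lag):
    """Correlate media[t] with polls[t + lag] (media leads polls by `lag` days)."""
    if lag == 0:
        return pearson(media, polls)
    return pearson(media[:-lag], polls[lag:])

# Hypothetical daily series, not the data behind the table.
media = [1, 3, 2, 5, 4, 6, 8, 7, 9, 10]
polls = [0, 0] + [2 * m + 1 for m in media[:-2]]  # polls echo media two days later

print(lagged_pearson(media, polls, lag=2))
```

With the real series, the same call with `lag=14` would give one coefficient per row of the table.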


Difference in the polls

- The more phrases published by the New York Times, the bigger the difference in favor of Clinton.

- The more phrases published by Fox News, the more Trump goes up in the polls and the smaller the difference.



Mutual Information of the symbolized time series

Mutual Information (MI) measures the dependency between two time series. It is defined for two random variables X and Y with values Xi and Yj, where “n” and “m” are the number of possible values for X and Y. The value of MI goes from 0 (no mutual information) to 1 (perfect relation between the variables).

- The permutation test was used to measure the significance of the statistical results [1].

- A symbolization of all the time series was made for this analysis [2].

[1] François, D., Wertz, V., & Verleysen, M. (2006, April). The permutation test for feature selection by mutual information. In ESANN (pp. 239-244).
[2] Bandt, C., & Pompe, B. (2002). Permutation entropy: a natural complexity measure for time series. Physical review letters, 88(17), 174102.
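The three ingredients (symbolization, MI and the permutation test) can be sketched together. The code assumes ordinal patterns of order 3 in the style of Bandt and Pompe, and the textbook definition MI = Σ p(x, y) log[p(x, y) / (p(x) p(y))] in nats. The slides report MI on a 0 to 1 scale, but the normalization used is not recoverable from them, so none is applied here. All function names and parameter values are hypothetical.

```python
import random
from collections import Counter
from math import log

def symbolize(series, order=3):
    """Ordinal symbols: the rank pattern of each window of `order` points."""
    return [
        tuple(sorted(range(order), key=lambda k: series[i + k]))
        for i in range(len(series) - order + 1)
    ]

def mutual_information(xs, ys):
    """Unnormalized MI in nats between two equal-length symbol sequences."""
    n = len(xs)
    pxy, px, py = Counter(zip(xs, ys)), Counter(xs), Counter(ys)
    return sum(
        (c / n) * log((c / n) / ((px[x] / n) * (py[y] / n)))
        for (x, y), c in pxy.items()
    )

def permutation_p_value(xs, ys, trials=500, seed=0):
    """Fraction of shuffles of `ys` whose MI is at least the observed MI."""
    rng = random.Random(seed)
    observed = mutual_information(xs, ys)
    shuffled = list(ys)
    hits = 0
    for _ in range(trials):
        rng.shuffle(shuffled)
        if mutual_information(xs, shuffled) >= observed:
            hits += 1
    return hits / trials

# Hypothetical series: `b` is a scaled copy of `a`, so their symbols coincide.
a = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5, 8, 9, 7, 9, 3]
b = [2 * v + 1 for v in a]
sa, sb = symbolize(a), symbolize(b)
print(mutual_information(sa, sb), permutation_p_value(sa, sb))
```

Shuffling one sequence destroys any temporal relation while keeping its symbol frequencies, so a small p-value suggests the observed MI is unlikely to arise by chance.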


Mutual Information of the symbolized time series

[Figure; labels: Donald Trump, Polls of Hillary Clinton, Hillary Clinton, Donald Trump.]

The sentiment of the phrases was observed to be important and to be related to the time series of the polls.


Topic Detection: Dimensionality reduction


Topic Detection

How could you mathematically represent a document?

- Vectors

V = [ ... , TF(t)*IDF(t) , … ] -> dim = # words

with IDF(t) computed from N and nt, where N is the # of documents and nt the # of documents in which the word t appears.

Combining all the vectors of all the documents, we have a matrix M.

Dimensionality reduction: NMF is an algorithm where a matrix M is factorized into two matrices W and H (M ≈ H*W), with the property that all three matrices have no negative elements.

Advantages:
- Vectors have positive components (easy interpretation)
- Orthogonality is not imposed

Disadvantages:
- The # of topics is an input, not an output of the algorithm.

Ramos, J. (2003, December). Using tf-idf to determine word relevance in document queries. In Proceedings of the first instructional conference on machine learning (Vol. 242, pp. 133-142).
Xu, W., Liu, X., & Gong, Y. (2003, July). Document clustering based on non-negative matrix factorization. In Proceedings of the 26th annual international ACM SIGIR conference on Research and development in information retrieval (pp. 267-273). ACM.
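A minimal, dependency-free sketch of the pipeline: build a TF-IDF matrix, then factorize it with the classic multiplicative-update NMF of Lee and Seung. It assumes the textbook form IDF(t) = log(N / nt) and stores documents as rows, so the product reads M ≈ W*H, with W holding document-topic weights and H holding topic-word weights; the slides may use the transposed convention. The toy documents, the number of topics and all function names are hypothetical.

```python
import math
import random

# Hypothetical toy corpus; each string stands for one article.
docs = [
    "jobs taxes economy jobs",
    "taxes economy trade",
    "immigration border wall",
    "border immigration immigration",
]

def tfidf_matrix(documents):
    """Rows are documents, columns are words; entries are TF(t) * IDF(t)."""
    tokens = [d.split() for d in documents]
    vocab = sorted({w for doc in tokens for w in doc})
    n_docs = len(tokens)
    # Assumed textbook form: IDF(t) = log(N / n_t).
    idf = {w: math.log(n_docs / sum(w in doc for doc in tokens)) for w in vocab}
    rows = [[doc.count(w) / len(doc) * idf[w] for w in vocab] for doc in tokens]
    return vocab, rows

def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(a):
    return [list(col) for col in zip(*a)]

def nmf(m, k, iters=200, seed=0, eps=1e-9):
    """Multiplicative updates for M ~ W * H with W and H non-negative."""
    rng = random.Random(seed)
    w = [[rng.random() for _ in range(k)] for _ in m]
    h = [[rng.random() for _ in m[0]] for _ in range(k)]
    for _ in range(iters):
        wt = transpose(w)
        num, den = matmul(wt, m), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(len(h[0]))]
             for i in range(k)]
        ht = transpose(h)
        num, den = matmul(m, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)]
             for i in range(len(w))]
    return w, h

vocab, M = tfidf_matrix(docs)
W, H = nmf(M, k=2)  # the number of topics k is an input, as noted above

for topic in H:
    # Show the three highest-weighted words of each topic.
    print(sorted(zip(topic, vocab), reverse=True)[:3])
```

Because the updates only multiply non-negative numbers, W and H stay non-negative throughout, which is what makes each row of H readable as a weighted word list.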


Non-Negative Matrix Factorization (NMF)

[Figure: example topics: Economy; Social Issues: Immigration.]


Topic detection for each media outlet separately

Topics:
- Social issues (immigration and racism)
- Economy
- Week review
- Clinton’s and Trump’s scandals
- Art
- Foreign affairs

Topics:
- Elections
- Clinton’s email scandal
- Social issues (immigration)
- Economy
- Foreign affairs
- Clinton Foundation scandals

Topics:
- Social issues (racism)
- FBI investigation of Clinton’s emails
- Third party
- Clinton Foundation scandals
- Social issues (immigration)
- Clinton’s email scandal


<< [email protected] >>