
Background


Systematic attempts to measure partisan bias in the media have tended to focus on estimating presentation bias, also known as “slant” or “spin,” using article keywords, phrases and citations. Groseclose and Milyo (2005) measure media bias by comparing interest group citations in news sources with interest group citations by members of Congress. Gentzkow and Shapiro (2010) measure bias by comparing the language of newspaper articles with the language of Congressional speeches. News in the era of Google and the Internet, however, is more prone to a form of agenda-setting bias, sometimes called selection bias, in which partisan narratives are constructed by selecting stories about topics that fit these narratives (Groeling 2013). While the expression of a partisan agenda among pre-Internet news organizations often required journalists to spin the same stories that their competitors were reporting on, contemporary news organizations can more easily write a series of stories that each appear to be “spin-free” yet together form a highly biased partisan narrative. In this paper, we use probabilistic topic models to measure bias among 13 of the top online news sources and rank these organizations accordingly.

An Unbiased Measure of Media Bias Using Latent Topic Models

Lefteris Anastasopoulos, Aaron Kaufman and Luke Miratrix

Contact Email: [email protected]

UC Berkeley School of Information, Harvard University Department of Government and Department of Statistics

Abstract

Most research using article text says “yes.”
• Groseclose and Milyo (2005) – Liberal/Democratic bias.
• Gentzkow and Shapiro (2010) – Biased toward ideology of readers.
• Quinn and Ho (2008) – Liberal bias.

Presentation and Selection Bias

Presentation Bias: Traditional Framework

Top “Partisan” words as measured using the Congressional Record. Source: Gentzkow and Shapiro (2010)

Comparison of words and phrases (n-grams) in known partisan/ideological sources with words and phrases in articles.

Groseclose and Milyo (2005) – Think tank citations.

Quinn and Ho (2008) – Supreme Court opinions.

Gentzkow and Shapiro (2010) – Congressional speeches.

Is there Partisan Bias in the Mainstream Media?

What is Media Bias?

Presentation Bias – How something is covered. Referred to as “spin.”

Selection Bias – What is covered. AKA “agenda-setting bias.”

Ferguson Coverage. Source: The Daily Mail

Ferguson Coverage. Source: Huffington Post

Drudge Report Coverage of “Obamaphones”

Huffington Post Coverage of Vegan Diets

Selection Bias: Traditional Framework

At any time t, there exists a universe of unobserved stories S_t.

News sources, N_dt, are conceptualized as sets of non-random draws from S_t.

A news source is a set of stories that has an ideological/partisan valence.

Goal was to restrict S_t enough to measure selection bias, e.g., Larcinese, Puglisi and Snyder (2011): S_t = periodically released government reports.

Reframing Media Bias

Bias of a news source d is a function of selection bias SB_d and presentation bias PB_d.

News Sources are Collections of Latent Topics – While the latent universe of potential news stories S_t will always be unknown, Latent Dirichlet Allocation (Blei, Ng and Jordan 2003) allows us to estimate a latent universe of k topics within each news source d over a time period t.

Selection Bias

I = Partisan/ideological “valence” of topic k.

θ_ad = Distribution of topic proportions for article a in news source d.

SB_ad = Selection bias of article a in source d.

E[SB_ad] = Expected selection bias of article a in source d.
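One way these quantities could fit together is sketched below. The poster's own equations did not survive extraction, so the functional form here is an assumption: the selection bias of an article is taken to be the valence-weighted sum of its topic proportions, and the expectation is estimated by averaging over a source's articles. The function names, the valence scale (-1 for Democratic, +1 for Republican) and all numbers are hypothetical.

```python
from statistics import mean

def article_selection_bias(theta, valence):
    """Valence-weighted sum of one article's topic proportions (assumed form)."""
    if len(theta) != len(valence):
        raise ValueError("theta and valence need one entry per topic")
    return sum(p * v for p, v in zip(theta, valence))

def source_selection_bias(thetas, valence):
    """Sample average of article-level scores, an estimate of E[SB_ad]."""
    return mean(article_selection_bias(theta, valence) for theta in thetas)

# Hypothetical valences for K = 3 topics: -1 = Democratic, +1 = Republican.
valence = [-1.0, 0.0, 1.0]

# Hypothetical topic proportions for two articles from a single source.
thetas = [
    [0.5, 0.5, 0.0],
    [0.25, 0.25, 0.5],
]

scores = [article_selection_bias(theta, valence) for theta in thetas]
print(scores)                                  # one score per article
print(source_selection_bias(thetas, valence))  # source-level average
```

Under this assumed form, a source whose articles concentrate on topics of one valence receives a score of that sign, even if every individual article is written without spin.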


Reframing Media Bias (continued)


Using LDA to Measure Selection and Presentation Bias

Overview

Corpus – Each of 13 top news website sources: CNN, NYT, Fox, LA Times, USA Today, WashPo, HuffPo, Chicago Sun Times, NY Daily News, ABC News, Wall St. Journal, NBC News.

Documents – Articles within each source over the course of several months.

Topics – Arbitrarily set to K = 30.

Step 1: Extract K = 30 topics from articles within each news source.


Graphical model and joint probability distribution for a K = 30 topic model of A articles in a single news source.
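For reference, the standard LDA joint distribution from the topic-modeling literature is reproduced below. The poster's own equation is not recoverable from the text, so the notation is an assumption: β_k is the word distribution of topic k, θ_a the topic proportions of article a, z_{a,n} the topic assignment of the n-th word w_{a,n}, and N_a the number of words in article a. Dirichlet hyperparameters are omitted.

```latex
\[
p(\beta_{1:K}, \theta_{1:A}, z, w)
  = \prod_{k=1}^{K} p(\beta_k)
    \prod_{a=1}^{A} \Big( p(\theta_a)
    \prod_{n=1}^{N_a} p(z_{a,n} \mid \theta_a)\,
                      p(w_{a,n} \mid \beta_{1:K}, z_{a,n}) \Big),
\qquad K = 30.
\]
```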

Step 2: Identify topics common to all outlets and topics unique to each outlet using KL divergence to measure similarity between topic distributions (Kim and Oh 2011).

Probability distribution over words for topic k in source d.

KL divergence measure of similarity between topic k in source d and topic k in source d’.
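A minimal sketch of the comparison in Step 2, assuming each topic is stored as a list of word probabilities over a vocabulary shared by both sources. It uses the plain, asymmetric KL divergence; whether Kim and Oh (2011) use this form or a symmetrized variant is not stated here. The function names, the smoothing constant `eps` and the toy distributions are hypothetical.

```python
import math

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two word distributions over the same vocabulary."""
    if len(p) != len(q):
        raise ValueError("distributions must share a vocabulary")
    # Terms with p_i = 0 contribute nothing; q_i is floored to avoid log(0).
    return sum(pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0)

def closest_topic(topic, other_topics):
    """Index and divergence of the topic in another source closest to `topic`."""
    divergences = [kl_divergence(topic, other) for other in other_topics]
    best = min(range(len(divergences)), key=divergences.__getitem__)
    return best, divergences[best]

# Hypothetical word distributions over a 4-word shared vocabulary.
topic_in_d = [0.7, 0.1, 0.1, 0.1]
topics_in_d_prime = [
    [0.1, 0.1, 0.1, 0.7],
    [0.6, 0.2, 0.1, 0.1],
]

best, divergence = closest_topic(topic_in_d, topics_in_d_prime)
print(best, divergence)
```

A topic could then be treated as common to two sources when its smallest divergence falls below some cutoff, and as unique otherwise. The choice of cutoff is a further assumption that the poster does not specify.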

Step 3: Estimate partisan valence of each topic using partisan Mechanical Turk workers.

Valence measured as topic “interest” among Republican and Democratic Turkers
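One simple way to turn such ratings into a valence score is sketched below. The poster does not give the formula, so the difference in mean interest between the two groups is an assumption, as are the function name, the 1 to 5 rating scale and the ratings themselves.

```python
from statistics import mean

def topic_valence(republican_ratings, democratic_ratings):
    """Mean Republican interest minus mean Democratic interest (assumed form).

    Positive values indicate a topic of more interest to Republican workers,
    negative values a topic of more interest to Democratic workers.
    """
    return mean(republican_ratings) - mean(democratic_ratings)

# Hypothetical 1-5 interest ratings for a single topic.
republican_ratings = [4, 5, 3]
democratic_ratings = [2, 1, 3]

print(topic_valence(republican_ratings, democratic_ratings))
```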

Step 4: Measure absolute and relative selection bias of all top media.

Absolute Selection Bias – Partisan valence of topics common to all media sources.

Relative Selection Bias – Partisan valence of topics unique to each source.

Preliminary Results: Labeled topics and top 10 topic words for 13 news sources collected over 1 month.
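The Step 4 distinction between absolute and relative selection bias can be sketched as follows, assuming topics have already been matched across sources and given valence scores. Weighting each topic's valence by the source's share of coverage is an assumption, since the poster does not state how valences are aggregated. The source names, topic labels and numbers are hypothetical.

```python
def weighted_valence(shares, topics, valence):
    """Coverage-weighted mean valence over `topics`; None if there is no coverage."""
    total = sum(shares[k] for k in topics)
    if total == 0:
        return None
    return sum(shares[k] * valence[k] for k in topics) / total

def selection_bias_summary(coverage, valence):
    """Absolute bias uses topics common to all sources, relative bias uses
    topics that appear in one source only."""
    common = set.intersection(*(set(shares) for shares in coverage.values()))
    summary = {}
    for source, shares in coverage.items():
        elsewhere = set().union(
            *(set(s) for name, s in coverage.items() if name != source)
        )
        unique = set(shares) - elsewhere
        summary[source] = {
            "absolute": weighted_valence(shares, common, valence),
            "relative": weighted_valence(shares, unique, valence),
        }
    return summary

# Hypothetical valences: -1 = Democratic, +1 = Republican.
valence = {"economy": 0.0, "elections": 0.5, "guns": 1.0, "climate": -1.0}

# Hypothetical share of each source's coverage devoted to each topic.
coverage = {
    "Source A": {"economy": 0.25, "elections": 0.25, "guns": 0.5},
    "Source B": {"economy": 0.5, "elections": 0.25, "climate": 0.25},
}

summary = selection_bias_summary(coverage, valence)
print(summary)
```

In this toy example the two sources differ only mildly on the shared topics, while their unique topics pull in opposite directions, which is the pattern relative selection bias is meant to capture.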

Estimating Media Bias Using Topic Models

