social filtering. computational journalism week 5
Post on 05-Dec-2015
Frontiers of Computational Journalism
Columbia Journalism School
Week 5: Social Filtering
October 9, 2015
[Diagram: users, filtering, and stories not covered]
Who the user chooses to follow = social filtering
Twitter follower network
“We have crawled the entire Twitter site and obtained 41.7 million user profiles, 1.47 billion social relations, 4,262 trending topics, and 106 million tweets. In its follower-following topology analysis we have found a non-power-law follower distribution, a short effective diameter, and low reciprocity, which all mark a deviation from known characteristics of human social networks”
- Kwak et al., What is Twitter, a Social Network or a News Media?
More “followings” than followers
Small avg distance between nodes
It’s a news network - hubs
It’s a news network
Small number of high-degree hubs
Different network structure than e.g. Facebook.
Different uses.
why?
- Zeynep Tufekci, What Happens to #Ferguson Affects Ferguson: Net Neutrality, Algorithmic Filtering and Ferguson
John McDermott, Why Facebook is for ice buckets, Twitter is for Ferguson
data from SocialReach, who works with many publishers
- Sunita, Why #Ferguson broke out on Twitter, not Facebook
Information flow on Facebook
Finding sources on social media
Classify Users
Classic machine learning problem. Classify each user as one of:
• journalist/blogger
• organization
• ordinary individual
First, need to encode as a vector / select features...
Features for user classifier
• # of followers / following
• # of posts, favorites
• percentage of posts that are RTs, @replies, links
• presence/absence of named entities
• topic distribution of tweets (IPTC top level topics)
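A minimal sketch of how the simpler features above could be encoded as a numeric vector. The `User` and `Tweet` records, their field names, and the crude string tests for @replies and links are assumptions made for illustration; the named-entity and IPTC topic features are left out because they need an entity extractor and a topic model.

```python
from dataclasses import dataclass


@dataclass
class Tweet:
    # Hypothetical record: only the fields this sketch needs.
    text: str
    is_retweet: bool = False


@dataclass
class User:
    # Hypothetical record, not the Twitter API's user object.
    followers: int
    following: int
    favorites: int
    tweets: list


def fraction(tweets, predicate):
    """Share of tweets satisfying predicate; 0.0 for a user with no tweets."""
    if not tweets:
        return 0.0
    return sum(1 for t in tweets if predicate(t)) / len(tweets)


def user_features(user):
    """Encode one user as a fixed-length list of floats."""
    tweets = user.tweets
    return [
        float(user.followers),
        float(user.following),
        float(len(tweets)),
        float(user.favorites),
        fraction(tweets, lambda t: t.is_retweet),
        # Assumption: a tweet that starts with "@" counts as an @reply.
        fraction(tweets, lambda t: t.text.startswith("@")),
        # Assumption: any http(s) URL in the text counts as a link.
        fraction(tweets, lambda t: "http://" in t.text or "https://" in t.text),
    ]


example = User(followers=1200, following=300, favorites=45, tweets=[
    Tweet("RT @ap: Breaking news", is_retweet=True),
    Tweet("@editor thanks for the tip"),
    Tweet("Our story: https://example.com/story"),
    Tweet("On my way to the press conference"),
])
vector = user_features(example)
print(vector)
```

Counts and percentages sit on very different scales, so in practice the columns would probably be normalized before being passed to a distance-based classifier.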
Digression: IPTC Media Topic Codes International standard hierarchical taxonomy, part of the NewsML markup system. Defined by Reuters, AP, NYTimes...
K-nearest neighbor classifier
Take K closest training points (in high dimensional feature space), choose majority label.
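The rule above can be sketched in a few lines of plain Python. The two-dimensional points and `k=3` are made-up toy values, not the features or settings used in the study, and Euclidean distance is an assumption (the slide does not name a metric).

```python
import math
from collections import Counter


def knn_classify(query, training, k=3):
    """Return the majority label among the k training points closest to query.

    training is a list of (feature_vector, label) pairs.
    """
    nearest = sorted(training, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    # On a tied vote, most_common keeps the label that was counted first,
    # i.e. the one belonging to the closer neighbor.
    return votes.most_common(1)[0][0]


# Toy training set: features are assumed to be already scaled to [0, 1].
training = [
    ((0.9, 0.8), "organization"),
    ((0.8, 0.9), "organization"),
    ((0.5, 0.4), "journalist/blogger"),
    ((0.6, 0.5), "journalist/blogger"),
    ((0.1, 0.1), "ordinary individual"),
    ((0.2, 0.1), "ordinary individual"),
]

print(knn_classify((0.55, 0.45), training))
print(knn_classify((0.15, 0.10), training))
```

Sorting every training point is fine at this scale; a few thousand labeled users would not need a spatial index.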
Creating the training data
1,850 random users
1,532 known organizations
1,490 known journalists and bloggers
Hired Mechanical Turk workers to apply labels. Each user labeled by two workers, discarded if disagreement.
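The two-worker agreement rule could look like the sketch below. The user IDs, the dictionary layout, and the function name `agreed_labels` are hypothetical; only the rule itself (keep a user when both workers give the same label, otherwise discard) comes from the slide.

```python
def agreed_labels(labels_by_user):
    """Keep only users whose two worker labels match.

    labels_by_user maps a user id to a (first_label, second_label) pair.
    """
    kept = {}
    for user_id, (first, second) in labels_by_user.items():
        if first == second:
            kept[user_id] = first
    return kept


# Made-up example labels, not data from the study.
raw = {
    "user_a": ("journalist/blogger", "journalist/blogger"),
    "user_b": ("organization", "ordinary individual"),  # disagreement: dropped
    "user_c": ("ordinary individual", "ordinary individual"),
}
training_labels = agreed_labels(raw)
print(training_labels)
```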
Classifier Accuracy
“Eyewitness” classifier Goal is to find individual tweets that are eyewitness reports. Started with LIWC (“linguistic inquiry and word count”) dictionary that classifies English words along 70 different dimensions, including emotion, cognition, time, health...
Word Aspects
Used “perception” category words plus “insight” and “certainty” words
Eyewitness tweet classifier
It’s an eyewitness tweet if it contains any of these special words! (or their stems)
High precision! Low recall.
• 89% of tweets classified as eyewitness actually were.
• But only 32% of eyewitness tweets detected.
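A rough sketch of a word-list classifier and of how precision and recall are computed for it. The stems in `EYEWITNESS_STEMS` are hypothetical stand-ins, not the LIWC word lists, the prefix match is a crude substitute for real stemming, and the five labeled tweets are invented for illustration.

```python
import re

# Hypothetical stand-ins for the dictionary words used in the study.
EYEWITNESS_STEMS = ("see", "saw", "hear", "heard", "feel", "felt", "watch")


def is_eyewitness(text):
    """Flag a tweet if any word begins with one of the stems (crude prefix match)."""
    words = re.findall(r"[a-z']+", text.lower())
    return any(word.startswith(stem) for word in words for stem in EYEWITNESS_STEMS)


def precision_recall(predicted, actual):
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    tp = sum(1 for p, a in zip(predicted, actual) if p and a)
    fp = sum(1 for p, a in zip(predicted, actual) if p and not a)
    fn = sum(1 for p, a in zip(predicted, actual) if not p and a)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Invented examples: (tweet text, whether it really is an eyewitness report).
labeled = [
    ("I just saw the fire from my window", True),
    ("We heard a loud bang downtown", True),
    ("Smoke everywhere on 5th street, stay away", True),      # missed: no listed word
    ("Police report three injured in downtown fire", False),
    ("Watch our live coverage of the fire", False),           # false alarm
]
predicted = [is_eyewitness(text) for text, _ in labeled]
actual = [truth for _, truth in labeled]
precision, recall = precision_recall(predicted, actual)
print(precision, recall)
```

The third and fifth tweets show the two failure modes: a genuine report with none of the listed words lowers recall, and a non-eyewitness tweet that happens to use one lowers precision.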
Other dimensions
• Tweet contains URL to photo or video (used table of domain names, e.g. flickr.com = photo)
• Posted from mobile device (from tweet metadata naming posting app)
• Geocode user’s stated location (this is painful and unreliable)
• Distribution of friends’ locations. (Friend = mutual following)
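The domain-table lookup for photo and video links might be sketched as follows. Only the `flickr.com = photo` entry comes from the slide; the `youtube.com` entry, the URL regex, and the function name `media_types` are assumptions for illustration, and shortened links (which would need to be expanded first) are not handled.

```python
import re
from urllib.parse import urlparse

# flickr.com = photo is from the slide; the video entry is a hypothetical addition.
MEDIA_DOMAINS = {
    "flickr.com": "photo",
    "youtube.com": "video",
}


def media_types(text):
    """Return the set of media kinds ("photo", "video") linked from a tweet."""
    found = set()
    for url in re.findall(r"https?://\S+", text):
        host = urlparse(url).netloc.lower()
        if host.startswith("www."):
            host = host[4:]
        kind = MEDIA_DOMAINS.get(host)
        if kind is not None:
            found.add(kind)
    return found


print(media_types("Look at this https://www.flickr.com/photos/abc"))
print(media_types("Full story at http://example.com/story"))
```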
Test user reactions
“This gives you context… you have the context for whether or not you think they’re reputable or whether or not they’re worth reaching out to.”
“It’s giving me a lot of context which is really useful when you’re trying to verify if someone is reputable or not.”
“I would tend to focus on the eyewitnesses and journalists/bloggers. Eventually I’d look at everyone else but I’d want to start my search with those two groups because they would normally provide me with the most information.”
Test user reactions Popular features:
Eyewitness filtering, user location, image/video filter
Unpopular features:
Entity extraction not helpful, no ability to filter by location and eyewitness status, focus on users instead of content
Social Software
Basic assumption: structure of software influences how groups use it.
Or: architecture influences behavior.
Three ways to influence behavior
Norms: culture, habits, etiquette, the user’s sense of what is “right” or “appropriate”
Laws: rules enforced by the administrator
Code: what it is actually possible to do
Design problem... What do we want the users to accomplish together? How do we encourage this? We can write the code, but the culture is a separate issue.