counts, comparisons, collocations, contestations: towards a dictionary of the future
TOWARDS A DICTIONARY OF THE FUTURE
COUNTS, COMPARISONS, COLLOCATIONS, CONTESTATIONS
SOME OTHER PLACES TO CHECK OUT
• The Google Ngram Viewer helps you understand trends across a bazillion books that Google has digitized. It’s an amazing resource.
• So is the Corpus of Historical American English (COHA): http://corpus.byu.edu/coha/
• And the Corpus of Contemporary American English (COCA): http://corpus.byu.edu/coca/
TAKING CARE WITH COUNTS
• The counts in the last two slides are too small to be anything more than interesting
• The next slide shows us tracking the collocates of future
• Collocates are the words that appear near a given word. One of the chief collocates of salt is pepper, for example
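To make the idea of a collocate concrete, here is a minimal sketch that counts the words appearing within a couple of tokens of a target word. The function name `collocates`, the window size, and the sample sentences are all assumptions made up for illustration; they are not from the slides or from any particular corpus tool.

```python
from collections import Counter
import re

def collocates(texts, target, window=2):
    """Count words that appear within `window` tokens of `target`."""
    counts = Counter()
    for text in texts:
        # Very rough tokenizer: lowercase runs of letters and apostrophes.
        tokens = re.findall(r"[a-z']+", text.lower())
        for i, tok in enumerate(tokens):
            if tok != target:
                continue
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    counts[tokens[j]] += 1
    return counts

# Toy sentences, invented for illustration only.
sample = [
    "Please pass the salt and pepper.",
    "Add salt and pepper to taste.",
    "The future of work is remote.",
]
salt_collocates = collocates(sample, "salt")
print(salt_collocates.most_common(3))
```

Raw counts like these also surface function words such as "and", which is one reason collocation work usually filters common words or uses an association measure rather than plain frequency.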
KEYWORDS
• What are the words that are most contested?
• How do they change?
• Who controls the future?
• Liberty vs. Freedom
JACK GRIEVE FINDING WOTY’S
• See also http://idibon.com/quantifying-word-year/
MEANING IS IN THE USE
• “For a large class of cases of the employment of the word ‘meaning’—though not for all—this word can be explained in this way: the meaning of a word is its use in the language” — Wittgenstein, Philosophical Investigations
MEANING IN THE USE
• Tumblr moms use over 4 times as many [emoji] and [emoji] as Twitter peeps
• What are the collocates?
• Blue: his, he, him
• Purple: she’s, she
• No pink heart option!
• See also http://www.washingtonpost.com/sf/opinions/2015/02/12/why-moms-love-emoji/ and http://idibon.com/emomji-emoji-new-moms-use/
CO-OCCURRENCES MATTER (MOVIE REVIEW RATINGS AND WORDS)
• The idea here is that if you’re writing a review and use the word wow, you’re being very positive or very negative. You don’t often say “Wow, I have a balanced and neutral opinion on this.”
• If you’re using however, however, you’re likely to be in the middle of your movie review rating or travel summary—not at the very positive/negative extremes.
• See also http://web.stanford.edu/~cgpotts/manuscripts/potts-schwarz-exclamatives08.pdf and http://web.stanford.edu/~cgpotts/papers/constant-davis-potts-schwarz-expressives.pdf
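One way to check a pattern like this is to compute, for each star rating, what share of all the words written at that rating are the word of interest. The sketch below does that on a handful of invented reviews. The function name `rating_profile` and the data are assumptions for illustration, not the method or data from the papers linked above.

```python
from collections import Counter
import re

def rating_profile(reviews, word):
    """Return {rating: share of tokens at that rating equal to `word`}."""
    hits = Counter()
    totals = Counter()
    for rating, text in reviews:
        tokens = re.findall(r"[a-z']+", text.lower())
        totals[rating] += len(tokens)
        hits[rating] += tokens.count(word)
    return {r: hits[r] / totals[r] for r in sorted(totals) if totals[r]}

# Toy (rating, text) pairs, invented for illustration only.
reviews = [
    (1, "wow this was awful"),
    (1, "wow what a mess"),
    (3, "good acting however the plot drags"),
    (3, "fine however forgettable"),
    (5, "wow just wow loved it"),
    (5, "a great film"),
]
wow_profile = rating_profile(reviews, "wow")
however_profile = rating_profile(reviews, "however")
print(wow_profile)
print(however_profile)
```

In this toy data, wow shows up only at the extremes and however only in the middle, which is the shape the slide describes. Real review data is much noisier.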
FOUR CASE STUDIES
• Wholesomeness: http://idibon.com/wholesome-branding-campaign-effectiveness/
• Entrepreneur: http://idibon.com/entrepreneurs-french-spanish-english/
• Because X: http://idibon.com/innovating-innovation/
• #BlackLivesMatter: http://idibon.com/blacklivesmatter-events-change-conversations/
WHOLESOMENESS
http://idibon.com/wholesome-branding-campaign-effectiveness/
DEEP HISTORY
• The first uses of wholesome tended to be about ‘virtuous teachings’.
• In Wycliffe’s Bible way back in 1382:
The..holsum wordis of oure Lord Jhesu Crist. (1 Timothy 6:3)
(Modern versions treat wordis as ‘words’, ‘teachings’, or ‘instructions’.)
HOW ABOUT IN SOCIAL MEDIA?
• You have to deal with spam (11% of data in this case; another 36% of data is “Wholesome Radio”, which is probably irrelevant)
• In 2014 tweets:
  • Food: 23% (but mostly not about Honey Maid)
  • Humans: 23% (and how they can/should live; church-related mentions are prominent)
  • Entertainment: 13% (movies, TV)
• Now let’s compare this to 2011 tweet uses:
  • Humans: 32%
  • Entertainment: 12%
  • Food: 9%
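The comparison between the two years can be spelled out as simple arithmetic. This sketch only uses the percentages given on the slide; the variable names are made up for illustration.

```python
# Category shares (percent of tweets) as reported on the slide.
shares_2011 = {"Humans": 32, "Entertainment": 12, "Food": 9}
shares_2014 = {"Food": 23, "Humans": 23, "Entertainment": 13}

# Change from 2011 to 2014, in percentage points.
change = {cat: shares_2014[cat] - shares_2011[cat] for cat in shares_2011}

for category, delta in sorted(change.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{category}: {delta:+d} percentage points")
```

Food is the category that moves most, which is the shift the slide is pointing at.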
MORE ON CONTESTED WORDS
• In the next slide, you’ll see an image from Monroe et al. (2008)
• This work starts from something basic that we know: Republicans and Democrats speak about the same issue differently.
• In the next slide, they are showing methods that can pull out how the parties speak about abortion when they take the floor.
• The words at the top are the Democratic party words; the ones at the bottom are the Republican party words.
• http://languagelog.ldc.upenn.edu/myl/Monroe.pdf
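One of the approaches discussed in Monroe et al. is a log-odds ratio smoothed with a Dirichlet prior and scaled by its estimated variance. Here is a minimal sketch of that idea. It assumes a flat prior (the `prior` value is arbitrary), and the counts are toy numbers invented for illustration, not data from the paper.

```python
import math
from collections import Counter

def log_odds_z(counts_a, counts_b, prior=0.01):
    """z-scored log-odds ratio of each word between two groups.

    Positive scores lean toward group A, negative toward group B.
    Assumes a flat Dirichlet prior of `prior` pseudo-counts per word.
    """
    vocab = set(counts_a) | set(counts_b)
    n_a = sum(counts_a.values())
    n_b = sum(counts_b.values())
    a0 = prior * len(vocab)  # total prior mass
    scores = {}
    for w in vocab:
        ya = counts_a.get(w, 0) + prior
        yb = counts_b.get(w, 0) + prior
        delta = (math.log(ya / (n_a + a0 - ya))
                 - math.log(yb / (n_b + a0 - yb)))
        variance = 1.0 / ya + 1.0 / yb  # approximate variance of delta
        scores[w] = delta / math.sqrt(variance)
    return scores

# Toy word counts for two hypothetical groups of speakers.
group_a = Counter({"the": 100, "liberty": 30, "freedom": 5})
group_b = Counter({"the": 100, "liberty": 4, "freedom": 28})
scores = log_odds_z(group_a, group_b)
for word, z in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{word:10s} {z:+.2f}")
```

The point of the variance scaling is that a very frequent word like "the" ends up near zero, while words that really separate the two groups land at the top and bottom of the list.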
ENTREPRENEUR
http://idibon.com/entrepreneurs-french-spanish-english/
ENTREPRENEUR IN ENGLISH, FRENCH, SPANISH
• Tycoon, mogul, industrialist
  • A flavor of ‘ill-gotten gains’
• Entrepreneur doesn’t seem to have this in English right now
• Collocates have to do with:
  • Advice
  • Success
  • Investors
  • Marketing
  • Social (media/services/topics/techniques)
  • Failure (especially fear-of)
  • Lots of named entities (SXSW, Dubai, #KSA, Twitter, Google, LinkedIn, Etsy)
• The people using entrepreneur identify themselves as:
  • Authors, speakers, writers, bloggers, strategists, (life) coaches, consultants, moms, wives, husbands, fathers, food-lovers, music-lovers
INTERCONNECTED AXES OF DIFFERENCE
• Genre (State of the Unions vs. Reddit comments)
• Time (1940s vs. the last ten years)
• Geography (hella vs. wicked)
• Traditional demographics (age, gender, education)
• Personal identity/style (nerd, goth, bro, mom)
INNOVATIONS AND THEIR COMMUNITIES
• Because X’ers disproportionately like:
  • YouTube
  • Tumblr
  • One Direction (especially Harry)
  • Justin Bieber
  • Ariana Grande
  • “bands”
  • pizza
  • sex
  • cats
  • books
• They are decidedly less likely to talk about:
  • software
  • basketball
  • NASCAR
  • business
  • words associated with African-American Vernacular English
Part of speech              Word counts ≥ 50
Noun (people, spoilers)     32.02%
Compressed clause (ilysm)   21.78%
Adjective (ugly, tired)     16.04%
Interjection (sweg, omg)    14.71%
Agreement (yeah, no)        12.97%
Pronoun (you, me)           2.45%
PART OF SPEECH TAGGERS ARE GOOD
• There’s even a pretty good one for Twitter POS
#BLACKLIVESMATTER
http://idibon.com/blacklivesmatter-events-change-conversations/
TOPIC MODELING
• In the previous sections, I’ve been noting what you can do when you have two or more comparison sets
  • How is wholesome used in time x vs. time y vs. time z?
  • What are the differences between English speakers talking about entrepreneurship vs. French speakers and Spanish speakers?
  • How are people who use the innovative Because X construction different than people who don’t use it?
• In this section, we talk about topic modeling, which is a way to automatically identify clusters within a data set, even if you don’t have a comparison set.
• We’ll use this to explore conversations around #blacklivesmatter, but we’ll also see how these conversations shift before/after a particular moment in time
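To give a feel for what "automatically identify clusters" means, here is a minimal sketch of one common topic-modeling approach: latent Dirichlet allocation fit with collapsed Gibbs sampling. This is an assumption about technique, since the slides do not say which algorithm or tool was used. The function name `lda_gibbs`, the hyperparameter values, and the toy documents are all made up for illustration.

```python
import random
from collections import Counter

def lda_gibbs(docs, n_topics=2, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Collapsed Gibbs sampling for LDA; returns per-topic word counts."""
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    doc_topic = [[0] * n_topics for _ in docs]        # tokens per topic, per doc
    topic_word = [Counter() for _ in range(n_topics)]  # word counts per topic
    topic_total = [0] * n_topics
    assign = []
    # Start by assigning every token to a random topic.
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            z = rng.randrange(n_topics)
            zs.append(z)
            doc_topic[d][z] += 1
            topic_word[z][w] += 1
            topic_total[z] += 1
        assign.append(zs)
    # Repeatedly resample each token's topic given all the other tokens.
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                z = assign[d][i]
                doc_topic[d][z] -= 1
                topic_word[z][w] -= 1
                topic_total[z] -= 1
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][w] + beta)
                    / (topic_total[k] + vocab_size * beta)
                    for k in range(n_topics)
                ]
                z = rng.choices(range(n_topics), weights=weights)[0]
                assign[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][w] += 1
                topic_total[z] += 1
    return topic_word

# Toy tokenized documents, invented for illustration only.
docs = [
    "goal match team score".split(),
    "team match goal coach team".split(),
    "guitar song band guitar".split(),
    "band song guitar band".split(),
]
topics = lda_gibbs(docs)
for k, counts in enumerate(topics):
    print(k, [w for w, c in counts.most_common(3) if c > 0])
```

Nothing in the input says which documents belong together. The sampler only sees which words co-occur in which documents, and that is what lets it work without a comparison set. On real data you would use an established library and far more text.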
UNKNOWN UNKNOWNS
• In general, topic modeling is a way of addressing the limits of our knowledge. If you’re asking a question about data, you probably know something about the data going in.
• But what we hear from people is that they are keenly aware that they don’t know what they don’t know.
• Topic modeling is meant to help with that.
• In the next slides, another use of topic modeling: identifying the themes of Martin Luther King Jr.’s major speeches and sermons
• Topic modeling Dr. King’s major speeches and sermons gets these topics
• Which change over time
• See also http://idibon.com/topic-detection-mlk/