Scientific Study of Retweets

Source: DanZarrella.com. The work presented here was published by Dan Zarrella in 2009.

Page 1: Scientific Study of Retweets

Dan Zarrella’s The Science of ReTweets

2009


Contents

Why ReTweets Matter
More Followers = More ReTweets?
Distribution of ReTweets per Follower
RTpF of Suggested Users
ReTweeting & Links
Link Occurrence in ReTweets
ReTweetability of URL Shorteners
Linguistics
Most ReTweetable Words & Phrases
Least ReTweetable Words
Average Syllables per Word
Readability Grade Levels
Word Occurrence & Novelty
Parts of Speech
Punctuation Occurrence
Punctuation Types
Psychology
RID Content Types
RID Attributes
LIWC Attributes
Timing
Time of Day
Day of Week
About the Author & Data


2009 © Dan Zarrella DanZarrella.com 3

“Ideas shape the course of history.”

-John Maynard Keynes


Why ReTweets Matter

I spend a lot of time working on ReTweets because I believe them to be one of the most important developments in modern communications, extending far beyond the Twittersphere.

Just as “The Matrix” was composed of computer code, the real world is made of infectious information. Your chair, your desk, the computer you’re reading this on, the food you’ll eat today, the money you’ll earn: they all began as ideas jumping from person to person. None of it would exist if the concept weren’t contagious.

Ideological epidemics have made and lost fortunes, they have saved countless lives and caused horrific wars, they have birthed and destroyed nations. Clearly the most powerful weapon known to man would be the ability to create powerful mental viruses at will. The very course of human history would be at your whim.

You don’t spread ideas just because they are “good”; you spread them because some other trigger or set of triggers has been pulled in your brain. And that trigger fires the biggest gun ever seen.

And yet, a reliable, repeatable method of crafting a contagious idea has not emerged.

The advent of the web changed how memes spread: it made them spread faster, it exposed them to more people, and it removed many of the constraints imposed by the limits of human memory. But there is one change that dwarfs them all: observability.

We can now compare millions of viral ideas to uncover the building blocks of contagiousness.

ReTweets may seem like a small idea, and they are in some ways. But that small idea is the first real window into how ideas spread from person to person. We can study the linguistic traits, the topical characteristics, the epidemiological dynamics, and the social network interactions that take place when a person spreads a meme.

Not only can this information help us create more contagious Tweets, but many of the lessons learned through ReTweets will be applicable to viral ideas in other mediums.

For the first time in human history, we can begin to gaze into the inner workings of the contagious idea. That most powerful force can now be put under our microscope and probed for its secrets.


Distribution of ReTweets per Follower

When I started thinking about how to get more ReTweets, my first thought was to get more followers; clearly, more followers would mean more ReTweets. To check this assumption, I looked at ReTweets per Followers (RTpF), the number of ReTweets per day divided by the number of followers.

The graph above shows the distribution of RTpF in the top 9000 most followed users in my database; I’ve graphed the actual distribution line in blue, with a 30-point moving average over it in black.

Here we see that while most users in my dataset had an RTpF of under 1%, some users showed much larger ratios, possibly indicating that there is a class of users who are more “ReTweetable” than others.

This means that while users with more followers will get more ReTweets, some users are able to get lots of ReTweets without lots of followers; their content must be more contagious.
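As a minimal sketch of the RTpF metric defined above (ReTweets per day divided by follower count), the calculation might look like the following. The account names and numbers are hypothetical and are not taken from the author's database.

```python
def rtpf(retweets_per_day: float, followers: int) -> float:
    """ReTweets per Follower: daily ReTweets divided by follower count."""
    if followers <= 0:
        raise ValueError("followers must be positive")
    return retweets_per_day / followers


# Hypothetical accounts: (name, ReTweets per day, followers)
accounts = [
    ("big_account", 50, 100_000),
    ("small_account", 20, 1_000),
]

ratios = {name: rtpf(rts, followers) for name, rts, followers in accounts}
for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2%}")
```

In this made-up example the smaller account gets fewer ReTweets in absolute terms but has a far higher RTpF, which is the kind of "more ReTweetable" user the distribution points to.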


RTpF of Suggested Users

[Figure: RTpF of non-suggested vs. suggested users]

Twitter maintains a list of users as suggested people for new users to follow; the people on this list gain tens of thousands of new followers every day and are among the most followed people on Twitter.

I looked at 200 suggested users and compared them to the 200 most followed users not on the list. Since many of the suggested users are the most followed people on Twitter, they had a much higher average number of followers. So I compared the two groups using the RTpF metric.

The result is clear: suggested users are far less ReTweetable. I think this is likely because many of the followers gained by users on the suggested list are new to Twitter and may be less ReTweet-savvy.


Link Occurrence in ReTweets

I began to study the content of Tweets to identify traits that are correlated with more contagious, or ReTweetable, content. The first such trait was the presence of a link.

I found that in a random sample of normal (non-ReTweet) Tweets, 18.96% contained a link, whereas roughly three times that share of ReTweets (56.69%) included a link.

This means that not only are ReTweets an accepted way to spread off-Twitter content, but the presence of a link may also increase a Tweet’s chances of being shared.

[Figure: share of Tweets containing a link, all Tweets vs. ReTweets]
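The link comparison can be sketched roughly as follows. This is an illustration only, not the author's PHP code: the URL pattern is a simplification (it misses bare links such as "bit.ly/abc"), and the sample Tweets are invented.

```python
import re

# Simplified pattern: matches http(s) links and "www." links only.
URL_PATTERN = re.compile(r"https?://\S+|\bwww\.\S+", re.IGNORECASE)


def link_share(tweets):
    """Return the fraction of tweets that contain at least one link."""
    if not tweets:
        return 0.0
    with_link = sum(1 for text in tweets if URL_PATTERN.search(text))
    return with_link / len(tweets)


# Hypothetical samples
random_sample = ["going to bed", "new post http://bit.ly/abc", "lol", "so tired"]
retweet_sample = [
    "RT @user: great read http://ow.ly/xyz",
    "RT @user: check www.example.com",
    "RT @user: haha",
]

print(f"random Tweets with a link: {link_share(random_sample):.2%}")
print(f"ReTweets with a link: {link_share(retweet_sample):.2%}")
```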


ReTweetability of URL Shorteners

I know that most ReTweets contain a link, but there are hundreds of different URL shortening services available to help you save space with that link. I analyzed my database of over 30 million ReTweets and compared them to over 2 million random Tweets to find which shorteners are the most (and least) ReTweetable.

I calculated how much more or less often each URL shortening service appeared in ReTweets than it did in normal Tweets and presented this value as a percentage. For instance, in my data 9.28% more ReTweets than random Tweets used bit.ly. I took into account the fact that ReTweets tend to contain more links than average Tweets and normalized the occurrence values.

I compared the percentage of occurrence for each shortener in random Tweets to the same shortener in ReTweets, to control for the popularity of services like bit.ly and tinyurl.com.

The short, post-Twitter shorteners bit.ly, ow.ly, and is.gd were all more ReTweetable than the older, longer tinyurl.com.

[Figure: relative ReTweetability of URL shorteners, on an axis from -6% to 10%. Shorteners shown: bit.ly, ow.ly, su.pr, is.gd, 301.to, cli.gs, twurl.nl, ff.im, tr.im, tumblr.com, blip.fm, twitpic.com, tinyurl.com]
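The exact normalization used for the chart is not spelled out, so the following is only one plausible reading: measure each shortener's share of all links within a sample (which removes the effect of ReTweets carrying more links overall), then take the percentage-point difference between the two samples. The function names and sample Tweets are assumptions made for illustration.

```python
import re
from collections import Counter

# Captures the host part of http(s) links, e.g. "bit.ly".
DOMAIN_PATTERN = re.compile(r"https?://([^/\s]+)", re.IGNORECASE)


def domain_shares(tweets):
    """Share of all links in the sample that point at each domain."""
    counts = Counter(
        domain.lower()
        for text in tweets
        for domain in DOMAIN_PATTERN.findall(text)
    )
    total = sum(counts.values())
    return {domain: n / total for domain, n in counts.items()} if total else {}


def retweetability(random_tweets, retweets):
    """Percentage-point difference in link share, ReTweets minus random."""
    base = domain_shares(random_tweets)
    rts = domain_shares(retweets)
    return {
        domain: 100 * (rts.get(domain, 0.0) - base.get(domain, 0.0))
        for domain in set(base) | set(rts)
    }


# Hypothetical samples
random_links = [
    "a http://tinyurl.com/1",
    "b http://tinyurl.com/2",
    "c http://bit.ly/3",
    "d http://bit.ly/4",
]
retweet_links = [
    "RT http://bit.ly/5",
    "RT http://bit.ly/6",
    "RT http://bit.ly/7",
    "RT http://tinyurl.com/8",
]

scores = retweetability(random_links, retweet_links)
for domain in sorted(scores, key=scores.get, reverse=True):
    print(f"{domain}: {scores[domain]:+.1f} points")
```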


Most ReTweetable Words & Phrases

I compared common words and phrases in random Tweets and ReTweets to find those words that occur in ReTweets more than they occur in normal Tweets.

The word “you,” while very common, seems to occur especially often in ReTweets, indicating that if you’re talking to “me,” I am more likely to ReTweet it.

It’s really not surprising that “Twitter” ranks high, but this is a good reminder that self-reference is always good for buzz in social media.

The words “please” and “please ReTweet” are very ReTweetable (“please rt” also ranked highly). It’s hard to overstate how important it is to ask for the ReTweet when you want it; calls to action work.

“New Blog Post” is the common prefix used when a person Tweets about, well, a new blog post on their site. That this ranks so highly tells us that Tweeting your posts is a very smart thing to do.

1. you
2. twitter
3. please
4. retweet
5. post
6. blog
7. social
8. free
9. media
10. help
11. please retweet
12. great
13. social media
14. 10
15. follow
16. how to
17. top
18. blog post
19. check out
20. new blog post
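A ranking like the one above could be approximated by comparing each word's relative frequency in the two samples. The sketch below handles single words only (not phrases), and the scoring rule (a simple frequency difference) is an assumption rather than the author's documented method. The sample Tweets are invented.

```python
import re
from collections import Counter


def word_frequencies(tweets):
    """Relative frequency of each lowercase word across a sample."""
    words = [w for text in tweets for w in re.findall(r"[a-z0-9']+", text.lower())]
    total = len(words)
    return {w: n / total for w, n in Counter(words).items()} if total else {}


def most_retweetable(random_tweets, retweets, top=3):
    """Rank words by how much more frequent they are in ReTweets."""
    base = word_frequencies(random_tweets)
    rts = word_frequencies(retweets)
    diffs = {w: rts[w] - base.get(w, 0.0) for w in rts}
    # Sort by descending difference, breaking ties alphabetically.
    return sorted(diffs, key=lambda w: (-diffs[w], w))[:top]


# Hypothetical samples
random_words = ["going to bed", "watching the game"]
retweet_words = ["please retweet you", "you please"]

print(most_retweetable(random_words, retweet_words, top=2))
```

Swapping the two arguments gives the opposite ranking, that is, the words that are over-represented in ordinary Tweets.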


Least ReTweetable Words

What about the words that are least likely to get you ReTweets?

There are a number of “-ing” verbs, including “going,” “watching” and “listening,” which reinforces my understanding that answers to the “What are you doing?” question don’t get very many ReTweets.

The presence of “sleep,” “bed,” “night,” and “tired” indicates that people often Tweet “goodnight” style messages, but generally don’t ReTweet them.

The relatively informal nature of many of the words on the list, including “lol,” “gonna,” and “hey,” shows that simple or slang conversation is not ReTweetable.

The lesson learned here is that if you’re trying to get more ReTweets, don’t just engage in idle chit-chat or Tweet about mundane activities.

1. game
2. going
3. haha
4. lol
5. but
6. watching
7. work
8. home
9. night
10. bed
11. well
12. sleep
13. gonna
14. hey
15. tomorrow
16. tired
17. some
18. back
19. bored
20. listening


Average Syllables per Word

I tested the assumption that simplicity is a vital component of ReTweets (as has been observed in other viral-content types) and I found that random Tweets have 1.58 syllables per word on average, while ReTweets have an average of 1.62 syllables per word. Longer, higher syllable-count words are typically more complex, indicating that ReTweets may be more complex than their less viral counterparts.

Be sure to notice the scale of the graph above: the difference is small, but it certainly challenged my hypothesis that ReTweets are less complex.

[Figure: average syllables per word, ReTweets vs. random Tweets; Y-axis from 1.56 to 1.63]
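How syllables were counted is not described, so the sketch below uses a common rough heuristic (counting groups of consecutive vowels, with a small correction for a silent trailing "e"). It is an assumption for illustration, and it will miscount some English words.

```python
import re


def count_syllables(word):
    """Rough syllable estimate: count groups of consecutive vowels."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    # Treat a trailing silent "e" (as in "time") as no extra syllable,
    # but keep the final syllable of "-le" words such as "simple".
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def average_syllables(tweets):
    """Mean estimated syllables per word across a sample of tweets."""
    words = [w for text in tweets for w in re.findall(r"[A-Za-z]+", text)]
    if not words:
        return 0.0
    return sum(count_syllables(w) for w in words) / len(words)


print(average_syllables(["simple tweet"]))
```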


Readability Grade Levels

I then compared two different types of reading grade level analysis metrics and they revealed that ReTweets, in general, are less “readable” and require a higher level of education to understand.

A Flesch-Kincaid test gave ReTweets a reading grade level of 6.47 years of education, while random Tweets only required 6.04 years. The similar SMOG test (Simple Measure of Gobbledygook) indicated that ReTweets required 6.13 years of schooling, with random Tweets only needing 5.88 years.

Again, take care to notice the scale of the Y-axis.

[Figure: reading grade level by Flesch-Kincaid and SMOG, ReTweets vs. random Tweets; Y-axis from 5.6 to 6.6]
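For reference, the two grade-level scores can be computed from simple counts using their standard published formulas. Whether the analysis here used exactly these variants is an assumption; the counts in the example are made up.

```python
import math


def flesch_kincaid_grade(words, sentences, syllables):
    """Flesch-Kincaid grade level from total word, sentence and syllable counts."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


def smog_grade(polysyllables, sentences):
    """SMOG grade from the count of words with three or more syllables."""
    return 1.0430 * math.sqrt(polysyllables * (30 / sentences)) + 3.1291


# Hypothetical counts: 100 words, 10 sentences, 160 syllables, 5 polysyllabic words
print(round(flesch_kincaid_grade(words=100, sentences=10, syllables=160), 2))
print(round(smog_grade(polysyllables=5, sentences=10), 2))
```

Both formulas rise with longer sentences and longer words, which is consistent with the slightly higher syllable count per word reported for ReTweets.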


Word Occurrence/Novelty

Another characteristic commonly found in viral content is novelty; that is, the “newness” of the ideas and information presented. I created a measure of novelty by counting how many other times each word in my sample sets occurred.

In the random Tweet sample, each word was found an average of 89.19 other times, while in the ReTweet sample each word was only found 16.37 other times.

This shows us that while simplicity may not be very important to ReTweetability, novelty certainly is.

[Figure: average number of other occurrences per word, ReTweets vs. random Tweets; Y-axis from 0 to 100]
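One way to read the novelty measure is: for every word in a sample, count how many other times that same word appears in the sample, then average. The sketch below averages over every word occurrence; whether the original averaged over occurrences or over distinct words is not stated, so treat that choice as an assumption. The sample Tweets are invented.

```python
import re
from collections import Counter


def average_other_occurrences(tweets):
    """Average, over every word occurrence, of how many other times that word appears."""
    words = [w for text in tweets for w in re.findall(r"[a-z0-9']+", text.lower())]
    if not words:
        return 0.0
    counts = Counter(words)
    return sum(counts[w] - 1 for w in words) / len(words)


# Hypothetical samples: repetitive chatter vs. more varied content
chatter = ["lol lol lol", "lol ok"]
varied = ["new study out", "free ebook today"]

print(average_other_occurrences(chatter))
print(average_other_occurrences(varied))
```

A lower value means the sample repeats itself less, which is the sense in which ReTweets score as more novel.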


Parts of Speech

Part-of-speech (POS) tagging is an analysis technique in which an algorithm labels each word in a piece of content with a specific part of speech: noun, verb, adjective, etc.

The graph above shows what percentage of the words in each sample was labeled with each part of speech. It lists only the most interesting parts from the much larger list of POS tags.

The most interesting finding is that ReTweets are heavy in nouns and third-person verbs, which suggests a subject-driven, headline-like style.

[Figure: percentage of words by part-of-speech tag, random Tweets vs. ReTweets; Y-axis from 0% to 10%. Tags shown: Noun, plural; Proper noun, singular; Verb, 3rd person singular present; Adjective, superlative; Adjective, comparative; Wh-adverb; Verb, past participle; Verb, base form; Verb, gerund or present participle; Verb, non-3rd person singular present; Verb, past tense; Noun, singular or mass; Adverb]
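Once words have been tagged, the percentages in a chart like this reduce to a simple tally. The sketch below assumes the tagging has already been done by some tagger and uses a tiny hand-tagged, hypothetical example with Penn Treebank-style tag codes (NNP, VBZ, JJ, NNS); the tagger actually used is not named in the text.

```python
from collections import Counter


def tag_percentages(tagged_words):
    """Percentage of words carrying each POS tag."""
    counts = Counter(tag for _, tag in tagged_words)
    total = sum(counts.values())
    return {tag: 100 * n / total for tag, n in counts.items()} if total else {}


# Hand-tagged, hypothetical example (word, tag)
tagged = [("Twitter", "NNP"), ("adds", "VBZ"), ("new", "JJ"), ("features", "NNS")]

for tag, pct in sorted(tag_percentages(tagged).items()):
    print(f"{tag}: {pct:.1f}%")
```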


Punctuation Occurrence

I compared a random sample of “normal” Tweets to a sample of ReTweets and found that 85.86% of Tweets contain some form of punctuation, and an overwhelming 97.55% of ReTweets do as well.

Of course, the prevailing ReTweet format includes a colon to better display the original Tweet, but even when ignoring this form of punctuation, ReTweets still contain more punctuation than non-ReTweets (93.42% to 83.78%).

[Figure: share of Tweets containing punctuation, ReTweets vs. random Tweets, with colons and without colons; Y-axis from 75% to 100%]
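The with-colon and without-colon comparison can be sketched as below. Which characters count as punctuation is an assumption: this version uses Python's `string.punctuation`, which also includes symbols such as "@" and "#", so real Tweets with mentions or hashtags would count as punctuated. The sample Tweets are invented.

```python
import string


def punctuation_share(tweets, ignore=""):
    """Fraction of tweets containing at least one punctuation mark not in `ignore`."""
    if not tweets:
        return 0.0
    marks = set(string.punctuation) - set(ignore)
    hits = sum(1 for text in tweets if any(ch in marks for ch in text))
    return hits / len(tweets)


# Hypothetical sample
sample = ["RT user: great post", "good night", "Wait, what?"]

print(f"with colons: {punctuation_share(sample):.2%}")
print(f"ignoring colons: {punctuation_share(sample, ignore=':'):.2%}")
```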


Punctuation Types

I then analyzed the frequency of specific types of punctuation and found that hyphens, periods and colons are the most ReTweetable punctuation, occurring far more commonly in ReTweets than in regular Tweets, while the rarest mark, the semicolon, is the only unReTweetable punctuation mark.

[Figure: occurrence of punctuation types, random Tweets vs. ReTweets; Y-axis from 0% to 90%. Types shown: Colon, Period, Exclamation Point, Comma, Hyphen, Ellipsis, Question Mark, Semicolon]


RID Content Types

I used two linguistic lexicons to analyze ReTweet content: RID and LIWC.

First is the more “Freudian” Regressive Imagery Dictionary (RID). This coding scheme is designed to measure the amount and type of three categories of content: primordial (the unconscious way you think, as in dreams); conceptual (logical and rational thought); and emotional.

The graph above shows that ReTweets contain less primordial and emotional content than random Tweets and more conceptual content.

[Figure: RID content types (Emotional, Primordial, Conceptual), ReTweets vs. random Tweets; Y-axis from 0% to 12%]


RID Attributes

Looking at specific RID attributes, I saw that social and instrumental (constructive words like build and create) behaviors are ReTweetable, while abstract thought and sensation-based words are not.

[Figure: RID attributes (Social Behavior, Glory, Instrumental Behavior, Sound, Vision, Abstraction), random Tweets vs. ReTweets; Y-axis from 0% to 2.5%]
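Lexicon-based analysis of this kind boils down to counting how many words in a sample fall into each category's word list. The sketch below uses a toy stand-in lexicon, not the real RID: "build" and "create" come from the text above, while the "sound" entries are hypothetical. Real lexicons are far larger and often match word stems rather than exact words.

```python
import re

# Toy stand-in for a lexicon such as RID. Only "build" and "create" are
# taken from the text; the "sound" entries are made up for illustration.
LEXICON = {
    "instrumental": {"build", "create"},
    "sound": {"hear", "loud"},
}


def category_shares(tweets, lexicon=LEXICON):
    """Percentage of all words in the sample that belong to each category."""
    words = [w for text in tweets for w in re.findall(r"[a-z']+", text.lower())]
    if not words:
        return {category: 0.0 for category in lexicon}
    return {
        category: 100 * sum(w in vocab for w in words) / len(words)
        for category, vocab in lexicon.items()
    }


# Hypothetical sample (8 words in total)
sample_tweets = ["how to build a better blog", "create it"]
print(category_shares(sample_tweets))
```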


LIWC Attributes

The next analysis I performed used LIWC (pronounced “Luke”). This is a lexicon similar to RID, but based on more thoroughly reviewed and accepted research and refined over 15 years. LIWC measures the cognitive and emotional properties of people based on the words they use.

LIWC analysis shows that Tweets about work, religion, money and media/celebrities are more ReTweetable than Tweets about negative emotions, sensations, swear words and self-reference.

[Figure: LIWC attributes, random Tweets vs. ReTweets; Y-axis from 0% to 3.5%. Attributes shown: Occupation, We, Media, Religion, Money, Insight, Negative Emotions, Swears, Senses, Tentative, Self-Reference]


Time of Day (EST)

I compared the volume of ReTweeting that occurs during the day to the volume of regular Tweets to find that ReTweeting is much more diurnal. While overall Tweet volume peaks during business hours and evening, ReTweeting occurs much more frequently between 3 PM and midnight.

If you want to get ReTweets, it makes sense to post your content during these hours.

[Figure: share of daily activity by hour of day (EST), 12 AM to 11 PM, random Tweets vs. ReTweets; Y-axis from 0% to 6%]
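An hourly breakdown like this can be sketched by converting each timestamp to Eastern time and tallying by hour. The sketch assumes timezone-aware timestamps and uses a fixed UTC-5 offset, which ignores daylight saving time; both are simplifications, and the timestamps are invented.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone

# Fixed UTC-5 offset labeled EST; daylight saving time is ignored here.
EST = timezone(timedelta(hours=-5), "EST")


def hourly_distribution(timestamps):
    """Percentage of posts falling in each EST hour (0-23)."""
    hours = Counter(ts.astimezone(EST).hour for ts in timestamps)
    total = sum(hours.values())
    return {h: 100 * hours.get(h, 0) / total for h in range(24)} if total else {}


# Hypothetical ReTweet timestamps in UTC
retweet_times = [
    datetime(2009, 6, 1, 20, 0, tzinfo=timezone.utc),
    datetime(2009, 6, 1, 20, 45, tzinfo=timezone.utc),
    datetime(2009, 6, 1, 21, 30, tzinfo=timezone.utc),
    datetime(2009, 6, 2, 3, 0, tzinfo=timezone.utc),
]

distribution = hourly_distribution(retweet_times)
for hour, pct in distribution.items():
    if pct:
        print(f"{hour:02d}:00 EST: {pct:.1f}%")
```

Running the same function on a random-Tweet sample and comparing the two distributions hour by hour gives the kind of contrast the chart describes.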


Day of Week

[Figure: share of weekly activity by day of week (Sun to Sat), random Tweets vs. ReTweets; Y-axis from 11% to 19%]

I also compared the volume of activity that occurs on various days of the week. Overall Tweeting activity peaks during the business week, as does ReTweeting activity.

Monday and Friday are both ReTweetable days, in that each accounts for a higher percentage of the week’s ReTweeting activity than of its regular Tweeting. Friday, however, shows the highest volume of ReTweeting, and Thursday the highest volume of standard Tweeting.


About the Author

Dan Zarrella is an award-winning social, search, and viral marketing scientist and author of the upcoming O’Reilly Media book “The Social Media Marketing Book.”

Dan has written extensively about the science of viral marketing, memetics and social communications on his own blog and for a variety of popular industry blogs, including Mashable, CopyBlogger, ReadWriteWeb, Plagiarism Today, ProBlogger, Social Desire, CenterNetworks, Nowsourcing, and SEOScoop.

He has been featured in The Twitter Book, The Financial Times, NYPost, The Boston Globe, Forbes, Wired, The Wall Street Journal, Mashable and TechCrunch. He was recently awarded Shorty and Semmy awards for social media & viral marketing.

A frequent guest speaker and panelist, Dan has spoken at PubCon, Search Engine Strategies, Convergence ’09, 140 The Twitter Conference, WordCamp Mid Atlantic, Social Media Camp, Inbound Marketing Bootcamp, and The Texas Domains and Developers Conference. He currently works as an inbound marketing manager at HubSpot.

About the Data

Over the course of 9 months, beginning in December of 2008, I’ve collected over 40 million ReTweets, including Tweets that contain variations of “RT,” “ReTweet,” and “Via.” I’ve also collected a random sampling of over 10 million “regular” Tweets that may or may not be ReTweets. The Tweets come from Twitter’s search API and its streaming API, for both of which I have whitelisted access. I use these 2 data sources and my own PHP scripts to analyze and compare the characteristics of both.