Diving into Twitter data on consumer electronic brands

Posted on 26-Jan-2015


DESCRIPTION

Which consumer electronic brands get tweeted about most? Which brands have more positive or negative sentiment? To find out, 15.3 GB of tweets were downloaded between 13 and 25 May using Python and then analysed in R.

TRANSCRIPT

Diving into Twitter data on consumer electronic brands

Which brands get tweeted about most? Is it mainly positive or negative?

15.3 GB of JSON data was downloaded from Twitter’s Streaming API between 13 and 25 May using Python
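The slides do not show the download script itself. Below is a minimal sketch of the storage step only. It assumes the connection (authentication, the keyword filter, reconnects) is handled elsewhere and yields the response body line by line. The Streaming API sends one JSON-encoded tweet per line, plus blank keep-alive lines, so each non-empty line that parses as JSON is appended to a newline-delimited file. The function name `save_stream` is hypothetical.

```python
import io
import json
from typing import Iterable, TextIO


def save_stream(lines: Iterable[str], out: TextIO) -> int:
    """Append one raw tweet per line to `out`, skipping keep-alive lines.

    Returns the number of tweets written. Lines that are not valid JSON
    (e.g. a message cut off by a dropped connection) are skipped.
    """
    written = 0
    for line in lines:
        line = line.strip()
        if not line:  # blank keep-alive newline sent by the stream
            continue
        try:
            json.loads(line)  # validate only; the raw text is stored as-is
        except json.JSONDecodeError:
            continue
        out.write(line + "\n")
        written += 1
    return written


# Usage: an in-memory list stands in for the live connection.
fake_stream = ['{"id": 1, "text": "new phone!"}', "", "\r\n", '{"id": 2, "text": "meh"}']
buffer = io.StringIO()
print(save_stream(fake_stream, buffer))  # 2
```

Keeping the raw lines untouched at this stage means nothing is lost if the later field selection changes.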

Before processing, tweets were in raw JSON format, which includes:

Time created

Tweet text/status

Username

Tweet location (if available)

No. of followers

No. of people followed

No. of statuses

Language

The data should be optimized, as only a fraction of it is used for analysis; optimization improves model performance and saves cost and time

The same tweet we saw previously

By optimizing the data, 15.3 GB of JSON was converted to 757 MB of CSV (about 5% of the original size)

After processing, only some fields were retained and converted to CSV format
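Below is a sketch of the JSON-to-CSV step. It assumes Twitter’s v1.1 tweet object field names: `created_at`, `text` and `lang` at the top level, and `screen_name`, `followers_count`, `friends_count` and `statuses_count` under `user`. The slides do not show which columns the original script kept. The column set here, and the use of `place.full_name` as the tweet location, are assumptions. Dropping the nested metadata is what shrinks each record so much.

```python
import csv
import io
import json
from typing import Iterable, TextIO

# Assumed column set, mirroring the fields highlighted in the raw JSON.
FIELDS = ["created_at", "text", "screen_name", "location",
          "followers_count", "friends_count", "statuses_count", "lang"]


def flatten(tweet: dict) -> dict:
    """Pick the columns of interest out of one nested tweet object."""
    user = tweet.get("user") or {}
    place = tweet.get("place") or {}  # "place" may be null
    return {
        "created_at": tweet.get("created_at"),
        "text": tweet.get("text"),
        "screen_name": user.get("screen_name"),
        "location": place.get("full_name"),
        "followers_count": user.get("followers_count"),
        "friends_count": user.get("friends_count"),
        "statuses_count": user.get("statuses_count"),
        "lang": tweet.get("lang"),
    }


def json_lines_to_csv(lines: Iterable[str], out: TextIO) -> int:
    """Convert newline-delimited tweet JSON to CSV; returns rows written."""
    writer = csv.DictWriter(out, fieldnames=FIELDS)
    writer.writeheader()
    rows = 0
    for line in lines:
        if not line.strip():
            continue
        tweet = json.loads(line)
        if "text" not in tweet:  # skip non-tweet messages such as deletion notices
            continue
        writer.writerow(flatten(tweet))
        rows += 1
    return rows


# Usage with one made-up tweet and one deletion notice.
raw = [
    json.dumps({"created_at": "Tue May 13 08:00:00 +0000 2014", "text": "Love my Xperia",
                "lang": "en", "place": None,
                "user": {"screen_name": "example_user", "followers_count": 250,
                         "friends_count": 250, "statuses_count": 1000}}),
    json.dumps({"delete": {"status": {"id": 1}}}),
]
out = io.StringIO()
n = json_lines_to_csv(raw, out)  # 1 row: the deletion notice is skipped
```

When writing to a real file, open it with `newline=""`, as the `csv` module documentation recommends.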

Example tweets tagged for brand with positive, negative, and mixed sentiment

The list of words for sentiment analysis is adapted from the Harvard General Inquirer dictionaries (source: http://www.wjh.harvard.edu/~inquirer/homecat.htm, downloaded on 28 May 2014)

Tweets are then tagged for brand and sentiment in R
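The tagging was done in R. The Python sketch below shows one common way to apply such word lists. It assumes each tweet is lower-cased, split into words, and labelled by which lists it hits. The four-word lists are illustrative stand-ins, not the adapted General Inquirer lists, and `tag_sentiment` is a hypothetical name.

```python
import re

# Illustrative stand-ins for the positive and negative word lists.
POSITIVE = {"love", "great", "good", "awesome"}
NEGATIVE = {"hate", "bad", "broken", "poor"}

TOKEN = re.compile(r"[a-z']+")


def tag_sentiment(text: str) -> str:
    """Label a tweet positive, negative, mixed, or neutral by word-list hits."""
    words = set(TOKEN.findall(text.lower()))
    has_pos = bool(words & POSITIVE)
    has_neg = bool(words & NEGATIVE)
    if has_pos and has_neg:
        return "mixed"
    if has_pos:
        return "positive"
    if has_neg:
        return "negative"
    return "neutral"


samples = ["I love my new Xperia", "Screen broken again, hate it",
           "Love the camera, hate the battery", "Launch event today"]
labels = [tag_sentiment(t) for t in samples]
# ['positive', 'negative', 'mixed', 'neutral']
```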

Initially, tweets were collected based on 17 keywords

Samsung

S4

Xperia

HTC

Huawei

BlackBerry

Apple

S5

Sony

Nokia

Note 3

Lumia

q5

iPhone

q10

z10

Motorola

“Apple” and “iPhone” accounted for 87% of tweet volume

They were removed from the keywords during actual data collection to focus on other brands, save space, and reduce bandwidth usage

A trial was conducted with 16 keywords on 11 May, 8–9 am

1 GB of JSON data was collected in an hour

During the one-hour trial, “Apple” and “iPhone” had an 87% share of tweets
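A share like the 87% above can be computed by counting the tweets that mention any keyword in a group. Below is a minimal sketch. It assumes case-insensitive, whole-word matching; the deck does not say which matching rule was used. The sample texts are made up.

```python
import re
from typing import Iterable, Sequence


def share_mentioning(texts: Sequence[str], keywords: Iterable[str]) -> float:
    """Fraction of tweets mentioning at least one keyword as a whole word."""
    if not texts:
        return 0.0
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(k) for k in keywords) + r")\b",
        re.IGNORECASE,
    )
    hits = sum(1 for text in texts if pattern.search(text))
    return hits / len(texts)


texts = ["New iPhone case", "Apple store queue", "Galaxy S5 review", "Nokia Lumia deal"]
share = share_mentioning(texts, ["Apple", "iPhone"])  # 0.5
```

Whole-word matching keeps words like "pineapple" from counting as a mention of Apple.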

Samsung

Sony

Nokia

HTC

Huawei

BlackBerry

Motorola

Tweets containing these seven keywords were collected from 13 – 25 May

4% of tweets mentioned 2 or more brands; they were excluded from analysis

8% of tweets had mixed sentiment (i.e., both positive and negative sentiment); they were excluded from analysis

92% of tweets remained, each mentioning only 1 brand with either “positive”, “negative”, or “neutral” sentiment
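The exclusion rules can be sketched as follows. The sketch assumes each tweet already has a sentiment label, and that a brand counts as mentioned when its name appears as a whole word, case-insensitively. The helper names are hypothetical. Tweets that match no brand name are also dropped here, which is an assumption about how they were handled.

```python
import re
from typing import Iterable, List, Tuple

BRANDS = ["Samsung", "Sony", "Nokia", "HTC", "Huawei", "BlackBerry", "Motorola"]
BRAND_PATTERNS = {b: re.compile(rf"\b{re.escape(b)}\b", re.IGNORECASE) for b in BRANDS}


def brands_in(text: str) -> List[str]:
    """Brands whose name appears as a whole word in the tweet."""
    return [b for b, p in BRAND_PATTERNS.items() if p.search(text)]


def keep_for_analysis(tagged: Iterable[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Apply the exclusion rules to (text, sentiment) pairs.

    Tweets mentioning two or more brands (or none), or with mixed sentiment,
    are dropped; the rest are returned as (brand, sentiment) pairs.
    """
    kept = []
    for text, sentiment in tagged:
        found = brands_in(text)
        if len(found) != 1 or sentiment == "mixed":
            continue
        kept.append((found[0], sentiment))
    return kept


tagged = [("Samsung vs Sony, which is better?", "neutral"),
          ("Love my HTC", "positive"),
          ("Nokia camera good, battery bad", "mixed"),
          ("Huawei phone broken", "negative")]
kept = keep_for_analysis(tagged)  # [('HTC', 'positive'), ('Huawei', 'negative')]
```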

3,681,942 tweets were collected

After processing, 3,234,678 tweets remained for analysis
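As a check, the retained share can be computed directly from the two counts. It comes to about 87.9%. That fits the 4% and 92% figures only if the 8% mixed-sentiment share is measured among single-brand tweets rather than all tweets (96% × 92% ≈ 88.3%); that reading is an assumption.

```python
collected = 3_681_942
remaining = 3_234_678

retained = remaining / collected          # about 0.8785
excluded = collected - remaining          # tweets dropped by the exclusion rules

# If the 8% mixed-sentiment share is measured among single-brand tweets
# (an assumption), the two exclusions compound rather than add:
compounded = (1 - 0.04) * (1 - 0.08)      # about 0.8832

print(f"{retained:.1%} retained, {excluded:,} excluded")
# 87.9% retained, 447,264 excluded
```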

Samsung is the clear leader in Twitter buzz, followed by Sony and Nokia

Together, they make up 75% of Twitter buzz

However, Samsung and Sony have wider product offerings than the other brands, which mainly focus on phones

Also, Huawei’s users may mainly be on Weibo, Renren, etc.

Most brands have a roughly 1:1 ratio of positive to negative tweets

Samsung is the exception, with a ratio of roughly 3:2
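These ratios are simple count quotients. A 3:2 ratio means 1.5 positive tweets per negative tweet, and 1:1 means 1.0. The sketch below works over (brand, sentiment) pairs, with made-up counts in the usage example. Neutral tweets are ignored, which is an assumption about how the ratio was defined.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def pos_neg_ratio(pairs: Iterable[Tuple[str, str]]) -> Dict[str, float]:
    """Positive-to-negative ratio per brand; neutral tweets are ignored."""
    counts = Counter(pairs)  # keys are (brand, sentiment) tuples
    ratios = {}
    for brand in {brand for brand, _ in counts}:
        negative = counts[(brand, "negative")]
        if negative:  # skip brands with no negative tweets (ratio undefined)
            ratios[brand] = counts[(brand, "positive")] / negative
    return ratios


pairs = ([("Samsung", "positive")] * 3 + [("Samsung", "negative")] * 2
         + [("Nokia", "positive"), ("Nokia", "negative"), ("Nokia", "neutral")])
ratios = pos_neg_ratio(pairs)  # {'Samsung': 1.5, 'Nokia': 1.0} (key order may vary)
```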

The dip in the chart is due to connectivity issues

Brands’ share of tweets is roughly consistent over time

Spikes in tweet volume coincide with product launches
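Share over time can be tracked by bucketing tweets by day. The sketch assumes (created_at, brand) pairs, where `created_at` uses the v1.1 timestamp format, for example `Tue May 13 08:00:00 +0000 2014`. Days are UTC dates, and the sample rows are made up.

```python
from collections import Counter, defaultdict
from datetime import datetime
from typing import Dict, Iterable, Tuple

# Format of the `created_at` field in Twitter's v1.1 tweet JSON.
CREATED_AT = "%a %b %d %H:%M:%S %z %Y"


def daily_share(rows: Iterable[Tuple[str, str]]) -> Dict[str, Dict[str, float]]:
    """Map each UTC date to each brand's share of that day's tweets."""
    per_day = defaultdict(Counter)
    for created_at, brand in rows:
        day = datetime.strptime(created_at, CREATED_AT).date().isoformat()
        per_day[day][brand] += 1
    shares = {}
    for day in sorted(per_day):
        counts = per_day[day]
        total = sum(counts.values())
        shares[day] = {brand: n / total for brand, n in counts.items()}
    return shares


rows = [("Tue May 13 08:00:00 +0000 2014", "Samsung"),
        ("Tue May 13 09:30:00 +0000 2014", "Sony"),
        ("Wed May 14 10:00:00 +0000 2014", "Samsung")]
shares = daily_share(rows)
# {'2014-05-13': {'Samsung': 0.5, 'Sony': 0.5}, '2014-05-14': {'Samsung': 1.0}}
```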


Users who tweet about BlackBerry tend to be better connected (i.e., higher median numbers of followers and of people followed)*

* Excluding outliers

Across brands, there is not much difference in user connectedness

The median user has around 250 followers and also follows around 250 people
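The slides do not say how outliers were excluded. One common convention, used by box plots, treats points more than 1.5 × IQR beyond the quartiles (Tukey’s fences) as outliers. The sketch below works under that assumption, with made-up follower counts. `statistics.quantiles` needs Python 3.8+.

```python
import statistics
from typing import List, Sequence


def drop_outliers(values: Sequence[float], k: float = 1.5) -> List[float]:
    """Keep values within [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if low <= v <= high]


followers = [120, 180, 240, 250, 260, 310, 400, 5_000_000]  # made-up sample
trimmed = drop_outliers(followers)
print(statistics.median(followers), statistics.median(trimmed))  # 255.0 250
```

The median barely moves when the extreme account is dropped. That robustness is why medians suit such skewed counts.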

Users in the 50th–75th percentile among those who tweet about Sony, HTC, and Motorola have very high numbers of all-time tweets (spam bots, perhaps?)*

While Nokia is 3rd in Twitter buzz share (14%), users who tweet about Nokia have the fewest all-time tweets

This suggests that Nokia tweets are likely to come from real users rather than bots (or perhaps from less active bots)

* Excluding outliers

However, there is a large difference between users’ all-time tweets

12,833,979 followers

11,796,709 followers

CNN’s tweet on Obama’s BlackBerry was “seen” by the most followers

1,753,696 tweets

1,730,006 tweets

A bot that retweets about farts has the most all-time tweets
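Callouts like these can be found by ranking distinct users on one column of the processed data. The sketch assumes CSV rows with `screen_name`, `followers_count` and `statuses_count` columns, holding string values as the `csv` module returns them. The rows are made up, and `top_users` is a hypothetical name.

```python
import heapq
from typing import Dict, List, Sequence, Tuple


def top_users(rows: Sequence[Dict], field: str, k: int = 2) -> List[Tuple[str, int]]:
    """Return the k distinct users with the largest value of `field`.

    A user who tweets many times appears in many rows, so each user's
    largest value is kept before ranking.
    """
    best: Dict[str, int] = {}
    for row in rows:
        name, value = row["screen_name"], int(row[field])
        best[name] = max(value, best.get(name, value))
    return heapq.nlargest(k, best.items(), key=lambda item: item[1])


rows = [{"screen_name": "news_a", "followers_count": "900", "statuses_count": "50"},
        {"screen_name": "bot_b", "followers_count": "10", "statuses_count": "70000"},
        {"screen_name": "bot_b", "followers_count": "11", "statuses_count": "70005"},
        {"screen_name": "user_c", "followers_count": "250", "statuses_count": "1200"}]
most_active = top_users(rows, "statuses_count")  # [('bot_b', 70005), ('user_c', 1200)]
```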


Initially, BlackBerry tweets showed 100% negative sentiment

The culprit was the negative word “lack”, which is contained in “BlackBerry”; it was removed from the word list

However, removing it reduced negative sentiment for other brands by 2–3%

An interesting error led to BlackBerry having 100% negative sentiment
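The deck does not say how words were matched. Still, the symptom (every BlackBerry tweet counted as negative) is what substring matching would produce, since “lack” sits inside “BlackBerry”. The sketch below contrasts substring matching with whole-word matching. Whole-word matching could keep “lack” in the list without tagging every BlackBerry tweet. It is an illustrative alternative to the fix described above, not the author’s code, and the tweet text is made up.

```python
import re

NEGATIVE = {"lack", "broken"}  # illustrative subset of a negative word list
tweet = "Just got the new BlackBerry Q10, loving the keyboard"

# Substring matching: "lack" is found inside "BlackBerry".
substring_hits = [w for w in NEGATIVE if w in tweet.lower()]

# Whole-word matching: only complete words are compared.
words = set(re.findall(r"[a-z']+", tweet.lower()))
word_hits = [w for w in NEGATIVE if w in words]

print(substring_hits, word_hits)  # ['lack'] []
```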

Track brands’ managed Twitter accounts and conversations to measure engagement: which brands have better engagement with users, and why?

Track the general message of tweets: are a brand’s tweets mainly about sales, reviews, complaints, or news?

Network analysis to identify users with high centrality and influence: which users have high influence, and what are they tweeting about my brand?

Geospatial analysis of tweets: are there differences in brand buzz, sentiment, and engagement across regions?

Where do we go from here?

Code available on GitHub: https://github.com/eugeneyan/Twitter-SMA

Python script to download tweets in JSON format

Python scripts to convert tweets from JSON to CSV (with & without regular expression filtering)

R script and sentiment analysis list of words

R script and sentiment analysis list of words to reproduce the BlackBerry error
