DIY basic Facebook data mining
TRANSCRIPT
![Page 1: DIY basic Facebook data mining](https://reader034.vdocuments.net/reader034/viewer/2022051709/53f460858d7f72a40e8b5386/html5/thumbnails/1.jpg)
Pleasures of basic Facebook data shoveling
Jan Fait, STEM/MARK
Guest Lecture at Charles University, Prague, 4.12.2013
![Page 2: DIY basic Facebook data mining](https://reader034.vdocuments.net/reader034/viewer/2022051709/53f460858d7f72a40e8b5386/html5/thumbnails/2.jpg)
Today we are going to talk about:
1. Why: a tiny philosophical corner
2. How: no programming, just copy-pasting
![Page 3: DIY basic Facebook data mining](https://reader034.vdocuments.net/reader034/viewer/2022051709/53f460858d7f72a40e8b5386/html5/thumbnails/3.jpg)
The Boring part
Why are we doing this?
What‘s in it for you?
What are other ways to do this?
The Fun part
How is it done?
Why would I even try to mine FB data myself?
![Page 4: DIY basic Facebook data mining](https://reader034.vdocuments.net/reader034/viewer/2022051709/53f460858d7f72a40e8b5386/html5/thumbnails/4.jpg)
What is a Facebook like worth for your business?
![Page 5: DIY basic Facebook data mining](https://reader034.vdocuments.net/reader034/viewer/2022051709/53f460858d7f72a40e8b5386/html5/thumbnails/5.jpg)
Here's why. Sample questions:
In what ways are my fans like my other customers?
What do I actually know about my fans and followers on top of their age?
Can I group my followers into segments?
Can I target my followers based on what they (are) like?
Which ones are creating the most activity?
What on earth are all the other ones doing?
How similar or different is my competitor's fanbase?
![Page 6: DIY basic Facebook data mining](https://reader034.vdocuments.net/reader034/viewer/2022051709/53f460858d7f72a40e8b5386/html5/thumbnails/6.jpg)
Built-in insights are fine for fanpage managers, but not for research
Who could have guessed..
![Page 7: DIY basic Facebook data mining](https://reader034.vdocuments.net/reader034/viewer/2022051709/53f460858d7f72a40e8b5386/html5/thumbnails/7.jpg)
Limitations of FB research?
External validity: Research in social media tells you little about life outside social media (Facebook self vs. real self).
Sampling: Only some profiles are public. Is there enough data to make claims about my fanbase?
Organic environment: Network engineers keep changing stuff, so you are in constant need of adjustment.
![Page 8: DIY basic Facebook data mining](https://reader034.vdocuments.net/reader034/viewer/2022051709/53f460858d7f72a40e8b5386/html5/thumbnails/8.jpg)
OK, but there are other ways..
Bambillion !
Always posted by a lady in her 40s
![Page 9: DIY basic Facebook data mining](https://reader034.vdocuments.net/reader034/viewer/2022051709/53f460858d7f72a40e8b5386/html5/thumbnails/9.jpg)
Indeed, there are ways:
Ask professionals and pay them accordingly (see below)
Set up a social media login or create an app (a rather good investment)
Use ready-made tools and solutions (and pay for the useful ones)
DO IT YOURSELF – PARTISAN STYLE
![Page 10: DIY basic Facebook data mining](https://reader034.vdocuments.net/reader034/viewer/2022051709/53f460858d7f72a40e8b5386/html5/thumbnails/10.jpg)
What does a brand manager want from a customer?
Come
Buy
Recommend
Return
Buy more
![Page 11: DIY basic Facebook data mining](https://reader034.vdocuments.net/reader034/viewer/2022051709/53f460858d7f72a40e8b5386/html5/thumbnails/11.jpg)
What does a fanpage manager want from a fan?
Come
Engage
(Share)
Return
Engage more
![Page 12: DIY basic Facebook data mining](https://reader034.vdocuments.net/reader034/viewer/2022051709/53f460858d7f72a40e8b5386/html5/thumbnails/12.jpg)
How is it done?
![Page 13: DIY basic Facebook data mining](https://reader034.vdocuments.net/reader034/viewer/2022051709/53f460858d7f72a40e8b5386/html5/thumbnails/13.jpg)
Obstacles ahead
Facebook developers are smart, so the road is a bit thorny
Good tools are usually not free
Open source tools are usually not as good
It's mostly fine legally
![Page 14: DIY basic Facebook data mining](https://reader034.vdocuments.net/reader034/viewer/2022051709/53f460858d7f72a40e8b5386/html5/thumbnails/14.jpg)
… but I am not a technical type.
a) Find someone who is
b) Break it down into little steps
c) Your chance to stand out
![Page 15: DIY basic Facebook data mining](https://reader034.vdocuments.net/reader034/viewer/2022051709/53f460858d7f72a40e8b5386/html5/thumbnails/15.jpg)
Tools to use (where Facebook meets Google and Google meets Microsoft)
MS Excel / iOS Numbers: Programs > MS Office / ??
OpenRefine: http://openrefine.org/download.html (engineered at Google Inc., formerly named Google Refine)
Facebook's own Graph API: https://developers.facebook.com/tools/explorer
![Page 16: DIY basic Facebook data mining](https://reader034.vdocuments.net/reader034/viewer/2022051709/53f460858d7f72a40e8b5386/html5/thumbnails/16.jpg)
Subjects to examine (pick any fanpage or group or event)
https://www.facebook.com/Gambrinus.cz
![Page 17: DIY basic Facebook data mining](https://reader034.vdocuments.net/reader034/viewer/2022051709/53f460858d7f72a40e8b5386/html5/thumbnails/17.jpg)
Subjects to examine (pick any fanpage or group or event)
https://www.facebook.com/PilsnerUrquellCzech
![Page 18: DIY basic Facebook data mining](https://reader034.vdocuments.net/reader034/viewer/2022051709/53f460858d7f72a40e8b5386/html5/thumbnails/18.jpg)
Stand-off

| Brand | Pilsner Urquell | Gambrinus |
| --- | --- | --- |
| Product | More expensive, high-end beer | Widely and wildly consumed cheaper beer |
| Image | Quality, tradition, national heritage, craftsmanship | Fun, shared moments, soccer |
| Number of fans | 204 734 | 47 566 |
| Number of posts in 2013 | 415 | 425 |

Not really competitors, they have the same mothership!
![Page 19: DIY basic Facebook data mining](https://reader034.vdocuments.net/reader034/viewer/2022051709/53f460858d7f72a40e8b5386/html5/thumbnails/19.jpg)
Hypothesis time
H1 : Their active fanbase consists of a less 10% of the total fans
H2 : There is more than 10% overlap in their active fanbase
H3 : Gambrinus and Pilsner Urquell have the same engagement per post
H4 : The interest positioning will show a small affinity, as beer is widely appreciated across the population
![Page 20: DIY basic Facebook data mining](https://reader034.vdocuments.net/reader034/viewer/2022051709/53f460858d7f72a40e8b5386/html5/thumbnails/20.jpg)
Action !
![Page 21: DIY basic Facebook data mining](https://reader034.vdocuments.net/reader034/viewer/2022051709/53f460858d7f72a40e8b5386/html5/thumbnails/21.jpg)
Step 1 - Do not fear the Graph API
https://developers.facebook.com
![Page 22: DIY basic Facebook data mining](https://reader034.vdocuments.net/reader034/viewer/2022051709/53f460858d7f72a40e8b5386/html5/thumbnails/22.jpg)
Step 1 - Do not fear the Graph API
https://developers.facebook.com/tools/
![Page 23: DIY basic Facebook data mining](https://reader034.vdocuments.net/reader034/viewer/2022051709/53f460858d7f72a40e8b5386/html5/thumbnails/23.jpg)
Step 1 - Do not fear the Graph API
Access_token !
Fields selector
Result window
https://developers.facebook.com/tools/explorer
![Page 24: DIY basic Facebook data mining](https://reader034.vdocuments.net/reader034/viewer/2022051709/53f460858d7f72a40e8b5386/html5/thumbnails/24.jpg)
Step 1 – Facebook is nothing but a couple big tables
https://developers.facebook.com/docs/reference/fql
![Page 25: DIY basic Facebook data mining](https://reader034.vdocuments.net/reader034/viewer/2022051709/53f460858d7f72a40e8b5386/html5/thumbnails/25.jpg)
Step 1 – The JSON result format (JavaScript object notation)
The Graph API gives you the result in JSON format, a visually disturbing yet convenient format used in web applications. Wait and see how OpenRefine handles it..
No, not this Json
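To make the nested format less intimidating, here is a minimal Python sketch of reading such a result and turning it into one row per like. The response text is a made-up example: the nested "data" lists mirror how the Graph API shaped results around the time of this talk, and the post ids, user ids and names are assumptions for illustration only.

```python
import json

# Hypothetical response text; ids and names are invented for illustration.
raw = """
{
  "data": [
    {"id": "146991996743_1",
     "likes": {"data": [{"id": "1001", "name": "Fan A"},
                        {"id": "1002", "name": "Fan B"}]}},
    {"id": "146991996743_2",
     "likes": {"data": [{"id": "1001", "name": "Fan A"}]}}
  ]
}
"""

result = json.loads(raw)  # JSON text -> nested Python dicts and lists

# One row per like: (post id, user id, user name).
# Posts without a "likes" key simply contribute no rows.
like_rows = [
    (post["id"], like["id"], like["name"])
    for post in result["data"]
    for like in post.get("likes", {}).get("data", [])
]

for row in like_rows:
    print(row)
```

This is roughly the table OpenRefine builds once you point it at an individual "like" node.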
![Page 26: DIY basic Facebook data mining](https://reader034.vdocuments.net/reader034/viewer/2022051709/53f460858d7f72a40e8b5386/html5/thumbnails/26.jpg)
Step 2 – Making a simple Graph API query
Get the id of the fanpage. There are many ways to do it, e.g.:
1) Click on the page's profile picture
2) Look in the address bar and copy the last number before „type“
146991996743
![Page 27: DIY basic Facebook data mining](https://reader034.vdocuments.net/reader034/viewer/2022051709/53f460858d7f72a40e8b5386/html5/thumbnails/27.jpg)
Step 2 – Making a simple Graph API query
1) Get a fresh access_token
2) Get data from your own timeline
123455687/posts?post_id&limit=50
The limit is important, otherwise you will only get a handful of posts
![Page 28: DIY basic Facebook data mining](https://reader034.vdocuments.net/reader034/viewer/2022051709/53f460858d7f72a40e8b5386/html5/thumbnails/28.jpg)
Step 2 – Making a more complex query
1) Repeat with our Gambrinus.cz fanpage
2) Add some more fields: query likes and comments, increase the limit, and reduce the timespan with a Unix timestamp (135..)
146991996743/posts?fields=likes,comments&limit=20000&since=1356998400 (from 1.1.2013)
![Page 29: DIY basic Facebook data mining](https://reader034.vdocuments.net/reader034/viewer/2022051709/53f460858d7f72a40e8b5386/html5/thumbnails/29.jpg)
Step 3 – Build a string to post the same query in the browser address bar
A) URL : https://graph.facebook.com/
B) Query : 146991996743/posts?fields=likes,comments&limit=20000&since=1356998400
C) Access token : &access_token=XXXXXXXXX… and so on
Put together A+B+C : https://graph.facebook.com/146991996743/posts?fields=likes,comments&limit=20000&since=1356998400&access_token=XXXXX
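If you would rather not glue the string together by hand, the same A+B+C assembly can be sketched in a few lines of Python. The variable names are assumptions made for this example, the token is a placeholder, and the endpoint shape is simply the one used in this talk; it may not be accepted by later versions of the Graph API.

```python
from datetime import datetime, timezone
from urllib.parse import urlencode

BASE_URL = "https://graph.facebook.com/"  # part A
PAGE_ID = "146991996743"                  # fanpage id found in Step 2
ACCESS_TOKEN = "XXXXX"                    # placeholder, paste a fresh token here

# Unix timestamp for 1.1.2013, midnight UTC
since = int(datetime(2013, 1, 1, tzinfo=timezone.utc).timestamp())

params = {
    "fields": "likes,comments",
    "limit": 20000,
    "since": since,
    "access_token": ACCESS_TOKEN,
}

# safe="," keeps the comma in "likes,comments" readable instead of %2C
query_url = f"{BASE_URL}{PAGE_ID}/posts?{urlencode(params, safe=',')}"
print(query_url)
```

Computing the timestamp from a date avoids looking up numbers like 1356998400 by hand when you change the time window.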
![Page 30: DIY basic Facebook data mining](https://reader034.vdocuments.net/reader034/viewer/2022051709/53f460858d7f72a40e8b5386/html5/thumbnails/30.jpg)
Step 4 – Run OpenRefine
1) Run the programme (it opens in your browser)
2) Select Web Addresses
![Page 31: DIY basic Facebook data mining](https://reader034.vdocuments.net/reader034/viewer/2022051709/53f460858d7f72a40e8b5386/html5/thumbnails/31.jpg)
Step 5 – Paste your address into the field
1) Take our query https://graph.facebook.com/146991996743/posts?fields=likes,comments&limit=20000&since=1356998400&access_token=XXXXXXX
2) Paste here
3) Click next
![Page 32: DIY basic Facebook data mining](https://reader034.vdocuments.net/reader034/viewer/2022051709/53f460858d7f72a40e8b5386/html5/thumbnails/32.jpg)
Step 6 – Transform your result
1) Tell the programme that your result is JSON by clicking on „JSON Files“
![Page 33: DIY basic Facebook data mining](https://reader034.vdocuments.net/reader034/viewer/2022051709/53f460858d7f72a40e8b5386/html5/thumbnails/33.jpg)
Step 7 – Pick an individual node !
This is one „like“ on a post made by user Maggu Ka
![Page 34: DIY basic Facebook data mining](https://reader034.vdocuments.net/reader034/viewer/2022051709/53f460858d7f72a40e8b5386/html5/thumbnails/34.jpg)
Step 7 – Behold !
Click on „Create Project“ in the upper left and download data in Excel Sheet
Be sure this does not exceed your „limit“ in the query, otherwise increase the limit
![Page 35: DIY basic Facebook data mining](https://reader034.vdocuments.net/reader034/viewer/2022051709/53f460858d7f72a40e8b5386/html5/thumbnails/35.jpg)
Back to Step 3 !
The only thing you need to change is the id – instead of Gambrinus, now try the Pilsner Urquell id
Don't remember? https://www.youtube.com/watch?v=vUxdB-nl0Bw
![Page 36: DIY basic Facebook data mining](https://reader034.vdocuments.net/reader034/viewer/2022051709/53f460858d7f72a40e8b5386/html5/thumbnails/36.jpg)
Analysis (sort of)
Note : The metrics chosen could be redesigned to reflect other things, like time and location
![Page 37: DIY basic Facebook data mining](https://reader034.vdocuments.net/reader034/viewer/2022051709/53f460858d7f72a40e8b5386/html5/thumbnails/37.jpg)
Engagement, like .. ehm, kiwi .. has layers
Skin : All fans
Inside : Number of fans who interact
Core : Fans who interact regularly
Sample question : Has my post attracted anyone outside the usual bunch of followers who simply like everything?
![Page 38: DIY basic Facebook data mining](https://reader034.vdocuments.net/reader034/viewer/2022051709/53f460858d7f72a40e8b5386/html5/thumbnails/38.jpg)
Make crude metrics of those layers
Skin : All fans = 100%
Unique ids within interactions / All fans = 7%
Fans with more than 1 interaction / All fans = 2%
Tip : By messing around with the column named created_time you can see how your core fanbase has been losing and gaining interest in your posts and whether it kept interacting, i.e. compute the lifetime of a fan
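The same crude metrics can be sketched outside Excel. The snippet below assumes you already have one user id per interaction (one entry per like or comment row from the exported sheet); the function name and the sample ids are made up for illustration.

```python
from collections import Counter

def layer_metrics(interactor_ids, total_fans):
    """Crude layer metrics from a list holding one user id per interaction."""
    counts = Counter(interactor_ids)                 # interactions per fan
    active = len(counts)                             # unique interactors
    core = sum(1 for n in counts.values() if n > 1)  # more than 1 interaction
    return {
        "active_share": active / total_fans,
        "core_share": core / total_fans,
        "core_share_of_active": core / active if active else 0.0,
    }

# Made-up sample: 7 interactions by 4 different fans on a page with 100 fans
sample_ids = ["a", "b", "a", "c", "a", "d", "b"]
metrics = layer_metrics(sample_ids, total_fans=100)
print(metrics)
```

Here fans "a" and "b" form the core, so half of the active fans interacted more than once.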
![Page 39: DIY basic Facebook data mining](https://reader034.vdocuments.net/reader034/viewer/2022051709/53f460858d7f72a40e8b5386/html5/thumbnails/39.jpg)
Try it with real Gambrinus fanpage data
47 566 = 100%
2004 unique interactors = 4.2%
575 interactors with more than 1 action = 1.2% (28% of all active fans)
Tip : What are these ratios among competitors? Isn't that more important than the widely cited number of fans? Are any of your fans also in the competitor's core fanbase? Uhh, you nasty weasels!
![Page 40: DIY basic Facebook data mining](https://reader034.vdocuments.net/reader034/viewer/2022051709/53f460858d7f72a40e8b5386/html5/thumbnails/40.jpg)
And now the Pilsner Urquell
204 734 = 100%
2358 unique interactors = 1%
715 interactors with more than 1 action = 0.3% (30% of all active fans)
Tip : What are these ratios among competitors? Isn't that more important than the widely cited number of fans? Are any of your fans also in the competitor's core fanbase? Uhh, you nasty weasels!
![Page 41: DIY basic Facebook data mining](https://reader034.vdocuments.net/reader034/viewer/2022051709/53f460858d7f72a40e8b5386/html5/thumbnails/41.jpg)
Stand-off revisited. H1 rejected and H2 confirmed

| Brand | Pilsner Urquell | Gambrinus |
| --- | --- | --- |
| Number of fans | 204 734 | 47 566 |
| Number of posts in 2013 | 415 | 425 |
| Number of active fans in 2013 | 2358 / 1.1% | 2004 / 4.2% |
| Number of repeated interactions | 715 / 30% of active | 575 / 28% of active |

Fanbase overlap : 5% of active
Variations : Share of all interactions created by the top 10% of fans..
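Both the overlap and the "top 10%" variation are easy to compute once you have the interactor ids of each page. The sketch below is one possible reading, not the method behind the numbers above: it assumes the overlap is measured against the first page's active fans, and the function names and sample ids are invented for the example.

```python
import math
from collections import Counter

def overlap_share(active_a, active_b):
    """Share of page A's active fans who are also active on page B.
    Assumption: A's active fans are the denominator."""
    active_a, active_b = set(active_a), set(active_b)
    return len(active_a & active_b) / len(active_a) if active_a else 0.0

def top_share(interactor_ids, top_fraction=0.10):
    """Share of all interactions created by the most active fans."""
    counts = sorted(Counter(interactor_ids).values(), reverse=True)
    if not counts:
        return 0.0
    k = max(1, math.ceil(len(counts) * top_fraction))  # size of the top group
    return sum(counts[:k]) / sum(counts)

# Made-up ids: fans "c" and "d" are active on both pages
overlap = overlap_share({"a", "b", "c", "d"}, {"c", "d", "e"})

# Made-up interactions: fan "a" acts 7 times, nine other fans once each
share = top_share(["a"] * 7 + list("bcdefghij"))
print(overlap, share)
```

Swapping the two arguments of `overlap_share` gives the overlap from the other brand's point of view, which will generally be a different number.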
![Page 42: DIY basic Facebook data mining](https://reader034.vdocuments.net/reader034/viewer/2022051709/53f460858d7f72a40e8b5386/html5/thumbnails/42.jpg)
How to compute average engagement?
1) You may want to try querying the „insights“ table, but this mostly fails for pages other than your own
2) Otherwise you need all the posts with likes, comments (and shares) already aggregated
3) Paste this query into OpenRefine as before and work with the Excel sheet from there
https://graph.facebook.com/fql?q=select post_id, like_info,comment_info,share_info from stream where source_id=146991996743 and created_time>1356998400 and actor_id=146991996743 LIMIT 20000&access_token=XXXXX
Tip : Limit the type by adding type in(46,80,128,247) to the where clause so you don‘t get posts like „group created“
![Page 43: DIY basic Facebook data mining](https://reader034.vdocuments.net/reader034/viewer/2022051709/53f460858d7f72a40e8b5386/html5/thumbnails/43.jpg)
Stand-off again. H3 rejected

| Brand | Pilsner Urquell | Gambrinus |
| --- | --- | --- |
| Average engagement | 248 | 74 |
| Median engagement | 144 | 29 |
| 10% top-trimmed average | 169 / diff of 79 | 44 / diff of 30 |

Tip : For more precise information, you may want to exclude the top 5% fans to see how much it changes
This may look surprising, especially considering the active fanbase is more or less equal. Seems like the total fanbase does play a role.
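The three figures per brand can be reproduced from a plain list of per-post engagement values. The sketch below assumes engagement per post is already a single number (for instance likes plus comments plus shares, which is an assumption, not something stated here), and the sample values are invented.

```python
from statistics import mean, median

def top_trimmed_mean(values, trim=0.10):
    """Average after dropping the highest `trim` share of the values."""
    ordered = sorted(values)
    drop = int(len(ordered) * trim)  # number of top posts to discard
    kept = ordered[:len(ordered) - drop] if drop else ordered
    return mean(kept)

# Made-up engagement per post: nine ordinary posts and one viral outlier
engagement = [10, 20, 30, 40, 50, 60, 70, 80, 90, 1000]

avg = mean(engagement)
med = median(engagement)
trimmed = top_trimmed_mean(engagement)
print(avg, med, trimmed, avg - trimmed)
```

The gap between the plain and the trimmed average (the "diff" above) shows how much a few top posts inflate the mean; the median is barely affected by them.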
![Page 44: DIY basic Facebook data mining](https://reader034.vdocuments.net/reader034/viewer/2022051709/53f460858d7f72a40e8b5386/html5/thumbnails/44.jpg)
Study competitor‘s top posts
https://www.facebook.com/PilsnerUrquellCzech/posts/10151304524945974
https://www.facebook.com/Gambrinus.cz/posts/10151581664231744
Tip : Take the URL of the page and add /posts/ and the post id you get from spreadsheet.
![Page 45: DIY basic Facebook data mining](https://reader034.vdocuments.net/reader034/viewer/2022051709/53f460858d7f72a40e8b5386/html5/thumbnails/45.jpg)
Some conclusions
Followers have a lifespan, some are zombies, some have left Facebook
A large group of active followers is superior to a large zombie fanbase => Facebook's EdgeRank has buried your posts for those guys anyway.
You can make up metrics once you have the data > sometimes better to have the data first
The Graph API returns errors all the time, so don‘t be discouraged..
![Page 46: DIY basic Facebook data mining](https://reader034.vdocuments.net/reader034/viewer/2022051709/53f460858d7f72a40e8b5386/html5/thumbnails/46.jpg)
Step 4 –
• Sum it up
The dodgy part : Know more about the fans
![Page 47: DIY basic Facebook data mining](https://reader034.vdocuments.net/reader034/viewer/2022051709/53f460858d7f72a40e8b5386/html5/thumbnails/47.jpg)
The fans are well described by their favorites, likes, interests, ...
![Page 48: DIY basic Facebook data mining](https://reader034.vdocuments.net/reader034/viewer/2022051709/53f460858d7f72a40e8b5386/html5/thumbnails/48.jpg)
Facebook ids of fans + Web Scraper
You have the Facebook id of someone => you can visit her profile
You have a web scraper (like OpenRefine) => you can visit all the profiles without actually browsing through them
.. And download whatever the browser sees..
It is against Facebook's policies to scrape profile pages en masse, but it's „ok“ as a training exercise.
Pete Warden scraped 200 000 000 FB profiles and they let the lawyers off the leash
http://www.facebook.com/apps/site_scraping_tos_terms.php
![Page 49: DIY basic Facebook data mining](https://reader034.vdocuments.net/reader034/viewer/2022051709/53f460858d7f72a40e8b5386/html5/thumbnails/49.jpg)
Step 2 – Preparing data for Outwit Hub
OutWit Hub is a free intelligent scraper (limited amounts of data)
Prepare the links to the Pilsner fans' profiles in a Notepad file like below, then use File => Open to load the .txt file in OutWit Hub
http://download.cnet.com/OutWit-Hub/3000-11745_4-10846181.html
![Page 50: DIY basic Facebook data mining](https://reader034.vdocuments.net/reader034/viewer/2022051709/53f460858d7f72a40e8b5386/html5/thumbnails/50.jpg)
Step 3 – Creating a scraper in Outwit Hub
Prepare a scraper
1) Go to the „scrapers“ tab
2) Click new
3) Name the scraper somehow
4) Do the rest as below
Get everything starting with --
- and ending with
![Page 51: DIY basic Facebook data mining](https://reader034.vdocuments.net/reader034/viewer/2022051709/53f460858d7f72a40e8b5386/html5/thumbnails/51.jpg)
Step 4 – Running the scraper on a couple of links
![Page 52: DIY basic Facebook data mining](https://reader034.vdocuments.net/reader034/viewer/2022051709/53f460858d7f72a40e8b5386/html5/thumbnails/52.jpg)
Step 5 – Calculate Affinity
Count the occurrences of individual fanpages in the results and compare them to their occurrence in the total Czech Facebook population of 3 770 000
1) Natural affinity = Total fans of the page / 3 770 000
2) Pilsner affinity = Occurrences in results / Fans of Pilsner
3) Affinity ratio = Get the ratio of the two
4) Repeat for all fanpages
5) Bring up those where the occurrence is the largest
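The steps above can be sketched as a small Python function. The direction of the ratio (Pilsner affinity divided by natural affinity) is an assumption, since the steps only say to take the ratio of the two; the function name, parameter names and sample numbers are made up for illustration.

```python
CZECH_FB_POPULATION = 3_770_000  # total Czech Facebook population used above

def affinity_ratio(page_total_fans, occurrences, pilsner_fans):
    """How much more common a fanpage is among Pilsner fans than in general."""
    natural = page_total_fans / CZECH_FB_POPULATION  # step 1
    pilsner = occurrences / pilsner_fans             # step 2
    return pilsner / natural                         # step 3 (assumed direction)

# Made-up example: a page with 377 000 fans overall (10% of the population)
# that shows up 50 times among 200 examined Pilsner fans (25%).
ratio = affinity_ratio(page_total_fans=377_000, occurrences=50, pilsner_fans=200)
print(round(ratio, 2))
```

A ratio above 1 means the page is over-represented among the examined fans, a ratio below 1 means it is under-represented.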
![Page 53: DIY basic Facebook data mining](https://reader034.vdocuments.net/reader034/viewer/2022051709/53f460858d7f72a40e8b5386/html5/thumbnails/53.jpg)
Step 6 – Results (sample)
![Page 54: DIY basic Facebook data mining](https://reader034.vdocuments.net/reader034/viewer/2022051709/53f460858d7f72a40e8b5386/html5/thumbnails/54.jpg)
Step 6 – Troubleshooting
a) Go to Preferences > Time Settings and make sure none of the sliders is „in the red“. That would result in frequent CAPTCHA checks on most protected servers..
b) Make sure your scraper is targeting the right domain
c) Make sure your „Marker Before“ and „Marker After“ are actually present on the page..
d) It is becoming easier to program an app than to try to scrape a meaningful amount of data
![Page 55: DIY basic Facebook data mining](https://reader034.vdocuments.net/reader034/viewer/2022051709/53f460858d7f72a40e8b5386/html5/thumbnails/55.jpg)
Thank you. Now to your questions.
[email protected]
www.stemmark.cz
Credits for affinity idea : Work by Jan Schmid & Josef Šlerka
Images : Photopin.com
![Page 56: DIY basic Facebook data mining](https://reader034.vdocuments.net/reader034/viewer/2022051709/53f460858d7f72a40e8b5386/html5/thumbnails/56.jpg)
Download all materials at :
www.stemmark.cz/downloads/educ/fb_mining.zip
By the way, Mark Zuckerberg likes Pilsner Urquell.