Source: cis.csuohio.edu/~sschung/cis612/TwitterPresentationAdamRyan.pdf

MUTUAL FRIENDS FOLLOWERS

By Adam Kuns & Ryan Chesla

The Unified Logging Infrastructure for Data Analytics

1. Introduction - logging based on sessions

2. Scribe/Zookeeper - takes data from web servers and writes it into HDFS
   - per-category, per-hour directory (i.e. /logs/category/YYYY/MM/DD/HH)

3. Motivation - different apps had different schemas for logging, making the logs hard to query. A unified format fixed this.
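
As a rough illustration of the per-category, per-hour layout, the directory for a given log entry could be derived as in the sketch below. The class and method names, the sample category, and the use of UTC are assumptions made for the example; the slides only give the path pattern.

```java
import java.time.ZoneOffset;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical helper, not part of Scribe or HDFS.
final class LogPathSketch {
    // YYYY/MM/DD/HH, zero-padded, 24-hour clock.
    private static final DateTimeFormatter HOUR_DIR =
            DateTimeFormatter.ofPattern("yyyy/MM/dd/HH");

    /** Builds /logs/<category>/YYYY/MM/DD/HH for the hour containing the given instant (assumed UTC). */
    static String hourlyDir(String category, ZonedDateTime when) {
        return "/logs/" + category + "/"
                + when.withZoneSameInstant(ZoneOffset.UTC).format(HOUR_DIR);
    }
}
```

For a hypothetical category named client_events and a timestamp of 09:30 UTC on 7 March 2014, this yields /logs/client_events/2014/03/07/09.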

The Unified Logging Infrastructure for Data Analytics

4. Data logged by client events:
   (client, page, section, component, element, action)
   (client, page, section, component, *, action)
   (client, page, section, *, *, action)
   (client, page, *, *, *, action)

Oink schedules common query jobs in advance (counts)

5. Applications: Summary statistics, user modeling, funnel analytics, event counting
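
One way to read the four patterns in item 4 is as progressively coarser roll-ups of a single event name, each of which could serve as a key for a pre-scheduled count. The sketch below generates those keys; this reading, the class name, and the sample event values are assumptions, not something the slides spell out.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper illustrating the wildcard roll-ups.
final class EventRollupSketch {
    /**
     * Given the six components (client, page, section, component, element, action),
     * returns the full name followed by the three wildcard roll-ups from the list.
     */
    static List<String> rollups(String[] event) {
        if (event.length != 6) {
            throw new IllegalArgumentException("expected 6 components");
        }
        List<String> keys = new ArrayList<>();
        // wildcards = 0 keeps every component; 1..3 replace element,
        // then component, then section with "*".
        for (int wildcards = 0; wildcards <= 3; wildcards++) {
            String[] key = event.clone();
            for (int i = 0; i < wildcards; i++) {
                key[4 - i] = "*"; // index 4 = element, 3 = component, 2 = section
            }
            keys.add("(" + String.join(", ", key) + ")");
        }
        return keys;
    }
}
```

The client and action components are never replaced, which matches the four patterns listed in item 4.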

Original Proposal

Develop a system to find “mutual friends” between users of Facebook using Hadoop and MapReduce.

Issues with Graph API 2.0

As of version 2.0 of the Graph API, the friends object returns only those of a person's friends who also use the app.

Since none of our friends use this app, the objects returned from Facebook were empty, making our original proposal infeasible.

Possible Workarounds

We did find some workarounds, but they were not feasible within the scope of this project:

● The first option would require accessing the TaggableFriends object, but access to this object requires Facebook App approval.

● The second would be to classify our app as a canvas app so that we could access the game invites list for Facebook games (basically lying by saying our app was a Facebook game).

New Proposal

● Develop a system to find “mutual followers” between users of Twitter (instead of “mutual friends” between users of Facebook) using Hadoop and MapReduce.

● In addition to our original proposal, we decided to also store the user data from the initial pulls in HBase, so that we can later look up user information based on the output of our data mining MapReduce job.

Twitter Search API

● For our programs, Twitter’s Search API fit our needs, since it let us query a specific user’s information.

● The Streaming API would be more suited for tweets.

● Drawback: we would often run into the request limit; this would be solved by using the Firehose API.

Populating the HDFS

In order to populate the HDFS with data from Twitter, we decided to use Twitter4J, an unofficial Java library for the Twitter API.

This provided an easy-to-use, object-oriented approach to pulling data into the HDFS by way of a Java program.

Twitter4J API

● Using Twitter4J, we can pull user-specific information on a per-user basis.

● In our case, we use the showUser function to create a User object for the user we are currently querying.

● Using this User object, we can call many functions to pull multiple details about a particular user.

Populating the HDFS (cont.)

Once we have the user information (both the users’ and their followers’ information), we create a flat file containing the two users’ follower lists and populate HBase with information for the two users and all of their followers.

MapReduce Input File Example

line 1: 123 574 234 920 984
line 2: 658 997 111 322 123 125

First ID on each line = User ID

Remaining IDs on each line = Follower IDs

MapReduce Input File

MapReduce Algorithm

Our MapReduce program “Mutual Followers” reads this text file from the HDFS.

Each line in the text file represents a user’s follower list.

Each line consists of tab-delimited user IDs. The first ID is the user; the remaining IDs are that user’s followers.
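
To make the line format concrete, a minimal parsing sketch follows. The class and field names are hypothetical and this is not the authors' code; it only assumes the tab-delimited layout described above.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical holder for one parsed input line.
final class FollowerLineSketch {
    final String userId;            // first ID on the line
    final List<String> followerIds; // every ID after the first

    private FollowerLineSketch(String userId, List<String> followerIds) {
        this.userId = userId;
        this.followerIds = followerIds;
    }

    /** Splits one tab-delimited line into the user ID and the follower IDs. */
    static FollowerLineSketch parse(String line) {
        String[] ids = line.trim().split("\t");
        return new FollowerLineSketch(ids[0], Arrays.asList(ids).subList(1, ids.length));
    }
}
```

Parsing the first sample line (123, 574, 234, 920, 984 separated by tabs) gives user 123 with followers 574, 234, 920 and 984.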

MapReduce Split Phase

Splits input file by each new line in the text.

Done implicitly by MapReduce (no programmer intervention)

Each line represents a person and their followers

Mapper Function

The mapper function tokenizes each line, outputting key-value pairs where:

Key = followerID
Value = 1

Note: the user IDs of the users we are querying are also emitted as key-value pairs, in case the users we are querying follow each other.

{key, value} = {followerID, 1}
123 1
456 1
999 1
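
The map step described above can be sketched in plain Java without any Hadoop dependency. This is a stand-in for illustration only: the class name is hypothetical, and in an actual Hadoop job the same loop would sit inside a Mapper's map method and write to the job context instead of returning a list.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;

// Plain-Java stand-in for the map step; not the authors' Mapper class.
final class MutualFollowersMapSketch {
    /** Emits (id, 1) for every ID on the line, including the leading user ID. */
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        // The default StringTokenizer delimiters include tab, so this handles the tab-delimited input.
        StringTokenizer tokens = new StringTokenizer(line);
        while (tokens.hasMoreTokens()) {
            out.add(new SimpleEntry<String, Integer>(tokens.nextToken(), 1));
        }
        return out;
    }
}
```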

Mapper Code

Reducer Function

Aggregates key-value pairs. If the sum of the values for a given key is equal to the number of users we’re comparing (in our case 2), then we output that follower ID.

if (count == 2) // output follower ID

*Note: our program will work when comparing the follower lists of any number of users; we just need to change the if statement accordingly (e.g. if (count == 100))
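
The reduce step can be sketched the same way. The block below is a plain-Java, in-memory stand-in for map, shuffle and reduce, not the authors' Reducer class; the names are hypothetical, and it assumes each ID appears at most once per input line. Instead of hard-coding 2, it compares the count against the number of input lines.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java stand-in for the whole job; no Hadoop dependency.
final class MutualFollowersReduceSketch {
    /** Reduce step: sum the 1s for one key and report whether the sum equals usersCompared. */
    static boolean isMutual(Iterable<Integer> values, int usersCompared) {
        int count = 0;
        for (int v : values) {
            count += v;
        }
        return count == usersCompared;
    }

    /** Runs map, an in-memory shuffle, and reduce over the input lines. */
    static List<String> run(List<String> lines) {
        // Map + shuffle: emit a 1 per ID and group the 1s by ID (TreeMap keeps keys sorted).
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String line : lines) {
            for (String id : line.trim().split("\\s+")) {
                grouped.computeIfAbsent(id, k -> new ArrayList<>()).add(1);
            }
        }
        // Reduce: keep the IDs that occur once per compared user.
        List<String> mutual = new ArrayList<>();
        for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
            if (isMutual(e.getValue(), lines.size())) {
                mutual.add(e.getKey());
            }
        }
        return mutual;
    }
}
```

Run on the two sample input lines from the earlier slide, this returns only 123: that ID is the user on line 1 and a follower on line 2, which is the "users follow each other" case the mapper note describes.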

Reducer Code

MapReduce Output

The output from the MapReduce job is the list of user IDs that are mutual followers.

HBase

The MapReduce job provides the mutual user_ids.

From there we can use those IDs to index into our HBase system and get real user data!

Populating HBase

Circling back to our first program, along with creating the flat file for the MapReduce program, we also inserted the user information and their followers’ information into HBase at the same time using the HBase Java API.

HBase Data Model

Row ID = user_ID + timestamp (since user ID is unique)

Two Column Families:

1. User Info (name, profile, picture, etc.)
   - All this user info is related and would usually be queried at the same time, so it is grouped into the same column family

2. Followers

HBase Structure

Row ID       Time Stamp  ColumnFamily User Info                               ColumnFamily Followers

ajkuns       t3          userInfo:id = “123”                                  Followers:1 = “123”
                         userInfo:name = “ajkuns”                             Followers:2 = “456”
                         userInfo:bio = “hello i am adam”                     Followers:3 = “999”
                         userInfo:profilePic = “http://www.cute_kittens.jpg”  Followers:4 = “11”

RyanChesla_  t4          userInfo:id = “456”                                  Followers:1 = “9”
                         userInfo:name = “RyanChesla_”                        Followers:2 = “599”
                         userInfo:bio = “hi everybody!”
                         userInfo:profilePic = “http://www.volcano.jpg”

Note: the number of followers per user can vary so we took that into consideration
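
The variable number of followers fits the column-family model because each follower can be stored under its own numbered qualifier. The sketch below builds those qualifier-to-value pairs with an ordinary in-memory map. It deliberately does not use the HBase client, and the class and method names are hypothetical; with the real API, each entry would be written as one column of the Followers family.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// In-memory illustration of the Followers column family; not HBase client code.
final class TwitterUserRowSketch {
    /** Maps qualifiers "1", "2", ... to follower IDs, one column per follower. */
    static Map<String, String> followersFamily(List<String> followerIds) {
        Map<String, String> columns = new LinkedHashMap<>();
        for (int i = 0; i < followerIds.size(); i++) {
            columns.put(String.valueOf(i + 1), followerIds.get(i));
        }
        return columns;
    }
}
```

With the table's sample data, the row for ajkuns gets four columns and the row for RyanChesla_ gets two, without any change to the table schema.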

HBase Contents

scan 'TwitterUser'

Future Work

Go N levels deep of followers

(find mutual followers of your followers’ followers!)