“people search, watch, and keep in touch”

DESCRIPTION

Sue Moon, in collaboration with Yong-Yeol Ahn, Meeyoung Cha, Hyunwoo Chun, Seungyeop Han, Haewoon Kwak, Jon Crowcroft, Hawoong Jeong, Pablo Rodriguez. "People search, watch, and keep in touch" -- Yong-Yeol Ahn. Alexa.com, 2007.6.26.

TRANSCRIPT

Quantitative Analysis of User Behaviors

People search, watch, and keep in touch

Sue Moon
in collaboration with Yong-Yeol Ahn, Meeyoung Cha, Hyunwoo Chun, Seungyeop Han, Haewoon Kwak, Jon Crowcroft, Hawoong Jeong, Pablo Rodriguez

Alexa.com (2007.6.26)
1 Yahoo.com
2 MSN.com
3 Google.com
4 YouTube.com
5 Live.com
6 MySpace.com
7 Baidu.com
8 Orkut.com
9 Wikipedia.org
10 qq.com
baidu.com: Google-like
qq.com: Naver-like

"People search, watch, and keep in touch" -- Yong-Yeol Ahn

What did we do before Internet?

Remember POTS?
POTS = Plain Old Telephone Service

Graham Bell's Illustration

Today's Telephone Network

People only talked

Predictable Behaviors
which translates to

Applicability of same user behavior model over time
which translates to

Easy planning and Management
which translates to

NOW ...

"People search, watch, and keep in touch" -- Yong-Yeol Ahn

Why should computer scientists care?

Why do I care?

"People search"
They submit queries to search engines
Queries reflect collective mind
10 most searched keywords

Blog tags also reflect collective mind
Infer relations between words from blog tags? [4]

"People watch"
News with still images
Not watch but browse

VoD (Video On Demand)
UCC (User Created Contents) [2]
IPTV [3]

Implications (I)
[5]

Implications (II)
Network traffic to grow up to sixfold annually (Cisco CTO)

Remember Tech Bubble Burst?

"People stay in touch"
Emails and messages
Implicit, not explorable

Social networking services
Explicit, connection visible
Opportunities for business

From a computer scientist's point of view

"People search"
They submit queries to search engines
Queries reflect collective mind
10 most searched keywords

Blog tags also reflect collective mind
Infer relations between words from blog tags? [4]

"People watch"
News with still images
Not watch but browse

VoD (Video On Demand)
UCC (User Created Contents) [2]
IPTV [3]

"People stay in touch"
Emails and messages
Implicit, not explorable

Social networking services
Explicit, connection visible
Opportunities for business

I Tube, you tube, everybody tubes

YouTube System
Largest VoD for user-generated contents
Founded in Feb 2005
Some daily statistics
- 100M videos served
- 65K videos uploaded
- 60% of online videos served via YouTube
40-50 Gbps bandwidth estimated

Video Example
Owner, upload time, runtime, views, ratings, stars, comments, honors, linking pages

Content producers, consumers

Pareto Distribution
Massive files (90%) account for 20% of views
Small set of files (10%) with 80% of views
[Figure labels: max view = 8.5M; max view = 2.5M; heavy tail (< 1K views)]

Zipf (power law) with exponential cutoff
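
The two popularity claims above (a small share of videos drawing most of the views, and a Zipf curve with an exponential cutoff) can be illustrated with a minimal sketch. The function names, the exponent `alpha`, the `cutoff` rank, and the synthetic view counts are all assumptions made for illustration; none of them are fitted values from the YouTube data set.

```python
import math

def zipf_with_cutoff(rank, alpha=1.0, cutoff=5000.0, scale=1e6):
    """Zipf power law damped by an exponential cutoff:
    scale * rank^(-alpha) * exp(-rank / cutoff). Parameters are illustrative."""
    return scale * rank ** (-alpha) * math.exp(-rank / cutoff)

def top_share(views, top_fraction=0.10):
    """Fraction of all views captured by the most-viewed `top_fraction` of videos."""
    ranked = sorted(views, reverse=True)
    k = max(1, int(len(ranked) * top_fraction))
    return sum(ranked[:k]) / sum(ranked)

# Synthetic popularity curve for 10,000 hypothetical videos (not real data).
views = [zipf_with_cutoff(r) for r in range(1, 10_001)]
share = top_share(views)
print(f"top 10% of videos hold {share:.0%} of all views")
```

Changing `alpha` or `cutoff` shifts how skewed the synthetic curve is, which is one way to see why a 10%/80% split is plausible for heavy-tailed popularity.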

Popularity Evolution

Age of daily viewed videos

Watching Television Over Nationwide IP Multicast

Quality-assured IPTV architecture
[Figure labels: customer premise (home gateway, STB, PC, TV, phone); DSLAM; ISP IP backbone; TV head end; Internet]

[Figure labels: Internet (1 Mb/s); VoIP; IPTV (5 Mb/s), 1-2 channels; last mile (6 Mb/s); 1 Gb/s; 5 Mb/s]

Channel holding time
Spikes in histogram: natural long-term off hours?
Tipping point in CDF
[Figure labels: Browse, View, Away]

Number of viewers over time
Time-of-day effect
18% increase in viewing over weekends

Channel popularity
Top 10% channels account for 80% viewer share
Zipf-like popularity also shown in PPLive

Static vs Dynamic Multicast Trees
[Figure labels: Source, IP router, DSLAM, STB; Static; Dynamic; cost = 2; cost = 1]
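
The contrast between static and dynamic multicast trees can be sketched as follows. This assumes that cost means the number of links between the source and the DSLAMs that carry a channel, that a static tree always reaches every DSLAM, and that a dynamic tree keeps only the branches with an active viewer. The topology and names are hypothetical, and the costs it yields are not the ones shown on the slide.

```python
# Hypothetical toy topology, stored as child -> parent links.
PARENT = {
    "router": "source",
    "dslam_a": "router",
    "dslam_b": "router",
    "stb_1": "dslam_a",
    "stb_2": "dslam_a",
    "stb_3": "dslam_b",
}

def path_links(node):
    """Set of (child, parent) links on the path from `node` up to the source."""
    links = set()
    while node in PARENT:
        links.add((node, PARENT[node]))
        node = PARENT[node]
    return links

def static_cost(dslams):
    """Static tree: the channel is pushed to every DSLAM, watched or not."""
    links = set()
    for dslam in dslams:
        links |= path_links(dslam)
    return len(links)

def dynamic_cost(viewers):
    """Dynamic tree: only DSLAMs with at least one viewing STB stay on the tree."""
    links = set()
    for stb in viewers:
        links |= path_links(PARENT[stb])
    return len(links)

all_dslams = ["dslam_a", "dslam_b"]
print("static:", static_cost(all_dslams))
print("dynamic, viewers only behind dslam_a:", dynamic_cost(["stb_1", "stb_2"]))
```

The dynamic tree saves links only when some DSLAM has no viewer for the channel; for a popular channel watched everywhere, both trees cost the same in this model.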

Alternate designs for live TV
Server-based IP multicast
Server-less P2P unicast
Server-based IP unicast
co-exist within ISP
How do these technologies compare?

Example routing
[Figure labels: TV head end, regional server, IP router, DSLAM, STB; CDN; locality-aware P2P; topology-oblivious P2P; cost = 3; cost = 7; cost = 4]

User Clustering
Peep into life-styles of users using NMF
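
As a rough illustration of clustering users with NMF (non-negative matrix factorization), the sketch below factors a small users-by-time-slot viewing matrix using the standard multiplicative update rules. The matrix `V`, the choice of four time slots, the rank, and all helper names are assumptions for illustration. The talk does not say which NMF variant or input features were used.

```python
import random

def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(m):
    return [list(col) for col in zip(*m)]

def frobenius_error(v, w, h):
    """Squared reconstruction error ||V - WH||^2."""
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2
               for i in range(len(v)) for j in range(len(v[0])))

def nmf(v, rank, iterations=200, seed=0, eps=1e-9):
    """Factor non-negative V (users x slots) into W (users x rank), H (rank x slots)
    with multiplicative updates that keep every entry non-negative."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    for _ in range(iterations):
        # H <- H * (W^T V) / (W^T W H)
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)]
             for i in range(rank)]
        # W <- W * (V H^T) / (W H H^T)
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(rank)]
             for i in range(n)]
    return w, h

# Hypothetical viewing hours per user in four slots: morning, afternoon, evening, night.
V = [
    [0, 0, 4, 5],  # mostly night
    [0, 1, 3, 5],
    [5, 4, 0, 0],  # mostly morning
    [4, 5, 1, 0],
    [3, 3, 3, 3],  # always on
    [4, 3, 4, 3],
]
w, h = nmf(V, rank=3)
# Assign each user to the component with the largest weight in W.
labels = [max(range(3), key=lambda c: row[c]) for row in w]
print(labels)
```

Each row of `H` can be read as a typical daily viewing profile and each row of `W` as how strongly a user follows each profile. The number of clusters is set by `rank`.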

[Figure labels: Night Owls 25%; Always-On 50%; Early-birds 25%]
Speaker note: Mention that we find three clusters, as that was the best heuristic. Besides three, seven was also a good number!

Channel Correlation
[Figure labels: Documentals (Docu TV, Documania); Nationals (Tele 5, Tve 1, Antena 3, Cuatro, La Sexta, Tve 2); Music2 (Trace TV, MTV Base); Movies (MGM, Extreme TV); Music1 (40 Latino, Sol musica, 40 TV)]

Analysis of huge online social networking services

CyWorld

MySpace

Orkut

Online Social Networking Services
Portal for people to
Stay in touch with friends
Share photos and personal news
Find others of common interests
Establish a forum for discussion

CyWorld
Largest SNS in South Korea
Started in September 2001
10 million users in 2004
16 million users out of 48 million population
Front runner of many features
Friend (il-chon) relationship
Guestbook
Testimonial (il-chon-pyung)
Photos - scraps
Avatar in cyber home

My CyWorld Mini-Homepage

CyWorld Data Sets
Complete snapshot (Nov 2005)
191 million friend relationships between 12 million users
Two additional snapshots (Apr/Sep 2005)

MySpace Data Set
Largest in the world
Began in Jul 2003
Has 130 million by Nov 2006

Snowball sampled
During Sep/Oct 2006
Random seed to 100,000 users
About 23% of users had friend list hidden

Orkut Data Set
Google SNS
Began in Sep 2002
Became official Google service in Jan 2004
Began as invitation-only; open now
Has 33 million users

Snowball sampled
During Jun to Sep 2006
100,000 users
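
Snowball sampling, as used for the MySpace and Orkut data sets above, can be sketched as a breadth-first crawl from a seed user that stops at a target sample size. The function name, the `get_friends` callback, and the toy graph are hypothetical. Representing a hidden friend list as `None` is also an assumption about how such profiles would show up in a crawl.

```python
from collections import deque

def snowball_sample(get_friends, seed, max_users):
    """Breadth-first crawl from `seed` until `max_users` profiles are collected.

    `get_friends(user)` returns a list of friends, or None when the
    friend list is hidden. Returns (sampled users, edges, hidden count).
    """
    sampled = {seed}
    queue = deque([seed])
    edges = set()
    hidden = 0
    while queue:
        user = queue.popleft()
        friends = get_friends(user)
        if friends is None:
            hidden += 1          # profile reached, but its links are not visible
            continue
        for friend in friends:
            if friend not in sampled:
                if len(sampled) >= max_users:
                    continue     # sample is full: ignore users outside it
                sampled.add(friend)
                queue.append(friend)
            edges.add(frozenset((user, friend)))
    return sampled, edges, hidden

# Hypothetical toy network; user "c" hides the friend list.
TOY = {"a": ["b", "c"], "b": ["a", "d"], "c": None, "d": ["b", "e"], "e": ["d"]}
sampled, edges, hidden = snowball_sample(TOY.get, "a", max_users=4)
print(sorted(sampled), len(edges), hidden)
```

A crawl of this kind reaches high-degree users early, which is one reason the question of how representative a sampled network is matters when comparing against a complete snapshot.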

Metrics of Interest
Degree distribution
Power-law: a small number of nodes have large numbers of links
Clustering coefficient C(k)
# of existing links / # of all possible links between a node's adjacent neighbors
Close to 1, close to a mesh
Degree correlation knn
Degree k ~ mean degree of adjacent neighbors of nodes with degree k
Assortativity: characteristic of knn distribution

Assortative Mixing
M. E. J. Newman, Phys. Rev. Lett. 89, 208701 (2002)
[Figure labels: social (+), non-social (-); degree; assortative]

Questions We Raise
What are the main characteristics of online SNSs?
How representative is a sample network?
How does a social network evolve?
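
The metrics of interest listed above can be computed on a small undirected graph as in the sketch below. The helper names and the toy edge list are assumptions for illustration. C(k) and knn(k) are taken here as averages over all nodes of degree k, which follows the slide's definitions.

```python
from collections import defaultdict

def build_graph(edges):
    """Undirected adjacency sets from a list of (u, v) pairs."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj

def clustering_coefficient(adj, node):
    """Existing links among a node's neighbors / all possible such links."""
    nbrs = list(adj[node])
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for i in range(k) for j in range(i + 1, k)
                if nbrs[j] in adj[nbrs[i]])
    return 2.0 * links / (k * (k - 1))

def mean_by_degree(adj, value):
    """Average `value(node)` over all nodes that share the same degree k."""
    buckets = defaultdict(list)
    for node in adj:
        buckets[len(adj[node])].append(value(node))
    return {k: sum(vals) / len(vals) for k, vals in buckets.items()}

def c_of_k(adj):
    """Clustering coefficient as a function of degree, C(k)."""
    return mean_by_degree(adj, lambda n: clustering_coefficient(adj, n))

def knn_of_k(adj):
    """Mean neighbor degree as a function of degree, knn(k)."""
    def mean_neighbor_degree(n):
        return sum(len(adj[m]) for m in adj[n]) / len(adj[n])
    return mean_by_degree(adj, mean_neighbor_degree)

# Hypothetical toy graph: a triangle a-b-c with one extra node d attached to c.
adj = build_graph([("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")])
print("C(k):  ", dict(sorted(c_of_k(adj).items())))
print("knn(k):", dict(sorted(knn_of_k(adj).items())))
```

A knn(k) curve that rises with k indicates assortative mixing (high-degree nodes link to other high-degree nodes); a falling curve indicates disassortative mixing.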

Historical Analysis

Degree Distribution
Figure 1-(a): degree distribution, CCDF
Two scaling regions

Clustering Coefficient Distribution

Degree Correlation
Not assortative

Average Path Length
< 5 is about 90%
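
Average path length is typically measured with breadth-first search from each node. The sketch below computes the mean shortest-path length and the share of connected node pairs closer than a threshold, on the assumption that the "< 5 is about 90%" note refers to such a share of pairs. The toy graph and function names are hypothetical.

```python
from collections import deque

def bfs_distances(adj, source):
    """Hop counts from `source` to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def path_length_stats(adj, threshold):
    """Mean shortest-path length and share of connected pairs below `threshold`."""
    lengths = []
    for s in adj:
        for t, d in bfs_distances(adj, s).items():
            if t != s:
                lengths.append(d)
    mean = sum(lengths) / len(lengths)
    share = sum(1 for d in lengths if d < threshold) / len(lengths)
    return mean, share

# Hypothetical toy graph: a simple chain a-b-c-d-e.
ADJ = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c", "e"], "e": ["d"]}
mean_len, share_below_3 = path_length_stats(ADJ, threshold=3)
print(mean_len, share_below_3)
```

On a network with millions of users, running BFS from every node is expensive, so a random subset of source nodes is a common shortcut. Whether the authors did this is not stated here.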

Evolution of Degree Distributions
Two kinds of driving force

Evolution of Path Length
Start of densification?

How about MySpace and Orkut?

Degree Distributions

What did we learn?

CyWorld is saturated

but continues to grow

MySpace fast growing through cyber-only relationships

Points to Ponder

Ease of data collection

Complete data rather than sampled set

Am I asking all the questions?

Or are there many more?

"People search, watch, and keep in touch" -- Yong-Yeol Ahn

Alexa.com (2007.6.26): 1 Yahoo.com, 2 MSN.com, 3 Google.com, 4 YouTube.com, 5 Live.com, 6 MySpace.com, 7 Baidu.com, 8 Orkut.com, 9 Wikipedia.org, 10 qq.com
baidu.com: Google-like
qq.com: Naver-like

Web N.0: What sciences will it take? -- Prabhakar Raghavan

Where do I go from here?

References
[1] Ahn et al., Analysis of Topological Characteristics of Huge Online Social Networks, WWW 2007
[2] Cha et al., I tube, you tube, everybody tubes: analyzing the world's largest user generated content video system, ACM SIGCOMM IMC 2007 (best paper award)
[3] Cha et al., Watching television over nationwide IP multicast, under submission
[4] Kwak et al., Constructing word relationships from tags, in preparation
[5] Willinger et al., Scaling phenomena in the Internet: Critically examining criticality, PNAS, vol. 99, suppl. 1