preso on social network analysis for rtp analytics unconference

Consolidated Behaviors and Attitudes 1 Analyzing Networks An Overview, and Discussion of Network Analysis (NA) and Social Network Analysis (SNA) Prepared for 2013 AnalyticsCamp: An Annual Unconference , Held in the Research Triangle Park, NC Area, on May 4, 2013 By Bruce Conner Consolidated Behaviors and Attitudes

Upload: bruce-conner

Post on 14-Jun-2015


DESCRIPTION

Selected highlights of Coursera Social Networking course, taught by Prof. Lada Adamic of the Univ. of Michigan. Presented at the annual RTP Analytics Unconference, May 4, 2013

TRANSCRIPT

Page 1: Preso on social network analysis for rtp analytics unconference

Consolidated Behaviors and Attitudes 1

Analyzing Networks

An Overview, and Discussion of Network Analysis (NA) and Social Network Analysis (SNA)

Prepared for 2013 AnalyticsCamp: An Annual Unconference, Held in the Research Triangle Park, NC Area, on May 4, 2013

By Bruce Conner

Consolidated Behaviors and Attitudes

Page 2: Preso on social network analysis for rtp analytics unconference

Full Disclosure

• I just finished the Social Networking course, on Coursera, taught by Lada Adamic, Assoc. Prof. of Information at the Univ. of Michigan

– All of the content of this deck is derived from that course (not original)

– For purposes of this unconference, I will not be further citing or footnoting this content

Page 3: Preso on social network analysis for rtp analytics unconference

My Interest in Social Networking Analysis (SNA)

• Interest in marketing analytics and quantitative market research

– Rise of social media and social marketing

– Big data and marketing analytics

– The strengths and weaknesses of behavioral data (Web, mobile, CRM, transactional, scanner, telemetry, etc.) in marketing applications

• A long-term interest in clustering and segmentation as tools for the identification and targeting of products, services, and messages: can social relationships and social communities enhance this?

• Marketing issues such as:

– The role of opinion leaders in influencing brand preferences and purchases of goods and services

– Diffusion of products, services, innovations, brands, preferences, etc.

– Formation of preferences for products/services/brands

– Targeted marketing to communities and individuals in those communities

Page 4: Preso on social network analysis for rtp analytics unconference

Agenda

• Brief introduction to the applications and issues that Social Networking Analysis (SNA), and more broadly Network Analysis (NA), try to deal with

• Brief overview of some methods, approaches, and statistics involved

• Possible Discussion Topics:

– Who is currently using SNA (or NA), and what are your applications?

– How (else) might SNA (or NA) be used in your work?

– Specifically, how might SNA (or NA) be used in marketing, product development, or other business applications (or other applications)?

– Other topics/questions/thoughts?

Page 5: Preso on social network analysis for rtp analytics unconference

Quick Overview of SNA Applications

Page 6: Preso on social network analysis for rtp analytics unconference

Quick Overview of Applications of SNA: Anti-Terrorism and National Security

Page 7: Preso on social network analysis for rtp analytics unconference

A Quick Overview of Applications of SNA (2)

• Anti-terrorism

• Criminal justice

– Conspiracy (e.g., Enron)

– Insider trading

– Fraud

Page 8: Preso on social network analysis for rtp analytics unconference

A Quick Overview of Applications of SNA (3)

• Anti-terrorism

• Criminal justice

• Social media

Page 9: Preso on social network analysis for rtp analytics unconference

A Quick Overview of Applications of SNA (4)

• Anti-terrorism

• Criminal justice

• Social media

• Gaming

– Game (Social) Experience

– Recruitment/virality/engagement/retention/conversion

Page 10: Preso on social network analysis for rtp analytics unconference

And Some More Applications of SNA (5)

• Organizational analysis/ communities of practice

• Marketing based on affiliation with “communities”

• Inputs to clustering/ segmentation/ profiling

• Biological networks (health care, genomics, etc.)

• Predictive analytics (e.g., predicting improvements in recipes based on ingredient networks)

• Sociology/Economics/ Political Science/etc.

• Computer networks

Page 11: Preso on social network analysis for rtp analytics unconference

Kinds of Questions SNA Addresses

Page 12: Preso on social network analysis for rtp analytics unconference

Kinds Of Questions that SNA/NA Address

• How do networks form and grow?

– Compare real-world networks (e.g., the Internet, Facebook, biological networks) with various theoretical models

• Do the theoretical models help explain the behavior and growth dynamics of the real network?

• Example: Randomly-formed network vs. “preferential attachment”

Page 13: Preso on social network analysis for rtp analytics unconference

Kinds Of Questions that SNA/NA Address (2)

• How does network structure (topology) affect the way that information disseminates -- or that infections spread???

Page 14: Preso on social network analysis for rtp analytics unconference

Kinds Of Questions that SNA/NA Address (3)

• Based on the number, strength, directionality, and/or characteristics/attributes of “links,” … and characteristics of individuals/nodes …

… how do we identify (and characterize) communities???

Page 15: Preso on social network analysis for rtp analytics unconference

Quick Look at SNA/NA Data

Page 16: Preso on social network analysis for rtp analytics unconference

What are networks?

• Networks are sets of nodes connected by edges.

“Network” ≡ “Graph”

points      lines
vertices    edges, arcs        (math)
nodes       links              (computer science)
sites       bonds              (physics)
actors      ties, relations    (sociology)

[Figure: diagram labeling a node and an edge]

Page 17: Preso on social network analysis for rtp analytics unconference

Network elements: edges

• Directed (also called arcs, links)

– A -> B

• A likes B, A gave a gift to B, A is B’s child

• Undirected

– A <-> B or A – B

• A and B like each other

• A and B are siblings

• A and B are co-authors

Page 18: Preso on social network analysis for rtp analytics unconference

Directed networks

[Figure: directed network whose nodes are labeled Ada, Cora, Louise, Jean, Helen, Martha, Alice, Robin, Marion, Maxine, Lena, Hazel, Hilda, Frances, Eva, Ruth, Edna, Adele, Jane, Anna, Mary, Betty, Ella, Ellen, Laura, Irene]

• Girls’ school dormitory dining-table partners, 1st and 2nd choices (Moreno, The sociometry reader, 1960)

Page 19: Preso on social network analysis for rtp analytics unconference

Example Adjacency Matrix

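[Figure: graph with nodes numbered 1 to 5]

$$A = \begin{pmatrix} 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 1 & 0 \\ 0 & 1 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 1 \\ 1 & 1 & 0 & 0 & 0 \end{pmatrix}$$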
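A minimal sketch of how the example matrix could be held in code and turned into an edge list. The slide does not state which way the edges point, so the row-to-column convention used here is an assumption, and `edge_list` is a hypothetical helper name.

```python
# Adjacency matrix from the slide; nodes are numbered 1..5.
# Assumption: A[i][j] == 1 means a directed edge from node i+1 to node j+1.
A = [
    [0, 0, 0, 0, 0],
    [0, 0, 1, 1, 0],
    [0, 1, 0, 1, 0],
    [0, 0, 0, 0, 1],
    [1, 1, 0, 0, 0],
]


def edge_list(matrix):
    """Return the directed edges (source, target) encoded by the matrix."""
    return [
        (i + 1, j + 1)
        for i, row in enumerate(matrix)
        for j, value in enumerate(row)
        if value
    ]


edges = edge_list(A)
print(edges)
```

Under that convention, node 1 has no outgoing edges (its row is all zeros) but receives one from node 5.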

Page 20: Preso on social network analysis for rtp analytics unconference

Graph Data: 2 Tables (Nodes and Edges)

Page 21: Preso on social network analysis for rtp analytics unconference

2 Ways that NA is Different From Conventional (Frequentist) Statistics

• Non-independence of “edge rows”:

– Example: if I am “linked” to two individuals, it often increases the probability that they are linked to each other

– Implication: one cannot necessarily use statistical tests based on statistical independence, normal distribution, etc., to understand statistical significance

• Exploration of real-world “graphs” by comparing them to various hypothetical (strawman) models

– A Monte Carlo approach:

• Generate large numbers of graphs based on hypothetical models

• Compare the various characteristics of the real-world graph to the distribution of the same characteristics across the multiple hypothetical graphs, to test the null hypothesis that the real graph is not significantly different from the hypothetical graphs
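The Monte Carlo idea can be sketched roughly as follows. This is an illustration only: the triangle count as the compared characteristic, the random-graph strawman with the same number of nodes and edges, the toy "real" graph, and all function names are assumptions, not taken from the course.

```python
import random
from itertools import combinations


def count_triangles(n, edges):
    """Count triangles in an undirected graph with nodes 0..n-1."""
    nbrs = {v: set() for v in range(n)}
    for u, v in edges:
        nbrs[u].add(v)
        nbrs[v].add(u)
    return sum(
        1
        for a, b, c in combinations(range(n), 3)
        if b in nbrs[a] and c in nbrs[a] and c in nbrs[b]
    )


def random_graph(n, m, rng):
    """Strawman model: m distinct edges drawn uniformly at random."""
    return rng.sample(list(combinations(range(n), 2)), m)


def monte_carlo_p_value(n, edges, trials=1000, seed=0):
    """Share of random graphs with at least as many triangles as observed."""
    rng = random.Random(seed)
    observed = count_triangles(n, edges)
    hits = sum(
        count_triangles(n, random_graph(n, len(edges), rng)) >= observed
        for _ in range(trials)
    )
    return observed, hits / trials


# Hypothetical "real" graph: two triangles joined by a single edge.
toy_edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
observed, p_value = monte_carlo_p_value(6, toy_edges)
print(observed, p_value)
```

A small share would suggest that the observed triangle count is unlikely under the strawman model, without relying on independence or normality assumptions.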

Page 22: Preso on social network analysis for rtp analytics unconference

A Brief Look at Two Topologies

Page 23: Preso on social network analysis for rtp analytics unconference

Erdős-Rényi Random Graph: Simplest Network Model

• Assumptions

– Nodes connect at random

– Network is undirected

• Key parameters

– Number of nodes N

– Either “p” or “M”

• p = probability that any two nodes share an edge

• M = total number of edges in the graph
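As a rough illustration of the two parameterizations, here is a small sketch that generates an undirected random graph either from N and p or from N and M. The function names `er_graph_p` and `er_graph_m` are hypothetical; nodes are simply numbered 0..N-1.

```python
import random
from itertools import combinations


def er_graph_p(n, p, seed=None):
    """Include each of the n*(n-1)/2 possible edges independently with probability p."""
    rng = random.Random(seed)
    return [pair for pair in combinations(range(n), 2) if rng.random() < p]


def er_graph_m(n, m, seed=None):
    """Pick exactly m distinct edges uniformly at random."""
    rng = random.Random(seed)
    return rng.sample(list(combinations(range(n), 2)), m)


print(len(er_graph_p(100, 0.05, seed=1)))  # varies around 0.05 * 4950
print(len(er_graph_m(100, 250, seed=1)))   # always 250
```

The "p" version fixes only the expected number of edges, while the "M" version fixes the count exactly.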

Page 24: Preso on social network analysis for rtp analytics unconference

What ER Random Networks Look Like

after spring layout

Page 25: Preso on social network analysis for rtp analytics unconference

Preferential Attachment Networks

• Preferential attachment of growing networks:

– New nodes prefer to attach to well-connected nodes over less-well connected nodes

• Process also known as

– Cumulative advantage

– Rich-get-richer

– Matthew effect
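A minimal sketch of a growing network with preferential attachment. It assumes the simplest possible variant (each new node adds exactly one edge, and the target is chosen with probability proportional to its current degree); the function name and the starting two-node graph are assumptions for illustration.

```python
import random
from collections import Counter


def preferential_attachment(n, seed=None):
    """Grow a network node by node; each newcomer links to one existing node,
    chosen with probability proportional to that node's current degree."""
    rng = random.Random(seed)
    edges = [(0, 1)]
    # Each node appears in this list once per edge end, so a uniform draw
    # from the list is a degree-proportional draw over nodes.
    endpoints = [0, 1]
    for new_node in range(2, n):
        target = rng.choice(endpoints)
        edges.append((new_node, target))
        endpoints.extend([new_node, target])
    return edges


edges = preferential_attachment(1000, seed=3)
degree = Counter(v for edge in edges for v in edge)
print(degree.most_common(5))  # a few early nodes tend to collect many links
```

Printing the most-connected nodes usually shows the "rich-get-richer" pattern: a handful of hubs and many nodes with a single link.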

Page 26: Preso on social network analysis for rtp analytics unconference

Preferential Growth

Page 27: Preso on social network analysis for rtp analytics unconference

A Sample of Network Statistics

Page 28: Preso on social network analysis for rtp analytics unconference

Node Statistics

• Node network properties

– From immediate connections

• indegree: how many directed edges (arcs) are incident on a node

• outdegree: how many directed edges (arcs) originate at a node

• degree (in or out): number of edges incident on a node

– From the entire graph

• Centrality (betweenness, closeness)

[Figure: example node with outdegree=2, indegree=3, degree=5]
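The degree statistics can be computed directly from a directed edge list. In this sketch the edge list is hypothetical, arranged so that node "x" reproduces the figure's values; `degree_table` is an illustrative name.

```python
from collections import Counter


def degree_table(edges):
    """Return indegree, outdegree, and total degree for every node."""
    outdeg = Counter(source for source, _ in edges)
    indeg = Counter(target for _, target in edges)
    nodes = sorted(set(outdeg) | set(indeg))
    return {
        v: {"in": indeg[v], "out": outdeg[v], "degree": indeg[v] + outdeg[v]}
        for v in nodes
    }


# Hypothetical edges: three arcs arrive at "x" and two leave it.
edges = [("a", "x"), ("b", "x"), ("c", "x"), ("x", "d"), ("x", "e")]
table = degree_table(edges)
print(table["x"])
```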

Page 29: Preso on social network analysis for rtp analytics unconference

Giant Component

• If the largest component encompasses a significant fraction of the graph, it is called the giant component

Page 30: Preso on social network analysis for rtp analytics unconference

“Percolation Threshold”

[Figure: size of giant component plotted against average degree, with example graphs at av deg = 0.99, av deg = 1.18, and av deg = 3.96]

Percolation threshold: how many edges need to be added before the giant component appears?

As the average degree increases to z = 1, a giant component suddenly appears
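A rough simulation of the threshold, assuming a random graph in which edges are dropped between uniformly chosen node pairs (repeated pairs and self-loops are tolerated as a simplification). The function name and the chosen values of the average degree are illustrative.

```python
import random
from collections import Counter


def giant_component_fraction(n, avg_degree, seed=0):
    """Fraction of nodes in the largest component of a random graph."""
    rng = random.Random(seed)
    parent = list(range(n))

    def find(v):
        # Union-find lookup with path halving.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    # Average degree z = 2M / N, so M = z * N / 2 edges are added.
    for _ in range(int(avg_degree * n / 2)):
        u, v = rng.randrange(n), rng.randrange(n)
        parent[find(u)] = find(v)

    sizes = Counter(find(v) for v in range(n))
    return max(sizes.values()) / n


for z in (0.5, 0.99, 1.18, 2.0, 3.96):
    print(z, round(giant_component_fraction(5000, z), 3))
```

Below an average degree of 1 the largest component stays a tiny share of the graph; above it, the share grows quickly.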

Page 31: Preso on social network analysis for rtp analytics unconference

Shortest Path – And Average Shortest Path

• How many hops between two nodes?

• On average, how many hops between each pair of nodes?
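Both questions can be answered with a breadth-first search. The sketch below assumes an undirected, unweighted, connected graph stored as an adjacency dictionary; the four-node path graph is a hypothetical example.

```python
from collections import deque


def hops_from(adj, source):
    """Breadth-first search: number of hops from source to each reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def average_shortest_path(adj):
    """Mean hop count over all pairs of distinct, mutually reachable nodes."""
    total = pairs = 0
    for u in adj:
        for v, d in hops_from(adj, u).items():
            if v != u:
                total += d
                pairs += 1
    return total / pairs


# Hypothetical path graph: a - b - c - d
adj = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
print(hops_from(adj, "a")["d"], average_shortest_path(adj))
```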

Page 32: Preso on social network analysis for rtp analytics unconference

Centrality

Page 33: Preso on social network analysis for rtp analytics unconference

Betweenness: Example

[Figure: example network. Nodes are sized by degree, and colored by betweenness.]

Page 34: Preso on social network analysis for rtp analytics unconference

Closeness Example

[Figure: example networks with nodes labeled X and Y]

Page 35: Preso on social network analysis for rtp analytics unconference

Example of Eigenvector Centrality (a Recursive Measure) in Directed Networks

• PageRank brings order to the Web:

– it's not just the pages that point to you, but how many pages point to those pages, etc.

– more difficult to artificially inflate centrality with a recursive definition
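A minimal sketch of the recursive idea using power iteration. The damping factor of 0.85, the handling of pages without outgoing links, and the four-page link structure are assumptions made for illustration and are not taken from the slides.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Power iteration: a page's score depends on the scores of pages linking to it."""
    nodes = list(links)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        new_rank = {v: (1.0 - damping) / n for v in nodes}
        for v in nodes:
            # A page with no outgoing links spreads its score over all pages.
            targets = links[v] or nodes
            share = damping * rank[v] / len(targets)
            for t in targets:
                new_rank[t] += share
        rank = new_rank
    return rank


# Hypothetical web: a, b, and d all point to c; c points to d.
links = {"a": ["c"], "b": ["c"], "c": ["d"], "d": ["c"]}
ranks = pagerank(links)
print(sorted(ranks.items(), key=lambda item: -item[1]))
```

In this toy example "d" scores far above "a" and "b" even though it has only one inbound link, because that link comes from the highly ranked page "c".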

Page 36: Preso on social network analysis for rtp analytics unconference

Degree Distributions: An Example – With a Log-Log Distribution

• Sexual networks: great variation in contact numbers

Page 37: Preso on social network analysis for rtp analytics unconference

Small World Networks

Page 38: Preso on social network analysis for rtp analytics unconference

Small world phenomenon: Milgram’s experiment

[Figure: map with NE (Nebraska) and MA (Massachusetts) marked]

Page 39: Preso on social network analysis for rtp analytics unconference

Ties and Geography

“The geographic movement of the [message] from Nebraska to Massachusetts is striking. There is a progressive closing in on the target area as each new person is added to the chain”

S. Milgram, ‘The small world problem’, Psychology Today, 1967

[Figure: map with NE and MA marked]

Page 40: Preso on social network analysis for rtp analytics unconference

Kleinberg’s geographical small world model

nodes are placed on a lattice and connect to nearest neighbors

additional links placed with:

$p(\text{link between } u \text{ and } v) \propto (\text{distance}(u,v))^{-r}$

If you set r = 2, you get optimum ability to get between nodes with minimal jumps!
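To make the link rule concrete, here is a small sketch that computes the long-range link probabilities for one node on a square lattice. It reads the rule as a proportionality and normalizes the weights so they sum to 1, and it measures distance in lattice steps; both choices, and the function name, are assumptions for illustration.

```python
def link_probabilities(u, size, r):
    """Probability of a long-range link from u to every other node on a
    size x size lattice, with weight distance(u, v) ** -r."""
    weights = {}
    for x in range(size):
        for y in range(size):
            if (x, y) != u:
                # Lattice (Manhattan) distance between u and (x, y).
                distance = abs(x - u[0]) + abs(y - u[1])
                weights[(x, y)] = distance ** -r
    total = sum(weights.values())
    return {v: w / total for v, w in weights.items()}


probs = link_probabilities((2, 2), size=5, r=2)
print(probs[(2, 3)], probs[(2, 4)])
```

With r = 2, a node one step away is four times as likely to be chosen as a node two steps away; with r = 0 every node would be equally likely.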

Page 41: Preso on social network analysis for rtp analytics unconference

Communities

Page 42: Preso on social network analysis for rtp analytics unconference

Why Care About Communities?

• Opinion formation and uniformity

If each node adopts the opinion of the majority of its neighbors, it is possible to have different opinions in different cohesive subgroups
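The majority rule can be sketched on a toy network. The example below is hypothetical: two tightly knit groups of four joined by a single tie, opinions coded as +1 and -1, all nodes updating at once, and ties in the vote leaving a node's opinion unchanged.

```python
from itertools import combinations


def two_cliques():
    """Two fully connected groups (0-3 and 4-7) joined by the single tie 3-4."""
    adj = {v: set() for v in range(8)}
    pairs = list(combinations(range(4), 2))
    pairs += list(combinations(range(4, 8), 2))
    pairs.append((3, 4))
    for u, v in pairs:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def majority_step(adj, opinion):
    """Every node adopts the majority opinion of its neighbors (ties: no change)."""
    updated = {}
    for v, nbrs in adj.items():
        votes = sum(opinion[u] for u in nbrs)
        updated[v] = opinion[v] if votes == 0 else (1 if votes > 0 else -1)
    return updated


adj = two_cliques()
opinion = {v: (1 if v < 4 else -1) for v in adj}
print(majority_step(adj, opinion) == opinion)
```

Each group keeps its own opinion, because every node's neighbors mostly sit inside its own group; the single tie between groups is not enough to change anyone's mind.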

Page 43: Preso on social network analysis for rtp analytics unconference

Political Blogs

Page 44: Preso on social network analysis for rtp analytics unconference

Community Finding

• Social and other networks have a natural community structure

• We want to discover this structure rather than impose a certain size of community or fix the number of communities

• Without “looking”, can we discover community structure in an automated way?

Page 45: Preso on social network analysis for rtp analytics unconference

Hierarchical clustering

• Process:

– after calculating the “distances” for all pairs of vertices

– start with all n vertices disconnected

– add edges between pairs one by one in order of decreasing weight

– result: nested components, where one can take a ‘slice’ at any level of the tree
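The process above can be sketched with a union-find structure. Taking connected components after adding pairs strongest-first amounts to single-linkage clustering. The weights here are hypothetical similarity scores, and "slicing" is expressed as stopping once k components remain, which is one possible reading of taking a slice of the tree.

```python
def hierarchical_clusters(nodes, weights, k):
    """Join pairs in order of decreasing weight until k components remain."""
    parent = {v: v for v in nodes}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    components = len(nodes)
    ranked = sorted(weights.items(), key=lambda item: item[1], reverse=True)
    for (u, v), _ in ranked:
        if components == k:
            break  # this is the "slice" through the tree
        root_u, root_v = find(u), find(v)
        if root_u != root_v:
            parent[root_u] = root_v
            components -= 1

    clusters = {}
    for v in nodes:
        clusters.setdefault(find(v), []).append(v)
    return sorted(sorted(group) for group in clusters.values())


# Hypothetical similarity weights between four vertices.
nodes = ["a", "b", "c", "d"]
weights = {("a", "b"): 0.9, ("c", "d"): 0.8, ("b", "c"): 0.1}
print(hierarchical_clusters(nodes, weights, k=2))
```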

Page 46: Preso on social network analysis for rtp analytics unconference

Permuted Adjacency Matrix

Page 47: Preso on social network analysis for rtp analytics unconference

Betweenness Clustering

• Successively removing edges of highest betweenness (the bridges, or local bridges) breaks up the network into separate components
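A compact sketch of one splitting step, assuming an undirected, unweighted graph stored as a dictionary of neighbor sets. Edge betweenness is computed with a breadth-first, Brandes-style accumulation; the function names and the toy graph (two triangles joined by one bridge) are illustrative assumptions.

```python
from collections import deque


def edge_betweenness(adj):
    """Number of shortest paths running over each edge (fractional on ties)."""
    score = {frozenset((u, v)): 0.0 for u in adj for v in adj[u]}
    for s in adj:
        dist, order = {s: 0}, []
        sigma = {v: 0.0 for v in adj}  # shortest-path counts from s
        sigma[s] = 1.0
        preds = {v: [] for v in adj}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):  # farthest nodes first
            for v in preds[w]:
                credit = sigma[v] / sigma[w] * (1.0 + delta[w])
                score[frozenset((v, w))] += credit
                delta[v] += credit
    # Each pair of endpoints was counted from both sides.
    return {edge: value / 2.0 for edge, value in score.items()}


def components(adj):
    """Connected components as a list of node sets."""
    seen, parts = set(), []
    for start in adj:
        if start in seen:
            continue
        part, stack = set(), [start]
        while stack:
            v = stack.pop()
            if v not in part:
                part.add(v)
                stack.extend(adj[v] - part)
        seen |= part
        parts.append(part)
    return parts


def split_once(adj):
    """Remove highest-betweenness edges until one more component appears."""
    adj = {v: set(nbrs) for v, nbrs in adj.items()}
    start = len(components(adj))
    while len(components(adj)) == start:
        scores = edge_betweenness(adj)
        if not scores:
            break  # no edges left to remove
        u, v = max(scores, key=scores.get)
        adj[u].discard(v)
        adj[v].discard(u)
    return components(adj)


# Hypothetical graph: triangles 0-1-2 and 3-4-5 joined by the bridge 2-3.
graph = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
parts = split_once(graph)
print(parts)
```

The bridge carries every shortest path between the two triangles, so it is removed first and the graph falls into the two groups.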

Page 48: Preso on social network analysis for rtp analytics unconference

Modularity

• Algorithm

– Start with all vertices as isolates

– Follow a greedy strategy:

• successively join clusters with the greatest increase ΔQ in modularity

• stop when the maximum possible ΔQ <= 0 from joining any two

– Successfully used to find community structure in a graph with > 400,000 nodes with > 2 million edges

• Amazon’s people who bought this also bought that…

– Alternatives to achieving optimum ΔQ:

• simulated annealing rather than greedy search
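A naive sketch of the greedy strategy, written for clarity rather than for the graph sizes mentioned above. It assumes the usual definition of modularity for an undirected graph, Q = sum over communities of (share of edges inside the community) minus (share of edge ends attached to the community) squared; the function names and the toy graph are illustrative.

```python
from itertools import combinations


def modularity(edges, communities):
    """Q = sum over communities of inside_edges/m - (degree_sum / 2m) ** 2."""
    m = len(edges)
    label = {v: i for i, group in enumerate(communities) for v in group}
    inside = [0] * len(communities)
    degree = [0] * len(communities)
    for u, v in edges:
        degree[label[u]] += 1
        degree[label[v]] += 1
        if label[u] == label[v]:
            inside[label[u]] += 1
    return sum(
        inside[i] / m - (degree[i] / (2 * m)) ** 2
        for i in range(len(communities))
    )


def greedy_modularity(nodes, edges):
    """Start from isolates; repeatedly apply the merge with the largest gain in Q."""
    communities = [frozenset([v]) for v in nodes]
    current = modularity(edges, communities)
    while len(communities) > 1:
        best_gain, best = 0.0, None
        for a, b in combinations(communities, 2):
            merged = [c for c in communities if c not in (a, b)] + [a | b]
            gain = modularity(edges, merged) - current
            if gain > best_gain:
                best_gain, best = gain, merged
        if best is None:  # no merge increases Q, so stop
            break
        communities, current = best, current + best_gain
    return communities, current


# Hypothetical graph: two triangles joined by one edge.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
communities, q = greedy_modularity(range(6), edges)
print(sorted(sorted(c) for c in communities), round(q, 4))
```

On this toy graph the procedure stops with the two triangles as communities, since merging them would lower Q.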

Page 49: Preso on social network analysis for rtp analytics unconference

Some Interesting Applications of NA

Page 50: Preso on social network analysis for rtp analytics unconference

Page 51: Preso on social network analysis for rtp analytics unconference

Page 52: Preso on social network analysis for rtp analytics unconference

Ingredient Networks

Page 53: Preso on social network analysis for rtp analytics unconference
