Presentation on Social Network Analysis for the RTP Analytics Unconference
DESCRIPTION
Selected highlights of the Coursera Social Networking course, taught by Prof. Lada Adamic of the Univ. of Michigan. Presented at the annual RTP Analytics Unconference, May 4, 2013
TRANSCRIPT
Consolidated Behaviors and Attitudes 1
Analyzing Networks
An Overview, and Discussion of Network Analysis (NA) and Social Network Analysis (SNA)
Prepared for 2013 AnalyticsCamp: An Annual Unconference, Held in the Research Triangle Park, NC Area, on May 4, 2013
By Bruce Conner, Consolidated Behaviors and Attitudes
Full Disclosure
• I just finished the Social Networking course, on Coursera, taught by Lada Adamic, Assoc. Prof. of Information at the Univ. of Michigan
– All of the content of this deck is derived from that course (not original)
– For purposes of this unconference, I will not be further citing or footnoting this content
My Interest in Social Networking Analysis (SNA)
• Interest in marketing analytics and quantitative market research
– Rise of social media and social marketing
– Big data and marketing analytics
– The strengths and weaknesses of behavioral data (Web, mobile, CRM, transactional, scanner, telemetry, etc.) in marketing applications
• A long-term interest in clustering and segmentation as tools for identifying and targeting products, services, and messages: can social relationships and social communities enhance this?
• Marketing issues such as:
– The role of opinion leaders in influencing brand preferences and purchases of goods and services
– Diffusion of products, services, innovations, brands, preferences, etc.
– Formation of preferences for products/services/brands
– Targeted marketing to communities and individuals in those communities
Agenda
• Brief introduction to the applications and issues that Social Networking Analysis (SNA), and more broadly Network Analysis (NA), try to deal with
• Brief overview of some methods, approaches, and statistics involved
• Possible Discussion Topics:
– Who is currently using SNA (or NA), and what are your applications?
– How (else) might SNA (or NA) be used in your work?
– Specifically, how might SNA (or NA) be used in marketing, product development, or other business applications (or other applications)?
– Other topics/questions/thoughts?
Quick Overview of SNA Applications
Quick Overview of Applications of SNA: Anti-Terrorism and National Security
A Quick Overview of Applications of SNA (2)
• Anti-terrorism
• Criminal justice
– Conspiracy (e.g., Enron)
– Insider trading
– Fraud
A Quick Overview of Applications of SNA (3)
• Anti-terrorism
• Criminal justice
• Social media
A Quick Overview of Applications of SNA (4)
• Anti-terrorism
• Criminal justice
• Social media
• Gaming
– Game (Social) Experience
– Recruitment/virality/engagement/retention/conversion
And Some More Applications of SNA (5)
• Organizational analysis/communities of practice
• Marketing based on affiliation with “communities”
• Inputs to clustering/segmentation/profiling
• Biological networks (health care, genomics, etc.)
• Predictive analytics (e.g., predicting improvements in recipes based on ingredient networks)
• Sociology/Economics/Political Science/etc.
• Computer networks
Kinds of Questions SNA Addresses
Kinds Of Questions that SNA/NA Address
• How do networks form and grow?
– Compare real-world networks (e.g., the Internet, Facebook, biological networks) with various theoretical models
• Do the theoretical models help explain the behavior and growth dynamics of the real network?
• Example: Randomly-formed network vs. “preferential attachment”
Kinds Of Questions that SNA/NA Address (2)
• How does network structure (topology) affect the way that information disseminates -- or that infections spread???
Kinds Of Questions that SNA/NA Address (3)
• Based on the number, strength, directionality, and/or characteristics/attributes of “links,” … and characteristics of individuals/nodes …
… how do we identify (and characterize) communities???
Quick Look at SNA/NA Data
What are networks?
• Networks are sets of nodes connected by edges.
• “Network” ≡ “Graph”
Nodes      Edges             Field
points     lines
vertices   edges, arcs       math
nodes      links             computer science
sites      bonds             physics
actors     ties, relations   sociology
[Figure: diagram labeling a node and an edge]
Network elements: edges
• Directed (also called arcs, links)
– A -> B
• A likes B, A gave a gift to B, A is B’s child
• Undirected
– A <-> B or A – B
• A and B like each other
• A and B are siblings
• A and B are co-authors
Directed networks
[Figure: directed graph with nodes Ada, Cora, Louise, Jean, Helen, Martha, Alice, Robin, Marion, Maxine, Lena, Hazel, Hilda, Frances, Eva, Ruth, Edna, Adele, Jane, Anna, Mary, Betty, Ella, Ellen, Laura, Irene]
• Girls’ school dormitory dining-table partners, 1st and 2nd choices (Moreno, The sociometry reader, 1960)
Example Adjacency Matrix
[Figure: graph with nodes labeled 1 to 5]
A =
0 0 0 0 0
0 0 1 1 0
0 1 0 1 0
0 0 0 0 1
1 1 0 0 0
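A minimal Python sketch of how the example adjacency matrix can be turned into a directed edge list, the form usually stored in an "edges" table. It assumes the convention that A[i][j] = 1 means an edge from node i to node j; the slide does not say which convention it uses, and the variable names are illustrative only.

```python
# Adjacency matrix from the slide; rows and columns are nodes 1..5.
A = [
    [0, 0, 0, 0, 0],
    [0, 0, 1, 1, 0],
    [0, 1, 0, 1, 0],
    [0, 0, 0, 0, 1],
    [1, 1, 0, 0, 0],
]

# Assumed convention: A[i][j] == 1 means a directed edge
# from node i+1 to node j+1.
edges = [(i + 1, j + 1)
         for i, row in enumerate(A)
         for j, value in enumerate(row)
         if value == 1]

print(edges)
```

Under that assumption the matrix is not symmetric (for example, 4 points to 5 but 5 does not point back to 4), which is why it describes a directed graph.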
Graph Data: 2 Tables (Nodes and Edges)
2 Ways that NA is Different From Conventional (Frequentist) Statistics
• Non-independence of “edge rows”:
– Example: if I am “linked” to two individuals, it often increases the probability that they are linked to each other
– Implication: one cannot necessarily use statistical tests based on statistical independence, normal distribution, etc., to understand statistical significance
• Exploration of real-world “graphs” by comparing them to various hypothetical (strawman) models
– A Monte Carlo approach:
• Generate large numbers of graphs based on hypothetical models
• Compare the various characteristics of the real-world graph to the distribution of the same characteristics across the hypothetical graphs, to test whether the real graph is significantly different from the hypothetical graphs (the null hypothesis being that it is not)
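The Monte Carlo idea can be sketched in a few lines of Python. Everything specific here is an assumption made for illustration: the "observed" graph is made up, the characteristic compared is the number of triangles, and the strawman model is a random graph with the same number of nodes and edges.

```python
import random
from itertools import combinations

def count_triangles(n, edges):
    """Count triangles in an undirected graph with nodes 0..n-1."""
    neighbors = {v: set() for v in range(n)}
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)
    return sum(1 for a, b, c in combinations(range(n), 3)
               if b in neighbors[a] and c in neighbors[a] and c in neighbors[b])

def random_graph(n, m, rng):
    """Strawman model: m edges chosen uniformly among all node pairs."""
    return rng.sample(list(combinations(range(n), 2)), m)

# Hypothetical "observed" graph: two triangles joined by one edge.
n = 6
observed = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
obs_stat = count_triangles(n, observed)

rng = random.Random(0)
null_stats = [count_triangles(n, random_graph(n, len(observed), rng))
              for _ in range(2000)]

# Empirical p-value: share of random graphs with at least as many triangles.
p_value = sum(s >= obs_stat for s in null_stats) / len(null_stats)
print(obs_stat, p_value)
```

A small p-value would suggest the observed graph has more triangles than the strawman model tends to produce; with a graph this tiny the comparison is only meant to show the mechanics.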
A Brief Look at Two Topologies
Erdős–Rényi Random Graph: Simplest Network Model
• Assumptions
– Nodes connect at random
– Network is undirected
• Key parameters
– Number of nodes N
– Either “p” or “M”
• p = probability that any two nodes share an edge
• M = total number of edges in the graph
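The two parameterizations can be sketched with the Python standard library alone. The function names and the example values of N, p, and M below are illustrative assumptions, not taken from the course.

```python
import random
from itertools import combinations

def er_graph_p(n, p, rng):
    """G(N, p): each of the N*(N-1)/2 node pairs gets an edge with probability p."""
    return [pair for pair in combinations(range(n), 2) if rng.random() < p]

def er_graph_m(n, m, rng):
    """G(N, M): exactly M edges, drawn uniformly without replacement."""
    return rng.sample(list(combinations(range(n), 2)), m)

rng = random.Random(42)
g_p = er_graph_p(100, 0.05, rng)   # edge count varies from run to run
g_m = er_graph_m(100, 250, rng)    # edge count is always exactly 250
print(len(g_p), len(g_m))
```

With the "p" version the number of edges is random (about p × N(N−1)/2 on average); with the "M" version it is fixed and only the placement of the edges is random.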
What ER Random Networks Look Like
after spring layout
Preferential Attachment Networks
• Preferential attachment of growing networks:
– New nodes prefer to attach to well-connected nodes over less-well-connected nodes
• Process also known as
– Cumulative advantage
– Rich-get-richer
– Matthew effect
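A minimal sketch of preferential growth in Python. As a simplifying assumption, each new node adds exactly one edge, and the chance of attaching to an existing node is proportional to that node's current degree; the function name and network size are illustrative.

```python
import random
from collections import Counter

def preferential_attachment(n, rng):
    """Grow a network to n nodes; each new node adds one edge to an existing
    node chosen with probability proportional to that node's degree."""
    edges = [(0, 1)]
    # Each node appears in this list once per edge end, so a uniform draw
    # from it picks a node with probability proportional to its degree.
    endpoints = [0, 1]
    for new in range(2, n):
        target = rng.choice(endpoints)
        edges.append((new, target))
        endpoints.extend([new, target])
    return edges

edges = preferential_attachment(1000, random.Random(1))
degree = Counter(v for edge in edges for v in edge)
print(degree.most_common(5))
```

Printing the highest-degree nodes is a quick way to look for the rich-get-richer pattern, where a handful of nodes collect many more links than the rest.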
Preferential Growth
A Sample of Network Statistics
Node Statistics
• Node network properties
– From immediate connections
• indegree: how many directed edges (arcs) are incident on a node
• outdegree: how many directed edges (arcs) originate at a node
• degree (in or out): number of edges incident on a node
– From the entire graph
• Centrality (betweenness, closeness)
[Figure: example node with outdegree=2, indegree=3, degree=5]
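The degree statistics can be computed directly from an edge table. This short Python sketch uses a made-up directed edge list, chosen so that node "X" has the same counts as the example figure (indegree 3, outdegree 2, degree 5).

```python
from collections import Counter

# Hypothetical directed edge list: (source, target).
edges = [("A", "X"), ("B", "X"), ("C", "X"), ("X", "D"), ("X", "E")]

outdegree = Counter(src for src, _ in edges)   # edges that originate at a node
indegree = Counter(dst for _, dst in edges)    # edges that arrive at a node

node = "X"
print(indegree[node], outdegree[node], indegree[node] + outdegree[node])
```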
Giant Component
• If the largest component encompasses a significant fraction of the graph, it is called the giant component
“Percolation Threshold”
[Figure: size of giant component (y-axis) vs. average degree (x-axis), with example graphs at av deg = 0.99, av deg = 1.18, and av deg = 3.96]
Percolation threshold: how many edges need to be added before the giant component appears?
As the average degree increases to z = 1, a giant component suddenly appears
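A rough simulation of the threshold, sketched in Python under some simplifying assumptions: edges are dropped at random between distinct nodes (the same pair may occasionally be drawn twice), components are tracked with a simple union-find, and the network size of 5,000 is arbitrary. The average degrees tried include the three shown on the slide.

```python
import random
from collections import Counter

def largest_component_fraction(n, avg_degree, rng):
    """Fraction of nodes in the largest component of a random graph
    with n nodes and about avg_degree * n / 2 random edges."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    m = int(avg_degree * n / 2)
    for _ in range(m):
        u, v = rng.sample(range(n), 2)  # two distinct endpoints
        parent[find(u)] = find(v)       # merge their components
    sizes = Counter(find(x) for x in range(n))
    return max(sizes.values()) / n

rng = random.Random(0)
for z in (0.5, 0.99, 1.18, 3.96):
    print(z, round(largest_component_fraction(5000, z, rng), 3))
```

Well below z = 1 the largest component should hold only a tiny share of the nodes, while well above it the largest component should cover most of the graph.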
Shortest Path – And Average Shortest Path
• How many hops between two nodes?
• On average, how many hops between each pair of nodes?
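Both questions can be answered with a breadth-first search. The Python sketch below uses a small made-up undirected graph and assumes the graph is connected, so every pair of nodes has a path.

```python
from collections import deque
from itertools import combinations

def hops_from(graph, source):
    """Breadth-first search: number of hops from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist

# Hypothetical undirected graph: a path A - B - C - D plus the edge B - D.
graph = {"A": ["B"], "B": ["A", "C", "D"], "C": ["B", "D"], "D": ["B", "C"]}

pairs = list(combinations(graph, 2))
total = sum(hops_from(graph, u)[v] for u, v in pairs)
print(hops_from(graph, "A")["D"], total / len(pairs))
```

Here A reaches D in 2 hops (via B), and the six pairs of nodes are 8 hops apart in total, for an average shortest path of about 1.33.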
Centrality
Betweenness: Example
[Figure: nodes are sized by degree and colored by betweenness]
Closeness Example
[Figure: example networks with nodes labeled X and Y]
Example of Eigenvector Centrality (a Recursive Measure) in Directed Networks
• PageRank brings order to the Web:
– it's not just the pages that point to you, but how many pages point to those pages, etc.
– more difficult to artificially inflate centrality with a recursive definition
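The recursive idea can be sketched as a simple repeated-update loop in Python. This is an illustrative simplification, not Google's production algorithm: the damping factor of 0.85 is the conventionally quoted value and is an assumption here, as are the four made-up pages.

```python
def pagerank(links, damping=0.85, iterations=50):
    """links maps each page to the list of pages it points to."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page, targets in links.items():
            if targets:
                # A page splits its own rank evenly among the pages it links to.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # A page with no outgoing links spreads its rank evenly.
                for t in pages:
                    new_rank[t] += damping * rank[page] / n
        rank = new_rank
    return rank

# Hypothetical four-page web: A, B, and D all point to C; C points back to A.
links = {"A": ["C"], "B": ["C"], "C": ["A"], "D": ["C"]}
rank = pagerank(links)
print(max(rank, key=rank.get))
```

In this toy web, C ranks highest because three pages point to it. A ranks above B and D even though it has only one incoming link, because that link comes from the highly ranked C.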
Degree Distributions: An Example – With a Log-Log Distribution
• Sexual networks: great variation in contact numbers
Small World Networks
Small world phenomenon: Milgram’s experiment
[Figure: NE (Nebraska) to MA (Massachusetts)]
Ties and Geography
“The geographic movement of the [message] from Nebraska to Massachusetts is striking. There is a progressive closing in on the target area as each new person is added to the chain”
S. Milgram, ‘The small world problem’, Psychology Today, 1967
Kleinberg’s geographical small world model
nodes are placed on a lattice and connect to nearest neighbors
additional links placed with:
p(link between u and v) = (distance(u,v))^(-r)
If you set r = 2, you get optimum ability to get between nodes with minimal jumps!
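A small Python sketch of how one additional (long-range) link might be drawn under this rule. Several details are assumptions for illustration: the lattice is a 21 × 21 grid, distance is counted in grid steps, and the weights distance^(-r) are normalized by the sampler so they act as probabilities.

```python
import random

def long_range_contact(u, size, r, rng):
    """Pick one long-range contact for node u on a size x size lattice,
    with probability proportional to distance(u, v) ** -r."""
    candidates = [(x, y) for x in range(size) for y in range(size) if (x, y) != u]
    # Assumed distance: number of grid steps between u and each candidate.
    weights = [(abs(x - u[0]) + abs(y - u[1])) ** -r for x, y in candidates]
    return rng.choices(candidates, weights=weights, k=1)[0]

rng = random.Random(0)
u = (10, 10)
contacts = [long_range_contact(u, 21, 2, rng) for _ in range(1000)]
mean_distance = sum(abs(x - 10) + abs(y - 10) for x, y in contacts) / len(contacts)
print(mean_distance)
```

Larger values of r pull the extra links closer to home, while r = 0 would scatter them uniformly across the lattice; comparing the printed mean distance for different r shows the effect.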
Communities
Why Care About Communities?
• Opinion formation and uniformity
If each node adopts the opinion of the majority of its neighbors, it is possible to have different opinions in different cohesive subgroups
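The majority rule can be simulated in a few lines of Python. The network and starting opinions below are made up: two tightly knit groups of four, joined by a single edge, each starting with one dissenter. Keeping the current opinion on a tie is an added assumption.

```python
# Hypothetical network: two cliques of four nodes joined by the edge 3 - 4.
groups = [range(0, 4), range(4, 8)]
neighbors = {v: set() for v in range(8)}
for group in groups:
    for u in group:
        for v in group:
            if u != v:
                neighbors[u].add(v)
neighbors[3].add(4)
neighbors[4].add(3)

# Start with one dissenter in each clique.
opinion = {0: "red", 1: "red", 2: "red", 3: "blue",
           4: "red", 5: "blue", 6: "blue", 7: "blue"}

for _ in range(10):  # all nodes update together until nothing changes
    updated = {}
    for v in neighbors:
        votes = [opinion[u] for u in neighbors[v]]
        red, blue = votes.count("red"), votes.count("blue")
        # Adopt the neighbors' majority opinion; keep the current one on a tie.
        updated[v] = "red" if red > blue else "blue" if blue > red else opinion[v]
    if updated == opinion:
        break
    opinion = updated

print(opinion)
```

The dissenters fall in line with their own group, and the two groups settle on different opinions despite the link between them.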
Political Blogs
Community Finding
• Social and other networks have a natural community structure
• We want to discover this structure rather than impose a certain size of community or fix the number of communities
• Without “looking”, can we discover community structure in an automated way?
Hierarchical clustering
• Process:
– after calculating the “distances” for all pairs of vertices
– start with all n vertices disconnected
– add edges between pairs one by one in order of decreasing weight
– result: nested components, where one can take a ‘slice’ at any level of the tree
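The process can be sketched in Python as follows. The pair weights are invented for illustration and treated as similarities (a higher weight means the pair is joined earlier), and a "slice" is taken by stopping once the weights fall below a chosen threshold.

```python
def components_at(threshold, nodes, weights):
    """Add pairs in order of decreasing weight, stopping below threshold,
    and return the resulting connected components."""
    parent = {v: v for v in nodes}  # start with all vertices disconnected

    def find(x):
        while parent[x] != x:
            x = parent[x]
        return x

    for (u, v), w in sorted(weights.items(), key=lambda item: -item[1]):
        if w < threshold:
            break
        parent[find(u)] = find(v)  # joining u and v merges their components

    groups = {}
    for v in nodes:
        groups.setdefault(find(v), []).append(v)
    return sorted(groups.values())

# Hypothetical similarity weights between pairs of vertices.
nodes = ["a", "b", "c", "d", "e"]
weights = {("a", "b"): 0.9, ("b", "c"): 0.8, ("d", "e"): 0.7,
           ("c", "d"): 0.3, ("a", "e"): 0.1}

for threshold in (0.85, 0.5, 0.2):
    print(threshold, components_at(threshold, nodes, weights))
```

Lowering the threshold gives coarser groupings that contain the finer ones: first only a and b are joined, then {a, b, c} and {d, e} form, and finally everything merges into one component.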
Permuted Adjacency Matrix
Betweenness Clustering
• Successively removing edges of highest betweenness (the bridges, or local bridges) breaks up the network into separate components
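A compact Python sketch of this edge-removal idea (commonly known as the Girvan–Newman approach). The graph is made up, and the loop stops as soon as the network first splits in two, which is a simplification; edge betweenness is counted from both ends of every shortest path, which doubles all scores but does not change which edge ranks highest.

```python
from collections import deque

def edge_betweenness(graph):
    """Shortest-path betweenness of every edge in an undirected graph."""
    score = {}
    for source in graph:
        dist, paths = {source: 0}, {source: 1}
        order, queue = [], deque([source])
        while queue:  # breadth-first search, counting shortest paths
            v = queue.popleft()
            order.append(v)
            for w in graph[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    paths[w] = 0
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    paths[w] += paths[v]
        credit = {v: 0.0 for v in order}
        for w in reversed(order):  # push path credit back toward the source
            for v in graph[w]:
                if dist[v] == dist[w] - 1:
                    flow = paths[v] / paths[w] * (1.0 + credit[w])
                    credit[v] += flow
                    edge = tuple(sorted((v, w)))
                    score[edge] = score.get(edge, 0.0) + flow
    return score

def components(graph):
    seen, result = set(), []
    for start in graph:
        if start not in seen:
            group, stack = set(), [start]
            while stack:
                v = stack.pop()
                if v not in group:
                    group.add(v)
                    stack.extend(graph[v] - group)
            seen |= group
            result.append(sorted(group))
    return result

# Hypothetical graph: triangles {0,1,2} and {3,4,5} joined by the bridge 2 - 3.
graph = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}

while len(components(graph)) == 1:
    scores = edge_betweenness(graph)
    u, v = max(scores, key=scores.get)  # edge carrying the most shortest paths
    graph[u].discard(v)
    graph[v].discard(u)

print(components(graph))
```

In this toy graph every shortest path between the two triangles runs over the edge 2 - 3, so that bridge is removed first and the two triangles fall out as separate components.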
Modularity
• Algorithm
– Start with all vertices as isolates
– Follow a greedy strategy:
• successively join clusters with the greatest increase ΔQ in modularity
• stop when the maximum possible ΔQ <= 0 from joining any two
– Successfully used to find community structure in a graph with > 400,000 nodes with > 2 million edges
• Amazon’s people who bought this also bought that…
– Alternatives to achieving optimum ΔQ:
• simulated annealing rather than greedy search
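A naive Python sketch of the greedy strategy on a made-up graph. Two assumptions are worth flagging: modularity Q is taken to be the usual definition (for each community, the share of edges that fall inside it minus the squared share of edge ends attached to it), and Q is recomputed from scratch for every candidate join, which is far slower than the approach that scales to hundreds of thousands of nodes.

```python
from itertools import combinations

def modularity(edges, communities):
    """Q = sum over communities of (internal edge share) - (degree share)^2."""
    m = len(edges)
    label = {v: i for i, group in enumerate(communities) for v in group}
    internal = [0] * len(communities)
    degree = [0] * len(communities)
    for u, v in edges:
        degree[label[u]] += 1
        degree[label[v]] += 1
        if label[u] == label[v]:
            internal[label[u]] += 1
    return sum(internal[i] / m - (degree[i] / (2 * m)) ** 2
               for i in range(len(communities)))

def greedy_communities(nodes, edges):
    communities = [{v} for v in nodes]  # start with all vertices as isolates
    while len(communities) > 1:
        current = modularity(edges, communities)
        best_gain, best_pair = 0.0, None
        for a, b in combinations(range(len(communities)), 2):
            merged = [c for i, c in enumerate(communities) if i not in (a, b)]
            merged.append(communities[a] | communities[b])
            gain = modularity(edges, merged) - current
            if gain > best_gain:
                best_gain, best_pair = gain, (a, b)
        if best_pair is None:  # no join increases Q, so stop
            break
        a, b = best_pair
        communities = ([c for i, c in enumerate(communities) if i not in (a, b)]
                       + [communities[a] | communities[b]])
    return communities

# Hypothetical graph: triangles {0,1,2} and {3,4,5} joined by the edge 2 - 3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
result = greedy_communities(range(6), edges)
print(sorted(sorted(c) for c in result), round(modularity(edges, result), 3))
```

On this toy graph the joins stop with the two triangles as separate communities, because merging them would lower Q.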
Some Interesting Applications of NA
Ingredient Networks