Oxford Digital Humanities Summer School
(Social) Network Analysis
Scott A. Hale, Oxford Internet Institute
http://www.scotthale.net/
17 July 2014
What are networks?
Networks (graphs) are sets of nodes (vertices) connected by edges (links, ties, arcs)
Additional details
Whole vs. ego: whole networks have all nodes within a natural boundary (platform, organization, etc.). An ego network has one node and all of its immediate neighbors.
Edges can be directed or undirected and weighted or unweighted
Additionally, networks may be multilayer and/or multimodal.
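The distinctions above can be sketched in NetworkX. This is a minimal illustration, not part of the original slides; the node names and edge weights are made up.

```python
import networkx as nx

# Undirected, unweighted: an edge is just a pair of nodes
friends = nx.Graph()
friends.add_edges_from([("Ann", "Ben"), ("Ben", "Cat"), ("Cat", "Dan")])

# Directed and weighted: both the direction and a numeric weight are stored
mentions = nx.DiGraph()
mentions.add_edge("Ann", "Ben", weight=3)  # hypothetical: Ann mentioned Ben three times
mentions.add_edge("Ben", "Ann", weight=1)

# Ego network: one node (the ego) plus all of its immediate neighbors
ego = nx.ego_graph(friends, "Ben", radius=1)

print(sorted(ego.nodes()))
print(mentions["Ann"]["Ben"]["weight"])
```

Here the ego network of Ben contains Ann and Cat but not Dan, who is two steps away.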
Why?
Characterize network structure
How far apart / well-connected are nodes?
Are some nodes at more important positions?
Is the network composed of communities?
How does network structure affect processes?
Information diffusion
Coordination/cooperation
Resilience to failure/attack
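As one hedged illustration of how structure affects resilience, the sketch below compares removing a hub with removing a peripheral node. The star-shaped toy network and the helper name components_after_removal are assumptions made for this example only.

```python
import networkx as nx

# Hypothetical toy network: node 0 is a hub connected to five leaves (1-5)
g = nx.star_graph(5)

def components_after_removal(graph, node):
    """Return the number of connected components once `node` is removed."""
    h = graph.copy()
    h.remove_node(node)
    return nx.number_connected_components(h)

hub_removed = components_after_removal(g, 0)   # targeted attack on the hub
leaf_removed = components_after_removal(g, 1)  # failure of a peripheral node
print(hub_removed, leaf_removed)
```

Losing the hub leaves every remaining node isolated, while losing a leaf leaves the rest of the network connected.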
A network
First questions when approaching a network
What are edges? What are nodes?
What kind of network?
Inclusion/exclusion criteria
Network data repositories
http://www.diggingintodata.org/Repositories/tabid/167/Default.aspx
http://datamob.org
http://snap.stanford.edu/data
http://www-personal.umich.edu/~mejn/netdata
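Many files from repositories like these are plain-text edge lists. The sketch below assumes a whitespace-separated layout with "#" comment lines; the contents are made up, and a real download may use a different format, so check the dataset's documentation.

```python
import networkx as nx

# Hypothetical file contents in the style of a plain-text edge list
lines = [
    "# FromNodeId ToNodeId",
    "0 1",
    "0 2",
    "1 2",
    "2 3",
]

# Lines starting with "#" are skipped; node ids are converted to integers
g = nx.parse_edgelist(lines, comments="#", nodetype=int,
                      create_using=nx.DiGraph())
print(g.number_of_nodes(), g.number_of_edges())
```

For a file on disk, nx.read_edgelist takes a path and the same comments, nodetype, and create_using arguments.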
Python resources
tweepy: Package for Twitter stream and search APIs (only Python 2.7 at the moment)
Example code for the search and stream APIs, along with code to create mentions/retweet networks, at https://github.com/computermacgyver/twitter-python
Python comes in two versions:
2.7.x – many packages, issues with non-English scripts
3.x – fewer packages, but excellent handling of international scripts (Unicode)
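A mentions network can be sketched without calling any Twitter API. The tweets below are made-up dictionaries with hypothetical "user" and "text" keys, not the format tweepy returns, and this is not the code from the repository above. The sketch assumes Python 3, where the non-English text needs no special handling, and it treats a retweet ("RT @user") as a mention of the original author.

```python
import re
import networkx as nx

# Hypothetical tweets; real ones would come from the Twitter APIs
tweets = [
    {"user": "alice", "text": "@bob こんにちは!"},
    {"user": "alice", "text": "thanks @bob and @carol"},
    {"user": "bob", "text": "RT @alice: network analysis is fun"},
]

MENTION = re.compile(r"@(\w+)")

g = nx.DiGraph()
for tweet in tweets:
    for mentioned in MENTION.findall(tweet["text"]):
        source, target = tweet["user"], mentioned.lower()
        if g.has_edge(source, target):
            g[source][target]["weight"] += 1  # repeated mention: raise the weight
        else:
            g.add_edge(source, target, weight=1)

for source, target, weight in g.edges(data="weight"):
    print(source, "->", target, weight)
```

The result is a directed, weighted network: an edge from alice to bob with weight 2 means alice mentioned bob twice.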
NetworkX
http://networkx.github.io/
Package to represent networks as python objects
Convenient functions to add, delete, iterate nodes/edges
Functions to calculate network statistics (degree, clustering, etc.)
Easily generate comparison graphs based on statistical models
Visualization
Alternatives include igraph (available for Python and R)
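To illustrate the comparison-graph idea, the sketch below sets a small network bundled with NetworkX against a random graph with the same number of nodes and edges. The choice of dataset, of the G(n, m) random model, and of the seed are assumptions for illustration.

```python
import networkx as nx

observed = nx.karate_club_graph()  # small real network bundled with NetworkX
n = observed.number_of_nodes()
m = observed.number_of_edges()

# Random baseline with the same number of nodes and edges
baseline = nx.gnm_random_graph(n, m, seed=42)

observed_clustering = nx.average_clustering(observed)
baseline_clustering = nx.average_clustering(baseline)
print(round(observed_clustering, 3), round(baseline_clustering, 3))
```

If the observed value is far from the baseline, size and density alone probably do not explain that feature of the network. In practice one would generate many random graphs rather than a single one.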
Gephi
Open-source, cross-platform GUI
Primary strength is to visualize networks
Basic statistical properties are also available
Alternatives include NodeXL, Pajek, GUESS, NetDraw, Tulip, and more
Network measures
With many nodes, visualizations are often difficult/impossible to interpret. Statistical measures can be very revealing, however.
Node-level
Degree (in, out): How many incoming/outgoing edges does a node have?
Centrality (next slide)
Constraint
Network-level
Components: Number of disconnected subsets of nodes
Density: (observed edges) / (maximum number of edges possible)
Clustering coefficient: (closed triplets) / (connected triples)
Path length distribution
Distributions of node-level measures
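The density and clustering ratios above can be worked by hand and checked against NetworkX. This is a sketch on a made-up six-node undirected graph. It assumes no self-loops and uses the global (transitivity) form of the clustering coefficient.

```python
import networkx as nx

g = nx.Graph()
g.add_edges_from([("A", "B"), ("B", "C"), ("A", "C"), ("C", "D"), ("E", "F")])

n = g.number_of_nodes()
m = g.number_of_edges()

# Density: observed edges / maximum possible edges (undirected, no self-loops)
density = m / (n * (n - 1) / 2)

# Clustering: closed triplets / connected triples.
# Each triangle closes three triplets; a node of degree d is the
# center of d*(d-1)/2 connected triples.
triangles = sum(nx.triangles(g).values()) // 3
triples = sum(d * (d - 1) // 2 for _, d in g.degree())
transitivity = 3 * triangles / triples

components = nx.number_connected_components(g)

print(density, nx.density(g))
print(transitivity, nx.transitivity(g))
print(components)
```

The hand-computed values should match nx.density and nx.transitivity. Note that nx.average_clustering uses a different, node-averaged definition, so it generally gives another number.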
Centrality measures
Degree
Closeness: Based on the average geodesic (shortest-path) distance to ALL other nodes; a shorter average distance means higher closeness. Informally, an indication of the ability of a node to diffuse a property efficiently.
Betweenness: Number of shortest paths the node lies on. Informally, the betweenness is high if a node bridges clusters.
Eigenvector: A weighted degree centrality (inbound links from highly central nodes count more).
PageRank: Not strictly a centrality measure, but similar to eigenvector centrality, modeled as a random walk with a teleportation parameter.
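The claim that a bridging node has high betweenness can be checked on a small synthetic graph. The sketch below assumes a barbell graph, chosen for illustration, and covers only degree, closeness, and betweenness.

```python
import networkx as nx

# Two 5-node cliques (nodes 0-4 and 6-10) joined through one bridge node, 5
g = nx.barbell_graph(5, 1)

degree = nx.degree_centrality(g)
closeness = nx.closeness_centrality(g)
betweenness = nx.betweenness_centrality(g)

for node in sorted(g):
    print(node, round(degree[node], 2), round(closeness[node], 2),
          round(betweenness[node], 2))

top_between = max(betweenness, key=betweenness.get)
lowest_degree = min(degree, key=degree.get)
print(top_between, lowest_degree)
```

The bridge node has the fewest edges of any node yet the highest betweenness, because every shortest path between the two cliques passes through it.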
NetworkX: Nodes
import networkx as nx
g=nx.Graph() #A new (empty) undirected graph
g.add_node("Alan") #Add one new node
g.add_nodes_from(["Bob","Carol","Denise"]) #Add three new nodes from list
#Nodes can have attributes (g.nodes[...] requires NetworkX 2.0 or later)
g.nodes["Alan"]["gender"]="M"
g.nodes["Bob"]["gender"]="M"
g.nodes["Carol"]["gender"]="F"
g.nodes["Denise"]["gender"]="F"
for n in g:
    print("{0} has gender {1}".format(n,g.nodes[n]["gender"]))
NetworkX: Edges
#Interesting graphs have edges
g.add_edge("Alan","Bob") #Add one new edge
#Add two new edges
g.add_edges_from([["Carol","Denise"],["Carol","Bob"]])
#Edge attributes
g["Alan"]["Bob"]["relationship"]="Friends"
g["Carol"]["Denise"]["relationship"]="Friends"
g["Carol"]["Bob"]["relationship"]="Married"
#New edge with an attribute
g.add_edges_from([["Carol","Alan",
{"relationship":"Friends"}]])
NetworkX: Edges
for n1,n2 in g.edges():
    print("{0} and {1} are {2}".format(n1,n2,g[n1][n2]["relationship"]))
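Edge attributes can also be used to filter a network. The sketch below rebuilds the four-person example so that it runs on its own, then keeps only the friendship edges. The variable names friend_pairs and friends_only are introduced here for illustration.

```python
import networkx as nx

g = nx.Graph()
g.add_edge("Alan", "Bob", relationship="Friends")
g.add_edge("Carol", "Denise", relationship="Friends")
g.add_edge("Carol", "Bob", relationship="Married")
g.add_edge("Carol", "Alan", relationship="Friends")

# data=True yields (node1, node2, attribute_dict) triples
friend_pairs = [(n1, n2) for n1, n2, attrs in g.edges(data=True)
                if attrs["relationship"] == "Friends"]

# A new graph holding only the friendship edges
friends_only = nx.Graph()
friends_only.add_edges_from(friend_pairs)
print(friends_only.number_of_edges())
```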
NetworkX: Measures
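g.number_of_nodes() #Number of nodes
g.nodes(data=True) #Nodes with their attributes
g.number_of_edges() #Number of edges
g.edges(data=True) #Edges with their attributes
print(g) #Short summary of g (nx.info(g) in NetworkX before 3.0)
nx.density(g) #Observed edges / possible edges
nx.number_connected_components(g) #Undirected graphs only
nx.degree_histogram(g) #List: entry k counts the nodes with degree k
nx.betweenness_centrality(g) #Dictionary: node -> betweenness
nx.clustering(g) #Dictionary: node -> local clustering coefficient
nx.clustering(g, nodes=["Bob"]) #Same, restricted to the listed nodes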
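As a worked check, here are some of these measures for the four-person example graph, with the expected values derived by hand in the comments. The graph is rebuilt here so the sketch runs on its own.

```python
import networkx as nx

g = nx.Graph()
g.add_edges_from([("Alan", "Bob"), ("Carol", "Denise"),
                  ("Carol", "Bob"), ("Carol", "Alan")])

density = nx.density(g)                         # 4 of 6 possible edges
components = nx.number_connected_components(g)  # everyone is reachable: 1
histogram = nx.degree_histogram(g)              # degrees are 1, 2, 2, 3
betweenness = nx.betweenness_centrality(g)      # only Carol lies between others
clustering = nx.clustering(g)                   # Alan, Bob, Carol form a triangle

print(density, components, histogram)
print(betweenness)
print(clustering)
```

Carol is on the only shortest path from Denise to Alan and from Denise to Bob, so her normalized betweenness is 2/3 while everyone else scores 0. Her clustering coefficient is 1/3 because only one of the three pairs of her neighbors (Alan and Bob) is connected.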
NetworkX: Visualize or save
#Save g to the file my_graph.graphml in graphml format
#prettyprint will make it nice for a human to read
nx.write_graphml(g,"my_graph.graphml",prettyprint=True)
#Layout g with the Fruchterman-Reingold force-directed
#algorithm and save the result to my_graph.png
#with_labels will label each node with its id
import matplotlib.pyplot as plt
nx.draw_spring(g,with_labels=True)
plt.savefig("my_graph.png")
plt.clf() #Clear plot
NetworkX: Odds and ends
#Read a graph from the file my_graph.graphml in graphml format
g=nx.read_graphml("my_graph.graphml")
#Create an (empty) directed graph
g=nx.DiGraph()
See http://networkx.github.io/documentation/latest/reference/index.html for many more commands. Note that some commands are only available on directed or undirected graphs.
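A short sketch of the directed/undirected distinction, using a made-up three-edge directed graph: in- and out-degree exist only for directed graphs, while nx.number_connected_components is implemented only for undirected ones.

```python
import networkx as nx

d = nx.DiGraph()
d.add_edges_from([("Alan", "Bob"), ("Carol", "Bob"), ("Bob", "Denise")])

# In- and out-degree are only available on directed graphs
bob_in = d.in_degree("Bob")
bob_out = d.out_degree("Bob")

# number_connected_components is only implemented for undirected graphs
try:
    nx.number_connected_components(d)
    undirected_only = False
except nx.NetworkXNotImplemented:
    undirected_only = True

# Directed graphs use weak and strong connectivity instead
weak = nx.number_weakly_connected_components(d)
strong = nx.number_strongly_connected_components(d)
print(bob_in, bob_out, undirected_only, weak, strong)
```

If the undirected measure is what you want, d.to_undirected() returns an undirected copy that ignores edge direction.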
Resources
Newman, M.E.J., Networks: An Introduction
Kadushin, C., Understanding Social Networks: Theories, Concepts, and Findings
De Nooy, W., et al., Exploratory Social Network Analysis with Pajek
Shneiderman, B., and Smith, M., Analyzing Social Media Networks with NodeXL