Networks, Community Detection, and Political Polarization∗
Daniel Argyle
University of California, Santa Barbara
danielargyle@umail.ucsb.edu
February 14, 2013
Abstract
I propose a Bayesian algorithm that identifies latent groups, known as communities, in networks. This algorithm is more general than previous community detection techniques introduced in economics and incorporates insights from Bayesian statistics that allow it to converge more quickly and be implemented on relatively large networks. I demonstrate these attributes on randomly generated networks and provide an application to the United States Congress, where the proposed model and parameters have an intuitive interpretation.
JEL Classification Codes: C12, C49, C52, D72
Keywords: networks, community structures, political networks, Bayesian methods
∗This is a working draft. Please do not cite or distribute without permission.
1 Introduction
Many areas of economics—including basic questions of public good provision, externalities, and
trade—are built on theories of repeated interactions of individuals in groups. Unfortunately, in
many settings an agent’s actions can be observed but group membership is unobserved. It may
also be the case that agents associate themselves with a given label, but their chosen designation
does not accurately reflect their revealed choices. Identifying actual group membership is a key
empirical concern when addressing questions that involve interaction within groups (e.g. peer
effects) and interactions across groups (e.g. externalities). In such scenarios it may be feasible to
estimate group membership based on other observable data. One method, known as community
detection, uses network links to find sets of agents that interact more frequently and is rapidly
growing in prominence. This paper contributes a Bayesian community detection technique that is
based on a more general framework than existing methods in economics and utilizes insights from
Bayesian statistics that ensure that the algorithm is feasible to estimate.
Copic, Jackson and Kirman (2009) introduce community detection in the economics litera-
ture.1 Like other similar work (see Newman (2004)), Copic et al. (2009) define communities as
groups of nodes in a network where the probability of a link forming between two nodes in the
same community exceeds the probability that a link forms between two nodes in different com-
munities. Copic et al. (2009) then establish a function that yields the likelihood of observing a
network of interactions given the probabilities of link formation and a community structure (a par-
tition of the network nodes into individual communities) and optimize it via an iterative process
to find the maximum likelihood estimates of the probabilities and the optimal community struc-
ture. Subsequent work by Chapman and Zhang (2010) offers a basic Bayesian implementation of
this framework suggested by Copic et al. (2009), which builds on the same likelihood function
but estimates the parameters via a Markov-Chain Monte Carlo Algorithm. I make two principal
1There is a robust literature about community detection driven by research in physics, computer science, statistics and other fields. Leskovec, Lang and Mahoney (2010) and Fortunato (2010) provide brief introductions to several of the most common community detection methods along with performance benchmarks; however, it must be emphasized that this is a widely varied and rapidly growing literature. Mørup and Schmidt (2012) is a good example of a state-of-the-art Bayesian algorithm.
contributions to this previous work. First, I generalize the likelihood function found in Copic et al.
(2009) by loosening the restrictive assumption that the probability of link formation be identical
across all groups. Not only is the generalized likelihood function more plausible, because it allows
groups of individuals to behave differently, I demonstrate that it results in more accurate estimation
through Bayesian Model Comparison. Second, I propose a new algorithm to estimate the more
general model. Chapman and Zhang (2010) propose an algorithm that relies on moving one node
to a new community and evaluating which partition is more likely given the previous estimates
of the probabilities. This process is very slow for large networks and when applied to the gen-
eralized model often will not converge. I propose transitioning by randomly selecting two nodes
and then merging their communities (if they lie in different groups) or splitting the community
through random assignment (if the nodes are in the same community); this is an insight based on
Jain and Neal (2004). Additionally, unlike similar Bayesian algorithms from other fields, I assume
an uninformative prior over the set of possible community groups.
Despite its potential usefulness, applications of community detection in economics remain
scarce. Copic et al. (2009) includes an application to citation networks, where the community
detection algorithm finds communities of journals that align with subfields in economics. Sev-
eral recent papers use community detection algorithms to identify groups of closely related finan-
cial institutions (Wetherilt, Zimmerman and Soramaki, 2010; Chapman and Zhang, 2010; Bech,
Bergstrom, Garratt and Rosvall, 2011). To demonstrate my algorithm, I contribute an application
of community detection in congressional voting that provides insights into changes in strategic
voting patterns in the United States Congress.
I introduce notation in Section 2. The specific Bayesian algorithm I develop is described in
Section 3, and Section 4 discusses simulation results demonstrating convergence of this technique
under a variety of parametric specifications. Section 5 contains the results of the algorithm when
applied to data from the United States Senate and suggests that Congressional voting has become
more partisan in recent years. Section 6 concludes.
2 Networks and Community Detection
A network consists of a set of nodes $N = \{1, 2, \ldots, n\}$ and an $n \times n$ adjacency matrix, $A$, of
non-negative integers representing links between the nodes. An individual element of the adjacency
matrix, $A_{ij}$, indicates the number of times agent $i$ and agent $j$ interact.2 A community structure $z$ is
a vector of length $n$ that contains a community assignment for each node, i.e. $z_i$ is the community
assignment of node $i$. Note that this implies that a given node can only belong to one community.
The total number of communities is denoted by $L$ and will be indexed by $\ell$ so that $z(\ell)$ is the set of
nodes assigned to a given community $\ell$. For a given community $\ell \in \{1, 2, \ldots, L\}$, the probability
of forming a link to a node in the same community is denoted $p^{in}_{\ell}$ and the probability of forming
a link to a node in any other community is denoted $p^{out}_{\ell}$.3 This structure is more general than
previous work by Copic et al. (2009) and Chapman and Zhang (2010) because both assume identical
probabilities for link formation across all communities; that model is nested as a special case.
A community structure requires the assumption that the probability of forming links within a
community exceeds the probability of forming links across communities. This implies the following
condition must hold
$$0 \le p^{out}_{\ell} \le p^{in}_{\ell} \le 1 \tag{1}$$
for all $\ell \in \{1, 2, \ldots, L\}$. The number of interactions between a given node $i$ in community $\ell$ and
another node $j$ is assumed to follow a binomial distribution so that the link between the nodes is
distributed
$$p(A_{ij} \mid p^{in}_{\ell}, p^{out}_{\ell}, z, C_{ij}) =
\begin{cases}
\binom{C_{ij}}{A_{ij}} (p^{in}_{\ell})^{A_{ij}} (1 - p^{in}_{\ell})^{C_{ij} - A_{ij}}, & \text{if } z_i = z_j \\
\binom{C_{ij}}{A_{ij}} (p^{out}_{\ell})^{A_{ij}} (1 - p^{out}_{\ell})^{C_{ij} - A_{ij}}, & \text{if } z_i \neq z_j
\end{cases} \tag{2}$$
where the correct case is determined by node $j$ belonging to the same community as node
$i$ ($z_i = z_j$) or by the two nodes lying in different groups ($z_i \neq z_j$).
2While it is often the case that the adjacency matrix is symmetric (i.e. that $A_{ij} = A_{ji}$), this does not have to be the case. For example, in a social network defined by email communication, an email from person $i$ to person $j$ does not ensure that person $j$ responds, a situation with which most people are familiar.
3I assume that the probability of forming a link outside of a community is the same regardless of the community to which the link is formed. While it is possible to posit a fully general model that allows $p^{out}_{\ell}$ to vary across groups (see Mørup and Schmidt (2012)), this can result in estimating more parameters than there are data when there are many communities.
The assumption of a binomial
distribution requires additional information, namely the number of trials for which one can observe
a successful outcome. This information is contained in an $n \times n$ capacity matrix $C$, where $C_{ij}$
contains the number of potential interactions between nodes $i$ and $j$ (note that $C_{ij} \ge A_{ij}$ for all $i \in N$
and $j \in N$).4 This assumption is shared with the original model in Copic et al. (2009) and
is most suited to cases where the number of potential interactions is known; however, the binomial
assumption is robust up to a reasonable estimate of $C_{ij}$, as is discussed in the citation network
example in their work. For convenience, I will proceed assuming that $C$ is known. Combining and
aggregating the binomial probabilities (and omitting the binomial coefficients, which are functions
of only the data) I obtain the likelihood of observing a community structure given a network and
probabilities:5
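$$L(z \mid \mathbf{p}^{in}_{\ell}, \mathbf{p}^{out}_{\ell}, A, C) = \prod_{i=1}^{n} \prod_{j \in z(\ell)} (p^{in}_{\ell})^{A_{ij}} (1 - p^{in}_{\ell})^{C_{ij} - A_{ij}} \prod_{j \notin z(\ell)} (p^{out}_{\ell})^{A_{ij}} (1 - p^{out}_{\ell})^{C_{ij} - A_{ij}} \tag{3}$$
where node $i$ lies in community $\ell$, $\mathbf{p}^{in}_{\ell}$ represents the set of all within community probabilities and
$\mathbf{p}^{out}_{\ell}$ is the set of all across community probabilities.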
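As a rough illustration of equation (3), the following Python sketch evaluates the log of the likelihood for a given partition. It is not the author's implementation: the function name `log_likelihood`, the nested-list data layout, the dictionaries keyed by community label, and the choice to skip self-pairs ($i = j$) are all assumptions made for the example. Probabilities are assumed to lie strictly between 0 and 1 so that the logarithms are defined.

```python
import math

def log_likelihood(z, p_in, p_out, A, C):
    """Log of equation (3), with binomial coefficients omitted as in the text.

    z     : list of community labels; z[i] is the community of node i
    p_in  : dict, community label -> within-community link probability
    p_out : dict, community label -> across-community link probability
    A, C  : n x n nested lists of interaction counts and capacities
    """
    n = len(z)
    total = 0.0
    for i in range(n):
        for j in range(n):
            if i == j:
                continue  # assumption of this sketch: self-pairs are skipped
            # Node i's community determines which probabilities apply.
            p = p_in[z[i]] if z[i] == z[j] else p_out[z[i]]
            total += A[i][j] * math.log(p) + (C[i][j] - A[i][j]) * math.log(1.0 - p)
    return total
```

Working on the log scale avoids numerical underflow when many small probabilities are multiplied, which matters once the network has more than a handful of nodes.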
The Beta distribution is a continuous distribution on the interval [0,1] that can take a variety of
shapes depending on the parameterization. In Bayesian analysis it is the natural conjugate prior of
the Bernoulli, binomial, and geometric distributions and has the advantage of nesting the uniform
distribution as a special case when both parameters are set to one. Consequently, I assume a Beta
prior for the probabilities for all $\ell \in \{1, 2, \ldots, L\}$:
$$p^{in}_{\ell} \sim \mathrm{Beta}(\alpha^{in}, \beta^{in})$$
$$p^{out}_{\ell} \sim \mathrm{BetaInc}(\alpha^{out}, \beta^{out}, p^{in}_{\ell})$$
where BetaInc represents a beta distribution truncated to $[0, p^{in}_{\ell}]$. The density for this function is
$$F(\theta) = \frac{\theta^{\alpha - 1} (1 - \theta)^{\beta - 1}}{B_{p^{in}_{\ell}}(\alpha, \beta)} \tag{4}$$
where $B_{p^{in}_{\ell}}(\cdot)$ is the incomplete beta function. This prior imposes the community definition
assumption in Equation 1. For convenience in notation, I assume that the prior parameters
are the same for both distributions so that $\alpha^{in} = \alpha^{out} = \alpha$ and $\beta^{in} = \beta^{out} = \beta$. In addition to
assuming prior distributions for the probabilities, I assume a discrete uniform prior for the
community structures. The number of potential partitions of a set of size $n$ is a constant known as the
Bell Number, denoted $\omega_n$, which implies that the prior probability mass function is given by:6
$$f(z) = \frac{1}{\omega_n}$$
This represents an uninformative prior because it implies that all partitions of the nodes are equally
likely.7 Because this value is constant given the number of nodes in the network I omit it throughout
for clarity.
4Similarly to the adjacency matrix, the capacity matrix is frequently symmetric but is not constrained to be so.
5Note that aggregating in this fashion requires assuming that interactions between individuals are independent conditional on community assignment. This is less restrictive than assuming that the probabilities are independent across all individuals (as is required in Copic et al. (2009) and Chapman and Zhang (2010)) and is based on the idea that, conditional on group assignment, link formation appears to happen at random; however, there are scenarios where this assumption may not be innocuous. This will be discussed further in Section 5.
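To give a feel for the size of the partition space behind this uniform prior, the short Python sketch below computes Bell numbers with the standard recursion $\omega_{j+1} = \sum_{k=0}^{j} \binom{j}{k} \omega_k$, starting from $\omega_0 = 1$. The function name `bell_numbers` is chosen for this example only.

```python
from math import comb

def bell_numbers(n_max):
    """Return [omega_0, ..., omega_{n_max}] via omega_{j+1} = sum_k C(j, k) * omega_k."""
    omega = [1]  # omega_0 = 1: the empty set has exactly one partition
    for j in range(n_max):
        omega.append(sum(comb(j, k) * omega[k] for k in range(j + 1)))
    return omega

# Number of partitions of a 10-node network.
print(bell_numbers(10)[10])
```

Python integers have arbitrary precision, so the recursion stays exact even for network sizes where floating-point arithmetic would lose digits.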
While it is feasible to estimate the parameters in (3) via direct maximum likelihood estimation
as in Copic et al. (2009), there are several reasons this may be undesirable. It is important to note
that it is very difficult to optimize over the space of potential community structures. As networks
become large the number of potential partitions grows faster than exponentially. For example, a
set of size 10 has 115,975 possible partitions while a set of size 25 has 4,638,590,332,229,999,353.
This makes it difficult, if not impossible, to calculate the likelihood for all possible partitions
for a network of any reasonable size. Further, the set of all possible partitions is discrete and
unordered, meaning that there is no information about direction of increase that is typically used
in optimization algorithms. Copic et al. (2009) propose using “pseudo-community structures”
which build an artificial community around a randomly selected node to limit the set of optimal
partitions. They then use the established structure as a basis for maximum likelihood estimation
6The number of potential partitions of a set of size $n$ (the Bell number) is defined by the recursion formula $\omega_{j+1} = \sum_{k=0}^{j} \binom{j}{k} \omega_k$, with $\omega_0 = 1$.
7Much of the work in machine learning and other fields assumes a Dirichlet Process (also known as a Chinese Restaurant Process) as the prior distribution for the partitions. This assumes that an individual is more likely to join a larger community than a smaller one and is dependent on an initial assumption for how many communities there are. I wish to avoid these assumptions because they do not necessarily fit all network structures.
of z, while optimization for pin and pout is done by grid search. Unfortunately, approximating
the network structure potentially excludes important information and relying on grid search for
estimation of the probabilities is inefficient. Because of this, this technique is only pursued for
networks of relatively small size (n < 50), limiting its usefulness for further experimentation.
A Bayesian implementation offers several advantages.8 While the maximum likelihood pro-
cedure is difficult to implement in practice because of the huge number of potential community
structures, a Metropolis-Hastings algorithm is capable of reaching all possible community struc-
tures with non-zero probability without requiring listing all possible partitions or making simpli-
fying assumptions. Additionally, rather than yielding point estimates, a Bayesian implementation
results in posterior distributions for the probabilities (which allows for easy hypothesis testing) and
a posterior distribution for community structure. A posterior distribution over community struc-
tures is simply a list of community structures along with their frequency over a given number of
iterations of the algorithm. This can be a “degenerate” distribution of only one community struc-
ture that fits the network better than any other, but this is not necessarily the case. Indeed it may
be very useful to know which partitions tightly “fit” the data and which are less certain.
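The idea of a posterior distribution over community structures as a frequency list can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the paper's code: the names `canonical` and `partition_posterior` are hypothetical, and the relabeling step is added here because community labels are arbitrary, so two label vectors that describe the same grouping should be counted together.

```python
from collections import Counter

def canonical(z):
    """Relabel communities in order of first appearance so equal partitions compare equal."""
    relabel = {}
    return tuple(relabel.setdefault(label, len(relabel)) for label in z)

def partition_posterior(draws):
    """Map each distinct partition to its share of the retained MCMC draws."""
    counts = Counter(canonical(z) for z in draws)
    total = sum(counts.values())
    return {z: c / total for z, c in counts.most_common()}

# Four retained draws on a 3-node network; three of them describe the same grouping.
draws = [[0, 0, 1], [1, 1, 0], [0, 1, 1], [5, 5, 2]]
print(partition_posterior(draws))
```

A "degenerate" posterior in the sense used above corresponds to a dictionary with a single key whose frequency is 1.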
3 Bayesian Implementation
Given the likelihood function defined in (3), the posterior distribution for the set of within
community probabilities $\mathbf{p}^{in}$ and across community probabilities $\mathbf{p}^{out}$ along with the optimal community
structure $z$ given the observed network $A$ is:
$$f(z^{*}, \mathbf{p}^{in*}, \mathbf{p}^{out*} \mid A, C) \propto L(z, \mathbf{p}^{in}, \mathbf{p}^{out} \mid A, C)\, f(z, \mathbf{p}^{in}, \mathbf{p}^{out}) \tag{5}$$
where f (z,pin,pout) represents the prior distribution of the probabilities of link formation within
a community and outside of a community, along with the prior beliefs about the probability of a
given partition of the network. Because the Beta distribution is the natural conjugate prior of the
8Copic et al. (2009) acknowledge the possibility of using Bayesian techniques in their work but choose not to pursue it, leaving this initial step to Chapman and Zhang (2010). To my knowledge, this is the first work to present a generalized community detection model combined with a sophisticated Bayesian convergence algorithm.
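binomial distribution, combining the likelihood conditional on $p^{out}_{\ell}$ with the prior distribution
for $p^{in}_{\ell}$ yields the kernel of a Beta distribution
$$\mathrm{Beta}\Big(\alpha + \sum_{i \in z(\ell)} \sum_{j \in z(\ell)} A_{ij},\; \beta + \sum_{i \in z(\ell)} \sum_{j \in z(\ell)} (C_{ij} - A_{ij})\Big) \tag{6}$$
and similarly the distribution for $p^{out}_{\ell}$ has the following kernel for a truncated Beta distribution
$$\mathrm{BetaInc}\Big(\alpha + \sum_{i \in z(\ell)} \sum_{j \notin z(\ell)} A_{ij},\; \beta + \sum_{i \in z(\ell)} \sum_{j \notin z(\ell)} (C_{ij} - A_{ij})\Big). \tag{7}$$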
These kernels allow for estimation of a posterior distribution for the probabilities in pin and pout
via Gibbs sampling.
The space of all possible partitions is an extremely large discrete space with no convenient
functional form which precludes Gibbs-sampling to find a distribution over possible partitions.9
Consequently, estimation of an unknown posterior distribution proceeds via a Metropolis-Hastings
Markov-Chain Monte Carlo algorithm implemented within the Gibbs sampler. The intuition be-
hind how this works is straightforward. Candidate partitions (potential new community structures)
are generated by a stochastic process. Given two possible community structures, the original and
the new partition, the algorithm selects the one that is most likely given the previously sampled
values for pin and pout and discards the other. New values for pin and pout are sampled from beta
distributions conditional on the selected partition. This process continues for a given number of
repetitions until the process converges to posterior distributions for the probabilities and a posterior
distribution over community structures.
The Metropolis-Hastings algorithm requires positing a candidate-generating function, q(·, ·),
which determines the probability of observing a new partition z′ given the current partition z. These
functions can be quite flexible, but must allow for any move between partitions to be reversed with
non-zero probability. I utilize two such functions in my algorithm. The first, which I will refer to
as individual-walk, compares two partitions that are identical except for one node that is randomly
assigned to another community. Chapman and Zhang (2010) posit the
9A variety of techniques in machine learning use a Chinese Restaurant Process prior, which can yield results via Gibbs sampling given certain assumptions about the community detection process; see Mørup and Schmidt (2012) for a recent example in a long line of literature. For reasons previously discussed in Section 2 I am reluctant to assume this prior.
following function for this purpose, $q(z' \mid z)$, where $q$ represents the probability of transitioning to a
partition $z'$ from the current partition $z$. Recall that $L$ is the number of communities in a given partition
and let $L_{\ell}$ be the number of nodes in community $\ell$ for that partition. The transition function then has
the specific form
$$q(z' \mid z) =
\begin{cases}
\dfrac{1}{L^{2} \times L_{\ell}} & \text{if } L_{\ell} > 1 \\
\dfrac{1}{L^{2} - L} & \text{if } L_{\ell} = 1
\end{cases} \tag{8}$$
This function represents the probability that a given node will change from one community to
another and assumes that prior beliefs indicate that all partitions are equally likely. Intuitively, the
probability that a given node will change from one community to another is determined by the
probability that a given node is selected within a randomly selected community, $\frac{1}{L} \times \frac{1}{L_{\ell}}$, multiplied
by the inverse of the number of communities that it could potentially move to, $\frac{1}{L}$. The cases account
for the difference in calculations when a node belongs to a singleton community.
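One way to read equation (8) as a sampling procedure is sketched below in Python. The interpretation is an assumption of this example rather than something the text states outright: a node from a community with more than one member may move to any of the other $L - 1$ communities or open a new one ($L$ options), while a node in a singleton community may only join one of the other $L - 1$ communities. The function names and the use of integer community labels are likewise hypothetical.

```python
import random

def walk_probability(L, L_l):
    """Equation (8): proposal probability when the chosen community has L_l members."""
    if L_l > 1:
        return 1.0 / (L * L * L_l)
    return 1.0 / (L * L - L)  # singleton case; requires L > 1

def individual_walk(z, rng=random):
    """Propose a partition that differs from z in one node; return (z_new, q)."""
    labels = sorted(set(z))
    L = len(labels)
    source = rng.choice(labels)                        # pick a community
    members = [i for i, lab in enumerate(z) if lab == source]
    node = rng.choice(members)                         # pick one of its nodes
    targets = [lab for lab in labels if lab != source]
    if len(members) > 1:
        targets.append(max(labels) + 1)                # assumption: may open a new community
    if not targets:
        return list(z), 1.0                            # one node in one community: nothing to move
    z_new = list(z)
    z_new[node] = rng.choice(targets)
    return z_new, walk_probability(L, len(members))
```

The returned `q` is the forward proposal probability; a Metropolis-Hastings step also needs the probability of the reverse move, which can be obtained by applying `walk_probability` to the community counts of the proposed partition.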
The second candidate-generating function is a variation of the split-merge algorithm proposed
by Jain and Neal (2004).10 As might be expected, moving only one node at a time to generate
candidate partitions results in very slow convergence. Since Chapman and Zhang (2010) only
implement their algorithm for a network of size 14, this is not very problematic; however, imple-
menting the procedure with networks that are only marginally larger results in prohibitively lengthy
computation time. The split-merge algorithm allows for larger changes in community partitions so
that convergence is achieved more quickly.
The split-merge algorithm begins by randomly selecting two nodes. If those nodes are in the
same community, the community is “split” into two new communities by assigning the second
of the two nodes to a new community, leaving the first node in the original community, and then
randomly allocating the remaining nodes in the original community between the original and new
communities via a series of Bernoulli trials. If the nodes are in different communities, all the
nodes in both communities are “merged” so that all the nodes in the two communities are combined
into the community of the first randomly selected node.
10Jain and Neal (2004) propose the split-merge algorithm as a technique for Dirichlet processes where an informative prior for community structures is chosen. Since I have chosen an uninformative prior for the partitions, the implementation of the algorithm is different, but the idea follows directly from the original work.
The associated probability of transitioning
from one partition to the next via split-merge for two randomly selected nodes i and j is given by:
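$$q(z' \mid z) =
\begin{cases}
\left(\dfrac{1}{n(n-1)}\right) \left(\dfrac{1}{2}\right)^{L_{\ell}} & \text{if } z_i = z_j \\
\dfrac{1}{n(n-1)} & \text{if } z_i \neq z_j.
\end{cases} \tag{9}$$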
This is simply the probability of randomly selecting two nodes from a set of size $n$ multiplied by
the probability of assigning nodes between the new and original communities. Note that in the
case of merging the communities the probability of assigning nodes is 1 because the communities
are merged with certainty.11 Although the split-merge proposals result in much larger changes in
the community structures, which aids rapid convergence, this same attribute makes it difficult to
transition between two partitions that are very similar. Consequently, I utilize both an
individual-walk and a split-merge transition in the Bayesian algorithm.
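The split-merge proposal described above can be sketched in Python as follows. This is an illustrative reading of the text, not the author's code: the function name `split_merge`, the integer labels, and the fair-coin Bernoulli trials are assumptions, and the exponent in the split case is taken as $L_{\ell}$ exactly as printed in equation (9).

```python
import random

def split_merge(z, rng=random):
    """Propose a split or a merge based on two randomly chosen nodes; return (z_new, q)."""
    n = len(z)
    i, j = rng.sample(range(n), 2)          # ordered pair of distinct nodes
    pick = 1.0 / (n * (n - 1))              # probability of selecting this ordered pair
    z_new = list(z)
    if z[i] == z[j]:
        # Split: node j seeds a new community, node i stays, the rest are allocated by coin flips.
        members = [k for k in range(n) if z[k] == z[i]]
        new_label = max(z) + 1
        z_new[j] = new_label
        for k in members:
            if k not in (i, j) and rng.random() < 0.5:
                z_new[k] = new_label
        q = pick * 0.5 ** len(members)      # exponent L_l as printed in equation (9)
    else:
        # Merge: every node in j's community joins i's community.
        for k in range(n):
            if z[k] == z[j]:
                z_new[k] = z[i]
        q = pick
    return z_new, q
```

Because a single call can move many nodes at once, this proposal complements the one-node-at-a-time move rather than replacing it.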
Candidate partitions are evaluated using the Metropolis-Hastings ratio. This represents the
probability of transitioning from the current partition $z^{(m)}$ to a candidate partition $z'$ and is given
by
$$\min\left[\frac{f(z' \mid p^{in}, p^{out}, A, C)\, q(z', z^{(m)})}{f(z^{(m)} \mid p^{in}, p^{out}, A, C)\, q(z^{(m)}, z')},\ 1\right].$$
Note that the assumption of an uninformative prior over the partitions implies that the effect of the
prior in the ratio cancels out. This setup can be summarized in the following algorithm:
Algorithm 1
1. Assume initial values for $p^{in}_{0}$, $p^{out}_{0}$, and an initial community partition $z_0$
2. Sample new values $p^{in}_{new}$ and $p^{out}_{new}$ from their associated Beta distributions via Gibbs sampling
3. Sample a new partition via a Metropolis-Hastings step:
(a) Randomly choose a new partition via either an individual-walk or a split-merge process12
(b) Calculate the acceptance probability given the current values of $p^{in}_{new}$ and $p^{out}_{new}$ and the current and candidate partitions
(c) Accept the new community structure if the acceptance probability exceeds a uniformly distributed random value; reject and retain the previous community structure otherwise
4. Repeat steps 2-3 for a given number of burn-in repetitions and a chosen number of trials
11Note that this proposal satisfies the reversibility condition because a community that has been merged returns to the original community if the same two nodes are selected and split with an identical Bernoulli process (which occurs with non-zero probability). Additionally, a community that has been split returns to its original community if the same two nodes are selected and merged.
12While there are several ways to implement this, the simplest is to alternate between the two methods. This causes no problems for the convergence of the Metropolis-Hastings Algorithm.
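The accept/reject decision in step 3 can be sketched in Python on the log scale, which is usually necessary in practice because the likelihood values are extremely small. The function name `mh_accept` and its argument names are assumptions of this example; `q_forward` stands for the probability of proposing the candidate from the current partition and `q_reverse` for the probability of the reverse move.

```python
import math
import random

def mh_accept(log_f_new, log_f_old, q_reverse, q_forward, rng=random):
    """Metropolis-Hastings accept/reject step.

    log_f_new, log_f_old : log posterior kernel at the candidate and current partitions
    q_reverse            : probability of proposing the current partition from the candidate
    q_forward            : probability of proposing the candidate from the current partition
    Returns (accepted, acceptance_probability).
    """
    log_ratio = (log_f_new + math.log(q_reverse)) - (log_f_old + math.log(q_forward))
    accept_prob = math.exp(min(0.0, log_ratio))   # min(ratio, 1) computed on the log scale
    return rng.random() < accept_prob, accept_prob
```

Under the uniform prior over partitions, the prior terms cancel, so the two `log_f` arguments can simply be log-likelihood values evaluated at the most recent draws of the link probabilities.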
While initial values for $p^{in}_{0}$ and $p^{out}_{0}$ are relatively innocuous because the Gibbs sampling quickly
converges to values within the desired range, the initial community structure $z_0$ is very important
to the number of iterations required to achieve convergence. One choice that seems to work well in
initial testing is the “uninformative” partition of every node being in its own community; however,
the best choice may vary depending on the networks under consideration. It should be stressed
that the initial partition choice can be arbitrary and that the Metropolis-Hastings algorithm will
theoretically converge given any initial value; however, in practice it may be useful to choose an
informed initial partition, if the researcher is comfortable doing so, to limit the number of repetitions
needed to converge to the correct community structure.
4 Convergence Experiments
It is useful to examine the above algorithm on simulated networks where parameters are known.
Various statistical tests and graphical checks exist to test the convergence in distribution of the
probability parameters; however, the convergence of the community structure is much more dif-
ficult to test. Simulation allows verification that the algorithm not only converges properly to the
specified known probabilities, but also the correct known community structure.
Random network generation proceeds as follows. Set arbitrary parameter values for the proba-
bility of linking within a community and across communities p̃in and p̃out . Then generate a random
community structure zsim by specifying the number of nodes in a network n and assigning com-
munity membership. In this case, community membership is done by random assignment from a
chosen number of communities $L$.13 Using the resultant community structure, I populate an
adjacency matrix $A$ with ties between individuals based on an iterative process. Take an intermediate
adjacency matrix $A^{k}$ and populate it with ones and zeros based on $\tilde{p}^{in}$ and $\tilde{p}^{out}$ as follows. If nodes
$i$ and $j$ are in the same community and if a draw from a uniform distribution is less than $\tilde{p}^{in}$, let
$A^{k}_{ij} = A^{k}_{ji} = 1$. Otherwise fill in 0. Similarly, if nodes $i$ and $j$ are not in the same community let
$A^{k}_{ij} = A^{k}_{ji} = 1$ if a draw from a uniform distribution is less than $\tilde{p}^{out}$, otherwise fill in 0. Repeat
this process for $k = 1, \ldots, K$. The resulting network adjacency matrix is $A = \sum_{k=1}^{K} A^{k}$ and the
capacity matrix is a matrix with every element set to $K$, the number of potential interactions.
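The generation procedure above can be sketched in Python with the standard library. The function name `simulate_network` and its argument names are hypothetical, and leaving the diagonal of the adjacency matrix at zero is an assumption of this sketch since the text does not discuss self-links.

```python
import random

def simulate_network(n, L, p_in, p_out, K, rng=random):
    """Simulate a symmetric weighted network with a known community structure.

    Returns (z, A, C): community labels, adjacency counts, and capacity matrix.
    """
    z = [rng.randrange(L) for _ in range(n)]          # random community assignment
    A = [[0] * n for _ in range(n)]
    for _ in range(K):                                # K rounds of potential interactions
        for i in range(n):
            for j in range(i + 1, n):
                p = p_in if z[i] == z[j] else p_out
                if rng.random() < p:                  # uniform draw below the link probability
                    A[i][j] += 1
                    A[j][i] += 1                      # keep the network symmetric
    C = [[K] * n for _ in range(n)]                   # every element of C is set to K
    return z, A, C

# A small example; the experiments in the text use n = 100, two communities and K = 500.
z, A, C = simulate_network(n=6, L=2, p_in=0.7, p_out=0.3, K=20, rng=random.Random(0))
```

Accumulating the $K$ rounds directly into `A` is equivalent to summing the intermediate matrices $A^{k}$ and avoids storing them.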
I generate 100 of these matrices and apply the algorithm to them and record the resulting
estimated probabilities and community structure. Additionally, I make the following assumptions
which are chosen for their similarity to the application in Section 5:
1. There are only two communities
2. n = 100, i.e. there are 100 nodes in the network
3. p̃in = 0.7 and p̃out = 0.3
4. There are K = 500 potential interactions in the network
The number of iterations in the Bayesian algorithm is varied, taking values of 1,000,
5,000, 10,000 and 25,000 (with an equal number of burn-in repetitions that are discarded prior to
analysis) to get intuition about how quickly the process converges. I use the following benchmarks
to evaluate convergence:
• Correct community: The algorithm identifies the correct community
• $\tilde{p}^{in}$ in HPD: $\tilde{p}^{in}$ is in the 95% highest probability density interval of its estimated posterior distribution
• $\tilde{p}^{out}$ in HPD: $\tilde{p}^{out}$ is in the 95% highest probability density interval of its estimated posterior distribution
• $p^{in}$ Geweke test: Test the null hypothesis that the distribution for $p^{in}$ converged at a 95% confidence level14
• $p^{out}$ Geweke test: Test the null hypothesis that the distribution for $p^{out}$ converged at a 95% confidence level
• Optimal community frequency: Proportion of the draws that return the resultant optimal community structure
• Unanimous community frequency: The resultant community is the only one sampled during the iterative process
13This implies that each community will be roughly the same size, $n \times \frac{1}{L}$. This assumption is not necessary, but fits the empirical application very well because Congress is roughly evenly split between community membership. It also represents a scenario where algorithms that have a Chinese Restaurant Process prior have difficulty converging.
These numbers are averaged over all 100 random networks and reported in Table 1 for a variety
of repetition values for a network of size 100. The results indicate that the algorithm converges
quite quickly given the assumptions above. It identifies the correct community structure in all
of the sampled networks with as few as 500 iterations of the algorithm, with the Geweke test
rejecting the null of convergence approximately 5% of the time for both $p^{in}$ and $p^{out}$
(as would be expected for a test at the 95% confidence level repeated 100 times). Lastly, even
though the algorithm converges well at 500 iterations, it is important to remember that the random
network was generated in such a way that all the assumptions hold perfectly.15 This indicates that in
applied settings 500 is a minimum iteration requirement to get some intuition about the estimated
parameters; however, rigorous analysis should probably rely on many more. I have conducted
similar tests with a variety of assumptions about the network itself and this general pattern holds
regarding number of iterations. Not reported are simulations that indicate that holding the number
of repetitions fixed at 25,000 (with an equal number of burn-ins) seems to be sufficient to ensure
convergence in networks up to size 500. Naturally, these results likely hold for larger networks and
additional repetitions. In these cases, computing power becomes the binding constraint.
14The Geweke test is a simple convergence test for MCMC simulations. It tests the null hypothesis that there is no difference in the mean between a subset of the repetitions collected at the beginning of the process and a subset of the repetitions collected at the end of the process. If you can reject this null, it is safe to conclude that the process has not converged. Note that this is not a true test of convergence in distribution as it only tests differences in means. For details see Geweke et al. (1991).
15Of particular interest is the assumption that there are only two groups in the data. Preliminary results indicate that the process converges more slowly for networks with more than two groups.
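The comparison of means behind the Geweke diagnostic can be sketched as below. This is a deliberately simplified version and should not be read as the test used for the reported results: it uses plain sample variances, whereas the full diagnostic estimates the variances from the spectral density at frequency zero to account for autocorrelation in the chain. The function name `geweke_z` and the 10% and 50% window fractions are assumptions of the example.

```python
import math
from statistics import mean, variance

def geweke_z(chain, first=0.1, last=0.5):
    """Simplified Geweke z-score comparing the early and late parts of an MCMC chain."""
    n = len(chain)
    a = chain[: int(first * n)]            # early window
    b = chain[n - int(last * n):]          # late window
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se

# A chain that is still drifting produces a z-score far from zero.
drifting = [float(t) for t in range(1000)]
print(geweke_z(drifting))
```

Values of the z-score beyond roughly plus or minus 1.96 would reject equality of means at the 5% level, which is the sense in which the test is applied at a 95% confidence level above.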
5 Application to U.S. Senate
The United States Congress is easily interpreted as a network, a set of legislators connected in
ways that are easily observable such as voting patterns or co-sponsorship of bills. This simplicity
has not been lost on the networks literature. From a technical standpoint, the Senate is often used as a test
for new computational techniques (Banerjee, El Ghaoui and D’Aspremont, 2008; Kolar, Le Song
and Xing, 2010). In an applied setting, researchers have used co-sponsorship on bills (Tam Cho
and Fowler, 2010; Harward, February 2010) and committee assignments (Porter, Mucha, Newman
and Warmbrand, 2005) to show that Congress exhibits traits commonly seen in social networks.
Specifically, a voting network consists of Senators or Representatives with links between them
defined as the number of times they voted the same way on a bill. The element Ai j of the network
adjacency matrix A counts the number of times senator i voted yea when senator j voted yea and
the number of times they both voted nay. This structure yields a network that is weighted by the
number of common votes and undirected, because by construction Ai j = A ji. For this section I
focus on the Senate, although the results for the House are broadly similar.
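A small Python sketch of how such a voting network could be assembled from roll-call data follows. The input format, the function name `voting_network`, and in particular the definition of the capacity entry as the number of bills on which both senators cast a vote are assumptions made for illustration; the text here does not spell out how the capacity matrix is built for this application.

```python
def voting_network(votes):
    """Build agreement counts A and capacities C from roll-call votes.

    votes[s][b] is 'Y', 'N', or None (no vote recorded) for senator s on bill b.
    A[i][j] counts bills on which i and j voted the same way.
    C[i][j] counts bills on which both cast a vote (assumed capacity definition).
    """
    n = len(votes)
    A = [[0] * n for _ in range(n)]
    C = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            for vi, vj in zip(votes[i], votes[j]):
                if vi is None or vj is None:
                    continue               # skip bills where either senator did not vote
                C[i][j] += 1
                if vi == vj:
                    A[i][j] += 1           # both yea or both nay
    return A, C

# Three senators, three bills; the third senator missed the second vote.
votes = [["Y", "N", "Y"], ["Y", "N", "N"], ["N", None, "N"]]
A, C = voting_network(votes)
```

Both matrices are symmetric by construction, matching the undirected network described above.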
While it seems clear anecdotally and from recent literature in political science (see for example
Poole and Rosenthal (2007); Theriault (2008)) that voting along partisan lines has increased in
the recent past, it is unclear that the increasing number of partisan votes actually means Congress
has become more polarized. Rather, it may be that consensus and conflict in Congress have stayed
constant over time due to the rules and structure of Congress and only recently has party
voting clearly aligned with these preexisting voting patterns. The technique proposed in this paper
offers a unique setting with which to test this phenomenon. First, finding an optimal community
structure allows us to observe voting blocs in Congress, rather than relying on party identification
(an idea introduced by Waugh, Pei, Fowler, Mucha and Porter (2009)) as a relevant group for
decision making. I can then use Bayesian model comparison to test whether or not the observed
voting bloc is distinct from party identification as a community partition. Second, the estimated
probabilities found via the Bayesian community detection algorithm proposed above have very intuitive
interpretations in this setting: $p_c^{in}$ is the probability of voting within a voting bloc and $p_c^{out}$ is the
probability of voting across voting blocs. The expectation is that these probabilities should remain
fairly constant over time because they are determined by the structure, rather than the composition,
of Congress. However, the probabilities of voting in accordance with party identification, $p_p^{in}$ and
$p_p^{out}$, should vary as party identification comes closer to corresponding to observed voting blocs.
I use the Metropolis-Hastings within Gibbs algorithm proposed above to find the most likely
community structure and its associated probabilities $p_c^{in}$ and $p_c^{out}$. I use 10,000 burn-in repetitions
and 10,000 iterations, which should be sufficient to ensure convergence given the simulations in
Section 4. Additionally, I run Geweke convergence tests for both probabilities and check the frequency
of appearance of the final community structure. The results, reported in Table 2, suggest that the
probability estimates for the Congresses converged, as the Geweke test rejects the null in approximately
5% of cases (as would be expected for a 95% test). However, there appears to be more
uncertainty regarding the optimal partition than was evident in the convergence tests above, as only
60% of Congresses settled on a single partition.16
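The logic of the Geweke check can be sketched as follows. This is a simplified version, not the implementation used for Table 2: it compares the mean of the first 10% of retained draws with the mean of the last 50% and, as a loudly stated assumption, uses naive sample variances in place of the spectral density estimates that the standard diagnostic uses to account for autocorrelation. The function name `geweke_z` and the simulated chains are hypothetical.

```python
import numpy as np

def geweke_z(chain, first=0.1, last=0.5):
    """Simplified Geweke-style z-score for one parameter's post-burn-in draws.

    Compares the mean of the first `first` fraction of the chain with the
    mean of the last `last` fraction. Assumption: draws are treated as
    roughly uncorrelated, so plain sample variances are used.
    """
    chain = np.asarray(chain, dtype=float)
    n = chain.size
    a = chain[: int(first * n)]          # early segment
    b = chain[n - int(last * n):]        # late segment
    var_a = a.var(ddof=1) / a.size       # variance of the early mean
    var_b = b.var(ddof=1) / b.size       # variance of the late mean
    return (a.mean() - b.mean()) / np.sqrt(var_a + var_b)

# A stationary chain (hypothetical draws standing in for p_in samples).
rng = np.random.default_rng(0)
stationary = rng.beta(80, 20, size=10_000)
z_stationary = geweke_z(stationary)

# A chain that is still drifting: early and late means clearly differ.
drifting = np.linspace(0.0, 1.0, 1000)
z_drifting = geweke_z(drifting)

print(z_stationary, z_drifting)
print("reject at 5% level:", abs(z_stationary) > 1.96, abs(z_drifting) > 1.96)
```

Under the null of convergence the z-score is approximately standard normal, so a 95% test rejects when its absolute value exceeds 1.96. This is why a rejection rate near 5% across Congresses is consistent with convergence.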
In addition to estimating partitions via the algorithm, I calculate the probabilities while holding
party identification fixed as the community structure to obtain estimates of $p_p^{in}$ and $p_p^{out}$. The
estimated probabilities for the 45th to 111th Senates, both for observed communities and party identification,
are shown in Figure 1.17 The graphs exhibit the trends that are expected if Congress has
become more polarized: $p_p^{in}$ and $p_p^{out}$ were less extreme than their voting bloc counterparts $p_c^{in}$ and
$p_c^{out}$ for most of the 20th century but become indistinguishable in more recent Congresses.18 However,
it is important to note that such convergence is not unprecedented, as Congresses in the late
1800s exhibit similar characteristics, although the measurements are more volatile from Congress
to Congress. This may be partially explained by smaller numbers of representatives taking smaller
16 This is not evidence that the method is not working. The algorithm is supposed to converge to a distribution of likely partitions; if there are several partitions that are nearly equally likely, the algorithm will suggest these partitions more frequently. This is most likely due to the application not perfectly fitting the assumptions that were satisfied by construction in Section 4; for example, the potential number of communities is not set to be two.
17 The 45th Congress is considered by scholars as the beginning of the modern period of Congress, coinciding with the end of Reconstruction. The 111th Congress is the most recent for which data is fully available.
18 Formal statistical tests are possible using the highest probability density intervals for the estimated probabilities. However, the intervals are so narrow that statistically significant differences persist across almost all Congresses. These estimates are available upon request.
numbers of votes, resulting in additional sampling error. Figure 1 also reinforces the idea that $p_c^{in}$
and $p_c^{out}$ remain relatively stable, with the exception of an apparent increase in $p_c^{out}$ from the 70th to
the 100th Congresses.
Figure 2 shows a simple measure of polarization, $p^{in} - p^{out}$, based on the estimated probabilities
for both party and voting bloc. This reinforces that the period from the 70th to the 100th
Congresses was unusual, as party-based polarization is clearly lower than
structural (voting bloc) polarization. This graph also indicates that, in addition to party polarization
more closely aligning with the observed community polarization, structural polarization is
reaching unprecedented levels. Beyond the probability estimation, I am also able to use Bayesian
model comparison to test the null hypothesis that the party partition is the true community structure
against the alternative that the estimated community structure is correct. I am able to strongly reject
the null in all but two Senates (the 59th and the 104th).
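The polarization measure can be illustrated with a rough frequency analogue. The sketch below is not the posterior estimation used in this paper: it assumes every pair of legislators faced the same number of votes, treats the within-partition and across-partition agreement rates as crude stand-ins for $p^{in}$ and $p^{out}$, and uses a hypothetical function name and toy data.

```python
import numpy as np

def polarization(A, labels, n_votes):
    """Crude frequency analogue of p_in, p_out and p_in - p_out.

    A:       symmetric agreement-count matrix (A[i, j] = common votes).
    labels:  group label per legislator (party or estimated voting bloc).
    n_votes: number of votes each pair could have agreed on (assumed equal
             for all pairs, an illustrative simplification).
    """
    A = np.asarray(A, dtype=float)
    labels = np.asarray(labels)
    i, j = np.triu_indices(len(labels), k=1)   # each unordered pair once
    same = labels[i] == labels[j]
    rates = A[i, j] / n_votes                  # pairwise agreement rates
    p_in = rates[same].mean()                  # within-group agreement
    p_out = rates[~same].mean()                # across-group agreement
    return p_in, p_out, p_in - p_out

# Toy network: four legislators, ten votes, two groups (hypothetical data).
A_example = [
    [0, 9, 2, 3],
    [9, 0, 3, 2],
    [2, 3, 0, 8],
    [3, 2, 8, 0],
]
p_in, p_out, gap = polarization(A_example, labels=[0, 0, 1, 1], n_votes=10)
print(p_in, p_out, gap)
```

Running the same function once with party labels and once with the estimated voting blocs gives the two series compared in Figure 2, up to the difference between these raw frequencies and the Bayesian estimates.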
6 Extensions and Conclusion
This work contributes a Bayesian community detection algorithm and provides an application to
Congressional voting patterns. While this specific method is limited to applications where com-
munity structures and probabilities of link formation have an intuitive interpretation, the general
framework could be extended in several useful ways. Relaxing the assumption that probabilities
of link formation are identical across communities can provide additional insight into many net-
work questions. In the context of this work, a common explanation for increased polarization in
Congress is the Republican party moving further to the right. If this is the case, we would observe
$p^{in}$ growing and $p^{out}$ decreasing only for the voting bloc that most closely aligns with the Republican
party. Methodologically, several extensions seem promising. First, the Bayesian framework established
in this paper is easily generalized to a hierarchical framework that estimates network-level
parameters and individual parameters simultaneously. If it is possible to establish identification,
this process allows estimation of individual-level models of network formation while formally accounting
for the endogenous nature of individual decisions on network structure. A second avenue
of future study would be to examine the properties of communities themselves. Communities are
likely to be determined both by observable attributes of the nodes and by unobservable attributes
of the nodes; therefore, the community to which a node belongs provides information about nodes
that is unobservable in other contexts.
References
Banerjee, Onureena, Laurent El Ghaoui, and Alexandre D’Aspremont, “Model Selection Through Sparse Maximum Likelihood Estimation for Multivariate Gaussian or Binary Data,” Journal of Machine Learning Research, 2008, 9 (3), 485–516.
Bech, Morten L., Carl T. Bergstrom, Rodney J. Garratt, and Martin Rosvall, “Mapping change in the federal funds market,” Technical Report 2011.
Chapman, James and Yinan Zhang, “Estimating the Structure of the Payment Network in the LVTS: An Application of Estimating Communities in Network Data,” 2010.
Cho, Wendy K. Tam and James H. Fowler, “Legislative Success in a Small World: Social Network Analysis and the Dynamics of Congressional Legislation,” The Journal of Politics, 2010, 72 (01), 124–135.
Copic, Jernej, Matthew O. Jackson, and Alan Kirman, “Identifying Community Structures from Network Data via Maximum Likelihood Methods,” The B.E. Journal of Theoretical Economics, 2009, 9.
Geweke, J. et al., Evaluating the accuracy of sampling-based approaches to the calculation of posterior moments, Federal Reserve Bank of Minneapolis, Research Department, 1991.
Harward, Brian M., “The Calculus of Cosponsorship in the U.S. Senate,” Legislative Studies Quarterly, February 2010, 35.
Jain, S. and R.M. Neal, “A split-merge Markov chain Monte Carlo procedure for the Dirichlet process mixture model,” Journal of Computational and Graphical Statistics, 2004, 13 (1), 158–182.
Kolar, M., A. Le Song, and E. Xing, “Estimating time-varying networks,” Ann. Appl. Stat., 2010, 4 (1), 94–123.
Leskovec, Jure, Kevin J. Lang, and Michael Mahoney, “Empirical comparison of algorithms for network community detection,” in “Proceedings of the 19th international conference on World wide web” WWW ’10 ACM New York, NY, USA 2010, pp. 631–640.
Mørup, Morten and Mikkel N Schmidt, “Bayesian community detection,” Neural computation, September 2012, 24 (9), 2434–56.
Newman, M.E.J., “Detecting community structure in networks,” The European Physical Journal B - Condensed Matter and Complex Systems, 2004, 38, 321–330. 10.1140/epjb/e2004-00124-y.
Poole, K.T. and H. Rosenthal, Ideology and Congress, Transaction Publishers, 2007.
Porter, Mason A., Peter J. Mucha, M. E. J. Newman, and Casey M. Warmbrand, “A network analysis of committees in the U.S. House of Representatives,” Proceedings of the National Academy of Sciences of the United States of America, 2005, 102 (20), 7057–7062.
Fortunato, Santo, “Community detection in graphs,” Physics Reports, 2010, 486 (3-5), 75–174.
Theriault, S.M., Party polarization in Congress, Cambridge University Press, 2008.
Waugh, Andrew S., Liuyi Pei, James H. Fowler, Peter J. Mucha, and Mason A. Porter, “Party Polarization in Congress: A Network Science Approach,” Forthcoming, 2009.
Wetherilt, Anne, Peter Zimmerman, and Kimmo Soramaki, “The sterling unsecured loan market during 2006-08: insights from network theory,” Bank of England working papers 398, Bank of England July 2010.
A Tables and Figures
Table 1: Benchmark Convergence Tests: Varying Iterations
                                 Number of Iterations
                                 100     500     1000    5000    10000   25000
Correct Community                0.46    1.00    1.00    1.00    1.00    1.00
$p^{in}$ in HPD                  0.98    0.96    0.92    0.91    0.95    0.97
$p^{out}$ in HPD                 0.98    0.96    0.92    0.91    0.95    0.97
$p^{in}$ Geweke Test             0.76    0.04    0.02    0.05    0.04    0.09
$p^{out}$ Geweke Test            1.00    0.05    0.02    0.11    0.02    0.04
Optimal Community Frequency      0.08    1.00    1.00    1.00    1.00    1.00
Unanimous Community Frequency    0.00    1.00    1.00    1.00    1.00    1.00
Table 2: Senate Convergence Tests
                                 Proportion of 68 Senates
$p^{in}$ Geweke Test             0.05
$p^{out}$ Geweke Test            0.07
Optimal Community Frequency      0.66
Unanimous Community Frequency    0.60
Figure 1: Probabilities of Link Formation
[Figure not reproduced. Panel title: "Probability of Voting Within Community and Outside Community". Horizontal axis: Congress (50 to 110). Vertical axis: Probability (0.0 to 1.0). Series: $p_c^{in}$, $p_c^{out}$, $p_p^{in}$, $p_p^{out}$.]
Figure 2: Senate Polarization
[Figure not reproduced. Panel title: "Polarization in Congress over Time". Horizontal axis: Congress (50 to 110). Vertical axis: $p^{in} - p^{out}$ (0.0 to 0.6). Series: $p_c^{in} - p_c^{out}$, $p_p^{in} - p_p^{out}$.]