the essential role of empirical validation in legislative … · 2020-03-04 · kosuke imai...
TRANSCRIPT
The Essential Role of Empirical Validation in Legislative Redistricting Simulation
Kosuke Imai
Harvard University
Quantitative Redistricting and Gerrymandering Conference
Duke University
March 3, 2020
Joint work with Ben Fifield, Jun Kawahara (Kyoto) and Chris Kenny
Kosuke Imai (Harvard) Validation of Redistricting Simulation March 3, 2020 1 / 21
Motivation
Congressional redistricting as a key element of American democracy
Influenced by political motives and partisan ends
Early proposals in the 1960s: automated simulation as a transparent, objective, and unbiased method for redistricting
Resurgence of simulation methods over the last 20 years
  increasing availability of granular data about voters
  recent advances in computing capability and methods
Starting to be used in courts (e.g., MO, NC, and OH)
Can simulation methods actually yield a representative sample of all possible redistricting plans that satisfy the required constraints?
Overview
Insufficient efforts have been made to empirically validate redistricting simulation methods
Eric Lander in his amicus brief to the Supreme Court:
  "With modern computer technology, it is now straightforward to generate a large collection of redistricting plans that are representative of all possible plans that meet the State’s declared goals (e.g., compactness and contiguity)"
Some used the 25-precinct validation set of Fifield et al. (forthcoming)
We apply the computational method of Kawahara et al. (2017) to:
  1. efficiently enumerate all possible redistricting plans
  2. independently and uniformly sample from this population
Scales to a state with a couple of hundred geographical units
  1. enumeration: a large number of small validation sets
  2. sampling: a small number of medium-size validation sets
Redistricting as a Graph-partitioning Problem
[Figure: (a) Original map with six units v1 to v6; (b) Graph representation with vertices v1 to v6 and edges e1 to e7; (c) Induced subgraphs formed by the retained edges e2, e4, e6, and e7]
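A minimal Python sketch of this graph view. The slide text does not list the endpoints of e1 to e7, so the edge list below is an assumption made for illustration, and the function names are hypothetical. The sketch recovers districts as the connected components left after keeping only the retained edges, then checks that the plan is induced, meaning an edge is retained exactly when both of its endpoints fall in the same district.

```python
from collections import defaultdict

# Hypothetical endpoints for e1..e7 (assumed; not given in the slides).
EDGES = {
    "e1": (1, 2), "e2": (1, 3), "e3": (2, 3), "e4": (2, 4),
    "e5": (3, 5), "e6": (4, 6), "e7": (5, 6),
}

def districts_from_edges(n_vertices, edges, retained):
    """Return the connected components formed by the retained edges."""
    adjacency = defaultdict(set)
    for name in retained:
        u, v = edges[name]
        adjacency[u].add(v)
        adjacency[v].add(u)
    seen, districts = set(), []
    for start in range(1, n_vertices + 1):
        if start in seen:
            continue
        # Depth-first search collects one district.
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(adjacency[node] - component)
        seen |= component
        districts.append(sorted(component))
    return districts

def is_induced(edges, retained, districts):
    """True if an edge is retained exactly when its endpoints share a district."""
    label = {v: i for i, d in enumerate(districts) for v in d}
    return all((label[u] == label[v]) == (name in retained)
               for name, (u, v) in edges.items())

plan = {"e2", "e4", "e6", "e7"}
districts = districts_from_edges(6, EDGES, plan)
print(districts)                            # [[1, 3], [2, 4, 5, 6]]
print(is_induced(EDGES, plan, districts))   # True
```

Under the assumed edge list, the retained set {e2, e4, e6, e7} yields two districts, which is consistent with panel (c).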
Zero-suppressed Binary Decision Diagram (ZDD)
A data structure to efficiently represent a family of sets (Minato, 1993)
ZDD as a directed acyclic graph
  root node e1: no incoming arc, represents an edge of the original graph
  terminal nodes: no outgoing arc, no correspondence to an edge of the original graph
    1. 0-terminal (0)
    2. 1-terminal (1)
  every non-terminal node has two outgoing arcs
    1. 0-arc: dashed arc ⇢ removes edge
    2. 1-arc: solid arc → retains edge
One-to-one correspondence:
  Graph partition: {e2, e4, e6, e7}
  The set of edges that belong to a directed path from the root node to the 1-terminal node and have an outgoing 1-arc
  e1 ⇢ e2 → e3 ⇢ e4 → e5 ⇢ e6 → e7 → 1
[Figure: ZDD for the example graph, with one level of nodes for each edge e1 to e7 and the 0-terminal and 1-terminal nodes at the bottom]
Construction of ZDD
Starting with root node e1, create one outgoing 0-arc and one outgoing 1-arc from one node e_ℓ to the next node e_{ℓ+1}
Store the number of determined connected components, or dcc, for each node: dcc = 1 for e5
  e1 → e2 ⇢ e3 ⇢ e4 ⇢ e5
  {v1, v2} forms a district regardless of whether or not e5 is retained
Create an arc into the 0-terminal node if dcc > p at any node or dcc < p at the final node
Keep track of the connected component number for each vertex of the original graph
  1. Start by setting comp[v_i] ← i for i = 1, 2, ..., n
  2. If we retain an edge between v_i and v_i′, then set comp[v_j] ← min{comp[v_i], comp[v_i′]} for any v_j with comp[v_j] = max{comp[v_i], comp[v_i′]}
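The component-number update in steps 1 and 2 can be sketched in a few lines of Python. The helper name `retain_edge` is hypothetical, and the two retained edges in the usage example assume that e2 joins v1 and v3 and that e4 joins v2 and v4, which the slides do not state explicitly.

```python
def retain_edge(comp, u, v):
    """Merge the components of u and v, keeping the smaller component number."""
    low, high = sorted((comp[u], comp[v]))
    for w in comp:
        if comp[w] == high:
            comp[w] = low
    return comp

# Step 1: comp[v_i] <- i for i = 1, ..., n
comp = {i: i for i in range(1, 7)}

# Step 2: retain two edges (endpoints are assumed for illustration).
retain_edge(comp, 1, 3)
retain_edge(comp, 2, 4)
print(comp)   # {1: 1, 2: 2, 3: 1, 4: 2, 5: 5, 6: 6}
```

Every vertex carrying the larger number is relabeled, so a whole component moves at once rather than only the endpoint of the retained edge.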
The Frontier-based Search
Frontier: the set of vertices of the original graph that are incident to both a processed edge and an unprocessed edge
  F_0 = F_m = ∅ where m is the total number of edges
Frontier can be used to determine a connected component number
  suppose there exists a vertex v such that v ∈ F_{ℓ−1} but v ∉ F_ℓ
  if no other vertex in F_ℓ shares its connected component number, then comp[v] is determined and dcc is incremented by 1
[Figure: frontier-based search on the example graph, showing the vertex labels and the frontier after each processed edge]
(a) Process edge e1; dcc = 0 (frontier F1; vertex labels shown: 1, 2, 3, 4, 5, 6)
(b) Process edge e2; dcc = 0 (frontier F2; vertex labels shown: 1, 2, 1, 4, 5, 6)
(c) Process edge e3; dcc = 0 (frontier F3; vertex labels shown: 1, 2, 1, 4, 5, 6)
(d) Process edge e4; dcc = 0 (frontier F4; vertex labels shown: 1, 2, 1, 2, 5, 6)
(e) Process edge e5; dcc = 1 (frontier F5; vertex labels shown: 1, 2, 1, 2, 5, 6)
(f) Process edge e6; dcc = 1 (frontier F6; vertex labels shown: 1, 2, 1, 2, 5, 2)
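The frontier bookkeeping in panels (a) to (f) can be traced with a short Python sketch. The edge endpoints are an assumption (the slides do not list them), and `frontiers` and `trace_path` are hypothetical names. With these assumed endpoints, following the path that removes e1, retains e2, removes e3, retains e4, removes e5, and retains e6 gives the same dcc values as the panel captions.

```python
# Hypothetical endpoints for e1..e7 (assumed; the slides do not list them).
EDGES = [(1, 2), (1, 3), (2, 3), (2, 4), (3, 5), (4, 6), (5, 6)]

def frontiers(edges):
    """Return F_0, ..., F_m for the given edge ordering."""
    result = []
    for l in range(len(edges) + 1):
        processed = {v for e in edges[:l] for v in e}
        unprocessed = {v for e in edges[l:] for v in e}
        result.append(processed & unprocessed)
    return result

def trace_path(edges, arcs):
    """Follow one path (arcs[l-1] == 1 retains e_l); return comp and dcc after each edge."""
    comp = {v: v for e in edges for v in e}
    F = frontiers(edges)
    dcc, history = 0, []
    for l, ((u, v), arc) in enumerate(zip(edges, arcs), start=1):
        if arc == 1:
            low, high = sorted((comp[u], comp[v]))
            for w in comp:
                if comp[w] == high:
                    comp[w] = low
        # Vertices that drop out of the frontier at this step; the endpoints are
        # included so a vertex whose only edge is e_l is also handled.
        leaving = (F[l - 1] | {u, v}) - F[l]
        # A component is determined once none of its vertices remain in F_l.
        closed = {comp[w] for w in leaving} - {comp[w] for w in F[l]}
        dcc += len(closed)
        history.append(dcc)
    return comp, history

comp, history = trace_path(EDGES, [0, 1, 0, 1, 0, 1, 1])
print(history)   # [0, 0, 0, 0, 1, 1, 2]
print(comp)      # {1: 1, 2: 2, 3: 1, 4: 2, 5: 2, 6: 2}
```

The sketch only tracks dcc along a single path. It does not build the diagram or redirect invalid paths to the 0-terminal node.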
Node Merge for Computational Efficiency
Only required information = connectivity of vertices in F_{ℓ−1}
Merge multiple paths at node e_ℓ if dcc and the frontier F_{ℓ−1} are identical after renumbering to eliminate gaps
[Figure: two paths into node e3 with frontier F2]
(a) Path e1 → e2 ⇢ e3 (vertex labels shown: 2, 2, 3, 4, 5, 6)
(b) Path e1 ⇢ e2 → e3 (vertex labels shown: 3, 2, 3, 4, 5, 6)
Merging is critical: 8 × 8 lattice into 2 districts ≈ 1.2 × 10^11 partitions
Can encode the population parity and other information into ZDD, but this prevents merging and hence does not scale
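One way to decide whether two partial paths can be merged is to compare a canonical key built from dcc and the component numbers of the frontier vertices. The Python sketch below relabels those numbers by order of first appearance. That relabeling scheme is an assumption chosen for illustration, `state_key` is a hypothetical name, and the component numbers in the example follow from assumed edge endpoints (e1 joins v1 and v2, e2 joins v1 and v3).

```python
def state_key(dcc, frontier, comp):
    """Canonical key: dcc plus frontier component numbers relabeled 0, 1, 2, ..."""
    relabel, labels = {}, []
    for v in sorted(frontier):
        labels.append(relabel.setdefault(comp[v], len(relabel)))
    return dcc, tuple(labels)

F2 = {2, 3}  # frontier after processing e1 and e2 (under the assumed endpoints)

# Path (a): retain e1, remove e2.  Path (b): remove e1, retain e2.
comp_a = {1: 1, 2: 1, 3: 3, 4: 4, 5: 5, 6: 6}
comp_b = {1: 1, 2: 2, 3: 1, 4: 4, 5: 5, 6: 6}
print(state_key(0, F2, comp_a) == state_key(0, F2, comp_b))   # True

# Retaining both e1 and e2 puts v2 and v3 in one component, so the key differs.
comp_c = {1: 1, 2: 1, 3: 1, 4: 4, 5: 5, 6: 6}
print(state_key(0, F2, comp_c))   # (0, (0, 0))
```

Paths with equal keys have the same set of valid continuations, so they can share a single node at e_ℓ.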
Enumeration and Independent Sampling
Every path from the root node to the 1-terminal node has a one-to-one correspondence to a graph partition
Enumerate all the paths
  1. start with the 1-terminal node
  2. count the number of unique paths at each node
  3. move upwards until the root node is reached
Independent sampling (Knuth 2011)
  Let c(e_ℓ) be the number of paths from the 1-terminal node to node e_ℓ
  Let e_{ℓ0} and e_{ℓ1} be the nodes pointed to by the 0-arc and 1-arc of e_ℓ
  Store c(e_ℓ) for each node e_ℓ
  Conduct random sampling by starting with the root node and choosing node e_{ℓ1} with probability c(e_{ℓ1})/{c(e_{ℓ1}) + c(e_{ℓ0})}
  Probability of reaching the 0-terminal node is zero
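The counting and sampling steps can be sketched in Python on a toy decision diagram. The diagram below is hypothetical and is not the one from the slides. Each non-terminal node maps to its 0-arc child and its 1-arc child, and the names `DIAGRAM`, `count`, and `sample_path` are illustrative.

```python
import random
from functools import lru_cache

# Toy diagram (hypothetical): node -> (child via 0-arc, child via 1-arc).
# "0" and "1" are the terminal nodes.
DIAGRAM = {
    "a": ("b", "c"),
    "b": ("0", "d"),
    "c": ("d", "d"),
    "d": ("1", "1"),
}
ROOT = "a"

@lru_cache(maxsize=None)
def count(node):
    """c(node): number of paths from this node down to the 1-terminal."""
    if node == "1":
        return 1
    if node == "0":
        return 0
    zero, one = DIAGRAM[node]
    return count(zero) + count(one)

def sample_path(rng):
    """Draw one root-to-1-terminal path uniformly at random."""
    node, arcs = ROOT, []
    while node not in ("0", "1"):
        zero, one = DIAGRAM[node]
        p_one = count(one) / (count(one) + count(zero))
        arc = 1 if rng.random() < p_one else 0
        arcs.append(arc)
        node = one if arc else zero
    return tuple(arcs), node

print(count(ROOT))                      # 6
print(sample_path(random.Random(0)))    # one of the 6 valid paths, ending at "1"
```

Because a child with count zero is chosen with probability zero, the walk never enters the 0-terminal node, and each of the six valid paths is drawn with probability 1/6. For a large diagram, an iterative bottom-up pass would avoid deep recursion.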
Scalability of the ZDD Construction Algorithm
Randomly generate contiguous subsets of the New Hampshire map (327 precincts and 2 districts) that vary in size {40, 80, ..., 200}
Number of districts: 2, 5, or 10
Cluster with 530 nodes, 48 cores and 180 GB of RAM per node
[Figure: Runtime Scalability. x-axis: Number of Precincts in Map (40 to 200); y-axis: Runtime (hours); panels: 2 Districts, 5 Districts, 10 Districts; legend: Finished Building ZDD, Ran Out of Memory (180GB Limit)]
[Figure: Memory Usage Scalability. x-axis: Number of Precincts in Map (40 to 200); y-axis: Memory Usage (GB); reference line: Memory Limit]
[Figure: Memory Usage Scales with Frontier Size. x-axis: Max Frontier Size of Graph (5 to 20); y-axis: Memory Usage (GB); reference line: Memory Limit]
Validation through Enumeration
70-precinct validation set from Florida (much larger than the 25-precinct validation set)
8 hours on MacBook Pro laptop with 16GB and 2.8 GHz processor
Building ZDD took less than a second: 44 million valid plans
[Figure: 70-Precinct Validation Map. x-axis: Longitude; y-axis: Latitude; shading: Population (0 to 10000)]
[Figure: Distribution of Population Parity on 70-Precinct Map. x-axis: Distance from Population Parity (0.00 to 0.20); y-axis: Number of Plans]
  < 0.01 Parity: 717060
  < 0.05 Parity: 3678453
  < 0.10 Parity: 7866708
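The slides plot distance from population parity without spelling out the formula. A common definition, assumed here, is the largest relative deviation of any district's population from the ideal district population (total population divided by the number of districts). The Python sketch below uses that assumed definition with made-up precinct populations, and the function names are hypothetical.

```python
def parity_distance(precinct_pop, assignment, n_districts):
    """Largest relative deviation of a district from the ideal population (assumed definition)."""
    totals = [0] * n_districts
    for pop, district in zip(precinct_pop, assignment):
        totals[district] += pop
    ideal = sum(precinct_pop) / n_districts
    return max(abs(total / ideal - 1) for total in totals)

def count_within(precinct_pop, plans, n_districts, threshold):
    """Number of plans whose distance from parity is below the threshold."""
    return sum(parity_distance(precinct_pop, plan, n_districts) < threshold
               for plan in plans)

# Made-up populations for six precincts and three candidate 2-district plans.
pops = [100, 120, 80, 110, 90, 100]
plans = [
    [0, 0, 0, 1, 1, 1],   # 300 vs 300
    [0, 0, 1, 1, 1, 1],   # 220 vs 380
    [0, 1, 0, 1, 0, 1],   # 270 vs 330
]
print([round(parity_distance(pops, plan, 2), 3) for plan in plans])   # [0.0, 0.267, 0.1]
print(count_within(pops, plans, 2, 0.05))                             # 1
```

Applied to an enumerated set of plans, a tally like `count_within` is the kind of calculation behind counts such as those reported for the 1%, 5%, and 10% thresholds.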
Performance of RSG and MCMC Algorithms
Evaluate the performance of two common algorithms:
  1. Random-seed-and-grow (RSG): Cirincione et al. (2000); Chen and Rodden (2013)
  2. Markov chain Monte Carlo (MCMC): Fifield et al. (2014); Mattingly and Vaughn (2014)
Implemented via the redist package
  tempering/discarding/reweighting for population constraints
Republican dissimilarity index as a test statistic
[Figure: density of the Republican Dissimilarity Index (x-axis, 0.0 to 0.3) for Truth, MCMC, and RSG; panels: No Equal Population Constraint, 5% Equal Population Constraint, 1% Equal Population Constraint]
Many Small Validation Maps
Robustness to many different maps
No separate tuning or convergence diagnostics for each map
Setup
  1. 200 independent 25-precinct sets from Florida
  2. 5 million iterations, taking every 500th draw
  3. conduct the Kolmogorov-Smirnov test and record the p-value
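Steps 2 and 3 of the setup can be sketched in plain Python. The slides do not say which variant of the Kolmogorov-Smirnov test is used, so this sketch assumes a two-sample comparison between simulated and reference values of the test statistic, with the standard asymptotic approximation for the p-value. The function names are hypothetical.

```python
import bisect
import math

def thin(draws, every=500):
    """Keep every `every`-th draw of a chain (the 500th, 1000th, ...)."""
    return draws[every - 1::every]

def ks_two_sample(xs, ys):
    """Two-sample KS statistic and an asymptotic p-value (approximation)."""
    xs, ys = sorted(xs), sorted(ys)
    n, m = len(xs), len(ys)
    # Largest gap between the two empirical CDFs over all observed values.
    d = max(abs(bisect.bisect_right(xs, t) / n - bisect.bisect_right(ys, t) / m)
            for t in set(xs) | set(ys))
    lam = math.sqrt(n * m / (n + m)) * d
    if lam < 0.2:
        return d, 1.0   # the series below is numerically 1 in this range
    p = 2 * sum((-1) ** (j - 1) * math.exp(-2 * j * j * lam * lam)
                for j in range(1, 101))
    return d, min(max(p, 0.0), 1.0)

chain = list(range(1, 2001))
print(thin(chain))                            # [500, 1000, 1500, 2000]
print(ks_two_sample([1, 2, 3], [1, 2, 3]))    # (0.0, 1.0)
```

If the simulated distribution matches the reference, the p-values collected across many maps should look roughly uniform on [0, 1], which is what a quantile-quantile plot against theoretical quantiles checks. The asymptotic p-value is only approximate for small or heavily tied samples.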
[Figure: quantile-quantile plot for MCMC and RSG. x-axis: Theoretical Quantile (0.00 to 1.00); y-axis: Empirical Quantile (0.00 to 1.00); panels: No Equal Population Constraint, 20% Equal Population Constraint, 10% Equal Population Constraint]
Validation through Independent Uniform Sampling
Even if we can build ZDD, enumeration is computationally intensive
Random sampling addresses this issue via Monte Carlo approximation
Iowa map: 4 districts
  99 counties; no county is supposed to be split
  500 million independent draws
  actual map has a population parity of less than 0.0001
[Figure: Congressional Districts of Iowa (2016), showing Districts 1 to 4 with Des Moines and Cedar Rapids marked; shading: Population (Thousands), 100 to 400]
[Figure: Distribution of Population Parity on Iowa Map. x-axis: Distance from Population Parity (0.00 to 0.15); y-axis: Share of Sampled Plans]
  < 0.01 Parity: 0.00006%
  < 0.05 Parity: 0.00723%
  < 0.10 Parity: 0.05746%
Performance for the Iowa Validation Map
MCMC:
  8 chains with 250K iterations each
  Gelman-Rubin diagnostic indicates convergence after 30K iterations
  5% parity: 630K maps
  1% parity: 93K maps
RSG: 2 million independent draws
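For reference, the Gelman-Rubin diagnostic compares between-chain and within-chain variance of a scalar summary. The Python sketch below uses the textbook potential scale reduction factor on synthetic chains. It is not the redist implementation, and the chain values are simulated for illustration only.

```python
import random
from statistics import mean, variance

def gelman_rubin(chains):
    """Potential scale reduction factor for equal-length chains of a scalar statistic."""
    n = len(chains[0])
    within = mean(variance(c) for c in chains)            # W: average within-chain variance
    between = n * variance([mean(c) for c in chains])     # B: scaled variance of chain means
    pooled = (n - 1) / n * within + between / n
    return (pooled / within) ** 0.5

rng = random.Random(0)
# Four synthetic chains that agree, and four that are stuck around different values.
mixed = [[rng.gauss(0, 1) for _ in range(1000)] for _ in range(4)]
stuck = [[rng.gauss(shift, 1) for _ in range(1000)] for shift in (0, 0, 3, 3)]
print(gelman_rubin(mixed) < 1.05)   # True
print(gelman_rubin(stuck) > 1.1)    # True
```

Values close to 1 suggest the chains are sampling the same distribution. A value well above 1, as for the `stuck` chains, signals that more iterations or better mixing are needed.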
[Figure: density of the Republican Dissimilarity Index (x-axis, 0.00 to 0.09) for Truth, MCMC, and RSG; panels: No Equal Population Constraint, 5% Equal Population Constraint, 1% Equal Population Constraint]
A New 250-Precinct Validation Map
[Figure: 250-Precinct Validation Map. x-axis: Longitude; y-axis: Latitude; shading: Population (0 to 15000)]
[Figure: Distribution of Population Parity on 250-Precinct Map. x-axis: Distance from Population Parity (0.00 to 0.15); y-axis: Share of Sampled Plans]
  < 0.01 Parity: 1.95%
  < 0.05 Parity: 10.08%
  < 0.10 Parity: 21.78%
The largest validation map taken from Florida
  2 districts
  total number of contiguous plans = 539
We uniformly sample 100 million partitions
Empirical Performance
MCMC:
  8 chains with 500K iterations each
  Gelman-Rubin diagnostic indicates convergence after 75K iterations
  5% parity: 3 million maps
  1% parity: 1.9 million maps
RSG: 4 million independent draws
[Figure: density of the Republican Dissimilarity Index (x-axis, 0.00 to 0.25) for Truth, MCMC, and RSG; panels: No Equal Population Constraint, 5% Equal Population Constraint, 1% Equal Population Constraint]
Concluding Remarks
Increasing use of computational methods to generate redistricting plans in legislatures and determine their legality in courts
Scientific community must empirically validate the performance of various proposed simulation methods
We apply the recently developed enumeration method
  1. more realistic validation maps including Iowa and a 250-precinct map
  2. an MCMC algorithm significantly outperforms an RSG algorithm
  3. the algorithm and validation maps will be made available
Ongoing work:
  1. consequences of various constraints other than contiguity and population parity
  2. further scaling up the enumeration algorithm
References
Fifield, Imai, Kawahara, and Kenny. (2020). “The Essential Role of Empirical Validation in Legislative Redistricting Simulation.” Working Paper.
Fifield, Higgins, Imai, and Tarr. “Automated Redistricting Simulation Using Markov Chain Monte Carlo.” Journal of Computational and Graphical Statistics, Forthcoming.
redist: Markov Chain Monte Carlo Methods for Redistricting Simulation. Available at CRAN.
https://imai.fas.harvard.edu