Which Police Departments Want Reform? Barriers to Evidence-Based Policymaking

Samantha Goerger (Princeton University), Jonathan Mummolo (Princeton University), and Sean J. Westwood (Dartmouth College)
April 3, 2020
Abstract
Political elites increasingly express interest in evidence-based policymaking, but transparent research collaborations necessary to generate relevant evidence pose political risks, including the discovery of sub-par performance and misconduct. If aversion to collaboration is non-random, collaborations may produce evidence that fails to generalize. We assess selection into research collaborations in the critical policy arena of policing by sending sincere requests to discuss research partnerships to roughly 3,000 law enforcement agencies in 48 states. A host of agency and jurisdiction attributes fail to predict affirmative responses to generic requests, alleviating concerns over generalizability. However, across two experiments, mentions of agency performance in our correspondence depressed affirmative responses—even among top-performing agencies—by roughly eight percentage points. Many agencies that indicate interest in transparent, evidence-based policymaking are likely engaging in cheap talk, and recoil once performance evaluations are made salient. These dynamics can inhibit valuable policy experimentation in many communities.
Word count: 2,228
1This study was approved by Institutional Review Boards at Princeton University and Dartmouth College. A pre-registration plan for the national experiment appears in the Online Appendix. We thank Tori Gorton, Alexandra Koskosidis, Destiny Eisenhour, Krystal Delnoce, Grace Masback and Madeleine Marr for research assistance.
The increasing availability of high-resolution data on human behavior and the develop-
ment of field experimental methods in social science have made research collaborations with
practitioners the gold standard in policy research (Cartwright and Hardie, 2012). These
partnerships—which span substantive arenas including poverty reduction (Alatas et al.,
2012), political advertising (Gerber et al., 2011), and health care (Litvack and Bodart,
1993)—offer numerous advantages, simultaneously leveraging access to otherwise restricted
data, real-world settings, and rigorous experimental designs (Gerber and Green, 2012).
But like any approach to research, partnerships with practitioners have drawbacks. Chief
among them is the fact that the very organizations being studied decide whether research
can proceed, and there are strong reasons to suspect this decision is associated with out-
comes scholars wish to understand, like agency performance. Put differently, while many
political elites have recently instituted calls for “evidence-based policymaking” (Orszag and
Nussle, 2017), such declarations may be cheap talk. The political risks associated with al-
lowing outside experts to scrutinize organizational practices—e.g. the discovery of sub-par
performance, or even misconduct—are substantial, especially for poorly functioning organi-
zations (Carpenter, 2014; Moffitt, 2010). And if poorly-performing agencies are differentially
likely to decline research partnerships, the body of evidence produced by one-off research
collaborations could fail to generalize to organizations at large (Allcott, 2017).
In this study, we assess the determinants and generalizability of research collaborations
in the important policy domain of policing. A long history of allegations of racial bias
(Alexander, 2010), a recent string of high-profile police-involved killings (Edwards, Lee and
Esposito, 2019), and growing concern over the use of excessive force and militarized policing
(Gunderson et al., 2019; Knox, Lowe and Mummolo, Forthcoming) have spurred numerous
collaborations between academics and law enforcement agencies to detect inequity in police
procedures (e.g. Goff et al., 2016) and test the efficacy of proposed reforms (e.g. Yokum,
Ravishankar and Coppock, 2019). But the highly politicized nature of policing suggests
many agencies will be reluctant to partner with researchers, and that the ones who do are
unrepresentative of the more than 18,000 law enforcement agencies in the United States.
To evaluate the severity and nature of selection, we conducted two field experiments
in which we sent sincere2 offers to roughly 3,000 local police and sheriff’s departments to
discuss a potential research collaboration with scholars at two East Coast universities, and
analyzed variation in responses. This design allowed us to assess both the correlates and
causes of willingness to collaborate. Merging data on responses with records of jurisdiction
demographics, local partisanship, department personnel, and agency performance, we first
show that agencies open to discussing research collaborations are largely similar to those
that declined our invitations. This finding bolsters the validity of the collaborative research
approach, and suggests findings emanating from one-off research partnerships are plausi-
bly contributing to a generalizable body of knowledge. However, across two experiments,
including a pre-registered nationwide replication, a randomized mention of agency perfor-
mance in our communications depressed affirmative responses by roughly eight percentage
points. These negative effects hold even for top-performing agencies.
Agencies that show openness to research partnerships look broadly similar to those who
will not consider them, but the willingness to partner with academics for policy research is
not as widespread as it appears. Once discussions move from the general to the specific, and
raise the prospect of performance evaluations critical to the field testing of any new policy,
many agencies recoil. This dynamic reveals a general barrier to research collaborations that
can preclude valuable policy experimentation in many communities.
Experimental Design
We began with a study in New Jersey involving 462 agencies, paired with newly released
detailed data on the use of force (nj.com, 2019). During April and May of 2019, we contacted
police chiefs offering to collaborate on research that “aims to make both citizens and officers
safer by reducing the frequency of violence during police-citizen interactions” (see Online
2This outreach yielded over 300 affirmative responses, and we are currently following up with these
agencies to develop research partnerships.
Appendix section B2 for full text). We relied on a custom Python script to prepare and send
our messages from a dedicated institutional email. These messages contained no deception;
offers to discuss collaborations were sincere. We offered to work pro bono and cover all
research costs, and added “We are not asking for a firm commitment now” but are simply
asking whether the recipient is “interested in discussing a potential collaboration further.”
Respondents could answer (via links in email and a URL provided in print letters) yes, no,
or “I am not sure, but I would like more information.” Our primary outcome is a binary
indicator of answering “yes,” with all other responses and non-responses coded as negative
responses. If we received no response after three email attempts—spaced eight days apart—
we sent a posted letter one week after the final email.
Agencies in the N.J. study were randomly assigned to one of four conditions.3 All agen-
cies received the information above, which served as the full text for those in the control
condition. Three treatment conditions included language aimed at testing how common fea-
tures of research collaboration requests affect agency responses. One treatment cell included
a promise of confidentiality in any publication that resulted from a research partnership,
which is a common practice in such settings and which we hypothesized would increase af-
firmative responses. A second “ranking” condition included mention of agency performance:
the agency’s rank on uses of force per officer among contacted agencies. A third condi-
tion combined both the confidentiality and ranking treatments (with the order of the two
treatments randomized within the text of the email across recipients).
Following the N.J. study we deployed a second pre-registered experiment in 47 additional
states during September and October of 2019, in which we attempted to contact approxi-
mately 2,500 local police and sheriff’s departments, a sample size we chose based on a power
analysis in order to detect a possible interaction between the performance treatment and
agency rank. We randomly drew our sample from a population of roughly 7,700 agencies
that consistently report crime data to the FBI and for whom we could ascertain reliable
3See Online Appendix Section D4 for balance tests.
contact information (these criteria excluded Alaska and Illinois; see Appendix section A1
for sampling details). Roughly 60% of the U.S. population reside in these agencies’ jurisdic-
tions, according to FBI data. While we would ideally wish to sample from the entire U.S.,
the population of agencies that remains after applying these filtering criteria comprises those with
whom a productive research collaboration might plausibly occur. It would be difficult to
form collaborations with agencies that do not regularly report basic crime data, or publicize
reliable contact information. Our sample is therefore a relevant one for applied researchers.
The design of this experiment was highly similar to the N.J. study with some exceptions.
Two changes were aimed at maximizing statistical power. First, we retained only the ranking
treatment and control conditions. Second, we employed a matched pair design (Gerber and
Green, 2012), in which agencies in the same state serving roughly the same population
size were paired, and one agency was randomly assigned to treatment. Specifically, treated
agencies were told how they ranked among the roughly 2,500 agencies sampled on the share
of violent crimes “cleared” between 2013 and 2017 (crimes in which an arrest was made and a charge
was filed)—a salient statistic for police agencies, and one on which journalists often focus
(e.g. Madhani, 2018). The use of two different performance metrics across these experiments
helps to ensure the robustness of any observed treatment effects.
We hypothesized that mentions of agency performance would filter out “cheap talk”
and depress affirmative responses generally, since making performance evaluations salient
could cause agencies to consider the political risks associated with research partnerships.
However, we anticipated that this negative effect would attenuate with agency rank, since
agencies informed they were performing well relative to peers may be less likely to recoil at
the spectre of performance evaluations. Following both experiments, all contacted agencies
were sent a debrief message informing them of the purpose of the experiment, and reinforcing
that our messages contained no deception.4
4Communication between contacted agencies could contaminate our results. This is less likely in the national experiment given its geographic spread. To the extent cross-talk occurred, it likely homogenized responses across treatment conditions and attenuated effects.
Figure 1: Coverage of Field Experiments. Number of contacted agencies in each U.S. state. Over 400 agencies were contacted in the N.J. study. [Map omitted; shading denotes the number of sampled agencies, from 0 to 400.]
Combined, contacted agencies serve jurisdictions that are home to close to 80 million
people according to FBI data (see Figure 1), approximately one quarter of the U.S. popu-
lation. These include large metropolitan police forces, mid-sized agencies and small rural
departments employing just a handful of officers.5
Collaborating Agencies Are Representative
To test whether willingness to collaborate systematically varies with agency attributes,
we merged agency-level data on: crime, fatal officer-involved shootings between 2015 and
2018, personnel, and jurisdiction demographics.6 In total, 319 agencies indicated willingness
to discuss a potential research collaboration—approximately 11% of our combined sample
of 2,944 agencies across the two experiments—238 agencies responded negatively to our
message, and 2,387 agencies did not reply at all.
We estimated separate bivariate linear regressions predicting affirmative response as a
function of each covariate, weighted by jurisdiction population. We correct resulting p-
values on all regression coefficients using the Benjamini-Hochberg method (Benjamini and
Hochberg, 1995), though this adjustment makes little difference to our overall conclusions.
To avoid conflating the predictive value of a regressor with the effect of our randomized
interventions, we confine this analysis to the roughly 1,400 observations assigned to control,
5Following our IRB protocol, we map agencies at the state-level to maintain anonymity.
6See Online Appendix Section C3 for details on data sources.
Figure 2: Agency and jurisdiction attributes do not predict affirmative responses. [Dot plot omitted. Vertical axis: percentage-point change in agreement from a one-standard-deviation increase in the predictor (−10 to 10). Predictors span local partisanship (Trump two-party vote share), crime and clearance rates, officer staffing and officer safety measures, and jurisdiction demographics. All Benjamini-Hochberg corrected p-values exceed .05.]
of which 201 agencies responded affirmatively.7
Figure 2 displays the predicted change in the probability of an affirmative response given
a one-standard-deviation increase in each predictor. Across roughly 50 tests, no covariate
was significantly associated with responses. Overall, our analysis indicates selection bias—at
least at the initial point of contact from researchers—poses a minimal risk in this setting.
Some may question whether we are missing meaningful associations in this analysis due
to a lack of statistical power. But while additional data may allow us to detect correlations,
other features of the results belie the existence of meaningful relationships. For one, several
features related to agency performance generate opposing signs, e.g. assaults on officers
and a host of crime measures including murders and rapes per capita. Second, only one
result, officers assaulted per capita, is statistically significant when we opt not to correct for
multiple testing. Third, we find no significant associations even if we include data from all
experimental conditions to maximize sample size (see Figure E2 in Appendix). The largest
7We exclude N.J. State Police in this analysis due to difficulty in accurately pairing with U.S. Census
data.
estimated coefficient relates to Trump’s share of the two-party vote in an agency’s county,
suggesting agencies in conservative areas may be less likely to collaborate. However, the
overall pattern of results does not indicate selection related to agency performance, meaning
agencies arguably in most need of reform are not systematically declining to collaborate.
Performance Evaluations Inhibit Collaborations
We now turn to assessing the impact of our experimental interventions. Figure 3 dis-
plays the average effect of each treatment relative to the control condition estimated via
linear regression. Because we cannot guarantee all messages were reviewed, these represent
Intention-to-Treat effects (ITTs), which understate the effects these messages would have if universally received.8 In the national experiment, our models include indicators for all matched pairs, with
standard errors clustered by matched pair.
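A minimal sketch of this estimator for the national study (OLS of the response indicator on treatment with matched-pair indicators, standard errors clustered by pair) follows. The data are simulated; the response rate and effect size are chosen only to mimic the reported magnitudes, and statsmodels is assumed.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)

# Simulated matched-pair data: one treated and one control agency per pair.
n_pairs = 500
pair = np.repeat(np.arange(n_pairs), 2)
ranking = np.tile([1, 0], n_pairs)               # 1 = performance-rank mention
pair_shift = rng.normal(0, 0.03, n_pairs)[pair]  # shared pair-level propensity
p_yes = np.clip(0.15 - 0.08 * ranking + pair_shift, 0.01, 0.99)
df = pd.DataFrame({"pair": pair, "ranking": ranking,
                   "affirmative": rng.binomial(1, p_yes)})

# ITT: OLS of the affirmative-response indicator on treatment plus
# matched-pair indicators, with standard errors clustered by pair.
fit = smf.ols("affirmative ~ ranking + C(pair)", data=df).fit(
    cov_type="cluster", cov_kwds={"groups": df["pair"]})
print(f"ITT: {fit.params['ranking']:.3f} "
      f"(clustered se = {fit.bse['ranking']:.3f})")
```

The pair indicators absorb between-pair variation in baseline response propensity, which is what the matched-pair design is meant to exploit.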
Turning first to the N.J. experiment, we find randomized offers to keep the identity of
collaborating agencies confidential, including one version where a performance cue was also
supplied, had no detectable effect on response rates (β = −0.02, se = 0.04, p = 0.69 and
β = −0.04, se = 0.04, p > 0.40 respectively). This was surprising, as such confidentiality
offers are often made to convey a sense of security and thereby increase the likelihood of
collaborations. However, because such offers still rely on academic collaborators to keep
their word and effectively safeguard agency identities, this promise may ring hollow, and
additional assurances may be required before agencies will consider collaborations. However,
recipients told their statewide rank on mean uses of force per officer (“Ranking Condition”)
were roughly 9 percentage points (se = 0.04, p = 0.02) less likely to respond affirmatively
than agencies assigned to control, where about 15% of agencies agreed. Strikingly, this effect
was precisely replicated in the nationwide experiment: agencies told their rank on violent
crime clearance rates were about 8 percentage points (se = 0.02, p < 0.01) less likely to say
they would discuss a potential collaboration.
8Emails and postal letters sent to three agencies were returned to us due to invalid addresses. We drop these agencies and their corresponding matched pairs from all analyses since the ITT interpretation is invalid.
Figure 3: Mentioning agency performance lowers affirmative responses. [Plot omitted. Horizontal axis: difference in agreement from control (percentage points), from −20 to 10, for the Confidential, Ranking and Anonymous, and Ranking conditions, shown separately for the New Jersey and national studies.]
Contrary to our expectations, additional tests interacting treatment assignment with
agency rank show that these negative effects persist even among top performing agencies (see
Online Appendix Figure G1). This result highlights the strong aversion of police agencies
to outside evaluation, and suggests a general and powerful impediment to the formation of
research partnerships. While many agencies indicate openness to collaboration, a large share
recoil once the topic of agency performance is inevitably broached. This may be because
agencies that performed well on a given metric in the past have no guarantee of positive
results in the future, especially once outside scrutiny is allowed.
Discussion and Conclusion
While they offer numerous advantages over other methods of inquiry, research collabora-
tions with outside experts also pose political risks that may preclude partnerships in ways
that threaten the generalizability of results. Despite a string of recent promising collabora-
tions with individual agencies, researchers have understandably raised concerns over external
validity. If agencies willing to collaborate with academics are unrepresentative of agencies at
large, then collaborative field experiments, however carefully executed, may have little value
outside the agencies in which they are conducted.
In this paper, we evaluated the nature and severity of selection into research collabora-
tions with police agencies via two field experiments. Our results, precisely replicated across
studies, offer several useful insights for applied researchers. First, we find agencies which de-
cline to discuss research collaborations are largely similar to those that respond affirmatively
across a range of agency and jurisdiction attributes. However, our experimental results im-
ply that many agencies who profess an openness to evidence-based policymaking are likely
engaging in cheap talk, as a mere mention of agency performance substantially depresses
affirmative responses. Our analysis is confined to the initial stage of contacting agencies to
develop research partnerships. As this process unfolds and the possibility of negative pub-
licity that sometimes results from transparent research is made more apparent, it is possible
that even more agencies would be unwilling to collaborate on evidence-based policy research.
Increasing openness to evidence-based policymaking offers a valuable opportunity to gen-
erate effective reforms in a range of social institutions, and, fortunately, concerns over external
validity appear to be overstated, at least in one important policy domain. But the political
risks associated with performance evaluations impose a significant constraint, underscoring
the inescapable politics of public policy.
References
Alatas, Vivi, Abhijit Banerjee, Rema Hanna, Benjamin A. Olken and Julia Tobias. 2012.
“Targeting the poor: evidence from a field experiment in Indonesia.” American Economic
Review 102(4):1206–40.
Alexander, Michelle. 2010. The New Jim Crow: Mass Incarceration in the Age of Color-
blindness. The New Press.
Allcott, Hunt. 2017. “Site selection bias in program evaluation.” The Quarterly Journal of
Economics 130(3):1117–1165.
Benjamini, Yoav and Yosef Hochberg. 1995. “Controlling the false discovery rate: a practical
and powerful approach to multiple testing.” Journal of the Royal Statistical Society: Series
B (Methodological) 57(1):289–300.
Carpenter, Daniel. 2014. Reputation and power: organizational image and pharmaceutical
regulation at the FDA. Princeton University Press.
Cartwright, Nancy and Jeremy Hardie. 2012. Evidence-based policy: A practical guide to
doing it better. Oxford University Press.
Edwards, Frank, Hedwig Lee and Michael Esposito. 2019. “Risk of being killed by police use
of force in the United States by age, race–ethnicity, and sex.” Proceedings of the National
Academy of Sciences.
Gerber, Alan S. and Donald P. Green. 2012. Field experiments: Design, analysis, and
interpretation. WW Norton.
Gerber, Alan S., James G. Gimpel, Donald P. Green and Daron R. Shaw. 2011. “How
large and long-lasting are the persuasive effects of televised campaign ads? Results from
a randomized field experiment.” American Political Science Review 105(1):135–150.
Goff, Phillip Atiba, Dean Obermark, Nancy La Vigne, Jennifer Yahner and Amanda Geller.
2016. The Science of Policing Equity: Measuring Fairness in the Austin Police Department.
Technical report, Center for Policing Equity & John Jay College of Criminal Justice.
Gunderson, A., E. Cohen, K. Jackson, T.S. Clark, A. Glynn and M.L. Owens. 2019. “Does
Military Aid to Police Decrease Crime? Counterevidence from the Federal 1033 Program
and Local Police Jurisdictions in the United States.” Working Paper. https://tinyurl.
com/y66j6s8g.
Knox, Dean, Will Lowe and Jonathan Mummolo. Forthcoming. “Administrative Records
Mask Racially Biased Policing.” American Political Science Review.
Litvack, Jennie I. and Claude Bodart. 1993. “User fees plus quality equals improved access
to health care: results of a field experiment in Cameroon.” Social Science & Medicine
37(3):369–383.
Madhani, Aamer. 2018. “Chicago police solved fewer than one in six homicides in the first
half of 2018.” USA Today. https://bit.ly/2Q3yt1u.
Moffitt, Susan L. 2010. “Promoting agency reputation through public advice: Advisory
committee use in the FDA.” The Journal of Politics 72(3):880–893.
nj.com. 2019. “The Force Report.” https://force.nj.com/.
Orszag, Peter and Jim Nussle. 2017. “Policymaking commission offers a glimmer of hope in
hyper-partisan Washington.” The Hill. https://tinyurl.com/ry8hvax.
Yokum, David, Anita Ravishankar and Alexander Coppock. 2019. “A randomized control
trial evaluating the effects of police body-worn cameras.” Proceedings of the National
Academy of Sciences 116(21):10329–10332.
Which Police Departments Want Reform? Barriers to Evidence-Based Policymaking
Online Appendix
Contents
A1 Sampling Procedure
  A1.1 Criteria for the N.J. and National Samples
  A1.2 Matched Pair Design
  A1.3 Protocol in the event of returned emails and letters
B2 Experimental Protocol and Treatments
  B2.1 New Jersey Treatments
  B2.2 New Jersey “Know more” website content
  B2.3 New Jersey Debrief
  B2.4 National Sample Treatments
  B2.5 National “Know more” website content
  B2.6 National Study Debrief
C3 Data on Police Agencies
  C3.1 U.S. Census Data
  C3.2 Coding of agency covariates
D4 Random assignment balance tests
E5 Tabular Results
  E5.1 Geographic Distribution of Sample Agencies
  E5.2 Regression Estimates
F6 Pre-registration plan for national experiment
G7 Interaction Model Results
H8 Sensitivity Analysis
H9 Alternative Mechanisms
I10 Alternative Coding
A1 Sampling Procedure
A1.1 Criteria for the N.J. and National Samples
The agencies contacted in the N.J. experiment were drawn from the list of agencies for which use-of-force data was tabulated by nj.com (nj.com, 2019). Excluding the N.J. State Police, which we also contacted, these 460 agencies represent close to 90% of local police and sheriff’s agencies in the state.
The research team manually matched the agencies with contact information purchased from the National Public Safety Information Bureau (https://www.safetysource.com/index.cfm?). Roughly 150 agencies that did not list contact information were searched online or contacted by phone or email through the county clerk or the agency itself, allowing us to contact all agencies.1
The agencies contacted in the national experiment were drawn from data on agencies which report crime data to the FBI through the Uniform Crime Reporting (UCR) program, roughly 23,000 agencies between 2013 and 2017. Before assigning treatment, we filtered these agencies according to several criteria. We excluded agencies: from New Jersey; with zero population in their jurisdiction according to the FBI; which were not local police or sheriff’s departments; which report crime data through another nearby “parent” agency; which reported a mean five-year (2013-2017) violent crime clearance rate that fell outside the interval [0,1] (those outside the interval likely contained serious data errors); and which cleared zero crimes of any type in a given year, as these agencies are likely not reporting data. We also screened out the Long Beach, CA Police Department from the national sample prior to random assignment due to the aforementioned mistake in the pilot study. After applying these filters, we were left with about 9,800 agencies from which to sample. We then set about the task of pairing these 9,800 agencies to a list of contact information for police chiefs and sheriffs purchased from Power Almanac (https://www.poweralmanac.com/). In hopes of minimizing undeliverable emails, we changed our data source because Power Almanac updates contact information on a continuing basis, with all agencies receiving updates twice annually. In the event a direct email was not available for a chief or sheriff, we used general agency emails or contact information for a lower-level employee. Power Almanac is a private firm that continuously updates contact information for municipal government agencies. If agency contact information is not included in the Power Almanac database, it is very likely that the agency does not publicize its email address. Because of the
1After validating the contact data we dropped the Long Beach Township, N.J. Police Department from the sample because we mistakenly contacted the Long Beach, CA Police Department. We manually verified this was the only such mistake.
scale of the data, we employed the probabilistic matching algorithm outlined in Enamorado,Fifield and Imai (2017) using the R package fastLink to merge agency data with contactinformation. We employed this algorithm separately for police and sheriff’s agencies, andthen matched based on code, agency name and street address listed in the UCR data andPower Almanac data. Following the match, we discarded agencies which did not have at leasta 95% probability of being a correct match according to the model developed in Enamorado,Fifield and Imai (2017), as well as agencies for whom contact information could not beindependently obtained by the research team via web searches. Following these steps, 7,702agencies remained in sample, which oversee jurisdictions that are home to roughly 60% ofthe U.S. population according to FBI data. We drew 2,500 agencies from these 7,702 andassigned treatment based on the matched pair procedure we describe below.
A1.2 Matched Pair Design
To maximize statistical power, we implemented a matched pair design in the national experiment. The random assignment procedure went as follows. We first divided our sample of roughly 7,700 agencies into quartiles based on the population sizes of their jurisdictions. We then randomly drew half of the desired sample (2,500 agencies / 2) from this population and assigned them to treatment. For each treated case, we identified a set of possible control cases that belonged to the same population quartile and the same U.S. state. If such a set existed, we randomly sampled one case from the possible control set, paired it with the treated case, and assigned it to the control condition. The rationale for this conditional random assignment (as opposed to complete random assignment across the pooled sample) is that state and population size are prognostic of the dependent variable (response to our emails). By including indicators for these matched pairs during estimation, we sought to shrink the standard errors in our regressions by explaining variation in the outcome (Gerber and Green, 2012).
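A rough sketch of this assignment procedure on simulated data follows. The agency identifiers, states, and populations are invented; the real procedure runs over the filtered UCR sample described above.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(2)

# Simulated sampling frame standing in for the ~7,700 filtered UCR agencies.
frame = pd.DataFrame({
    "agency_id": range(7700),
    "state": rng.choice(["AL", "CA", "TX", "NY", "OH"], size=7700),
    "population": rng.lognormal(mean=9, sigma=1.2, size=7700),
})
# Divide the frame into quartiles of jurisdiction population.
frame["pop_quartile"] = pd.qcut(frame["population"], 4, labels=False)

# Draw half the desired sample (2,500 / 2) and assign it to treatment.
target_pairs = 1250
treated = frame.sample(n=target_pairs, random_state=0)
remaining = frame.drop(treated.index)

pairs = []
for _, t in treated.iterrows():
    # Eligible controls share the treated agency's state and population quartile.
    pool = remaining[(remaining["state"] == t["state"]) &
                     (remaining["pop_quartile"] == t["pop_quartile"])]
    if pool.empty:
        continue  # no valid control exists; skip this treated draw
    control = pool.sample(n=1, random_state=int(rng.integers(0, 2**31 - 1)))
    pairs.append((int(t["agency_id"]), int(control["agency_id"].iloc[0])))
    remaining = remaining.drop(control.index)  # each control is used at most once

print(f"Formed {len(pairs)} treated/control matched pairs")
```

Because each control is removed from the pool once paired, no agency can appear in two pairs, which is what lets the pair indicators be included cleanly at the estimation stage.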
A1.3 Protocol in the event of returned emails and letters
Emails were sent to the list of purchased and manually collected N.J. email addresses. Many emails bounced back as undeliverable due to retirements, promotions or other organizational changes. In these cases, the research team contacted county clerks or police agencies again, or searched for addresses online, to obtain correct addresses. If our messages were blocked by spam filters, we relied on postal mail to deliver our treatments. In New Jersey, every agency received either our email or our postal letter.
For the national study we sent an initial round of emails to the set of purchased email
addresses. When we received notice that an email address had changed or that an agency head had retired, we updated our contact information and sent out a new round of emails. As with New Jersey, some emails were blocked by spam filters. There were three agencies whom we were unable to contact (by email or postal letter). These three agencies and their corresponding matched agencies were dropped from our analysis.
B2 Experimental Protocol and Treatments
1. Agencies were emailed on a Monday. Responses were recorded either directly through a reply to our email message or by monitoring visits to our project website from specially constructed links in each email.

2. Any agency that did not respond to the first email was sent a second email eight days after the first email (the following Tuesday).

3. Any agency that did not respond to the second email was sent a third email eight days after the second email (the following Wednesday).

4. A paper letter with content similar to the emails (direct links were replaced with a URL) was mailed eight days after the third email to any agency from whom we had not received a response.

5. A debrief email was sent the week of 11/25/2019 to all contacted agencies. A mailed version of the debrief was also sent.
Agencies could respond in three primary ways:
• Yes: measured by clicking the "yes" link in our emails or the "yes" button on our project website, or by agreeing by direct email.
• No: either explicitly measured through a "no" response by link or on our website, implicitly measured from a lack of response to our messages, or measured by declining by direct email.
• "Learn more": measured by link or on our website. This page displayed additional information about the project and offered an additional chance to say "yes" or "no" to our request. Agencies could also ask for more information by direct email.
We are currently in the process of following up with agencies that expressed interest in a possible collaboration and are in negotiations with several over the details of a joint project. Data collection for the national study ended the last week of November 2019.
B2.1 New Jersey Treatments
The following values were populated with data we collected and/or computed for each agency:
• $name: the name of the lead law enforcement officer
• $events: use of force incidents per officer between the years 2012 and 2016
• $rank: computed agency rank
In the control condition, the ranking and confidentiality treatments were omitted. In the confidentiality condition, the ranking treatment was omitted. In the ranking condition, the confidentiality treatment was omitted. In the ranking-and-confidentiality condition, both treatments were displayed in a random order.
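The $-prefixed fields in the treatment templates below happen to match the placeholder syntax of Python's standard `string.Template`. As a minimal sketch of how such a message could be populated (the agency name and values are invented, and this is not necessarily how the authors generated their emails):

```python
from string import Template

# Hypothetical per-agency values; the real ones were computed from
# publicly available use-of-force data.
agency = {"name": "Chief Jane Doe", "events": "0.12", "rank": "77"}

body = Template(
    "Dear $name:\n"
    "According to publicly available data your agency had an average of "
    "$events use of force incidents per officer between the years 2012 "
    "and 2016. By this metric, your agency ranked $rank out of 464 law "
    "enforcement agencies in New Jersey during this time."
)

print(body.substitute(agency))
```

`Template.substitute` raises a `KeyError` if any placeholder is left unfilled, which is a useful safeguard when mail-merging many agencies.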
Dear $name:
We are writing to invite you to collaborate on a research project conducted by researchers at <anonymized for review> and <anonymized for review> that aims to make both citizens and officers safer by reducing the frequency of violence during police-citizen interactions.
In exchange for your participation in a research project we are offering to provide free consulting and data analysis services that we believe could benefit your agency. We would also be willing to help your agency conduct an opinion survey of citizens in your jurisdiction to help identify areas where your agency could improve its performance. We are not seeking funds and will work pro bono.
[Ranking treatment: According to publicly available data (more information) your agency had an average of $events use of force incidents per officer between the years 2012 and 2016. By this metric, your agency ranked $rank out of 464 law enforcement agencies in New Jersey during this time (where a rank of 1 indicates the highest rate of using force).]
[Confidentiality treatment: If you agree to collaborate with us on a research project, the names of your agency and personnel will be kept confidential in any published research that we produce.]
We are not asking for a firm commitment now, but if you are interested in discussing a potential collaboration further, please click the appropriate link below and we will get in touch to continue the conversation.
• Yes, I am interested in discussing a research collaboration.
• I am not sure, but I would like more information.
• No, I am not interested in discussing a research collaboration.
By way of background, we are non-partisan political scientists at <anonymized for review> and <anonymized for review> who received our PhDs from <anonymized for review>. We are trained in statistics, data analysis, experimental design and policy evaluation. We are seeking to partner with law enforcement agencies to help develop effective ways to improve police-citizen interactions, and are writing to gauge interest in conducting a collaborative research project with the aim of producing published, peer-reviewed research that could benefit the law enforcement community at large.
Please feel free to email us with any questions. Thank you for your consideration.
Sincerely,
<anonymized for review>
B2.2 New Jersey “Know more” website content
Thank you for your interest

By way of further introduction, we are faculty members in the Political Science/Government departments at <anonymized for review> and <anonymized for review>. We specialize in quantitative research on public agency performance and public opinion. We would be interested in collaborating with your agency to conduct any of the following types of research projects, with the aim of publishing peer-reviewed research that can benefit your agency as well as the law enforcement community at large.
A public opinion survey assessing public perceptions of your agency's performance. This survey may serve to help you identify areas in which citizens perceive your agency can improve service.
A program evaluation of a new tactic. For example, we are interested in evaluating the effects of:
• Police-worn body cameras on police use of force
• The use of tactical teams on crime and agency reputation
Response options:
Yes, I am interested in discussing a research collaboration.
No, I am not interested in discussing a research collaboration.
B2.3 New Jersey Debrief
Dear $name:
We recently contacted your agency with an offer to discuss a possible research collaboration with faculty members at <anonymized for review> and <anonymized for review>. We are writing to supply some additional information. Depending on the version of the email you saw, we may or may not have included information on your agency's past performance (possibly relative to other agencies), or an offer of anonymity for your agency in any published findings. The purpose of this information was to gauge how various features of our message affected response rates. However, we note that our emails contained no false information and your responses will be kept confidential. If you have any questions or concerns, please feel free to contact the principal investigators <anonymized for review>. You can also contact the <anonymized for review> Institutional Review Board at <anonymized for review>.

Sincerely,
<anonymized for review>
B2.4 National Sample Treatments
The following values were populated with data we collected and/or computed for each agency:
• $title: the title of the lead law enforcement officer
• $name: the name of the lead law enforcement officer
• $cleared: total cleared violent crimes occurring between 2013 and 2017
• $total: number of violent crimes between 2013 and 2017
• $rank: computed agency rank
In the control condition the ranking treatment was omitted. The full letter was shared in the treatment condition.

Dear $title $name:
We are writing to invite you to collaborate on a research project conducted by researchers at <anonymized for review> and <anonymized for review> that aims to study the efficacy of policing strategies designed to serve the interests of both officers and citizens.
In exchange for your participation in a research project we are offering to provide free consulting and data analysis services that we believe could benefit your agency. We would also be willing to help your agency conduct an opinion survey of citizens in your jurisdiction to help identify areas where your agency could improve its performance. We are not seeking funds and will work pro bono.
[Ranking treatment: According to publicly available data (more information) your agency cleared $cleared out of $total violent crimes between 2013 and 2017. By this metric, your agency ranked $rank out of approximately 2,500 law enforcement agencies we analyzed during this time, where a rank of 1 indicates the largest ("best") proportion of cleared violent crimes.]
We are not asking for a firm commitment now, but if you are interested in discussing a potential collaboration further, please click the appropriate link below and we will get in touch to continue the conversation.
• Yes, I am interested in discussing a research collaboration.
• I am not sure, but I would like more information.
• No, I am not interested in discussing a research collaboration.
By way of background, we are non-partisan political scientists at <anonymized for review> and <anonymized for review> who received our PhDs from <anonymized for review>. We are trained in statistics, data analysis, experimental design and policy evaluation. We are seeking to partner with law enforcement agencies to help develop effective ways to improve police-citizen interactions, and are writing to gauge interest in conducting a collaborative research project with the aim of producing published, peer-reviewed research that could benefit the law enforcement community at large.
Please feel free to email us with any questions. Thank you for your consideration.
Sincerely,
<anonymized for review>
B2.5 National “Know more” website content
Thank you for your interest

By way of further introduction, we are faculty members in the Political Science/Government departments at <anonymized for review> and <anonymized for review>. We specialize in quantitative research on public agency performance and public opinion. We would be interested in collaborating with your agency to conduct any of the following types of research projects, with the aim of publishing peer-reviewed research that can benefit your agency as well as the law enforcement community at large.
A public opinion survey assessing public perceptions of your agency's performance. This survey may serve to help you identify areas in which citizens perceive your agency can improve service.
A program evaluation of a new tactic. For example, we are interested in evaluating the effects of:
• Police-worn body cameras on police use of force
• The use of tactical teams on crime and agency reputation
Response options:
Yes, I am interested in discussing a research collaboration.
No, I am not interested in discussing a research collaboration.
B2.6 National Study Debrief
Dear $title $name:
We recently contacted your agency with an offer to discuss a possible research collaboration with faculty members at <anonymized for review> and <anonymized for review>. We are writing to supply some additional information. Depending on the version of the email you saw, we may or may not have included information on your agency's past performance (possibly relative to other agencies). The purpose of this information was to gauge how various features of our message affected response rates. However, we note that our emails contained no false information and your responses will be kept confidential. If you have any questions or concerns, please feel free to contact the principal investigators <anonymized for review>. You can also contact the <anonymized for review> Institutional Review Board at <anonymized for review>.
Sincerely,
<anonymized for review>
C3 Data on Police Agencies
We merged data on contacted agencies with data on crime (Kaplan, 2019), fatal officer-involved shootings (Sullivan et al., 2018), agency personnel (DOJ, 2017), county-level election results (MIT Election Data and Science Lab, 2018), and local U.S. Census records (Census, 2017).
C3.1 U.S. Census Data
We rely on U.S. Census data from the 2017 American Community Survey (Census, 2017). We sought to match agencies with demographic data measured at the geographic level of their jurisdictions. County agencies were paired with county Census data, including parishes in Louisiana. Cities, towns, boroughs and other sub-county agencies were paired with either Census Designated Place (CDP) data or county subdivision Census data, depending on how the Census classified localities. Five regional agencies which oversee multiple localities were paired with the sum (for count variables) or the mean (for measures such as median household income) of Census data from those localities.
The analysis in Figure 2 is generated by separate bivariate regressions. The number of observations varies due to partial missing data across covariates. With the exception of Census measures of the percent of a jurisdiction living in poverty among the Asian American, "other race," American Indian, and Native Hawaiian categories, all variables display less than 10% missing data.
Table C1: Types of Census Data Paired with Agencies
Type of Census data: county; county subdivision; place; regional pd
No. of agencies: 616; 449; 1,874; 5
C3.2 Coding of agency covariates
Table C2: Coding Rules and Data Sources for Agency Covariates in Figure 2
variable: coding/source
% "Other" race non-Hispanic: census data
% "Other" race non-Hispanic in poverty: census data
% "Other" race unemployed: census data
% African American: census data
% African Americans in poverty: census data
% Asian: census data
% Asian above poverty line: census data
% Hispanic: census data
% Hispanics in poverty: census data
% Male: census data
Median household income: census data
% Multiracial in poverty: census data
% Multiracial non-Hispanic: census data
% Native American and Alaskan native: census data
% Native American and Alaskan native in poverty: census data
% Native Hawaiian and Pacific Islander: census data
% Native Hawaiian and Pacific Islander in poverty: census data
% without health insurance: census data
Population: census data
% < 18 in poverty: census data
% > 25 with high school +: census data
% > 25 with bachelors degree: census data
Population density: census data
% foreign born: census data
% non-citizen foreign born: census data
% Vacant housing units: census data
% White: census data
% Whites in poverty: census data
Median age: census data
Median age of male population: census data
Male civilian employees per capita: employees in 2014 according to Law Enforcement Agency Roster (2016) count / total employees
Shootings per capita: fatal officer-involved shootings according to Washington Post (2018) / census population for jurisdiction
Male officers per capita: male sworn officers in 2014 according to Law Enforcement Agency Roster (2016) count / total employees
Assaults per capita: mean assaults 2013-2017 (FBI) / census population for jurisdiction
Attempted rapes per capita: mean attempted rapes 2013-2017 (FBI) / census population for jurisdiction
Burglaries per capita: mean burglaries 2013-2017 (FBI) / census population for jurisdiction
Mean clearance for property crimes per capita: mean cleared index property crimes 2013-2017 (FBI) / census population for jurisdiction
Mean clearance for violent crime per capita: mean cleared index violent crimes 2013-2017 (FBI) / census population for jurisdiction
Mean clearance for murders per capita: mean cleared murders 2013-2017 (FBI) / census population for jurisdiction
Mean shootings per capita: mean fatal officer-involved shootings according to Washington Post (2015-2018) / census population for jurisdiction
Rape by force per capita: mean forcible rape 2013-2017 (FBI) / census population for jurisdiction
Manslaughters per capita: mean manslaughter 2013-2017 (FBI) / census population for jurisdiction
Vehicle thefts per capita: mean motor vehicle thefts 2013-2017 (FBI) / census population for jurisdiction
Murders per capita: mean murders 2013-2017 (FBI) / census population for jurisdiction
Officers killed in an accident per capita: mean officers accidentally killed 2013-2017 (FBI) / census population for jurisdiction
Officers assaulted per capita: mean officers assaulted 2013-2017 (FBI) / census population for jurisdiction
Officers killed by felon per capita: mean officers feloniously killed 2013-2017 (FBI) / census population for jurisdiction
Rapes per capita: mean rapes 2013-2017 (FBI) / census population for jurisdiction
Robberies per capita: mean robberies 2013-2017 (FBI) / census population for jurisdiction
All crimes per capita: mean total crimes 2013-2017 (FBI) / census population for jurisdiction
Trump vote share (two party): MIT Election Data & Science Lab
Police employees per capita: total employees 2014 according to Law Enforcement Agency Roster (2016) count / census population for jurisdiction
D4 Random assignment balance tests
Table D1: New Jersey Balance
Dependent variable:
Confidentiality Treatment Ranking Treatment Ranking and Confidentiality Treatment
(1) (2) (3)
Intercept: 0.14 (0.22); 0.28 (0.23); 0.32 (0.23)
Population: −0.0001 (0.0000); −0.0000 (0.0000); 0.0000 (0.0000)
Median Age: 0.004 (0.004); −0.002 (0.004); −0.0001 (0.004)
Male population: 0.0001∗ (0.0001); 0.0000 (0.0001); −0.0001 (0.0000)
Foreign born population: −0.0000 (0.0000); 0.0000 (0.0000); −0.0000 (0.0000)
Total crimes: 0.0001 (0.0001); −0.0000 (0.0001); 0.0000 (0.0001)
Officers assaulted: −0.0002 (0.004); −0.002 (0.003); 0.005 (0.003)
White population: 0.0000 (0.0000); 0.0000 (0.0000); −0.0000 (0.0000)
Native American/Alaskan Native population: −7.97 (8.69); −7.96 (8.62); 8.22 (12.20)
Asian Population: 0.31 (0.31); −0.20 (0.24); 0.09 (0.30)
African American population: 0.13 (0.27); 0.37 (0.28); −0.26 (0.20)
Hawaiian and Pacific Islander population: 15.29 (8.85); −2.49 (7.34); 1.53 (11.98)
Hispanic Population: 0.20 (0.21); −0.14 (0.19); 0.10 (0.21)
Murder clearance rate: −0.07 (0.04); 0.05 (0.04); −0.03 (0.03)
Violent crime clearance rate: 0.004∗∗ (0.001); −0.001 (0.001); −0.001 (0.001)
Property crime clearance rate: −0.001 (0.0004); 0.0004 (0.0005); −0.0002 (0.0003)
Trump vote share (two party): −0.26 (0.19); 0.11 (0.21); −0.11 (0.19)
Observations: 460; 460; 460
R2: 0.05; 0.01; 0.02
F Statistic (df = 17; 442): 1.43; 0.38; 0.58
Note: ∗p<0.05; ∗∗p<0.01; ∗∗∗p<0.001
Table D2: National Study Balance
Dependent variable:
Assigned to Ranking Treatment
Intercept: 1.51∗∗∗ (0.08)
Population: −0.0000 (0.0000)
Median Age: 0.0002 (0.001)
Male population: 0.0000 (0.0000)
Foreign born population: 0.0000 (0.0000)
Total crimes: 0.00 (0.0000)
Officers assaulted: −0.001 (0.0005)
White population: −0.0000 (0.0000)
Native American/Alaskan Native population: −0.06 (0.28)
Asian Population: −0.03 (0.30)
African American population: −0.03 (0.07)
Hawaiian and Pacific Islander population: −0.35 (2.84)
Hispanic Population: 0.04 (0.07)
Murder clearance rate: 0.005 (0.01)
Violent crime clearance rate: −0.0000 (0.0001)
Property crime clearance rate: −0.0000 (0.0001)
Trump vote share (two party): −0.0002 (0.001)
Observations: 2,484
R2: 0.002
F Statistic: 0.27 (df = 17; 2466)
Note: ∗p<0.05; ∗∗p<0.01; ∗∗∗p<0.001
E5 Tabular Results
E5.1 Geographic Distribution of Sample Agencies
Table E1: Figure 1: Sampled agencies by state
State: number of sampled agencies
AL: 66
AR: 52
AZ: 24
CA: 92
CO: 42
CT: 34
DE: 4
FL: 74
GA: 94
HI: 2
IA: 54
ID: 32
IN: 32
KS: 70
KY: 58
LA: 32
MA: 88
MD: 24
ME: 36
MI: 118
MN: 78
MO: 116
MS: 10
MT: 22
NE: 16
NC: 74
ND: 10
NH: 32
NJ: 460
NM: 30
NV: 12
NY: 68
OH: 110
OK: 34
OR: 32
PA: 158
RI: 12
SC: 44
SD: 20
TN: 78
TX: 222
UT: 24
VA: 56
VT: 18
WA: 56
WI: 86
WV: 24
WY: 18
AK: 0
IL: 0
E5.2 Regression Estimates
Table E2: Estimates from Figure 2: Correlates of responding
Variable: estimate; uncorrected p-value; Benjamini-Hochberg corrected p-value
Officers assaulted per capita: -8.39; 0.01; 0.39
Population: 0.19; 0.91; 0.94
% Male: 0.89; 0.77; 0.94
% White: -2.48; 0.54; 0.94
% African American: 2.97; 0.37; 0.94
% Native American and Alaskan native: 1.11; 0.70; 0.94
% Asian: 6.22; 0.31; 0.94
% Native Hawaiian and Pacific Islander: 1.83; 0.85; 0.94
% "Other" race non-Hispanic: -0.93; 0.33; 0.94
% Multiracial non-Hispanic: 3.42; 0.48; 0.94
% Hispanic: -2.31; 0.29; 0.94
% > 25 with high school +: -5.50; 0.15; 0.94
% > 25 with bachelors degree: 5.85; 0.20; 0.94
% "Other" race unemployed: -0.49; 0.87; 0.94
% Whites in poverty: -4.85; 0.13; 0.94
% African Americans in poverty: 1.87; 0.57; 0.94
% Native American and Alaskan native in poverty: 3.56; 0.30; 0.94
% Asian above poverty line: -1.37; 0.68; 0.94
% Native Hawaiian and Pacific Islander in poverty: -3.21; 0.36; 0.94
% "Other" race non-Hispanic in poverty: -4.01; 0.29; 0.94
% Multiracial in poverty: -3.80; 0.38; 0.94
% Hispanics in poverty: -3.36; 0.48; 0.94
% Vacant housing units: -1.62; 0.71; 0.94
Male officers per capita: -4.42; 0.36; 0.94
Male civilian employees per capita: 3.22; 0.41; 0.94
Officers killed by felon per capita: -2.62; 0.31; 0.94
Officers killed in an accident per capita: -0.52; 0.53; 0.94
Murders per capita: 3.84; 0.51; 0.94
Manslaughters per capita: -1.23; 0.73; 0.94
Rapes per capita: 1.35; 0.79; 0.94
Rape by force per capita: 1.54; 0.76; 0.94
Attempted rapes per capita: -1.84; 0.73; 0.94
Robberies per capita: 0.02; 0.99; 0.99
Assaults per capita: 0.57; 0.91; 0.94
Burglaries per capita: -1.16; 0.81; 0.94
Vehicle thefts per capita: 1.53; 0.67; 0.94
All crimes per capita: 1.02; 0.83; 0.94
Mean clearance for murders per capita: 4.08; 0.62; 0.94
Mean clearance for violent crime per capita: -0.87; 0.87; 0.94
Mean clearance for property crimes per capita: 1.57; 0.72; 0.94
Median age: -1.33; 0.64; 0.94
Median age of male population: -1.05; 0.72; 0.94
Median household income: 1.88; 0.73; 0.94
Trump vote share (two party): -8.46; 0.13; 0.94
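The Benjamini-Hochberg corrected p-values reported in Table E2 follow the standard step-up procedure. A plain-Python sketch of that adjustment (an illustration of the method, not the authors' code):

```python
def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values (step-up procedure).

    Each sorted p-value p_(i) is scaled by m / i (m = number of tests),
    then a running minimum is taken from the largest rank downward so the
    adjusted values are monotone, and everything is capped at 1.
    Returns adjusted p-values in the original input order.
    """
    m = len(pvals)
    # Sort p-values, remembering original positions.
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    prev = 1.0
    # Walk from the largest p-value down, taking running minima.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        prev = min(prev, pvals[i] * m / rank)
        adjusted[i] = prev
    return adjusted
```

A variable is flagged at false discovery rate q when its adjusted p-value falls below q; in Table E2 no corrected p-value falls below .05.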
Table E3: Estimates from Figure 3: New Jersey Regression Results
Dependent variable:
Agreed to Discuss Collaboration
(1) (2) (3)
Intercept: 0.13∗∗∗ (0.03); 0.21∗∗ (0.07); 0.19∗∗ (0.06)
Confidential Condition: −0.02 (0.04); −0.14 (0.09); −0.09 (0.08)
Ranking Condition: −0.09∗ (0.04); −0.17∗ (0.08); −0.16∗ (0.06)
Ranking and Anonymous Condition: −0.04 (0.04); −0.04 (0.10); −0.02 (0.09)
Numeric Rank: −0.0003 (0.0003)
Confidential Condition X Numeric Rank: 0.001 (0.0003)
Ranking Condition X Numeric Rank: 0.0004 (0.0003)
Ranking and Anonymous Condition X Numeric Rank: 0.0000 (0.0003)
Middle Rank Tercile: −0.08 (0.08)
Top Rank Tercile: −0.10 (0.08)
Confidential Condition X Middle Rank Tercile: 0.08 (0.11)
Ranking Condition X Middle Rank Tercile: 0.11 (0.09)
Ranking and Anonymous Condition X Middle Rank Tercile: −0.02 (0.11)
Confidential Condition X Top Rank Tercile: 0.14 (0.11)
Ranking Condition X Top Rank Tercile: 0.13 (0.09)
Ranking and Anonymous Condition X Top Rank Tercile: −0.01 (0.11)
Observations: 462; 462; 462
R2: 0.01; 0.02; 0.03
Note: ∗p<0.05; ∗∗p<0.01; ∗∗∗p<0.001
Robust standard errors reported
Table E4: Estimates from Figure 3: National Study Regression Results
Dependent variable:
Agreed to Discuss Collaboration
(1) (2) (3) (4)
Intercept: 0.15∗∗∗ (0.01); 0.04∗∗∗ (0.01); 0.01 (0.04); 0.01 (0.03)
Ranking Condition: −0.08∗∗∗ (0.01); −0.08∗∗∗ (0.02); −0.07 (0.04); −0.05 (0.04)
Numeric Rank: 0.0000 (0.0000)
Middle Rank Tercile: 0.06 (0.04)
Top Rank Tercile: 0.04 (0.04)
Ranking Condition X Numeric Rank: −0.0000 (0.0000)
Ranking Condition X Middle Rank Tercile: −0.07 (0.06)
Ranking Condition X Top Rank Tercile: −0.03 (0.06)
Matched Pair FE: No; Yes; Yes; Yes
Observations: 2,484; 2,484; 2,484; 2,484
R2: 0.02; 0.52; 0.52; 0.53
Note: ∗p<0.05; ∗∗p<0.01; ∗∗∗p<0.001
Table E5: Trump Two Party Vote Share and Agreement
Dependent variable:
Agreed to Discuss Collaboration
Intercept: 0.08 (0.04)
Ranking Condition: −0.09∗ (0.04)
Middle Trump Vote Share Tercile: 0.01 (0.05)
Top Trump Vote Share Tercile: −0.07 (0.05)
Ranking Condition X Middle Trump Vote Share Tercile: −0.03 (0.06)
Ranking Condition X Top Trump Vote Share Tercile: 0.05 (0.05)
Matched Pair FE: Yes
Observations: 2,484
R2: 0.53
Note: ∗p<0.05; ∗∗p<0.01; ∗∗∗p<0.001
Figure E1: County Trump Vote Share Does Not Condition Response to Treatment. The figure shows the marginal effect of the agency performance cue in the national experiment conditional on the two-party Trump vote share in each county. To relax functional form assumptions, the marginal effect was computed across the range of the moderator using the flexible kernel estimator recommended in Hainmueller, Mummolo and Xu (2019). The stacked histogram along the bottom of the figure shows the distribution of treatment assignment at various levels of the moderator. The effects of the performance cue among counties with low, medium and high levels of Trump support are statistically indistinguishable.
[Figure E1 plot: marginal effect of the ranking treatment on agreeing to discuss a collaboration (y-axis, −0.25 to 0.00) against Trump two-party vote % in county (x-axis, 25 to 75).]
[Figure E2 plot: for each predictor in Table E2, the percentage-point change in agreement with our request from a one standard deviation increase in the predictor, titled "Response to collaboration discussion requests"; all points are marked as having Benjamini-Hochberg corrected p-values > .05.]
Figure E2: Alternative Figure 2: Using Data from All Agencies
F6 Pre-registration plan for national experiment
Note: In our pre-analysis plan, we also planned on examining an alternate coding of the dependent variable in which agencies that indicated they were already collaborating with another research team were coded as responding affirmatively. We omit this specification because such responses occurred only twice.
EGAP Registry Form Schema
Note from EGAP: while the standard workflow is down, this form replaces the registration form on egap.org. For this alternate workflow, the time/date that your email is sent will become the timestamp for your registration. It may still take up to three business days to review, upload, and post your submission, but the timestamp will be locked in as described.
B1 Title of Study – Determinants of Academic Collaborations with Law Enforcement Agencies
B2 Authors
<anonymized for review>
B3 Acknowledgements –
B4 Is one of the study authors a university faculty member? – multiple choice (SELECT ONE)
Yes
B5 Is this Registration Prospective or Retrospective? – multiple choice (SELECT ONE)
Registration prior to realization of outcomes
B6 Is this an experimental study? – multiple choice (SELECT ONE)
Yes
B7 Date of start of study – 09/16/2019
B8 Gate date – 09/16/2020
B9 Was this design presented at an EGAP meeting? – multiple choice (SELECT ONE)
No
B10 Is there a pre-analysis plan associated with this registration? – multiple choice (SELECT ONE)
Yes
For the next three fields (C1-C3), the response box is a long answer plain text box. Please try to limit your response to 300 words at most, and use your pre-analysis plan to elaborate further if necessary. Also, the plain text field limits formatting, so please do not include bullet point lists with multiple indentations, footnotes, tables, images, or other complicated formatting.
C1 Background and explanation of rationale – Police agencies are increasingly partnering with academic researchers to scientifically evaluate the effects of various police tactics and reforms in order to lower crime rates, improve the quality of police-citizen interactions and reduce the rate of police brutality, among other goals. But researchers have long suspected that the results of such collaborations suffer from selection bias: agencies which are in the most need of reform may be less likely to collaborate with researchers, while agencies that are performing well may be eager to advertise their performance via academic collaborations. This study will use large-scale field experiments to evaluate whether such selection bias exists and, if so, what causes it and how it can be mitigated.
C2 What are the hypotheses to be tested/quantities of interest to be estimated? –
We predict that there will be a relationship between an agency's response and the agency's performance on measures of crime. We seek to measure the size of this bias. This is an extension of a successful pilot study (conducted with roughly 450 agencies in New Jersey). We extend the design in two ways. First, we seek to replicate our findings with a different measure of performance (violent crime clearance instead of use of force). Second, we seek to replicate our results on a national sample.
This study will make use of an original email list of police administrators/agencies in the United States. The list is an augmented version of the 2016 Law Enforcement Agency Roster (LEAR), a federal survey of every law enforcement agency in the United States. The list contains basic information on over 15,000 state and local agencies, including mailing addresses, but does not include the names of administrators (police chiefs and sheriffs) or email addresses. We have linked these data with the FBI's Uniform Crime Report. This allows us to rank agencies on their violent crime clearance rate (the ratio of closed violent crime cases to total violent crimes).
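The clearance-rate ranking described here is the ratio of cleared to total violent crimes, sorted in descending order. A minimal sketch (the agency names and counts are invented for illustration):

```python
def rank_by_clearance(agencies):
    """Rank agencies on violent crime clearance rate (cleared / total).

    `agencies` maps an agency name to (cleared, total) counts; the paper
    uses FBI UCR counts for 2013-2017. Rank 1 = highest clearance rate.
    Agencies reporting zero total crimes are excluded.
    """
    rates = {name: cleared / total
             for name, (cleared, total) in agencies.items() if total > 0}
    ordered = sorted(rates, key=rates.get, reverse=True)
    return {name: i + 1 for i, name in enumerate(ordered)}
```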
In our pre-test we identified an effect of our main treatment relative to our control, which alerted agencies to their statewide ranking on the measure of uses of force per officer. This treatment lowered response rates by approximately 8 percentage points. We anticipate another negative effect from alerting agencies to violent crime clearance rates.
To maximize power we will generate a national sample of matched pairs of agencies, one of which will be randomly assigned to receive the control version of our invitation and one of which will receive the treatment version. These pairs will be matched on state and jurisdiction size (population, binned in quartiles). The universe of agencies will be local police and sheriff's agencies (excluding those in U.S. territories) which recorded a violent crime clearance rate between 0 and 1 inclusive and non-missing data on needed variables. Based on a power analysis, we will send messages to a national sample of approximately 2,500 agencies that will exclude NJ.
Following our pretest we will then send emails and (if needed) posted letters to the sampled agencies. The content of these emails/letters is attached.
We will also pair each agency with publicly available data on crime, police-involved shootings, and administrative data such as agency budgets and staff sizes in order to see how responses to our email vary with these metrics, for a purely descriptive analysis.
Experimental conditions:
1. Control message:
See attached.
2. Performance Treatment:
This condition adds text on the agency's performance in closing violent crime cases relative to other American police departments. See attached.
Mode and protocol for contact:
We will consider two modes of contact: email and post.
We will send three emails spaced eight days apart. The first email will be sent on a Monday, then a Tuesday, and finally on a Wednesday. To mitigate concerns that these emails will be trapped by SPAM filters, we will send the messages from a <anonymized for review> SMTP server and will space each message by a randomly drawn period [500 milliseconds - 20,000 milliseconds].
If an agency does not respond to these emails we will send the content of the assigned email by mail. The posted letter will be sent on <anonymized for review> letterhead.
Dependent measures:
There will be three primary dependent measures:
1. Each email/letter will include links indicating a willingness to participate in the project.
2. After clicking on the "yes" or "more information" links, respondents will be taken to a webpage with additional information on the project. They will be able to send us text on what they would like to gain from a collaboration.
3. Respondents can also directly respond to our emails/letters. We will record the text of these responses.
C3 How will these hypotheses be tested? –
1. We will compute the average treatment effect of our experimental intervention by comparing the rate of affirmative responses among those in the treatment arm to the same rate in the control arm via OLS, conditional on a set of dummy variables for each matched pair. Affirmative responses will be coded in two ways: 1) those indicating "yes" and 2) those either indicating "yes" or those who decline but indicate the reason is that they are already collaborating with another institution on research.
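With one treated and one control agency per matched pair, the coefficient on treatment in this pair fixed-effects OLS equals the average within-pair outcome difference. A minimal numpy sketch with toy data (the function and variable names are ours, not from the registration):

```python
import numpy as np

def pair_fe_ate(y, treated, pair_id):
    """OLS of y on a treatment indicator plus one dummy per matched pair.
    With one treated and one control unit per pair, the treatment
    coefficient reduces to the mean within-pair (treated - control) gap."""
    y, treated, pair_id = map(np.asarray, (y, treated, pair_id))
    pairs = np.unique(pair_id)
    # Design matrix: treatment indicator + pair dummies (no global intercept).
    X = np.column_stack([treated.astype(float)] +
                        [(pair_id == p).astype(float) for p in pairs])
    beta, *_ = np.linalg.lstsq(X, y.astype(float), rcond=None)
    return beta[0]  # coefficient on treatment

# Toy example: two pairs, binary affirmative-response outcomes.
y       = [1, 0, 0, 0]   # 1 = agreed to discuss collaboration
treated = [0, 1, 0, 1]   # assignment within each pair
pair_id = [0, 0, 1, 1]
print(round(pair_fe_ate(y, treated, pair_id), 6))  # → -0.5
```

Here pair 0 contributes a within-pair difference of −1 and pair 1 contributes 0, so the estimate is −0.5, matching the fixed-effects OLS coefficient.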
2. We will estimate an interaction model which adds to the specification in 1. a variable measuring how each agency ranks nationally on violent crime clearance rates, as well as a multiplicative term interacting this rank variable with the treatment indicator. Our expectation is that poorly ranked agencies will respond more negatively to treatment than well-ranked agencies, since the treatment will inspire greater concern over reputational damage among the low-ranked group. We will use the procedures described in Hainmueller, Mummolo and Xu (2019) to estimate this continuous interaction.
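A stylized sketch of this multiplicative interaction specification, using simulated data: all numbers below are made up for illustration, and in practice the binning diagnostics of Hainmueller, Mummolo and Xu (2019) would be applied on top of this basic specification.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500

rank = rng.uniform(0, 1, n)      # normalized clearance-rate rank (illustrative)
treat = rng.integers(0, 2, n)    # treatment indicator
# Simulated outcome: the treatment effect grows less negative with rank.
y = 0.15 - 0.10 * treat + 0.05 * treat * rank + rng.normal(0, 0.1, n)

# OLS with a multiplicative interaction term.
X = np.column_stack([np.ones(n), treat, rank, treat * rank])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

def marginal_effect(r):
    """Marginal effect of treatment at rank r: tau + delta * r."""
    return beta[1] + beta[3] * r

print(marginal_effect(0.0), marginal_effect(1.0))
```

The marginal-effect function is what Figure G1-style plots trace out across the range of the moderator.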
C4 Country – U.S.A.
C5 Sample Size (# of Units) – approximately 2,500
C6 Was a power analysis conducted prior to data collection? – multiple choice (SELECT ONE)
Yes
C7 Has this research received Institutional Review Board (IRB) or ethics committee approval? – multiple choice (SELECT ONE)
Yes
C8 IRB Number – <anonymized for review>
C9 Date of IRB Approval – 09/13/2019
C10 Will the intervention be implemented by the researcher or a third party? If a third party, please provide the name. – multiple choice (SELECT AS MANY AS APPLICABLE)
Researchers
C11 Did any of the research team receive remuneration from the implementing agency for taking part in this research? – multiple choice (SELECT ONE)
N/A
C12 If relevant, is there an advance agreement with the implementation group that all results can be published? – multiple choice (SELECT ONE)
N/A
C13 JEL classification(s) – short answer; please provide alphanumeric code(s)
Methodology – select all that apply
Experimental Design
Field Experiments
Statistics
Survey Methodology
Policy – select all that apply
Conflict and Violence
Governance
Certification – indicate agreement
By submitting this form and accompanying documents with EGAP, I confirm that I have rights to put this information in the public domain and I understand that this information will remain on the EGAP registry in perpetuity, regardless of whether the research is subsequently implemented or not.
We agree to these terms.
Confirmation – indicate agreement
You should receive a confirmation of your registration within three business days. Your registration is considered complete only when confirmation is received. If you do not receive confirmation within three business days please contact [email protected]. Hitting SAVE at the bottom of this page will submit the registration. Please only do so when you are ready to submit. ONCE YOU HAVE HIT SAVE AT THE BOTTOM OF THIS PAGE PLEASE DO NOT HIT THE BACK BUTTON. Doing so creates multiple registrations, and we will delete all but the most recent. If you accidentally created multiple registrations, please contact [email protected]
We agree to these terms.
Note from EGAP: while the standard workflow is down, this form replaces the registration form on egap.org. For this alternate workflow, the time/date that your email is sent will become the timestamp for your registration. It may still take up to three business days to review, upload, and post your submission, but the timestamp will be locked in as described.
Additional Documentation – please attach your pre-analysis plan, survey instrument, or any other files associated with the registration (files must be under 5MB)
G7 Interaction Model Results
We estimated models that interacted treatment assignment in the national experiment with agency rank on violent crime clearance rates. We exclude the N.J. sample from this analysis. Because ranks for the N.J. experiment were computed within the state, they are not interchangeable with ranks in the nationwide sample. The interaction model would also be severely underpowered if we estimated it using only the N.J. sample.
[Figure: marginal effect of the ranking treatment on agreeing to discuss collaboration, plotted against department rank (0–2,500); estimated marginal effects lie between −0.15 and 0.00.]
Figure G1: Agency Performance Does Not Condition Response to Treatment. The figure shows the marginal effect of the agency performance cue in the national experiment conditional on each agency's rank on violent crime clearance rates among the ≈ 2,500 agencies contacted (Hainmueller, Mummolo and Xu, 2019). The stacked histogram shows the distribution of treatment across levels of the moderator. The treatment effects among low, medium and high performing agencies are statistically indistinguishable.
H8 Sensitivity Analysis
Figure H1: No Single State Drives Our Results. This figure shows estimates derived from a dataset that iteratively drops one of the states included in the sample. Results are consistent across the 47 models.
[Figure: histogram of estimated coefficients (roughly −9.0 to −7.0 percentage points) across the leave-one-state-out samples. Title: "Response to collaboration discussion requests: Sensitivity to Dropping Single States".]
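The leave-one-state-out procedure can be sketched as follows. The data frame, column names, and the simple difference-in-means estimator are illustrative stand-ins for our actual specification.

```python
import pandas as pd

def drop_one_state_estimates(df, estimate):
    """Re-run an estimator on datasets that each omit one state,
    returning the estimate from every leave-one-state-out sample."""
    return {s: estimate(df[df["state"] != s]) for s in df["state"].unique()}

# Toy data: difference in affirmative-response rates by treatment arm.
df = pd.DataFrame({
    "state":   ["TX", "TX", "OH", "OH", "CA", "CA"],
    "treated": [0, 1, 0, 1, 0, 1],
    "y":       [1, 0, 1, 0, 1, 1],
})
diff_means = lambda d: float(d.loc[d.treated == 1, "y"].mean()
                             - d.loc[d.treated == 0, "y"].mean())
print(drop_one_state_estimates(df, diff_means))
# → {'TX': -0.5, 'OH': -0.5, 'CA': -1.0}
```

If the full-sample estimate sat far outside the spread of these leave-one-out estimates, that would flag a single influential state; the histogram in Figure H1 shows no such state in our data.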
H9 Alternative Mechanisms
One drawback of our research design is that we are unable to observe how agencies would have responded had they been contacted by researchers from institutions with different reputations. Our research team is drawn from multiple institutions, alleviating concerns that one particular university's reputation is driving results. But in the absence of more familiarity with the research team, our invitations, and our performance interventions in particular, could have been viewed with high levels of suspicion, prompting concerns that we lacked objectivity. This interpretation of our invitations would still be broadly consistent with our reputational theory—agencies decline collaborations for fear of negative publicity—albeit through a different channel than intended.
This concern was reinforced when we received an email from an agency in a very conservative state declining our invitation, which stated:
“I have spoken to my administrative staff and all agree we should not participatein your study. Some had reservations on both colleges being ‘liberal’ schools.Thank you.”
For this agency, at least, the partisan reputation of our universities (or perhaps, academia in general) precluded collaboration. If this interpretation were widespread, we would interpret our results quite differently.
This is a reasonable concern, but we do not think the perceived partisanship of the research team is producing these results. If it were, we would expect agencies from more conservative areas to decline our invitations at higher rates. But as Figure ?? shows, county-level electoral support for Donald Trump does not predict response (β = −2.20, p = 0.85). In a separate analysis, we interact treatment in the national experiment with Trump's county vote share, and find no evidence of heterogeneous effects (see Online Appendix Figure E1 and Table E5).2
These results suggest our findings are not simply due to the perceived partisanship of the research team. The fact that even agencies told they were performing superbly—a cue unlikely to engender mistrust—reacted negatively also suggests performance evaluation suppresses responses in general. Still, we acknowledge there exist multiple ways in which our messages may have triggered reputational concerns that we cannot distinguish. We discuss the implications of these various mechanisms in the following section.
2This analysis was not pre-registered, but we conduct it anyway in response to the aforementioned email and feedback we received from colleagues concerning alternative mechanisms.
I10 Alternative Coding
In the national study a single agency responded by email in a way that could either be interpreted as a "yes" or a request for additional information. Out of an abundance of caution, we coded this agency as a "no" in our primary analysis. Below we show that coding this agency as a "yes" has no meaningful effect on our results.
Table I1: Alternate Coding for National Study

                                             Agreed to Discuss Collaboration
                                             (1)        (2)        (3)        (4)
Intercept                                  0.15***    0.04***     0.01      0.005
                                           (0.01)     (0.01)    (0.04)     (0.03)
Ranking Condition                         −0.08***   −0.08***   −0.06      −0.04
                                           (0.01)     (0.02)    (0.04)     (0.04)
Numeric Rank                                                     0.0000
                                                                (0.0000)
Middle Rank Tercile                                                         0.07
                                                                           (0.04)
Top Rank Tercile                                                            0.04
                                                                           (0.04)
Ranking Condition × Numeric Rank                                −0.0000
                                                                (0.0000)
Ranking Condition × Middle Rank Tercile                                    −0.07
                                                                           (0.06)
Ranking Condition × Top Rank Tercile                                       −0.03
                                                                           (0.06)
Matched Pair FE                              No         Yes       Yes        Yes
Observations                               2,484      2,484     2,484      2,484
R2                                          0.02       0.52      0.52       0.52

Note: *p<0.05; **p<0.01; ***p<0.001
References
Census, U.S. 2017. Selected characteristics, 2012-2017 5-Year American Community Survey. Technical report, U.S. Census Bureau.
MIT Election Data and Science Lab. 2018. "County Presidential Election Returns 2000-2016."
DOJ. 2017. Law Enforcement Agency Roster (LEAR), 2016. Technical report, United States Department of Justice, Office of Justice Programs, Bureau of Justice Statistics. Ann Arbor, MI: Inter-university Consortium for Political and Social Research [distributor]. https://doi.org/10.3886/ICPSR36697.v1.
Enamorado, Ted, Benjamin Fifield and Kosuke Imai. 2017. "Using a Probabilistic Model to Assist Merging of Large-scale Administrative Records." Working Paper.
Gerber, Alan S. and Donald P. Green. 2012. Field experiments: Design, analysis, and interpretation. WW Norton.
Hainmueller, Jens, Jonathan Mummolo and Yiqing Xu. 2019. "How Much Should We Trust Estimates from Multiplicative Interaction Models? Simple Tools to Improve Empirical Practice." Political Analysis 27(2):163–192.
Kaplan, Jacob. 2019. Jacob Kaplan's Concatenated Files: Uniform Crime Reporting Program Data: Offenses Known and Clearances by Arrest, 1960-2017. Technical report, Ann Arbor, MI: Inter-university Consortium for Political and Social Research [distributor].
nj.com. 2019. “The Force Report.”. https://force.nj.com/.
Sullivan, John, Zane Anthony, Julie Tate and Jennifer Jenkins. 2018. "Nationwide, police shot and killed nearly 1,000 people in 2017." The Washington Post. https://wapo.st/2Q5AAC7.