administrative records mask racially biased policing · 2020. 8. 29. · this context, and...

47
American Political Science Review (2020) 114, 3, 619--637 doi:10.1017/S0003055420000039 © American Political Science Association 2020 Administrative Records Mask Racially Biased Policing DEAN KNOX Princeton University WILL LOWE Hertie School of Governance JONATHAN MUMMOLO Princeton University R esearchers often lack the necessary data to credibly estimate racial discrimination in policing. In particular, police administrative records lack information on civilians police observe but do not investigate. In this article, we show that if police racially discriminate when choosing whom to investigate, analyses using administrative records to estimate racial discrimination in police behavior are statistically biased, and many quantities of interest are unidentiedeven among investigated individu- alsabsent strong and untestable assumptions. Using principal stratication in a causal mediation framework, we derive the exact form of the statistical bias that results from traditional estimation. We develop a bias-correction procedure and nonparametric sharp bounds for race effects, replicate published ndings, and show the traditional estimator can severely underestimate levels of racially biased policing or mask discrimination entirely. We conclude by outlining a general and feasible design for future studies that is robust to this inferential snare. C oncern over racial bias in policing, and the public availability of large administrative data sets documenting policecivilian interactions, have prompted a raft of studies attempting to quantify the effect of civilian race on law enforcement behavior. These studies consider a range of outcomes including ticketing, stop duration, searches, and the use of force (e.g., Antonovics and Knight 2009; Fryer 2019; Ridgeway 2006; Nix et al. 2017). Most research in this area attempts to adjust for omitted variables that may correlate with suspect race and the outcome of interest. In contrast, this study addresses a more fundamental problem that remains even if the vexing issue of omitted variable bias is solved: the inevitable statistical bias that results from studying racial discrimination using records that are themselves the product of racial discrimination (Angrist and Pischke 2008; Elwert and Winship 2014; Rosenbaum 1984). We show that when there is any racial discrimination in the decision to detain civiliansa decision that determines which encounters appear in police administrative data at allthen estimates of the effect of civilian race on subsequent police behavior are biased absent additional data and/or strong and untest- able assumptions. This study makes several contributions. We clarify the causal estimands of interest in the study of racially discriminatory policingquantities that many studies appear to be targeting, but are rarely made explicitand show that the conventional approach fails to recover any known causal quantity in reasonable settings. Next, we highlight implicit and highly implausible assumptions in prior work and derive the statistical bias when they are violated. We proceed to develop informative nonparametric sharp bounds for the range of possible race effects, apply these in a reanalysis and extension of a prominent article on police use of force (Fryer 2019), and present bias-corrected results that suggest this and similar studies drastically underestimate the level of racial bias in policecivilian interactions. Finally, we outline strategies for future data collection and re- search design that can mitigate these threats to in- ference. These are discussed in the context of a detailed and feasible proposed study of racial bias in trafc stops. As we show in this article, the difculty of estimating racial bias using police records stems from a thorny combination of mediation (Hern ´ an, Hern ´ andez-Di ´ az, and Robins 2004; Imai et al. 2011; Pearl 2001; Robins, Hern ´ an, and Brumback 2000; VanderWeele 2009) and selection (Heckman 1979; Lee 2009): the effect of ci- vilian race on the outcome of a police encounter is me- diated by whether the civilian is stopped by police, but analysts only have data for one level of the mediatorthat is, data on stopped individuals. Because of this, police records do not contain a representative sample of all individuals that police observe, but rather only those civilian encounters which escalated to the point of triggering a reporting requirement. If a civilians race affects whether ofcers choose to stop that civilian (Gelman, Fagan, and Kiss 2007; Glaser 2014), then analyzing administrative police records amounts to conditioning on a variable that is itself affected by suspect race, namely, whether a suspect appears in the data at all. This could occur if ofcers have a higher Dean Knox , Assistant Professor of Politics, Princeton University, [email protected]. Will Lowe , Senior Research Scientist, Hertie School of Gov- ernance, [email protected]. Jonathan Mummolo , Assistant Professor of Politics and Public Affairs, Princeton University, [email protected]. We thank Matt Blackwell, Chuck Cameron, Tom Clark, Scott Cunningham, Lauren Davenport, Naoki Egami, Jeffrey Fagan, Avi Feller, Adam Glynn, Phillip Atiba Goff, Justin Grimmer, Andy Hall, Anna Harvey, Dan Hopkins, Matias Iaryczower, Kosuke Imai, Da- mon Jones, Dorothy Kronick, Shiro Kuriwaki, Neil Malhotra, Moritz Marbach, Nolan McCarty, Cyrus Samii, Maya Sen, Tara Slough, Rocio Titiunik, Tyler VanderWeele, Vesla Weaver, and Sean Westwood for helpful feedback. We thank Michael Pomirchy for research assistance. Replication les are available at the American Political Science Review Dataverse: https://doi.org/10.7910/DVN/ KFQOCV. Received: April 26, 2019; revised: October 10, 2019; accepted: January 8, 2020. 619 Downloaded from https://www.cambridge.org/core . IP address: 68.82.189.82 , on 29 Jul 2020 at 14:44:29, subject to the Cambridge Core terms of use, available at https://www.cambridge.org/core/terms . https://doi.org/10.1017/S0003055420000039

Upload: others

Post on 26-Nov-2020

4 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Administrative Records Mask Racially Biased Policing · 2020. 8. 29. · this context, and carefully grounding unobservable parameters—in particular, the proportion of racially

American Political Science Review (2020) 114, 3, 619--637

doi:10.1017/S0003055420000039 © American Political Science Association 2020

Administrative Records Mask Racially Biased PolicingDEAN KNOX Princeton University

WILL LOWE Hertie School of Governance

JONATHAN MUMMOLO Princeton University

Researchers often lack the necessary data to credibly estimate racial discrimination in policing. Inparticular, police administrative records lack information on civilians police observe but do notinvestigate. In this article, we show that if police racially discriminate when choosing whom to

investigate, analyses using administrative records to estimate racial discrimination in police behavior arestatistically biased, and many quantities of interest are unidentified—even among investigated individu-als—absent strong and untestable assumptions. Using principal stratification in a causal mediationframework, we derive the exact form of the statistical bias that results from traditional estimation. Wedevelop a bias-correction procedure and nonparametric sharp bounds for race effects, replicate publishedfindings, and show the traditional estimator can severely underestimate levels of racially biased policing ormask discrimination entirely. We conclude by outlining a general and feasible design for future studies thatis robust to this inferential snare.

Concern over racial bias in policing, and the publicavailability of large administrative data setsdocumenting police–civilian interactions, have

prompted a raft of studies attempting to quantify theeffect of civilian race on law enforcement behavior.These studies consider a range of outcomes includingticketing, stop duration, searches, and the use of force(e.g., Antonovics and Knight 2009; Fryer 2019;Ridgeway 2006; Nix et al. 2017). Most research in thisarea attempts to adjust for omitted variables that maycorrelate with suspect race and the outcome of interest.In contrast, this study addresses a more fundamentalproblem that remains even if the vexing issue of omittedvariable bias is solved: the inevitable statistical bias thatresults fromstudying racial discrimination using recordsthat are themselves the product of racial discrimination(Angrist and Pischke 2008; Elwert and Winship 2014;Rosenbaum 1984). We show that when there is anyracialdiscrimination in thedecision todetain civilians—adecision that determines which encounters appear inpolice administrative data at all—then estimates of theeffect of civilian race on subsequent police behavior are

biased absent additional data and/or strong and untest-able assumptions.

This study makes several contributions. We clarifythe causal estimands of interest in the study of raciallydiscriminatory policing—quantities that many studiesappear tobe targeting, but are rarelymadeexplicit—andshow that the conventional approach fails to recover anyknown causal quantity in reasonable settings. Next, wehighlight implicit and highly implausible assumptionsin prior work and derive the statistical bias when theyare violated. We proceed to develop informativenonparametric sharp bounds for the range of possibleraceeffects, apply these ina reanalysis andextensionofa prominent article on police use of force (Fryer 2019),and present bias-corrected results that suggest this andsimilar studies drastically underestimate the level ofracial bias in police–civilian interactions. Finally, weoutline strategies for future data collection and re-search design that can mitigate these threats to in-ference.Thesearediscussed in the context of adetailedand feasible proposed study of racial bias in trafficstops.

As we show in this article, the difficulty of estimatingracial bias using police records stems from a thornycombination of mediation (Hernan, Hernandez-Diaz,and Robins 2004; Imai et al. 2011; Pearl 2001; Robins,Hernan, and Brumback 2000; VanderWeele 2009) andselection (Heckman 1979; Lee 2009): the effect of ci-vilian race on the outcome of a police encounter is me-diated by whether the civilian is stopped by police, butanalystsonlyhavedataforonelevelof themediator—thatis, data on stopped individuals. Because of this, policerecords do not contain a representative sample of allindividuals that police observe, but rather only thosecivilian encounters which escalated to the point oftriggering a reporting requirement. If a civilian’s raceaffects whether officers choose to stop that civilian(Gelman, Fagan, and Kiss 2007; Glaser 2014), thenanalyzing administrative police records amounts toconditioning on a variable that is itself affected bysuspect race, namely, whether a suspect appears in thedata at all. This could occur if officers have a higher

Dean Knox , Assistant Professor of Politics, Princeton University,[email protected].

Will Lowe , Senior Research Scientist, Hertie School of Gov-ernance, [email protected].

Jonathan Mummolo , Assistant Professor of Politics and PublicAffairs, Princeton University, [email protected].

We thank Matt Blackwell, Chuck Cameron, Tom Clark, ScottCunningham, Lauren Davenport, Naoki Egami, Jeffrey Fagan, AviFeller, AdamGlynn, Phillip Atiba Goff, Justin Grimmer, Andy Hall,Anna Harvey, Dan Hopkins, Matias Iaryczower, Kosuke Imai, Da-mon Jones, Dorothy Kronick, Shiro Kuriwaki, Neil Malhotra, MoritzMarbach, Nolan McCarty, Cyrus Samii, Maya Sen, Tara Slough,Rocio Titiunik, Tyler VanderWeele, Vesla Weaver, and SeanWestwood for helpful feedback. We thank Michael Pomirchy forresearch assistance. Replication files are available at the AmericanPolitical Science Review Dataverse: https://doi.org/10.7910/DVN/KFQOCV.

Received:April 26, 2019; revised:October 10, 2019; accepted: January8, 2020.

619

Dow

nloa

ded

from

htt

ps://

ww

w.c

ambr

idge

.org

/cor

e. IP

add

ress

: 68.

82.1

89.8

2, o

n 29

Jul 2

020

at 1

4:44

:29,

sub

ject

to th

e Ca

mbr

idge

Cor

e te

rms

of u

se, a

vaila

ble

at h

ttps

://w

ww

.cam

brid

ge.o

rg/c

ore/

term

s. h

ttps

://do

i.org

/10.

1017

/S00

0305

5420

0000

39

Page 2: Administrative Records Mask Racially Biased Policing · 2020. 8. 29. · this context, and carefully grounding unobservable parameters—in particular, the proportion of racially

threshold for stopping white civilians during the un-seen first stage of police–civilian contact, meaning thatwhite civilians observed in the data are incomparablebecause they tend to pose a greater threat to policethan observed minorities. These unobserved differ-ences can lead analysts to understate anti-minorityracial bias—or even produce the appearance of anti-white bias—in the use of force. Despite claims to thecontrary (Fryer 2018, 2), this statistical bias oftencannot be eliminatedwith additional control variables,even if the goal is to estimate causal effects among thesubset of police–civilian encounters that appear inpolice data. Moreover, the problem remains whetherracial bias in detainment stems from so-called “taste-based” or “statistical” discrimination (Arrow1972, seebelow for extended discussion on this point).

At the first glance, the problem of race-based se-lection into policing data may appear a classic case ofsample selection bias (Elwert and Winship 2014;Heckman 1979) for which numerous remedies alreadyexist. But policing data exhibit a constellation of fea-tures that render previous methodological approachesunsuitableorunusable in this setting, leadingprominentscholars in this area to declare that “it is unclear how toestimate the extent of such bias or how to address itstatistically,” (Fryer 2018, 5).1 For example, Heckman(1979) and more recent extensions like Lee (2009)provide methods for estimating or bounding averagetreatment effects in the populationwhile accounting forsample selection. But with only data on stopped indi-viduals, policing scholars rarely seek to estimate pop-ulation treatment effects, instead targeting effectsamong individualswho actually interactwith police.Weshow that even without attempting to generalize to thebroader population, the issues we raise result in biasedestimates of the effect of race on police behavior evenamong encounters in which civilians are detained.

A related large literature provides remedies for so-called “post-treatment bias”—statistical bias thatresults from conditioning on a variable that is affectedby the causal variable of interest (Rosenbaum 1984).But implementation of these techniques requires eitherknowledge of the scale of the missing data (e.g., Nyhan,Skovron, and Titiunik 2017) or complete data on theposttreatment variable (e.g., Acharya, Blackwell, andSen 2016).2 In the case of policing, administrative datasets only include observations with one level of theposttreatment variable (i.e., data on stopped individu-als) and give no purchase on the number of individualspolice observe but do not stop, meaning these techni-ques cannot be applied. This scenario also differs from

situations of “truncation by death” (Frangakis andRubin 2002) in which receipt of a treatment causessample attrition and renders outcomes for someportionof units undefined. In the policing setting, individualsnot detained by police are absent from the data, butmany outcomes of interest are often still defined (e.g.,the level of force applied to nonstopped individuals iszero, a realized outcome). This feature allows us toidentify additional causal quantities that cannot be re-covered in the “truncation by death” setting. In short,existing methods offer either unusable or suboptimalsolutions to this pernicious threat to inference, absentstrong assumptions about the unseen process mappingcivilian race to officers’ decisions to detain individuals.

Our analysis indicates that existing empirical work inthis area is producing a misleading portrait of evidenceas to the severity of racial bias in police behavior.Replicating and extending the study of police behaviorin New York in Fryer (2019), we show that the con-sequences of ignoring the selective process that gen-erates police data are severe, leading analysts todramatically underestimate or conceal entirely thedifferential police violence faced by civilians of color.For example, while a naıve analysis that assumes norace-based selection into the data suggests only 10,000blackandHispanic civilianswerehandcuffedbecauseofracial bias inNewYorkCity between 2003 and 2013, weestimate that the true number is approximately 56,000.And while analyses ignoring bias in stopping wouldconclude that 10% of uses of force against black andHispanic civilians in these data were discriminatory,after bias-correction, we estimate that the true per-centage is 39%.

While the techniques used to obtain our correctedresults eliminate several facially implausible (and insome cases, empirically falsified) assumptions that areimplicit in prior work, we caution that they neverthelessrely on weaker assumptions that in some cases aredifficult to verify, as we discuss below. We seek to ad-vance the study of racial bias in policing by explicitlystating theseassumptions, discussing their plausibility inthis context, and carefully grounding unobservableparameters—in particular, the proportion of raciallydiscriminatory minority stops, which relates closely tothe severity of the statistical bias—in prior research(Gelman, Fagan, and Kiss 2007; Goel, Rao, and Shroff2016). We show that obtaining more precise bias-cor-rected estimates of racial discrimination in policingrequires future research to be designedwith this issue inmind. To that end, we outline a research design thatalleviates these concerns. Our study also providesa general framework for analyzing the study of racialbias that can illuminate the causal interpretation ofother longstanding tests for discrimination. For exam-ple, we show that under reasonable assumptions, so-called “outcome tests,” which compare the rates offinding evidence of criminal activity across detainedsuspects of different racial groups (Knowles, Perisco,and Todd 2001), imply a lower bound on the share ofracial minorities who are discriminatorily detained.Outcome tests also appear elsewhere in criminal justicestudies, for example, in capital sentencing (Alesina and

1 This comment was made in reference to an analysis of arrest data inFryer (2019). Further, Fryer (2019) includes an analysis aimed atcharacterizing selection into police data sets, and finds mixed resultsdepending on the outcome examined. The study states: “Taken to-gether, this evidence demonstrates how difficult it is to understandwhether there is potential selection into police data sets…Solving thisis outside the scope of this paper,” (19).2 In addition, the remedy proposed in Blackwell (2013), whichrequires re-weighting across all strata of the post-treatment variable,cannot be implemented in the situation we describe. However, thealternative designs we propose below are amenable to this approach.

Dean Knox, Will Lowe, and Jonathan Mummolo

620

Dow

nloa

ded

from

htt

ps://

ww

w.c

ambr

idge

.org

/cor

e. IP

add

ress

: 68.

82.1

89.8

2, o

n 29

Jul 2

020

at 1

4:44

:29,

sub

ject

to th

e Ca

mbr

idge

Cor

e te

rms

of u

se, a

vaila

ble

at h

ttps

://w

ww

.cam

brid

ge.o

rg/c

ore/

term

s. h

ttps

://do

i.org

/10.

1017

/S00

0305

5420

0000

39

Page 3: Administrative Records Mask Racially Biased Policing · 2020. 8. 29. · this context, and carefully grounding unobservable parameters—in particular, the proportion of racially

Ferrara 2014) and bail decisions (Arnold, Dobbie, andYang 2018). And as Ayres (2002) and Simoiu, Corbett-Davies, and Goel (2017) note, such tests have alsobeen applied in a range of other social contexts, in-cluding financial lending and editorial decisions. Bynesting the study of discrimination in a rigorous andgeneral causal framework, our study can help synthe-size results from a broad interdisciplinary literature onracial bias.

Ourworkalsoextendsagrowing literature inpoliticalscience examining the political implications of law en-forcement which, in recent decades, has largely studiedpolicing indirectly, for example, as a means ofexplaining political participation (Burch 2013; Cohenet al. 2017; Lerman andWeaver 2014;White 2019) or asan instance of bureaucracy (Brehm and Gates 1999;Lipsky 1980; Ostrom andWhitaker 1973;Wilson 1989).This work is path breaking, but with some recentexceptions (Harvey and Mungan 2019; Magaloni,Franco, and Melo 2015; Mummolo 2018a; Peyton et al.2019; Soss and Weaver 2017), has tended to concep-tualize policing as a cause of politics, rather than a po-litical act in and of itself. The field’s relative inattentionto policing was made evident by several recent officer-involved shootings of unarmed black men (Edwards,Lee, and Esposito 2019) and subsequent social unrestthat caught many political scientists flatfooted, withlittle systematic evidence to offer as the demand forexplanations of police behavior surged. As Soss andWeaver (2017) note, the field’s limited store of relevantknowledge in the aftermath of these events wasespecially glaring given law enforcement’s role as aneveryday conduit of state power. According to oneoften-cited definition, politics is “who gets what, when,how” (Lasswell 1936). As a matter of routine, the dy-namics of police-civilian interactions determine whogets protected, punished, or left to fend for themselves(Wilson 1968).Viewed in thisway, the role of race in thestate’s exercise of violence, as well as in the provision ofsafety more broadly, is inherently political (Alexander2010; Gottschalk 2008; Key 1949). In addition to of-feringa rigorous analytic framework tohelp researcherscontendwith longstandingmethodological hurdles, ourstudy also underscores an often overlooked truth: po-licing is high-stakes politics.

CONCEPTUALIZING RACE AS ACAUSAL VARIABLE

We regard the investigation of racial bias in policing asan inherently causal inquiry, albeit a notoriously diffi-cult one. That is, researchers seek to assess whetherpolice behavior during police–civilian encounterswould have differed if the civilian had belonged toanother racial group, holding constant civilian behaviorand circumstances. As noted in Fryer (2018), this “‘raceeffect’…is the proverbial ‘holy grail’—the parameterthat we are all attempting to estimate but never quitedo” (2). This task is distinct from the descriptive en-terprise of merely documenting differential police be-havior during encounters with various groups, as such

disparities can arise via numerous processes that do notimply racial discrimination.3

The notion of a “causal effect of race” on an indi-vidual’soutcome is the subject ofmuchcontention in theliterature on causal inference (Hernan 2016; Pearl2018).Most notably, somehave argued that this effect isundefined because race is an immutable, and hencenonmanipulable, characteristic (Holland 1986). Othersargue that an individual’s race is a complex, multifac-eted treatment—a “bundle of sticks,” in the words ofSen andWasow (2016)—that affects outcomes throughmyriad channels, and therefore, researchers must beprecise about the specific facets of race under consid-eration (Greiner and Rubin 2011).

Our analysis avoids this debate by focusing onpolice–civilian encounters—that is, sightings of civiliansby police—as the unit of analysis, rather than individ-uals. The manipulation of race is conceptualized as thecounterfactual substitution of an individual with a dif-ferent racial identity into the encounter, while holdingthe encounter’s objective context—location, timeof day, criminal activity, etc.—fixed. In other words, the“treatment” in this case is the entire “bundle of sticks”encapsulating the race of the civilian—including, forexample, skin tone, dialect, and clothing. We note thatthe credibility of causal inferences and the exact in-terpretation of racial discrimination in this frameworkwill depend crucially on how the analyst defines “race.”We leave the specific operationalization in a givencontext to the analyst, and, in linewith advice in SenandWasow (2016), encourage scholars to carefully conveytheir conceptualization of race when studying this andrelated questions.4

By conceptualizing the treatment in this way, weavoid consideration of the perhaps implausible coun-terfactual of holding all features of an individual con-stant but for their race. While various aspects of racialidentity and its close correlates may not be separable inthe observed world, there exists a subset of comparablesituations in which minority and majority civilians areobserved by police. If this subset can be identified, orapproximated through covariate adjustment, we canestimate the counterfactual police behavior that wouldhaveoccurredhad the civilian inquestionbeen replacedwith a member of another racial group.

While our approach considers a valid counterfactualand isolates racial discrimination that occurs duringpolice–civilian encounters, it necessarily mutes the in-fluence of pre-encounter macroinstitutional factors,suchasdecisions todeploymoreofficers to communitiesof color. In keepingwith the goals of prior studies in this

3 For example, we may observe that members of one racial group arestopped more often by police than members of another racial group.While this result shows disparate police behavior, it does not con-clusivelydemonstrate that thedifference isdue tocivilianrace. It couldsimply be the case that members of the first group participate incriminal activity more often in public.4 Note that while the unit of analysis is the police-civilian encounter,for the sake of brevity, we occasionally refer to “minority civilians” asshorthand for “police-civilian encounters with minority civilians” insubsequent discussion. Readers are cautioned to keep this distinctionin mind.

Administrative Records Mask Racially Biased Policing

621

Dow

nloa

ded

from

htt

ps://

ww

w.c

ambr

idge

.org

/cor

e. IP

add

ress

: 68.

82.1

89.8

2, o

n 29

Jul 2

020

at 1

4:44

:29,

sub

ject

to th

e Ca

mbr

idge

Cor

e te

rms

of u

se, a

vaila

ble

at h

ttps

://w

ww

.cam

brid

ge.o

rg/c

ore/

term

s. h

ttps

://do

i.org

/10.

1017

/S00

0305

5420

0000

39

Page 4: Administrative Records Mask Racially Biased Policing · 2020. 8. 29. · this context, and carefully grounding unobservable parameters—in particular, the proportion of racially

area, our approach holds such contextual featuresconstant, allowing us to ask whether an encounterwould have unfolded differently had it involved a ci-vilian of differing race. But even if no such differenceexists within encounters, law enforcement strategiesadopted before encounters occur could still produceracially biased policing.We caution readers to keep thisscope condition in mind.

PRIOR RESEARCH ON RACIAL BIASIN POLICING

Race-based selection into policing data has been pre-viously noted, and some scholars have devised researchdesigns in an attempt to sidestep this issue.Grogger andRidgeway (2006), for example, leverage the so-called“veil of darkness” strategy, comparingpatterns in trafficstops that occur before and after sunset under thelogic that the race of the driver is plausibly hidden topolice officers after dark. In this way, the study aims toidentify a sample of police–civilian interactions thatwere initiated in a race-blind manner. Similarly,West(2018) examines data on police responses to trafficincidents, arguing that whether a co-racial officerresponds to amotorist’s unanticipated accident is as-ifrandom. If the assumptions in these studies hold,concerns over race-based sample selection are greatlyalleviated.

These attempts to mitigate race-based selection re-main rare, as most empirical studies in this literaturefocus nearly exclusively onmitigating themore familiarproblem of omitted variable bias. For example, Fryer(2019) (detailed below), a study of racial bias in policeviolence, estimates discrimination using data onpolice–civilian encounters via multivariate regressionsthat control for a host of observables relating to civil-ians, officers, and circumstance. In a related article, theauthor asserts that “regression can recover the ‘raceeffect’ if race is ‘as good as randomly assigned,’ con-ditional on the covariates” (Fryer 2018, 2). Fryer (2019)claims to find evidence of bias in sublethal force butnone in lethal encounters.

A related study, Johnson et al. (2019), attempts toestimate racial bias in police shootings. Examining onlypositive cases in which fatal shootings occurred, theyfind that the majority of shooting victims are white andconclude fromthis thatnoantiminoritybiasexists.KnoxandMummolo (2020) show that this conclusion rests onthe erroneous assumption that police encounter mi-nority and white civilians in equal number.

Prior work has also examined racial bias in trafficenforcement, such as Ridgeway (2006) which employspropensity score weighting when estimating racial biasin traffic stops in Oakland, CA. The analysis examinesoutcomes including citations, stop duration, and thedecision to search cars. The study claims thisreweighting strategy can recover “the causal effect ofrace” (9) on poststop outcomes. In general, the analysisfinds little evidence of racial bias on most outcomes,with the exception of stop duration. Antonovics andKnight (2009) use data on traffic citations from the

Boston Police Department to estimate the probabilitythat a ticketed driver was searched, controlling fordriver attributes such as age, race, and gender as well asneighborhood traits. They interpret the coefficient onan indicator of whether the officer and ticketed driverare of different races as an estimate of “racial profilingbased on prejudice,” as opposed to statistical discrim-ination (167). The claim is implicitly causal: some shareof searches among racially mismatched driver–officerpairs would not have occurred had the driver belongedto another racial group.

The above examples represent a mere fraction ofa decades-long, multidisciplinary effort to quantify thedegree to which police discriminate against civilians ofcolor [see Atiba Goff and Kahn (2012), Fridell (2017),and Ridgeway and MacDonald (2010) for more ex-tensive reviews of this empirical literature]. We high-light these specific examples because they all containseveral common features that are central toour critique.For one, these studies analyze data that fail to capturethe unseen selective process throughwhich police cometo engage civilians, a process that prior work shows isfunction of civilian race (Gelman, Fagan, and Kiss2007). In this way, these studies all fail to account for theimpact of race on the composition of the sample understudy. As we show below, failing to account for thisundocumented first stage of the police–civilian in-teractionwill lead to statistical bias, even if thegoal is toestimate the effect of suspect race within the sample ofindividuals who appear in police data and, in manycases, even with a “complete” set of control variablesthat render civilian race as-if randomly assigned topolice encounters.

Second, the aforementioned studies, despite makingat least implicitly causal claims, leave ambiguous theprecise quantity of interest—whether it be the averagetreatment effect (ATE) of race in all encounters; theaverage treatment effect among the subset of encoun-ters appearing in police data because a stop was made(ATEM51), which differs tremendously from the ATE;or the markedly more restrictive and difficult-to-in-terpret controlled direct effect among the same subset(CDEM51, defined below). While studies commonlydiscuss omitted variable bias and attendant assump-tions, they rarely discuss the additional assumptionsnecessary to identify specific causal quantities of in-terest. As a result, readers are unable to assess theadequacy of research designs and estimators, renderingthe interpretation and policy relevance of much priorwork unclear.

Taste-Based versus Statistical Discrimination

A closely related literature attempts to parse “taste-based discrimination” (racial animus) from so-called“statistical discrimination” (Arrow 1972, 1998;Becker 1971; Eberhardt et al. 2004; Phelps 1972) asmechanisms for racially biased policing, and insteadfocuses on recovering the causal effect of civilianrace on police behavior. In this study, we do not at-tempt to disentangle these mechanisms, and we notethat taste-based and statistical discrimination both

Dean Knox, Will Lowe, and Jonathan Mummolo

622

Dow

nloa

ded

from

htt

ps://

ww

w.c

ambr

idge

.org

/cor

e. IP

add

ress

: 68.

82.1

89.8

2, o

n 29

Jul 2

020

at 1

4:44

:29,

sub

ject

to th

e Ca

mbr

idge

Cor

e te

rms

of u

se, a

vaila

ble

at h

ttps

://w

ww

.cam

brid

ge.o

rg/c

ore/

term

s. h

ttps

://do

i.org

/10.

1017

/S00

0305

5420

0000

39

Page 5: Administrative Records Mask Racially Biased Policing · 2020. 8. 29. · this context, and carefully grounding unobservable parameters—in particular, the proportion of racially

pose serious normative concerns. While statisticaldiscrimination is sometimes viewed as more in-nocuous, it nonetheless constitutes racial profilingbecause officers detain civilians due to the perceivedactions of their racial group, not their observed in-dividual behavior

CLARIFYING THE EFFECT OF CIVILIANRACE: NOTATION, ESTIMANDS,ASSUMPTIONS, ANDEXISTING APPROACHES

Researchers and policymakers examining the effects ofracially biased policing are nominally interested in therelationship between two variables: the race of the ci-vilian involved in encounter i, which we operationalizethrough theirminority statusDi2 {0, 1}, and consequentpolice behavior Yi 2 {0, 1}. However, analyses of ad-ministrative data on police–civilian encounters in-herently involve a mediating variable that may beaffected by race: whether an individual is stopped bypolice,whichwedenoteMi.The causalorderingof thesevariables is depicted in thedirectedacyclic graph (DAG)in Figure 1. We note that analysts often possess richcontextual informationabout theobjective contextof theencounter, suchas its locationand time,whichmay relateto all of the above. We denote these covariates collec-tively asXi. However, administrative data invariably failto capture unobservable subjective aspects of the en-counter, Ui, such as an officer’s suspicion or sense ofthreat.

As a motivating example, we consider the challengeof estimating racial bias in police violence as recentlyattempted in Fryer (2019). We ground our analysis inthe potential outcomes framework (Rubin 1974) oftenused in the study of causal mediation (Imai et al. 2011;Pearl 2001). The potential mediator Mi(d) representswhether encounter iwould have resulted in a stop if thecivilian were of race d. Similarly, the potential outcomeYi(d,m) representswhether forcewouldhavebeenusedin encounter i if the civilian were of race d and themediating variable werem. The observedmediator andoutcome can be written in terms of these potential

values asMi ¼Mi Dið Þ ¼�dMi dð Þ1 Di ¼ df g and Yi ¼Yi Di;Mi Dið Þð Þ¼�d�mYi d;mð Þ1 Di ¼ d;f Mi ¼mg,respectively. For any individual encounter, the (un-observable) causal effect of civilian race is thedifferencein potential force if the civilian were a minority andstopped as if they were a minority, versus if they werewhite and stopped accordingly, Yi(1, Mi(1)) 2 Yi(0,Mi(0)).

This notation implicitly makes the stable unit treat-ment value assumption (SUTVA, Rubin 1990). “Sta-bility” is of particular note: this stipulates that finerracial gradations must not affect the way that officersbehave, above and beyond any differences betweenthe broad binary categoriesDi5 0 andDi5 1. SUTVAalso requires that each encounter is unaffected bya civilian’s race in other encounters; this might be

violated if, for example, groups of individuals arestopped simultaneously.

Traditionally, analysts use data on stopped indi-viduals to study bias by computing the difference inviolence rates between stopped minority and whitecivilians, while controlling for observable differencesbetween these two sets of encounters. We term thisthe “naıve estimator,” D, and it can be written asfollows:

D ¼ YijDi ¼ 1;Mi ¼ 1� YijDi ¼ 0;Mi ¼ 1; (1)

where conditioning on possible treatment-outcomeconfounders, Xi, is left implicit. Assuming the analysthas correctly measured and specified all such con-founders, Dmay appear entirely reasonable at the firstglance. However, without further assumptions, thisquantity will have no causal interpretation so long asthe treatment affects the mediator (i.e., civilian raceaffects whether officers detain a civilian). As we showbelow, this is because treated encounters (with mi-nority civilians) that result in a stop (Mi51)will not becomparable to those with stopped control (majority)civilians. As a simple example, suppose officersexhibited racial bias as follows: they detain whitecivilians if they observe them committing a seriouscrime (such as assault, potentially warranting the useof force) but detain nonwhite civilians regardless ofobserved behavior. When this is true, comparingstopped white and nonwhite civilians amounts tocomparing fundamentally different groups. The an-alyst will observe force used against a greater pro-portion of stopped white civilians because of thedifferential physical threat they pose to officers.5

Under the traditional approach, the analyst wouldnaıvely conclude that antiwhite bias exists, yielding anerroneous portrait of racial discrimination in theuse offorce.

FIGURE 1. Directed Acyclic Graph of RacialDiscrimination in the Use of Force by Police

Notes: Observed X is left implicit; these covariates may becausally prior to any subset of D, M, and Y.

5 While somepolice records indicatewhethera suspectwasengaged inviolent behavior, allowing the analyst to control for this particularfactor, a host of similar concerns (e.g., time-varying officer suspicion)are unmeasured and thus cannot be controlled away.

Administrative Records Mask Racially Biased Policing

623

Dow

nloa

ded

from

htt

ps://

ww

w.c

ambr

idge

.org

/cor

e. IP

add

ress

: 68.

82.1

89.8

2, o

n 29

Jul 2

020

at 1

4:44

:29,

sub

ject

to th

e Ca

mbr

idge

Cor

e te

rms

of u

se, a

vaila

ble

at h

ttps

://w

ww

.cam

brid

ge.o

rg/c

ore/

term

s. h

ttps

://do

i.org

/10.

1017

/S00

0305

5420

0000

39

Page 6: Administrative Records Mask Racially Biased Policing · 2020. 8. 29. · this context, and carefully grounding unobservable parameters—in particular, the proportion of racially

To formalize the limitations of the naıve estimator,we begin by partitioning the population into principalstrata with respect to the mediator (Frangakisand Rubin 2002; VanderWeele 2011). That is, weconceptualize police–civilian encounters in terms offour latent classes within which Mi(1) and Mi(0) areconstant. The general approach of principal stratifi-cation has proven useful for clarifying and boundingquantities of interest in areas ranging from in-strumental variables (Angrist, Imbens, and Rubin1996; Balke and Pearl 1997) to the closely related“truncation by death” problem (Rubin 2000; Zhangand Rubin 2003).

These principal strata include “always-stop”encounters in whichMi(0)5Mi(1)5 1, as well as stopsthat discriminate against racial minorities (“racialstops”) in whichMi(1)5 1 butMi(0) 5 0. Always-stopencounters may be conceptualized as relatively severescenarios, such as violent crimes in progress, in whichofficers have no choice but to intervene regardless ofcivilian race. In contrast, previous work has identifiedcertain behaviors, such as “furtive movements” (Gel-man,Fagan, andKiss 2007;Goel,Rao, andShroff 2016),that appear to be acted on selectively by officers basedon the raceof suspects.“Never-stop”encounters,whereMi(0) 5 Mi(1) 5 0, are situations in which civiliansappear inconspicuous and would not be stopped, re-gardless of race. There also may be antiwhite racialencounters, inwhichMi(1)5 0butMi(0)5 1, thoughwebelieve these to be rare to nonexistent (discussed fur-ther below). Figure 2 shows encounters appearing inpolice records (principal strata for which Mi(Di) 5 1)are not comparable across civilian races. Minoritypolice–civilian encounters that result in a stop area mixture of “always-stop” and “antiminority racialstop” encounters, while encounters with white civiliansthat result in a stop are a combination of “always-stop”and “antiwhite racial stop” encounters. These arefundamentally different groups, and without furtherassumptions, comparisons of rates of violence betweenthem using the naıve estimator will be statisticallybiased.

To state this more formally, note that the naıve es-timator recovers the weighted combination of violencerates in observed principal strata:

E Dh i

¼ E YijDi ¼ 1;Mi ¼ 1½ � � E YijDi ¼ 0;Mi ¼ 1½ �¼ E Yi 1; 1ð ÞjDi ¼ 1;Mi 1ð Þ ¼ 1;Mi 0ð Þ ¼ 1½ �Pr Mi 0ð Þ ¼ 1jDi ¼ 1;Mi 1ð Þ ¼ 1ð Þ

þE Yi 1; 1ð ÞjDi ¼ 1;Mi 1ð Þ ¼ 1;Mi 0ð Þ ¼ 0½ �Pr Mi 0ð Þ ¼ 0jDi ¼ 1;Mi 1ð Þ ¼ 1ð Þ

�E Yi 0; 1ð ÞjDi ¼ 0;Mi 1ð Þ ¼ 1;Mi 0ð Þ ¼ 1½ �Pr Mi 1ð Þ ¼ 1jDi ¼ 0;Mi 0ð Þ ¼ 1ð Þ

�E Yi 0; 1ð ÞjDi ¼ 0;Mi 1ð Þ ¼ 0;Mi 0ð Þ ¼ 1½ �Pr Mi 1ð Þ ¼ 0jDi ¼ 0;Mi 0ð Þ ¼ 1ð Þ:(2)

In equation (2), the first term is the average rate offorceappliedduringencounterswith racialminorities ofthe always-stop stratum, while the second term dealswithminorities in the anti–minority racial-stop stratum.The third and fourth terms are the average violencerates amongwhite civilian encounters in the always-stopand antiwhite racial stop strata. Importantly, principal

strata are not fully observable without further assump-tions, and theyexist evenafter conditioningonXi: for anyparticularminority stop, it is fundamentally impossible toknowwith certainty whether a white civilian would havebeen stopped in identical circumstances. In sum, thenaıveestimator comparesgroupswithdifferent potentialoutcomes, and because these groups are unobservable,the resulting bias is difficult to address.

A central quantity of interest in the study of policingbias is the average treatment effect of race,ATE ¼ E Yi 1;Mi 1ð Þð Þ � Yi 0;Mi 0ð Þð Þ½ �—the extent towhich civilians of color face greater risk of police vio-lence thanwhite civiliansbecause of their race. TheATEconsiders both reported and unreported encounters,and it captures two related phenomena: first, whethermembers of theminority are differentially stopped; andsecond, if they are differentially subject to violence.However, police administrative records contain dataonly on reported encounters,meaning that this quantitycannot be estimated solely with police administrativedata without untenable assumptions. The ATE can berestated as follows:

ATE ¼ E Yi 1;Mi 1ð Þð Þ½ � � E Yi 0;Mi 0ð Þð Þ½ �¼�

d�m�m0

�E Yi 1;Mi 1ð Þð Þ � Yi 0;Mi 0ð Þð Þj½

Di ¼ d;Mi 1ð Þ ¼ m;Mi 0ð Þ ¼ m9�:

3Pr Di ¼ d;Mi 1ð Þ ¼ m;Mi 0ð Þ ¼ m9ð Þ�; (3)

where the second line illustrates how it sums overthe principal strata depicted in Figure 2, takinginto account the number of minority and white civiliansin each strata (the probabilities) and the local averagetreatment effects for each group (the expectations). InOnline Appendices A.1–A.4, we use these quantities toderive bias and nonparametric sharp bounds.

No data are available for “never-stop” encounters,those with Mi(1) 5 Mi(0) 5 0. Moreover, racial-stopencounters, with Mi(1) 5 1 and Mi(0) 5 0, are onlyrecorded for minority civilians. However, consistentwith Nyhan, Skovron, and Titiunik (2017), we show inOnline Appendix A.6 that the ATE can be pointidentified if researchers collected two additionalnumbers: the count of total minority and whiteencounters, within levels of covariates X whereapplicable—a point we discuss further in our recom-mendations for future research.6

6 Nyhan, Skovron, and Titiunik (2017) examines the problem ofstudying the effect of party identification on turnout using voterregistration files, given the fact that party ID likely affects who reg-isters tovote. Inanapproachthat isequivalent toourProposition2, thestudy uses registration rates of the treated and control voting-agepopulations to bound the ATE given the effect of party ID on reg-istration. This option is not available in practice here since no data setscontain information on unreported encounter rates, or even theirorder ofmagnitude.As a result, analysts in this literature focus almostexclusively on the ATEM51, which we examine with a different ap-proach here. While some work in policing uses population figures asproxies for these encounter rates, we are skeptical of this approach, aspolice frequently stop civilians who reside in other jurisdictions.

Dean Knox, Will Lowe, and Jonathan Mummolo

624

Dow

nloa

ded

from

htt

ps://

ww

w.c

ambr

idge

.org

/cor

e. IP

add

ress

: 68.

82.1

89.8

2, o

n 29

Jul 2

020

at 1

4:44

:29,

sub

ject

to th

e Ca

mbr

idge

Cor

e te

rms

of u

se, a

vaila

ble

at h

ttps

://w

ww

.cam

brid

ge.o

rg/c

ore/

term

s. h

ttps

://do

i.org

/10.

1017

/S00

0305

5420

0000

39

Page 7: Administrative Records Mask Racially Biased Policing · 2020. 8. 29. · this context, and carefully grounding unobservable parameters—in particular, the proportion of racially

Because “never-stop” encounters are unobserved incurrent data sources, researchers seeking to understandthe role of race in police behavior have, at least im-plicitly, focused on more narrowly defined estimands.7

Studies commonly restrict analysis to the subset ofreported encounters, that is, they seek to estimateeffects among those stopped by police, ATEM51. Incontrast to the ATE, this estimand is by definition notconcerned with unreported white encounters thatwould have escalated to a stop if the involved civilianwas a minority. (The same is true for unreported blackencounters that would have escalated if the involvedcivilian was white, to the extent that this group exists.)Formally, this quantity is given by the followingequation:

ATEM¼1 ¼ E Yi 1;Mi 1ð Þð ÞjMi ¼ 1½ � � E Yi 0;Mi 0ð Þð ÞjMi ¼ 1½ �:(4)

Relatedly, analysts may seek to causally attribute thenumber ofminority stops inwhich force would not havebeen used if the individual in question had been white(Yamamoto 2012). This value is proportional to theconditional average treatment effect among the treated(i.e., minority) stops, which can be written as follows:

ATTM¼1 ¼ E Yi 1;Mi 1ð Þð ÞjDi ¼ 1;Mi ¼ 1½ �

�E Yi 0;Mi 0ð Þð ÞjDi ¼ 1;Mi ¼ 1½ �: (5)

While the average treatment effects are of obviouspolicy importance, they are not the only quantity thatresearchers might seek to estimate. A closely relatedestimand is the controlleddirect effect among the subsetof reported encounters, CDEM¼1 ¼ E Yi 1; 1ð ÞjMi ¼ 1½ ��E Yi 0; 1ð ÞjMi ¼ 1½ �. This estimand differs from theATEM51 in its conceptual approach to racially dis-criminatory stops. Where the ATEM51 asks whethera stopwouldhaveoccurredatall if the individualwereofdiffering race, the CDEM51 seeks to quantify whatwould have happened if the officer was forced to stopthem anyway, perhaps against the officer’s will. Inpractice, the difference is one of inter-pretation—regardless of the target quantity, existingwork in this domain is based on the naıve difference inreported outcomes, and the question lies in the in-terpretation of estimated results. We note that causalestimands in the literature are often left undefined,making it difficult to assess whether published resultsare intended to correspond to theATEM51 or CDEM51(e.g., Goel, Rao, and Shroff 2016; Simoiu, Corbett-Davies, and Goel 2017). In Online Appendix A.3, wediscuss theCDEM51 at length.Weshowthat it cannotberecovered in this setting unless analysts make the un-tenable assumption that no mediator-outcome con-founding exists (Assumption 5, below). We referreaders to the Online Appendix for further details andfocus on recovery of average treatment effects here.

Necessary Assumptions

In this subsection, we describe a number of statisticalassumptions that the analyst must make for a causalstudy of racially biased policing when only adminis-trative data on police–civilian interactions is available.Without these assumptions, causal quantities of interestin this substantive area cannot be identified in data.

Assumption 1 (Mandatory Reporting). Yi(d, 0) 5 0for all i and for d 2 {0, 1}.

We assume all encounters that escalate to the use offorce also trigger a reporting requirement and are,therefore, observed in administrative data. Thoughthere exist wide variability in data recording practicesacross jurisdictions, this assumption is plausible in thestudy ofmanymajor police departments. For example,New York Police Department (NYPD) officers arerequired to report a number of variables, including thespecific type of force used, following each “stop,question, and frisk” encounter. Based on these andother reports, the NYPD releases detailed annual use-of-force reports (NYPD 2017). The completeness of

FIGURE 2. Principal Strata and ObservedPolice–Civilian Encounters

Notes: The figure displays the four principal strata that comprisepolice–civilianencountersbasedonhow themediatorM (whethera civilian is stopped by police) responds to treatment D (whetherthe civilian is a racial minority). Minorities in the “always stop” andanti–minority racial stop strata, highlighted in red, are stopped bypolice and, thus, appear in police administrative data. Likewise,white civilians in the “always–stop” and anti–white racial stopstrata, highlighted in blue, appear in police data. “Never stop”encounters are unobserved. Because white and nonwhiteencounters are drawn from different principal strata, the twogroupsare incomparableandestimatesof causal quantitiesusingobserved encounters will be statistically biased absent additionalassumptions.

7 For example, Fryer (2018) notes that his analysis of police use offorce is estimating the effect of suspect race “conditional on an in-teraction,” with police (4), rather than seeking its average treatmenteffect in the population.

Administrative Records Mask Racially Biased Policing

625

Dow

nloa

ded

from

htt

ps://

ww

w.c

ambr

idge

.org

/cor

e. IP

add

ress

: 68.

82.1

89.8

2, o

n 29

Jul 2

020

at 1

4:44

:29,

sub

ject

to th

e Ca

mbr

idge

Cor

e te

rms

of u

se, a

vaila

ble

at h

ttps

://w

ww

.cam

brid

ge.o

rg/c

ore/

term

s. h

ttps

://do

i.org

/10.

1017

/S00

0305

5420

0000

39

Page 8: Administrative Records Mask Racially Biased Policing · 2020. 8. 29. · this context, and carefully grounding unobservable parameters—in particular, the proportion of racially

these reports with respect to fatalities is informallyenforced by standard journalistic practiceswhich placehigh emphasis on documenting violent incidents(Iyengar 1994). Lesser formsof force aremore likely togo unreported, to be sure, but the ubiquity of sur-veillance cameras, cell phone cameras, and media in-terest in police brutality makes unobserved uses offorce increasingly unlikely (Fisher and Hermann2015). We note that this assumption is implicit in allanalyses of police use of force that rely on adminis-trative data.

Assumption 2 (Mediator Monotonicity). Mi(1) $Mi(0) for all i.

This assumption allows that there may be encoun-ters in which minorities would be stopped (Mi(1)5 1)but whites would not (Mi(0) 5 0), perhaps becauseofficers racially discriminate in applying differentialthresholds of “reasonable suspicion.” However, weassume that the reverse is never true: white civiliansare never stopped in circumstances when their mi-nority counterparts would be allowed to pass. Thisis clearly a stylized representation of a complexreality, and it would be violated if minority officersdiscriminate against white civilians. A violation couldalso occur if white civilians were more likely to bestopped by police because they appeared out ofplace in a predominantly black neighborhood, per-haps under the assumption that they were there tobuy drugs (Gelman, Fagan, and Kiss 2007, 822).These are rare occurrences, and a robustness checkin Online Appendix B.3, our reanalysis of Fryer(2019) after dropping all stops based on suspicionof a drug transaction, shows substantively similarresults.

Assumption 3 (Relative Nonseverity of RacialStops). E Yi d;mð ÞjDi ¼ d0;Mi 1ð Þ ¼ 1;Mi 0ð Þ½ ¼ 1;Xi ¼x�$E Yi d;mð ÞjDi½ ¼ d9;Mi 1ð Þ ¼ 1;Mi 0ð Þ ¼ 0;Xi ¼ x�.

We theorize that for encounters during criminalevents severe enough to warrant stopping a civilianregardless of race (i.e., “severe” or “always-stop”encounters), the use of force is as ormore likely to occurthan during encounters in which police have morediscretion over whether to stop an individual (i.e., thosein which racial discrimination in stopping can occur) inexpectation. We regard this assumption, which com-pares violence rates within encounters that hold civilianrace fixed, as highly plausible. As one hypotheticalexample, this assumptionwould imply that police are asor more likely to use force against a white civilian ob-servedcommittingassault thanawhite civilianobservedjaywalking, on average.

Assumption 4 (Treatment Ignorability).

(a) With respect to potential mediator Mi(d) v Di|Xi.(b) With respect topotential outcomes:Yi(d,m)vDi|Mi(0)

5 m9, Mi(1) 5 m0, Xi.

This states that conditional onXi, civilian race is “asgood as randomly assigned” to encounters, and

officers encounter minority civilians in circumstancesthat are objectively no different from white encoun-ters. Part 4(a) stipulates that theobserved covariatesXinclude the confounder W in Figure 3(a). This as-sumption, while strong, has become more plausible inrecent years as administrative data sets have come toinclude a host of encounter attributes that mightlargely capture features observable to police whichcorrelate with suspect race and the potential for force.However, we note that this cannot be tested, evenindirectly, without data on nonstopped individuals.This assumption would be violated if neighborhoodswith high shares of minority residents were moreheavily policed and the analyst failed to adjust forneighborhood, for example, using fixed effects. Part4(b) implies that, for example, if police were moreheavily armed during minority-neighborhood patrolsand, hence, more likely to deploy force—representedby V in Figure 3(b)—then V must be included in X.Without Assumption 4, the range of possible racialeffects is so wide as to be uninformative. We also notethat every study claiming to estimate racial discrimi-nation using similar data makes this assumption, oftenimplicitly. Our aim in this study is not to assert theplausibility of treatment ignorability, but rather toclarify that deep problems remain even if this well-known issue is somehow solved.

Strong Assumptions

We now discuss further assumptions that are often leftimplicit in empirical studies of racially biased policingand that are implausible in many settings. We illustratethese scenarios graphically in Figure 3.

Assumption 5 (Mediator ignorability). Yi(d, m) vMi(0)|Di 5 d, Mi(1) 5 1, Xi.

This is related to but dramatically stronger thanAssumption 3, which merely requires that always-stopencounters are at least as severe in terms of observedcriminal behavior. In contrast, forAssumption5 tohold,violence rates in always-stop encounters must beidentical to those in observationally equivalent racialstops. We find mediator ignorability to be highly im-plausible in the context of policing. Subjective factorssuch as an officer’s suspicion and sense of threat-—depicted as U in Figure 3(c)—can not only lead toinvestigation (stopping) but also a heightened willing-ness touse force.Thesemediator-outcomeconfoundersmust be captured in X for this assumption to hold, butthey are notoriously difficult to capture in officers’ self-reported accounts. Even when proxies based on qual-itative officer narratives are available, strong legalincentives exist for distortion. Moreover, analysts mustbe sure to condition on all variables related to officermindset that are causallyupstreamof stops,while takingcare not to induce bias by conditioning on any that aredownstream.

Below,wedemonstrate that everyanalysis estimatinga racial effect using only data on stopped individualsimplicitly makes Assumption 5. We further note thatAssumptions 4(a), 4(b), and 5 are jointly covered by the

Dean Knox, Will Lowe, and Jonathan Mummolo

626

Dow

nloa

ded

from

htt

ps://

ww

w.c

ambr

idge

.org

/cor

e. IP

add

ress

: 68.

82.1

89.8

2, o

n 29

Jul 2

020

at 1

4:44

:29,

sub

ject

to th

e Ca

mbr

idge

Cor

e te

rms

of u

se, a

vaila

ble

at h

ttps

://w

ww

.cam

brid

ge.o

rg/c

ore/

term

s. h

ttps

://do

i.org

/10.

1017

/S00

0305

5420

0000

39

Page 9: Administrative Records Mask Racially Biased Policing · 2020. 8. 29. · this context, and carefully grounding unobservable parameters—in particular, the proportion of racially

slightly stronger assumption of sequential ignorability(Imai et al. 2011).

Assumption6 (NoRacialStops).Mi(0)5Mi(1)|Mi51.

InFigure 3, this amounts to assuming away the arrowbetweenD andM. Equivalently, this assumption statesthat all reported encounters were of the always-stopkind, or that there is no racial discrimination in stops.Weshowbelowthat thisassumption is implicitlymadebyall studies claiming to identify the average treatmenteffect of race, conditional on a reported interaction.Naturally,when there is novariation inMi(0), then thisvariable is ignorable and Assumption 5 is alsosatisfied.

However, in view of an overwhelming body ofqualitative evidence and consistently massive quanti-tative differences in racial detainment rates acrossnumerous policing domains, we find racial bias inpolice stops too plausible to dismiss by assumption(Alexander 2010; Baumgartner et al. 2017; Glaser2014;Goel, Rao, and Shroff 2016; Lerman andWeaver2014). A raft of studies have also found that racialdisparities persist even after leading candidate omittedvariables, such as differential criminal activity acrossracial groups, are accounted for (Gelman, Fagan, andKiss 2007). While such patterns are not proof ofa causal relationship, we consider the possibility thatpolice exhibit anti–minority bias when engagingcivilians strong enough tomerit a careful considerationof the implications of that bias for the validity of studiesof racially biased policing.

Bias in the Naıve Estimator

In this section, we clear up several misunderstandingsabout the conventional estimator, which comparesreportedminority stops to reportedwhite stops (with orwithout covariates). First, we show that when there isany racial discrimination in detainment, selection onstops introduces unavoidable statistical bias in esti-mating the ATEM51, even when a perfect set of ob-served covariates renders race ignorablewith respect tothe potential mediator and outcomes. These resultsdirectly contradict prior assertions that “linear re-gression can recover the ‘race effect’ if race is ‘as goodasrandomly assigned,’ conditional on the covariates”(Fryer 2018, 2).The issue isnotoneofomittedvariables,but rather posttreatment conditioning. Second, we

clarify an important open question about the nature ofthis bias. Fryer (2018) comments in the context of se-lection into arrest data that, “It is unclear how to esti-mate the extent of such bias or how to address itstatistically” (5). Here, we derive the exact form of thisbias for the ATEM51 and the ATTM51; Online Ap-pendix A.3 does the same for the CDEM51. We showthat the bias is always negative, resulting in naıveestimates that downplay the extent of racially dis-criminatory police violence. Below, we develop in-formative nonparametric sharp bounds that adjust thenaıve estimates for the range of all possible selectionbias.

Prior work on race and policing uses estimators thatcompare average reported outcomes in majorityencounters to those in minority encounters. For sim-plicity of exposition, we present the special no-cova-riate case; Appendices A.1–A.3 derive the bias of thenaıve estimator with covariate adjustment. We firstrefer readers to equation (1),whichexpresses thenaıveestimator, D, in terms of stratum mean potential out-comes. We demonstrate that this commonly used an-alytic approach fails to recover any quantity of interestunder plausible assumptions. We first show that it isbiased for the ATEM51 and ATTM51 unless As-sumption 6 is true, and there are no racial stops. InOnline Appendix A.3, we show it is also biased for theCDEM51 unless Assumption 5 holds—that is, always-stop encounters are identical in violence rates to ra-cially discriminatory stops. As a result, the observeddifference in means fails to recover any known causalquantity without additional, and highly implausible,assumptions.

In Online Appendix A.1, we derive the bias of D whenit is used to estimate ATEM51 under the relativelyplausible Assumptions 1–4. This bias can be written asfollows:

E Dh i

�ATEM¼1

¼ E Yi 1; 1ð Þ � Yi 0; 1ð ÞjMi 1ð Þ ¼ 1;Mi 0ð Þ ¼ 1½ �ð�E Yi 1; 1ð Þ � Yi 0; 0ð ÞjMi 1ð Þ ¼ 1;Mi 0ð Þ ¼ 0½ �Þ3Pr Mi 0ð Þ ¼ 0jDi ¼ 1;Mi ¼ 1ð ÞPr Di ¼ 1jMi ¼ 1ð Þ� E Yi 1; 1ð ÞjMi 1ð Þ ¼ 1;Mi 0ð Þ ¼ 1½ �ð�E Yi 1; 1ð ÞjMi 1ð Þ ¼ 1;Mi 0ð Þ ¼ 0½ �Þ3Pr Mi 0ð Þ ¼ 0jDi ¼ 1;Mi ¼ 1ð Þ: (6)

FIGURE 3. Violations of Assumptions

Notes:DAGs (a), (b), and (c), respectively, illustrate theviolationofAssumptions4(a), 4(b), and5.Note that thevariableUdepicted inDAG(c)is almost certain to exist in the policing context, and we do not advocate the use of Assumption 5.

Administrative Records Mask Racially Biased Policing

627

Dow

nloa

ded

from

htt

ps://

ww

w.c

ambr

idge

.org

/cor

e. IP

add

ress

: 68.

82.1

89.8

2, o

n 29

Jul 2

020

at 1

4:44

:29,

sub

ject

to th

e Ca

mbr

idge

Cor

e te

rms

of u

se, a

vaila

ble

at h

ttps

://w

ww

.cam

brid

ge.o

rg/c

ore/

term

s. h

ttps

://do

i.org

/10.

1017

/S00

0305

5420

0000

39

Page 10: Administrative Records Mask Racially Biased Policing · 2020. 8. 29. · this context, and carefully grounding unobservable parameters—in particular, the proportion of racially

Weoffer several comments on equation (6). The biasterm isguaranteed tobenegative, evenwithaperfect setof controls that render Di ignorable, as long as thereexist any racially discriminatory stops of minority civil-ians (or in an empirically falsified edge case).8 The firstterm in thebias expression relates toheterogeneity in theaverage treatment effect, or the extent to which Yi(1,Mi(1)) 2 Yi(0, Mi(0)) differs in expectation betweenalways-stop and racial-stop encounters—respectively,those withMi(1) 5 Mi(0) 5 1 and Mi(0) , Mi(1).

9 Biasarises because in the latter type of encounter, a whitecivilianwouldneverhavebeendetained in thefirst place,and hence force would never have been used—that is,E Yi 0; 0ð ÞjDi ¼ 1;Mi 1ð Þ ¼ 1;Mi 0ð Þ ¼ 0½ � ¼ 0. Estimat-ing the average potential outcomes of this group usingstopped white civilians introduces unavoidable bias thatthe analyst cannot hope to eliminate simply by addingadditional covariates to the estimating model. The sec-ond term is related to the difference in baseline violencerates between always-stop encounters and racially dis-criminatory stops; this term also vanishes if there are noracial stops.

Can the naıve estimator be rehabilitated by simplyredefining the quantity of interest? In Online Ap-pendices A.2–A.3, we show that the answer is no. Thestructure of the bias when D is used to estimate theATTM51 is simpler but leads to substantively identicalconclusions: the naıve estimator is biased unless thereare no racial stops. We show that bias for the ATTM51is given by E D

h i� ATTM¼1 ¼ �E Yi 0; 1ð ÞjMi 1ð Þ ¼ 1;½

Mi 0ð Þ ¼ 1� Pr Mi 0ð Þ ¼ 0jMi 1ð Þ ¼ 1ð Þ. While the iden-tifying assumptions for the CDEM51 are slightlyweaker, they are nonetheless wholly implausible. Thesign of this bias for the ATTM51 and CDEM51 can alsobe shown to be negative underAssumption 1–4, exceptin the implausible edge cases described in the OnlineAppendix. Thus, regardless of the target quantity, theuse of the observed difference inmeanswill understatethe rate of racially discriminatory police violence. Inaddition, we emphasize that these derivations showthat statistical bias remains even after assuminga “complete” set of control variables that renders raceignorable. Posttreatment conditioning induces biasunless additional assumptions hold.

POTENTIAL SOLUTIONS

Howshould the analyst proceed in light of these results?We propose two approaches that eliminate the highlyimplausible assumptions outlined in the “StrongAssumptions” section, which are unstated but implicit

in prior work. We caution that these solutions still relyon theweaker assumptions described in the “NecessaryAssumptions” section, althoughwe argue that these areoften plausible in light of insights from extensive re-search on policing. Reasonable people can disagree onthe plausibility of various assumptions, but by statingthemexplicitly,we seek to advance empiricalwork in anarea which, at present, largely ignores such issuesaltogether.

In thefirst approach, we derive nonparametric sharpbounds representing the tightest possible range ofcausal effects that are consistentwith the reported data(Manski 1995). Again, for simplicity, we begin bypresenting bounds for the case in which treatment isunconditionally ignorable. To incorporate covariates,Online Appendix A.4 then describes a more generalformulation in which bounds are computed withinlevels of X, without functional form assumptions, andreaggregated; this latter formulation is also applicablewhen a correctly specified regression is used. Bothcases are demonstrated in a reanalysis of Fryer (2019)below.

A key limitation of the first proposed solution is thatall quantities of interest remain only partially identified.This is fundamentally a consequence of selection intopolice administrative records; point identification sim-ply cannot be achieved without either implausibleassumptions or additional data. To this end, we outlinean alternative approach that incorporates limited in-formation about the missing encounters (those that donot result in a stop). We show that with additionaldata—which in some cases are already being collectedby agencies—the prevalence of racially discriminatorystops and most racial effects of interest can be pointidentified. Following our applied example, we describea feasible research design based on this approach indetail.

Bounds on Effect of Race

Here, we derive large-sample nonparametric sharpbounds on the ATEM51 and ATTM51, focusing first onthe case inwhichAssumption 4 (treatment ignorability)holds without conditioning on further covariates.Proposition 1 quantifies and corrects for the range ofpossible bias induced by posttreatment conditioning,producing an informative interval of possible jointvalues for (1) the partially identified ATEM51 and (2)the proportion of racial stops among reported minorityencounters, r 5 Pr(Mi(0) 5 0|Di 5 1, Mi 5 1). Asequation (6) suggests, when there is no racial bias inpolice stops (r 5 0), these bounds collapse on the ob-served difference in means. We further demonstrate inFigure 4 that these bounds are highly informative whenr is known or can be credibly estimated from supple-mental data. When the prevalence of racially discrim-inatory detainment is unknown but a plausible rangecan be inferred from prior work, Figure 4 (discussedbelow) illustrates how this value can be used to assessthe behavior of the bounds much like a sensitivityparameter.

8 The edge case is if there is zero use of force against white civilians.This possibility is empirically falsifiable; in our application, we showthat it is far from the truth.To see that the bias is negative, observe thatE Yi 0; 1ð ÞjMi 1ð Þ ¼ 1;Mi 0ð Þ ¼ 1½ �$E Yi 0; 0ð ÞjMi 1ð Þ ¼ 1;Mi 0ð Þ ¼ 1½ �,because the latter term is zero under Assumption 1. Together withAssumption 3, this signs the bias.9 Note that Mi(d) simplifies in equation (6), because it is constantwithin principal strata.

Dean Knox, Will Lowe, and Jonathan Mummolo

628

Dow

nloa

ded

from

htt

ps://

ww

w.c

ambr

idge

.org

/cor

e. IP

add

ress

: 68.

82.1

89.8

2, o

n 29

Jul 2

020

at 1

4:44

:29,

sub

ject

to th

e Ca

mbr

idge

Cor

e te

rms

of u

se, a

vaila

ble

at h

ttps

://w

ww

.cam

brid

ge.o

rg/c

ore/

term

s. h

ttps

://do

i.org

/10.

1017

/S00

0305

5420

0000

39

Page 11: Administrative Records Mask Racially Biased Policing · 2020. 8. 29. · this context, and carefully grounding unobservable parameters—in particular, the proportion of racially

Proposition 1 (Nonparametric Sharp Bounds onATEM51). When Di is ignorable, nonparametric sharpbounds on (ATEM51, r) under Assumptions 1–4 arejointly given by

E Dh i

þ rE YijDi ¼ 0;Mi ¼ 1½ � 1� Pr Di ¼ 0jMi ¼ 1ð Þð Þ#ATEM¼1 #

E Dh i

þ r

1� rE YijDi ¼ 1;Mi ¼ 1½ � �max 0; 1þ 1

rE Yi jDi ¼ 1;Mi ¼ 1½ � � 1

r

�� ��

3Pr Di ¼ 0jMi ¼ 1ð Þ þ rE Yi jDi ¼ 0;Mi ¼ 1½ � 1� Pr Di ¼ 0jMi ¼ 1ð Þð Þ:

where D ¼ YijDi ¼ 1;Mi ¼ 1� YijDi ¼ 0;Mi ¼ 1 and

the (ATTM51, r) must similarly satisfy

ATTM¼1 ¼ E Dh i

þ rE YijDi ¼ 0;Mi ¼ 1½ �To derive Proposition 1, we reformulate the bias in

terms of the unobserved joint distribution of (1) the useof force in minority encounters and (2) whether a mi-nority stop was racially discriminatory. FollowingKnoxet al. (2019), we then use Assumptions 1–4 and theFrechet inequalities, in conjunction with the observedmargins, to place sharp bounds on this joint distribution.These then imply sharp bounds on the ATEM51. Adetailed proof is given in Online Appendix A.4 for themore general case in which Di is ignorable only after

conditioning on prestop covariates. In this case, thelocal average treatment effect, ATEM51,x, is firstbounded by applying Proposition 1 within levels ofX toobtain local bounds, ATEM¼1;x;ATEM¼1;x

� . These are

then straightforwardly reaggregated to obtain boundson the conditional treatment effect among stops,

�xATEM¼1;x Pr Xi ¼ xjMi ¼ 1ð Þh

, �xATEM¼1;x Pr

Xi ¼ xjMi ¼ 1ð Þ�. In Online Appendix A1.5, we outlinea Monte Carlo procedure for constructing confidenceintervals that asymptotically contain both the true lowerand upper bounds endpoints with probability 1 2 a.

Wenote that theproportionof raciallydiscriminatorystops may vary with X. However, when using thesebounds as a sensitivity analysis, we suggest using thesimplifying approximation of a constant r. This is be-cause without additional data beyond civilian race, theuse of force, or even prestop covariates, police ad-ministrative records alone are virtually uninformativeabout the range of r: any value in [0, 1) could producethe observed data,10 although Proposition 1 shows that

FIGURE 4. Bounds for Racially Discriminatory Use of Force, any Severity

Notes:These plots present the ATEM51 (ATTM51) for excess racial force, scaled by the number of stops (number ofminority stops) to obtainthe total number of civilians affected. The left panels consider the difference in the use of force if black civilians were substituted into eachencounter of any race (each black encounter), versus white civilians; the right panels show the same quantities for Hispanic civilians. Bluepoints (error bars) denote the naıve estimator (95% confidence intervals), which, conditional on the typical selection-on-observablesassumption, is unbiased for the ATEM51 if there are no discriminatory stops of minority civilians (zero on the x-axis). The dark (light) regionsrepresent the range of possible values (95%CI) for (1) the ATEM51 and (2) the proportion of discriminatory stops in reported data jointly, perProposition 1. The vertical line corresponds to an estimate of the proportion of discriminatory stops from Gelman, Fagan, and Kiss (2007),suggestingaplausiblevalue for thisunobservableparameter.The top (bottom)panelspresentboundsbasedonamodelwithnocontrols (themain specification, adjusting for a wide range of covariates).

10 If all stops were racially discriminatory, then we would observe nowhite stops.

Administrative Records Mask Racially Biased Policing

629

Dow

nloa

ded

from

htt

ps://

ww

w.c

ambr

idge

.org

/cor

e. IP

add

ress

: 68.

82.1

89.8

2, o

n 29

Jul 2

020

at 1

4:44

:29,

sub

ject

to th

e Ca

mbr

idge

Cor

e te

rms

of u

se, a

vaila

ble

at h

ttps

://w

ww

.cam

brid

ge.o

rg/c

ore/

term

s. h

ttps

://do

i.org

/10.

1017

/S00

0305

5420

0000

39

Page 12: Administrative Records Mask Racially Biased Policing · 2020. 8. 29. · this context, and carefully grounding unobservable parameters—in particular, the proportion of racially

each possible r value has differing implications for theset of possible racial effects.

Point Identification of the ATE GivenAdditional Data

The ATE is point identified with the collection of onlytwo additional numbers—the count of totalminority andwhite encounters, within levels of X where applicable.Below, we propose an alternative design in which thesedataare collected frompassive instruments suchas trafficcameras or police body-worn cameras. Where sucha design is infeasible (e.g., where traffic cameras coveronly a subset of the jurisdiction under study), pointidentification can also be achieved by linking incompletedata on both reported and unreported encounters topolice administrative records under mild assumptions.

Proposition 2 (Point Identification of ATE). UnderAssumptions 1–4, the ATE is identified by a weightedcombination of the observed racial means,

E YijDi ¼ 1;Mi Dið Þ ¼ 1½ �Pr Mi ¼ 1jDi ¼ 1ð Þ

�E YijDi ¼ 0;Mi Dið Þ ¼ 1½ �Pr Mi ¼ 1jDi ¼ 0ð Þ:Intuitively, the proof breaks the ATE into the size-

weighted sum of principal effects among always-stopand racial-stopencounters (theprincipal effect innever-stop encounters is known to be zero). Crucially, theadditional data on nonstops allows the researcher toconstruct a contingency table representing the jointdistributionof race anddetainment.Aspart of theproofin Online Appendix A.6, we show that this can be usedto straightforwardly recover the size of each principalstratum under Assumptions 2 and 4(a). However, itremains impossible to determine whether any in-dividual stop was racially discriminatory.

When total encounter numbers are unknown, thisjoint distribution can nonetheless be estimated byattempting to link a representative sample of allencounters (e.g., using timestamps from traffic cameras)against administrative records (e.g., license plate data-bases); those that are unlinkable can be presumed un-reported. After recovering principal strata sizes, we thenproceed by noting that minority outcomes in reportedadministrative data are in fact a mixture of Yi(1, Mi(1))from both always-stop and racial-stop strata in preciselythe required proportions; that reported white outcomescorrespond toYi(0,Mi(0)) from the always-stop stratum;and that Yi(0, Mi(0)) is known to be zero among theracial-stop stratum under Assumption 1. From this, theATE can then be reconstructed.

REANALYSIS OF FRYER (2019)

We have shown that the standard approach to esti-mating racial bias in police data will always un-derestimate its degree, so long as police discriminateagainst minorities when choosing whom to investigate.To explore the magnitude of this statistical bias in anapplied setting, we replicate and extend a section ofFryer (2019) which reports estimates of racial

discrimination in theapplicationof sublethal forceusingthe NYPD’s “Stop, Question and Frisk” (SQF) data-base (2003–13).11 The NYPD data contain roughly 5million records of pedestrian stops, the vast majority ofwhich are of nonwhite suspects. The data record the useof varying levels of force, including laying hands ona suspect, handcuffing a suspect, pointing a weapon ata suspect, and pepper spraying a suspect, among others.The original analysis in Fryer (2019) utilized the simplenaıve approach of equation (1) to predict the severity offorce applied by police, as well as covariate-adjustednaıve models analogous to those we consider in Ap-pendices A.1–A.3. Specifically, the study presenteda logistic regression of police force on suspect race,along with additional specifications that added a host ofcontrol variables such as precinctfixed effects, to renderthe ignorability assumptions more plausible. We re-produce two of these models—the baseline specifica-tion including only racial group indicators, along withthe richer “main specification” (21)12—to estimate theconditional expectations in Proposition 1. For compa-rability to the original analysis, we take these models atface value, setting aside issues of potential model mis-specification and the ignorability of civilian race.

Oneanalysis inFryer (2019) considered theuseof anyforce against a suspect, while subsequent analyses ex-amined force exceeding various severity thresholds,such as a binary outcome for “at least use of handcuffs.”Using the coding rules and estimation procedures inFryer (2019), we were able to closely replicate thepublished results. However, in doing so, we discoveredthis procedure involved an unconventional and in-advisable step in which all observations with nonzeroforce below the threshold of interest were dropped—asevere case of selection on the dependent variable. Inthe most extreme case, in the analysis of police batonand pepper spray use, this resulted in the discarding ofall encounters in which only lower levels of force wereused, a set that comprised 21.5%of all observations and99.8% of all uses of force. To present the most de-fensible results possible, for these outcomes, we departfrom the analysis in Fryer (2019) and revise the pro-cedure so that all encounters with a level of force at oraboveagiven thresholdareassignedanoutcomeof1 (asbefore) and all other encounters are assigned a value of0 (including those with lower levels of force, which arenow retained). Section B.1 in the Online Appendixcontains an extended discussion of the issue;

11 Because the replicationmaterial for Fryer (2019) was not posted atthe time of analysis, these data were obtained directly from https://www1.nyc.gov/site/nypd/stats/reports-analysis/stopfrisk.page.12 The main specification in Fryer (2019) consists of a logistic re-gression of a force outcomeon race dummies plus controls for gender,a quadratic in age, whether the stopwas indoors or outdoors, whetherthe stop took place during the daytime, whether the stop took place ina high crime area, during a high crime time, or in a high crime area ata high crime time, whether the officer was in uniform, civilian ID type,whether others were stopped during the interaction, controls for ci-vilian behavior, and precinct and year fixed effects. Fryer (2019) alsonotes that “missing indicators for all variables” are included ascovariates. We omit these indicators as it was unclear how they werecoded. See Figure 1 caption in Fryer (2019).

Dean Knox, Will Lowe, and Jonathan Mummolo

630

Dow

nloa

ded

from

htt

ps://

ww

w.c

ambr

idge

.org

/cor

e. IP

add

ress

: 68.

82.1

89.8

2, o

n 29

Jul 2

020

at 1

4:44

:29,

sub

ject

to th

e Ca

mbr

idge

Cor

e te

rms

of u

se, a

vaila

ble

at h

ttps

://w

ww

.cam

brid

ge.o

rg/c

ore/

term

s. h

ttps

://do

i.org

/10.

1017

/S00

0305

5420

0000

39

Page 13: Administrative Records Mask Racially Biased Policing · 2020. 8. 29. · this context, and carefully grounding unobservable parameters—in particular, the proportion of racially

a comparison of the original, replicated, and correctedresults; and a demonstration of the serious implicationsfor statistical significance of the original estimates.

Basedon thediscussion inbothFryer (2018)andFryer(2019), we interpret the published results as estimates ofthe ATEM51: “the difference inY that can be attributedto an individual’s race,” (Fryer 2018, 2), conditional ona recorded interactionwith police (i.e., conditional onMi5 1). We note that of the other quantities considered inthis study, the unconditional ATE cannot be estimatedwithout information on unreported encounters, and theCDEM51 cannot be computed without strong assump-tions aboutpotential outcomes that canneverbe realizedin observational settings. For these reasons, we focus onthe ATEM51 and ATTM51 in this reanalysis.13

Figure 4 depicts bounds on the ATEM51 when thebinary outcome is any use of force, including the lowestrecorded value of physically handling a civilian.14 Im-portantly, this specific outcome is unaffected by theoutcome coding issue discussed above. (In Figures B.2and B.3, we present additional bounds for varying forcethresholds, up to whether a baton or pepper spray wasused.) The plots also display estimates of the bias-cor-rected ATTM51 (dashed lines). As the plots show, therange of possible ATEM51 and ATTM51 values variesstrongly with the severity of discrimination in stops.

In equation (6), we demonstrated that the use of thenaıve estimator implied the substantively implausibleassumption that police never discriminate in stops (i.e.,r 5 0). Similarly, contextual information also suggeststhat some depicted values of r are implausibly large. Tounderstand the rangeof empirically plausible values,weturn to two prior studies that use very different analyticapproaches to shed light on the degree of racial bias inthe decision to detain civilians. Using the SQF data andcontrolling for precinct, suspected crime, andprior localarrest rates by race, Gelman, Fagan, and Kiss (2007)produce estimates that—by our calculations—imply32% of black-civilian stops made by the NYPD couldnot be explained even by differential criminality

between racial groups of suspects, as proxied by priorarrest rates.15 Their analyses are run separately byprecinct and crime type; for simplicity, we take theweighted average of racial-stop proportions. This an-alytic approach most likely underestimates the pro-portion of racially discriminatory stops—the number ofprior arrests in a precinct and racial group is not a directmeasure of criminality, but is itself likely contaminatedby discrimination in previous detainments and arrests.We, therefore, regard the value of r implied byGelman,Fagan, and Kiss (2007) as conservative.

Goel, Rao, and Shroff (2016) take an entirely dif-ferent tack based on a comparison of “hit rates,” or theshare of stops that produced evidence of the suspectedcrime for which the civilian was detained—a variant ofan “outcome test” for discrimination (Anwar and Fang2006; Knowles, Perisco, and Todd 2001). Using a flexi-ble logistic regression to adjust for a vast array ofindicators visible to officers prestop, the study showsthat white hit rates exceeded those of “similarly situ-ated” black civilians. We show in our Online AppendixA.7 that the difference in hit rates implies a minimumproportion of racial stops and, therefore, also impliesa conservative estimate of r.16 The correspondingvalues of r from these two studies are 0.32 and a lowerbound of 0.34, respectively, when considering blackcivilians.While any estimate of this difficult-to-measurequantity from police data is sure to be imperfect, thefact that two independent estimates of racial bias instopping so closely comport with one another, despiteusing wholly different analytical approaches, gives ussome empirical justification for narrowing the range ofplausible racial effects in the use-of-force analysis. Wenote that the research design presented in the “Rec-ommendations for Future Research” section belowoffers an alternative approach for obtaining betterestimates of racially discriminatory stopping.

Figure 4 demonstrates that strong negative bias in thenaıve estimator paints a wildly misleading portrait ofpolice use of force. We turn first to estimates of theATEM51 using themain specification, which adjusts fora battery of covariates. The naıve estimator (whichassumes no racial bias in police stops) suggests thatencounterswithblack (Hispanic) suspects arepredictedto exhibit an additional 3.9 (0.4) instances of

13 We note that in Proposition 1 we consider binary minority status,whereas the specifications in Fryer (2019) take civilian race as a cat-egorical variable. (However, only two races are considered for anyparticular ATEM51 estimate: black versus white, or Hispanic versuswhite). To accommodate this, in reported black ATEM51 andATTM51 results, we use a slight generalization in which white civilianencounters are representedwithDi5 0, black encounterswithDi5 1,and subsequentminority groupswithDi52, 3 and soon.Proposition1and its covariate-adjusted counterpart inOnlineAppendixAcan thenbe applied directly. The chief implication of this formulation is (1)a different average value for Yi(d, 1) is estimated for each minoritygroup, and (2) that all minority groups are implicitly assumed to beracially stopped at the same rate, although this can easily be relaxed.(The same procedure is applied when theminority group of interest isHispanic civilians, after setting the Hispanic indicator to Di 5 1.) Toassess whether results were affected by this, in Online Appendix B.4,we conduct two additional analyses after first subsetting to black andwhite encounters, and Hispanic and white encounters, respectively.As the results makes clear, conclusions are virtually identical apartfrom differences that stem from the size of the subsetted data.14 Note that we treat stops in which “other”was denoted as the use offorce category as zero force, since the vast majority of these cases didnot even not involve officers even laying hands on suspects.

15 Based on SQF data from 1998–99, Gelman, Fagan, and Kiss (2007)fit hierarchical Poisson models for the number of stops (by suspectedcrime, precinct, and race) per arrest in the previous year, which theymodel as emþarace within groups of stops defined by the suspectedcharges (violent crimes, weapons crimes, property crimes, and drugcrimes) and precinct racial composition (,10%, 10–40%, and.40%black).Within each group, the excess black stopping rate is then givenby 1� eawhite�ablack . We approximate the size of each group by multi-plying the reported marginal probabilities of stop types (25%, 44%,20%, and 11%, respectively) and composition groups (“each… rep-resents roughly 1/3 of the precincts”), since the joint distribution is notreported. The r 5 0.32 estimate is then produced by taking the size-weighted average of subgroup excess black stopping rates. The cor-responding estimate of r for Hispanic civilians implied by Gelman,Fagan, and Kiss (2007) is slightly higher, at 0.35.16 Using SQF data from 2008–12, Goel, Rao, and Shroff (2016) es-timate ahit rate of 3.8%forwhite suspects and2.5%forblack suspects(379), which implies that r is at least 0.34.

Administrative Records Mask Racially Biased Policing

631

Dow

nloa

ded

from

htt

ps://

ww

w.c

ambr

idge

.org

/cor

e. IP

add

ress

: 68.

82.1

89.8

2, o

n 29

Jul 2

020

at 1

4:44

:29,

sub

ject

to th

e Ca

mbr

idge

Cor

e te

rms

of u

se, a

vaila

ble

at h

ttps

://w

ww

.cam

brid

ge.o

rg/c

ore/

term

s. h

ttps

://do

i.org

/10.

1017

/S00

0305

5420

0000

39

Page 14: Administrative Records Mask Racially Biased Policing · 2020. 8. 29. · this context, and carefully grounding unobservable parameters—in particular, the proportion of racially

handcuffing per 1,000 encounters, compared with thesame encounters had they involved white civilians. Wethen employ the most conservative racial stopping es-timate, denoted by the vertical line in the figure, togenerate bounds on the true race effect. Our bias-cor-rected results show the true effect is at least as high as15.5 (13.0)—meaning that the conventional approachunderestimates discriminatory force by a factor of atleast 4 (32).

To characterize bias in estimates of the ATTM51, weagain use the conservative racial stopping estimate fromGelman, Fagan, and Kiss (2007) to correct the naıveestimate. Again, the naıve approach substantiallyunderstates racially discriminatory police violence,suggesting that there were 75,000 instances in whichpolice laid hands on black and Hispanic civilians, butwould not have done so had those individuals beenwhite. Our bias-corrected estimate shows the truenumber is approximately 307,000, meaning the naıveapproach masks 232,000 such incidents. Similarly, thenaıve approach indicates roughly 3,400 racially dis-criminatory instances in which officers pointeda weapon at a black or Hispanic civilian, whereas thebias-corrected ATTM51 shows the true number is al-most five times as large.

To see how this statistical bias affects estimates fordifferent levels of force, Table 1 presents naıve esti-mates alongside ATEM51 bounds for excess force per1,000 black and Hispanic encounters across the fullspectrum of police actions—ranging from physicalhandling of a civilian to the use of pepper spray ora baton—again using the conservative racial-stop esti-mate fromGelman, Fagan, andKiss (2007) to apply ourbias correction. The results again show that the

traditional approach substantially understates the de-gree of racial bias in police use of force. Our results alsoinclude numerous cases in which downward bias pro-duces the illusion of no race effect. For example, whilethe approach in Fryer (2019) implies a statistically in-significant 2.4 instances per 1,000 encounters of pushingHispanic suspects to a wall due to suspect race, ourrevised estimate shows the true number is at least26—eleven times larger. We can also quantify thenumber of masked instances of racially discriminatoryuses of force as a percentage of all uses of force, dis-played in Figure 5. In the period we examine, black andHispanic civilians experienced force at the hands ofpolice 779,894 times. Using the approach in Fryer(2019), one would conclude that about 10% would nothave occurred had those civilians beenwhite.Using ourbias-corrected approach, we find that in fact 39% werediscriminatory. These underestimates persist across allforce threshold analyses.17

RECOMMENDATIONS FORFUTURE RESEARCH

The analysis above clarifies whether and when esti-mates of racial bias in police behavior identify causalquantities, shedding light on how traditional estimationapproaches that fail to account for posttreatment con-ditioning can inadvertently mask racially biased polic-ing. Our results suggest the body of evidence on this

FIGURE 5. Estimated Number of Racially Discriminatory Uses of Force against Black and HispanicCivilians, Divided by Total Observed Uses of Force among Those Groups Using Naıve (Red Dot) andBias-Corrected (Blue Triangle) Estimators of the ATTM51

Notes: In some cases, the naıve approach returns negative estimates, indicating that more uses of force would have occurred had thecivilians been white. The bias-corrected estimates show the naıve estimates substantially underestimate the pervasiveness ofanti–minorityracial bias in police violence.

17 These estimates were generated by computing the ATEM51 withcovariate adjustment.

Dean Knox, Will Lowe, and Jonathan Mummolo

632

Dow

nloa

ded

from

htt

ps://

ww

w.c

ambr

idge

.org

/cor

e. IP

add

ress

: 68.

82.1

89.8

2, o

n 29

Jul 2

020

at 1

4:44

:29,

sub

ject

to th

e Ca

mbr

idge

Cor

e te

rms

of u

se, a

vaila

ble

at h

ttps

://w

ww

.cam

brid

ge.o

rg/c

ore/

term

s. h

ttps

://do

i.org

/10.

1017

/S00

0305

5420

0000

39

Page 15: Administrative Records Mask Racially Biased Policing · 2020. 8. 29. · this context, and carefully grounding unobservable parameters—in particular, the proportion of racially

topic that relies on police administrative data may belargely uninformative or even misleading. While ourbias-correction and bounding techniques are an im-provement, they still rely on assumptions that manyanalysts may not be willing to entertain. Some of theseassumptions, such as conditional treatment ignorability,are unavoidable. But others can be sidestepped orweakened through the use of research designs thatpreempt the problem of posttreatment conditioning. Inwhat follows, we detail a feasible research design thataddresses these concerns.

To estimate the effect of suspect race on poststoppolice behavior while avoiding the concerns outlinedabove, we describe a feasible study of police–civilianinteractions during traffic stops. A key advantage oftraffic studies is thatmuchof thedataneeded to improveresearch are already collected passively by law

enforcement agencies across the United States in anautomated fashion via highway cameras. We note thatbefore the advent of this technology, data on un-reportedpolice–civilian interactions had tobemanuallycollected by researchers accompanying patrol officerson their shifts (Allen 1982; Smith,Visher, andDavidson1984), a labor-intensive strategy highly vulnerable toresearcher demand effects (Orne 1962).

Recall that akeyproblem in the typical studyofpoliceadministrative data is the unobservability of thoseencounters that do not generate police reports. How-ever, given the prevalence of highway speed camerasacross police jurisdictions, it is entirely feasible to collectdata on every passing car (or a random sample ofpassing cars), whether or not police pulled the car overand recorded the stop. This mode of data collection hasalready been utilized in prior work (Kocieniewski 2002;

TABLE 1. Average Treatment Effect among Stops (ATEM51), by Severity of Force and Minority Group

Minimum force

ATEM51 for encounters with black civilians (vs. white)

No covariates Full specification

Bounds Naıve Bounds Naıve

Use of hands (112.66, 124.59) 61.69 (86.95, 96.70) 23.46(84.6, 151.84) (32.89, 90.63) (81.54, 102.14) (16.23, 30.54)

Push to wall (24.15, 27.75) 4.20 (26.47, 30.20) 6.66(15.50, 37.35) (25.29, 14.02) (24.24, 32.39) (3.67, 9.53)

Use of handcuffs (14.60, 16.92) 1.32 (16.59, 19.05) 3.95(9.45, 22.61) (24.83, 7.53) (15.10, 20.57) (1.95, 5.90)

Draw weapon (4.52, 5.14) 1.26 (4.71, 5.34) 1.46(3.13, 6.67) (20.33, 2.83) (4.20, 5.86) (0.77, 2.14)

Push to ground (4.04, 4.58) 1.22 (4.10, 4.64) 1.24(2.79, 5.97) (20.21, 2.66) (3.65, 5.07) (0.64, 1.80)

Point weapon (1.49, 1.70) 0.36 (1.63, 1.86) 0.55(0.96, 2.29) (20.29, 1.00) (1.36, 2.12) (0.18, 0.89)

Baton or pepper spray (0.17, 0.19) 0.08 (0.17, 0.19) 0.07(0.10, 0.26) (20.01, 0.15) (0.12, 0.24) (0.00, 0.14)

Minimum force

ATEM51 for encounters with Hispanic civilians (vs. white)

No covariates Full specification

Bounds Naıve Bounds Naıve

Use of hands (115.44, 127.53) 64.48 (78.93, 88.31) 15.44(88.94, 155.96) (37.06, 92.91) (74.70, 92.65) (9.60, 21.09)

Push to wall (26.41, 30.14) 6.46 (22.24, 25.75) 2.44(19.54, 37.79) (21.12, 14.26) (20.21, 27.80) (20.28, 5.08)

Use of handcuffs (12.54, 14.76) 20.74 (13.05, 15.31) 0.40(9.10, 18.24) (25.27, 3.57) (11.71, 16.64) (21.40, 2.1)

Draw weapon (3.42, 3.98) 0.16 (3.12, 3.66) 20.14(2.41, 5.08) (21.04, 1.33) (2.63, 4.16) (20.77, 0.49)

Push to ground (3.11, 3.60) 0.29 (2.71, 3.18) 20.14(2.18, 4.61) (20.83, 1.37) (2.27, 3.63) (20.71, 0.41)

Point weapon (0.73, 0.90) 20.41 (0.81, 0.98) 20.28(0.32, 1.29) (20.94, 0.08) (0.54, 1.26) (20.64, 0.07)

Baton or pepper spray (0.05, 0.06) 20.05 (0.05, 0.07) 20.05(20.01, 0.12) (20.13, 0.02) (0.00, 0.12) (20.12, 0.02)

Note:Excessuseof force usedagainstminority civilians (versuswhite civilians) per 1,000encounters.Bounds intervals indicate the rangeofpossibleATEM51 valueswhen the unknownproportion of discriminatory stops is approximatedwith the conservative estimate fromGelman,Fagan, and Kiss (2007). Estimates are bolded, and 95% confidence intervals are italicized.

Administrative Records Mask Racially Biased Policing

633

Dow

nloa

ded

from

htt

ps://

ww

w.c

ambr

idge

.org

/cor

e. IP

add

ress

: 68.

82.1

89.8

2, o

n 29

Jul 2

020

at 1

4:44

:29,

sub

ject

to th

e Ca

mbr

idge

Cor

e te

rms

of u

se, a

vaila

ble

at h

ttps

://w

ww

.cam

brid

ge.o

rg/c

ore/

term

s. h

ttps

://do

i.org

/10.

1017

/S00

0305

5420

0000

39

Page 16: Administrative Records Mask Racially Biased Policing · 2020. 8. 29. · this context, and carefully grounding unobservable parameters—in particular, the proportion of racially

Lange, Johnson, and Voas 2005), though in thosestudies, camera data on individual motorists were notlinked to administrative data on policing outcomes, aswe propose below.

Givena large randomsampleofpassing cars capturedby highway speed cameras, analysts could use video orphotographic records to document license plate num-bers that allow for a merge with other administrativedata sets containing information on the registrant’shome neighborhood, whether each car went on to bestopped by nearby police at a proximate time, whethera summons was issued, and whether the encounterescalated to include a search or the use of force. As withall causal analyses of observational data, analysts muststill make some version of Assumption 4(b)—notreatment-outcome confounding conditional on ob-servable covariates—but in this case, the standard“treatment selection on observables” plausibly holdsbecause virtually all prestop data available to an officerare in fact observable to the analyst. Using camerafootage merged with administrative records, analystscould credibly measure this “complete” set of controlvariables.18 These factors would include not only therace, age, gender, and registered neighborhood of the

driver but also themake, color, and condition of the car,along with weather and driving speed.

Given this set of covariates, researchers could cred-ibly estimate the ATE for various outcomes, includingsearching, ticketing, and the use of force, by comparingthe rates of outcomes between racial minority andmajority motorists, regardless of whether they werestopped by police, conditional on X. The ATTM51 issimilarly point identified because the proportion ofracial stops can be calculated and used to correct esti-mates. However, the ATEM51 remains partially iden-tified—the quantity can be bounded, as we show above,but not precisely estimated. And as Figure 6 makesclear, the CDEM51 remains fundamentally unidentifi-able without covariates that make Assumption 5plausible, such as controls for officer temperament thatare specific to some stops but not others (i.e., time-varying), which likely influences both stopping deci-sions and subsequent treatment of civilians.

CONCLUSION

With the release of large and granular data onpolice–civilian interactions, many researchers havefocused on estimating whether police exhibit racialbias in their treatment of civilians. Though somestudieshaveacknowledged the threat of posttreatmentbias in this setting (Fryer 2018), the issue has not beenadequately addressed, and studies in this area have leftambiguous which causal quantities are being approx-imated and the degree to which racial bias may beobscured by traditional estimation strategies. Giventhe policy relevance of this topic and the degree ofselection bias inherent to these analyses, we believesocial scientists need to devote substantial effort todevelop research designs that can sidestep the threat ofposttreatment conditioning rather than proceeding inthe face of this threat and simply hoping for the best.

In this study, we clarify the statistical problems in theuse of police administrative data in isolation to studyracial bias. We offer bias-correction and boundingprocedures for scholars analyzing these data, alongwithan improved research design that can avoid posttreat-ment conditioning altogether. Our results can informthe study of racial discrimination in a host of othersettings beyond law enforcement. And thoughwe focuson a case of racial bias in theUnited States, these resultsalso speak to a rich literature on racial discriminationoutside the U.S. context (e.g., Bruce-Jones 2015; Cano2010). Our identifying assumptions may also be usefulfor researchers seeking toaddressbiases stemming fromposttreatment conditioning more generally, beyondstudies of discrimination.

Whilewe are optimistic about alternative designs andestimation strategies, we are under no illusions thateliminating this particular source of bias will removeothers. Our research design suggestions may also limitthe outcomes that are feasible to study. For example,rare events such as shootings may or may not occurduring the observation periods proposed,meaning onlylower level uses of force or sanctioning can be studied in

FIGURE 6. Traffic Stop Design

Notes: The DAG illustrates potential back-door paths for stops(throughW, e.g.,heavilypolicedneighborhoods)and for theuseofforce (through V, e.g., car registrant has warrant for arrest) thatmay correlate with the presence of minority drivers. These areblocked (boxed) by conditioning on prestop variables, includinglicense plates as well as administrative records that can be linkedthrough them. Many mediator-outcome confounders (U) cannotbe blocked but do not pose a threat to inference for the ATE orATEM51.

18 This approach is akin to the design ofHainmueller andHangartner(2013), another rare instance in which the analyst could claim tomeasure all relevant covariates in an observational setting. In thatstudy, citizens made judgments about individuals applying for citi-zenship in Switzerland. Because all information on potential citizenswas contained on a flier distributed by the government, the authorscould credibly account for all possible factors that contributed to theaverage citizen’s judgment of applicants.

Dean Knox, Will Lowe, and Jonathan Mummolo

634

Dow

nloa

ded

from

htt

ps://

ww

w.c

ambr

idge

.org

/cor

e. IP

add

ress

: 68.

82.1

89.8

2, o

n 29

Jul 2

020

at 1

4:44

:29,

sub

ject

to th

e Ca

mbr

idge

Cor

e te

rms

of u

se, a

vaila

ble

at h

ttps

://w

ww

.cam

brid

ge.o

rg/c

ore/

term

s. h

ttps

://do

i.org

/10.

1017

/S00

0305

5420

0000

39

Page 17: Administrative Records Mask Racially Biased Policing · 2020. 8. 29. · this context, and carefully grounding unobservable parameters—in particular, the proportion of racially

some cases. Our recommendations, therefore, placeemphasis onbias reduction over latitude in the selectionof research questions. But given the ease with whichfaulty conclusions can be reached as a result of the race-based selection we highlight, narrowing the scope ofresearch to generate more reliable estimates may bepreferable, especially because policy reforms couldhinge on the results of studies in this area. Put differ-ently, because of the pitfalls we highlight above, it is notclear that studies of rare phenomena that lack a sounddesign are generating usable knowledge anyway, so thistrade-off in scope may be of only marginal concern(Samii 2016).

Regardless of which approach scholars pursue, thisarticle highlights the need for further careful researchinto the first stage of police–civilian interactions—thatis, the process by which officers decide whether or notto stop and investigate an individual for a crime. Thiseffort is necessary not only to further our scholarlyunderstanding of police–civilian interactions but alsoto craft effective policy reforms. If racial bias is con-centrated in the initial stage of contact, reforms fo-cused on reducing unnecessary police–civilianinteractions may be most effective at curbing raciallydiscriminatory police violence. On the other hand, ifthere exists more significant bias in the ultimate de-cision to use force, substantial improvements mayrequire a wholly different reform strategy. Withoutserious consideration of the role of race in each stageofthe complex police–civilian interactions under study,the benefits of data-driven reforms will be stunted, aswill our collective understanding of the politics ofpolicing.

SUPPLEMENTARY MATERIAL

To view supplementary material for this article, pleasevisit https://doi.org/10.1017/S0003055420000039.

Replication materials can be found on Dataverse at:https://doi.org/10.7910/DVN/KFQOCV.

REFERENCES

Acharya, Avidit, Matthew Blackwell, and Maya Sen. 2016.“ExplainingCausalFindingswithoutBias:Detecting andAssessingDirect Effects.” Biometrics 110 (3): 512–29.

Alesina, Alberto, andEliana La Ferrara. 2014. “ATest of Racial Biasin Capital Sentencing.” The American Economic Review 104 (11):3397–433.

Alexander,Michelle. 2010.The New Jim Crow:Mass Incarceration inthe Age of Colorblindness. New York: The New Press.

Allen, David. 1982. “Police Supervision on the Street: AnAnalysis ofSupervisor/Officer Interaction During the Shift.” Journal ofCriminal Justice 10 (2): 91–109.

Angrist, Joshua D., Guido W. Imbens, and Donald B. Rubin. 1996.“Identification of Causal Effects Using Instrumental Variables.”Journal of the American Statistical Association 91 (434): 444–55.

Angrist, Joshua D., and Jorn-Steffen Pischke. 2008.Mostly HarmlessEconometrics: An Empiricist’s Companion. Princeton, NJ:Princeton University Press.

Antonovics, Kate, andBrianG.Knight. 2009. “ANewLook at RacialProfiling: Evidence from the Boston Police Department.” TheReview of Economics and Statistics 91 (1): 163–77.

Anwar, Shamena, and Hanming Fang. 2006. “AnAlternative Test ofRacial Prejudice in Motor Vehicle Searches: Theory and Evi-dence.” The Review of Economic Studies 96 (1): 127–51.

Arnold,David,WillDobbie,andCrystalS.Yang.2018.“RacialBias inBailDecisions.”Quarterly Journal of Economics 133 (4): 1885–932.

Arrow, Kenneth J. 1972. “Models of Job Discrimination.” In RacialDiscrimination in Economic Life, ed. Anthony Pascal. Lexington,MA: D.C. Heath, 83–102.

Arrow, Kenneth J. 1998. “What Has Economics to Say about RacialDiscrimination?” The Journal of Economic Perspectives 12 (2):91–100.

AtibaGoff,Phillip, andKimberlyBarsamianKahn.2012.“RacialBiasinPolicing:WhyWeKnowLessThanWeShould.”Social IssuesandPolicy Review 6 (1): 177–210.

Ayres, Ian. 2002. “Outcome Tests of Racial Disparities in PolicePractices.” Justice Research and Policy 4 (1–2): 131–42.

Balke, Alexander, and Judea Pearl. 1997. “Bounds on TreatmentEffects from Studies with Imperfect Compliance.” Journal of theAmerican Statistical Association 92 (439): 1171–6.

Baumgartner, Frank R., Derek A. Epp, Kelsey Shoub, and BayardLove. 2017. “Targeting YoungMen of Color for Search andArrestDuring Traffic Stops: Evidence from North Carolina, 2002–2013.”Politics, Groups, and Identities 5 (1): 107–31.

Becker, Gary. 1971. The Economics of Discrimination. Chicago:University of Chicago Press.

Blackwell, Matthew. 2013. “A Framework for Dynamic Causal In-ference in Political Science.” American Journal of Political Science57 (2): 504–20.

Brehm, John, andScottGates. 1999.Working, Shirking, andSabotage:Bureaucratic Response to a Democratic Public. Ann Arbor, MI:University of Michigan Press.

Bruce-Jones, Eddie. 2015. “German Policing at the Intersection:Race,Gender,Migrant Status andMentalHealth.”Race&Class 56(3): 36–49.

Burch, Traci. 2013. Trading Democracy for Justice: Criminal Con-victions and the Decline of Neighborhood Political Participation.University of Chicago Press.

Cano, Ignacio. 2010. “Racial Bias in Police Use of Lethal Force inBrazil.” Police Practice and Research: International Journal 11 (1):31–43.

Cohen,Elisha,AnnaGunderson,KaylynJackson,PaulZachary,TomS. Clark, Adam N. Glynn, and Michael Leo Owens. 2017. “DoOfficer-Involved Shootings Reduce Citizen Contact with Gov-ernment?” The Journal of Politics 81 (3): 1111–23.

Eberhardt, Jennifer, PhillipAtibaGoff, Valerie J. Purdie, andPaulG.Davies. 2004. “Seeing Black: Race, crime, and Visual Processing.”Journal of Personality and Social Psychology 87 (6): 876–93.

Edwards, Frank, Hedwig Lee, and Michael Esposito. 2019. “Risk ofBeing Killed by Police Use of Force in the United States by Age,Race–Ethnicity, and Sex.”Proceedings of the National Academy ofSciences 116 (34): 16793–8.

Elwert, Felix, andChristopherWinship. 2014.“EndogenousSelectionBias: TheProblemofConditioning onaColliderVariable.”AnnualReview of Sociology 40: 31–53.

Fisher, Marc, and Peter Hermann. 2015, June 8. “Did the Mckinney,Texas, Police Officer Know He Was Being Recorded?” TheWashington Post.

Frangakis, Constantine E., and Donald B. Rubin. 2002. “PrincipalStratification in Causal Inference.” Biometrics 58 (1): 21–9.

Fridell, Lorie A. 2017. “Explaining the Disparity in Results AcrossStudies Assessing Racial Disparity in Police Use of Force: A Re-searchNote.”The Journal of Economic Perspectives 42 (3): 502–13.

Fryer, RolandG. 2018. “ReconcilingResults onRacial Differences inPolice Shootings.” The American Economic Review 108: 228–33.

Fryer, RolandG. 2019. “AnEmpirical Analysis of Racial Differencesin Police Use of Force.” Journal of Political Economy 127 (3):1210–61.

Gelman,Andrew, JeffreyFagan, andAlexKiss. 2007.“AnAnalysis ofthe New York City Police Department’s ‘Stop-and-Frisk’ Policy inthe Context of Claims of Racial Bias.” Journal of the AmericanStatistical Association 102 (429): 813–23.

Glaser, Jack. 2014. Suspect Race: Causes and Consequences of RacialProfiling. New York: Oxford University Press.

Administrative Records Mask Racially Biased Policing

635

Dow

nloa

ded

from

htt

ps://

ww

w.c

ambr

idge

.org

/cor

e. IP

add

ress

: 68.

82.1

89.8

2, o

n 29

Jul 2

020

at 1

4:44

:29,

sub

ject

to th

e Ca

mbr

idge

Cor

e te

rms

of u

se, a

vaila

ble

at h

ttps

://w

ww

.cam

brid

ge.o

rg/c

ore/

term

s. h

ttps

://do

i.org

/10.

1017

/S00

0305

5420

0000

39

Page 18: Administrative Records Mask Racially Biased Policing · 2020. 8. 29. · this context, and carefully grounding unobservable parameters—in particular, the proportion of racially

Goel, Sharad, Justin M. Rao, and Ravi Shroff. 2016. “Precinct orPrejudice? Understanding Racial Disparities in New York City’sStop-and-Frisk Policy.”Annals of Applied Statistics 10 (1): 365–94.

Gottschalk, Marie. 2008. “Hiding in Plain Sight.” Annual Review ofPolitical Science 11 (1): 235–60.

Greiner, James D., and Donald B. Rubin. 2011. “Causal Effects ofPerceived Immutable Characteristics.” The Review of Economicsand Statistics 93 (4): 775–85.

Grogger, Jeffrey, and Greg Ridgeway. 2006. “Testing for RacialProfiling in Traffic Stops from Behind a Veil of Darkness.” Journalof the American Statistical Association 101 (475): 878–87.

Hainmueller, Jens, and Dominik Hangartner. 2013. “Who Getsa Swiss Passport? A Natural Experiment in Immigrant Discrimi-nation.” American Political Science Review 107 (1): 159–87.

Harvey, Anna, and Murat Mungan. 2019. “Policing for Profit: ThePolitical Economy of Law Enforcement.” Working Paper. https://s18798.pcdn.co/annaharvey/wp-content/uploads/sites/6417/2019/06/Policing_For_Profit_2019.pdf.

Heckman, James J. 1979. “Sample Selection Bias as a SpecificationError.” Econometrica 47 (1): 153–61.

Hernan,Miguel, SoniaHernandez-Diaz, and JamesRobins. 2004. “AStructural Approach to Selection Bias.” Epidemiology 15 (5):615–25.

Hernan, Miguel A. 2016. “Does Water Kill? A Call for Less CasualCausal Inferences.” Annals of Epidemiology 26 (10): 674–80.

Holland, Paul W. 1986. “Statistics and Causal Inference.” Journal ofthe American Statistical Association 81 (396): 945–60.

Imai, Kosuke, Luke Keele, Dustin Tingley, and Teppei Yamamoto.2011. “Unpacking the Black Box of Causality: Learning aboutCausal Mechanisms from Experimental and Observational Stud-ies.” American Political Science Review 105 (4): 765–89.

Iyengar, Shanto. 1994. Is Anyone Responsible?: How TelevisionFrames Political Issues. Chicago: University of Chicago Press.

Johnson, David J., Trevor Tress, Nicole Burkel, Carley Taylor, andJoseph Cesario. 2019. “Officer Characteristics and Racial Dis-parities in Fatal Officer-Involved Shootings.” Proceedings of theNationalAcademyof Sciences116 (32): 15877–82. PublishedOnlineAhead of Print July 22, 2019. https://www.pnas.org/content/early/2019/07/16/1903856116.

Key, Valdimer Orlando. 1949. Southern Politics in State and Nation.New York: Knopf.

Knowles, J., N. Perisco, and P. Todd. 2001. “Racial Bias in MotorVehicle Searches: Theory and Evidence.” Journal of PoliticalEconomy 109 (1): 203–29.

Knox, Dean, and Jonathan Mummolo. 2020. “Making Inferencesabout Racial Disparities in Police Violence.” Proceedings of theNational Academy of Sciences. 117 (3): 1261–2. https://doi.org/10.1073/pnas.1919418117.

Knox, Dean, Teppei Yamamoto, Matthew A. Baum, and Adam J.Berinsky. 2019. “Design, Identification, andSensitivityAnalysis forPatient Preference Trials.” Journal of the American Statistical As-sociation 114 (528): 1532–46.

Kocieniewski, David. 2002, March 21. Study Suggests Racial Gap inSpeedinginNewJersey.TheNewYorkTimes.https://www.nytimes.com/2002/03/21/nyregion/study-suggests-racial-gap-in-speeding-in-new-jersey.html.

Lange, James E., Mark B. Johnson, and Robert B. Voas. 2005.“Testing the Racial Profiling Hypothesis for Seemingly DisparateTraffic Stops on theNew JerseyTurnpike.” JusticeQuarterly 22 (2):193–223.

Lasswell, Harold D. 1936. Politics: WhoGets What, When, How. NewYork: Whittlesey House.

Lee, David S. 2009. “Training, Wages, and Sample Selection: Esti-mating Sharp Bounds on Treatment Effects.” The Review ofEconomic Studies 76 (3): 1071–102.

Lerman, Amy, and Vesla Weaver. 2014. Arresting Citizenship: TheDemocratic Consequences of American Crime Control. Chicago:University of Chicago Press.

Lipsky, Michael. 1980. Street-Level Bureaucracy: Dilemmas of theIndividual in Public Service. New York: Russell Sage Foundation.

Magaloni, Beatriz,Edgar Franco, andVanessaMelo. 2015. “Killing inthe Slums: An Impact Evaluation of Police Reform in Rio deJaneiro.” Working Paper No. 556. https://siepr.stanford.edu/sites/default/files/publications/556wp_9.pdf.

Manski,CharlesF. 1995. IdentificationProblems in theSocial Sciences.Cambridge, MA: Harvard University Press.

Mummolo, Jonathan. 2018a. “Militarization Fails to Enhance PoliceSafety or Reduce Crime but May Harm Police Reputation.” Pro-ceedings of the National Academy of Sciences of the United States ofAmerica 115 (37): 9181–6.

Mummolo, Jonathan. 2018b. “Modern Police Tactics, Police-CitizenInteractions and the Prospects for Reform.”The Journal of Politics80 (1): 1–15.

Nix, Justin, Bradley A. Campbell, Edward H. Byers, and Geoffrey P.Alpert. 2017. “A Bird’s Eye View of Civilians Killed by Police in2015 Further Evidence of Implicit Bias.” Criminology & PublicPolicy 16 (1): 309–40.

Nyhan, Brendan, Christopher Skovron, and Rocıo Titiunik. 2017.“Differential Registration Bias in Voter File Data: A SensitivityAnalysis Approach.” American Journal of Political Science 61 (3):744–60.

NYPD. 2017. Use of Force Report, 2017. Technical Report. https://www1.nyc.gov/assets/nypd/downloads/pdf/use-of-force/use-of-force-2017.pdf.

Orne,Martin T. 1962. “On the Social Psychology of the PsychologicalExperiment:With Particular Reference toDemandCharacteristicsand Their Implications.” American Psychologist 17 (1): 776–83.

Ostrom, Elinor, and Gordon Whitaker. 1973. “Does Local Com-munity Control of Police Make a Difference? Some PreliminaryFindings.” American Journal of Political Science 17 (1): 48–76.

Pearl, Judea. 2001.“Direct and IndirectEffects.” InProceedings of theSeventeenth Conference on Uncertainty in Artificial Intelligence,411–20.

Pearl, Judea. 2018. “DoesObesity Shorten Life? or Is It the Soda? onNon-Manipulable Causes.” Journal of Causal Inference 6 (2): 1–7.

Peyton, Kyle, Michael Sierra-Arevalo, and David G. Rand. 2019. “AField Experiment on Community Policing and Police Legitimacy.”Proceedings of theNational Academy of Sciences 116 (40): 19894–8.

Phelps, Edmund S. 1972. “The Statistical Theory of Racism andSexism.” The American Economic Review 62 (1): 659–61.

Ridgeway, Greg. 2006. “Assessing the Effect of Race Bias in Post-Traffic Stop Outcomes Using Propensity Scores.” Journal ofQuantitative Criminology 22 (1): 1–29.

Ridgeway, Greg, and John MacDonald. 2010. Race, Ethnicity, andPolicing: New and Essential Readings, Chapter Methods forAssessing Racially Biased Policing. New York: New York Uni-versity Press.

Robins, James M., Miguel A. Hernan, and Babette Brumback. 2000.“Marginal Structural Models and Causal Inference in Epidemiol-ogy.” Epidemiology 11 (5): 550–60.

Rosenbaum, Paul R. 1984. “The Consequences of Adjustment foraConcomitantVariable thatHasBeenAffectedby theTreatment.”Journal of the Royal Statistical Society 147 (5): 656–66.

Rubin, Donald B. 1974. “Estimating Causal Effects of Treatments inRandomized and Non-Randomized Studies.” Journal of Educa-tional Psychology 66 (5): 688–701.

Rubin, Donald B. 1990. “Formal Mode of Statistical Inference forCausal Effects.” Journal of Statistical Planning and Inference 25 (3):279–92.

Rubin, Donald B. 2000. “Causal Inference without Counterfactuals:Comment.” Journal of theAmericanStatisticalAssociation 95 (450):435–8.

Samii, Cyrus. 2016. “Causal Empiricism in Quantitative Research.”The Journal of Politics 78 (3): 941–55.

Sen, Maya, and Omar Wasow. 2016. “Race as a Bundle of Sticks:Designs that Estimate Effects of Seemingly Immutable Charac-teristics.” Annual Review of Political Science 19: 499–522.

Simoiu, Camelia, Sam Corbett-Davies, and Sharad Goel. 2017. “TheProblem of Infra-Marginality in Outcome Tests for Discrimina-tion.”AnnalsofAppliedStatistics11 (3): 1193–216.https://arxiv.org/abs/1706.05678.

Smith, Douglas A., Christy A. Visher, and Laura A. Davidson. 1984.“Equity andDiscretionary Justice: The Influence of Race on PoliceArrest Decisions.” Journal of Criminal Law & Criminology 75 (1):234.

Soss, Joe, and Vesla Weaver. 2017. “Police Are Our Government:Politics, Political Science, and the Policing of Race–Class Sub-jugated Communities.” Annual Review of Political Science 20:565–91.

Dean Knox, Will Lowe, and Jonathan Mummolo

636

Dow

nloa

ded

from

htt

ps://

ww

w.c

ambr

idge

.org

/cor

e. IP

add

ress

: 68.

82.1

89.8

2, o

n 29

Jul 2

020

at 1

4:44

:29,

sub

ject

to th

e Ca

mbr

idge

Cor

e te

rms

of u

se, a

vaila

ble

at h

ttps

://w

ww

.cam

brid

ge.o

rg/c

ore/

term

s. h

ttps

://do

i.org

/10.

1017

/S00

0305

5420

0000

39

Page 19: Administrative Records Mask Racially Biased Policing · 2020. 8. 29. · this context, and carefully grounding unobservable parameters—in particular, the proportion of racially

VanderWeele, Tyler J. 2009. “Marginal Structural Models for theEstimation of Direct and Indirect Effects.” Epidemiology 20 (1):18–26.

VanderWeele, Tyler J. 2011. “Principal Stratification–Uses andLimitations.” International Journal of Biostatistics 7 (1): 1–14.

West, Jeremy. 2018.“RacialBias inPolice Investigations.”WorkingPaper.https://people.ucsc.edu/~jwest1/articles/West_RacialBiasPolice.pdf.

White, Ariel. 2019. “Misdemeanor Disenfranchisement? TheDemobilizing Effects of Brief Jail Spells on Potential Voters.”American Political Science Review 113 (2): 311–24.

Wilson, JamesQ. 1968.Varieties of Police Behavior. Cambridge,MA:Harvard University Press.

Wilson, James Q. 1989.Bureaucracy: What Government Agencies Doand Why They Do It. New York: Basic Books.

Yamamoto, Teppei. 2012. “Understanding the Past: StatisticalAnalysis of Causal Attribution.” American Journal of PoliticalScience 56 (1): 237–56.

Zhang, Junni L., and Donald B. Rubin. 2003. “Estimation of CausalEffects via Principal Stratification When Some Outcomes AreTruncated by ‘Death’.” Journal of Educational and BehavioralStatistics 28 (4): 353–68.

Administrative Records Mask Racially Biased Policing

637

Dow

nloa

ded

from

htt

ps://

ww

w.c

ambr

idge

.org

/cor

e. IP

add

ress

: 68.

82.1

89.8

2, o

n 29

Jul 2

020

at 1

4:44

:29,

sub

ject

to th

e Ca

mbr

idge

Cor

e te

rms

of u

se, a

vaila

ble

at h

ttps

://w

ww

.cam

brid

ge.o

rg/c

ore/

term

s. h

ttps

://do

i.org

/10.

1017

/S00

0305

5420

0000

39

Page 20: Administrative Records Mask Racially Biased Policing · 2020. 8. 29. · this context, and carefully grounding unobservable parameters—in particular, the proportion of racially

Administrative Records Mask Racially Biased Policing

Online Appendix

Contents

A Detailed proofs 1A.1 Bias for ATEM=1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1A.2 Bias for ATTM=1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4A.3 Bias for CDEM=1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5A.4 Nonparametric sharp bounds for ATEM=1 . . . . . . . . . . . . . . . . . . . . . . . 7A.5 Uncertainty of bounds . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10A.6 Point identi�cation of ATE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11A.7 Derivation of outcome test bounds on � . . . . . . . . . . . . . . . . . . . . . . . . 14

B Additional results 17B.1 Coding schemes for dependent variables . . . . . . . . . . . . . . . . . . . . . . . 17B.2 Varying levels of force . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20B.3 Excluding drug stops . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23B.4 Analysis of two races at a time . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25

Page 21: Administrative Records Mask Racially Biased Policing · 2020. 8. 29. · this context, and carefully grounding unobservable parameters—in particular, the proportion of racially

A Detailed proofs

A.1 Bias for ATEM=1

We �rst derive the bias of the local di�erence in means (that is, among encounters with Xi = x ,Δx = Yi |Di = 1, Mi = 1, Xi = x −Yi |Di = 0, Mi = 1, Xi = x), in estimating the local average treatmente�ect among stops, ATEM=1,x = E[Yi(1, Mi(1))|Mi = 1, Xi = x] − E[Yi(0, Mi(0))|Mi = 1, Xi = x]. Theoverall bias is then given by ∑x (E[Δx] − ATEM=1,x) Pr(Xi = x |Mi = 1).

E[Δx ] − ATEM=1,x

= (E[Yi |Di = 1, Mi = 1, Xi = x − Yi |Di = 0, Mi = 1, Xi = x])

− (E[Yi(1, Mi(1))|Mi = 1, Xi = x] − E[Yi(0, Mi(0))|Mi = 1, Xi = x])

= E[Yi |Di = 1, Mi(Di) = 1, Xi = x] − E[Yi |Di = 0, Mi(Di) = 1, Xi = x]

− E[Yi(1, Mi(1))|Mi(Di) = 1, Xi = x] + E[Yi(0, Mi(0))|Mi(Di) = 1, Xi = x]

= E[Yi |Di = 1, Mi(Di) = 1, Xi = x] − E[Yi(1, Mi(1))|Mi(Di) = 1, Xi = x]

− E[Yi |Di = 0, Mi(Di) = 1, Xi = x] + E[Yi(0, Mi(0))|Mi(Di) = 1, Xi = x]

= E[Yi(1, Mi(1))|Di = 1, Mi(Di) = 1, Xi = x]

− E[Yi(1, Mi(1))|Di = 1, Mi(Di) = 1, Xi = x]Pr(Di = 1|Mi(Di) = 1, Xi = x)

− E[Yi(1, Mi(1))|Di = 0, Mi(Di) = 1, Xi = x]Pr(Di = 0|Mi(Di) = 1, Xi = x)

− E[Yi(0, Mi(0))|Di = 0, Mi(Di) = 1, Xi = x]

+ E[Yi(0, Mi(0))|Di = 1, Mi(Di) = 1, Xi = x]Pr(Di = 1|Mi(Di) = 1, Xi = x)

+ E[Yi(0, Mi(0))|Di = 0, Mi(Di) = 1, Xi = x]Pr(Di = 0|Mi(Di) = 1, Xi = x)

= E[Yi(1, Mi(1))|Di = 1, Mi(Di) = 1, Xi = x]Pr(Di = 0|Mi(Di) = 1, Xi = x)

− E[Yi(1, Mi(1))|Di = 0, Mi(Di) = 1, Xi = x]Pr(Di = 0|Mi(Di) = 1, Xi = x)

− E[Yi(0, Mi(0))|Di = 0, Mi(Di) = 1, Xi = x]Pr(Di = 1|Mi(Di) = 1, Xi = x)

+ E[Yi(0, Mi(0))|Di = 1, Mi(Di) = 1, Xi = x]Pr(Di = 1|Mi(Di) = 1, Xi = x)

under mediator monotonicity, Pr(Mi(1) = 0|Di = 0, Mi(Di) = 1, Xi = x) = 0 and Pr(Mi(1) = 1|Di =0, Mi(Di) = 1, Xi = x) = 1,

= E[Yi(1, Mi(1))|Di = 1, Mi(1) = 1, Mi(0) = 1, Xi = x]

Pr(Mi(0) = 1|Di = 1, Mi(Di) = 1, Xi = x)Pr(Di = 0|Mi(Di) = 1, Xi = x)

+ E[Yi(1, Mi(1))|Di = 1, Mi(1) = 1, Mi(0) = 0, Xi = x]

1

Page 22: Administrative Records Mask Racially Biased Policing · 2020. 8. 29. · this context, and carefully grounding unobservable parameters—in particular, the proportion of racially

Pr(Mi(0) = 0|Di = 1, Mi(Di) = 1, Xi = x)Pr(Di = 0|Mi(Di) = 1, Xi = x)

− E[Yi(1, Mi(1))|Di = 0, Mi(1) = 1, Mi(0) = 1, Xi = x]

Pr(Mi(1) = 1|Di = 0, Mi(Di) = 1, Xi = x)Pr(Di = 0|Mi(Di) = 1, Xi = x)− E[Yi(1, Mi(1))|Di = 0, Mi(1) = 0, Mi(0) = 1, Xi = x]

Pr(Mi(1) = 0|Di = 0, Mi(Di) = 1, Xi = x)Pr(Di = 0|Mi(Di) = 1, Xi = x)− E[Yi(0, Mi(0))|Di = 0, Mi(1) = 1, Mi(0) = 1, Xi = x]

Pr(Mi(1) = 1|Di = 0, Mi(Di) = 1, Xi = x)Pr(Di = 1|Mi(Di) = 1, Xi = x)− E[Yi(0, Mi(0))|Di = 0, Mi(1) = 0, Mi(0) = 1, Xi = x]

Pr(Mi(1) = 0|Di = 0, Mi(Di) = 1, Xi = x)Pr(Di = 1|Mi(Di) = 1, Xi = x)+ E[Yi(0, Mi(0))|Di = 1, Mi(1) = 1, Mi(0) = 1, Xi = x]

Pr(Mi(0) = 1|Di = 1, Mi(Di) = 1, Xi = x)Pr(Di = 1|Mi(Di) = 1, Xi = x)+ E[Yi(0, Mi(0))|Di = 1, Mi(1) = 1, Mi(0) = 0, Xi = x]

Pr(Mi(0) = 0|Di = 1, Mi(Di) = 1, Xi = x)Pr(Di = 1|Mi(Di) = 1, Xi = x)= E[Yi(1, Mi(1))|Di = 1, Mi(1) = 1, Mi(0) = 1, Xi = x]

Pr(Mi(0) = 1|Di = 1, Mi(Di) = 1, Xi = x)− E[Yi(1, Mi(1))|Di = 1, Mi(1) = 1, Mi(0) = 1, Xi = x]

Pr(Mi(0) = 1|Di = 1, Mi(Di) = 1, Xi = x)Pr(Di = 1|Mi(Di) = 1, Xi = x)+ E[Yi(1, Mi(1))|Di = 1, Mi(1) = 1, Mi(0) = 0, Xi = x]

Pr(Mi(0) = 0|Di = 1, Mi(Di) = 1, Xi = x)− E[Yi(1, Mi(1))|Di = 1, Mi(1) = 1, Mi(0) = 0, Xi = x]

Pr(Mi(0) = 0|Di = 1, Mi(Di) = 1, Xi = x)Pr(Di = 1|Mi(Di) = 1, Xi = x)− E[Yi(1, Mi(1))|Di = 0, Mi(1) = 1, Mi(0) = 1, Xi = x]+ E[Yi(1, Mi(1))|Di = 0, Mi(1) = 1, Mi(0) = 1, Xi = x]

Pr(Di = 1|Mi(Di) = 1, Xi = x)− E[Yi(0, Mi(0))|Di = 0, Mi(1) = 1, Mi(0) = 1, Xi = x]

Pr(Di = 1|Mi(Di) = 1, Xi = x)+ E[Yi(0, Mi(0))|Di = 1, Mi(1) = 1, Mi(0) = 1, Xi = x]

Pr(Mi(0) = 1|Di = 1, Mi(Di) = 1, Xi = x)Pr(Di = 1|Mi(Di) = 1, Xi = x)+ E[Yi(0, Mi(0))|Di = 1, Mi(1) = 1, Mi(0) = 0, Xi = x]

Pr(Mi(0) = 0|Di = 1, Mi(Di) = 1, Xi = x)Pr(Di = 1|Mi(Di) = 1, Xi = x)

2

Page 23: Administrative Records Mask Racially Biased Policing · 2020. 8. 29. · this context, and carefully grounding unobservable parameters—in particular, the proportion of racially

adding and subtracting E[Yi(1, Mi(1))|Di = 1, Mi(1) = 1, Mi(0) = 1, Xi = x]Pr(Mi(0) = 0|Di =1, Mi(Di) = 1, Xi = x),

= E[Yi(1, Mi(1)) − Yi(0, Mi(0))|Di = 0, Mi(1) = 1, Mi(0) = 1, Xi = x]Pr(Di = 1|Mi(Di) = 1, Xi = x)

− E[Yi(1, Mi(1)) − Yi(0, Mi(0))|Di = 1, Mi(1) = 1, Mi(0) = 1, Xi = x]

Pr(Mi(0) = 1|Di = 1, Mi(Di) = 1, Xi = x)Pr(Di = 1|Mi(Di) = 1, Xi = x)

− E[Yi(1, Mi(1)) − Yi(0, Mi(0))|Di = 1, Mi(1) = 1, Mi(0) = 0, Xi = x]

Pr(Mi(0) = 0|Di = 1, Mi(Di) = 1, Xi = x)Pr(Di = 1|Mi(Di) = 1, Xi = x)

− E[Yi(1, Mi(1))|Di = 0, Mi(1) = 1, Mi(0) = 1, Xi = x]

+ E[Yi(1, Mi(1))|Di = 1, Mi(1) = 1, Mi(0) = 1, Xi = x]Pr(Mi(0) = 1|Di = 1, Mi(Di) = 1, Xi = x)

+ E[Yi(1, Mi(1))|Di = 1, Mi(1) = 1, Mi(0) = 0, Xi = x]Pr(Mi(0) = 0|Di = 1, Mi(Di) = 1, Xi = x)

+ E[Yi(1, Mi(1))|Di = 1, Mi(1) = 1, Mi(0) = 1, Xi = x]Pr(Mi(0) = 0|Di = 1, Mi(Di) = 1, Xi = x)

− E[Yi(1, Mi(1))|Di = 1, Mi(1) = 1, Mi(0) = 1, Xi = x]Pr(Mi(0) = 0|Di = 1, Mi(Di) = 1, Xi = x)

substituting potential mediators based on principal strata,

= E[Yi(1, 1) − Yi(0, 1)|Di = 0, Mi(1) = 1, Mi(0) = 1, Xi = x]Pr(Di = 1|Mi(Di) = 1, Xi = x)

− E[Yi(1, 1) − Yi(0, 1)|Di = 1, Mi(1) = 1, Mi(0) = 1, Xi = x]

Pr(Mi(0) = 1|Di = 1, Mi(Di) = 1, Xi = x)Pr(Di = 1|Mi(Di) = 1, Xi = x)

− E[Yi(1, 1) − Yi(0, 0)|Di = 1, Mi(1) = 1, Mi(0) = 0, Xi = x]

Pr(Mi(0) = 0|Di = 1, Mi(Di) = 1, Xi = x)Pr(Di = 1|Mi(Di) = 1, Xi = x)

+ E[Yi(1, 1)|Di = 1, Mi(1) = 1, Mi(0) = 1, Xi = x]

− E[Yi(1, 1)|Di = 0, Mi(1) = 1, Mi(0) = 1, Xi = x]

+ E[Yi(1, 1)|Di = 1, Mi(1) = 1, Mi(0) = 0, Xi = x]Pr(Mi(0) = 0|Di = 1, Mi(Di) = 1, Xi = x)

− E[Yi(1, 1)|Di = 1, Mi(1) = 1, Mi(0) = 1, Xi = x]Pr(Mi(0) = 0|Di = 1, Mi(Di) = 1, Xi = x)

under mandatory reporting, Yi(d, 0) = 0,

= E[Yi(1, 1) − Yi(0, 1)|Di = 0, Mi(1) = 1, Mi(0) = 1, Xi = x]Pr(Di = 1|Mi(Di) = 1, Xi = x)

− E[Yi(1, 1) − Yi(0, 1)|Di = 1, Mi(1) = 1, Mi(0) = 1, Xi = x]

Pr(Mi(0) = 1|Di = 1, Mi(Di) = 1, Xi = x)Pr(Di = 1|Mi(Di) = 1, Xi = x)

− E[Yi(1, 1)|Di = 1, Mi(1) = 1, Mi(0) = 0, Xi = x]

Pr(Mi(0) = 0|Di = 1, Mi(Di) = 1, Xi = x)Pr(Di = 1|Mi(Di) = 1, Xi = x)

3

Page 24: Administrative Records Mask Racially Biased Policing · 2020. 8. 29. · this context, and carefully grounding unobservable parameters—in particular, the proportion of racially

+ E[Yi(1, 1)|Di = 1, Mi(1) = 1, Mi(0) = 1, Xi = x]

− E[Yi(1, 1)|Di = 0, Mi(1) = 1, Mi(0) = 1, Xi = x]

+ E[Yi(1, 1)|Di = 1, Mi(1) = 1, Mi(0) = 0, Xi = x]Pr(Mi(0) = 0|Di = 1, Mi(Di) = 1, Xi = x)

− E[Yi(1, 1)|Di = 1, Mi(1) = 1, Mi(0) = 1, Xi = x]Pr(Mi(0) = 0|Di = 1, Mi(Di) = 1, Xi = x)

invoking assumption 4(b) (treatment ignorability),

= (E[Yi(1, 1) − Yi(0, 1)|Mi(1) = 1, Mi(0) = 1, Xi = x]

− E[Yi(1, 1) − Y (0, 0)|Mi(1) = 1, Mi(0) = 0, Xi = x]

) Pr(Mi(0) = 0|Di = 1, Mi(Di) = 1, Xi = x)Pr(Di = 1|Mi(Di) = 1, Xi = x)

− (E[Yi(1, 1)|Mi(1) = 1, Mi(0) = 1, Xi = x]

− E[Yi(1, 1)|Mi(1) = 1, Mi(0) = 0, Xi = x])Pr(Mi(0) = 0|Di = 1, Mi(Di) = 1, Xi = x)

which is Equation 6.

A.2 Bias for ATTM=1

Next, we consider the bias that results when the local di�erence in means is used as an estima-tor for the local average racial e�ect among stopped minorities, ATTM=1,x = E[Yi(1, Mi(1))|Di =1, Mi = 1, Xi = x] − E[Yi(0, Mi(0))|Di = 1, Mi = 1, Xi = x]. Again, overall bias is found by theweighted average of local biases, ∑x (E[Δx] − ATTM=1,x) Pr(Xi = x |Di = 1, Mi = 1).

E[Δx ] − ATTM=1,x = (E[Yi |Di = 1, Mi = 1, Xi = x − Yi |Di = 0, Mi = 1, Xi = x])− (E[Yi(1, Mi(1))|Di = 1, Mi = 1, Xi = x] − E[Yi(0, Mi(0))|Di = 1, Mi = 1, Xi = x])

= E[Yi(1, Mi(1))|Di = 1, Mi(Di) = 1, Xi = x]− E[Yi(0, Mi(0))|Di = 0, Mi(Di) = 1, Xi = x]− E[Yi(1, Mi(1))|Di = 1, Mi(Di) = 1, Xi = x]+ E[Yi(0, Mi(0))|Di = 1, Mi(Di) = 1, Xi = x]

= − E[Yi(0, Mi(0))|Di = 0, Mi(Di) = 1, Xi = x]+ E[Yi(0, Mi(0))|Di = 1, Mi(Di) = 1, Xi = x]

= − E[Yi(0, 1)|Di = 0, Mi(1) = 1, Mi(0) = 1, Xi = x]Pr(Mi(1) = 1|Di = 0, Mi(Di) = 1, Xi = x)− E[Yi(0, 1)|Di = 0, Mi(1) = 0, Mi(0) = 1, Xi = x]Pr(Mi(1) = 0|Di = 0, Mi(Di) = 1, Xi = x)+ E[Yi(0, 1)|Di = 1, Mi(1) = 1, Mi(0) = 1, Xi = x]Pr(Mi(0) = 1|Di = 1, Mi(Di) = 1, Xi = x)+ E[Yi(0, 0)|Di = 1, Mi(1) = 1, Mi(0) = 0, Xi = x]Pr(Mi(0) = 0|Di = 1, Mi(Di) = 1, Xi = x)

4

Page 25: Administrative Records Mask Racially Biased Policing · 2020. 8. 29. · this context, and carefully grounding unobservable parameters—in particular, the proportion of racially

by mediator monotonicity

= − E[Yi(0, 1)|Di = 0, Mi(1) = 1, Mi(0) = 1, Xi = x]+ E[Yi(0, 1)|Di = 1, Mi(1) = 1, Mi(0) = 1, Xi = x]Pr(Mi(0) = 1|Di = 1, Mi(Di) = 1, Xi = x)+ E[Yi(0, 0)|Di = 1, Mi(1) = 1, Mi(0) = 0, Xi = x]Pr(Mi(0) = 0|Di = 1, Mi(Di) = 1, Xi = x)

by mandatory reporting

= − E[Yi(0, 1)|Di = 0, Mi(1) = 1, Mi(0) = 1, Xi = x]+ E[Yi(0, 1)|Di = 1, Mi(1) = 1, Mi(0) = 1, Xi = x]Pr(Mi(0) = 1|Di = 1, Mi(Di) = 1, Xi = x)

by treatment ignorability

= − E[Yi(0, 1)|Mi(1) = 1, Mi(0) = 1, Xi = x]Pr(Mi(0) = 0|Mi(1) = 1, Xi = x)

A.3 Bias for CDEM=1

The CDEM=1 is de�ned as

CDEM=1 = E[Yi(1, 1)|Mi(Di) = 1] − E[Yi(0, 1)|Mi(Di) = 1] (1)

It is a somewhat contrived estimand because it necessarily involves counterfactuals that, forracial-stop encounters, could never realize even if researchers could somehow randomize civil-ian race in police-civilian encounters. For example, when a minority civilian is racially stoppedfor a “furtive movement” and reaches for their wallet, it makes little sense to consider an o�cer’spotential use of force if the civilian suddenly became white at that moment: had police observeda white civilian from the onset, a stop would never have occurred. Moreover, the assumptionsrequired for such counterfactuals are fundamentally unveri�able (Robins and Greenland, 1992),unless the experimentalist can somehow also manipulate o�cer stopping decisions without dis-torting outcomes. (We note that always-stop encounters are not subject to this issue, but analystscannot hope to estimate the conditional CDE in this group because it is impossible to identifywhich minority encounters belong to this group.)

In the previous bias expression for the ATEM=1, white individuals in the data—necessarilybelonging to the always-stop group, Mi(1) = Mi(0) = 1—were used to estimate the Yi(0, Mi(0))potential outcomes of minority encounters in the data. Unavoidable bias arose as long as any mi-nority individuals in the data belonged to the racial-stop group—had these individuals been white,

5

Page 26: Administrative Records Mask Racially Biased Policing · 2020. 8. 29. · this context, and carefully grounding unobservable parameters—in particular, the proportion of racially

they would never have been stopped, and hence would not be subject to force. Changing the tar-get estimand to the CDEM=1 conceptually sidesteps this speci�c issue by considering a di�erentcounterfactual, Yi(0, 1) instead of Yi(0, Mi(0)). But as we note above, for encounters in which onlyminority civilians would be stopped—that is, encounters with Mi(1) = 1 and Mi(0) = 0—this newcounterfactual represents an impossible cross-world scenario. The CDEM=1 asks whether forcewould have been used if o�cers were forced, against nature, to stop a white individual in thisencounter as if they were a minority.

We now demonstrate that the local di�erence in means remains biased for the local controlleddirect e�ect, CDEM=1,x = E[Yi(1, 1)|Mi = 1, Xi = x]−E[Yi(0, 1)|Mi = 1, Xi = x], unless o�cers are asviolent toward minorities in always-stop encounters (where they are forced to intervene) as theyare in racially discriminatory stops (where they are free to exercise discretion). In other words,for the naïve estimator to recover the CDEM=1, Assumption 5 must hold. The derivation is almostidentical to that of the ATEM=1,x , di�ering only in that all individuals are held at Mi = 1 instead ofallowed stops to vary with civilian race, Mi(Di). Bias for CDEM=1 is then given by the weightedaverage of local biases, ∑x (E[Δx] − CDEM=1,x) Pr(Xi = x |Mi = 1).

E[Δx ] − CDEM=1,x = (E[Yi |Di = 1, Mi = 1, Xi = x − Yi |Di = 0, Mi = 1, Xi = x])− (E[Yi(1, 1)|Mi = 1, Xi = x] − E[Yi(0, 1)|Mi = 1, Xi = x])

= E[Yi |Di = 1, Mi(Di) = 1, Xi = x] − E[Yi |Di = 0, Mi(Di) = 1, Xi = x]− E[Yi(1, 1)|Mi(Di) = 1, Xi = x] + E[Yi(0, 1)|Mi(Di) = 1, Xi = x]

= E[Yi(1, 1) − Yi(0, 1)|Di = 0, Mi(1) = 1, Mi(0) = 1, Xi = x]Pr(Di = 1|Mi(Di) = 1, Xi = x)− E[Yi(1, 1) − Yi(0, 1)|Di = 1, Mi(1) = 1, Mi(0) = 1, Xi = x]

Pr(Mi(0) = 1|Di = 1, Mi(Di) = 1, Xi = x)Pr(Di = 1|Mi(Di) = 1, Xi = x)− E[Yi(1, 1) − Yi(0, 1)|Di = 1, Mi(1) = 1, Mi(0) = 0, Xi = x]

Pr(Mi(0) = 0|Di = 1, Mi(Di) = 1, Xi = x)Pr(Di = 1|Mi(Di) = 1, Xi = x)+ E[Yi(1, 1)|Di = 1, Mi(1) = 1, Mi(0) = 1, Xi = x]− E[Yi(1, 1)|Di = 0, Mi(1) = 1, Mi(0) = 1, Xi = x]+ E[Yi(1, 1)|Di = 1, Mi(1) = 1, Mi(0) = 0, Xi = x]Pr(Mi(0) = 0|Di = 1, Mi(Di) = 1, Xi = x)− E[Yi(1, 1)|Di = 1, Mi(1) = 1, Mi(0) = 1, Xi = x]Pr(Mi(0) = 0|Di = 1, Mi(Di) = 1, Xi = x)

under assumption 4(b),

= (E[Yi(1, 1) − Yi(0, 1)|Mi(1) = 1, Mi(0) = 1, Xi = x]− E[Yi(1, 1) − Yi(0, 1)|Mi(1) = 1, Mi(0) = 0, Xi = x]

6

Page 27: Administrative Records Mask Racially Biased Policing · 2020. 8. 29. · this context, and carefully grounding unobservable parameters—in particular, the proportion of racially

) Pr(Mi(0) = 0|Di = 1, Mi(Di) = 1, Xi = x)Pr(Di = 1|Mi(Di) = 1, Xi = x)− (E[Yi(1, 1)|Mi(1) = 1, Mi(0) = 1, Xi = x] − E[Yi(1, 1)|Mi(1) = 1, Mi(0) = 0, Xi = x])

Pr(Mi(0) = 0|Di = 1, Mi(Di) = 1, Xi = x)

which reproduces Equation 7.Finally, we demonstrate that this bias is weakly negative. Rearranging terms yields

E[Yi(1, 1) − Yi(0, 1)|Mi(1) = 1, Mi(0) = 1, Xi = x]× Pr(Mi(0) = 0|Di = 1, Mi(Di) = 1, Xi = x)Pr(Di = 1|Mi(Di) = 1, Xi = x)

− E[Yi(1, 1) − Yi(0, 1)|Mi(1) = 1, Mi(0) = 0, Xi = x]× Pr(Mi(0) = 0|Di = 1, Mi(Di) = 1, Xi = x)Pr(Di = 1|Mi(Di) = 1, Xi = x)

− E[Yi(1, 1)|Mi(1) = 1, Mi(0) = 1, Xi = x]× Pr(Mi(0) = 0|Di = 1, Mi(Di) = 1, Xi = x)

+ E[Yi(1, 1)|Mi(1) = 1, Mi(0) = 0, Xi = x]× Pr(Mi(0) = 0|Di = 1, Mi(Di) = 1, Xi = x)

= −E[Yi(0, 1)|Mi(1) = 1, Mi(0) = 1, Xi = x]× Pr(Mi(0) = 0|Di = 1, Mi(Di) = 1, Xi = x)Pr(Di = 1|Mi(Di) = 1, Xi = x)

+ E[Yi(0, 1)|Mi(1) = 1, Mi(0) = 0, Xi = x]× Pr(Mi(0) = 0|Di = 1, Mi(Di) = 1, Xi = x)Pr(Di = 1|Mi(Di) = 1, Xi = x)

− E[Yi(1, 1)|Mi(1) = 1, Mi(0) = 1, Xi = x]× Pr(Mi(0) = 0|Di = 1, Mi(Di) = 1, Xi = x)(1 − Pr(Di = 1|Mi(Di) = 1, Xi = x))

+ E[Yi(1, 1)|Mi(1) = 1, Mi(0) = 0, Xi = x]× Pr(Mi(0) = 0|Di = 1, Mi(Di) = 1, Xi = x)(1 − Pr(Di = 1|Mi(Di) = 1, Xi = x))

The sum of the �rst and second terms is weakly negative by Assumption 3, because the magnitudeof the �rst term is greater than that of the second, and similarly the sum of the third and fourthterms is also weakly negative. Therefore, bias is weakly negative when the naïve estimator isused to estimate the CDEM=1.

A.4 Nonparametric sharp bounds for ATEM=1In this section, we derive nonparametric sharp bounds for the ATEM=1,x . We begin with thecase when the proportion of racially discriminatory stops among reported minority encounters,Pr(Mi(0) = 0|Di = 1, Mi(Di) = 1, Xi = x), is known or can be assumed. Rearrangement of Equa-

7

Page 28: Administrative Records Mask Racially Biased Policing · 2020. 8. 29. · this context, and carefully grounding unobservable parameters—in particular, the proportion of racially

tion 6 (within levels of X ) yields

ATEM=1,x = E[Δx ]+ E[Yi(1, 1)|Mi(1) = 1, Mi(0) = 1, Xi = x]

Pr(Mi(0) = 0|Di = 1, Mi(Di) = 1, Xi = x)(1 − Pr(Di = 1|Mi(Di) = 1, Xi = x))− E[Yi(1, 1)|Mi(1) = 1, Mi(0) = 0, Xi = x]

Pr(Mi(0) = 0|Di = 1, Mi(Di) = 1, Xi = x)(1 − Pr(Di = 1|Mi(Di) = 1, Xi = x))+ E[Yi(0, 1)|Mi(1) = 1, Mi(0) = 1, Xi = x]

Pr(Mi(0) = 0|Di = 1, Mi(Di) = 1, Xi = x)Pr(Di = 1|Mi(Di) = 1, Xi = x)= E[Δx ]

+ E[Yi(1, 1)|Di = 1, Mi(1) = 1, Xi = x]Pr(Mi(0) = 1|Di = 1, Mi(1) = 1, Xi = x)

Pr(Mi(0) = 0|Di = 1, Mi(Di) = 1, Xi = x)Pr(Di = 0|Mi(Di) = 1, Xi = x)

− E[Yi(1, 1)|Di = 1, Mi(1) = 1, Mi(0) = 0, Xi = x]Pr(Mi(0) = 1|Di = 1, Mi(1) = 1, Xi = x)

Pr(Mi(0) = 0|Di = 1, Mi(Di) = 1, Xi = x)2 (2)

Pr(Di = 0|Mi(Di) = 1, Xi = x)− E[Yi(1, 1)|Mi(1) = 1, Mi(0) = 0, Xi = x]Pr(Mi(0) = 0|Di = 1, Mi(Di) = 1, Xi = x)Pr(Di = 0|Mi(Di) = 1, Xi = x)+ E[Yi(0, 1)|Mi(1) = 1, Mi(0) = 1, Xi = x]Pr(Mi(0) = 0|Di = 1, Mi(Di) = 1, Xi = x)Pr(Di = 1|Mi(Di) = 1, Xi = x)

= E[Δx ]

+ E[Yi |Di = 1, Mi = 1, Xi = x]Pr(Mi(0) = 1|Di = 1, Mi(1) = 1, Xi = x)

Pr(Mi(0) = 0|Di = 1, Mi(Di) = 1, Xi = x)Pr(Di = 0|Mi = 1, Xi = x)

− E[Yi(1, 1)|Di = 1, Mi(1) = 1, Mi(0) = 0, Xi = x]Pr(Mi(0) = 1|Di = 1, Mi(1) = 1, Xi = x)

Pr(Mi(0) = 0|Di = 1, Mi(Di) = 1, Xi = x)2 (3)

Pr(Di = 0|Mi = 1, Xi = x)− E[Yi(1, 1)|Mi(1) = 1, Mi(0) = 0, Xi = x]Pr(Mi(0) = 0|Di = 1, Mi(Di) = 1, Xi = x)Pr(Di = 0|Mi = 1, Xi = x)+ E[Yi |Di = 0, Mi = 1, Xi = x]Pr(Mi(0) = 0|Di = 1, Mi(Di) = 1, Xi = x)Pr(Di = 1|Mi = 1, Xi = x)

(4)

We then construct bounds on E[Yi(1, 1)|Di = 1, Mi(1) = 1, Mi(0) = 0, Xi = x] based on Fréchetinequalities for the joint distribution, Pr(Yi(1, 1) = 1, Mi(0) = 0|Di = 1, Mi(1) = 1, Xi = x), whichincorporate marginal information about Yi(1, 1) and Mi(0).

max {0, Pr(Mi(0) = 0|Di = 1, Mi(1) = 1, Xi = x) + E[Yi |Di = 1, Mi = 1, Xi = x] − 1}Pr(Mi(0) = 0|Di = 1, Mi(1) = 1, Xi = x)

≤ E[Yi(1, 1)|Di = 1, Mi(1) = 1, Mi(0) = 0, Xi = x] ≤ (5)min {Pr(Mi(0) = 0|Di = 1, Mi(1) = 1, Xi = x),E[Yi |Di = 1, Mi = 1, Xi = x]}

Pr(Mi(0) = 0|Di = 1, Mi(1) = 1, Xi = x)

These bounds are sharp given only marginal information, Pr(Yi(1, 1) = 1|Di = 1, Mi(1) = 1, Xi = x)

8

Page 29: Administrative Records Mask Racially Biased Policing · 2020. 8. 29. · this context, and carefully grounding unobservable parameters—in particular, the proportion of racially

and Pr(Mi(0) = 0|Di = 1, Mi(1) = 1, Xi = x). However, the upper bound can be tightened furtherunder Assumption 3, which implies E[Yi(1, 1)|Di = 1, Mi(1) = 1, Mi(0) = 0, Xi = x] ≤ E[Yi |Di =1, Mi = 1, Xi = x]; this is at least as small as the upper Fréchet bound.

Finally, note that the reported data contain no information that can be used to constrainthe proportion of racially discriminatory minority stops, Pr(Mi(0) = 0|Di = 1, Mi(Di) = 1, Xi =x). If this proportion were zero, then the distribution of civilian race in police reports wouldre�ect that of all police encounters (within levels of X ). The reported data cannot distinguishbetween this possibility and an alternative population in which �x = Pr(Mi(0) = 0|Di = 1, Mi(Di) =1, Xi = x) is large, but white encounters are also larger by the proportion 1/(1 − �x ). Withoutside information about the total number of encounters, this proportion can take on any valuein [0, 1). Therefore, sharp bounds on ATEM=1 alone are obtained by substituting Equation 5 intoEquation 4 and setting the proportion of racial stops to unity. The bivariate bounds de�ne theregion in which (ATEM=1, �x ) pairs are consistent with the observed data. When �x is set to zero orone, these respectively recover the di�erence in reported means and the marginal upper boundson ATEM=1. For �x ∈ (0, 1),

E[Δx ] + �x E[Yi |Di = 0, Mi = 1, Xi = x](1 − Pr(Di = 0|Mi = 1, Xi = x))≤ ATEM=1,x ≤

E[Δx ]

+ �x1 − �x (E[Yi |Di = 1, Mi = 1, Xi = x] − max

{0, 1 + 1

�xE[Yi |Di = 1, Mi = 1, Xi = x] −

1�x

}

) Pr(Di = 0|Mi = 1, Xi = x)

+�x E[Yi |Di = 0, Mi = 1, Xi = x] (1 − Pr(Di = 0|Mi = 1, Xi = x)),

which reduces to Proposition 1 in the no-covariate case. Otherwise, bounds on ATEM=1 are givenby ∑x ATEM=1,x Pr(Xi = x |Mi = 1) ≤ ATEM=1 ≤ ∑x ATEM=1,x Pr(Xi = x |Mi = 1), where ATEM=1,x(ATEM=1,x ) denote the lower (upper) bounds on the local average treatment e�ect.

Finally, we note that per Equation 3, the ATTM=1,x can be written

ATTM=1,x = E[Yi(1, Mi(1)) − Yi(0, Mi(0))|Di = 1, Mi(1) = 1, Mi(0) = 1, Xi = x] Pr(Mi(0) = 1|Di = 1, Mi = 1, Xi = x)+ E[Yi(1, Mi(1)) − Yi(0, Mi(0))|Di = 1, Mi(1) = 1, Mi(0) = 0, Xi = x] Pr(Mi(0) = 0|Di = 1, Mi = 1, Xi = x)

= E[Yi(1, 1)|Di = 1, Mi = 1, Xi = x]− E[Yi(0, 1)|Di = 1, Mi(1) = 1, Mi(0) = 1, Xi = x] Pr(Mi(0) = 1|Di = 1, Mi = 1, Xi = x)− E[Yi(0, 0)|Di = 1, Mi(1) = 1, Mi(0) = 0, Xi = x] Pr(Mi(0) = 0|Di = 1, Mi = 1, Xi = x)

under Assumption 1,

= E[Yi(1, 1) − |Di = 1, Mi = 1, Xi = x]− E[Yi(0, 1)|Di = 1, Mi(1) = 1, Mi(0) = 1, Xi = x] Pr(Mi(0) = 1|Di = 1, Mi = 1, Xi = x)

9

Page 30: Administrative Records Mask Racially Biased Policing · 2020. 8. 29. · this context, and carefully grounding unobservable parameters—in particular, the proportion of racially

and under Assumption 4,

= E[Yi |Di = 1, Mi = 1, Xi = x] − E[Yi |Di = 0, Mi = 1, Xi = x] (1 − Pr(Mi(0) = 0|Di = 1, Mi = 1, Xi = x))

which can be estimated from observed data if the proportion of racial stops is known. It thenfollows that

ATTM=1 = ∑x(E[Yi |Di = 1, Mi = 1, Xi = x] − E[Yi |Di = 0, Mi = 1, Xi = x] + �xE[Yi |Di = 0, Mi = 1, Xi = x])

= ∑x(E[Δx ] + �xE[Yi |Di = 0, Mi = 1, Xi = x]) .

A.5 Uncertainty of bounds

Here, we describe our approach for constructing con�dence intervals for the bounds on thesecausal quantities. We take Xi , Di and Mi as �xed, so that uncertainty in the bounds arises strictlyfrom the estimation of the conditional expectations, E[Yi |Di = d,Mi = 1, Xi = x]. The asymptoticdistribution of the estimated lower and upper bounds endpoints, ( ATEM=1, ATEM=1), then followsdirectly from the asymptotic joint distribution of E[Yi |Di = d,Mi = 1, Xi = x] for all d and x . Weapproximate this through a Monte Carlo simulation in which parameters of the logistic regressionmodels described in Section 5 are sampled from a multivariate normal distribution centered onthe parameter estimates and with the estimated covariance matrix. For each parameter sample � ∗,the corresponding bounds endpoint pair (ATE∗M=1,ATE∗M=1) is computed deterministically; afterdrawing a su�cient number of such samples, we numerically obtain the shortest range that fullycontains 95% of all simulated bounds intervals. Closely related alternatives to this approach arethe bootstrap-based method of Horowitz and Manski (2000) and the fully Bayesian approachtaken in Knox et al. (2019). For the analysis in Section 5, we follow Fryer (2019) in using a cluster-robust covariance estimator, clustering on precinct, and 5,000 samples were drawn for each forcethreshold and model speci�cation.

10

Page 31: Administrative Records Mask Racially Biased Policing · 2020. 8. 29. · this context, and carefully grounding unobservable parameters—in particular, the proportion of racially

A.6 Point identi�cation of ATE

First, we note that strata sizes are identi�ed with information on the total count of encounters byrace (both reported and unreported).

Pr(Mi(1) = 1, Mi(0) = 1, Xi = x) = Pr(Mi(1) = 1, Mi(0) = 1|Di = 0, Xi = x)= Pr(Mi = 1|Di = 0, Xi = x)

Pr(Mi(1) = 1, Mi(0) = 0, Xi = x) = Pr(Mi(1) = 1, Xi = x)− Pr(Mi(1) = 1, Mi(0) = 1, Xi = x)Pr(Mi = 1|Di = 1, Xi = x)− Pr(Mi = 1|Di = 0, Xi = x)

Pr(Mi(1) = 0, Mi(0) = 1, Xi = x) = 0Pr(Mi(1) = 0, Mi(0) = 0, Xi = x) = 1 − Pr(Mi(1) = 0, Mi(0) = 1, Xi = x)

− Pr(Mi(1) = 1, Mi(0) = 0, Xi = x)− Pr(Mi(1) = 1, Mi(0) = 1, Xi = x)

= 1 − Pr(Mi = 1|Di = 1, Xi = x)

We then reexpress the ATE in terms of strata-speci�c mean potential outcomes and simplify.

11

Page 32: Administrative Records Mask Racially Biased Policing · 2020. 8. 29. · this context, and carefully grounding unobservable parameters—in particular, the proportion of racially

ATE = E[Yi(1, Mi(1))|Di = 1, Mi(1) = 1, Mi(0) = 1, Xi = x]Pr(Mi(1) = 1, Mi(0) = 1|Di = 1, Xi = x) Pr(Di = 1, Xi = x)

+ E[Yi(1, Mi(1))|Di = 1, Mi(1) = 1, Mi(0) = 0, Xi = x]Pr(Mi(1) = 1, Mi(0) = 0|Di = 1, Xi = x) Pr(Di = 1, Xi = x)

+ E[Yi(1, Mi(1))|Di = 1, Mi(1) = 0, Mi(0) = 1, Xi = x]Pr(Mi(1) = 0, Mi(0) = 1|Di = 1, Xi = x) Pr(Di = 1, Xi = x)

+ E[Yi(1, Mi(1))|Di = 1, Mi(1) = 0, Mi(0) = 0, Xi = x]Pr(Mi(1) = 0, Mi(0) = 0|Di = 1, Xi = x) Pr(Di = 1, Xi = x)

+ E[Yi(1, Mi(1))|Di = 0, Mi(1) = 1, Mi(0) = 1, Xi = x]Pr(Mi(1) = 1, Mi(0) = 1|Di = 0, Xi = x) Pr(Di = 0, Xi = x)

+ E[Yi(1, Mi(1))|Di = 0, Mi(1) = 1, Mi(0) = 0, Xi = x]Pr(Mi(1) = 1, Mi(0) = 0|Di = 0, Xi = x) Pr(Di = 0, Xi = x)

+ E[Yi(1, Mi(1))|Di = 0, Mi(1) = 0, Mi(0) = 1, Xi = x]Pr(Mi(1) = 0, Mi(0) = 1|Di = 0, Xi = x) Pr(Di = 0, Xi = x)

+ E[Yi(1, Mi(1))|Di = 0, Mi(1) = 0, Mi(0) = 0, Xi = x]Pr(Mi(1) = 0, Mi(0) = 0|Di = 0, Xi = x) Pr(Di = 0, Xi = x)

− E[Yi(0, Mi(0))|Di = 1, Mi(1) = 1, Mi(0) = 1, Xi = x]Pr(Mi(1) = 1, Mi(0) = 1|Di = 1, Xi = x) Pr(Di = 1, Xi = x)

− E[Yi(0, Mi(0))|Di = 1, Mi(1) = 1, Mi(0) = 0, Xi = x]Pr(Mi(1) = 1, Mi(0) = 0|Di = 1, Xi = x) Pr(Di = 1, Xi = x)

− E[Yi(0, Mi(0))|Di = 1, Mi(1) = 0, Mi(0) = 1, Xi = x]Pr(Mi(1) = 0, Mi(0) = 1|Di = 1, Xi = x) Pr(Di = 1, Xi = x)

− E[Yi(0, Mi(0))|Di = 1, Mi(1) = 0, Mi(0) = 0, Xi = x]Pr(Mi(1) = 0, Mi(0) = 0|Di = 1, Xi = x) Pr(Di = 1, Xi = x)

− E[Yi(0, Mi(0))|Di = 0, Mi(1) = 1, Mi(0) = 1, Xi = x]Pr(Mi(1) = 1, Mi(0) = 1|Di = 0, Xi = x) Pr(Di = 0, Xi = x)

− E[Yi(0, Mi(0))|Di = 0, Mi(1) = 1, Mi(0) = 0, Xi = x]Pr(Mi(1) = 1, Mi(0) = 0|Di = 0, Xi = x) Pr(Di = 0, Xi = x)

− E[Yi(0, Mi(0))|Di = 0, Mi(1) = 0, Mi(0) = 1, Xi = x]Pr(Mi(1) = 0, Mi(0) = 1|Di = 0, Xi = x) Pr(Di = 0, Xi = x)

− E[Yi(0, Mi(0))|Di = 0, Mi(1) = 0, Mi(0) = 0, Xi = x]

12

Page 33: Administrative Records Mask Racially Biased Policing · 2020. 8. 29. · this context, and carefully grounding unobservable parameters—in particular, the proportion of racially

Pr(Mi(1) = 0, Mi(0) = 0|Di = 0, Xi = x) Pr(Di = 0, Xi = x)

under mandatory reporting

= E[Yi(1, Mi(1))|Di = 1, Mi(1) = 1, Mi(0) = 1, Xi = x]Pr(Mi(1) = 1, Mi(0) = 1|Di = 1, Xi = x) Pr(Di = 1, Xi = x)

+ E[Yi(1, Mi(1))|Di = 1, Mi(1) = 1, Mi(0) = 0, Xi = x]Pr(Mi(1) = 1, Mi(0) = 0|Di = 1, Xi = x) Pr(Di = 1, Xi = x)

+ E[Yi(1, Mi(1))|Di = 0, Mi(1) = 1, Mi(0) = 1, Xi = x]Pr(Mi(1) = 1, Mi(0) = 1|Di = 0, Xi = x) Pr(Di = 0, Xi = x)

+ E[Yi(1, Mi(1))|Di = 0, Mi(1) = 1, Mi(0) = 0, Xi = x]Pr(Mi(1) = 1, Mi(0) = 0|Di = 0, Xi = x) Pr(Di = 0, Xi = x)

− E[Yi(0, Mi(0))|Di = 1, Mi(1) = 1, Mi(0) = 1, Xi = x]Pr(Mi(1) = 1, Mi(0) = 1|Di = 1, Xi = x) Pr(Di = 1, Xi = x)

− E[Yi(0, Mi(0))|Di = 1, Mi(1) = 0, Mi(0) = 1, Xi = x]Pr(Mi(1) = 0, Mi(0) = 1|Di = 1, Xi = x) Pr(Di = 1, Xi = x)

− E[Yi(0, Mi(0))|Di = 0, Mi(1) = 1, Mi(0) = 1, Xi = x]Pr(Mi(1) = 1, Mi(0) = 1|Di = 0, Xi = x) Pr(Di = 0, Xi = x)

− E[Yi(0, Mi(0))|Di = 0, Mi(1) = 0, Mi(0) = 1, Xi = x]Pr(Mi(1) = 0, Mi(0) = 1|Di = 0, Xi = x) Pr(Di = 0, Xi = x)

under mediator monotonicity

= E[Yi(1, Mi(1))|Di = 1, Mi(1) = 1, Mi(0) = 1, Xi = x]Pr(Mi(1) = 1, Mi(0) = 1|Di = 1, Xi = x) Pr(Di = 1, Xi = x)

+ E[Yi(1, Mi(1))|Di = 1, Mi(1) = 1, Mi(0) = 0, Xi = x]Pr(Mi(1) = 1, Mi(0) = 0|Di = 1, Xi = x) Pr(Di = 1, Xi = x)

+ E[Yi(1, Mi(1))|Di = 0, Mi(1) = 1, Mi(0) = 1, Xi = x]Pr(Mi(1) = 1, Mi(0) = 1|Di = 0, Xi = x) Pr(Di = 0, Xi = x)

+ E[Yi(1, Mi(1))|Di = 0, Mi(1) = 1, Mi(0) = 0, Xi = x]Pr(Mi(1) = 1, Mi(0) = 0|Di = 0, Xi = x) Pr(Di = 0, Xi = x)

− E[Yi(0, Mi(0))|Di = 1, Mi(1) = 1, Mi(0) = 1, Xi = x]Pr(Mi(1) = 1, Mi(0) = 1|Di = 1, Xi = x) Pr(Di = 1, Xi = x)

− E[Yi(0, Mi(0))|Di = 0, Mi(1) = 1, Mi(0) = 1, Xi = x]

13

Page 34: Administrative Records Mask Racially Biased Policing · 2020. 8. 29. · this context, and carefully grounding unobservable parameters—in particular, the proportion of racially

Pr(Mi(1) = 1, Mi(0) = 1|Di = 0, Xi = x) Pr(Di = 0, Xi = x)

under treatment ignorability

= E[Yi(1, Mi(1))|Di = 1, Mi(1) = 1, Mi(0) = 1, Xi = x] Pr(Mi(1) = 1, Mi(0) = 1, Xi = x)+ E[Yi(1, Mi(1))|Di = 1, Mi(1) = 1, Mi(0) = 0, Xi = x] Pr(Mi(1) = 1, Mi(0) = 0, Xi = x)− E[Yi(0, Mi(0))|Di = 0, Mi(1) = 1, Mi(0) = 1, Xi = x] Pr(Mi(1) = 1, Mi(0) = 1, Xi = x)

which can be recovered from observed data

= E[Yi |Di = 1, Mi(Di) = 1, Xi = x] Pr(Mi = 1|Di = 1, Xi = x)− E[Yi |Di = 0, Mi(Di , Xi = x) = 1, Xi = x] Pr(Mi = 1|Di = 0, Xi = x)

which reduces to Proposition 2 in the no-covariate case.

A.7 Derivation of outcome test bounds on �Our paper focuses on the di�culty of estimating a race e�ect on post-stop police behavior suchas the use of force. However, another popular approach, the outcome test, focuses on establishingwhether there exists any bias in the decision to stop a civilian (Becker, 1971; Goel, Rao and Shro�,2016; Engel, 2008; Knowles, Perisco and Todd, 2001; Ridgeway and MacDonald, 2010). Becausethe degree of the statistical bias we explore is a function of racial discrimination in stoppingdecisions, it is useful to clarify the assumptions undergirding outcome tests. In the process, wedemonstrate that the principal strati�cation framework sheds light on the precise interpretationof outcome tests, and we prove that the outcome test can be used to establish a lower bound onthe share of police stops of racial minorities that are racially discriminatory.

Outcome tests compare the rates of �nding evidence of a crime—conditional on a suspectbeing stopped by police—across racial groups. The logic behind the test is that if the decisionto stop a civilian is unbiased, the rate of discovering evidence of a crime (“hit rates”) shouldbe identical across groups. Proponents of outcome tests thus claim that di�erences in hit ratesamount to evidence of racially biased policing. The empirical observation that hit rates are loweramong minority stops can be written as E[Yi |Di = 0, Mi = 1] > E[Yi |Di = 1, Mi = 1], where Yi is anindicator, say, for �nding contraband on a suspect. However, interpreting the above inequalityas evidence of racial discrimination in fact requires assumptions that closely mirror those wedescribe above.

To see this, �rst observe that the overall hit rate among minority stops can be decomposed into

14

Page 35: Administrative Records Mask Racially Biased Policing · 2020. 8. 29. · this context, and carefully grounding unobservable parameters—in particular, the proportion of racially

the weighted average of the hit rate among always-stop encounters and the hit rate among the(possibly nonexistent) set of racially discriminatory stops. In contrast, if we invoke Assumption 2(which states that there are no white civilians stopped in circumstances where a minority civilianwould be allowed to pass), then stops involving white civilians belong exclusively to the always-stop group.1 In this case, the empirical di�erence in hit rates can be rewritten in the potentialoutcomes framework as

E[Yi(0, 1)|Mi(1) = 1, Mi(0) = 1, Di = 0]−E[Yi(1, 1)|Mi(1) = 1, Mi(0) = 1, Di = 1]Pr(Mi(1) = 1, Mi(0) = 1|Di = 1, Mi = 1)− E[Yi(1, 1)|Mi(1) = 1, Mi(0) = 0, Di = 1]Pr(Mi(1) = 1, Mi(0) = 0|Di = 1, Mi = 1) > 0 (6)

A major critique of the outcome test is that observed racial disparities in hit rates alone do notconstitute evidence of racially discriminatory stops because of the problem of “infra-marginality”(Ayres, 2002; Simoiu, Corbett-Davies and Goel, 2017). This critique suggests that the above in-equality may hold simply because white civilians in always-stop encounters engage in more crim-inal conduct than minority suspects. In other words, the analyst might observe E[Yi |Di = 1, Mi =1] < E[Yi |Di = 0, Mi = 1] even if Pr(Mi(1) = 1, Mi(0) = 0) = 0—that is, with no discrimination instops—as long as E[Yi(1, 1)|Mi(1) = 1, Mi(0) = 1, Di = 1] < E[Yi(0, 1)|Mi(1) = 1, Mi(0) = 1, Di = 0].Some analysts employing the outcome test cast this scenario as unlikely, arguing that absentracial bias in stopping, “it would be di�cult to explain why. . .whites for some reason had a sys-tematically higher chance of possessing evidence of illegality” (Ayres, 2002) (137) and “there arenot compelling reasons to suspect” this to be the case (138). Indeed, the validity of the outcometest hinges on the assumption that white and minority civilians in always-stop encounters com-mit crimes at the same rates, or that E[Yi(1, 1)|Mi(1) = 1, Mi(0) = 1, Di = 1] = E[Yi(0, 1)|Mi(1) =1, Mi(0) = 1, Di = 0]. This assumption closely parallels Assumption 4, which requires that treat-ment status is ignorable with respect to potential outcomes. (For simplicity, we suppose that thisholds without conditioning on covariates, but the result also holds within levels of Xi = x .) Inthis case, the observed racial di�erence in hit rates can be rewritten as

E[Yi |Di = 0, Mi = 1] − E[Yi |Di = 1, Mi = 1]= (E[Yi(0, 1)|Mi(1) = 1, Mi(0) = 1, Di = 0]

−E[Yi(1, 1)|Mi(1) = 1, Mi(0) = 0, Di = 1]) Pr(Mi(1) = 1, Mi(0) = 0|Di = 1, Mi = 1). (7)

This formulation makes clear that the observed evidence gap is due to the di�erence in hit1If we do not assume mediator monotonicity, and allow for the presence of stops of white suspects that would

not have occurred if the suspect was a racial minority, then the inequality used to estimate the outcome test becomesuninformative with respect to racial discrimination.

15

Page 36: Administrative Records Mask Racially Biased Policing · 2020. 8. 29. · this context, and carefully grounding unobservable parameters—in particular, the proportion of racially

rates between always-stop minority encounters—in which o�cers would also have stopped awhite civilian—and racially discriminatory minority stops. If the former is assumed to producemore evidence of criminal behavior (Assumption 3; this might hold if racially discriminatorystops are made under weaker standards of evidence), then it can be seen from Equation 7 that theempirical di�erence in hit rates implies that Pr(Mi(1) = 1, Mi(0) = 0) > 0: that there must existencounters in which minority civilians would be stopped but white civilians would not, preciselyas proponents of the outcome test suggest.

Equation 7 also shows that outcome tests are unable to identify the exact prevalence of racialstops. Outcome tests allow the analyst to infer whether there is any racial bias in the decisionto stop a suspect—but only if the analyst makes assumptions similar to those we outline above.However, we show that the outcome test can partially identify a range of possible proportionsof racial stops. This clari�cation is useful, as it allows us later in this analysis to appeal to apublished study of hit rates (Goel, Rao and Shro�, 2016) to help characterize the statistical biasin analyses of post-stop police behavior (e.g. Fryer, 2019).

By rearranging Equation 7 and substituting observed quantities, we arrive at

Pr(Mi(1) = 1, Mi(0) = 0|Di = 1, Mi = 1) =E[Yi |Di = 0, Mi = 1] − E[Yi |Di = 1, Mi = 1]

E[Yi |Di = 0, Mi = 1] − E[Yi(1, 1)|Mi(1) = 1, Mi(0) = 0, Di = 1]

Although the second term in the denominator is unknown, the implied proportion of raciallydiscriminatory stops is smallest when this value is zero—if, hypothetically, searches of raciallystopped minorities never produce evidence. Thus, the outcome test suggests that at least (E[Yi |Di =0, Mi = 1]−E[Yi |Di = 1, Mi = 1]) /E[Yi |Di = 0, Mi = 1] of all minority stops are racially discrimina-tory, and to the extent that racially discriminatory searches result in any evidence of contraband,the proportion could potentially be much larger.

16

Page 37: Administrative Records Mask Racially Biased Policing · 2020. 8. 29. · this context, and carefully grounding unobservable parameters—in particular, the proportion of racially

B Additional results

B.1 Coding schemes for dependent variables

In this section, we reanalyze the NYPD SQF data using both the original and revised codingschemes for dependent variables in Fryer (2019). In an analysis of the use of force by level ofseverity, Fryer (2019) codes binary outcomes indicating whether force is used at or above somethreshold. However, rather than coding all encounters with lower levels of force than a giventhreshold as a zero, the analysis coded only encounters with no force at all as a zero, while lev-els of force between no force and the threshold level were dropped from the data.2 This datadropping strategy, a form of selection on the dependent variable, is problematic. If the analystsuspects that civilian race a�ects which level of force is applied—the motivating hypothesis forthis very analysis—then dropping data based on which level of force was applied is another formof post-treatment conditioning and will induce bias. Further, the amount of data lost under thiscoding scheme is substantial. In the case of the point-weapon threshold, for example, over onemillion encounters—over 20% of the data—appear to have been discarded despite containing sub-threshold force use, such as pushing a civilian to the ground. Table B1 displays the number ofobservations reported for various regressions in the original paper, our best attempts at replica-tion, and the corrected procedure used in this paper.

We present results of our replication study using the original coding scheme and our correctedversion side by side below. As the results show, a corrected analysis generally depresses the naïvetreatment e�ects relative to the inadvisable coding scheme in Fryer (2019) and in most casesrenders the original results statistically insigni�cant. However, these discrepancies in resultsacross coding decisions do not alter the central point of our paper: post-treatment conditioningexerts a large downward bias on estimates of racially discriminatory uses of force.

2Fryer (2019) acknowledges this data dropping strategy, writing “To be clear, an observation that records onlyhands would be in the hands regression but not the regression which restricts the sample to observations in whichindividuals were at least forced to the ground,” (21, emphasis in original).

17

Page 38: Administrative Records Mask Racially Biased Policing · 2020. 8. 29. · this context, and carefully grounding unobservable parameters—in particular, the proportion of racially

Table B1: Comparison of SQF Data Dimensions Based on Outcome Coding. The tabledisplays the number of observations from bivariate analyses of the use of force by the NYPD usingthree coding procedures for force outcomes. The �rst column displays the number of observationsas reported in results in Fryer (2019) (Appendix Tables 3A-3G in original). The second columnreports the number of observations we recover when using the coding procedure in Fryer (2019)which drops observations where some level of force was used that was below a given threshold.The third column displays the number of observations we recover when using our correctedcoding procedure, which codes outcomes as a 1 if a certain force threshold is reached and 0otherwise.

N (published) N (replicated) N (corrected coding)At least use of hands 4,927,962 4,980,701 4,980,701At least push to wall 4,152,918 4,245,091 4,980,701

At least use of handcu�s 4,017,783 4,122,329 4,980,701At least draw weapon 3,957,687 3,965,721 4,980,701

At least push to ground 3,950,324 3,958,374 4,980,701At least point weapon 3,918,741 3,926,805 4,980,701

Use baton or pepper spray 3,900,977 3,909,064 4,980,701

18

Page 39: Administrative Records Mask Racially Biased Policing · 2020. 8. 29. · this context, and carefully grounding unobservable parameters—in particular, the proportion of racially

Figure B1: Replication of Fryer (2019) using various outcome coding rules. The �guredisplays odds ratios generated by OLS regressions that show the e�ect of suspect race withoutcovariates on the use of force across all force types generated using three approaches: the pub-lished OLS results from the Appendix of Fryer (2019) (black points and bars), our best attempt atreplication of these results (blue points and bars), and results using our corrected outcome codingscheme (red points and bars). Revising the coding scheme so as to retain data on sub-thresholduses of force generally de�ates estimated treatment e�ects.

● ● ● ● ● ●

At leastuse of hands

At leastpush to wall

At leastuse of handcuffs

At leastdraw weapon

At leastpush to ground

At leastpoint weapon

Use baton orpepper spray

0.00

0.05

0.10

0.15

0.20

0.25

0.0

0.5

1.0

1.5

0

1

2

3

4

0

1

2

3

4

5

−5

0

5

10

−5

0

5

10

15

20

0

25

50

75

100

Rac

ial d

iffer

ence

in u

se o

f for

ce(p

er th

ousa

nd e

ncou

nter

s)

●●Fryer (2019) Replication Corrected coding

Black civilians, versus white

● ● ● ●●

At leastuse of hands

At leastpush to wall

At leastuse of handcuffs

At leastdraw weapon

At leastpush to ground

At leastpoint weapon

Use baton orpepper spray

−0.10

−0.05

0.00

0.05

−0.8

−0.4

0.0

0.4

0.8

0

1

2

3

−1

0

1

2

3

−5

0

5

10

0

5

10

15

20

0

25

50

75

100

Rac

ial d

iffer

ence

in u

se o

f for

ce(p

er th

ousa

nd e

ncou

nter

s)

●Fryer (2019) Replication Corrected coding

Hispanic civilians, versus white

19

Page 40: Administrative Records Mask Racially Biased Policing · 2020. 8. 29. · this context, and carefully grounding unobservable parameters—in particular, the proportion of racially

B.2 Varying levels of force

20

Page 41: Administrative Records Mask Racially Biased Policing · 2020. 8. 29. · this context, and carefully grounding unobservable parameters—in particular, the proportion of racially

Figure B2: Corrected ATEM=1 and ATTM=1 for encounters with Black and white civilians,varying levels of force. This �gure shows bounded e�ects comparing predicted levels of forcewhen setting suspect race for all observations to black vs. white. These estimates use our cor-rected coding scheme for dependent variables (as described above). Results from regressionswithout covariates appear in the top panels and results from models with a full set of covariatesappear in bottom panels.

●●

●●

●●

●●

●●

●●

At least push to ground At least point weapon Baton or pepper spray

At least push to wall At least handcuffs At least draw weapon

0.0 0.2 0.4 0.6 0.8 0.0 0.2 0.4 0.6 0.8 0.0 0.2 0.4 0.6 0.8

0.0 0.2 0.4 0.6 0.8 0.0 0.2 0.4 0.6 0.8 0.0 0.2 0.4 0.6 0.80

20

40

60

0.0

0.5

1.0

1.5

2.0

050

100150200

0

5

10

15

20

0

100

200

300

0

20

40

Proportion of discriminatory stops

Civ

ilian

s su

bjec

t to

any

raci

ally

disc

rimin

ator

y us

e of

forc

e (t

hous

ands

)

naïve ATEM=1 × #{stopped} ATTM=1 × #{stopped minorities} ATEM=1 × #{stopped}

●●

●●

●●

●●

●●

●●

At least push to ground At least point weapon Baton or pepper spray

At least push to wall At least handcuffs At least draw weapon

0.0 0.2 0.4 0.6 0.8 0.0 0.2 0.4 0.6 0.8 0.0 0.2 0.4 0.6 0.8

0.0 0.2 0.4 0.6 0.8 0.0 0.2 0.4 0.6 0.8 0.0 0.2 0.4 0.6 0.80

20

40

60

0.0

0.5

1.0

1.5

2.0

0

50

100

150

200

0

5

10

15

20

0

100

200

300

01020304050

Proportion of discriminatory stops

Civ

ilian

s su

bjec

t to

any

raci

ally

disc

rimin

ator

y us

e of

forc

e (t

hous

ands

)

naïve ATEM=1 × #{stopped} ATTM=1 × #{stopped minorities} ATEM=1 × #{stopped}

21

Page 42: Administrative Records Mask Racially Biased Policing · 2020. 8. 29. · this context, and carefully grounding unobservable parameters—in particular, the proportion of racially

Figure B3: Corrected ATEM=1 and ATTM=1 for encounters with Hispanic and white civil-ians, varying levels of force. This �gure shows bounded e�ects comparing predicted levels offorce when setting suspect race for all observations to Hispanic vs. white. These estimates use ourcorrected coding scheme for dependent variables (as described above). Results from regressionswithout covariates appear in the top panels and results from models with a full set of covariatesappear in bottom panels.

●●

●●

●●

●●

●●

●●

At least push to ground At least point weapon Baton or pepper spray

At least push to wall At least handcuffs At least draw weapon

0.0 0.2 0.4 0.6 0.8 0.0 0.2 0.4 0.6 0.8 0.0 0.2 0.4 0.6 0.8

0.0 0.2 0.4 0.6 0.8 0.0 0.2 0.4 0.6 0.8 0.0 0.2 0.4 0.6 0.8

0

20

40

−0.5

0.0

0.5

1.0

1.5

050

100150200

0

5

10

15

0

100

200

300

01020304050

Proportion of discriminatory stops

Civ

ilian

s su

bjec

t to

any

raci

ally

disc

rimin

ator

y us

e of

forc

e (t

hous

ands

)

naïve ATEM=1 × #{stopped} ATTM=1 × #{stopped minorities} ATEM=1 × #{stopped}

●●

●●

●●

●●

●●

●●

At least push to ground At least point weapon Baton or pepper spray

At least push to wall At least handcuffs At least draw weapon

0.0 0.2 0.4 0.6 0.8 0.0 0.2 0.4 0.6 0.8 0.0 0.2 0.4 0.6 0.8

0.0 0.2 0.4 0.6 0.8 0.0 0.2 0.4 0.6 0.8 0.0 0.2 0.4 0.6 0.8

01020304050

−0.5

0.0

0.5

1.0

0

50

100

150

200

0

5

10

15

0

100

200

300

0

10

20

30

40

Proportion of discriminatory stops

Civ

ilian

s su

bjec

t to

any

raci

ally

disc

rimin

ator

y us

e of

forc

e (t

hous

ands

)

naïve ATEM=1 × #{stopped} ATTM=1 × #{stopped minorities} ATEM=1 × #{stopped}

22

Page 43: Administrative Records Mask Racially Biased Policing · 2020. 8. 29. · this context, and carefully grounding unobservable parameters—in particular, the proportion of racially

B.3 Excluding drug stops

23

Page 44: Administrative Records Mask Racially Biased Policing · 2020. 8. 29. · this context, and carefully grounding unobservable parameters—in particular, the proportion of racially

Figure B4: Bounds on race e�ect excluding drug stops. This analysis replicates the analysisin Figure 4 in the main text excluding stops that were motivated by suspicion of a drug transac-tion, as such instances may violate the mediator monotonicity assumption. The results remainsubstantively similar.

●●

GF

K (

2007

)

●●

GF

K (

2007

)

Black civilians Hispanic civilians

Baseline specification

0.00 0.25 0.50 0.75 1.00 0.00 0.25 0.50 0.75 1.000

500

1000

Proportion of racially discriminatory stops

Civ

ilian

s su

bjec

t to

raci

ally

disc

rimin

ator

y us

e of

forc

e(t

hous

ands

)

naïve ATEM=1 × #{stopped} ATTM=1 × #{stopped minorities} ATEM=1 × #{stopped}

24

Page 45: Administrative Records Mask Racially Biased Policing · 2020. 8. 29. · this context, and carefully grounding unobservable parameters—in particular, the proportion of racially

B.4 Analysis of two races at a time

25

Page 46: Administrative Records Mask Racially Biased Policing · 2020. 8. 29. · this context, and carefully grounding unobservable parameters—in particular, the proportion of racially

Figure B5: Bounds on race e�ect limiting analysis to two racial groups of suspects. Plotsin the main text estimated bounds using data on multiple racial groups of suspects by predictingcounterfactual values for every observation, regardless of a suspect’s actual race, after modelparameters were estimated. These �gures reproduce the same analysis using only data on thetwo racial groups being compared, and exclude data on suspects who were not black, Hispanicor white entirely.

●●

GF

K (

2007

)

Black civilians

0.00 0.25 0.50 0.75 1.000

300

600

900

Proportion of racially discriminatory stops

Civ

ilian

s su

bjec

t to

raci

ally

disc

rimin

ator

y us

e of

forc

e(t

hous

ands

)

naïve ATEM=1 × #{stopped} ATTM=1 × #{stopped minorities} ATEM=1 × #{stopped}

●●

GF

K (

2007

)

Hispanic civilians

0.00 0.25 0.50 0.75 1.000

200

400

600

Proportion of racially discriminatory stops

Civ

ilian

s su

bjec

t to

raci

ally

disc

rimin

ator

y us

e of

forc

e(t

hous

ands

)

naïve ATEM=1 × #{stopped} ATTM=1 × #{stopped minorities} ATEM=1 × #{stopped}

26

Page 47: Administrative Records Mask Racially Biased Policing · 2020. 8. 29. · this context, and carefully grounding unobservable parameters—in particular, the proportion of racially

References

Ayres, Ian. 2002. “Outcome Tests of Racial Disparities in Police Practices.” Justice Research andPolicy 4(1-2):131–142.

Becker, Gary. 1971. The Economics of Discrimination. University of Chicago Press.

Engel, Robin. 2008. “A Critique of the “Outcome Test” in Racial Pro�ling Research.” Justice Quar-terly 25(1):1–36.

Fryer, Roland G. 2019. “An Empirical Analysis of Racial Di�erences in Police Use of Force.” Journalof Political Economy 127(3):1210–1261.

Goel, Sharad, Justin M. Rao and Ravi Shro�. 2016. “Precinct or Prejudice? Understanding RacialDisparities in New York City’s Stop-And-Frisk Policy.” Annals of Applied Statistics 10(1):365–394.

Horowitz, Joel L. and Charles F. Manski. 2000. “Nonparametric Analysis of Randomized Experi-ments With Missing Covariate and Outcome Data.” Journal of the Americal Statistical Associa-tion 95(449):77–84.

Knowles, J., N. Perisco and P. Todd. 2001. “Racial bias in motor vehicle searches: Theory andevidence.” Journal of Political Economy 109(1):203–229.

Knox, Dean, Teppei Yamamoto, Matthew A. Baum and Adam J. Berinsky. 2019. “Design, Identi�-cation, and Sensitivity Analysis for Patient Preference Trials.” Journal of the American StatisticalAssociation 00(0):1–15.

Ridgeway, Greg and John MacDonald. 2010. Race, Ethnicity, and Policing: New and EssentialReadings. NYU Press chapter Methods for Assessing Racially Biased Policing.

Robins, J.M. and S. Greenland. 1992. “Identi�ability and exchangeability for direct and indirecte�ects.” Epidemiology 3(2):143–155.

Simoiu, Camelia, Sam Corbett-Davies and Sharad Goel. 2017. “The problem of infra-marginalityin outcome tests for discrimination.” The Annals of Applied Statistics 11(3):1193–1216. https://arxiv.org/abs/1706.05678.

27