Spatial Proximity Moderates Genotype Uncertainty in Genetic Tagging Studies

Ben C. Augustine (a), J. Andrew Royle (b), Daniel W. Linden (c), Angela K. Fuller (d)

(a) Atkinson Center for a Sustainable Future and Department of Natural Resources, Cornell University, Ithaca, NY 14843
(b) U.S. Geological Survey, Patuxent Wildlife Research Center, Laurel, MD, 20708
(c) NOAA National Marine Fisheries Service, Gloucester, MA, 01930
(d) U.S. Geological Survey, New York Cooperative Fish and Wildlife Research Unit, Department of Natural Resources, Cornell University, Ithaca, NY 14843

Corresponding email: [email protected]

January 1, 2020

bioRxiv preprint doi: https://doi.org/10.1101/2020.01.01.892463; this version posted January 2, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license.



KEYWORDS: Spatial capture-recapture, partial identity, spatial partial identity, genetic capture-recapture, microsatellite, capture-recapture

Abstract

Accelerating declines of an increasing number of animal populations worldwide necessitate methods to reliably and efficiently estimate demographic parameters such as population density and trajectory. Standard methods for estimating demographic parameters from noninvasive genetic samples are inefficient because lower quality samples cannot be used, and they do not allow for errors in individual identification. We introduce the Genotype Spatial Partial Identity Model (SPIM), which integrates a genetic classification model with a spatial population model to combine both spatial and genetic information, thus reducing genotype uncertainty and increasing the precision of demographic parameter estimates. We apply this model to data from a study of fishers (Pekania pennanti) in which 37% of samples were originally discarded because of uncertainty in individual identity. The Genotype SPIM density estimate using all collected samples was 25% more precise than the original density estimate, and the model identified and corrected 3 errors in the original individual identity assignments. A simulation study demonstrated that our model increased the accuracy and precision of density estimates by 63% and 42%, respectively, using 3 PCRs per genetic sample. Further, the simulations showed that the Genotype SPIM model parameters are identifiable with only one PCR per sample, and that accuracy and precision are relatively insensitive to the number of PCRs for high quality samples. Current genotyping protocols devote the majority of resources to replicating and confirming high quality samples, but when using the Genotype SPIM, genotyping protocols could be more efficient by devoting more resources to low quality samples.

Significance

We present a new statistical framework for the estimation of animal demographic parameters, such as abundance, density, and growth rate, from noninvasive genetic samples (e.g., hair, scat). By integrating a genetic classification model with a spatial population model, we show that accounting for spatial proximity of samples reduces genotype uncertainty and improves parameter estimation. Our method provides a fundamentally different approach to genetic capture-recapture by sharing information between the normally disjunct steps of assigning individual identities to genetic samples and modeling population processes. Further, it leads to more efficient protocols for processing genetic samples, which can lower project costs and expand opportunities for applying noninvasive genetics to conservation and management problems.

Introduction

Species extinction risk is tied to the loss of individual populations, with recent studies demonstrating range contractions of 94-99% in some of the world’s large carnivores (Wolf and Ripple 2017). Accelerating declines of an increasing number of animal populations worldwide necessitate methods to reliably and efficiently estimate demographic parameters such as population size, and vital rates such as survival probability, recruitment rate, and the population trajectory through time. Unfortunately, many species of conservation concern are managed without having the necessary information on population status or trends, which is largely a consequence of the cost and difficulty of studying species in decline and the difficulty of applying statistical models to sparse data, which can produce imprecise and biased estimates of demographic parameters.

Noninvasive genetic monitoring has become an invaluable tool for estimating population parameters and quantifying population status because genetic samples are efficient to collect for a large number of species (Lamb et al. 2019). The DNA contained in noninvasive samples, such as tissue, hair, or scat, can be used to extract microsatellite markers (Taberlet et al. 1996), or more recently, single nucleotide polymorphisms (SNPs; Natesh et al. 2019), which serve as the basis for estimating population genetics or population dynamics parameters. The role of genetic markers in genetic capture-recapture studies is to provide individual identities for the collected samples, which are then used to construct capture histories required by capture-recapture models, or more recently, spatial capture-recapture models (SCR; Borchers and Efford 2008; Royle et al. 2013). Individual identities are constructed by observing the combination of allele values at enough genetic loci that it is improbable that multiple individuals in the collected sample share the same multilocus genotype (the “shadow effect”; Taberlet et al. 1996).

Two key challenges in the application of noninvasive genetic sampling are that genetic markers from noninvasive samples are observed with error and that the markers may provide insufficient power to discriminate between individuals, particularly when not all loci amplify successfully. These challenges are especially problematic when using observed multilocus genotypes to establish individual identities for use in capture-recapture models because errors in assigning individual identities can severely bias parameter estimates (Mills et al. 2000; Creel et al. 2003; Lukacs and Burnham 2005). The most widely applied solution to these problems involves a process of data curation and filtering aimed at identifying only the highest certainty samples and discarding the remainder (Lampa et al. 2013; Sethi et al. 2014).

The genotyping process involves (1) DNA extraction and amplification via polymerase chain reaction (PCR) (Waits and Paetkau 2005), and (2) decision making and analysis by an expert to interpret genetic samples and assign them multilocus genotypes and individual identities. In practice, the resulting identities are regarded as error-free, reorganized into capture histories, and then used in capture-recapture models to estimate animal demographic parameters. In reality, DNA extraction and amplification are prone to errors, especially for noninvasive samples that typically contain a low quantity and quality of DNA (Taberlet et al. 1996; Lampa et al. 2013). Both shadow effects and genotyping errors lead to incorrect assignment of individual identities to samples, and error rates as low as 1-5% introduce strong bias into population parameter estimates using typical capture-recapture models (Mills et al. 2000; Creel et al. 2003; Lukacs and Burnham 2005) that strictly assume that all samples are identified to individual correctly (Otis et al. 1978). In addition, because only a single capture history is produced, based on the “consensus genotype” of each sample, uncertainty inherent to the decision making process of assigning individual identities to samples is not propagated through to the inferences.

The extreme sensitivity of capture-recapture models to even small error rates in individual identity (Mills et al. 2000; Lukacs and Burnham 2005) has largely determined the structure of genotyping protocols used for genetic capture-recapture to date. Genetic markers were established as a reliable tool for capture-recapture analyses by the development of standardized lab protocols that minimized genotyping errors (Taberlet et al. 1996; Paetkau 2003; Waits and Paetkau 2005) and statistical tools that aid in determining the number of required markers and identifying the reliable samples (e.g., Waits et al. 2001; Miller et al. 2002). These tools were originally developed for microsatellite markers, which we focus on here; however, much of our discussion also applies to SNPs or other genetic markers. Genotyping protocols vary across studies, but common features used to reduce errors in assigning individual identities are 1) the use of enough highly variable loci to minimize shadow events, 2) some form of replicate genotyping to identify and limit genotyping errors, and 3) the removal of samples judged to be unreliable. We will briefly describe this “traditional approach” to assigning individual identities to samples; see Lampa et al. (2013) and Sethi et al. (2014) for comprehensive reviews.

First, shadow events are minimized by using a marker set with sufficient discriminatory power. The main statistic used to measure the discriminatory power of a marker set is $P_{ID}$, which quantifies the probability that two randomly selected individuals in the population will have the same genotype by chance, given the number of loci and estimated allele frequencies (Paetkau et al. 1998). In practice, the more conservative $P_{IDsib}$, the probability two randomly selected siblings in the population will have the same genotype by chance (Waits et al. 2001), is typically used. For simplicity, we will refer to both of these statistics as $P_{ID}$. Generally, a low $P_{ID}$ threshold is used to determine how many markers to use, and this threshold varies widely across studies: Lampa et al. (2013) documented studies with $P_{ID}$ thresholds spanning 7 orders of magnitude ($8.2 \times 10^{-4}$ to $2.7 \times 10^{-11}$). One factor partially accounting for this variability is that, in order to limit the absolute number of shadow events, the $P_{ID}$ threshold must scale with the number of individuals captured (Paetkau 2003), a product of the population size and individual capture probability. Unfortunately, the population size, capture probability, and number of individuals captured are quantities to be estimated and are by definition unknown, or imprecisely known, in advance. Given this and other sources of uncertainty, $P_{ID}$ thresholds are typically chosen to be overly conservative, which can lead to the culling of large numbers of samples that do not amplify at all loci or otherwise cannot be scored at all loci due to potential genotyping errors. We refer to these samples as partial genotypes.
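As a rough illustration of how these statistics behave, the sketch below computes per-locus $P_{ID}$ and $P_{IDsib}$ using the standard expressions given in Waits et al. (2001) and multiplies them across loci under an assumption of independent loci. The function names, the allele frequencies, and the pairwise approximation for the expected number of shadow pairs are illustrative assumptions, not values or code from this study.

```python
from itertools import combinations
from math import comb, prod


def pid_locus(freqs):
    """P_ID at one locus for two unrelated individuals."""
    homozygotes = sum(p ** 4 for p in freqs)
    heterozygotes = sum((2 * p * q) ** 2 for p, q in combinations(freqs, 2))
    return homozygotes + heterozygotes


def pid_sib_locus(freqs):
    """P_IDsib at one locus for two full siblings."""
    s2 = sum(p ** 2 for p in freqs)
    s4 = sum(p ** 4 for p in freqs)
    return 0.25 + 0.5 * s2 + 0.5 * s2 ** 2 - 0.25 * s4


def multilocus(per_locus_fn, loci):
    """Multiply per-locus values, assuming loci are independent."""
    return prod(per_locus_fn(freqs) for freqs in loci)


def expected_shadow_pairs(n_captured, pid):
    """Approximate expected number of captured pairs sharing a genotype."""
    return comb(n_captured, 2) * pid


# Hypothetical marker set: 9 loci, each with 5 equally frequent alleles
loci = [[0.2] * 5 for _ in range(9)]
pid = multilocus(pid_locus, loci)
pid_sib = multilocus(pid_sib_locus, loci)
print(pid, pid_sib, expected_shadow_pairs(200, pid_sib))
```

Because the number of pairs grows roughly with the square of the number of captured individuals, a fixed threshold permits more shadow events as more animals are captured, which is one way to see why the threshold has to scale with sample size.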

Genotyping errors are then minimized by using some form of replication of the genotyping process from which consensus genotypes are constructed. Generally, the most rigorous (and expensive) method for generating consensus genotypes from low quality DNA samples is the “multi-tubes” approach (Taberlet et al. 1996; Miller et al. 2002) where the DNA product from each sample is split across multiple subsamples and then amplified and scored independently. The consensus genotype is then determined by comparing the scores across replications and only scoring a locus if the same single-locus genotype is seen a minimum number of times across replicates (Taberlet et al. 1996). This minimum for reaching consensus varies somewhat subjectively across studies and by zygosity (homozygotes vs. heterozygotes; Lampa et al. 2013). Less comprehensive and more efficient protocols for generating consensus genotypes are also in use, where only samples suspected of containing errors are replicated (Paetkau 2003; Schwartz et al. 2006). After consensus genotypes are generated, samples are matched to individuals using one of a number of algorithms (e.g., Creel et al. 2003; Macbeth et al. 2011), while discarding samples whose consensus genotypes are missing too many locus scores due to failed amplification or insufficient genotype confirmation.
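The consensus rule described above can be sketched as follows. This is a minimal illustration, not any particular lab's protocol: the function names and the default thresholds (`min_hom=3`, `min_het=2`) are hypothetical, and the rule that a locus is left unscored when more than one genotype meets its threshold is an assumption made here for simplicity.

```python
from collections import Counter


def consensus_locus(replicate_scores, min_hom=3, min_het=2):
    """Return a consensus single-locus genotype, or None if unconfirmed.

    replicate_scores: list of genotypes, each a tuple of two allele labels,
    with None marking a replicate that failed to amplify.
    """
    observed = [tuple(sorted(g)) for g in replicate_scores if g is not None]
    if not observed:
        return None
    counts = Counter(observed)
    confirmed = []
    for genotype, n in counts.items():
        # Thresholds differ by zygosity (hypothetical values)
        needed = min_hom if genotype[0] == genotype[1] else min_het
        if n >= needed:
            confirmed.append(genotype)
    # Accept only an unambiguous result: exactly one genotype meets its threshold
    return confirmed[0] if len(confirmed) == 1 else None


def consensus_genotype(sample_scores, **thresholds):
    """Apply consensus_locus to each locus of one sample."""
    return [consensus_locus(locus, **thresholds) for locus in sample_scores]


# Hypothetical sample: two loci, three replicate PCRs each
sample = [[(1, 2), (2, 1), (1, 1)], [(3, 3), (3, 3), None]]
print(consensus_genotype(sample))
```

In this example the second locus stays unscored because the homozygote was seen only twice, which shows how such rules turn low quality samples into partial genotypes.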

This typical approach to producing individual identities from replicated genotype scores (the ‘individual identity observation process’) can be conceptualized as a random thinning process (depicted in Figure I) where the true capture history, $Y^{true}$, is split into a capture history of known identity samples, $Y^{ID}$, and a vector of trap-level or a matrix of trap by occasion-level counts, $Y^{unk}$, which is discarded. A possible thinning process for an individual by trap capture history is $y^{ID}_{ij} \sim \text{Binomial}(y^{true}_{ij}, \theta)$, where the $\theta$ parameter determines the probability that a sample can be identified to individual. $\theta$ is then a function of the overall quantity and quality of DNA in the samples, but also of the level of conservativeness used for accepting samples as reliable. For the same set of samples, a more conservative genotyping protocol will lower $\theta$, leading to fewer individual identity errors in $Y^{ID}$ at the cost of discarding more samples. This trade-off cannot be avoided if no errors are allowed in individual identity. One further thing to note is that the individual identity observation process is only partially connected to the ecological and capture processes: the information in the data associated with density, the detection parameters, and the spatial location of samples is not used to assign individual identities to the samples, despite the fact that these information sources contain abundant information about the true genotypes and their individual identities (Augustine et al. 2019). In fact, the spatial location where a sample was collected constitutes a continuous partial identity, observed with error, where the true state for each individual is its center of activity during a survey and the dispersion of samples around an individual’s activity center is a function of its home range size and thus, spatial scale of detection (Augustine et al. 2019). Therefore, the spatial locations where samples are collected are analogous to an additional genetic marker, but to date, this information has not been used as part of the genotyping process.
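The thinning view of the traditional approach can be illustrated with a short simulation. The sketch below assumes a small hypothetical capture history and an arbitrary value of the identification probability; the function name and all numbers are made up for illustration.

```python
import random


def thin_capture_history(y_true, theta, rng):
    """Split true counts into identified (y_id) and unidentified (y_unk) counts.

    y_true: list of per-individual lists of trap counts.
    theta: probability that a sample can be identified to individual.
    """
    n_traps = len(y_true[0])
    y_id = []
    y_unk = [0] * n_traps  # trap-level counts that would be discarded
    for row in y_true:
        id_row = []
        for j, count in enumerate(row):
            # Binomial(count, theta) draw as a sum of Bernoulli trials
            kept = sum(rng.random() < theta for _ in range(count))
            id_row.append(kept)
            y_unk[j] += count - kept
        y_id.append(id_row)
    return y_id, y_unk


rng = random.Random(1)
y_true = [[2, 0, 1], [0, 3, 0], [1, 1, 0]]  # hypothetical 3 individuals x 3 traps
y_id, y_unk = thin_capture_history(y_true, theta=0.6, rng=rng)
print(y_id, y_unk)
```

Note that the discarded counts keep their trap locations but lose their individual labels, which is exactly the spatial information the traditional approach leaves unused.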

All of these limitations can be resolved with an appropriately structured capture-recapture model that allows for errors and uncertainty in individual identity. Recently, capture-recapture models that allow for various types of errors and uncertainty in individual identity have been developed (Lukacs and Burnham 2005; Link et al. 2010; Wright et al. 2009; Augustine et al. 2019; Knapp et al. 2009), though to date, no comprehensive model exists that includes all the features relevant to the genotyping process.

Here, we present the Genotype Spatial Partial Identity Model (SPIM), a single probabilistic framework which removes the process of subjective processing and interpretation of genetic data from the “traditional genotyping process”. Our approach combines an explicit model of genetic and individual classification with a model of spatially-explicit individual encounter histories for making inference about animal population parameters. This model, with clearly articulated probability assumptions about each component of the system, allows for uncertainty to be propagated among the ecological, capture, and genotyping processes, uses all available sources of information, and removes the need for data culling. The net result is increased efficiency in noninvasive genetic capture-recapture studies by making use of all available data and the concomitant improvement in statistical accuracy and precision of estimates of population parameters. Our model accounts for the shadow effect and genotyping errors, both allelic dropout and false alleles. Further, we use the spatial locations where samples were collected to reduce uncertainty in probabilistically assigning their individual identities. We apply this model to a previously analyzed data set of fishers (Pekania pennanti) in New York, USA (Linden et al. 2017), making use of the 37% of samples that were originally discarded due to uncertain individual identity (Figure 1). We then investigate the performance of the model via simulation.

Methods – Model Description

The Genotype SPIM is a 3-level ecological hierarchical model (Royle and Dorazio 2008, see Figure I), with the first two levels being models for the ecological and capture processes of the same general structure as the catSPIM. We use a joint ecological process model, the first component of which describes the number and distribution of individuals across a two-dimensional state space, $\mathcal{S}$, by use of a spatial point process in which realized point locations $\mathbf{S}_{N \times 2}$ represent each individual’s mean location (activity center) during the survey. We assume activity centers are distributed uniformly, $\mathbf{s}_i \sim \text{Uniform}(\mathcal{S})$, $i = 1, \ldots, N$, though other models could be used. The second process model component describes each individual’s multilocus genotype, which we define to be an individual’s true values at $n^{cat}$ genetic loci. Each locus $l$ has $n^{levels}_l$ possible single locus genotypes, $l = 1, \ldots, n^{cat}$, which are enumerated $1, \ldots, n^{levels}_l$ for each $l$. Associated with each locus $l$ are single locus genotype frequencies $\gamma_l$, the probabilities with which each single locus genotype occurs in the population. The multilocus genotypes of all individuals are organized into the $N \times n^{cat}$ matrix $G^{true}$, with rows of $G^{true}$ corresponding to the same individuals as the rows of $\mathbf{S}$. We assume each element of $G^{true}$ is independent of the others: $g^{true}_{il} \sim \text{Categorical}(\gamma_l)$. Together, these models for the activity centers and genotypes provide a spatially-explicit description for the distribution of genotypes across space. Note that this genotype distribution model allows multiple individuals in the population to have the same multilocus genotype (shadow effect), but they will not share a spatial location.
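The ecological process model can be simulated in a few lines. The sketch below assumes a rectangular state space and made-up genotype frequencies; the function name, the state-space limits, and the frequency values are illustrative assumptions rather than quantities from this study.

```python
import random


def simulate_population(n_ind, xlim, ylim, gamma, rng):
    """Draw activity centers and true genotypes for n_ind individuals.

    gamma: one list per locus giving single-locus genotype frequencies
    (each list sums to 1); genotypes are coded 1..n_levels for that locus.
    """
    # s_i ~ Uniform(S) on a rectangular state space
    s = [(rng.uniform(*xlim), rng.uniform(*ylim)) for _ in range(n_ind)]
    # g_il ~ Categorical(gamma_l), independently across individuals and loci
    g_true = [
        [rng.choices(range(1, len(gl) + 1), weights=gl)[0] for gl in gamma]
        for _ in range(n_ind)
    ]
    return s, g_true


rng = random.Random(42)
gamma = [[0.5, 0.3, 0.2], [0.25, 0.25, 0.25, 0.25]]  # hypothetical 2 loci
s, g_true = simulate_population(50, (0, 10), (0, 10), gamma, rng)

# With so few loci, some individuals share a multilocus genotype (shadow effect)
n_unique = len({tuple(g) for g in g_true})
print(n_unique, "unique genotypes among", len(g_true), "individuals")
```

Since only 12 multilocus genotypes are possible in this toy example, 50 individuals must include shadow events, yet every individual still has its own activity center.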

The capture process model also has two components. The first component describes how animals are detected during $K$ capture occasions, conditional on their activity center locations. We assume individuals are detected at specific point locations in the state space (i.e., “traps”), $\mathbf{X}_{J \times 2} = \{\mathbf{x}_j; j = 1, 2, \ldots, J\}$. The capture data are organized in $Y^{true}$, recording the number of counts for each individual at each trap summed across capture occasions. This data structure is given a superscript “true” to distinguish it from the observed capture data, described below; in the presence of genotyping error and/or shadow events, the true encounter histories are latent variables. We assume the individual by trap detection data arise following a detection function, describing the trap by occasion-level detection probability or rate as a function of distance from an individual activity center. We further assume the individual by trap by occasion detection process is Poisson, where the detection rate of individual $i$ at trap $j$ is $\lambda(\mathbf{s}_i, \mathbf{x}_j) = \lambda_0 \exp\left(-\frac{\|\mathbf{s}_i - \mathbf{x}_j\|^2}{2\sigma^2}\right)$, with baseline detection rate $\lambda_0$ and spatial scale $\sigma$, and $y^{true}_{ij} \sim \text{Poisson}(K \lambda(\mathbf{s}_i, \mathbf{x}_j))$. The observed capture data are defined to be $Y^{obs}$, an $n^{obs} \times J$ matrix, where $n^{obs}$ is the sum of all observed counts. Each row of $Y^{obs}$ corresponds to one count member (e.g., a count of 3 has 3 count members that cannot be deterministically linked when individual identity is unknown), with all zero entries, except for a single 1 indicating the trap of capture.
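A minimal simulation of this capture process is sketched below, including the step that breaks counts into one row per sample. The function names, trap layout, and parameter values are hypothetical, and the Poisson sampler is a simple textbook method chosen so the sketch needs only the standard library.

```python
import math
import random


def detection_rate(s, x, lam0, sigma):
    """Half-normal rate: lam0 * exp(-||s - x||^2 / (2 sigma^2))."""
    d2 = (s[0] - x[0]) ** 2 + (s[1] - x[1]) ** 2
    return lam0 * math.exp(-d2 / (2 * sigma ** 2))


def rpois(mu, rng):
    """Poisson draw via Knuth's multiplication method (fine for small mu)."""
    limit, k, p = math.exp(-mu), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def simulate_captures(centers, traps, lam0, sigma, K, rng):
    """y_true[i][j] ~ Poisson(K * lambda(s_i, x_j))."""
    return [
        [rpois(K * detection_rate(s, x, lam0, sigma), rng) for x in traps]
        for s in centers
    ]


def disaggregate(y_true):
    """Expand counts into one row per sample: a single 1 at the trap of capture.

    The individual label is returned separately because it is latent in the model.
    """
    y_obs, true_id = [], []
    n_traps = len(y_true[0])
    for i, row in enumerate(y_true):
        for j, count in enumerate(row):
            for _ in range(count):
                y_obs.append([1 if t == j else 0 for t in range(n_traps)])
                true_id.append(i)
    return y_obs, true_id


rng = random.Random(7)
traps = [(x, y) for x in range(3, 8) for y in range(3, 8)]  # hypothetical 5 x 5 grid
centers = [(rng.uniform(0, 10), rng.uniform(0, 10)) for _ in range(30)]
y_true = simulate_captures(centers, traps, lam0=0.2, sigma=0.8, K=3, rng=rng)
y_obs, true_id = disaggregate(y_true)
print(len(y_obs), "samples from", len(set(true_id)), "individuals")
```

Keeping `true_id` separate from `y_obs` mirrors the model structure: the rows of the observed data carry only a trap location, and the link back to an individual has to be inferred.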

The second component of the capture process model describes how the genetic loci are observed or “captured” conditional on the true multilocus genotypes. In a model without observation error (e.g., catSPIM), the observed loci reflect their true values or are recorded as missing data.

We now allow for the possibility that locus-level genotypes are not always observed correctly by using an explicit genotype observation process (see Figure I). Let $n^{rep}$ be the maximum number of times each of the $n^{obs}$ samples is observed (e.g., number of replicate PCRs), with the observed scores recorded in the $n^{obs} \times n^{cat} \times n^{rep}$ array $G^{obs}$ for the loci that amplify and recorded as missing data for loci that do not amplify or for replicate numbers not reached by samples with fewer than the maximum number of replicates. We allow for 3 observation events: correct observation, allelic dropout (heterozygote observed as homozygote), and false allele (any other error). We use a simple model for these genotype observation events that assumes that each possible allelic dropout event is equally likely (previously assumed by Wright et al. 2009; Sethi et al. 2016), each possible false allele event is equally likely (previously assumed by Sethi et al. 2016), and the allelic dropout and false allele probabilities do not vary across sample (relaxed below), locus, individual, or replicate number. We define $p^{hom}$ and $p^{het}$ to be vectors of the observation probabilities for homozygous and heterozygous locus-level genotypes, respectively. Then, $p^{hom} = (p^{hom}_C, p^{hom}_{FA})$ for homozygous correct and false allele observation and $p^{het} = (p^{het}_C, p^{het}_{AD}, p^{het}_{FA})$ for heterozygous correct, allelic dropout, and false allele observation. These observation probabilities then make up the elements of $\pi_l$, the observation probability matrix for locus $l$, conditional on the true locus-level genotypes. Details of how these matrices are constructed along with other technical details of the model can be found in Appendix A.
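One way such an observation probability matrix could be assembled is sketched below. The actual construction is given in Appendix A, which is not reproduced here, so this is only one reading of the "equally likely" assumptions: dropout probability is split evenly between the two possible homozygotes, and false allele probability is split evenly over all remaining genotypes. The function name and the probability values are hypothetical.

```python
from itertools import combinations_with_replacement


def observation_matrix(n_alleles, p_hom, p_het):
    """Rows are true genotypes, columns are observed genotypes.

    p_hom = (correct, false allele); p_het = (correct, dropout, false allele).
    """
    if n_alleles < 3:
        raise ValueError("this sketch needs at least 3 alleles")
    genotypes = list(combinations_with_replacement(range(1, n_alleles + 1), 2))
    n = len(genotypes)
    matrix = []
    for a, b in genotypes:
        row = []
        for obs in genotypes:
            if a == b:  # true homozygote
                prob = p_hom[0] if obs == (a, b) else p_hom[1] / (n - 1)
            elif obs == (a, b):  # heterozygote scored correctly
                prob = p_het[0]
            elif obs in ((a, a), (b, b)):  # one allele dropped out
                prob = p_het[1] / 2
            else:  # any other score is a false allele event
                prob = p_het[2] / (n - 3)
            row.append(prob)
        matrix.append(row)
    return genotypes, matrix


# Hypothetical error probabilities for a locus with 3 alleles
genotypes, pi_l = observation_matrix(3, p_hom=(0.95, 0.05), p_het=(0.80, 0.15, 0.05))
for g, row in zip(genotypes, pi_l):
    print(g, [round(v, 4) for v in row])
```

Each row sums to 1, so a row can be used directly as the categorical distribution of the observed score given the true genotype at that locus.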

Methods – Fisher Application

We applied the Genotype SPIM to a hair snare data set from fishers surveyed in New York, USA, collected in 2014. The full details of this study can be found in Linden et al. (2017). Four hundred and twenty hair samples were collected at 608 traps, each operated for three 1-week occasions. Each sample was amplified 1 to 7 times across 9 microsatellite loci. In the original study, 263 samples were assigned individual identities using the methods of Creel et al. (2003) with a $P_{IDsib}$ criterion of 0.005, though some additional matches were made using the methods of Macbeth et al. (2011). The 263 samples were assigned to 189 distinct individuals. One hundred fifty-seven samples (37%) were discarded because they produced partial genotypes not informative enough to be confidently assigned individual identities (105 samples) or they did not amplify at all (52 samples). Among the individually identified samples in the original study, there were 74 recaptures from 50 individuals and 9 spatial recaptures across 8 individuals. The low rate of spatial recaptures was largely due to a survey that was primarily designed for an occupancy analysis aiming for independent trap sites.
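The sample counts reported above can be cross-checked with simple arithmetic. In this sketch the variable names are ours, and the reading that a recapture is any identified sample beyond an individual's first is an assumption.

```python
n_collected = 420
n_identified = 263
n_discarded_partial = 105
n_discarded_failed = 52
n_individuals = 189

n_discarded = n_discarded_partial + n_discarded_failed
pct_discarded = 100 * n_discarded / n_collected
# Assumption: a "recapture" is any identified sample beyond an individual's first
n_recaptures = n_identified - n_individuals

print(n_discarded, round(pct_discarded), n_recaptures)
```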

We applied the Genotype SPIM to all 420 collected samples and compared it to the estimates from the regular SCR analysis using the 263 samples originally assigned certain individual identities using traditional methods. For both the Genotype SPIM and SCR analyses, we used the individual heterogeneity model for the detection function parameters described in Appendix A, because the distribution of spatial recapture numbers and distances suggested strong individual heterogeneity in space use. We allowed genotyping error rates to vary by two sample quality categories as described in Appendix A. We defined “high quality” samples to be those that amplified at an average of ≥8 of 9 loci across the first 3 replication attempts and “low quality” samples as those amplifying at an average of <8 loci across the same 3 replication attempts. This criterion is somewhat arbitrary, but it is more realistic than assuming all samples have the same genotyping error probabilities and is consistent with the fact that samples with less DNA product have both higher genotyping error rates and lower locus-level amplification rates (Taberlet et al. 1996; McKelvey and Schwartz 2004). To compare the precision of selected parameter estimates between the Genotype SPIM and SCR analyses, we used the coefficient of variation (CV), defined as the posterior standard deviation divided by the posterior mode. See Appendix B for the MCMC specifications for this analysis.
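The sample quality rule and the CV definition can be written as two small helpers. This is a sketch under stated assumptions: samples with fewer than 3 replicates are averaged over the replicates they have, the posterior mode is taken as the most frequent value of integer-valued MCMC draws (a continuous parameter would need a density estimate instead), and all names and example numbers are hypothetical.

```python
from collections import Counter
from statistics import mean, stdev


def quality_class(loci_amplified, n_first=3, cutoff=8):
    """Label a sample by mean number of loci amplified in its first replicates."""
    first = loci_amplified[:n_first]
    return "high" if mean(first) >= cutoff else "low"


def posterior_cv(draws):
    """Posterior SD divided by posterior mode, for integer-valued draws."""
    mode = Counter(draws).most_common(1)[0][0]
    return stdev(draws) / mode


# Hypothetical inputs: loci amplified per replicate, and abundance draws
print(quality_class([9, 8, 7]), quality_class([9, 7, 7]))
print(posterior_cv([10, 10, 12, 8]))
```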

Methods – Simulation Study

We conducted a small simulation study motivated by the fisher analysis, with its large proportion of originally discarded samples, to demonstrate the performance of the Genotype SPIM in general (e.g., bias and coverage) and to compare 1) the regular SCR estimate using only high quality samples to the Genotype SPIM estimates using all samples, 2) the Genotype SPIM estimates with 1-3 PCRs per sample, and 3) the Genotype SPIM estimates using or discarding the low quality samples. The simulation study was designed to replicate the design that produced the fisher data set with a few caveats. Specifically, we did not consider the individual heterogeneity model for the detection function parameters to reduce computation time, we used a smaller trapping array with traps spaced optimally for SCR to better reflect the typical resources available for survey effort, and we considered a higher population density so that the scenario is more challenging for uncertain identity methods (more home range overlap; Augustine et al. 2019). See Appendix C for the full simulation study specifications.

Results

Results – Fisher Application

The Genotype SPIM analysis produced abundance and density estimates that were 25% more pre-

cise than the SCR estimate as judged by the coefficient of variation (Table 1). The increased

precision was largely driven by an increase in individual detectability and more precise spatial

scale parameter estimates when including the 157 samples originally discarded. The overall detec-

tion parameter (a0) point estimate increased 44% from 3.28 to 4.73 and the number of individuals

detected was estimated at 272, a 45% increase over the 187 detected individuals in the original,

curated data set. The Genotype SPIM abundance point estimate was 15% higher than the SCR es-

timates; however, given the level of uncertainty in both estimates, there is no indication that these

two estimators would differ on average in their point estimates. The certain identity assignments

made by the Genotype SPIM (posterior match probability of 1) corresponded to those made in the

original study except for 5 cases described in Supplement 1. Two of these 5 cases were examples

where the Genotype SPIM assignment implied that the geneticist was too confident in probably

correct assignments (assigned a match when the Genotype SPIM estimated match probability was

<1, but >0.75), but in 2 cases, the Genotype SPIM assignments implied the geneticist assigned

different individual identities to samples that were actually from the same individual with probability

1 and 0.92, and in one case, the Genotype SPIM assignment implied the geneticist incorrectly

assigned 2 samples to the same individual with probability 1. Spatially-explicit depictions of the

posterior identity matches can be visualized for every sample using code provided in the Data

Supplement, with a subset of these matches illustrated in Supplement 1.

The detection function spatial scale point estimates, $\sigma$ and $\sigma_{sd}$, were roughly similar between

the Genotype SPIM and SCR models (Table 1, Figure 2), but these parameter estimates were

more precise for the Genotype SPIM. This gain in precision is likely due to the greater number

of spatial recaptures (individuals captured in more than 1 location), especially high probability

spatial recaptures, contained in the partial genotype samples. The 105 partial genotype samples

effectively doubled the number of certain spatial recaptures as seen through the posterior for the

number of spatial recaptures (Figure 2), which takes a minimum value of 14, compared to the 9

used in the original data set and SCR analysis. The posterior mode for spatial recaptures (Figure 2)

was 24, with a 95% HPD interval of (18 - 31), indicating a high probability that there were more

than 2 times as many spatial recaptures as were included in the original data set. Among the 157

samples originally discarded, the Genotype SPIM matched 6 to another sample with probability

greater than 0.99, 17 with probability greater than 0.9, and 30 with probability greater than 0.75.

For these same discarded samples, the Genotype SPIM assigned 11 samples to unique individuals

(individuals with 1 capture event) with a probability greater than 0.99, 31 with a probability greater

than 0.9, and 49 with a probability greater than 0.75.

The genotype observation probabilities differed between sample types (high vs. low quality;

Table 2). Because allelic dropout can only occur for heterozygous genotypes and false allele rates

were very low, homozygous single-locus genotypes were estimated to be almost always scored

correctly for both sample types (>0.994). The major difference in reliability between sample

types was the probability of an allelic dropout observation, which was roughly 2.7 times as

likely for low quality samples as for high quality samples (0.496 vs. 0.185). Despite the general unreliability of the low quality

samples, the overall improvement in the precision of the abundance estimate stemming from their

use demonstrates that they still contain substantial information about the population parameters of

interest. The single-locus genotype frequency estimates can be found in Supplement 1.
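
The per-replicate observation probabilities discussed above can be tabulated in a few lines. The following is only a sketch built from the rounded values quoted in the Results and Discussion (Table 2 holds the actual estimates); the dictionary and function names are hypothetical, and it assumes that correct, allelic dropout, and false allele outcomes are mutually exclusive and exhaustive for an amplified locus.

```python
# Per-locus, per-replicate genotyping error probabilities as quoted in the
# text (rounded; Table 2 of the paper holds the actual estimates).
P_DROPOUT = {"high": 0.185, "low": 0.496}       # heterozygotes only
P_FALSE_HET = {"high": 0.009, "low": 0.015}     # false allele, true heterozygote
P_FALSE_HOM = {"high": 0.006, "low": 0.001}     # false allele, true homozygote


def observation_probs(quality):
    """Return (P(correct), P(allelic dropout), P(false allele)) by zygosity.

    Assumption for illustration: the three outcomes are mutually exclusive
    and exhaustive, and homozygotes cannot show allelic dropout.
    """
    het_correct = 1.0 - P_DROPOUT[quality] - P_FALSE_HET[quality]
    hom_correct = 1.0 - P_FALSE_HOM[quality]
    return {
        "het": (het_correct, P_DROPOUT[quality], P_FALSE_HET[quality]),
        "hom": (hom_correct, 0.0, P_FALSE_HOM[quality]),
    }


probs = {q: observation_probs(q) for q in ("high", "low")}
dropout_ratio = P_DROPOUT["low"] / P_DROPOUT["high"]

for q in ("high", "low"):
    het = [round(p, 3) for p in probs[q]["het"]]
    hom = [round(p, 3) for p in probs[q]["hom"]]
    print(q, "het:", het, "hom:", hom)
print("dropout ratio (low / high):", round(dropout_ratio, 2))
```

Under these assumptions, a heterozygous locus in a low quality sample is scored correctly in slightly less than half of the replicates that amplify, whereas homozygous loci are scored correctly almost always for both sample types.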

Results – Simulation Study

The Genotype SPIM abundance estimates were approximately unbiased (≈-1%) with near nominal

coverage (Table 3) except when using only 1 genotype assignment (e.g., PCR). In this case, bias

was -3.6% when including the low quality samples and -2.1% when only using the high quality

samples. The 95% coverage for abundance was only less than nominal in the scenario including

the low quality samples with only 1 genotype assignment, where it was 0.91. The Genotype SPIM

including the low quality samples (37% of total samples) with 3 replicated assignments was 42%

more precise, as judged by the mean 95% CI width, and 63% more accurate, as judged by the

mean squared error, than the SCR estimator that did not use the low quality samples. With only 2

replicated assignments, the Genotype SPIM was 41% more precise and 59% more accurate than

the SCR estimator. With just 1 assignment, the Genotype SPIM was 35% more precise and 39%

more accurate than the SCR estimator. By including the low quality samples, 27.4 more individuals

were captured, on average, representing an increase of 31% over the number captured in the high

quality samples alone. The uncertainty in the number of individuals captured, ncap, came almost

entirely from the low quality samples except when using only 1 genotype assignment. The mean

95% CI width for ncap when not using low quality samples was effectively 0 when using 2 and 3

replicated assignments (Scenarios SPIM2B and SPIM3B) and the posterior modes of ncap matched

the true value exactly 98% and 99% of the time, respectively. Thus, the individual identities of all

high quality samples were assigned correctly with probability 1 nearly 100% of the time, except

when there was only 1 genotype assignment, where nearly all individuals were assigned correctly

with probability 1, on average.
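
For readers who want to reproduce this kind of summary, the performance metrics named above (relative bias, coverage, mean 95% CI width, and mean squared error) might be computed from simulation replicates roughly as follows. This is a sketch, not the authors' code: the function names and the toy replicate values are hypothetical, and expressing "X% more precise" as the percent reduction in mean CI width relative to the SCR estimator is an assumption about how such percentages are formed.

```python
from statistics import mean


def summarize(estimates, lowers, uppers, truth):
    """Summarize simulation replicates for one estimator of a known truth."""
    covered = [1.0 if lo <= truth <= hi else 0.0 for lo, hi in zip(lowers, uppers)]
    return {
        "rel_bias_pct": 100.0 * (mean(estimates) - truth) / truth,
        "coverage": mean(covered),
        "mean_ci_width": mean([hi - lo for lo, hi in zip(lowers, uppers)]),
        "mse": mean([(e - truth) ** 2 for e in estimates]),
    }


def pct_improvement(reference, candidate):
    """Percent reduction of a smaller-is-better metric relative to a reference."""
    return 100.0 * (reference - candidate) / reference


# Made-up replicates, for illustration only (not results from the paper).
toy = summarize(
    estimates=[98, 101, 104, 97],
    lowers=[80, 85, 101, 75],
    uppers=[120, 125, 130, 115],
    truth=100,
)
print(toy)
```

Applying `pct_improvement` to the `mean_ci_width` or `mse` entries of two such summaries (reference estimator first) would give a comparison of the kind reported in Table 3.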

The genotype observation probability estimates (correct assignment, allelic dropout, false al-

lele) for heterozygous and homozygous true genotypes were approximately unbiased when using

more than 1 genotype assignment, except when low quality samples were included, where they

were approximately unbiased when using 3 replicated assignments (Table 4). In scenarios with

bias, the allelic dropout probability ($p^{het}_{AD}$) and false allele probability ($p^{het}_{FA}$ and $p^{hom}_{FA}$) estimates

were positively biased, with a corresponding negative bias in the correct observation probability

estimates ($p^{het}_{C}$ and $p^{hom}_{C}$). There was more bias in the genotyping error probabilities for the low

quality samples; however, there was less overall bias in the genotyping error probabilities for the

high quality samples when including the low quality samples and using only 1 genotype assignment

(Scenarios SPIM1A vs. SPIM1B), likely due to the overall greater precision in the detection

and abundance parameters when including these samples. The estimates of $p^{het}$ and $p^{hom}$ for

high quality samples with 1 genotype assignment were 1-6% more precise (depending on the

parameter), as judged by the posterior standard deviation, when including the low quality samples

compared to when they were excluded. There was also some precision gain in the estimates of $p^{het}$

and $p^{hom}$ from including the low quality samples with 2 replicated assignments, but it was less

pronounced (1-2% precision gain). Precision gains in the estimates of $p^{het}$ and $p^{hom}$ from including the

low quality samples with 3 replicated assignments were negligible (<1%).

The simulation results for ncap, the number of individuals captured, indicate some subopti-

mal performance of the Genotype SPIM when including the low quality samples (Table 3). In

these scenarios with low quality samples, the ncap estimates are slightly negatively biased, with

these effects increasing as the number of replicated assignments decreases. With only 1 genotype

assignment, 95% coverage of ncap was less than nominal (0.91). Inspection of these simulation

results revealed that low quality samples for individuals only captured once were rarely assigned

incorrectly to neighboring individuals that had similar genotypes at the loci observed in the low

quality samples. In cases where this happened, the low quality sample was incorrectly scored at

one or more loci in all replicated assignments where a score was made (e.g., 1 allelic dropout score

and 2 failed amplifications or 2 allelic dropout scores and 1 failed amplification). The observed

bias in ncap increased with a decreasing number of replicated assignments because the expected

number of loci with no score and the expected number of loci where the correct genotype was not

included in any replicated assignment both increase with fewer replicated assignments. This source of

bias in ncap also likely explains the bias seen in the genotype observation probability estimates for

low quality samples with 1 PCR.
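
The dependence on the number of replicates can be made concrete with a small calculation. The sketch below assumes that replicates are independent, that each replicate amplifies with probability `P_AMP` (a hypothetical value, not taken from the paper), and that an amplified heterozygous locus in a low quality sample is scored correctly with probability 0.489 (1 - 0.496 - 0.015, from the rounded error rates quoted in the text). The function name is hypothetical.

```python
def miss_probabilities(k, p_amp, p_correct):
    """Probabilities for one locus over k independent replicates.

    Returns (P(no replicate yields a score),
             P(no replicate yields the correct genotype)),
    where the second event includes the first.
    """
    no_score = (1.0 - p_amp) ** k
    never_correct = (1.0 - p_amp * p_correct) ** k
    return no_score, never_correct


P_AMP = 0.5        # hypothetical amplification probability, not from the paper
P_CORRECT = 0.489  # 1 - 0.496 - 0.015: low quality heterozygote, from the text

table = {k: miss_probabilities(k, P_AMP, P_CORRECT) for k in (1, 2, 3)}
for k, (no_score, never_correct) in table.items():
    print(k, round(no_score, 3), round(never_correct, 3))
```

Both probabilities shrink geometrically as replicates are added, so under these assumptions the expected number of loci that are unscored, or never scored correctly, is largest with a single replicate.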

Discussion

We developed the Genotype SPIM, a unified probabilistic framework for the processes of deter-

mining the true genotypes of samples, matching samples to individuals, and estimating population

parameters using spatial capture-recapture. The Genotype SPIM recognizes that uncertainty in

the genetic classification and population estimation processes is sequentially connected and prop-

agates this uncertainty from one process to another, using a hierarchical model. The Genotype

SPIM allows for a fundamental shift in the use of genetic data in capture-recapture, eliminating

the need for decision rules that determine the (minimized) expected level of error in the individual

identity assignments and that do not propagate identity error probabilities to the population param-

eter estimates. Perhaps most importantly, the Genotype SPIM eliminates the need for data culling,

which can be extreme in many non-invasive data sets where DNA quality is typically poor, and

leverages the additional information to increase the precision and accuracy of population parame-

ter estimates. This is especially important in conservation applications of many species that are of

concern largely because of their extremely low population sizes.

Unlike all existing attempts to address genotype uncertainty in capture-recapture, the Geno-

type SPIM recognizes that ecological systems are spatially explicit and it uses the spatial location

where genetic samples were collected to reduce genotype uncertainty. Two key features of our

model that allow it to exploit the spatial information of genetic samples are a spatially explicit

model for the number and distribution of individuals and genotypes across the study area and

a spatially-constrained model for individual detection (see Figure I). This genotype-augmented

ecological and capture model provides the scaffolding that allows for the shadow effect to be effi-

ciently resolved (disallowed in the most similar nonspatial model of Wright et al. 2009) and which

formally links the ecological concepts of population density and home range size to the uncer-

tainty in assigning samples to individuals (Augustine et al. 2019). The Genotype SPIM recognizes

that samples collected closer together in space are more likely to come from the same individual,

so each sample carries information about the true genotypes of its neighboring samples and the

genotyping errors that likely did or did not occur in these neighboring samples. This contrasts with

previous approaches for matching samples to individuals in the presence of genotyping errors that

assume the samples are independent of one another (e.g., Kalinowski et al. 2006; Macbeth et al.

2011; Sethi et al. 2016). The spatial locations where samples were collected reduce the uncertainty

in each captured individual’s estimated genotype, improve the estimation of genotype frequencies

and genotyping error rates, and improve the probabilistic assignment of samples to individuals.

The net result is improved estimates of population parameters. Posterior states for the probabilis-

tic identity assignments can be visualized for every sample, providing an understanding of how

the ecological, capture, and genotype observation models combine to produce the probabilistic

assignments (see examples in Supplement 1).
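
A toy calculation may help show how spatial location and genotype information combine. The sketch below is a deliberate simplification and not the Genotype SPIM itself: it treats the activity centers of two candidate individuals as known (in the model they are latent and estimated jointly with everything else), assumes a half-normal detection function with hypothetical parameters `lam0` and `sigma`, and takes the genotype likelihood of the sample under each candidate as a given number. All names are hypothetical.

```python
import math


def half_normal(d, lam0, sigma):
    """Expected detection rate at distance d from an activity center."""
    return lam0 * math.exp(-d * d / (2.0 * sigma * sigma))


def match_probs(sample_xy, centers, geno_lik, lam0=1.0, sigma=1.0):
    """Normalize (spatial weight x genotype likelihood) over candidates."""
    weights = {}
    for ind, (cx, cy) in centers.items():
        d = math.hypot(sample_xy[0] - cx, sample_xy[1] - cy)
        weights[ind] = half_normal(d, lam0, sigma) * geno_lik[ind]
    total = sum(weights.values())
    return {ind: w / total for ind, w in weights.items()}


centers = {"A": (0.0, 0.0), "B": (3.0, 0.0)}   # hypothetical activity centers
sample = (0.5, 0.0)                            # where the sample was collected

# Genotype alone cannot separate the candidates; location does most of the work.
spatial_only = match_probs(sample, centers, {"A": 0.5, "B": 0.5})
# Genotype favors B, but proximity to A still pulls the assignment toward A.
combined = match_probs(sample, centers, {"A": 0.1, "B": 0.9})
print(spatial_only, combined)
```

In this toy setting a sample collected close to one candidate is assigned to it with high probability even when the genotype is uninformative, which is the intuition behind low quality samples still contributing to inference.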

We believe it is this spatial structure of the Genotype SPIM that allows for the high probability

identity assignments for many of the “low quality” samples in the fisher application (see

Supplement 1), for the ability of the low quality samples to substantially improve inference in the

simulation study, and for the identifiability of the model parameters with only 1 genotype assignment

in the simulation study and supplementary application (see Appendix D). Unlike the most

similar nonspatial model of Wright et al. (2009), the Genotype SPIM is identifiable with no geno-

type information at all, at least when the level of uncertainty in individual identity is not too high

(Augustine et al. 2019), as it reduces to the model of Chandler and Royle (2013) for unmarked

SCR. Augustine et al. (2019) showed that introducing categorical identity covariates (e.g., geno-

types) known with certainty can greatly improve estimation over unmarked SCR and not much

categorical identity information is required to achieve certainty in individual identification when

individual space use is restricted relative to the complete extent of the population under study. The

Genotype SPIM builds on the model of Augustine et al. (2019) to allow for the genotypes to be

observed with error. Assuming that false allele rates will always be very low, the homozygous

single-locus genotypes will be recorded with near certainty, leaving just the heterozygous single-

locus genotypes for which there is nonnegligible uncertainty. Thus, the Genotype SPIM will match

many samples with high certainty that would not pass typical PID thresholds and it should match

samples with higher certainty than the nonspatial model of Wright et al. (2009) in many scenar-

ios. No simulation study of the Wright et al. (2009) model has been published, but we expect

low quality samples to be less useful in that model without their associated spatial information

and we expect that model to rely more heavily on the assumptions of the genotype distribution

and error models than the Genotype SPIM, which does not rely on these submodels for parameter

identifiability.

An alternative SPIM that allows all samples to be used without modeling the genotyping error

process is hinted at in Figure I in the individual identity observation process under the “typical

approach” to applying SCR to genetic samples. We conceptualized the process of assigning indi-

vidual identities to samples as a “random thinning” process (following others, see below), where

samples lose their individual identities at random with probability 1 - θ. This produces two data

sets–one with individual identities, and one without. The data set with no individual identities

is typically discarded, which can be extremely detrimental in conservation decision making for

species that are most information poor and of high conservation concern. Alternatively, the un-

known identity samples could be used to probabilistically reconstruct the full data set without

consideration for their replicated genotype scores. This “random thinning” model has previously

been developed (Richard Chandler, pers. comm.), but not published, and is identical to the random

thinning process specified for marked individuals in spatial mark-resight by Jimenez et al. (2019).

While not requiring the replicated assignment data nor assumptions about the genotype distribu-

tions or the genotyping observation process, this approach has two disadvantages. First, it does not

incorporate any uncertainty in the individual identity assignments for the samples assigned individ-

ual identities–this data set may still contain errors as we found in the fisher application. Second, the

information contained in the replicated assignments increases the probability with which samples

can be assigned to individuals. Discarding this information can significantly reduce the informa-

tion about individual identity–the only information remaining is the spatial location. Thus, the

value of using the unidentified individual samples will decrease much more rapidly as population

density increases, compared to the full information samples.
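
The random thinning process described above is simple to simulate. The following sketch assumes each sample independently keeps its individual identity with probability θ (`theta`) and otherwise retains only its trap location; the function name, the sample tuples, and the value of `theta` are all hypothetical.

```python
import random


def random_thinning(samples, theta, seed=0):
    """Split samples into identified and unidentified data sets.

    Each sample is an (individual_id, trap_id) pair and keeps its identity
    with probability theta; otherwise only the trap location is retained.
    """
    rng = random.Random(seed)
    identified, unidentified = [], []
    for ind_id, trap_id in samples:
        if rng.random() < theta:
            identified.append((ind_id, trap_id))
        else:
            unidentified.append((None, trap_id))
    return identified, unidentified


# Hypothetical capture records: (individual, trap).
samples = [(1, "t1"), (1, "t2"), (2, "t2"), (3, "t5"), (3, "t5"), (4, "t7")]
identified, unidentified = random_thinning(samples, theta=0.6)
print(len(identified), "identified;", len(unidentified), "unidentified")
```

The unidentified data set produced here carries only spatial information, which is why its value is expected to fall off quickly as population density increases.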

While the genotyping error rates are effectively nuisance parameters whose estimation is re-

quired for correct inference about the population parameters of interest, the Genotype SPIM may

provide a more statistically powerful framework for exploring the mechanisms of genotyping error

in low quality DNA samples and quantifying their relative influence absent known identity, ref-

erence genotypes. As expected, we estimated that false allele events were very rare, with a per

locus per replicate error probability of approximately 0.01 (Table 2). Allelic dropout events were

estimated to be much more common, with probabilities of 0.19 and 0.50, for high and low quality

samples, respectively. Interestingly, we estimated that the false allele probability varied by zygos-

ity, with false alleles being more common for true heterozygous single-locus genotypes than true

homozygotes (Table 2). This difference was more pronounced for low quality samples (0.015 vs.

0.001) than for high quality samples (0.009 vs. 0.006). A possible explanation for this result is

that a major source of false alleles is allele-calling errors which disproportionately occur for true

heterozygotes (Johnson and Haydon 2007)–a false allele event is more likely when reading 2 allele

peaks for a heterozygote than when reading the single peak for a homozygote. One apparently

anomalous result is that the false allele probability for homozygous loci from low quality samples

is lower than that for high quality samples. A possible explanation for this result is that the ma-

jority of homozygous loci for low quality samples that would have produced a false allele if they

amplified failed to amplify. PCR artifacts are a source of false alleles for homozygotes (Johnson

and Haydon 2007), and the mechanisms leading to PCR artifacts may lead instead to failed amplifi-

cation more often in the lower quality samples. The genotyping error rates estimated by our model

are for the loci that amplify only. If there is a correlation between any of the genotyping error rates

and the amplification rates, the most error prone samples/loci will drop out disproportionately and

the genotyping error rates estimated for different sample qualities must be interpreted accordingly.

This estimation of error rates from censored data is not a problem for estimating the population

parameters in the presence of genotyping error, as only the error rates of the amplified samples/loci

are relevant.

We believe the Genotype SPIM will be relatively robust to misspecification of the genotyping

error model, especially in lower density populations, because there may only be a few possible

individuals with activity centers near any particular focal sample, and even fewer with similar

genotypes. This property should be investigated via simulation and through application to existing

data sets. Genotyping error rates may vary as a function of error type (allelic dropout vs. false

allele), zygosity (heterozygote vs. homozygote), sample quality, loci, replicate number, or indi-

vidual. Of these factors, we expect the largest differences to be between error types and sample

quality, both of which we accommodated in the Genotype SPIM. Error rates could be generalized

to vary by loci and replicate number, though we don’t see a plausible reason why they would vary

by individual. Paetkau (2003) identified that faulty lab procedures can lead to a lack of indepen-

dence between samples, and that sample quality can lead to a lack of independence across markers

for the same sample. The former source of dependence could be accommodated with replicate

covariates and the latter with sample type covariates. Instead of dividing samples into “low” and

“high” quality categories, one could use sample covariates that correlate with the amount and/or

quality of the DNA in each sample, if they are available. In the absence of these covariates, crude

sample type categories based on the amplification rates across replicated assignments should im-

prove inference.

Thousands of genetic capture-recapture data sets exist to which the Genotype SPIM could be

applied and the probabilistic individual identity assignments it makes can be compared with those

made using typical methods to identify possible inadequacies with the Genotype SPIM model

structure for any particular application. For the fisher application, the Genotype SPIM made all

the same identity assignments as the geneticist did for the samples originally assigned certain in-

dividual identities, except for 3 cases where it appears the Genotype SPIM identified erroneous

assignments made by the geneticist (see Supplement 1). This near perfect agreement in individual

identity assignments for the subset of samples assigned identities by the geneticist supports the

adequacy of the genotyping error model we used, though we cannot independently assess the per-

formance of the model for the lowest quality samples that were not originally assigned individual

identities.

Violations of the genotype distribution model are also possible (see Augustine et al. 2019). We

will discuss one way in which the possibility of genotyping error may exacerbate an assumption

violation for the genotype distribution over the model without genotyping error. We assume, as

is typical in methods for assigning individual identities to samples (e.g., Wright et al. 2009; Kalinowski

et al. 2006; Macbeth et al. 2011), that genotypes are distributed independently across

individuals, though spatial genetic structure due to relatedness may arise from philopatry

in some species. In this case, nearby genotypes may be more similar than expected under inde-

pendence. When genotyping error is allowed, it is possible that samples from two individuals with

very similar genotypes will erroneously be combined because they only differ at one or a few loci

where the differences could plausibly be due to genotyping errors. Therefore, spatially correlated

genotypes due to philopatry could introduce negative bias into the Genotype SPIM abundance and

density estimates, especially when the overall detection rate is lower and the proportion of samples

with sparse genotype information is higher.

Perhaps the most critical assumption of the Genotype SPIM is the proper specification of the

detection model. Specifically, individual heterogeneity in detection function parameters has been

identified as a challenge for SPIMs in general (Augustine et al. 2018b, 2019). We were able to ac-

commodate rather extreme individual heterogeneity in detection function parameters in the fisher

application using a modified version of the model of Efford and Mowat (2014), which specifies

a deterministic, negative relationship between σ0i and λ0i with a single individual random effect

on σ0i . However, this detection model was not sufficient for all long distance spatial recaptures to

be linked up during convergence using our default algorithm to initialize individual identities to

samples (see Appendix B) and required an informative prior for the σ standard deviation. Alter-

natively, individual heterogeneity in λ0 and σ with independent random effects on each parameter

could be considered, but this model is likely challenging to fit with typically sparse SCR data sets

and it would be more difficult to specify an informative prior on λ0 than it is for σ , which is related

to home range size. A second alternative model that may be useful for accommodating individual

heterogeneity in space use is that of Royle et al. (2016) where a subset of or all individuals are al-

lowed to have transient activity centers across sampling occasions. This model will not be helpful

if the very large spatial recaptures occur on the same capture occasion, which did occur for the 1

individual in the fisher data set that prevented convergence. Given the importance of individual

heterogeneity in detection function parameters to the performance of SPIMs, we recommend fur-

ther research and model development, though given the typical sparsity of individual recaptures

and spatial recaptures in SCR data sets, feasible models will need to be very simple and/or rely on

parameters for which ecologically-informed priors can be set (e.g., $\sigma_{sd}$). An alternative solution

is to disallow the shadow effect for particular samples by deterministically linking certain long

distance spatial recaptures as can be done in the categorical spatial mark-resight model (Augustine

et al. 2018a).
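
The deterministic, negative relationship between the baseline detection rate and the spatial scale can be illustrated as follows. This sketch is an assumption-laden stand-in for the modified model used in the fisher application, whose exact form is given in the appendices and not reproduced here: it assumes the Efford and Mowat (2014) parameterization a0 = 2πλ0σ², so that λ0 for individual i is a0 / (2πσ_i²), and it assumes a lognormal individual random effect on σ. The function name and all numeric values are hypothetical.

```python
import math
import random


def individual_detection_params(a0, sigma_mean, sigma_sd, n, seed=1):
    """Draw per-individual (sigma_i, lam0_i) with a0 = 2*pi*lam0_i*sigma_i**2.

    sigma_i is lognormal around sigma_mean (assumed form of the random
    effect); lam0_i then follows deterministically from a0 and sigma_i.
    """
    rng = random.Random(seed)
    params = []
    for _ in range(n):
        sigma_i = math.exp(rng.gauss(math.log(sigma_mean), sigma_sd))
        lam0_i = a0 / (2.0 * math.pi * sigma_i ** 2)
        params.append((sigma_i, lam0_i))
    return params


# Illustrative values only; units are arbitrary.
params = individual_detection_params(a0=1.0, sigma_mean=2.0, sigma_sd=0.5, n=5)
for sigma_i, lam0_i in sorted(params):
    print(round(sigma_i, 3), round(lam0_i, 4))
```

Sorting by σ_i shows λ0_i falling as σ_i rises, so a single random effect governs both parameters, which is what keeps a model of this kind feasible with sparse SCR data.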

While the Genotype SPIM can improve inference for data sets produced using the current geno-

typing protocols that seek to maximize certainty in individual identity for just a subset of samples,

it is possible that by relaxing the requirement that individual identities are assigned with certainty,

current genotyping protocols are no longer optimal. The simulation study demonstrates that the

amount of information obtained about individual identity from each replicate assignment declines

exponentially, while the costs increase linearly, so less replication in general may provide a better

compromise between project cost and the precision of population parameter estimates. Further,

it is possible that more replication of the low quality samples that are typically discarded as un-

reliable leads to larger improvements in population parameter estimates than replication of the

high quality samples which have been the focus to date. In the simulation study, adding replicate

assignments over the first assignment improved the abundance estimate precision and accuracy

very minimally when using only the high quality samples (Figure 3, Table 3). Including the low

quality samples improved both the precision and accuracy of the abundance estimate and adding

a second replicated assignment in this scenario reduced bias, substantially improved the accuracy,

and modestly improved precision. Thus, it is possible that current applications of the multi-tubes

approach may be allocating too much effort to replicating high quality samples, and protocols that

preclude the replication of poorly performing samples may be misallocating resources when the

Genotype SPIM can be used. Because the Genotype SPIM provides a single probabilistic frame-

work for genetic capture-recapture, different genotyping protocols can be evaluated via simulation

using clearly defined assumptions about the ecological, capture, and genotyping processes. The

minimum level of replication required in practice will depend on the specifics of the data set, but

see Appendix D for an application demonstrating that density can be estimated with only 1 PCR

per sample using a real data set.

The general structure of the Genotype SPIM should also be useful for other noninvasive obser-

vation methods such as remote cameras, to which machine learning algorithms for classification

are increasingly being applied (e.g., Arzoumanian et al. 2005; Norouzzadeh et al. 2018). Currently,

these machine learning algorithms assign individual identities using the photographs alone, with

no linkage to the ecological or capture processes, and typically require training data of known

identity individuals, which is often not available. Individual classification should be improved by

linking the process of identifying individuals to the ecological and capture processes, especially in

situations of lower signal to noise ratios, and this linkage facilitates the propagation of uncertainty

to the population parameters of interest. To date, the use of machine learning to produce individ-

ual identities from photographs has been mostly applied to problems with a high signal to noise

ratio, for example, spot patterns on animal flanks (e.g., Arzoumanian et al. 2005; Crall et al. 2013),

and the uncertainty in assigning the individual identities has not been propagated to the population

parameters of interest (but see Ellis 2018). Similarly, linking the species identification process to

the ecological and capture processes when using machine learning to produce species records for

occupancy studies should also improve inference. We provide a small simulation study and an

application of the general catSPIM with observation error model to an Andean bear data set in Ap-

pendix E, though we argue the model did not work well in this application due to a signal to noise

ratio that was too low. However, this example shows how an application might work in a situation

where more information about individual identity can be reliably extracted from photographs.

Noninvasive genetics has revolutionized the study of animal populations by capture-recapture,

allowing for the study of many species that could not have been studied effectively using conventional methods based on physical capture, or that cannot be individually identified from camera traps. However, noninvasive studies often result in sparse data sets that can lead to imprecise population parameter estimates, which are of limited use in conservation decisions because

they lead to an inability to discriminate between a population in need of conservation action and

one that is not. Biased estimates of population parameters can be even more problematic, poten-

tially falsely indicating that an imperiled population does not warrant conservation action, risking

extinction, or that a healthy population does warrant conservation action, leading to an ineffective


allocation of resources. Therefore, statistical methods that produce unbiased and precise population parameter estimates are vital for making reliable, evidence-based conservation decisions. Our Genotype SPIM links genetic classification with the ecological and capture processes, and accordingly increases the accuracy and precision of density estimates, with less bias and no data loss, which can lead to more informed conservation decision making.

Acknowledgments

We would like to thank Dana Morin and Chris Sutherland for contributing to model development and Richard Chandler for many ideas relevant to updating latent individual identities in SCR MCMC algorithms. Research funding was provided by Cornell University’s Atkinson Center for a Sustainable Future (BA). Any use of trade, firm, or product names is for descriptive purposes only and does not imply endorsement by the U.S. Government.


Box 1

Figure I: A graphical depiction of the Genotype SPIM contrasted with the typical approach to genetic capture-recapture.

The Genotype SPIM is a 3-level hierarchical model, with submodels for the ecological, capture, and genotype observation processes. The ecological process determines the abundance ($N$), density, and spatial locations ($S$) of the individuals in the study area and associates a genotype ($G^{true}_i$) with each individual, which are governed by the loci-level genotype frequencies, $\gamma$. These genotypes may not be unique (shadow effect); we depict unique genotypes with unique colors, with 2 individuals sharing a “red” genotype. The capture process then determines where and how many times each individual will be captured, $y^{true}_{ij}$, governed by a detection function between the location of individual $i$ and trap location, $x_j$. When captured, individuals leave a record of their genotype, not their unique individual identity, due to the possibility of the shadow effect. The genotype observation process then determines which genotype we observe, $G^{obs}_m$, for sample $m$ on replicate $l$, conditional on the true genotype of the individual that was captured and the genotype observation probabilities in $\pi$. We gray out the observed genotypes because the true genotypes are no longer observed perfectly, and indicate possible genotyping errors with red exes. The data for the Genotype SPIM are the spatially referenced observed genotypes, which are used to probabilistically reconstruct $Y^{true}$ and incorporate the uncertainty in individual identity into the population parameter estimates.

To contrast the Genotype SPIM with the “typical approach”, we can conceptualize the individual identity observation process as a random thinning process where the true capture history, $Y^{true}$, is split into a capture history of known identity samples, $Y^{ID}$, and a vector of trap-level or a matrix of trap by occasion-level counts, $Y^{unk}$, which is discarded. A possible thinning process for an individual by trap capture history is $y^{ID}_{ij} \sim \mathrm{Binomial}(y^{true}_{ij}, \theta)$, where the $\theta$ parameter determines the probability


that a sample can be identified to individual. $\theta$ is then a function of the overall quantity and quality of DNA in the samples, but also of the level of conservativeness used for accepting samples as reliable. For the same set of samples, a more conservative genotyping protocol will lower $\theta$, leading to fewer individual identity errors in $Y^{ID}$ at the cost of discarding more samples. This trade-off cannot be avoided if no errors are allowed in individual identity, and we cannot guarantee that $Y^{ID}$ has no errors.
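The binomial thinning described in Box 1 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `thin_capture_history`, the Poisson-simulated capture counts, and the value of `theta` are hypothetical and are not taken from the paper.

```python
import numpy as np

def thin_capture_history(y_true, theta, rng):
    """Split a true individual-by-trap capture history into an identified
    part (Y^ID) and trap-level counts of unidentified samples (Y^unk)."""
    # Each true capture is kept as a known-identity sample with probability theta.
    y_id = rng.binomial(y_true, theta)
    # Captures that are not identified are only retained as trap-level counts,
    # which the typical approach then discards.
    y_unk = (y_true - y_id).sum(axis=0)
    return y_id, y_unk

rng = np.random.default_rng(1)
# Hypothetical true capture history: 50 individuals by 20 traps.
y_true = rng.poisson(0.3, size=(50, 20))
y_id, y_unk = thin_capture_history(y_true, theta=0.6, rng=rng)
```

Lowering `theta` in this sketch moves more captures from `y_id` into `y_unk`, which mirrors the trade-off between identity errors and discarded samples.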


Figures

Figure 1: A fisher at a baited hair snare.

Figure 2: SCR process and observation model posterior distributions from the SCR and SPIM analyses of the fisher data set. $a_0$ is the overall detection parameter, $\sigma$ is the population-level detection function spatial scale parameter in km, $\sigma_{sd}$ is the standard deviation of the individual-level variance in the spatial scale parameter, $n^{cap}$ is the number of individuals captured, and $N$ is the population abundance. The number of captured individuals and spatial recaptures in the SCR analysis are known statistics.


Figure 3: Plots of simulation study results for the estimation of abundance. The top row displays boxplots of the posterior modes for abundance (simulated value is 166), showing estimator accuracy, and the bottom row displays the mean 95% CI width, a measure of estimator precision. The estimators from left to right are the SCR estimator using only certain identity samples, followed by the Genotype SPIM estimators using only the high quality samples with 3, 2, or 1 replicated assignments, followed by the Genotype SPIM estimators using both high and low quality samples with 3, 2, or 1 replicated assignments.


Tables

Table 1: SCR process and observation model parameter estimates from the SCR and SPIM analyses of the fisher data set. $a_0$ is the overall detection parameter, $\sigma$ is the population-level detection function spatial scale parameter in km, $\sigma_{sd}$ is the standard deviation of the individual-level variance in the spatial scale parameter, $N$ is the population abundance, $n^{cap}$ is the number of individuals captured, and $D$ is the population density (individuals/100 km$^2$). Posterior modes are presented as point estimates, posterior standard deviations/posterior modes are presented as the coefficient of variation (CV), and 95% HPD interval lower and upper bounds (LB and UB) are presented as interval estimates.

            SCR                              SPIM
Parameter   Est     CV      LB      UB       Est     CV      LB      UB
a0          3.28    20.8    2.19    4.83     4.73    15.4    3.39    6.22
σ           1.43    19.2    0.96    2.04     1.32    13.4    1.02    1.71
σsd         0.84    20.0    0.58    1.20     0.65    16.4    0.50    0.90
N           2,321   22.4    1,572   3,529    2,672   16.8    1,974   3,690
ncap        187     .       .       .        273     1.6     263     280
D           4.27    22.4    2.89    6.49     4.92    16.8    3.63    6.79

Table 2: Single locus genotype observation probability parameter estimates for high and low quality samples. “Het” indicates a heterozygous single locus genotype and “hom” indicates a homozygous single locus genotype. “Correct”, “AD”, and “FA” indicate a correct, allelic dropout, and false allele observation, respectively. Posterior means are presented as point estimates and 95% HPD interval lower and upper bounds (LB and UB) are presented as interval estimates.

              High Quality               Low Quality
Class Type    Est     LB      UB         Est     LB      UB
Het-Correct   0.806   0.795   0.818      0.489   0.460   0.518
Het-AD        0.185   0.174   0.197      0.496   0.466   0.525
Het-FA        0.009   0.006   0.011      0.015   0.009   0.022
Hom-Correct   0.994   0.991   0.997      0.999   0.998   1.000
Hom-FA        0.006   0.003   0.009      0.001   0.000   0.002


Table 3: Genotype SPIM simulation results for the detection model and abundance. Scenarios indicate the model used (SPIM or SCR), the number of replicated assignments (1-3), and whether the low quality samples were included (A) or not (B). The low quality samples were not included in the SCR analysis by default. $\lambda_0$ is the baseline detection rate, $\sigma$ is the detection function spatial scale parameter, $N$ is abundance, and $n^{cap}$ is the number of individuals captured (fewer when excluding the low quality samples). The values listed here are the mean point estimates across 120 simulated data sets. “Cov” indicates the coverage of the 95% credible intervals, “Wid” indicates the mean width of the 95% credible intervals, and “MSE” indicates the mean squared error of the point estimates.

Scenario   λ0      σ       N       ncap    N Cov   ncap Cov   N Wid   ncap Wid   N MSE
True       0.570   1.320   166.0   115.8   .       .          .       .          .
SPIM3A     0.562   1.321   165.2   115.6   0.933   0.983      36.9    2.3        105.0
SPIM2A     0.566   1.321   164.0   115.0   0.958   0.950      37.6    4.8        119.1
SPIM1A     0.574   1.329   159.8   112.8   0.908   0.908      41.5    10.8       171.2

True       0.297   1.320   166.0   88.4    .       .          .       .          .
SPIM3B     0.296   1.311   163.8   88.3    0.950   0.992      64.0    0.01       279.7
SPIM2B     0.296   1.310   163.9   88.3    0.958   0.975      64.3    0.07       277.6
SPIM1B     0.297   1.313   162.5   88.2    0.967   0.950      64.0    1.62       281.1

SCR        0.295   1.311   163.9   .       0.983   .          64.1    .          282.7
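The “Cov”, “Wid”, and “MSE” summaries defined in the Table 3 caption can be computed from per-data-set results as in the following sketch. The function name `summarize_estimator` and the three-element example arrays are hypothetical and serve only to show the arithmetic; they are not the simulation output reported in the table.

```python
import numpy as np

def summarize_estimator(point_est, lower, upper, truth):
    """Return interval coverage, mean interval width, and mean squared error
    of the point estimates across simulated data sets."""
    point_est, lower, upper = map(np.asarray, (point_est, lower, upper))
    # Coverage: fraction of intervals that contain the simulated (true) value.
    coverage = np.mean((lower <= truth) & (truth <= upper))
    # Width: average distance between the upper and lower interval bounds.
    mean_width = np.mean(upper - lower)
    # MSE: average squared distance between point estimates and the truth.
    mse = np.mean((point_est - truth) ** 2)
    return coverage, mean_width, mse

# Hypothetical results from three simulated data sets with true N = 166.
cov, wid, mse = summarize_estimator(
    point_est=[160, 170, 150],
    lower=[140, 150, 120],
    upper=[180, 200, 160],
    truth=166,
)
```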

Table 4: Genotype SPIM simulation results for the genotype observation model. Scenarios indicate the model used (SPIM), the number of replicated assignments (1-3), and whether the low quality samples were included (A) or not (B). “C”, “AD”, and “FA” indicate correct, allelic dropout, and false allele probabilities, respectively. “Het” indicates heterozygous genotypes and “hom” indicates homozygous genotypes. The high quality sample parameters are indicated with a “1” and the low quality sample parameters are indicated with a “2”.

Scenario   p^het1_C   p^het1_AD   p^het1_FA   p^het2_C   p^het2_AD   p^het2_FA   p^hom1_C   p^hom1_FA   p^hom2_C   p^hom2_FA
True       0.806      0.185       0.009       0.489      0.496       0.015       0.994      0.006       0.999      0.001
SPIM3A     0.806      0.185       0.009       0.486      0.498       0.016       0.993      0.007       0.997      0.003
SPIM2A     0.805      0.185       0.009       0.484      0.500       0.016       0.993      0.007       0.996      0.004
SPIM1A     0.797      0.194       0.010       0.479      0.503       0.018       0.993      0.007       0.989      0.011

SPIM3B     0.806      0.185       0.009       .          .           .           0.994      0.006       .          .
SPIM2B     0.805      0.186       0.009       .          .           .           0.994      0.006       .          .
SPIM1B     0.792      0.198       0.010       .          .           .           0.993      0.007       .          .


References

Arzoumanian, Z., J. Holmberg, and B. Norman. 2005. An astronomical pattern-matching algorithm for computer-aided identification of whale sharks Rhincodon typus. Journal of Applied Ecology, 42:999–1011.

Augustine, B., F. Stewart, J. A. Royle, J. Fisher, and M. Kelly. 2018a. Spatial mark-resight for categorically marked populations with an application to genetic capture-recapture. BioRxiv, page 299982.

Augustine, B. C., J. A. Royle, M. J. Kelly, C. B. Satter, R. S. Alonso, E. E. Boydston, and K. R. Crooks. 2018b. Spatial capture-recapture with partial identity: an application to camera traps. Annals of Applied Statistics, 11.

Augustine, B. C., J. A. Royle, S. M. Murphy, R. B. Chandler, J. J. Cox, and M. J. Kelly. 2019. Spatial capture–recapture for categorically marked populations with an application to genetic capture–recapture. Ecosphere, 10:e02627.

Borchers, D. L. and M. Efford. 2008. Spatially explicit maximum likelihood methods for capture–recapture studies. Biometrics, 64:377–385.

Chandler, R. B. and J. A. Royle. 2013. Spatially explicit models for inference about density in unmarked or partially marked populations. The Annals of Applied Statistics, 7:936–954.

Crall, J. P., C. V. Stewart, T. Y. Berger-Wolf, D. I. Rubenstein, and S. R. Sundaresan. 2013. Hotspotter - patterned species instance recognition. In 2013 IEEE Workshop on Applications of Computer Vision (WACV), pages 230–237. IEEE.

Creel, S., G. Spong, J. L. Sands, J. Rotella, J. Zeigle, L. Joe, K. M. Murphy, and D. Smith. 2003. Population size estimation in Yellowstone wolves with error-prone noninvasive microsatellite genotypes. Molecular Ecology, 12.

Efford, M. and G. Mowat. 2014. Compensatory heterogeneity in spatially explicit capture–recapture data. Ecology, 95:1341–1348.


Ellis, A. R. 2018. Accounting for matching uncertainty in photographic identification studies of wild animals.

Jimenez, J., R. Chandler, J. Tobajas, E. Descalzo, R. Mateo, and P. Ferreras. 2019. Generalized spatial mark–resight models with incomplete identification: An application to red fox density estimates. Ecology and Evolution, 9:4739–4748.

Johnson, P. C. and D. T. Haydon. 2007. Maximum-likelihood estimation of allelic dropout and false allele error rates from microsatellite genotypes in the absence of reference data. Genetics, 175:827–842.

Kalinowski, S. T., M. L. Taper, and S. Creel. 2006. Using DNA from non-invasive samples to identify individuals and census populations: an evidential approach tolerant of genotyping errors. Conservation Genetics, 7:319–329.

Knapp, S. M., B. A. Craig, and L. P. Waits. 2009. Incorporating genotyping error into non-invasive DNA-based mark–recapture population estimates. Journal of Wildlife Management, 73:598–604.

Lamb, C. T., A. T. Ford, M. F. Proctor, J. A. Royle, G. Mowat, and S. Boutin. 2019. Genetic tagging in the Anthropocene: scaling ecology from alleles to ecosystems. Ecological Applications, page e01876.

Lampa, S., K. Henle, R. Klenke, M. Hoehn, and B. Gruber. 2013. How to overcome genotyping errors in non-invasive genetic mark-recapture population size estimation - a review of available methods illustrated by a case study. The Journal of Wildlife Management, 77:1490–1511.

Linden, D. W., A. K. Fuller, J. A. Royle, and M. P. Hare. 2017. Examining the occupancy–density relationship for a low-density carnivore. Journal of Applied Ecology, 54:2043–2052.

Link, W. A., J. Yoshizaki, L. L. Bailey, and K. H. Pollock. 2010. Uncovering a latent multinomial: analysis of mark–recapture data with misidentification. Biometrics, 66:178–185.

Lukacs, P. M. and K. P. Burnham. 2005. Research notes: estimating population size from DNA-based closed capture-recapture data incorporating genotyping error. Journal of Wildlife Management, 69:396–403.


Macbeth, G. M., D. Broderick, J. R. Ovenden, and R. C. Buckworth. 2011. Likelihood-based genetic mark–recapture estimates when genotype samples are incomplete and contain typing errors. Theoretical Population Biology, 80:185–196.

McKelvey, K. S. and M. K. Schwartz. 2004. Genetic errors associated with population estimation using non-invasive molecular tagging: problems and new solutions. Journal of Wildlife Management, 68:439–448.

Miller, C. R., P. Joyce, and L. P. Waits. 2002. Assessing allelic dropout and genotype reliability using maximum likelihood. Genetics, 160:357–366.

Mills, L. S., J. J. Citta, K. P. Lair, M. K. Schwartz, and D. A. Tallmon. 2000. Estimating animal abundance using noninvasive DNA sampling: promise and pitfalls. Ecological Applications, 10:283–294.

Natesh, M., R. W. Taylor, N. K. Truelove, E. A. Hadly, S. R. Palumbi, D. A. Petrov, and U. Ramakrishnan. 2019. Empowering conservation practice with efficient and economical genotyping from poor quality samples. Methods in Ecology and Evolution.

Norouzzadeh, M. S., A. Nguyen, M. Kosmala, A. Swanson, M. S. Palmer, C. Packer, and J. Clune. 2018. Automatically identifying, counting, and describing wild animals in camera-trap images with deep learning. Proceedings of the National Academy of Sciences, 115:E5716–E5725.

Otis, D. L., K. P. Burnham, G. C. White, and D. R. Anderson. 1978. Statistical inference from capture data on closed animal populations. Wildlife Monographs, pages 3–135.

Paetkau, D. 2003. An empirical exploration of data quality in DNA-based population inventories. Molecular Ecology, 12:1375–1387.

Paetkau, D., L. P. Waits, P. L. Clarkson, L. Craighead, E. Vyse, R. Ward, and C. Strobeck. 1998. Variation in genetic diversity across the range of North American brown bears. Conservation Biology, 12:418–429.

Royle, J. A., R. B. Chandler, R. Sollmann, and B. Gardner. 2013. Spatial capture-recapture. Academic Press.


Royle, J. A. and R. M. Dorazio. 2008. Hierarchical modeling and inference in ecology: the analysis of data from populations, metapopulations and communities. Elsevier.

Royle, J. A., A. K. Fuller, and C. Sutherland. 2016. Spatial capture–recapture models allowing Markovian transience or dispersal. Population Ecology, 58:53–62.

Schwartz, M. K., S. A. Cushman, K. S. McKelvey, J. Hayden, and C. Engkjer. 2006. Detecting genotyping errors and describing American black bear movement in northern Idaho. Ursus, 17:138–149.

Sethi, S. A., G. M. Cook, P. Lemons, and J. Wenburg. 2014. Guidelines for MSAT and SNP panels that lead to high-quality data for genetic mark–recapture studies. Canadian Journal of Zoology, 92:515–526.

Sethi, S. A., D. Linden, J. Wenburg, C. Lewis, P. Lemons, A. Fuller, and M. P. Hare. 2016. Accurate recapture identification for genetic mark–recapture studies with error-tolerant likelihood-based match calling and sample clustering. Royal Society Open Science, 3:160457.

Taberlet, P., S. Griffin, B. Goossens, S. Questiau, V. Manceau, N. Escaravage, L. P. Waits, and J. Bouvet. 1996. Reliable genotyping of samples with very low DNA quantities using PCR. Nucleic Acids Research, 24:3189–3194.

Waits, L. P., G. Luikart, and P. Taberlet. 2001. Estimating the probability of identity among genotypes in natural populations: cautions and guidelines. Molecular Ecology, 10:249–256.

Waits, L. P. and D. Paetkau. 2005. Noninvasive genetic sampling tools for wildlife biologists: a review of applications and recommendations for accurate data collection. The Journal of Wildlife Management, 69:1419–1433.

Wolf, C. and W. J. Ripple. 2017. Range contractions of the world’s large carnivores. Royal Society Open Science, 4:170052.

Wright, J. A., R. J. Barker, M. R. Schofield, A. C. Frantz, A. E. Byrom, and D. M. Gleeson. 2009. Incorporating genotype uncertainty into mark–recapture-type models for estimating abundance using DNA samples. Biometrics, 65:833–840.


Estimating Animal Density Using Genetic Markers Observed with Error

Ben C. Augustine, J. Andrew Royle, Daniel W. Linden, and Angela K. Fuller

December 13, 2019

Appendix A: Technical Details and MCMC Algorithm

Here, we describe the technical details of the model structure omitted in the main text and the novel features of the MCMC algorithm. We refer readers to Table A1 below, where we reproduce a table with definitions for model parameters and data structures (observed and latent) for the categorical Spatial Partial Identity Model (catSPIM) from Augustine et al. (2019), followed by the new parameters and data structures required to allow for observation error in the category levels. We retain the same notation from Augustine et al. (2019) for consistency.

General Model for Observation Error

In order to relax the requirement of the catSPIM that category levels must be recorded correctly, we introduce the possibility that the individual identity category levels of each sample are observed multiple times, subject to error. For brevity, we will refer to these replicated category level observations or assignments as “replicated assignments”, which could be replicated category level observations from multiple observers looking at features in photographs (see Appendix E on Andean bears) or replicated DNA scores across multiple amplifications and PCRs to determine the values of microsatellites or SNPs across loci. Let $G^{obs.true}$ be an $n^{obs} \times n^{cat}$ matrix, where each row, $m$, contains the full categorical identity of individual $i$, to which sample $m$ belongs, with the trap of capture recorded in the $m$th row of $Y^{obs}$. More specifically, $g^{obs.true}_m$ takes the same values as $g^{true}_i$ when sample $m$ comes from individual $i$. $G^{obs.true}$ corresponds to $G^{obs}$ in the catSPIM model, except there are no missing covariate values because $G^{obs.true}$ is a partially latent, rather than fully observed, data structure. In this model with observation error, we define $G^{obs}$, containing the replicated assignments, to be an array of size $n^{obs} \times n^{cat} \times n^{rep}$, where $n^{rep}$ is the number of replicated assignments, indexed by $o$. Then,

$$[g^{obs}_{mlo} \mid g^{obs.true}_{ml} = c] = \mathrm{Categorical}(\pi_{l \cdot c}),$$

where $\pi_l$ is an $n^{levels}_l \times n^{levels}_l$ matrix of category level observation probabilities for identity covariate $l$, and $c$ is the matrix column matching the value of $g^{obs.true}_{ml}$. These observation probabilities may or may not differ across identity covariates, $l$. More specifically, $\pi_{lrc} = P(\text{observer classifies as category level } r \mid \text{true category level } c)$ for covariate $l$. As with the catSPIM, missing category level values are allowed in $G^{obs}$, coded as a “0”. We assume these are missing at random with respect to individual and identity covariate values (i.e., missingness does not contain information about individual identity or the value of the identity covariate).
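As a rough illustration of the categorical observation model above, the following Python sketch simulates replicated assignments for a single identity covariate. It is not the authors' MCMC code: the function name `simulate_replicates`, the `p_missing` argument, and the example matrix `pi` are assumptions made for this example. Columns of `pi` index the true level and rows index the observed level, and missing values are coded as 0, as in the text.

```python
import numpy as np

def simulate_replicates(g_true, pi, n_rep, rng, p_missing=0.0):
    """Simulate replicated assignments for one identity covariate.

    g_true : (n_obs,) integer array of true category levels, coded 1..K.
    pi     : (K, K) array with pi[r-1, c-1] = P(observe level r | true level c),
             so every column sums to 1.
    Returns an (n_obs, n_rep) integer array in which 0 codes a missing value.
    """
    K = pi.shape[0]
    levels = np.arange(1, K + 1)
    g_obs = np.zeros((len(g_true), n_rep), dtype=int)
    for m, c in enumerate(g_true):
        # Draw each replicate from the column of pi matching the true level c.
        g_obs[m] = rng.choice(levels, size=n_rep, p=pi[:, c - 1])
    # Missing completely at random, independent of individual and level.
    g_obs[rng.random(g_obs.shape) < p_missing] = 0
    return g_obs

rng = np.random.default_rng(0)
pi = np.array([[0.9, 0.2],   # hypothetical error probabilities for a
               [0.1, 0.8]])  # covariate with two category levels
g_true = np.array([1, 2, 2, 1])
g_obs = simulate_replicates(g_true, pi, n_rep=3, rng=rng, p_missing=0.1)
```

The full array described in the text has one such slice per identity covariate, each with its own observation matrix if the probabilities differ across covariates.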


This general model structure for category level observation error can accommodate a wide variety of observation systems and error processes; however, it will need to be modified if there are more than just a few category levels, because error probabilities must then be estimated for an increasing number of possible observation events from typically sparse capture-recapture data sets. Multiple observation events may be combined into a reduced number of event types if a priori knowledge about the error processes is available. We will take this approach in the next section to modify the observation process above for genetic capture-recapture, where we consider three possible outcomes: a genotype locus can be classified correctly, there can be allelic dropout, or there can be a false allele. A similar approach can be taken for other types of error processes, for example, reading tags or bands (Cowen and Schwarz 2006; Bonner et al. 2016). We will refer to the most general model structure presented above as “catSPIM-OE” to distinguish it from the genotyping model to follow, though the catSPIM-OE error structure may be appropriate for SNPs, which can only take 2 values per locus, where there are only 2 possible error types. See Appendix E for a simulation study of catSPIM-OE and an application of catSPIM-OE to camera trapping data of Andean bears.

Model for Genotyping Error

Before proceeding, a cursory description of microsatellites and genotyping errors is required. Microsatellite markers consist of genetic sequence repeats where the number of repeats determines the value of an allele, of which there are two per locus. There is no ordering of the alleles, and the allele with the fewest repeats is customarily listed first. For example, a microsatellite locus for one individual may have a value of 150.152, indicating one allele has 150 repeats and the other has 152. In this case, we say that 150.152 is a single locus genotype, while the combined values across multiple loci constitute a multilocus genotype. A single locus genotype is said to be heterozygous if the two alleles do not share the same value, say 150.152, and homozygous if they do, say 150.150. This is an important distinction for how genotyping errors arise. The most common genotyping error is allelic dropout (Roon et al. 2005), where a heterozygous single locus genotype is erroneously scored as a homozygote, taking the value of only one of the two alleles present. For example, the heterozygous single locus genotype 150.152 may be scored as either 150.150 or 152.152 as the result of an allelic dropout event. While typically more rare, other errors may occur, including false alleles, laboratory errors, and transcription errors (McKelvey and Schwartz 2004). For simplicity, we will classify genotyping errors into two categories based on their outcome, rather than their cause: we will use “allelic dropout” to describe any event causing a true heterozygote to be scored as a homozygote and “false allele” to describe any event leading to an error other than allelic dropout.
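The outcome-based classification just described can be written as a small Python function. This is a sketch, not the authors' implementation: the name `classify_observation` and the tuple representation of genotypes are hypothetical. It also assumes, following the 150.152 example, that an allelic dropout yields a homozygote for one of the two true alleles, so a homozygote for any other allele is counted as a false allele.

```python
def classify_observation(true_gt, obs_gt):
    """Classify an observed single locus genotype against the true genotype.

    Genotypes are (allele, allele) tuples; allele order is ignored.
    Returns "correct", "allelic dropout", or "false allele".
    """
    t = tuple(sorted(true_gt))
    o = tuple(sorted(obs_gt))
    if o == t:
        return "correct"
    true_is_het = t[0] != t[1]
    obs_is_hom = o[0] == o[1]
    # Assumption: dropout leaves a homozygote for one of the two true alleles.
    if true_is_het and obs_is_hom and o[0] in t:
        return "allelic dropout"
    # Any other mismatch is grouped under "false allele".
    return "false allele"

print(classify_observation((150, 152), (150, 150)))  # allelic dropout
print(classify_observation((150, 152), (152, 150)))  # correct
print(classify_observation((150, 150), (150, 152)))  # false allele
```

Because a true homozygote cannot undergo allelic dropout under this definition, only the “correct” and “false allele” outcomes are possible for homozygous genotypes, which matches the two “Hom” rows in Table 2.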

Now, we will describe how microsatellite loci can be used as categorical identity covariates. Similar to Wright et al. (2009), we enumerate the loci-level genotypes from 1 to $n^{levels}_l$ at each locus, $l$, in an arbitrary order. There are several possible ways to determine which genotypes to enumerate at each locus. One option is to include all genotypes that were observed in the current sample, either in all the replicated assignments or in the consensus genotypes. This option will identify all the most common genotypes, but may miss rare genotypes. A more comprehensive option, which will be assumed for the remainder of the model description, is to create a set of genotypes from all possible combinations of the observed set of alleles. This option may still miss some genotypes that occur in the population, but the total number of missed genotypes will necessarily be lower than enumerating only the observed genotypes unless all genotypes were observed. Enumerating all possible genotypes implied by the observed alleles may enumerate genotypes that do not exist in the population; however, their frequencies will be estimated near zero if not observed. Either of these two approaches may enumerate genotypes that do not exist in the population if false alleles are erroneously included in the list of possible alleles, but the frequencies of genotypes containing erroneous alleles should also be estimated near zero if they are rarely observed and/or if genotyping error is modeled as we will address below. Finally, $\gamma$ now carries the interpretation of the loci-level genotype frequencies. This is distinguished from Wright et al. (2009), where $\gamma$ is the allele frequencies, which are converted to genotype frequencies by assuming Hardy-Weinberg equilibrium. We do not make this assumption, at the cost of estimating more parameters.
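The "all possible combinations of the observed alleles" option can be illustrated with a minimal Python sketch. The function name `enumerate_genotypes` and the example alleles are illustrative assumptions, not part of the authors' software; the sketch only shows how unordered allele pairs map to integer category levels.

```python
from itertools import combinations_with_replacement

def enumerate_genotypes(observed_alleles):
    """Return all unordered allele pairs implied by a set of observed alleles.

    Hypothetical helper for illustration only.
    """
    alleles = sorted(set(observed_alleles))
    # combinations_with_replacement yields pairs (a, b) with a <= b,
    # matching the convention that the smaller allele is listed first.
    return [f"{a}.{b}" for a, b in combinations_with_replacement(alleles, 2)]

# Alleles observed at one locus (duplicates are ignored).
genotypes = enumerate_genotypes([152, 150, 154, 150])

# Map each genotype label to an integer level 1..n_levels; the order is arbitrary.
levels = {g: k for k, g in enumerate(genotypes, start=1)}
print(genotypes)
```

With $a$ observed alleles, this enumeration produces $a(a+1)/2$ loci-level genotypes, so three alleles give six category levels.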

With the loci-level genotypes enumerated, the catSPIM-OE model described above can be directly applied to genetic capture-recapture studies using microsatellite markers. One caveat, however, is that we will simplify the elements of $\pi_l$, now with the interpretation of the genotype observation probabilities conditional on the true genotype at locus $l$, using the specific genotyping error mechanisms of allelic dropout and false allele events described above. We will refer to this model as the “Genotype SPIM” to distinguish it from the more general model above. We use a simple model for these events that assumes that each possible allelic dropout event is equally likely (previously assumed by Wright et al. 2009; Sethi et al. 2016), each possible false allele event is equally likely (previously assumed by Sethi et al. 2016), and the allelic dropout and false allele probabilities do not vary across sample (relaxed below), locus, individual, or replicate number. We define $p^{hom}$ and $p^{het}$ to be vectors of the observation probabilities for homozygous and heterozygous loci-level genotypes, respectively. Then, $p^{hom} = (p^{hom}_{C}, p^{hom}_{FA})$ for homozygous correct and false allele observation and $p^{het} = (p^{het}_{C}, p^{het}_{AD}, p^{het}_{FA})$ for heterozygous correct, allelic dropout, and false allele observation. Splitting the observation probabilities by zygosity is required due to the differing number of possible observation events for homozygotes and heterozygotes, but it also allows the false allele probabilities to vary by zygosity (see Johnson and Haydon 2007).

The probability of classifying true genotype $c$ as observed genotype $r$ depends on the values of both $c$ and $r$ as follows. For correct observation events, $g^{obs}_{mlo} = g^{obs.true}_{ml}$, the probability of correctly classifying sample $m$ at locus $l$ on replicate $o$, $[g^{obs}_{mlo} \mid g^{obs.true}_{ml}]$, is $p^{hom}_{C}$ and $p^{het}_{C}$ for homozygous and heterozygous genotypes, respectively. For allelic dropout events, i.e., $g^{obs.true}_{ml}$ is heterozygous and $g^{obs}_{mlo}$ is homozygous matching one of the two alleles in $g^{obs.true}_{ml}$, $[g^{obs}_{mlo} \mid g^{obs.true}_{ml}] = p^{het}_{AD}/2$. We divide the allelic dropout probability by two because there are two ways to observe allelic dropout. Finally, for false allele events, i.e., $g^{obs.true}_{ml} \neq g^{obs}_{mlo}$ and $g^{obs}_{mlo}$ is not homozygous taking a value matching one of the alleles in $g^{obs.true}_{ml}$, $[g^{obs}_{mlo} \mid g^{obs.true}_{ml}] = p^{hom}_{FA}/(n^{levels}_l - 1)$ for homozygous genotypes, since there are no possible allelic dropout events and one correct observation event, and $[g^{obs}_{mlo} \mid g^{obs.true}_{ml}] = p^{het}_{FA}/(n^{levels}_l - 3)$ for heterozygous genotypes, since there are 2 possible allelic dropout events and one possible correct observation event. These probabilities are plugged into the appropriate elements of $\pi_l$ for each locus $l$. The genotype observation probabilities will need to be modified from these listed if not all possible loci-level genotypes implied by the observed loci-level alleles are enumerated and included in the model.

To illustrate how the genotyping errors are modeled, consider a genetic capture-recapture study with three possible alleles at the first locus: 150, 152, and 154. The set of all possible genotypes at this locus is (150.150, 150.152, 150.154, 152.152, 152.154, 154.154). Then, the genotype observation probabilities are:


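$$
\pi_1 =
\begin{array}{cccccc|c}
150.150 & 150.152 & 150.154 & 152.152 & 152.154 & 154.154 & \\
\hline
p^{hom}_{C} & p^{het}_{AD}/2 & p^{het}_{AD}/2 & p^{hom}_{FA}/5 & p^{het}_{FA}/3 & p^{hom}_{FA}/5 & 150.150 \\
p^{hom}_{FA}/5 & p^{het}_{C} & p^{het}_{FA}/3 & p^{hom}_{FA}/5 & p^{het}_{FA}/3 & p^{hom}_{FA}/5 & 150.152 \\
p^{hom}_{FA}/5 & p^{het}_{FA}/3 & p^{het}_{C} & p^{hom}_{FA}/5 & p^{het}_{FA}/3 & p^{hom}_{FA}/5 & 150.154 \\
p^{hom}_{FA}/5 & p^{het}_{AD}/2 & p^{het}_{FA}/3 & p^{hom}_{C} & p^{het}_{AD}/2 & p^{hom}_{FA}/5 & 152.152 \\
p^{hom}_{FA}/5 & p^{het}_{FA}/3 & p^{het}_{FA}/3 & p^{hom}_{FA}/5 & p^{het}_{C} & p^{hom}_{FA}/5 & 152.154 \\
p^{hom}_{FA}/5 & p^{het}_{FA}/3 & p^{het}_{AD}/2 & p^{hom}_{FA}/5 & p^{het}_{AD}/2 & p^{hom}_{C} & 154.154
\end{array}
$$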

where the column labels indicate the true genotype and the row labels indicate the observed genotypes. Note that columns must sum to 1 and the elements of $\pi$ will vary across loci, depending on the number and values of possible genotypes at each locus.
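As a check on the three-allele example, the following Python sketch builds a genotype observation matrix from the zygosity-specific probability vectors and verifies that every column sums to 1. The function name `genotype_obs_matrix` and the numeric error probabilities are hypothetical values chosen for illustration; they are not estimates from the paper.

```python
from itertools import combinations_with_replacement

def genotype_obs_matrix(alleles, p_hom, p_het):
    """Build pi with rows = observed genotype and columns = true genotype.

    p_hom = (correct, false allele); p_het = (correct, dropout, false allele).
    Assumes all genotypes implied by `alleles` are enumerated.
    """
    genos = list(combinations_with_replacement(sorted(alleles), 2))
    n = len(genos)
    pi = [[0.0] * n for _ in range(n)]
    for c, (a, b) in enumerate(genos):       # true genotype (column)
        for r, obs in enumerate(genos):      # observed genotype (row)
            if a == b:                       # true homozygote: no dropout possible
                pi[r][c] = p_hom[0] if r == c else p_hom[1] / (n - 1)
            elif r == c:                     # heterozygote scored correctly
                pi[r][c] = p_het[0]
            elif obs in ((a, a), (b, b)):    # one of the two allelic dropout events
                pi[r][c] = p_het[1] / 2
            else:                            # any other outcome is a false allele
                pi[r][c] = p_het[2] / (n - 3)
    return genos, pi

# Hypothetical error probabilities, for illustration only.
genos, pi = genotype_obs_matrix([150, 152, 154], (0.95, 0.05), (0.85, 0.10, 0.05))
col_sums = [sum(pi[r][c] for r in range(len(genos))) for c in range(len(genos))]
print(col_sums)
```

With six enumerated genotypes, the divisors are $6 - 1 = 5$ for homozygous columns and $6 - 3 = 3$ for heterozygous columns, matching the entries of $\pi_1$.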

Alternate Observation Models

Previous studies (Augustine et al. 2018a,b, 2019) have suggested individual heterogeneity in detection function parameters, especially $\sigma$, can erode the performance of SPIMs for real world data sets. Here, we consider a model for individual heterogeneity in detection that assumes there is an inverse relationship between $\lambda_0$ and $\sigma$, specifically, $\lambda_0 = a_0/(2\sigma^2)$ (Efford and Mowat 2014). This model is motivated by the idea that an individual’s detection rate in space should be proportional to its utilization distribution or rate of space use. Efford and Mowat (2014) interpret $a_0$ as the “single detector effective sampling area” by showing that under idealized conditions, $a_0$ is approximately equal to the total individual-level effective sampling area. However, we will interpret $a_0$ more generally as the overall detection parameter that scales a bivariate normal space use model following $\lambda(x) = a_0 f(x)$, where $\lambda(\cdot)$ is the detection function for the expected counts described above, $f(x)$ is a bivariate normal PDF, and $x$ is a spatial location. We modeled an individual random effect on $\sigma$ where $\log(\sigma_i) = \log(\sigma^{\mu}) + \sigma^{offset}_i$ and $\sigma^{offset}_i \sim \text{Normal}(0, \sigma^{sd})$. The induced random effect on baseline detection is then $\lambda_{0i} = a_0/(2\sigma_i^2)$. The estimated parameters of this model are $a_0$, $\sigma^{\mu}$, $\sigma^{sd}$, and the $\sigma$ offsets vector $\sigma^{offset}$.

For the Genotype SPIM described above, we make the assumption that the genotype observation probabilities do not vary across sample, locus, individual, or replication number. Here, we will relax the assumption that the genotype observation probabilities do not vary across samples because samples can vary considerably in the amount and quality of DNA they contain and thus the probabilities that genotyping errors will occur (Paetkau 2003). While this variability could be modeled as a function of continuous and/or categorical sample-level covariates using a multinomial logistic link, for conceptual and computational simplicity, we just consider that samples can be split into high and low quality categories with category definitions specific to the application. We estimate separate genotype observation probabilities for each category, $p^{hom-high}$, $p^{hom-low}$, $p^{het-high}$, $p^{het-low}$, which are organized in genotype observation matrices, $\pi^{high}$ and $\pi^{low}$. Further covariate effects for samples, locus, or replication number could be accommodated using the multinomial logistic link and fit using an auxiliary variable model (Holmes et al. 2006).
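The individual heterogeneity model described in this section can be sketched in a few lines of Python. This is a minimal simulation under assumed parameter values; the function name `simulate_detection_params` and all numbers are hypothetical, and $\sigma^{sd}$ is treated as a standard deviation.

```python
import math
import random

def simulate_detection_params(n, a0, sigma_mu, sigma_sd, seed=1):
    """Simulate (sigma_i, lambda_0i) pairs with lambda_0i = a0 / (2 * sigma_i^2).

    Illustrative only; parameter values passed in are assumptions.
    """
    rng = random.Random(seed)
    params = []
    for _ in range(n):
        offset = rng.gauss(0.0, sigma_sd)               # sigma_i^offset ~ Normal(0, sigma_sd)
        sigma_i = math.exp(math.log(sigma_mu) + offset)  # log-scale random effect on sigma
        lam0_i = a0 / (2.0 * sigma_i ** 2)               # induced effect on baseline detection
        params.append((sigma_i, lam0_i))
    return params

params = simulate_detection_params(n=5, a0=0.5, sigma_mu=1.0, sigma_sd=0.3)
for sigma_i, lam0_i in params:
    print(round(sigma_i, 3), round(lam0_i, 3))
```

Individuals with larger $\sigma_i$ receive smaller $\lambda_{0i}$, so the product $2\sigma_i^2 \lambda_{0i}$ stays fixed at $a_0$ for every individual.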


Data Augmentation and Inference

We use a process similar to data augmentation (Royle et al. 2007) to probabilistically resolve $Y^{true}$ and $G^{true}$ and estimate $N$. In typical applications of data augmentation in capture-recapture MCMC algorithms, the capture histories of the $n$ captured individuals are observed, which are then augmented with all-zero capture histories up to $M$ possible individuals where $M \gg N$. In SPIMs, $n$ is unknown, but we can specify an $M \times J$ matrix, $Y^{true}$, and initialize it with a possibly true capture history using the spatial proximity of observed samples. Similarly, we specify an $M \times n^{cat}$ matrix $G^{true}$ and initialize it with possibly true full categorical identities consistent with the matched samples in $Y^{true}$ and their associated observed category levels. Then, we proceed with data augmentation as normal, specifying a vector $z$, of length $M$, to indicate whether individual $i$ is in the population ($z_i = 1$) or not ($z_i = 0$), assuming $z_i \sim \text{Bernoulli}(\psi)$. This individual-level Bernoulli assumption induces the relationship $N \sim \text{Binomial}(M, \psi)$, where $\psi$ is a nuisance parameter (Royle et al. 2007). Population abundance is a derived parameter, $N = \sum_{i=1}^{M} z_i$, and population density, $D$, is $N/A$. While the number of individuals captured is not known, it can be estimated as a derived parameter following $n^{cap} = \sum_{i=1}^{M} \left( \left( \sum_{j=1}^{J} Y^{true}_{ij} \right) > 0 \right)$ (the number of individuals assigned at least one sample; Augustine et al. 2019). The number of spatial recaptures can be calculated in a similar manner. A final derived parameter of interest is the posterior probabilities of pairwise sample matches, all sets of P(sample A and sample B came from the same individual), which are just the proportion of the posterior samples for which samples A and B are assigned to the same individual.
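The derived quantities above can be computed directly from posterior draws. The Python sketch below is an illustration under assumptions: it takes each posterior draw of the sample-to-individual assignment as a list (called `ID` here, one entry per sample), and the function names and toy draws are hypothetical.

```python
def abundance(z):
    """N = sum of the data augmentation indicators z_i for one posterior draw."""
    return sum(z)

def n_captured(ID):
    """Number of individuals assigned at least one sample in one posterior draw."""
    return len(set(ID))

def pairwise_match_prob(id_draws, a, b):
    """Proportion of posterior draws in which samples a and b share an individual."""
    hits = sum(1 for ID in id_draws if ID[a] == ID[b])
    return hits / len(id_draws)

# Three hypothetical posterior draws of the assignment vector for four samples.
id_draws = [[0, 0, 1, 2], [0, 0, 1, 1], [0, 3, 1, 1]]

print(abundance([1, 0, 1, 1]))                    # one draw of z with M = 4
print([n_captured(ID) for ID in id_draws])        # n_cap varies across draws
print(pairwise_match_prob(id_draws, 0, 1))        # samples 0 and 1 match in 2 of 3 draws
```

Counting the distinct individuals in the assignment vector gives the same value as counting rows of $Y^{true}$ with at least one capture, since every sample contributes a capture to the individual it is assigned to.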

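1 MCMC Details

The joint posterior of the categorical SPIM with observation error is very similar to that of the categorical SPIM. The joint posterior is:

$$
\begin{aligned}
&[\lambda_0, \sigma, Y^{true}, G^{true}, G^{obs.true}, \gamma, \pi, S, z, \psi \mid Y^{obs}, G^{obs}, X] \propto \\
&\quad \left\{ \prod_{i=1}^{M} \left\{ \prod_{j=1}^{J} \left\{ \prod_{m=1}^{n^{obs}} [y^{obs}_{mj}, G^{obs.true}_{m.} \mid y^{true}_{ij}, G^{true}_{i.}] \right\} \right\} \right\} \\
&\quad \times \left\{ \prod_{i=1}^{M} \left\{ \prod_{j=1}^{J} [y^{true}_{ij} \mid \lambda_0, \sigma, s_i, z_i, x_j] \right\} \right\} \\
&\quad \times \left\{ \prod_{l=1}^{n^{cat}} \left\{ \prod_{m=1}^{n^{obs}} \left\{ \prod_{o=1}^{n^{rep}} [g^{obs}_{mlo} \mid g^{obs.true}_{ml}, \pi_l] \right\} \right\} [\pi_l] \right\} \\
&\quad \times \left\{ \prod_{i=1}^{M} \left\{ \prod_{l=1}^{n^{cat}} [g^{true}_{il} \mid \gamma_l] \right\} \right\} \\
&\quad \times \left\{ \prod_{i=1}^{M} [z_i \mid \psi][s_i] \right\} \times [\lambda_0][\sigma][\psi][\gamma]
\end{aligned}
$$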


The prior distributions are

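1. $p(\lambda_0) \sim \text{Uniform}(0, \infty)$

2. $p(\sigma) \sim \text{Uniform}(0, \infty)$

3. $p(\psi) \sim \text{Uniform}(0, 1)$

4. $p(s_i) \sim \text{Uniform}(\mathcal{S})$

5. $p(\gamma_l) \sim \text{Dirichlet}(\alpha_l)$, where $\alpha_l$ is the vector of Dirichlet parameters of length $n^{levels}_l$, indexed by $g$ below. All $\alpha_l$ were set to vectors of 1.

6. $p(\pi_{l.c}) \sim \text{Dirichlet}(\beta_{lc})$, where $\beta_{lc}$ is the vector of Dirichlet parameters of length $n^{levels}_l$, indexed by $r$ below. All $\beta_{lc}$ were set to vectors of 1.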

Here, we list the full conditional distributions, or the distributions that the full conditionals are proportional to, used to update the parameters and latent variables that are unique to the categorical SPIM with observation error.

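1. $[g^{true}_{il} \mid \gamma_l] = \text{Categorical}(\gamma_l)$

2. $[g^{obs}_{mlo} \mid g^{obs.true}_{ml}, \pi_{l.c}] \propto [g^{obs.true}_{ml} \mid g^{obs}_{mlo}, \pi_{l.c}][\pi_{l.c}] = \text{Categorical}(\pi_{l.c})$ where $g^{obs.true}_{ml} = c$.

3. $[\pi_{l.c} \mid g^{obs}_{mlo}, g^{obs.true}_{ml}] \propto [g^{obs}_{mlo} \mid g^{obs.true}_{ml}, \pi_{l.c}] = \text{Categorical}(\pi_{l.c})$ where $g^{obs.true}_{ml} = c$.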

Here we outline the new steps of the MCMC algorithm required to introduce observation error in the category levels. See Augustine et al. (2019) for the remaining parameter updates.

1. Update $Y^{true}$. The categorical SPIM uses a proposal distribution based on the SCR observation model to update $Y^{true}$ in a Metropolis-Hastings update where the only likelihood to be evaluated is the SCR observation model.

$$
p^{for}_i = q^{for}_m \frac{\lambda_{ij}}{\sum_i \lambda_{ij}}. \tag{1}
$$

where $q^{for}_i$ is the probability the randomly selected focal sample belongs to individual $i$ at trap $j$. In fact, when the SCR observation model is Poisson, this proposal distribution is the full conditional distribution. With the introduction of category level observation error, the proposal distribution also needs to consider the proposal probabilities of different category level observation types and we need to evaluate the category level observation likelihood in the MH ratio. In order to update $Y^{true}$, on each iteration, we randomly select a user-specified number of indices $m$ of $Y^{obs}$ to propose new individual identities for (because updating all individual identities on each iteration may not be most efficient). The current individual identity of $Y^{obs}_{m.}$ is stored in $ID_m$, indicating the individual $i$ to which sample $m$ is assigned. So for each selected index $m$, we select a new individual $i$ that we will propose for sample $m$. Once we select $ID^{cand}_m$, we construct $Y^{true-cand}_{i.}$ for $i \in (ID^{curr}_m, ID^{cand}_m)$ by moving sample $m$ from individual $ID^{curr}_m$ to individual $ID^{cand}_m$ and set $G^{obs.true-cand}_{m.}$ equal to $G^{true}_{i.}$ for $i = ID^{cand}_m$. The forward proposal distribution for selecting $ID^{cand}_m$ is:


$$
p^{for}_i = q^{for} \frac{\lambda_{ij} \prod_l \prod_o [g^{obs}_{mlo} \mid g^{true-cand}_{il}]}{\sum_i \lambda_{ij} \prod_l \prod_o [g^{obs}_{mlo} \mid g^{true-cand}_{il}]}. \tag{2}
$$

where $j$ is the trap that focal sample $m$ was recorded at and $\prod_l \prod_o [g^{obs}_{mlo} \mid g^{true}_{il}]$ is the category level likelihood of the focal sample if it belonged to individual $i$. The backwards proposal probability is calculated as above, with the current and proposed arguments exchanged. We then accept the proposal with probability:

$$
\min\left(1,\;
\frac{\prod_{i \in (ID^{curr}_m, ID^{cand}_m)} \prod_j f(y^{true-cand}_{ij}) \prod_{m \ni ID^{cand}_m = i} \prod_l h(g^{obs.true-cand}_{ml})}
{\prod_{i \in (ID^{curr}_m, ID^{cand}_m)} \prod_j f(y^{true-curr}_{ij}) \prod_{m \ni ID^{curr}_m = i} \prod_l h(g^{obs.true-curr}_{ml})}
\,\frac{p^{back}}{p^{for}}
\right). \tag{3}
$$

where $f(\cdot)$ is the SCR observation model likelihood and $h(\cdot)$ is the category level observation likelihood, and $p^{for}$ and $p^{back}$ are obtained from the proposal distributions evaluated at $ID^{cand}_m$ and $ID^{curr}_m$, respectively. As in the categorical SPIM, this modified proposal distribution is the full conditional when the SCR observation model is Poisson.

2. Update $G^{true}$ and $G^{obs.true}$ jointly. In the categorical SPIM, latent values of $g^{true}_{il}$ are updated using the full conditional $\text{Categorical}(\gamma_l)$ if $g^{true}_{il}$ is latent and cannot be updated otherwise. $g^{true}_{il}$ entries are not latent if at least one sample $m$ is currently assigned to individual $i$ whose $l$th index of $g^{obs}_{ml}$ is not missing ($G^{obs}$ is equivalent to $G^{obs.true}$ in the current model). This can be described as requiring at least one $m$ such that $ID_m = i$. In this model with observation error, we update $g^{true}_{il}$ in the same manner when no samples $m$ are assigned to individual $i$ whose $l$th index of $g^{obs.true}_{ml}$ is not missing. However, once we introduce observation error, updating $g^{true}_{il}$ is possible when observed samples are assigned to it. More specifically, $g^{true}_{il}$ values are deterministically linked to $g^{obs.true}_{ml}$ values for samples $m$ currently assigned to individual $i$ (for $m \ni ID_m = i$). Because $g^{obs}_{mlo}$ are observed with error, we can update both $g^{true}_{il}$ and its associated $g^{obs.true}_{ml}$ values, changing the category level observation likelihood in the process. We will use a Metropolis-Hastings update for these linked indices of $g^{true}_{il}$ and $g^{obs.true}_{ml}$ that are not currently unobserved.

First, we propose $g^{true-cand}_{il}$ from $\text{Categorical}(\gamma_l)$, which also updates $g^{obs.true-cand}_{ml}$ for $m$ indices currently assigned to individual $i$. The forward proposal probability, $p^{for}$, is then $\text{Categorical}(\gamma_l)$ evaluated at $g^{true-cand}_{il}$ and the backwards proposal probability, $p^{back}$, is $\text{Categorical}(\gamma_l)$ evaluated at $g^{true-curr}_{il}$. We accept the proposal with probability:

$$
\min\left(1,\;
\frac{f(g^{true-cand}_{il}) \prod_{m \ni ID_m = i} h(g^{obs.true-cand}_{ml})}
{f(g^{true-curr}_{il}) \prod_{m \ni ID_m = i} h(g^{obs.true-curr}_{ml})}
\,\frac{p^{back}}{p^{for}}
\right). \tag{4}
$$

where $f(\cdot)$ is the category level likelihood and $h(\cdot)$ is the category level observation likelihood (distributions 1 and 2 listed above).

3. Update $\pi_{l.c}$. This update is for the general observation error model; see the next step for the Genotype SPIM update. Consider the following $\pi$ structure for $n^{cat} = 2$ and $n^{levels} = \{2, 3\}$.

$$
\pi_1 =
\begin{bmatrix}
\pi_{111} & \pi_{112} \\
\pi_{121} & \pi_{122}
\end{bmatrix}
\qquad
\pi_2 =
\begin{bmatrix}
\pi_{211} & \pi_{212} & \pi_{213} \\
\pi_{221} & \pi_{222} & \pi_{223} \\
\pi_{231} & \pi_{232} & \pi_{233}
\end{bmatrix}
$$

Column $\pi_{1.1}$ corresponds to the category level observation probabilities for identity covariate 1, which has 2 levels, conditional on the true value being 1, and column $\pi_{2.3}$ corresponds to the category level observation probabilities for identity covariate 2, which has 3 levels, conditional on the true value being 3. The diagonals of these matrices are the probabilities of correct observation.

We use a standard Dirichlet-multinomial update (e.g., Wright et al. 2009). $\pi_{l.c}$ is updated with a Gibbs step. By adopting a Dirichlet prior for $\pi_{l.c}$, we get a Dirichlet full conditional with parameter vector $\beta'_{lc} = y_{lc} + \beta_{lc}$, where $y_{lcr}$ is the number of $G^{obs}_{.l.} = r$ values for which $G^{obs.true}_{.l} = c$ and $\beta_{lc}$ is the prior parameter vector. To draw values from the full conditional, we simulate a vector of Gamma random variables $g_{lc} \sim \text{Gamma}(\beta'_{lc}, 1)$, where $g_{lc}$ is of length $n^{levels}_l$. Then, after renormalizing these gamma random variables by $g_{lcr} / \sum_r g_{lcr}$, we have a draw from the Dirichlet full conditional.

4. Update $p^{hom}$ and $p^{het}$. For the Genotype SPIM, the elements of each $\pi_l$ are constrained by the correct classification, allelic dropout, and false allele probabilities. As in the more general model, we use a standard Dirichlet-multinomial update for $p^{hom}$ and $p^{het}$ independently, adopting independent Dirichlet priors for each. For $p^{hom}$, the Dirichlet full conditional parameter vector is $\beta^{hom\prime}_c = y^{hom}_c + \beta^{hom}_c$, where $y^{hom}_c$ is the number of homozygous genotype correct classification events for $c = 1$ and the number of homozygous genotype false allele events for $c = 2$. For $p^{het}$, the Dirichlet full conditional parameter vector is $\beta^{het\prime}_c = y^{het}_c + \beta^{het}_c$, where $y^{het}_c$ is the number of heterozygous genotype correct classification events for $c = 1$, the number of heterozygous genotype allelic dropout events for $c = 2$, and the number of heterozygous genotype false allele events for $c = 3$. $\beta^{hom}_c$ and $\beta^{het}_c$ are the prior parameter vectors. After updating $p^{hom}$ and $p^{het}$, their values are plugged into the appropriate elements of $\pi$. For the model with genotype observation probabilities that vary by sample type, we update the high and low sample quality parameters exactly as above using only the high and low quality samples, respectively.
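Two pieces of the steps above lend themselves to a short Python sketch: the normalized proposal weights used when reassigning a focal sample (step 1), and the Dirichlet draw by renormalized Gamma variates (steps 3 and 4). This is a minimal illustration under assumptions, not the authors' sampler: the function names, event counts, and input values are hypothetical, and the sketch computes only the normalized ratio in the forward proposal, leaving out any additional factor such as $q^{for}$.

```python
import random

def proposal_probs(lam_j, cat_lik):
    """Normalize lambda_ij * (category level likelihood of the focal sample) over i.

    lam_j[i]   : detection rate of individual i at the focal sample's trap j
    cat_lik[i] : prod_l prod_o [g_obs_mlo | g_true_il] if the sample belonged to i
    """
    weights = [lam * lik for lam, lik in zip(lam_j, cat_lik)]
    total = sum(weights)
    return [w / total for w in weights]

def dirichlet_draw(counts, prior, rng):
    """Draw from Dirichlet(counts + prior) by normalizing Gamma(shape, 1) variates."""
    gammas = [rng.gammavariate(y + b, 1.0) for y, b in zip(counts, prior)]
    total = sum(gammas)
    return [g / total for g in gammas]

rng = random.Random(42)

# Hypothetical heterozygote event counts: (correct, allelic dropout, false allele).
p_het = dirichlet_draw([180, 15, 5], [1, 1, 1], rng)
# Hypothetical homozygote event counts: (correct, false allele).
p_hom = dirichlet_draw([95, 5], [1, 1], rng)

# Three candidate individuals; the third has zero detection rate at trap j.
probs = proposal_probs([0.2, 0.1, 0.0], [0.5, 1.0, 0.3])
print(probs, p_hom, p_het)
```

The prior vectors of 1 match the Dirichlet priors listed earlier; the drawn `p_hom` and `p_het` would then be plugged into the appropriate elements of each $\pi_l$.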


References

Augustine, B., F. Stewart, J. A. Royle, J. Fisher, and M. Kelly. 2018a. Spatial mark-resight for categorically marked populations with an application to genetic capture-recapture. BioRxiv, page 299982.

Augustine, B. C., J. A. Royle, M. J. Kelly, C. B. Satter, R. S. Alonso, E. E. Boydston, and K. R. Crooks. 2018b. Spatial capture-recapture with partial identity: an application to camera traps. Annals of Applied Statistics, 11.

Augustine, B. C., J. A. Royle, S. M. Murphy, R. B. Chandler, J. J. Cox, and M. J. Kelly. 2019. Spatial capture–recapture for categorically marked populations with an application to genetic capture–recapture. Ecosphere, 10:e02627.

Bonner, S. J., M. R. Schofield, P. Noren, S. J. Price, et al. 2016. Extending the latent multinomial model with complex error processes and dynamic Markov bases. The Annals of Applied Statistics, 10:246–263.

Cowen, L. and C. J. Schwarz. 2006. The Jolly–Seber model with tag loss. Biometrics, 62:699–705.

Efford, M. and G. Mowat. 2014. Compensatory heterogeneity in spatially explicit capture–recapture data. Ecology, 95:1341–1348.

Holmes, C. C., L. Held, et al. 2006. Bayesian auxiliary variable models for binary and multinomial regression. Bayesian Analysis, 1:145–168.

Johnson, P. C. and D. T. Haydon. 2007. Maximum-likelihood estimation of allelic dropout and false allele error rates from microsatellite genotypes in the absence of reference data. Genetics, 175:827–842.

McKelvey, K. S. and M. K. Schwartz. 2004. Genetic errors associated with population estimation using non-invasive molecular tagging: problems and new solutions. Journal of Wildlife Management, 68:439–448.

Paetkau, D. 2003. An empirical exploration of data quality in DNA-based population inventories. Molecular Ecology, 12:1375–1387.

Roon, D. A., L. P. Waits, and K. C. Kendall. 2005. A simulation test of the effectiveness of several methods for error-checking non-invasive genetic data. In Animal Conservation forum, volume 8, pages 203–215. Cambridge University Press.

Royle, J. A., R. M. Dorazio, and W. A. Link. 2007. Analysis of multinomial models with unknown index using data augmentation. Journal of Computational and Graphical Statistics, 16:67–85.

Sethi, S. A., D. Linden, J. Wenburg, C. Lewis, P. Lemons, A. Fuller, and M. P. Hare. 2016. Accurate recapture identification for genetic mark–recapture studies with error-tolerant likelihood-based match calling and sample clustering. Royal Society Open Science, 3:160457.

Wright, J. A., R. J. Barker, M. R. Schofield, A. C. Frantz, A. E. Byrom, and D. M. Gleeson. 2009. Incorporating genotype uncertainty into mark–recapture-type models for estimating abundance using DNA samples. Biometrics, 65:833–840.


| Term | Definition |
|---|---|
| $N$ | Population abundance |
| $J$ | The number of traps |
| $K$ | The number of capture occasions |
| $M$ | Level of data augmentation |
| $X$ | $J \times 2$ matrix of trap locations |
| $n^{obs}$ | The number of observed samples |
| $n^{cap}$ | The number of captured individuals (latent) |
| $n^{cat}$ | The number of categorical identity covariates |
| $n^{levels}_l$ | The number of category levels for categorical identity covariate $l$ |
| $\gamma_l$ | A vector of category level probabilities for categorical identity covariate $l$, corresponding to values $1, \ldots, n^{levels}_l$ |
| $\psi$ | The data augmentation individual inclusion probability |
| $\lambda_0$ | The baseline detection rate (count data) |
| $p_0$ | The baseline detection probability (detection data) |
| $\sigma$ | The detection function spatial scale parameter |
| $\lambda_{ij}$ | The individual by trap detection rate |
| $p_{ij}$ | The individual by trap detection probability |
| $z$ | The data augmentation indicator vector of length $M$ |
| $S$ | $M \times 2$ matrix of individual activity centers |
| $G^{true}$ | An $M \times n^{cat}$ matrix of full categorical identities |
| $Y^{true}$ | The latent $M \times J$ capture history |
| $Y^{obs}$ | The observed $n^{obs} \times J$ capture history, with one capture per row indicating the trap of capture |
| $n^{reps}$ | The number of independent, replicate assignments of the categorical identity covariates |
| $G^{obs.true}$ | An $n^{obs} \times n^{cat}$ matrix of true, latent sample-level categorical identities |
| $G^{obs}$ | An $n^{obs} \times n^{cat} \times n^{reps}$ array of sample-level observed (possibly erroneous) categorical identities |
| $\pi_l$ | An $n^{levels}_l \times n^{levels}_l$ matrix of observation probabilities for covariate $l$ conditional on the true category level. The diagonal elements represent correct observation probabilities and off-diagonal elements represent erroneous observation probabilities |
| $ID$ | A derived vector of length $n^{obs}$ recording the individual index $i$ to which the $m$th row of $Y^{obs}$ and $G^{obs.true}$ are linked |
| $p^{hom}$ | A vector of genotype observation probabilities that determine the element values of $\pi$ for homozygous loci-level genotypes. $p^{hom} = (p^{hom}_{C}, p^{hom}_{FA})$, the probabilities of correct and false allele observation, respectively. |
| $p^{het}$ | A vector of genotype observation probabilities that determine the element values of $\pi$ for heterozygous loci-level genotypes. $p^{het} = (p^{het}_{C}, p^{het}_{AD}, p^{het}_{FA})$, the probabilities of correct, allelic dropout, and false allele observation, respectively. |


[Figure A1 image not reproduced. Plates in the graph: individuals $i = 1, \ldots, M$; ID covariates or loci $l = 1, \ldots, n^{cat}$; samples $m = 1, \ldots, n^{obs}$; traps $j = 1, \ldots, J$; replicated assignments $o = 1, \ldots, n^{reps}$.]

Figure A1: Directed acyclic graph of the catSPIM-OE model. Dashed lines represent objects linked along the first dimension and dashed arrows represent disaggregation of data from the individual level to the sample level. $\theta$ represents the detection parameters. All other terms are described in the table above.


Estimating Animal Density Using Genetic Markers Observed with Error

Ben C. Augustine, J. Andrew Royle, Daniel W. Linden, and Angela K. Fuller

December 13, 2019

Appendix B: MCMC Details for Fisher Data Set

MCMC Specifications

For the SCR analysis, we ran 5 MCMC chains for 200,000 iterations, thinning by 10, and discarding the first 25,000 of each chain as burn in, leaving 87,500 samples for posterior inference. Due to convergence problems with the Genotype SPIM for this data set, we initialized the samples for one particular individual with a long-distance spatial recapture to belong to the same individual (discussed further below). After this deviation from the typical algorithm for initializing all latent states, we ran 40,000 iterations as burn in across 5 chains, followed by 240,000 iterations for each chain, thinned by 10, leaving 120,000 iterations for posterior inference. For both analyses, we used a continuous, polygonal state space with a minimum distance between polygon vertices and traps of at least 3σ. We set the data augmentation level, M, to 5500 in the SCR analysis and 5000 in the SPIM analysis (lower due to greater precision for N).
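The retained sample counts quoted above follow from simple arithmetic, which the short Python check below reproduces. The helper name `retained_samples` is a hypothetical convenience; it assumes burn-in is counted in unthinned iterations and that the Genotype SPIM burn-in was run separately from the 240,000 retained iterations.

```python
def retained_samples(n_chains, iters_per_chain, thin, burn_in=0):
    """Posterior samples kept after discarding burn-in iterations and thinning."""
    return n_chains * (iters_per_chain - burn_in) // thin

# SCR analysis: 5 chains, 200,000 iterations each, first 25,000 discarded, thinned by 10.
scr = retained_samples(5, 200_000, 10, burn_in=25_000)

# Genotype SPIM: 5 chains, 240,000 post-burn-in iterations each, thinned by 10.
spim = retained_samples(5, 240_000, 10)

print(scr, spim)
```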

Due to the low number of spatial recaptures, we put an informative prior of Gamma(3,7) on σ^sd, the standard deviation of the individual-specific detection function spatial scale parameter, which ruled out most unreasonably large point estimates of σ_i. We used the uninformative priors listed in Appendix A for the remaining parameters in both models, including a Uniform(0,∞) prior for σ.

Computation Time

Both the SCR and SPIM analyses required long computation times, mostly due to the large data augmentation level, a large number of traps, and the slow mixing of the σsd parameter, which required more posterior samples than would be required for a model without individual heterogeneity in detection function parameters. The MCMC samplers for both models completed roughly 1000 iterations per hour on a machine with a 2.2GHz processor. The simpler SCR MCMC sampler was about as fast as the more complicated Genotype SPIM sampler because the lower precision for abundance required a larger level of data augmentation (M=5500 vs 5000). The long run times per chain were offset substantially by the use of multiple chains on multiple cores. Recording all posteriors for the Genotype SPIM model (especially Gtrue) could potentially use a prohibitive amount of RAM, but this was ameliorated by thinning the posterior and running the sampler for a relatively short number of iterations (40,000), saving the posterior, and then restarting the MCMC chain from the previous final state.
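The run-save-restart pattern described above can be sketched generically in Python. This is not the authors' sampler: `run_segment` is a toy random-walk Metropolis sampler for a standard normal target, the file names are hypothetical, and the segment length is shortened from 40,000 so the example runs quickly.

```python
import math
import pickle
import random
import tempfile
from pathlib import Path


def run_segment(state, n_iter, thin, rng):
    """Toy random-walk Metropolis for a standard normal target.

    Returns the thinned draws of this segment and the final state, which is
    all that must be kept in memory to resume the chain.
    """
    draws = []
    x = state
    for it in range(1, n_iter + 1):
        proposal = x + rng.gauss(0.0, 1.0)
        log_ratio = 0.5 * (x * x - proposal * proposal)
        if rng.random() < math.exp(min(0.0, log_ratio)):
            x = proposal
        if it % thin == 0:
            draws.append(x)
    return draws, x


def run_in_segments(n_segments, n_iter, thin, out_dir, seed=1):
    """Run short segments, write each to disk, and restart from the last state."""
    rng = random.Random(seed)
    state = 0.0
    files = []
    for seg in range(n_segments):
        draws, state = run_segment(state, n_iter, thin, rng)
        path = Path(out_dir) / f"segment_{seg:03d}.pkl"
        with open(path, "wb") as fh:
            pickle.dump(draws, fh)
        files.append(path)  # the draws themselves are not accumulated in RAM
    return files, state


with tempfile.TemporaryDirectory() as tmp:
    files, final_state = run_in_segments(n_segments=3, n_iter=4000, thin=10, out_dir=tmp)
    total_draws = 0
    for path in files:
        with open(path, "rb") as fh:
            total_draws += len(pickle.load(fh))

print(total_draws)  # 1200
```

Restarting in a new process would also require saving the random number generator state alongside the chain state; here the generator simply persists across segments.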

Convergence Issue

The model using the fisher data set and our typical algorithm for initializing the parameters and latent variables did not converge. By this we mean that the samples for one particular individual, all of which were assigned the same individual identity in the original study, were not linked by the Genotype SPIM, and that the failed linkage was not due to the observed genotypes of the samples: all 5 samples in question had matching consensus genotypes. We argue that the failure of the Genotype SPIM to link these samples to the same individual results from a violation of the SCR observation model assumptions.

The algorithm we use to initialize the latent variables, specifically how we assign samples to individuals to create the starting latent capture history, can be seen as starting from an “overdispersed” state in the sense that we start from a very unlikely assignment of samples to individuals with respect to the expected true σ. We initialize the latent capture history by only linking samples with similar observed genotype scores at the same trap, implying that σ is very small because there are no spatial recaptures in this state. Then for the model to converge, the spatial recaptures are gradually linked up, or at least linked with some probability, and the σ value converges upwards. This did not occur for the focal individual in question within the 800,000 iterations that we ran starting from this overdispersed state.
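The initialization rule described above can be illustrated with a small Python sketch. It is a simplification under stated assumptions: "similar" genotypes are reduced to exact equality of the scored genotype, the function name is hypothetical, and the genotype values are made up (only the trap numbers 165 and 179 come from the text).

```python
def initialize_sample_ids(samples):
    """Assign starting individual IDs by linking samples only within a trap.

    samples: list of (trap_id, genotype) pairs, where genotype is a tuple of
    locus scores. Samples share an ID only if both trap and genotype match,
    so the starting state contains no spatial recaptures.
    """
    groups = {}
    ids = []
    for trap, genotype in samples:
        key = (trap, genotype)
        if key not in groups:
            groups[key] = len(groups)
        ids.append(groups[key])
    return ids


# Hypothetical genotypes; the first three samples have identical scores.
samples = [
    (165, ("A/B", "C/C")),
    (165, ("A/B", "C/C")),
    (179, ("A/B", "C/C")),
    (165, ("A/A", "C/D")),
]
start_ids = initialize_sample_ids(samples)
print(start_ids)  # [0, 0, 1, 2]
```

The third sample matches the first two genetically but starts as a separate individual because it was collected at a different trap, which is the "overdispersed" starting state the paragraph refers to.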

It is possible the Genotype SPIM was correct in not linking up these 5 samples and that 2 individuals in the population had the same genotype, that is, a shadow event. We do not believe this is the case. Although this is the longest distance spatial recapture if all 5 samples belong to the same individual, it is of a similar distance as other observed spatial recaptures (Figure B1). Therefore, if these samples do belong to two individuals, they must live very close to one another, which is less likely under our model assumptions than two individuals with the same genotype living anywhere in the state space.

Figure B2 provides some clues about why these samples are not being linked to the same individual. Depicted there are the activity center posteriors from a 40,000 iteration subset of the MCMC chain that we claim has not converged. It links the 4 samples in trap 165 to one proposed individual and the 1 sample in trap 179 to another proposed individual. Notice that the activity center posterior for the proposed individual linked to trap 165 is much more spatially constrained than the activity center posterior for the proposed individual linked to trap 179. Further, notice that these posteriors do not overlap in space. In order for these 5 samples to be combined into 1 individual, the activity centers of proposed individuals 1 and 2 must overlap in space. For example, in order to have a non-negligible probability of moving the 1 sample at trap 179 to the same individual with 4 samples at trap 165, the activity center for individual 1 in the current configuration must be a similar distance from trap 179 as the activity center for individual 2 in the current configuration. This is required because the expected number of detections is a function of the distance between the activity center and the trap.

The differential spatial extents covered by the activity center posterior distributions for these two proposed individuals are a consequence of the individual heterogeneity model for the detection function parameters where σi is inversely related to λ0i for individual i. Because proposed individual 1 was captured 4 times in trap 165, it must have a larger λ0i than proposed individual 2, captured 1 time in trap 179. This necessarily implies proposed individual 1 must have a smaller σi. As a consequence of the smaller σi, the activity center for proposed individual 1 is estimated much more precisely. The estimated detection functions for these two proposed individuals are displayed in Figure B3. There, we see that the activity center for proposed individual 2 must move within a distance of roughly 2 km of trap 179 before it has any chance that the sample at the trap will be combined with the other 4 samples. Where the two detection functions cross is the distance at which the 1 sample at trap 179 would be equally likely to belong to proposed individual 1 or 2 if both their activity centers were exactly this distance away from trap 179. This never occurs in Figure B2.
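The crossing point of two detection functions can be computed in closed form if a half-normal form is assumed. The Python sketch below makes that assumption explicit (the functional form is not stated in this appendix), and all parameter values are hypothetical, chosen only so that individual 1 has the larger baseline rate and the smaller spatial scale.

```python
import math


def half_normal_rate(d, lam0, sigma):
    """Expected detections at distance d (ASSUMED half-normal detection function)."""
    return lam0 * math.exp(-d * d / (2.0 * sigma * sigma))


def crossing_distance(lam0_1, sigma_1, lam0_2, sigma_2):
    """Distance at which the two curves are equal.

    Requires lam0_1 > lam0_2 and sigma_1 < sigma_2, so the curves cross once.
    Setting the two log rates equal gives
    d^2 = 2 * ln(lam0_1 / lam0_2) / (1 / sigma_1^2 - 1 / sigma_2^2).
    """
    numerator = 2.0 * math.log(lam0_1 / lam0_2)
    denominator = 1.0 / sigma_1 ** 2 - 1.0 / sigma_2 ** 2
    return math.sqrt(numerator / denominator)


# Hypothetical values, not estimates from the fisher analysis.
lam0_1, sigma_1 = 2.0, 0.8   # proposed individual 1: high rate, small scale
lam0_2, sigma_2 = 0.3, 2.0   # proposed individual 2: low rate, large scale

d_cross = crossing_distance(lam0_1, sigma_1, lam0_2, sigma_2)
rate_1 = half_normal_rate(d_cross, lam0_1, sigma_1)
rate_2 = half_normal_rate(d_cross, lam0_2, sigma_2)
print(d_cross, rate_1, rate_2)
```

Closer than the crossing distance, a sample is more likely under individual 1's detection function; farther away, it is more likely under individual 2's.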

Now, consider what σi and λ0i would be when these 5 samples are allocated to 1 individual if all model assumptions are met. The activity center would necessarily be somewhere between traps 165 and 179, though closer to trap 165 because there are more captures there. In order for these samples to be linked, σi must be relatively large. But the expected λ0i corresponding to a large σi is relatively small. The 4 detections at trap 165 are then very improbable. This apparently unlikely set of events could be due to this individual’s detection function deviating from the assumption that detection is exactly proportional to space use, or it could be because the Poisson model for the expected number of detections as a function of the distance between the activity center and the trap is not exactly correct. There could be overdispersion in the expected counts, for example, because factors other than the distance between the activity center and trap determine the observed count. In this survey, there were 3 week-long capture occasions and the 4 samples at trap 165 were observed as a count of 2 on each of occasions 2 and 3. If the difference between a count of 1 and 2 is at least partially influenced by the behavior of an individual encountering a hair snare, rather than the distance between the activity center and the trap, overdispersion in the Poisson observation model should be expected. Perhaps a better SCR observation model is one where detection is a function of the distance between an activity center and a trap and the number of detections given at least 1 detection is not a function of the distance between an activity center and a trap (e.g., a hurdle model).
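One way to read the hurdle suggestion above is sketched below in Python: the probability of at least one detection depends on distance, while the positive count follows a zero-truncated Poisson whose rate does not. This is only an illustration of the idea, not a model fitted in the paper; the half-normal form, the function names, and every parameter value are assumptions.

```python
import math


def poisson_pmf(k, rate):
    return math.exp(-rate) * rate ** k / math.factorial(k)


def half_normal_rate(d, lam0, sigma):
    """ASSUMED half-normal expected detection rate at distance d."""
    return lam0 * math.exp(-d * d / (2.0 * sigma * sigma))


def hurdle_pmf(k, d, lam0, sigma, count_rate):
    """P(count = k) when only the detection step depends on distance."""
    p_detect = 1.0 - math.exp(-half_normal_rate(d, lam0, sigma))
    if k == 0:
        return 1.0 - p_detect
    # Zero-truncated Poisson for the count given at least one detection.
    truncated = poisson_pmf(k, count_rate) / (1.0 - math.exp(-count_rate))
    return p_detect * truncated


# Hypothetical values: a trap 3 km from the activity center.
d, lam0, sigma, count_rate = 3.0, 0.5, 1.5, 1.5

p2_poisson = poisson_pmf(2, half_normal_rate(d, lam0, sigma))
p2_hurdle = hurdle_pmf(2, d, lam0, sigma, count_rate)
print(p2_poisson, p2_hurdle)
```

With these made-up values, a count of 2 at a distant trap is far less improbable under the hurdle form than under the plain Poisson model, which is the kind of relief from overdispersion the paragraph has in mind.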

One final note is that if the violations of the SCR observation model do explain the lack of convergence for the Genotype SPIM for the fisher data set, they might not have prevented convergence had the two traps in question been closer together in space. It may be the combination of a long distance spatial recapture with the violation of the SCR observation model that led to this convergence problem. Regardless, when we initialized all 5 of these samples as belonging to the same individual, they remained in this state for 240,000 iterations across each of 5 MCMC chains, so it appears the problem leading to the lack of convergence did not prevent us from sampling from the true posterior when we initialized the latent variables closer to the presumably true values. Once these samples are connected to one individual, it is extremely unlikely that a second individual with the exact same genotype will be proposed, much less one that is close enough to these samples to split them across two individuals. Further, no convergence problems were identified in the simulation study where we know all model assumptions were met.

Figures

Figure B1: The mean capture locations and spatial recaptures for the 8 individuals with spatial recaptures in the original fisher data set, with the focal individual responsible for the lack of convergence distinguished from the other individuals in yellow.

Figure B2: The activity center posteriors for the samples in question at traps 165 and 179 when they are split between two individuals. There were 4 samples observed at trap 165 and 1 at trap 179.

Figure B3: The estimated detection functions for the 2 proposed individuals in question.

Estimating Animal Density Using Genetic Markers Observed with Error

Ben C. Augustine, J. Andrew Royle, Daniel W. Linden, and Angela K. Fuller

January 1, 2020

Appendix C: Simulation Study Details

We evaluated estimator performance using a simulation study in which the data-generating parameter values were set to those estimated in the fisher analysis, except for density, which we set to a larger value to make resolving the individual identities of samples more challenging (Augustine et al. 2019). We set σ to 1.32 km and converted the fisher a0 estimate of 4.73 to a λ0 value of 0.570. We quadrupled the fisher density estimate of 4.27/100 km² to 17.08/100 km². We specified a 9 x 9 rectangular trapping array of 81 traps with a 2σ spacing of 2.64 km and a buffer of 3σ to define the rectangular state space. The abundance implied by the density and state space area stated above was 166 individuals.

We simulated from the same genotype frequency and genotype observation models as used for the fisher data set using the parameter estimates from that analysis. We simulated “high” and “low” quality samples in the proportion of 52:48, similar to that observed in the fisher data set. We set the locus by replication amplification probability (not an explicit parameter in our model) to 0.995 and 0.437 for high and low quality samples, respectively, matching that observed on average across each sample type for the first 3 replicated assignments in the fisher data set. We simulated using the parameter estimates for γ, phom-high, phom-low, phet-high, and phet-low from the fisher data set. We considered a maximum of nrep = 3 replicated assignments, fewer than the maximum of 7 in the fisher data set, though the probability of loci amplification declined after 3 replicated assignments for the fisher data set.
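The sampling scheme above can be sketched as a small Python simulator. Only the 52:48 quality split, the amplification probabilities (0.995 and 0.437), and nrep = 3 come from the text. The error probabilities are hypothetical placeholders for the fisher estimates, and the way an allelic dropout or false allele changes the observed score is a simplified assumption, not the paper's exact observation model.

```python
import random

P_AMP = {"high": 0.995, "low": 0.437}                              # from the text
P_HET = {"high": (0.98, 0.01, 0.01), "low": (0.80, 0.15, 0.05)}    # hypothetical
P_HOM = {"high": (0.99, 0.01), "low": (0.95, 0.05)}                # hypothetical


def observe_locus(true_genotype, p_amp, p_het, p_hom, alleles, rng):
    """Simulate one replicated score for one locus (None if no amplification).

    p_het: (correct, allelic dropout, false allele) for heterozygotes.
    p_hom: (correct, false allele) for homozygotes.
    """
    if rng.random() >= p_amp:
        return None
    a, b = true_genotype
    if a != b:
        outcome = rng.choices(["correct", "AD", "FA"], weights=p_het)[0]
    else:
        outcome = rng.choices(["correct", "FA"], weights=p_hom)[0]
    if outcome == "correct":
        return true_genotype
    if outcome == "AD":
        kept = rng.choice([a, b])          # one allele drops out
        return (kept, kept)
    # SIMPLIFIED false allele: one allele is replaced by a different allele.
    kept = rng.choice([a, b])
    new = rng.choice([x for x in alleles if x not in (a, b)])
    return tuple(sorted((kept, new)))


def simulate_sample(true_genotypes, alleles, n_rep, rng):
    """Draw a quality class (52:48) and n_rep replicated scores per locus."""
    quality = "high" if rng.random() < 0.52 else "low"
    scores = [
        [observe_locus(g, P_AMP[quality], P_HET[quality], P_HOM[quality], alleles, rng)
         for g in true_genotypes]
        for _ in range(n_rep)
    ]
    return quality, scores


rng = random.Random(42)
alleles = [1, 2, 3, 4]                     # hypothetical allele labels
quality, scores = simulate_sample([(1, 2), (3, 3)], alleles, n_rep=3, rng=rng)
print(quality, scores)
```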

We simulated 120 data sets from the Genotype SPIM and fit the Genotype SPIM model using 1, 2, and 3 of the replicated assignments, with and without the low quality samples. Then, we fit a regular SCR model using only the “high quality” samples, to roughly mimic what happens in practice where only high confidence samples are retained, though we assumed a best case scenario where there were no errors in individual identity assignment. For all models, we ran 2 chains of 100,000 iterations, discarding 5,000 iterations as burn in and thinning by 50 to reduce output file sizes. We used the posterior modes for point estimates except for parameters that sum to 1 (e.g., genotype observation probabilities) for which we used the posterior means. We used 95% percentile intervals for interval estimates, which had better coverage than HPD intervals for ncap because it is a discrete parameter with very low uncertainty leading to very narrow intervals. Because all 4 of these estimators should be approximately unbiased, we used the posterior standard deviations and mean 95% CI widths to compare precision. We used the mean squared error to compare accuracy. The uninformative priors listed in Appendix A were used for all parameters.
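The comparison metrics named above can be computed across simulated data sets as in the following Python sketch. The function name and the four made-up estimates and intervals are purely illustrative; only the true abundance of 166 comes from the simulation settings.

```python
def summarize_estimator(truth, estimates, lower, upper):
    """Bias, mean squared error, interval coverage, and mean interval width."""
    n = len(estimates)
    bias = sum(e - truth for e in estimates) / n
    mse = sum((e - truth) ** 2 for e in estimates) / n
    coverage = sum(lo <= truth <= hi for lo, hi in zip(lower, upper)) / n
    mean_width = sum(hi - lo for lo, hi in zip(lower, upper)) / n
    return {"bias": bias, "mse": mse, "coverage": coverage, "mean_width": mean_width}


# Hypothetical point estimates and 95% interval bounds from 4 simulated data sets.
summary = summarize_estimator(
    truth=166,
    estimates=[160, 170, 175, 150],
    lower=[140, 150, 168, 130],
    upper=[185, 195, 200, 165],
)
print(summary)
# {'bias': -2.25, 'mse': 97.25, 'coverage': 0.5, 'mean_width': 39.25}
```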

The analysis time for the simulated data sets was much less than for the fisher application because we did not consider individual heterogeneity in detection function parameters; we also considered a much lower abundance and number of traps, though the expected number of detections per individual was higher. A run of 100,000 iterations took roughly 8 hours, while the same number of iterations for the fisher application took roughly 4 days.

Table 1: Simulation settings for baseline detection rate, λ0, detection function spatial scale, σ, abundance, N, number of replicated genotype assignments, nrep, and whether or not to use the low quality samples with lower amplification rates and higher error rates.

Scenario  λ0     σ      N    nrep  Use Low Quality Samples
SPIM3A    0.570  1.320  166  3     Yes
SPIM2A    0.570  1.320  166  2     Yes
SPIM1A    0.570  1.320  166  1     Yes
SPIM3B    0.297  1.320  166  3     No
SPIM2B    0.297  1.320  166  2     No
SPIM1B    0.297  1.320  166  1     No
SCR       0.297  1.320  166  .     No

References

Augustine, B. C., J. A. Royle, S. M. Murphy, R. B. Chandler, J. J. Cox, and M. J. Kelly. 2019. Spatial capture–recapture for categorically marked populations with an application to genetic capture–recapture. Ecosphere, 10:e02627.

Estimating Animal Density Using Genetic Markers Observed with Error

Ben C. Augustine, J. Andrew Royle, Daniel W. Linden, and Angela K. Fuller

December 13, 2019

Appendix D: Black Bear Application

Here, we provide an application of the Genotype SPIM to a data set with no replicated assignments (only 1 PCR), or at least one for which any replicated assignments were not provided by the lab that did the genotyping (Wildlife Genetics International; WGI). WGI uses the methods described in Paetkau (2003), where 1/3 of the DNA product of each sample is amplified in a single PCR. Poorly performing samples are then culled and any remaining samples with 1–2 mismatched pairs are reanalyzed. Therefore, we do not know how many replicated assignments there were for each sample. We will ignore this problem in the analysis here, admitting that it amounts to a misspecification of the genotyping error model, though this may not be of great consequence. Since we will estimate that the certainty in individual identity is 100% for the high quality samples in this data set, we are really only concerned with the number of times the poor quality samples were genotyped. It is likely that the majority of these samples were genotyped only once and then deemed unreliable because they produced partial or no genotypes. This application is presented as a proof of concept that the Genotype SPIM can work with no replicated assignments for data sets collected in practice.

Survey Description

This data set comes from Murphy et al. (2016). We will describe the most pertinent details of the survey here and direct readers to the original publication for the full methods. This data set comes from the second year at one site of a 3 year (2011–2013), 4 site hair snare survey for black bears along the Kentucky-Virginia, USA border. We chose to use the site that encompasses parts of Pine and Black Mountains because it had the larger spatial extent of traps, encompassed the primary core of the population, and thus produced more captures and spatial recaptures. We chose the second year (2012) because it had the most samples that were not assigned individual identities by WGI.

The survey effort consisted of 81 hair snares with an average spacing of 1.6 km, operated over 8 consecutive 1-week long capture occasions. A total of 340 samples were collected and sent to WGI for microsatellite genotyping. To reduce the costs of genotyping, only 1 sample per trap per occasion was selected for genotyping, following sample randomization within each trap x occasion combination. A total of 179 samples remained after subsampling, for which genotyping was attempted using microsatellite markers G10H, G10L, G10M, MU23, G10J, G10B, G10P, and a sex marker. One hundred fifty-four samples were assigned an individual identity (63 individuals total), leaving 25 samples that were unassigned to individual. Of these originally discarded samples, 18 provided at least 1 scored locus, leaving 7 with no partial identity information at all, other than the location where the sample was collected. There were 1, 1, 1, 2, 6, and 7 samples with 6, 5, 4, 3, 2, and 1 scored loci, respectively. In total, there were only 40 observed loci among the 18 partial genotype samples, or on average, 2.2 loci per sample. This amounted to a 16% increase in total samples and a 3% increase in samples with 3 or more loci.
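The sample accounting above can be checked with a few lines of Python. The variable names are hypothetical, and the percentage increases assume the 154 assigned samples are the denominator, which reproduces the quoted 16% and 3%.

```python
# Scored loci per sample -> number of partial genotype samples (from the text).
loci_counts = {6: 1, 5: 1, 4: 1, 3: 2, 2: 6, 1: 7}

genotyped = 179
assigned = 154

unassigned = genotyped - assigned                           # 25
partial = sum(loci_counts.values())                         # 18
no_information = unassigned - partial                       # 7
total_loci = sum(k * v for k, v in loci_counts.items())     # 40
mean_loci = total_loci / partial                            # about 2.2

# ASSUMPTION: increases are relative to the 154 assigned samples.
pct_increase_all = 100 * unassigned / assigned              # about 16%
three_plus = sum(v for k, v in loci_counts.items() if k >= 3)
pct_increase_three_plus = 100 * three_plus / assigned       # about 3%

print(unassigned, partial, no_information, total_loci)
print(round(mean_loci, 1), round(pct_increase_all), round(pct_increase_three_plus))
```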

Statistical Methods

As with the fisher application in the main text, we fit the Genotype SPIM using all 179 samples and fit a regular SCR model using the 154 samples assigned individual identities by WGI for comparison. We used a Bernoulli observation model instead of the Poisson because the subsampling process for the hair samples ensured that a maximum of 1 sample per individual could be collected per trap on each occasion. Murphy et al. (2016) previously demonstrated via simulation that treating these subsampled detection data as Bernoulli is appropriate; this setting differs from typical single-catch traps because the capture order of individuals does not influence the subsampling protocol. For both the Genotype SPIM and SCR models, we used the individual heterogeneity detection function model described in the main text and we used a Gamma(3,7) prior for σsd, the parameter determining the level of individual-level variability in σi and λi. For the Genotype SPIM, we split samples into “high” and “low” quality categories, determined by whether or not WGI assigned them an individual identity, and estimated separate genotype observation parameters for each quality category.

Unlike the fisher application in the main text, we included the sex marker as a partial identity covariate that could be observed with error. We used the same genotyping error model for this covariate as we did for the other microsatellite markers. This treatment introduces a misspecification of the genotyping error process for the sex marker, but it should have a negligible impact on this analysis for two reasons. First, the female sex marker has a value of 250.250, while the male marker has a value of 204.250. We assumed that the value 204.204 was also in the population, but because 204.204 was never observed, the frequency of this marker is estimated at <1%. Second, we assumed that the mechanisms that lead to allelic dropout and false alleles in microsatellites also apply to this marker, but this marker presumably has different error mechanisms and a lower rate of error. Thus, this misspecification likely decreases the estimated rate of allelic dropout and false allele events which apply to the other microsatellites. However, the high quality samples were estimated to be genotyped with near certainty and only one poor quality sample was sexed, limiting the effect of this misspecification for the low quality samples of interest.

For the regular SCR model, we ran 3 chains for 200,000 iterations, thinned by 50, and discarded the first 5000 samples of each chain as burn in. For the Genotype SPIM, we ran 1 chain for 800,000 iterations, thinned by 50, and discarded the first 325,000 as burn in. We present posterior modes as point estimates, except for parameters that sum to 1 (e.g., genotype observation probabilities), where we used the posterior mean. We present 95% highest posterior density (HPD) intervals as interval estimates. We use the coefficient of variation (CV) to compare the precision of each model, which we define as the posterior standard deviation divided by the posterior mode.
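The interval and precision summaries defined above can be sketched from a vector of posterior draws as follows. This is a generic Python illustration rather than the authors' code: the HPD interval is taken as the shortest window containing the requested fraction of sorted draws, and the point estimate (the posterior mode in the text) is passed in rather than estimated.

```python
import math
import statistics


def hpd_interval(samples, mass=0.95):
    """Shortest interval containing `mass` of the posterior draws."""
    draws = sorted(samples)
    n = len(draws)
    n_in = max(1, math.ceil(mass * n - 1e-9))   # number of draws inside the interval
    start = min(range(n - n_in + 1), key=lambda i: draws[i + n_in - 1] - draws[i])
    return draws[start], draws[start + n_in - 1]


def coefficient_of_variation(samples, point_estimate):
    """Posterior standard deviation divided by the chosen point estimate."""
    return statistics.stdev(samples) / point_estimate


# Hypothetical draws with one extreme value in the right tail.
draws = [1.0, 2.0, 3.0, 4.0, 100.0]
interval_80 = hpd_interval(draws, mass=0.8)
cv = coefficient_of_variation([1.0, 2.0, 3.0, 4.0, 5.0], point_estimate=3.0)

print(interval_80)  # (1.0, 4.0)
print(cv)
```

For skewed posteriors the HPD interval excludes the long tail, which is why it can differ noticeably from an equal-tailed percentile interval.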

Results

The Genotype SPIM and SCR models produced nearly equivalent estimates for abundance (Figure D1, Table D1). The Genotype SPIM abundance point estimate was 2.7% lower than the SCR estimate, with a CV that was 1.1% lower, demonstrating that the partial genotype samples very slightly improved precision. Including the low quality samples in the Genotype SPIM increased the overall detection parameter, a0, from 1.42 to 1.81 and the estimated number of individuals captured was raised to 69, compared to the 63 available in the high quality samples. The Genotype SPIM produced slightly more precise estimates of both spatial scale parameters, σ and σsd, likely due to the increased number of spatial recaptures. There were 35 spatial recaptures in the high quality samples used in the SCR analysis and an estimated 50 when including the low quality samples in the Genotype SPIM analysis. The minimum value of the Genotype SPIM posterior for the number of spatial recaptures was 43, indicating that the poor quality samples added 7 spatial recaptures with near certainty.

The high quality samples were estimated to have a nearly perfect genotype correct observation probability, and the low quality samples were estimated to be relatively reliable, but with a large amount of uncertainty (Figure D2, Table D2). We compare 12 of the 18 partial genotype samples with their highest posterior probability matches to complete genotype samples in Table D3. Five partial genotype samples were matched with individuals in the complete genotype samples with a probability of greater than 0.97 (Table D3). These samples were scored at 2–6 loci. Generally, samples scored at 2 or fewer loci did not provide high probability matches. Samples with 1 or 2 scored loci were more likely to match more than one individual with similar probabilities (e.g., samples 8, 18, and 19). Of the 12 samples displayed in the table, 11 did not contain any genotyping errors when associated with their highest probability matches. Partial genotype sample 147 was scored at 4 loci and did not match any complete genotype individual at all loci. It did match a complete genotype individual at 2 loci, which was captured at the same trap. This assignment was given a 0.20 probability and implied 2 false alleles, though the event of a correct assignment of no match was assigned a higher probability of 0.80.

Discussion

We demonstrated that the Genotype SPIM can work with no replicated assignments for a real world data set. In this case, there was only a negligible improvement in the inference about abundance; however, this data set contained very minimal information in the added low quality samples: only 40 scored loci in total and only a 3% increase in samples when considering the more informative samples scored at 3 or more loci. Further, the paucity of partial genotype scores led to very imprecise estimates of the poor quality sample genotype observation probabilities (Figure D2). It is likely for this reason that sample 147 had a non-negligible posterior probability (0.20) of matching a complete genotype sample with 2 false allele scores captured at the same trap. Interestingly, the presumably correct assignment of no match was more probable (0.80). If no genotype information was available at all, as is the case with unmarked SCR (Chandler and Royle 2013), it is likely the incorrect assignment would be made with a higher probability as these two samples were the only ones recorded at this trap. We expect that if more partial genotype samples were available, the false allele probability would be estimated more precisely around a very low value and this presumably incorrect match would be less likely or ruled out. Still, we believe the Genotype SPIM assignments like this made with very imprecise error probability estimates will be an improvement over a spatial partial identity model that discards this information (e.g., the “random thinning” model discussed in the main text).

The high quality samples for this bear data set were estimated to be scored correctly with near certainty, compared to a 0.185 estimated allelic dropout rate per replicated assignment for the fisher data set. In fact, this bear data set was estimated to contain 0 errors with probability 1. While there may have been differences in the DNA quantity and quality between the fisher and bear hair samples, the lab methods used by WGI (Paetkau 2003) likely reduced the error rate by including more DNA product in the first and often only PCR, compared to the modified multi-tubes approach (Frantz et al. 2003) used for the fisher data set, requiring the DNA product to be split across up to 7 PCRs (Waits and Paetkau 2005). We expect the multi-tubes approach to yield more reliable estimates from the Genotype SPIM due to the explicit replicated assignments; however, it is more expensive (Waits and Paetkau 2005). The reliability of the Genotype SPIM estimates using data from WGI or other labs using the methods of Paetkau (2003) could be improved by using the replicated assignment data for the samples where multiple PCRs were done.

Acknowledgements

We thank Sean Murphy and John Cox for making this black bear data set available for us to use.

Figures

Figure D1: SCR process and observation model posterior distributions from the SCR and SPIM analyses. a0 is the overall detection parameter, σ is the population-level detection function spatial scale parameter in km, σsd is the standard deviation of the individual-level variance in the spatial scale parameter, n is the number of individuals captured, and N is the population abundance. The number of captured individuals and spatial recaptures in the SCR analysis are known statistics.

Figure D2: Heterozygous single locus genotype observation probability posterior distributions for low quality samples. See Table D2 for full observation probability results.

Tables

Table D1: SCR process and observation model parameter estimates from the SCR and SPIM analyses. a0 is the overall detection parameter, σ is the population-level detection function spatial scale parameter in km, σsd is the standard deviation of the individual-level variance in the spatial scale parameter, N is the population abundance, n is the number of individuals captured, and D is the population density (individuals/100 km²). Posterior modes are presented as point estimates, posterior standard deviations/posterior modes are presented as the coefficient of variation, and 95% HPD interval upper and lower bounds are presented as interval estimates.

       SCR                           SPIM
       Est    CV    LB     UB        Est    CV    LB     UB
a0     1.42   18.2  1.07   2.06      1.86   17.0  1.35   2.57
σ      1.14   28.7  0.61   1.90      1.20   27.5  0.63   1.90
σsd    0.75   38.1  0.44   1.49      0.86   30.3  0.49   1.46
N      294    18.1  203    406       279    18.1  203    397
n      63                            68     2.7   64     71
D      17.99  18.1  12.43  24.86     17.08  18.1  12.49  24.37


Table D2: Single locus genotype observation probability parameter estimates for high and low quality samples. “Het” indicates a heterozygous single locus genotype and “hom” indicates a homozygous single locus genotype. “Correct”, “AD”, and “FA” indicate a correct, allelic dropout, and false allele observation, respectively. Posterior modes are presented as point estimates and 95% HPD interval upper and lower bounds are presented as interval estimates.

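| Class-Type | High Quality Est | High Quality LB | High Quality UB | Low Quality Est | Low Quality LB | Low Quality UB |
|---|---|---|---|---|---|---|
| Het-Correct | 0.997 | 0.993 | 1.000 | 0.870 | 0.722 | 0.995 |
| Het-AD | 0.001 | 0.000 | 0.004 | 0.040 | 0.000 | 0.117 |
| Het-FA | 0.001 | 0.000 | 0.004 | 0.090 | 0.000 | 0.228 |
| Hom-Correct | 0.996 | 0.988 | 1.000 | 0.910 | 0.745 | 1.000 |
| Hom-FA | 0.004 | 0.000 | 0.012 | 0.090 | 0.000 | 0.255 |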


Table D3: Twelve of the 18 partial genotype samples compared to their highest posterior probability matches among the complete genotype samples. The observed values at 7 microsatellite loci (L1 - L7) and the sex marker for the partial genotype sample are listed, followed by the values for the complete genotype samples they matched with. P(match) is the posterior probability of the pairwise match between the partial and complete genotype sample. A “.” indicates a missing value. Mismatches are colored in red.

| Samp | L1 | L2 | L3 | L4 | L5 | L6 | L7 | Sex | P(match) |
|---|---|---|---|---|---|---|---|---|---|
| 14 | 158.158 | 135.135 | 231.245 | 187.199 | 155.155 | . | 201.201 | . | |
| | 158.158 | 135.135 | 231.245 | 187.199 | 155.155 | 212.212 | 201.201 | 250.250 | 1.00 |
| 64 | . | 135.149 | 241.245 | 187.203 | 157.159 | 210.216 | . | . | |
| | 154.154 | 135.149 | 241.245 | 187.203 | 157.159 | 210.216 | 191.195 | 250.250 | 1.00 |
| 178 | . | . | 245.245 | 187.195 | . | . | 187.205 | . | |
| | 154.158 | 147.155 | 245.245 | 187.195 | 151.159 | 206.214 | 187.205 | 204.250 | 1.00 |
| 88 | . | . | . | 191.199 | 155.157 | . | . | . | |
| | 158.158 | 147.147 | 241.241 | 191.199 | 155.157 | 206.212 | 187.197 | 204.250 | 0.99 |
| 54 | . | . | . | 187.199 | 159.159 | 212.216 | . | . | |
| | 158.164 | 147.155 | 241.255 | 187.199 | 159.159 | 212.216 | 195.205 | 204.250 | 0.97 |
| 146 | . | . | 239.245 | 187.199 | . | . | . | . | |
| | 158.160 | 147.147 | 239.245 | 187.199 | 155.159 | 206.216 | 187.197 | 204.250 | 0.73 |
| 95 | . | . | . | 187.199 | . | . | . | 250.250 | |
| | 158.160 | 135.159 | 237.241 | 187.199 | 155.159 | 212.216 | 187.201 | 250.250 | 0.55 |
| 125 | . | . | . | 187.199 | 159.159 | . | . | . | |
| | 154.164 | 135.153 | 231.251 | 187.199 | 159.159 | 208.210 | 187.187 | 204.250 | 0.57 |
| 18 | . | . | . | 187.203 | . | . | . | . | |
| | 154.154 | 135.149 | 241.245 | 187.203 | 157.159 | 210.216 | 191.195 | 250.250 | 0.50 |
| | 154.156 | 135.157 | 241.245 | 187.203 | 159.159 | 210.210 | 187.187 | 250.250 | 0.48 |
| 8 | . | . | . | 187.199 | 159.163 | . | . | . | |
| | 154.160 | 137.147 | 241.245 | 187.199 | 159.163 | 206.216 | 187.201 | 250.250 | 0.42 |
| | 158.160 | 135.153 | 237.241 | 187.199 | 159.163 | 212.214 | 187.201 | 204.250 | 0.38 |
| 19 | . | . | . | . | 159.159 | . | . | . | |
| | 154.156 | 135.157 | 241.245 | 187.203 | 159.159 | 210.210 | 187.187 | 250.250 | 0.35 |
| | 156.156 | 147.153 | 237.245 | 195.199 | 159.159 | 212.216 | 187.189 | 204.250 | 0.20 |
| | 154.158 | 149.149 | 245.251 | 187.201 | 159.159 | 210.216 | 187.187 | 204.250 | 0.13 |
| 147 | . | . | 241.245 | 187.199 | 155.159 | 206.214 | . | . | |
| | 154.158 | 147.147 | 241.245 | 187.191 | 159.163 | 206.214 | 187.201 | 250.250 | 0.20 |


References

Chandler, R. B. and J. A. Royle. 2013. Spatially explicit models for inference about density in unmarked or partially marked populations. The Annals of Applied Statistics, 7:936–954.

Frantz, A., L. Pope, P. Carpenter, T. Roper, G. Wilson, R. Delahay, and T. Burke. 2003. Reliable microsatellite genotyping of the Eurasian badger (Meles meles) using faecal DNA. Molecular Ecology, 12:1649–1661.

Murphy, S. M., J. J. Cox, B. C. Augustine, J. T. Hast, J. M. Guthrie, J. Wright, J. McDermott, S. C. Maehr, and J. H. Plaxico. 2016. Characterizing recolonization by a reintroduced bear population using genetic spatial capture–recapture. The Journal of Wildlife Management, 80:1390–1407.

Paetkau, D. 2003. An empirical exploration of data quality in DNA-based population inventories. Molecular Ecology, 12:1375–1387.

Waits, L. P. and D. Paetkau. 2005. Noninvasive genetic sampling tools for wildlife biologists: a review of applications and recommendations for accurate data collection. The Journal of Wildlife Management, 69:1419–1433.


Estimating Animal Density Using Genetic Markers Observed with Error

Ben C. Augustine, J. Andrew Royle, Daniel W. Linden, and Angela K. Fuller

December 13, 2019

Appendix E: General Observation Error Model Simulations and Andean Bear Application

In this Appendix, we provide a short simulation study of the catSPIM with general observation error (catSPIM-OE) described in Appendix A, followed by an application to an Andean bear camera trap data set where facial markings are used as categorical identity covariates.

Simulation Study

Simulation Specifications

We conducted a simulation study to demonstrate the performance of catSPIM-OE, compare it to catSPIM without observation error, and illustrate the effects of varying the number of categorical identity covariates, ncat, and the number of replicated assignments, nreps. We replicated Scenario A2 from Augustine et al. (2019), which was the most favorable scenario considered in terms of expected catSPIM performance. In this scenario, the values of population density and σ led to the least home range overlap across individuals among the scenarios considered and thus the least uncertainty in individual identity. We recreated this scenario using a 9 x 9 grid of traps with unit spacing and a 3σ buffer. We set N = 39, λ0 = 0.25, σ = 0.5, and K = 10. We considered 2-5 2-level identity covariates (n^levels_l = 2 for all l), i.e., ncat ∈ (2,3,4,5), with correct observation probabilities for each category level of 1, 0.9, 0.8, and 0.7. For example, for a correct observation probability of 0.9, π_l,11 = π_l,22 = 0.9 and π_l,21 = π_l,12 = 0.1. For the perfect classification rate, we fit the regular catSPIM model. For the imperfect observation probabilities, we considered nreps ∈ (3,5,7). The two covariate value frequencies for all identity covariates were set to 0.5, i.e., γ_l = (0.5, 0.5). These specifications led to 4 perfect observation scenarios (one per value of ncat) and 36 imperfect observation scenarios (4 values of ncat x 3 correct observation probabilities x 3 values of nreps).

We simulated 96 data sets for each of the 40 scenarios. Then, we ran 3 MCMC chains for 100,000 - 150,000 iterations, depending on the MCMC mixing characteristics of each scenario, with chains thinned by 50 iterations, and discarded 12,500 iterations as burn-in. We calculated the Gelman-Rubin statistic, Rc (Gelman et al. 1992), using Rc < 1.1 for N to indicate convergence. We discarded simulated data sets that did not meet this criterion. Within each set of scenarios with the same correct observation probability, if convergence was indicated for over 99% of the data sets for a given level of nreps, we reverted to running a single chain for the next level of nreps (i.e., if Rc < 1.1 for >99% of simulations with nreps = 3, we assumed all chains would converge for nreps ∈ (5,7) and used only 1 chain). We used posterior modes for point estimates and 95% highest posterior density (HPD) intervals for interval estimates. Finally, when estimating the elements of π and γ, we pooled parameters across identity covariates l, so we drop the l indicator for π and γ in the results below.
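The data-generating process for one simulated data set can be sketched as follows. This is a minimal illustration, not the authors' simulation code: it assumes a Poisson count observation model with a half-normal detection function, activity centers distributed uniformly over a square state space formed by buffering the trap array by 3σ, and independent misclassification of each replicated assignment. The function names `simulate_scr_counts` and `simulate_replicated_covariates` are hypothetical.

```python
import numpy as np


def simulate_scr_counts(N=39, lam0=0.25, sigma=0.5, K=10, grid=9,
                        buffer_mult=3.0, seed=0):
    """Simulate trap-level counts for N individuals on a square trap grid."""
    rng = np.random.default_rng(seed)
    # grid x grid traps with unit spacing
    gx, gy = np.meshgrid(np.arange(1, grid + 1), np.arange(1, grid + 1))
    traps = np.column_stack([gx.ravel(), gy.ravel()]).astype(float)
    # Assumed square state space: trap extent plus buffer_mult * sigma
    lo = traps.min() - buffer_mult * sigma
    hi = traps.max() + buffer_mult * sigma
    centers = rng.uniform(lo, hi, size=(N, 2))
    # Half-normal detection rate per individual and trap (assumed form)
    d2 = ((centers[:, None, :] - traps[None, :, :]) ** 2).sum(axis=2)
    lam = lam0 * np.exp(-d2 / (2.0 * sigma ** 2))
    # Poisson counts summed over the K occasions (assumed observation model)
    y = rng.poisson(K * lam)
    return traps, centers, y


def simulate_replicated_covariates(y, n_cat=2, n_reps=3, p_correct=0.9,
                                   gamma=(0.5, 0.5), seed=1):
    """Draw true 2-level covariates and error-prone replicated observations."""
    rng = np.random.default_rng(seed)
    N = y.shape[0]
    # True covariate values (1 or 2) for every individual and covariate
    G_true = rng.choice([1, 2], size=(N, n_cat), p=gamma)
    # One row per detection; in a SPIM these identities are latent
    ids = np.repeat(np.arange(N), y.sum(axis=1))
    # Each replicate is recorded correctly with probability p_correct,
    # otherwise it is flipped to the other level (3 - value swaps 1 and 2)
    correct = rng.random((ids.size, n_cat, n_reps)) < p_correct
    truth = G_true[ids][:, :, None]
    G_obs = np.where(correct, truth, 3 - truth)
    return ids, G_true, G_obs
```

Calling `simulate_scr_counts()` and passing the resulting `y` to `simulate_replicated_covariates(y, n_cat=4, n_reps=5, p_correct=0.8)` would produce one data set for the scenario with 4 identity covariates, 5 replicated assignments, and a correct observation probability of 0.8.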

Simulation Results

The abundance point estimates were approximately unbiased and the 95% credible intervals had approximately nominal coverage across all scenarios (Table E1). The precision of the abundance estimates was higher when the correct observation probability was higher, when there were more identity covariates, and when there were more replicated assignments (Table E1, Figure E1). The increased precision from including additional identity covariates and replicated assignments was more pronounced when the correct observation probability was lower (Table E1, Figure E1). Point estimates of the correct observation probabilities, π11 and π22, were approximately unbiased and interval estimates had approximately nominal coverage except in some scenarios with the lowest correct observation probability (0.7) and with fewer replicated assignments (Table E2). The point estimates for the category level probabilities were approximately unbiased in all scenarios and coverage was close to nominal in most scenarios except those with the lowest correct observation probabilities (0.7, Table E2).
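The summaries reported for each scenario (mean point estimate, interval coverage, and mean interval width) can be computed from the per-data-set results along the following lines. This is a sketch with hypothetical function names; it assumes one point estimate and one interval per simulated data set. The second function gives the binomial Monte Carlo standard error of a coverage estimate, which is about 0.022 for nominal 95% coverage with 96 data sets, so coverage values within a few hundredths of 0.95 are compatible with nominal coverage.

```python
import numpy as np


def summarize_replicates(point_est, lower, upper, truth):
    """Summarize point and interval estimates across simulated data sets."""
    point_est = np.asarray(point_est, dtype=float)
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    # An interval covers when the true value lies inside its bounds
    covered = (lower <= truth) & (truth <= upper)
    return {
        "mean_estimate": point_est.mean(),
        "relative_bias": (point_est.mean() - truth) / truth,
        "coverage": covered.mean(),
        "mean_width": (upper - lower).mean(),
    }


def coverage_mc_se(nominal=0.95, n_sims=96):
    """Binomial standard error of an estimated coverage proportion."""
    return np.sqrt(nominal * (1.0 - nominal) / n_sims)
```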


Figure E1: Simulation results of abundance estimate accuracy and precision as a function of the correct observation probability, number of identity covariates, and number of replicated assignments. The top row displays box plots of the abundance point estimates and the bottom row displays the mean 95% credible interval width. The red line in the point estimate box plots indicates the true value of abundance, 39.


Table E1: Simulation results for catSPIM-OE. π indicates the correct observation probability, ncat indicates the number of identity covariates, and nreps indicates the number of replicated assignments. λ̂0 is the mean baseline detection point estimate, σ̂ is the mean detection function spatial scale estimate, N̂ is the mean abundance estimate, n̂ is the mean estimate of the number of individuals detected, N Cov is the coverage of the 95% credible interval for N, N Wid is the mean width of the 95% credible interval for N, n is the mean number of individuals detected across simulations, and Rc is the proportion of simulated data sets for which the Gelman-Rubin convergence statistic was < 1.1.

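| π | ncat | nreps | λ̂0 | σ̂ | N̂ | n̂ | N Cov | N Wid | n | Rc |
|---|---|---|---|---|---|---|---|---|---|---|
| 1.0 | 2 | 1 | 0.253 | 0.495 | 40.6 | 19.0 | 0.94 | 40.5 | 19.0 | 1.00 |
| 1.0 | 3 | 1 | 0.245 | 0.491 | 40.0 | 18.6 | 0.96 | 33.5 | 18.6 | 1.00 |
| 1.0 | 4 | 1 | 0.254 | 0.487 | 39.4 | 18.1 | 0.96 | 29.7 | 18.3 | 1.00 |
| 1.0 | 5 | 1 | 0.251 | 0.491 | 39.4 | 18.3 | 0.94 | 28.3 | 18.6 | 1.00 |
| 0.9 | 2 | 3 | 0.246 | 0.488 | 39.2 | 18.3 | 0.97 | 44.5 | 18.5 | 0.99 |
| 0.9 | 3 | 3 | 0.248 | 0.491 | 38.1 | 17.7 | 0.96 | 34.2 | 18.2 | 1.00 |
| 0.9 | 4 | 3 | 0.252 | 0.486 | 39.0 | 18.2 | 0.97 | 31.6 | 18.4 | 1.00 |
| 0.9 | 5 | 3 | 0.253 | 0.487 | 39.6 | 18.5 | 0.97 | 30.0 | 18.6 | 1.00 |
| 0.9 | 2 | 5 | 0.247 | 0.492 | 39.3 | 18.4 | 0.98 | 39.3 | 18.6 | . |
| 0.9 | 3 | 5 | 0.250 | 0.490 | 39.3 | 18.4 | 0.94 | 32.8 | 18.6 | . |
| 0.9 | 4 | 5 | 0.255 | 0.486 | 39.1 | 18.2 | 0.94 | 30.0 | 18.4 | . |
| 0.9 | 5 | 5 | 0.251 | 0.489 | 39.1 | 18.2 | 0.92 | 28.5 | 18.4 | . |
| 0.9 | 2 | 7 | 0.245 | 0.484 | 40.2 | 18.6 | 0.97 | 41.9 | 18.5 | . |
| 0.9 | 3 | 7 | 0.261 | 0.480 | 39.4 | 18.3 | 0.97 | 32.2 | 18.4 | . |
| 0.9 | 4 | 7 | 0.250 | 0.489 | 39.7 | 18.3 | 0.96 | 29.7 | 18.5 | . |
| 0.9 | 5 | 7 | 0.242 | 0.499 | 38.7 | 18.1 | 0.95 | 28.0 | 18.4 | . |
| 0.8 | 2 | 3 | 0.241 | 0.484 | 39.4 | 18.6 | 0.99 | 66.0 | 18.7 | 0.88 |
| 0.8 | 3 | 3 | 0.249 | 0.484 | 39.2 | 18.3 | 0.97 | 55.0 | 18.3 | 0.94 |
| 0.8 | 4 | 3 | 0.257 | 0.479 | 39.0 | 18.2 | 0.97 | 40.0 | 18.3 | 0.99 |
| 0.8 | 5 | 3 | 0.244 | 0.486 | 39.2 | 18.3 | 0.95 | 36.5 | 18.5 | 0.99 |
| 0.8 | 2 | 5 | 0.252 | 0.488 | 39.4 | 18.4 | 0.98 | 54.2 | 18.7 | 0.95 |
| 0.8 | 3 | 5 | 0.252 | 0.482 | 38.5 | 17.9 | 0.94 | 38.4 | 18.0 | 1.00 |
| 0.8 | 4 | 5 | 0.253 | 0.486 | 39.0 | 18.1 | 0.95 | 33.5 | 18.3 | 0.99 |
| 0.8 | 5 | 5 | 0.252 | 0.488 | 39.7 | 18.6 | 0.97 | 31.1 | 18.7 | 1.00 |
| 0.8 | 2 | 7 | 0.254 | 0.492 | 38.6 | 18.2 | 0.97 | 45.1 | 18.4 | . |
| 0.8 | 3 | 7 | 0.252 | 0.487 | 39.2 | 18.4 | 0.99 | 34.9 | 18.4 | . |
| 0.8 | 4 | 7 | 0.247 | 0.496 | 39.9 | 18.7 | 0.96 | 31.6 | 18.9 | . |
| 0.8 | 5 | 7 | 0.249 | 0.494 | 38.7 | 18.1 | 0.98 | 29.4 | 18.1 | . |
| 0.7 | 2 | 3 | 0.265 | 0.465 | 39.2 | 19.0 | 0.96 | 141.2 | 18.4 | 0.78 |
| 0.7 | 3 | 3 | 0.255 | 0.473 | 39.6 | 18.8 | 0.94 | 100.7 | 18.6 | 0.85 |
| 0.7 | 4 | 3 | 0.238 | 0.509 | 39.1 | 18.5 | 0.96 | 78.8 | 18.5 | 0.86 |
| 0.7 | 5 | 3 | 0.249 | 0.481 | 39.6 | 18.6 | 0.98 | 64.6 | 18.4 | 0.88 |
| 0.7 | 2 | 5 | 0.250 | 0.482 | 38.1 | 18.1 | 0.96 | 110.2 | 18.1 | 0.76 |
| 0.7 | 3 | 5 | 0.241 | 0.488 | 39.7 | 18.6 | 0.97 | 69.0 | 18.4 | 0.90 |
| 0.7 | 4 | 5 | 0.253 | 0.478 | 39.3 | 18.5 | 0.97 | 51.9 | 18.3 | 0.95 |
| 0.7 | 5 | 5 | 0.239 | 0.489 | 39.0 | 18.3 | 0.98 | 44.5 | 18.3 | 0.97 |
| 0.7 | 2 | 7 | 0.248 | 0.486 | 39.0 | 18.6 | 0.95 | 79.0 | 18.4 | 0.92 |
| 0.7 | 3 | 7 | 0.242 | 0.484 | 40.2 | 18.9 | 0.99 | 52.8 | 18.8 | 0.97 |
| 0.7 | 4 | 7 | 0.249 | 0.488 | 39.0 | 18.4 | 0.95 | 42.0 | 18.4 | 0.99 |
| 0.7 | 5 | 7 | 0.254 | 0.485 | 38.1 | 17.8 | 0.94 | 36.0 | 18.3 | 1.00 |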
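The Rc column above reports how often the Gelman-Rubin criterion was met. As a rough illustration of the underlying diagnostic, the basic potential scale reduction factor for one scalar parameter (here it would be N) can be computed from several equal-length chains as below. This sketch implements the uncorrected statistic; the corrected version denoted Rc in the text includes an additional degrees-of-freedom adjustment that is not reproduced here.

```python
import numpy as np


def gelman_rubin(chains):
    """Basic potential scale reduction factor for one scalar parameter.

    chains: array of shape (m, n) holding m chains of n post-burn-in draws.
    """
    chains = np.asarray(chains, dtype=float)
    m, n = chains.shape
    # Within-chain variance, averaged over chains
    W = chains.var(axis=1, ddof=1).mean()
    # Between-chain variance of the chain means, scaled by chain length
    B = n * chains.mean(axis=1).var(ddof=1)
    # Pooled estimate of the posterior variance
    var_plus = (n - 1) / n * W + B / n
    return np.sqrt(var_plus / W)
```

A data set would be retained under the criterion in the text when the value for N falls below 1.1.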


Table E2: Continued simulation results for catSPIM-OE. This table contains point and interval estimate summaries for π and γ. π indicates the correct observation probability, ncat indicates the number of identity covariates, and nreps indicates the number of replicated assignments. π̂11 is the mean point estimate for the probability category level 1 is correctly classified, and π̂22 is the mean point estimate for the probability category level 2 is correctly classified. These are followed by the coverage of the 95% credible interval for each. γ̂1 is the mean point estimate for the first category level probability and γ̂2 is the mean point estimate for the second category level probability. These are followed by the coverage of the 95% credible interval for each.

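| π | ncat | nreps | π̂11 | π̂22 | π11 Cov | π22 Cov | γ̂1 | γ̂2 | γ1 Cov | γ2 Cov |
|---|---|---|---|---|---|---|---|---|---|---|
| 0.9 | 2 | 3 | 0.90 | 0.90 | 0.93 | 0.95 | 0.52 | 0.50 | 0.99 | 0.99 |
| 0.9 | 3 | 3 | 0.90 | 0.90 | 0.94 | 0.93 | 0.50 | 0.50 | 0.97 | 0.97 |
| 0.9 | 4 | 3 | 0.90 | 0.90 | 0.92 | 0.94 | 0.50 | 0.50 | 0.98 | 0.98 |
| 0.9 | 5 | 3 | 0.90 | 0.90 | 0.94 | 0.97 | 0.49 | 0.51 | 0.95 | 0.95 |
| 0.9 | 2 | 5 | 0.90 | 0.90 | 0.95 | 0.96 | 0.50 | 0.50 | 0.96 | 0.96 |
| 0.9 | 3 | 5 | 0.90 | 0.90 | 0.94 | 0.95 | 0.50 | 0.50 | 0.94 | 0.94 |
| 0.9 | 4 | 5 | 0.90 | 0.90 | 0.95 | 0.96 | 0.50 | 0.50 | 0.90 | 0.90 |
| 0.9 | 5 | 5 | 0.90 | 0.90 | 0.92 | 0.92 | 0.50 | 0.50 | 0.94 | 0.94 |
| 0.9 | 2 | 7 | 0.90 | 0.90 | 0.95 | 0.90 | 0.49 | 0.51 | 0.88 | 0.88 |
| 0.9 | 3 | 7 | 0.90 | 0.90 | 0.92 | 0.92 | 0.50 | 0.51 | 0.94 | 0.94 |
| 0.9 | 4 | 7 | 0.90 | 0.90 | 0.96 | 0.94 | 0.51 | 0.49 | 0.97 | 0.97 |
| 0.9 | 5 | 7 | 0.90 | 0.90 | 0.97 | 0.96 | 0.50 | 0.50 | 0.96 | 0.96 |
| 0.8 | 2 | 3 | 0.79 | 0.79 | 0.97 | 0.93 | 0.49 | 0.51 | 1.00 | 1.00 |
| 0.8 | 3 | 3 | 0.79 | 0.79 | 0.92 | 0.91 | 0.50 | 0.50 | 1.00 | 1.00 |
| 0.8 | 4 | 3 | 0.80 | 0.80 | 0.94 | 0.95 | 0.50 | 0.50 | 0.99 | 0.99 |
| 0.8 | 5 | 3 | 0.80 | 0.80 | 0.92 | 0.94 | 0.50 | 0.50 | 1.00 | 1.00 |
| 0.8 | 2 | 5 | 0.80 | 0.80 | 0.92 | 0.95 | 0.49 | 0.51 | 0.98 | 0.98 |
| 0.8 | 3 | 5 | 0.79 | 0.80 | 0.95 | 0.95 | 0.50 | 0.50 | 0.97 | 0.97 |
| 0.8 | 4 | 5 | 0.80 | 0.80 | 0.95 | 0.89 | 0.50 | 0.50 | 0.98 | 0.98 |
| 0.8 | 5 | 5 | 0.80 | 0.80 | 0.94 | 0.96 | 0.50 | 0.50 | 0.97 | 0.97 |
| 0.8 | 2 | 7 | 0.80 | 0.80 | 0.94 | 0.94 | 0.50 | 0.51 | 0.96 | 0.96 |
| 0.8 | 3 | 7 | 0.80 | 0.80 | 0.96 | 0.97 | 0.50 | 0.50 | 0.94 | 0.94 |
| 0.8 | 4 | 7 | 0.80 | 0.80 | 0.97 | 0.94 | 0.50 | 0.51 | 0.94 | 0.94 |
| 0.8 | 5 | 7 | 0.80 | 0.80 | 0.91 | 0.96 | 0.51 | 0.49 | 0.94 | 0.94 |
| 0.7 | 2 | 3 | 0.57 | 0.55 | 0.84 | 0.85 | 0.50 | 0.50 | 1.00 | 1.00 |
| 0.7 | 3 | 3 | 0.65 | 0.64 | 0.90 | 0.90 | 0.50 | 0.50 | 1.00 | 1.00 |
| 0.7 | 4 | 3 | 0.68 | 0.67 | 0.93 | 0.90 | 0.50 | 0.50 | 1.00 | 1.00 |
| 0.7 | 5 | 3 | 0.69 | 0.68 | 0.97 | 0.96 | 0.50 | 0.50 | 1.00 | 1.00 |
| 0.7 | 2 | 5 | 0.66 | 0.66 | 0.89 | 0.94 | 0.50 | 0.50 | 1.00 | 1.00 |
| 0.7 | 3 | 5 | 0.69 | 0.69 | 0.94 | 0.89 | 0.50 | 0.51 | 1.00 | 1.00 |
| 0.7 | 4 | 5 | 0.70 | 0.70 | 0.93 | 0.92 | 0.50 | 0.50 | 1.00 | 1.00 |
| 0.7 | 5 | 5 | 0.70 | 0.70 | 0.95 | 0.96 | 0.50 | 0.50 | 1.00 | 1.00 |
| 0.7 | 2 | 7 | 0.69 | 0.70 | 0.93 | 0.96 | 0.50 | 0.50 | 1.00 | 1.00 |
| 0.7 | 3 | 7 | 0.70 | 0.70 | 0.94 | 0.93 | 0.49 | 0.51 | 1.00 | 1.00 |
| 0.7 | 4 | 7 | 0.69 | 0.70 | 0.94 | 0.92 | 0.49 | 0.51 | 0.99 | 0.99 |
| 0.7 | 5 | 7 | 0.70 | 0.70 | 0.97 | 0.94 | 0.50 | 0.50 | 1.00 | 1.00 |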


Andean Bear Application

Data Description

Here, we describe an example of how the categorical Spatial Partial Identity Model with observation error (catSPIM-OE) can be applied to camera trap data using an Andean bear camera trap data set from Ecuador. This specific application did not work well, but we describe the general methodology and results to inform potential future applications to other camera trap data sets which may work better. The full details of this data set can be found in Springer (2018); we describe only the details relevant to this analysis here. The study was conducted in the Ecuadorian Andes across an area of 805 km². The study area was divided into 1 km² grid cells, with camera stations placed in 70 grid cells. Two cameras were deployed at camera stations in 31 grid cells, and one camera was deployed at camera stations in 39 grid cells. All camera stations were baited with a vanilla scent lure to increase the time individuals spent in front of cameras. These 70 camera stations were operated for 106 days. Photographs collected within a 5 minute period were assumed to be of the same individual and classified as a capture event. Capture events within 30 minutes of another capture event were discarded to reduce the probability that events from the same bear were counted multiple times, which would violate basic capture-recapture independence assumptions. These guidelines produced 139 capture events from an unknown number of Andean bears.
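The two time-based filtering rules above can be sketched for a single camera station as follows. The text does not specify how the windows were anchored, so this sketch makes two labeled assumptions: the 5 minute period is measured from the first photograph of an event, and when two events fall within 30 minutes of each other the later one is discarded. The function name `capture_events` is hypothetical.

```python
from datetime import datetime, timedelta


def capture_events(photo_times,
                   event_window=timedelta(minutes=5),
                   min_gap=timedelta(minutes=30)):
    """Return start times of the retained capture events at one station."""
    # Step 1: group photographs into events. A photograph starts a new
    # event when it falls more than event_window after the event start
    # (assumption: window measured from the first photograph).
    events = []
    for t in sorted(photo_times):
        if not events or t - events[-1] > event_window:
            events.append(t)
    # Step 2: drop events that start within min_gap of the last retained
    # event (assumption: the later event is the one discarded).
    kept = []
    for start in events:
        if not kept or start - kept[-1] >= min_gap:
            kept.append(start)
    return kept
```

For photographs at 0, 2, 4, 20, 60, and 61 minutes after some reference time, this returns the events starting at 0 and 60 minutes: the photographs at 2 and 4 minutes join the first event, the event at 20 minutes is discarded, and the photograph at 61 minutes joins the event at 60 minutes.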

Image Processing

Categorical identity covariates were produced from each capture event using volunteers to digitize the visible facial and neck markings across all photographs within a capture event using Adobe Photoshop. Volunteers were provided with a bear face template (Figure E1b) and asked to fill in the areas of the face and neck that contained marks with white and areas not visible with gray (e.g., Figure E1c). Areas not colored by the volunteers were black, indicating no marks were present in these areas. A total of 9 volunteers digitized the bear markings, with 3 observers classifying each capture event. After all facial and neck markings were digitized by multiple observers in continuous space, categorical identity covariates were created by discretizing the face into grid cells. The brow region was identified as the area of the face that varied the most across individuals and where there was the most agreement across observers (Figure E2). The muzzle region was almost always classified as marked because almost all bears had markings in this area, and agreement between observers was high because these markings were the easiest to digitize accurately. Conversely, almost all bears had markings in the neck region, but the observers varied considerably in whether or not they actually digitized these areas, and agreement was lower because this region was difficult to digitize accurately. Thus, we focused on the brow region. We used 3, 5, 10, and 20 grid cells to represent the brow markings, but only present results using 10 grid cells. If a grid cell intersected an area colored white, it was classified as “2” (marked). If a grid cell intersected an area colored black, but no areas colored white, it was classified as “1” (unmarked). If a grid cell intersected an area colored gray and no areas colored white, it was classified as “0”, indicating a missing observation.
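The grid cell classification rule can be sketched as below for a drawing stored as a grayscale array. The pixel codes (255 for white, 128 for gray, 0 for black), the boolean-mask representation of grid cells, and the function name are all assumptions made for illustration. The rules as written overlap when a cell contains both black and gray but no white; this sketch assumes such a cell is treated as missing, which may not match the rule actually applied.

```python
import numpy as np

# Hypothetical pixel codes for a digitized drawing
WHITE, GRAY, BLACK = 255, 128, 0


def classify_cell(drawing, cell_mask):
    """Classify one grid cell as 2 (marked), 1 (unmarked), or 0 (missing).

    drawing: 2-D array of pixel codes.
    cell_mask: boolean array of the same shape selecting the cell's pixels.
    """
    pixels = drawing[cell_mask]
    if np.any(pixels == WHITE):
        return 2  # any white in the cell means a mark was drawn there
    if np.any(pixels == GRAY):
        return 0  # assumption: gray without white is treated as missing
    return 1      # only black remains, so the cell is unmarked
```

Applying `classify_cell` to each of the 10 brow cells for each of the 3 observers of a capture event would give one 10 x 3 slice of the observed identity covariate array.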


CatSPIM-OE Model

The image processing described above produced a data set of 139 capture locations stored in y^obs and the observed categorical identity covariates stored in G^obs of dimension 139 x 10 x 3 (10 categorical identity covariates and 3 observer classifications). We assumed that each categorical identity covariate had its own probability of containing a mark across individuals (γ_l estimated separately for all l); however, we held the category level observation probabilities, π, constant across identity covariates due to data sparsity. Because there are only two possible identity covariate values (marked or not marked), there were 4 observation probabilities to be estimated,

\[
\pi = \begin{bmatrix} \pi_{11} & \pi_{12} \\ \pi_{21} & \pi_{22} \end{bmatrix}
\]

These probabilities correspond to correctly classifying an unmarked area as unmarked (π11), correctly classifying a marked area as marked (π22), incorrectly classifying an unmarked area as marked (π21), and incorrectly classifying a marked area as unmarked (π12).
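To illustrate how replicated observer classifications inform the true value of one face region, the sketch below applies Bayes' rule to a single capture event, treating the observers as conditionally independent given the true value. It is an illustration of the observation model only, not the authors' MCMC update, which also has to account for uncertainty in which samples belong to which individual. The function name is hypothetical; the matrix follows the layout above, with rows indexing the recorded value and columns the true value.

```python
import numpy as np


def true_value_posterior(obs, gamma, pi):
    """Posterior over the true covariate value (1 or 2) for one region.

    obs: recorded values from each observer (1, 2, or 0 for missing).
    gamma: prior frequencies of true values 1 and 2.
    pi: 2 x 2 array with pi[r - 1, c - 1] = P(recorded r | true c).
    """
    post = np.array(gamma, dtype=float)
    for r in obs:
        if r == 0:
            continue  # a missing observation carries no information
        post *= pi[r - 1, :]
    return post / post.sum()


# Point estimates from the Andean bear analysis, used here as fixed inputs
pi_hat = np.array([[0.97, 0.26],
                   [0.03, 0.74]])
```

With equal prior frequencies, `true_value_posterior((2, 1, 2), (0.5, 0.5), pi_hat)` strongly favors a true value of 2 (marked), because recording a mark where none exists is estimated to be rare while missing a mark is comparatively common.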

We fit the null model described in the main text that assumes the detection function parameters do not vary across individuals, but also tried models with individual heterogeneity in detection function parameters and/or a behavioral response to capture. We also considered uninformative (Uniform(0,∞)) and various informative priors for σ. None of these models yielded plausible results, so we describe only the null model specifications with an uninformative prior here. We ran 1 MCMC chain for 100,000 iterations, discarding the first 50,000 as burn-in. We used posterior modes for point estimates and 95% HPD intervals for interval estimates.
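For readers unfamiliar with HPD intervals, a common sample-based approximation is the shortest interval that contains the desired fraction of the sorted posterior draws. The sketch below shows that approach along with a simple mode for integer-valued parameters such as N or n. The function names are hypothetical, and this is not necessarily the routine used for the estimates reported here; for continuous parameters a posterior mode would typically come from a density estimate instead.

```python
import numpy as np


def hpd_interval(samples, prob=0.95):
    """Shortest interval containing a fraction prob of the samples."""
    x = np.sort(np.asarray(samples, dtype=float))
    n = x.size
    k = int(np.ceil(prob * n))  # number of draws the interval must contain
    # Width of every window of k consecutive sorted draws
    widths = x[k - 1:] - x[: n - k + 1]
    i = int(np.argmin(widths))
    return x[i], x[i + k - 1]


def discrete_mode(samples):
    """Most frequent value among integer-valued posterior draws."""
    values, counts = np.unique(np.asarray(samples), return_counts=True)
    return values[np.argmax(counts)]
```

With a single chain of 100,000 iterations stored in an array `chain`, the burn-in described above corresponds to summarizing `chain[50000:]`.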

Results and Discussion

We estimated that we captured 38 individuals in the 139 capture events and estimated a density of 10.69 individuals/100 km² (Table E3). Further, we estimated σ at 0.63 km. We can compare these estimates to previous estimates from a subset of this study area using SCR where the photographed individuals were presumably easier to identify confidently (Molina et al. 2017). Molina et al. (2017) estimated a lower density at 7.45 bears/100 km² and a substantially larger σ at 2.8 km. The σ estimate from Molina et al. (2017) is more in line with σ estimates from other bear populations. We believe we substantially underestimated σ and thus overestimated density. If so, this result would be consistent with previous simulations of unmarked SCR showing an underestimation of σ and overestimation of density when there is a high level of home range overlap among individuals (Augustine et al. 2019). A second line of evidence that we overestimated density is that we were able to identify 11 individuals from their facial markings with a relatively high level of confidence, and we subjectively estimate there were probably 15 - 25 observed bears in the 139 capture events, not the 38 we estimated. Therefore, we do not view these parameter estimates as plausible.

The population-level category level frequencies and category level observation probabilities, however, were plausible. The population frequencies with which each of the 10 facial regions contained a mark were estimated from 0.174 to 0.716 and roughly corresponded with the pattern seen in the composite bear face drawing in Figure E2. Generally, areas closer to the center of the brow and lower on the brow had a higher probability of containing a mark than the outer and upper brow regions. We estimated that if a face region had no mark, it was classified as having no mark with high probability (0.97, Table E3). However, we estimated that observers less consistently recorded face regions with marks as having marks (0.74).

It is difficult to pinpoint one specific reason this application did not work as expected. We speculate that perhaps the two largest problems were that there was not enough variation in facial markings between bears and that observers could not record them reliably enough. Multiple bears had no facial markings at all, and others may have the same values in all facial regions when the face is discretized into only a few regions. Further, the photos were often blurry due to the low level of light under the forest canopy. Finally, there is a negative relationship between the number of facial regions used and the agreement between observers. As the face is broken into smaller regions, the observers are more likely to disagree about which facial regions contain marks. Therefore, this system may not have a high enough signal to noise ratio to reliably extract information about individual identity.

A second factor that might have prevented this application from working well is a violation of the SCR observation model. Individual heterogeneity in detection function parameters has been identified as a problem for spatial partial identity models (SPIMs) such as this one (Augustine et al. 2018, 2019) and, in our experience, including a behavioral response to capture can be required to correctly estimate σ and the number of individuals captured in SPIMs. We attempted to fit these models, but this data set was much too sparse for reliable estimation of these more complex models. We also suspect that there was a high level of variability in the detection rate across sites, with some sites recording many photographs of multiple bears, while other nearby sites recorded no photographs. In effect, this is caused by missing site covariates, perhaps related to fine scale habitat quality or variability in bear use along different trails. We expect regular SCR is relatively robust to missing site covariates, but SPIMs will be less robust because they will tend to break the captures of single individuals into two or more individuals.

A third factor that might have prevented this application from working well is a violation of the assumptions about the face region identity covariates and/or their observation probabilities. In particular, the face region covariates are likely not independent as we assumed. For example, if a bear has markings at the edges of the brow, it almost certainly has markings in the center of the brow. Thus, we may need to consider spatial correlation in the mark patterns. Finally, there is likely heterogeneity in the category level observation probabilities across the face region covariates, which we could not accommodate with such a sparse data set and small number of observers. Observers may be more likely to agree about markings in the center of the brow than at the edges.

Despite the poor performance of this model for this application, we expect that other camera-based applications may work well, specifically ones where there is more variability in markings across individuals and a higher signal to noise ratio. Ideally, researchers will have independent density estimates for the same study area against which the SPIM estimates can be compared to increase confidence that they are reliable.


Figures

Figure E1: An example digitization of Andean bear face covariates. Panel (a) is a single photograph capture event, panel (b) is the template used for digitizing marks, and panel (c) is the digitized marks from one observer.


Figure E2: A composite of all bear face drawings across the 139 capture events with the 10 facial regions used as categorical identity covariates depicted in red. Darker areas indicate areas more consistently drawn as “marked” by observers across capture events. The numbers for the facial regions match those in Table E4, where the estimated probabilities of containing a mark are presented. All 10 facial regions were classified as “marked” in at least some drawings, despite the composite drawing not depicting marks in some regions due to the relative rarity of these events.


Tables

Table E3: Detection function, abundance, and observation probability parameter estimates for the Andean bear analysis. λ0 is the baseline detection rate, σ is the detection function spatial scale parameter, n is the number of individuals captured, N is abundance, D is density, and π_rc is the probability that identity covariates taking value c are recorded as r. Point estimates are posterior modes and “LB” and “UB” are the 95% HPD interval lower and upper bounds.

        Estimate   LB      UB
λ0      0.07       0.05    0.10
σ       0.63       0.53    0.77
N       225        138     349
n       38         33      41
D       10.69      6.65    16.67
π11     0.97       0.96    0.98
π21     0.03       0.02    0.04
π12     0.26       0.23    0.29
π22     0.74       0.71    0.77

Table E4: The estimated population frequencies with which each of the 10 facial regions contained a mark. ID Cov # is the facial region number depicted in Figure B2, and γl2 is the estimated frequency with which facial region l took value 2 (marked).

ID Cov #   γ̂l2
1          0.173
2          0.372
3          0.416
4          0.154
5          0.414
6          0.222
7          0.414
8          0.491
9          0.130
10         0.716


References

Augustine, B. C., J. A. Royle, M. J. Kelly, C. B. Satter, R. S. Alonso, E. E. Boydston, and K. R. Crooks. 2018. Spatial capture-recapture with partial identity: an application to camera traps. Annals of Applied Statistics, 11.

Augustine, B. C., J. A. Royle, S. M. Murphy, R. B. Chandler, J. J. Cox, and M. J. Kelly. 2019. Spatial capture–recapture for categorically marked populations with an application to genetic capture–recapture. Ecosphere, 10:e02627.

Gelman, A., D. B. Rubin, et al. 1992. Inference from iterative simulation using multiple sequences. Statistical Science, 7:457–472.

Molina, S., A. K. Fuller, D. J. Morin, and J. A. Royle. 2017. Use of spatial capture–recapture to estimate density of Andean bears in northern Ecuador. Ursus, 28:117–126.

Springer, V. 2018. Occupancy and co-occurrence of carnivores in the Ecuadorian Andes. Master’s thesis, Cornell University, Ithaca, NY, USA.


Selected Detailed Sample Match Information
Ben Augustine
December 21, 2019

Assignments that differed from the geneticist

Here, we look at the five Genotype SPIM identity assignments that differed from those made by the geneticist. We consider a posterior match probability of <0.99 as differing from the certain assignment made by the geneticist. Note that the scoring rules used in the original study (Linden et al. 2017) required a homozygote to be seen 3 times and a heterozygote to be seen 2 times to be called.
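The scoring rules above can be sketched in code. The following Python function is an illustration only, not the procedure used by the geneticist or by Linden et al. (2017). The function name `call_consensus`, its parameters, and the choice to let a sufficiently replicated heterozygote take precedence over a homozygote are assumptions made for this example. It is applied to the locus 1 scores for sample 3 and the locus 9 scores for sample 214 that are printed later in this document.

```python
from collections import Counter


def call_consensus(replicates, min_hom=3, min_het=2):
    """Call a consensus genotype from replicate scores such as "154 158".

    Empty strings are failed replicates and are ignored. Returns None when
    neither threshold is met.
    """
    counts = Counter(score for score in replicates if score)
    called_hets = {}
    called_homs = {}
    for score, n in counts.items():
        first, second = score.split()
        if first != second and n >= min_het:
            called_hets[score] = n
        elif first == second and n >= min_hom:
            called_homs[score] = n
    # Assumption for this sketch: a heterozygote seen often enough takes
    # precedence, because allelic dropout can make it look homozygous.
    if called_hets:
        return max(called_hets, key=called_hets.get)
    if called_homs:
        return max(called_homs, key=called_homs.get)
    return None


# Replicate scores copied from the printed data in this document.
sample3_locus1 = ["154 158", "154 158", "154 154", "154 158", "", "", ""]
sample214_locus9 = ["264 264", "264 264", "264 264", "266 266", "266 266", "264 264", ""]
# Hypothetical under-replicated locus, for illustration only.
too_few = ["262 262", "262 262", "262 268", ""]

print(call_consensus(sample3_locus1))    # 154 158
print(call_consensus(sample214_locus9))  # 264 264
print(call_consensus(too_few))           # None
```

Under these thresholds the heterozygote 154.158 is called for sample 3 at locus 1, while the locus 9 scores for sample 214 yield 264.264 because the heterozygote was never observed.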

Sample 3

This sample assignment differed from that made by the geneticist only because the geneticist used the sex information of the samples, which we did not use. However, this example is instructive of how the Genotype SPIM works.

Samples 2, 3, and 4 were assigned the same individual identity by the geneticist. The Genotype SPIM assigns sample 3 to the same individual as the individual that produced samples 2 and 4 with probability 0.73. We will label this individual as individual A. The remainder of the match probability (0.27) is allocated to the event that sample 3 actually matches the same individual as the individual that produced sample 1, also collected at the same trap as samples 2-4. We will label this individual as individual B.

If sample 3 belongs to individual A, there were most likely 4 allelic dropouts for sample 3 at locus 2 (136.138 to 138.138). If sample 3 belongs to individual B, there were most likely 4 allelic dropouts for sample 3 at locus 5 (132.144 to 144.144). Given the most likely genotypes for individuals A and B, these 2 possible sets of allelic dropouts for sample 3 are equally likely under our genotyping error model. However, the assumed SCR observation model with compensation between λ0 and σ allocates different probabilities to these two identity assignments. Because there are two samples from individual A (2 and 4) and only one sample from individual B (1), and these samples were collected at a single trap, the λ0i for individual B is lower (conversely, individual B has a larger σi), and thus sample 3 matches individual A with higher probability than individual B under our model. This can be visualized in Plots 2 and 3 below, which show the activity center posterior more strongly concentrated around the trap of capture for individual A than for individual B due to the different capture numbers.

However, the geneticist knew that the focal sample and samples 2 and 4 came from females and sample 1 came from a male, so the correct assignment was made. Interestingly, the Genotype SPIM gives a higher match probability for the correct assignment than the incorrect assignment. This is consistent with the sex of this individual being female. If detection is proportional to space use as our observation model assumes, the greater number of detections at a single trap for the correct match leads to this individual having a larger estimated λ0i and lower estimated σi, as we would expect for females, which have smaller home range sizes than males.

The Genotype SPIM did not place any probability on the event that all four of these samples came from the same individual. If they had, there would have been 8 allelic dropouts for samples 1 and 3 at locus 2, and 12 allelic dropouts for samples 2, 3, and 4 at locus 5. This is a total of 20 allelic dropouts versus 4 if there were actually 2 individuals, making it much more likely that there were indeed 2 individuals. The sex information of the samples, which we did not use, confirms this was the case, unless there were errors in sex determination.
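The dropout tallies above can be reproduced from the replicate scores printed in this document. The Python sketch below is illustrative only; the helper names `is_dropout` and `count_dropouts` are hypothetical, and it assumes the single-individual hypothesis implies the genotypes 136.138 at locus 2 and 132.144 at locus 5. As in the text, only the samples named for each locus are tallied.

```python
def is_dropout(true_genotype, observed):
    """True if a heterozygous genotype was scored as one of its homozygotes."""
    a, b = true_genotype.split()
    x, y = observed.split()
    return a != b and x == y and x in (a, b)


def count_dropouts(true_genotype, replicates):
    """Count dropout scores among the non-empty replicates."""
    return sum(is_dropout(true_genotype, r) for r in replicates if r)


# Replicate scores copied from the printed data (empty string = no score).
locus2 = {
    1: ["138 138", "138 138", "138 138", "", "", "138 138", ""],
    3: ["138 138", "138 138", "138 138", "138 138", "", "", ""],
}
locus5 = {
    2: ["144 144"] * 4 + [""] * 3,
    3: ["144 144"] * 4 + [""] * 3,
    4: ["144 144"] * 4 + [""] * 3,
}

# One individual: heterozygous at both loci (assumed genotypes, see lead-in).
one_individual = (
    sum(count_dropouts("136 138", reps) for reps in locus2.values())
    + sum(count_dropouts("132 144", reps) for reps in locus5.values())
)

# Two individuals, sample 3 assigned to individual A (136.138 at locus 2).
two_individuals = count_dropouts("136 138", locus2[3])

print(one_individual, two_individuals)  # 20 4
```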


## [1] "Here is the data for the focal sample (3):"
##
##       locus 1   locus 2   locus 3   locus 4   locus 5   locus 6   locus 7   locus 8   locus 9
## rep 1 "154 158" "138 138" "207 213" "124 124" "144 144" "204 204" "200 204" "281 281" "262 268"
## rep 2 "154 158" "138 138" "207 213" "124 124" "144 144" "204 204" "200 204" "281 281" "262 268"
## rep 3 "154 154" "138 138" "207 213" "124 124" "144 144" "204 204" "200 204" "281 281" "262 268"
## rep 4 "154 158" "138 138" "207 213" "124 124" "144 144" "204 204" "200 204" "281 281" "262 268"
## rep 5 ""        ""        "207 213" "124 124" ""        ""        ""        "281 281" ""
## rep 6 ""        ""        "207 213" "124 124" ""        ""        ""        "281 281" ""
## rep 7 ""        ""        ""        ""        ""        ""        ""        ""        ""
##
## [1] "The focal sample always matched with at least one other sample"
##
## [1] "Here is the data for candidate sample 1 (1):"
##
##       locus 1   locus 2   locus 3   locus 4   locus 5   locus 6   locus 7   locus 8   locus 9
## rep 1 "154 158" "138 138" "207 213" "124 124" "132 144" "204 204" "192 200" "281 281" "262 268"
## rep 2 "154 158" "138 138" "207 213" "124 124" "132 144" "204 204" "192 200" "281 281" "262 268"
## rep 3 "154 158" "138 138" "207 213" "124 124" "132 144" "204 204" "200 204" "281 281" "262 268"
## rep 4 "154 158" ""        "207 213" "124 124" "132 144" "204 204" "204 204" "281 281" "262 268"
## rep 5 ""        ""        ""        ""        ""        ""        "204 204" ""        ""
## rep 6 ""        "138 138" ""        ""        ""        ""        "200 204" ""        ""
## rep 7 ""        ""        ""        ""        ""        ""        ""        ""        ""
##
## [1] "The focal sample matched this candidate with probability 0.2745"
##
## [1] "Conditional on the focal sample matching this candidate, the 3 most probable genotypes were:"
##
##      [,1]      [,2]      [,3]      [,4]      [,5]      [,6]      [,7]      [,8]      [,9]
## [1,] "154 158" "138 138" "207 213" "124 124" "132 144" "204 204" "200 204" "281 281" "262 268"
## [2,] NA        NA        NA        NA        NA        NA        NA        NA        NA
## [3,] NA        NA        NA        NA        NA        NA        NA        NA        NA
##


## [1] "These genotypes had posterior probabilities of:"
##
## [1] 1 NA NA
## [1] "Here is the data for candidate sample 2 (2):"
##
##       locus 1   locus 2   locus 3   locus 4   locus 5   locus 6   locus 7   locus 8   locus 9
## rep 1 "154 158" "138 138" "207 213" "124 124" "144 144" "204 204" "200 204" "281 281" "262 268"
## rep 2 "154 158" "138 138" "207 213" "124 124" "144 144" "204 204" "200 204" "281 281" "262 268"
## rep 3 "154 154" "136 138" "207 213" "124 124" "144 144" "204 204" "200 204" "281 281" "262 268"
## rep 4 "154 154" "136 138" "207 213" "124 124" "144 144" "204 204" "200 204" "281 281" "262 268"
## rep 5 ""        ""        ""        ""        ""        ""        ""        ""        ""
## rep 6 ""        ""        ""        ""        ""        ""        ""        ""        ""
## rep 7 ""        ""        ""        ""        ""        ""        ""        ""        ""
##
## [1] "The focal sample matched this candidate with probability 0.7255"
##
## [1] "Conditional on the focal sample matching this candidate, the 3 most probable genotypes were:"
##
##      [,1]      [,2]      [,3]      [,4]      [,5]      [,6]      [,7]      [,8]      [,9]
## [1,] "154 158" "136 138" "207 213" "124 124" "144 144" "204 204" "200 204" "281 281" "262 268"
## [2,] NA        NA        NA        NA        NA        NA        NA        NA        NA
## [3,] NA        NA        NA        NA        NA        NA        NA        NA        NA
##
## [1] "These genotypes had posterior probabilities of:"
##
## [1] 1 NA NA
## [1] "Here is the data for candidate sample 3 (4):"
##
##       locus 1   locus 2   locus 3   locus 4   locus 5   locus 6   locus 7   locus 8   locus 9
## rep 1 "154 158" "136 138" "207 213" "124 124" "144 144" "204 204" "200 204" "281 281" "268 268"
## rep 2 "154 154" "136 138" "207 213" "124 124" "144 144" "204 204" "200 204" "281 281" "262 262"
## rep 3 "154 158" "138 138" "207 207" "124 124" "144 144" "204 204" "200 204" "281 281" "262 268"
## rep 4 ""        "136 138" "213 213" "124 124" "144 144" "204 204" "200 200" "281 281" "262 262"
## rep 5 ""        ""        ""        ""        ""        ""        ""        ""        ""
## rep 6 ""        ""        ""        ""        ""        ""        ""        ""        ""


## rep 7 ""        ""        ""        ""        ""        ""        ""        ""        ""
##
## [1] "The focal sample matched this candidate with probability 0.7255"
##
## [1] "Conditional on the focal sample matching this candidate, the 3 most probable genotypes were:"
##
##      [,1]      [,2]      [,3]      [,4]      [,5]      [,6]      [,7]      [,8]      [,9]
## [1,] "154 158" "136 138" "207 213" "124 124" "144 144" "204 204" "200 204" "281 281" "262 268"
## [2,] NA        NA        NA        NA        NA        NA        NA        NA        NA
## [3,] NA        NA        NA        NA        NA        NA        NA        NA        NA
##
## [1] "These genotypes had posterior probabilities of:"
##
## [1] 1 NA NA


##
## [1] "The geneticist-assigned IDs for these samples were:"
##
## [1] 7 190 7 7

Sample 197

The geneticist grouped sample 197 with samples 195, 196, and 198 because the consensus genotypes only mismatched at 1 locus, locus 3. Sample 197 was scored as 213.215 at locus 3 across 4 replications, while the other 3 samples had a consensus score of 205.213 at locus 3. Thus, the geneticist’s grouping implies a false allele assignment occurred in sample 197 4 times; however, false alleles were estimated to be extremely rare in this data set. Therefore, the Genotype SPIM assigned a probability of 1 that sample 197 was a unique individual in the data set. This example illustrates that the type of error (allelic dropout vs. false allele) that explains the mismatch between two samples, and the number of times the error occurred, can be very informative about whether the two samples belong to the same individual or not. This information is typically not used.
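The distinction between the two error types can be made concrete with a small classifier. This Python sketch is an illustration, not part of the Genotype SPIM; the function name `classify_score` and its three labels are assumptions made for this example. A score counts as a false allele if it contains an allele absent from the assumed true genotype, and as an allelic dropout if a heterozygote is scored as one of its own homozygotes.

```python
def classify_score(true_genotype, observed):
    """Label one replicate score relative to an assumed true genotype."""
    true_alleles = set(true_genotype.split())
    observed_alleles = set(observed.split())
    if not observed_alleles <= true_alleles:
        # At least one observed allele is not in the true genotype.
        return "false allele"
    if len(true_alleles) == 2 and len(observed_alleles) == 1:
        # A heterozygote scored as one of its homozygotes.
        return "allelic dropout"
    return "correct"


# Consensus at locus 3 for samples 195, 196, and 198 is 205.213.
print(classify_score("205 213", "213 215"))  # false allele
print(classify_score("205 213", "213 213"))  # allelic dropout
print(classify_score("205 213", "205 213"))  # correct
```

Because 215 is not one of the alleles in 205.213, a 213.215 score cannot be explained by allelic dropout if sample 197 shares that consensus genotype.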

## [1] "Here is the data for the focal sample (197):"
##
##       locus 1   locus 2   locus 3   locus 4   locus 5   locus 6   locus 7   locus 8   locus 9
## rep 1 "158 158" "128 128" "213 215" "122 122" "144 144" "204 204" "192 192" "281 281" ""
## rep 2 "158 158" "128 128" "213 215" "122 122" "132 144" "204 204" "192 192" "281 281" ""


## rep 3 "158 158" "128 128" "213 213" "122 122" "132 132" "204 204" ""        "285 285" ""
## rep 4 ""        "128 128" "213 215" "122 122" "132 144" "204 204" ""        "281 285" ""
## rep 5 ""        ""        ""        ""        ""        ""        ""        ""        ""
## rep 6 ""        ""        ""        ""        ""        ""        ""        ""        ""
## rep 7 ""        ""        ""        ""        ""        ""        ""        ""        ""
##
## [1] "This sample did not match with any other samples with probability 0.997333333333333"
##
## [1] "Conditional on not matching other samples, the 3 most probable genotypes were:"
##
##      [,1]      [,2]      [,3]      [,4]      [,5]      [,6]      [,7]      [,8]      [,9]
## [1,] "158 158" "128 128" "213 215" "122 122" "132 144" "204 204" "192 192" "281 285" "264 264"
## [2,] "158 158" "128 128" "213 215" "122 122" "132 144" "204 204" "192 192" "281 285" "262 262"
## [3,] "158 158" "128 128" "213 215" "122 122" "132 144" "204 204" "192 192" "281 285" "266 266"
##
## [1] "These genotypes had posterior probabilities of:"
##
## [1] 0.1486667 0.1208333 0.1005833
##
## [1] "This sample did not match with any other samples with probability greater than 0.1"


##
## [1] "The geneticist-assigned IDs for these samples were:"
##
## [1] 5

## [1] "Here is the data for the focal sample (195):"
##
##       locus 1   locus 2   locus 3   locus 4   locus 5   locus 6   locus 7   locus 8   locus 9
## rep 1 "158 158" "128 128" "205 213" "122 122" "132 144" "204 204" "192 204" "281 285" "262 262"
## rep 2 "158 158" "128 128" "205 213" "122 122" "132 144" "204 204" "192 204" "281 285" "262 262"
## rep 3 "158 158" "128 128" "205 213" "122 122" "132 144" "204 204" "192 204" "281 285" "262 262"
## rep 4 "158 158" "128 128" "205 213" "122 122" "132 144" "204 204" ""        ""        ""
## rep 5 ""        "128 128" ""        ""        ""        ""        ""        ""        ""
## rep 6 ""        ""        ""        ""        ""        ""        ""        ""        ""
## rep 7 ""        ""        ""        ""        ""        ""        ""        ""        ""
##
## [1] "The focal sample always matched with at least one other sample"


##
## [1] "Here is the data for candidate sample 1 (196):"
##
##       locus 1   locus 2   locus 3   locus 4   locus 5   locus 6   locus 7   locus 8   locus 9
## rep 1 "158 158" "128 128" "205 213" "122 122" "132 144" "204 204" "192 204" "281 285" "262 262"
## rep 2 "158 158" "128 128" "205 213" "122 122" "132 144" "204 204" "192 204" "281 285" "262 262"
## rep 3 "158 158" "128 128" "205 213" "122 122" "132 144" "204 204" "192 204" "281 285" "262 262"
## rep 4 "158 158" "128 128" "205 213" "122 122" "132 144" "204 204" "192 204" "281 281" "262 262"
## rep 5 ""        ""        ""        ""        ""        ""        ""        ""        ""
## rep 6 ""        ""        ""        ""        ""        ""        ""        ""        ""
## rep 7 ""        ""        ""        ""        ""        ""        ""        ""        ""
##
## [1] "The focal sample matched this candidate with probability 1"
##
## [1] "Conditional on the focal sample matching this candidate, the 3 most probable genotypes were:"
##
##      [,1]      [,2]      [,3]      [,4]      [,5]      [,6]      [,7]      [,8]      [,9]
## [1,] "158 158" "128 128" "205 213" "122 122" "132 144" "204 204" "192 204" "281 285" "262 262"
## [2,] NA        NA        NA        NA        NA        NA        NA        NA        NA
## [3,] NA        NA        NA        NA        NA        NA        NA        NA        NA
##
## [1] "These genotypes had posterior probabilities of:"
##
## [1] 1 NA NA
## [1] "Here is the data for candidate sample 2 (198):"
##
##       locus 1   locus 2   locus 3   locus 4   locus 5   locus 6   locus 7   locus 8   locus 9
## rep 1 "158 158" "128 128" "205 213" "122 122" "132 144" "204 204" "192 204" "281 281" "262 262"
## rep 2 "158 158" "128 128" "205 213" "122 122" "132 144" "204 204" "192 204" "281 285" "262 262"
## rep 3 "158 158" "128 128" "205 205" "122 122" "132 144" "204 204" "204 204" "281 281" "262 262"
## rep 4 "158 158" "128 128" "205 213" "122 122" "132 144" "204 204" "192 204" "281 285" ""
## rep 5 ""        ""        ""        ""        ""        ""        ""        ""        ""
## rep 6 ""        ""        ""        ""        ""        ""        ""        ""        ""
## rep 7 ""        ""        ""        ""        ""        ""        ""        ""        ""


##
## [1] "The focal sample matched this candidate with probability 1"
##
## [1] "Conditional on the focal sample matching this candidate, the 3 most probable genotypes were:"
##
##      [,1]      [,2]      [,3]      [,4]      [,5]      [,6]      [,7]      [,8]      [,9]
## [1,] "158 158" "128 128" "205 213" "122 122" "132 144" "204 204" "192 204" "281 285" "262 262"
## [2,] NA        NA        NA        NA        NA        NA        NA        NA        NA
## [3,] NA        NA        NA        NA        NA        NA        NA        NA        NA
##
## [1] "These genotypes had posterior probabilities of:"
##
## [1] 1 NA NA


##
## [1] "The geneticist-assigned IDs for these samples were:"
##
## [1] 5 5 5

Samples 281 and 282

The geneticist grouped samples 281 and 282 together, which implies there were 4 allelic dropouts at locus 1 and 3 allelic dropouts at locus 3 for sample 282. The Genotype SPIM assigned a probability of 0.71 to this grouping, allowing a 0.29 probability that these samples belonged to 2 different individuals and there were fewer or no allelic dropouts. This example illustrates that the number of allelic dropout events across replicate assignments that lead to locus mismatches between samples provides critical information about the probability that the two samples were produced by the same individual. Seven allelic dropout events under the state that the two samples came from one individual were enough to give some probability to the state that the two samples came from two different individuals without those allelic dropout events. If there were more replication for sample 282 producing more homozygous scores, we would be more confident in the assignment of this sample to an individual other than the one that produced sample 281.

The assignment of a non-negligible probability that sample 282 was a distinct individual from 281 is also likely due to the mismatching scores for sample 282 being relatively common genotypes. 156.156 at locus 1 had an estimated frequency of 0.0523 and 207.207 at locus 3 had an estimated frequency of 0.0948. If these loci were scored as allelic dropouts leading to genotypes that were much rarer in the population, the Genotype SPIM would place much less probability on the two samples coming from different individuals. The estimated genotype frequencies can be found in the last section of this document.
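As a rough illustration of why genotype frequency matters here, the two reported frequencies can be multiplied to approximate how often an individual would carry both homozygous genotypes. This Python sketch assumes independence between loci, which is not stated in this section, and it ignores the other loci, the genotyping error model, and the spatial model. The helper name `joint_genotype_frequency` and the "rare" frequencies of 0.005 are hypothetical values chosen for contrast.

```python
from math import prod


def joint_genotype_frequency(locus_frequencies):
    """Joint frequency of single-locus genotypes, assuming independent loci."""
    return prod(locus_frequencies)


# Estimated frequencies reported in the text: 156.156 (locus 1), 207.207 (locus 3).
reported = joint_genotype_frequency([0.0523, 0.0948])

# Hypothetical rarer genotypes, for contrast only (not from the data set).
rare = joint_genotype_frequency([0.005, 0.005])

print(f"reported genotypes:      {reported:.6f}")
print(f"hypothetical rare case:  {rare:.6f}")
print(f"ratio reported to rare:  {reported / rare:.0f}")
```

Under these assumptions, roughly 1 in 200 individuals would be homozygous 156.156 and 207.207 at these two loci, so a distinct individual with these scores is far more plausible than it would be for the hypothetical rare genotypes.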

## [1] "Here is the data for the focal sample (282):"
##
##       locus 1   locus 2   locus 3   locus 4   locus 5   locus 6   locus 7   locus 8   locus 9
## rep 1 "156 156" "144 144" "207 207" "124 124" "132 144" "204 204" "192 192" "283 283" "258 258"
## rep 2 "156 156" "144 144" "207 207" "124 124" "132 144" "204 204" "192 192" "281 281" "258 258"
## rep 3 "156 156" "144 144" "207 207" "124 124" "144 144" "204 204" "192 192" "281 283" "258 258"
## rep 4 "156 156" "144 144" ""        "124 124" "132 132" "204 204" "192 192" ""        ""
## rep 5 ""        ""        ""        ""        ""        ""        ""        ""        ""
## rep 6 ""        ""        ""        ""        ""        ""        ""        ""        ""
## rep 7 ""        ""        ""        ""        ""        ""        ""        ""        ""
##
## [1] "This sample did not match with any other samples with probability 0.28625"
##
## [1] "Conditional on not matching other samples, the 3 most probable genotypes were:"
##
##      [,1]      [,2]      [,3]      [,4]      [,5]      [,6]      [,7]      [,8]      [,9]
## [1,] "156 156" "144 144" "207 207" "124 124" "132 144" "204 204" "192 192" "281 283" "258 258"


## [2,] "156 156" "144 144" "207 211" "124 124" "132 144" "204 204" "192 192" "281 283" "258 258"
## [3,] "156 156" "144 144" "207 213" "124 124" "132 144" "204 204" "192 192" "281 283" "258 258"
##
## [1] "These genotypes had posterior probabilities of:"
##
## [1] 0.995648390 0.001450537 0.001160429
##
## [1] "Here is the data for candidate sample 1 (281):"
##
##       locus 1   locus 2   locus 3   locus 4   locus 5   locus 6   locus 7   locus 8   locus 9
## rep 1 "154 156" "144 144" "207 211" "124 124" "132 144" "204 204" "192 192" "281 283" "258 258"
## rep 2 "154 156" "144 144" "207 211" "124 124" "132 144" "204 204" "192 192" "281 283" "258 258"
## rep 3 "156 156" "144 144" "207 211" "124 124" "132 144" "204 204" "192 192" "281 283" "258 258"
## rep 4 "154 154" "144 144" "211 211" ""        "132 144" ""        "192 192" ""        "258 258"
## rep 5 ""        ""        ""        ""        ""        ""        ""        ""        ""
## rep 6 ""        ""        ""        ""        ""        ""        ""        ""        ""
## rep 7 ""        ""        ""        ""        ""        ""        ""        ""        ""
##
## [1] "The focal sample matched this candidate with probability 0.71275"
##
## [1] "Conditional on the focal sample matching this candidate, the 3 most probable genotypes were:"
##
##      [,1]      [,2]      [,3]      [,4]      [,5]      [,6]      [,7]      [,8]      [,9]
## [1,] "154 156" "144 144" "207 211" "124 124" "132 144" "204 204" "192 192" "281 283" "258 258"
## [2,] NA        NA        NA        NA        NA        NA        NA        NA        NA
## [3,] NA        NA        NA        NA        NA        NA        NA        NA        NA
##
## [1] "These genotypes had posterior probabilities of:"
##
## [1] 1 NA NA


##
## [1] "The geneticist-assigned IDs for these samples were:"
##
## [1] 47 47

Samples 46 and 82

The geneticist assigned different individual identities to samples 46 and 82, but the Genotype SPIM assigned them the same identity with probability 1. The discrepancy between the samples that led the geneticist to assign different individual identities is found at locus 6, where sample 82 was scored as an allelic dropout 4 times if these samples were in fact the same individual. This is not an unlikely set of events because sample 82 is a “low quality” sample with a high allelic dropout probability. Both of these samples were scored as “male”, providing further support that these two samples came from the same individual.

## [1] "Here is the data for the focal sample (46):"
##
##       locus 1   locus 2   locus 3   locus 4   locus 5   locus 6   locus 7   locus 8   locus 9
## rep 1 "156 156" "144 144" "207 211" "122 124" "132 142" "204 204" "192 192" "281 283" "264 264"
## rep 2 "154 156" "128 144" "207 207" "122 124" "132 142" "202 204" "192 204" "281 281" "264 264"
## rep 3 "154 156" "128 144" "207 213" "122 122" "132 142" "204 204" ""        "281 283" "264 264"
## rep 4 "154 154" ""        "207 211" "122 122" ""        "202 204" "192 204" ""        ""
## rep 5 ""        ""        "207 213" "124 124" "132 142" ""        ""        ""        ""
## rep 6 ""        ""        ""        "122 124" ""        ""        ""        ""        ""
## rep 7 ""        ""        ""        ""        ""        ""        ""        ""        ""
##
## [1] "This sample did not match with any other samples with probability 0.00166666666666667"
##
## [1] "Conditional on not matching other samples, the 3 most probable genotypes were:"
##
##      [,1]      [,2]      [,3]      [,4]      [,5]      [,6]      [,7]      [,8]      [,9]
## [1,] "154 156" "128 144" "207 213" "122 124" "132 142" "202 204" "192 204" "281 283" "264 264"
## [2,] "154 156" "128 144" "207 211" "122 124" "132 142" "202 204" "192 204" "281 283" "264 264"
## [3,] "154 156" "128 144" "207 213" "122 124" "132 142" "202 204" "192 204" "281 283" "264 266"
##
## [1] "These genotypes had posterior probabilities of:"
##
## [1] 0.59090909 0.27272727 0.09090909
##
## [1] "Here is the data for candidate sample 1 (82):"


##
##       locus 1   locus 2   locus 3   locus 4   locus 5   locus 6   locus 7   locus 8   locus 9
## rep 1 "154 156" "144 144" "213 213" "124 124" "142 142" "204 204" ""        "281 283" "264 264"
## rep 2 "156 156" "144 144" "207 207" "122 124" "132 142" "204 204" ""        "283 283" ""
## rep 3 "156 156" "128 144" "207 207" "122 122" "132 142" "204 204" ""        "281 281" "264 264"
## rep 4 "154 154" "128 144" ""        "122 122" "132 132" "204 204" ""        "281 281" "264 264"
## rep 5 ""        "128 128" "207 213" "122 122" ""        ""        ""        ""        "264 264"
## rep 6 ""        ""        ""        ""        ""        ""        ""        ""        ""
## rep 7 ""        ""        ""        ""        ""        ""        ""        ""        ""
##
## [1] "The focal sample matched this candidate with probability 0.998166666666667"
##
## [1] "Conditional on the focal sample matching this candidate, the 3 most probable genotypes were:"
##
##      [,1]      [,2]      [,3]      [,4]      [,5]      [,6]      [,7]      [,8]      [,9]
## [1,] "154 156" "128 144" "207 213" "122 124" "132 142" "202 204" "192 204" "281 283" "264 264"
## [2,] "154 156" "128 144" "207 213" "122 124" "132 142" "202 204" "192 204" "281 283" "262 264"
## [3,] "154 156" "128 144" "207 213" "122 124" "132 142" "204 204" "192 204" "281 283" "264 264"
##
## [1] "These genotypes had posterior probabilities of:"
##
## [1] 0.9995825680 0.0001669728 0.0001669728


##
## [1] "The geneticist-assigned IDs for these samples were:"
##
## [1] 176 60

Samples 214 and 215

The geneticist assigned different individual identities to samples 214 and 215, an event to which the Genotype SPIM assigned a probability of 0.08. The Genotype SPIM placed a probability of 0.92 on these two samples coming from the same individual. The discrepancies between these two samples can be found at loci 3 and 9. If these samples belong to the same individual, there were 4 allelic dropout events at locus 3 for sample 214 and 4 allelic dropout events for sample 215 at locus 9. Because allelic dropout events were not rare, the Genotype SPIM places a much higher probability on these samples belonging to the same individual than to different individuals, but this latter event is not conclusively ruled out. The sexes for both of these samples were female, further supporting the conclusion of the Genotype SPIM.

Interestingly, the geneticist assigned locus 9 of sample 214 the consensus 264.266, even though the scoring rules suggest it should have been called 264.264: the heterozygote was never seen and 264.264 was seen 4 times. We are unaware of a scoring rule for calling a heterozygote when each corresponding homozygote is seen a certain number of times, and no such rule is mentioned in the original study. In this case, the presumed deviation from the typical scoring rules resulted in the most probable score, 264.266, being assigned, whereas the typical scoring rules would have required assigning 264.264, which is very improbable.
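To make the scoring-rule argument concrete, here is a sketch of a threshold-based consensus rule applied to locus 9 of sample 214. The thresholds (a heterozygote seen at least twice, a homozygote seen at least three times) and the function name `consensus_call` are assumptions chosen for illustration; the original study's exact rules are not reproduced here.

```python
from collections import Counter

def consensus_call(replicates, het_min=2, hom_min=3):
    """Return a consensus genotype under hypothetical replication thresholds.

    het_min and hom_min are assumed values, not the original study's rules.
    Returns None when no genotype meets its threshold.
    """
    counts = Counter(tuple(sorted(r.split())) for r in replicates if r)
    # Prefer a heterozygote that was observed often enough.
    hets = {g: n for g, n in counts.items() if g[0] != g[1] and n >= het_min}
    if hets:
        return " ".join(max(hets, key=hets.get))
    # Otherwise accept a homozygote that was observed often enough.
    homs = {g: n for g, n in counts.items() if g[0] == g[1] and n >= hom_min}
    if homs:
        return " ".join(max(homs, key=homs.get))
    return None

# Locus 9 replicates for sample 214, transcribed from the printed table.
sample_214_locus_9 = ["264 264", "264 264", "264 264",
                      "266 266", "266 266", "264 264", ""]

print(consensus_call(sample_214_locus_9))
```

With these assumed thresholds the rule returns the homozygote 264 264, because the heterozygote 264 266 never appears in any replicate. A rule of this kind cannot arrive at 264.266 from these data, which is the deviation the paragraph describes.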

## [1] "Here is the data for the focal sample (214):"
##
##       locus 1   locus 2   locus 3   locus 4   locus 5   locus 6   locus 7   locus 8   locus 9
## rep 1 "158 158" "128 144" "213 213" "122 122" "132 142" "204 204" "200 204" "281 283" "264 264"
## rep 2 "158 158" "128 144" "213 213" "122 122" "132 142" "204 204" "200 204" "281 283" "264 264"
## rep 3 "158 158" "128 144" "213 213" "122 122" "132 142" "204 204" "200 204" "281 283" "264 264"
## rep 4 "158 158" "128 144" "213 213" "122 122" "132 142" "204 204" "192 192" "283 283" "266 266"
## rep 5 ""        "128 144" ""        ""        ""        ""        "192 192" ""        "266 266"
## rep 6 ""        "128 144" ""        ""        ""        ""        "200 204" ""        "264 264"
## rep 7 ""        ""        ""        ""        ""        ""        ""        ""        ""
##
## [1] "This sample did not match with any other samples with probability 0.0771666666666667"
##
## [1] "Conditional on not matching other samples, the 3 most probable genotypes were:"
##
##      [,1]      [,2]      [,3]      [,4]      [,5]      [,6]      [,7]      [,8]      [,9]
## [1,] "158 158" "128 144" "213 213" "122 122" "132 142" "204 204" "200 204" "281 283" "264 266"
## [2,] "158 158" "128 144" "213 213" "122 122" "132 142" "204 204" "200 204" "281 283" "264 264"


## [3,] NA        NA        NA        NA        NA        NA        NA        NA        NA
##
## [1] "These genotypes had posterior probabilities of:"
##
## [1] 0.7984995 0.2015005 NA
##
## [1] "Here is the data for candidate sample 1 (215):"
##
##       locus 1   locus 2   locus 3   locus 4   locus 5   locus 6   locus 7   locus 8   locus 9
## rep 1 "158 158" "128 144" "211 213" "122 122" "132 142" "204 204" "200 204" "281 283" "264 264"
## rep 2 "158 158" "128 144" "211 213" "122 122" "132 142" "204 204" "200 204" "281 283" "264 264"
## rep 3 "158 158" "128 144" "211 213" "122 122" "132 142" "204 204" "200 204" "281 283" "264 264"
## rep 4 "158 158" "128 144" "213 213" "122 122" "142 142" "204 204" ""        "281 283" "264 264"
## rep 5 ""        ""        ""        ""        ""        ""        ""        ""        ""
## rep 6 ""        ""        ""        ""        ""        ""        ""        ""        ""
## rep 7 ""        ""        ""        ""        ""        ""        ""        ""        ""
##
## [1] "The focal sample matched this candidate with probability 0.92225"
##
## [1] "Conditional on the focal sample matching this candidate, the 3 most probable genotypes were:"
##
##      [,1]      [,2]      [,3]      [,4]      [,5]      [,6]      [,7]      [,8]      [,9]
## [1,] "158 158" "128 144" "211 213" "122 122" "132 142" "204 204" "200 204" "281 283" "264 264"
## [2,] "158 158" "128 144" "211 213" "122 122" "132 142" "204 204" "200 204" "281 283" "264 266"
## [3,] NA        NA        NA        NA        NA        NA        NA        NA        NA
##
## [1] "These genotypes had posterior probabilities of:"
##
## [1] 9.999096e-01 9.035872e-05 NA


##
## [1] "The geneticist-assigned IDs for these samples were:"
##
## [1] 103 104

High probability matches from samples discarded by geneticist to samples originally used

Here, we look at some high-probability assignments of samples that the geneticist originally discarded because a certain identity assignment could not be made.

Sample 51

Sample 51 was originally discarded because it only amplified at 5/9 loci; however, the Genotype SPIM assigns it to the same individual as samples 42, 43, and 80 with probability approximately 1 (0.996), an essentially certain spatial recapture. For this to be the true state, sample 51 most likely had 2 allelic dropouts at locus 1 and 3 allelic dropouts at locus 6; however, allelic dropouts were estimated to be very likely for poor-quality samples like sample 51. Thus, a partial genotype with only 5 amplified loci, 2 of which were scored incorrectly, contains enough information to assign it an individual identity with near certainty. Note that samples 45 and 50, which amplified at 0 and 1 locus, respectively, match focal sample 51 with probability 0.20 and 0.48, demonstrating that spatial information alone carries substantial information about individual identity, at least in this low-density population.
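The amplification counts quoted in these sample summaries (for example, 5/9 loci for sample 51) can be tallied directly from a replicate table. The sketch below is illustrative only: the dictionary layout, the function name `amplification_summary`, and the `min_reps` parameter are assumptions, and the scores are transcribed from the table printed for sample 51.

```python
# Non-missing replicate scores for sample 51, keyed by locus number.
# Loci 4, 7, 8 and 9 never amplified in the printed table.
sample_51 = {
    1: ["156 156", "156 156"],
    2: ["144 144", "144 144", "144 144"],
    3: ["207 211", "207 211", "207 207"],
    4: [],
    5: ["142 142", "142 142", "142 142"],
    6: ["202 202", "202 202", "202 202"],
    7: [],
    8: [],
    9: [],
}

def amplification_summary(scores_by_locus, min_reps=3):
    """Count amplified loci and loci reaching a replication threshold.

    min_reps is a hypothetical threshold used only for illustration.
    """
    reps = {locus: len(scores) for locus, scores in scores_by_locus.items()}
    return {
        "loci_amplified": sum(n > 0 for n in reps.values()),
        "loci_with_min_reps": sum(n >= min_reps for n in reps.values()),
        "reps_per_locus": reps,
    }

summary_51 = amplification_summary(sample_51)
print(summary_51["loci_amplified"], "of", len(sample_51), "loci amplified")
```

The same tally applied to the other discarded samples reproduces the fractions quoted in their summaries, provided the scores are transcribed in the same layout.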

## [1] "Here is the data for the focal sample (51):"
##
##       locus 1   locus 2   locus 3   locus 4   locus 5   locus 6   locus 7   locus 8   locus 9
## rep 1 "156 156" "144 144" "207 211" ""        "142 142" "202 202" ""        ""        ""
## rep 2 "156 156" "144 144" "207 211" ""        "142 142" "202 202" ""        ""        ""
## rep 3 ""        "144 144" "207 207" ""        "142 142" "202 202" ""        ""        ""
## rep 4 ""        ""        ""        ""        ""        ""        ""        ""        ""
## rep 5 ""        ""        ""        ""        ""        ""        ""        ""        ""
## rep 6 ""        ""        ""        ""        ""        ""        ""        ""        ""
## rep 7 ""        ""        ""        ""        ""        ""        ""        ""        ""
##
## [1] "This sample did not match with any other samples with probability 0.00233333333333333"
##
## [1] "Conditional on not matching other samples, the 3 most probable genotypes were:"
##
##      [,1]      [,2]      [,3]      [,4]      [,5]      [,6]      [,7]      [,8]      [,9]
## [1,] "154 156" "144 144" "207 211" "122 124" "142 142" "202 202" "192 200" "279 281" "262 264"
## [2,] "154 156" "144 144" "207 211" "124 124" "142 142" "202 202" "200 200" "291 291" "264 264"
## [3,] "156 156" "136 144" "207 211" "124 124" "142 142" "202 202" "204 204" "279 281" "264 264"
##
## [1] "These genotypes had posterior probabilities of:"
##
## [1] 0.03571429 0.03571429 0.03571429


##
## [1] "Here is the data for candidate sample 1 (42):"
##
##       locus 1   locus 2   locus 3   locus 4   locus 5   locus 6   locus 7   locus 8   locus 9
## rep 1 "154 156" "144 144" "211 211" "122 124" "142 142" "202 204" "192 204" "283 283" "264 264"
## rep 2 "154 154" "144 144" "211 211" "122 124" "142 142" "202 204" "192 204" "283 283" "264 264"
## rep 3 "156 156" "144 144" "211 211" "122 122" "142 142" "202 204" "192 204" "283 283" "264 264"
## rep 4 ""        "144 144" ""        "122 122" ""        ""        ""        ""        ""
## rep 5 ""        ""        ""        ""        ""        ""        ""        ""        ""
## rep 6 ""        ""        ""        ""        ""        ""        ""        ""        ""
## rep 7 ""        ""        ""        ""        ""        ""        ""        ""        ""
##
## [1] "The focal sample matched this candidate with probability 0.996"
##
## [1] "Conditional on the focal sample matching this candidate, the 3 most probable genotypes were:"
##
##      [,1]      [,2]      [,3]      [,4]      [,5]      [,6]      [,7]      [,8]      [,9]
## [1,] "154 156" "144 144" "207 211" "122 124" "142 142" "202 204" "192 204" "281 283" "264 264"
## [2,] NA        NA        NA        NA        NA        NA        NA        NA        NA
## [3,] NA        NA        NA        NA        NA        NA        NA        NA        NA
##
## [1] "These genotypes had posterior probabilities of:"
##
## [1] 1 NA NA
## [1] "Here is the data for candidate sample 2 (43):"
##
##       locus 1   locus 2   locus 3   locus 4   locus 5   locus 6   locus 7   locus 8   locus 9
## rep 1 "156 156" "144 144" "207 207" "122 124" "142 142" "202 204" "192 204" "281 283" "264 264"
## rep 2 "154 156" "144 144" "211 211" "122 124" "142 142" "202 204" "192 204" "281 283" "264 264"
## rep 3 "156 156" "144 144" "207 211" "122 124" "142 142" "202 204" "192 204" "283 283" "264 264"
## rep 4 "154 154" ""        "207 207" "122 122" ""        ""        ""        "283 283" ""
## rep 5 ""        "144 144" ""        ""        ""        ""        "192 192" ""        ""
## rep 6 ""        ""        ""        ""        ""        ""        ""        ""        ""
## rep 7 ""        ""        ""        ""        ""        ""        ""        ""        ""


##
## [1] "The focal sample matched this candidate with probability 0.996"
##
## [1] "Conditional on the focal sample matching this candidate, the 3 most probable genotypes were:"
##
##      [,1]      [,2]      [,3]      [,4]      [,5]      [,6]      [,7]      [,8]      [,9]
## [1,] "154 156" "144 144" "207 211" "122 124" "142 142" "202 204" "192 204" "281 283" "264 264"
## [2,] NA        NA        NA        NA        NA        NA        NA        NA        NA
## [3,] NA        NA        NA        NA        NA        NA        NA        NA        NA
##
## [1] "These genotypes had posterior probabilities of:"
##
## [1] 1 NA NA
## [1] "Here is the data for candidate sample 3 (45):"
##
##       locus 1   locus 2   locus 3   locus 4   locus 5   locus 6   locus 7   locus 8   locus 9
## rep 1 ""        ""        ""        ""        ""        ""        ""        ""        ""
## rep 2 ""        ""        ""        ""        ""        ""        ""        ""        ""
## rep 3 ""        ""        ""        ""        ""        ""        ""        ""        ""
## rep 4 ""        ""        ""        ""        ""        ""        ""        ""        ""
## rep 5 ""        ""        ""        ""        ""        ""        ""        ""        ""
## rep 6 ""        ""        ""        ""        ""        ""        ""        ""        ""
## rep 7 ""        ""        ""        ""        ""        ""        ""        ""        ""
##
## [1] "The focal sample matched this candidate with probability 0.204666666666667"
##
## [1] "Conditional on the focal sample matching this candidate, the 3 most probable genotypes were:"
##
##      [,1]      [,2]      [,3]      [,4]      [,5]      [,6]      [,7]      [,8]      [,9]
## [1,] "154 156" "144 144" "207 211" "122 124" "142 142" "202 204" "192 204" "281 283" "264 264"
## [2,] NA        NA        NA        NA        NA        NA        NA        NA        NA
## [3,] NA        NA        NA        NA        NA        NA        NA        NA        NA
##
## [1] "These genotypes had posterior probabilities of:"
##
## [1] 1 NA NA
## [1] "Here is the data for candidate sample 4 (50):"
##
##       locus 1   locus 2   locus 3   locus 4   locus 5   locus 6   locus 7   locus 8   locus 9
## rep 1 ""        "144 144" ""        ""        ""        ""        ""        ""        ""
## rep 2 ""        "144 144" ""        ""        ""        ""        ""        ""        ""
## rep 3 ""        ""        ""        ""        ""        ""        ""        ""        ""
## rep 4 ""        ""        ""        ""        ""        ""        ""        ""        ""


## rep 5 ""        ""        ""        ""        ""        ""        ""        ""        ""
## rep 6 ""        ""        ""        ""        ""        ""        ""        ""        ""
## rep 7 ""        ""        ""        ""        ""        ""        ""        ""        ""
##
## [1] "The focal sample matched this candidate with probability 0.480166666666667"
##
## [1] "Conditional on the focal sample matching this candidate, the 3 most probable genotypes were:"
##
##      [,1]      [,2]      [,3]      [,4]      [,5]      [,6]      [,7]      [,8]      [,9]
## [1,] "154 156" "144 144" "207 211" "122 124" "142 142" "202 204" "192 204" "281 283" "264 264"
## [2,] "154 156" "144 144" "207 211" "122 122" "142 142" "202 202" "200 200" "281 281" "266 266"
## [3,] "154 156" "144 144" "207 211" "122 124" "142 142" "202 202" "192 200" "279 281" "258 258"
##
## [1] "These genotypes had posterior probabilities of:"
##
## [1] 0.9965289830 0.0001735509 0.0001735509
## [1] "Here is the data for candidate sample 5 (80):"
##
##       locus 1   locus 2   locus 3   locus 4   locus 5   locus 6   locus 7   locus 8   locus 9
## rep 1 "154 156" "144 144" "207 211" "122 124" "142 142" "202 204" "192 204" "281 283" "264 264"
## rep 2 "156 156" "144 144" "207 211" "122 124" "142 142" "202 204" "192 204" "281 283" "264 264"
## rep 3 "154 156" "144 144" "207 211" "122 124" "142 142" "202 204" "192 204" "281 283" "264 264"
## rep 4 "156 156" "144 144" "207 211" "122 124" "142 142" "202 204" "192 204" "281 283" "264 264"
## rep 5 ""        ""        ""        ""        ""        ""        ""        ""        ""
## rep 6 ""        ""        ""        ""        ""        ""        ""        ""        ""
## rep 7 ""        ""        ""        ""        ""        ""        ""        ""        ""
##
## [1] "The focal sample matched this candidate with probability 0.996"
##
## [1] "Conditional on the focal sample matching this candidate, the 3 most probable genotypes were:"
##
##      [,1]      [,2]      [,3]      [,4]      [,5]      [,6]      [,7]      [,8]      [,9]
## [1,] "154 156" "144 144" "207 211" "122 124" "142 142" "202 204" "192 204" "281 283" "264 264"
## [2,] NA        NA        NA        NA        NA        NA        NA        NA        NA
## [3,] NA        NA        NA        NA        NA        NA        NA        NA        NA
##


## [1] "These genotypes had posterior probabilities of:"
##
## [1] 1 NA NA


##
## [1] "The geneticist-assigned IDs for these samples were:"
##
## [1] NA 17 17 NA NA 17

Sample 67

Sample 67 was originally discarded because it amplified at only 3/9 loci, and in only 2 replications at each of these 3 loci. The Genotype SPIM assigns it to the same individual as sample 64, captured in the same trap, with probability 0.96. For this to be a correct match, both replicates for sample 67 at loci 2 and 6 must have been allelic dropouts, and the only correct scores for sample 67 were the two at locus 7. The high probability of this match likely stems from the fact that the observed genotypes at loci 2 and 6 were estimated to be very rare, especially the genotype 142.142 at locus 2, estimated to occur with probability 0.0002, while 142.144 was estimated to occur with probability 0.0315. The genotype frequency estimates can be found in the last section of this document.
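The role of genotype rarity in this match can be illustrated with a simplified odds calculation for locus 2 alone. This is a back-of-the-envelope sketch, not the Genotype SPIM likelihood: it ignores spatial information and the other loci, and the two per-replicate error rates (`p_dropout_to_allele`, `p_correct_hom`) are made-up values. Only the genotype frequencies 0.0315 and 0.0002 come from the text.

```python
def posterior_odds_het_vs_hom(freq_het, freq_hom, p_dropout_to_allele,
                              p_correct_hom, n_reps):
    """Odds that the true genotype is the heterozygote rather than the
    observed homozygote, given n_reps identical homozygous scores.

    p_dropout_to_allele: assumed probability that one replicate of a true
        heterozygote is scored as this particular homozygote.
    p_correct_hom: assumed probability that one replicate of a true
        homozygote is scored correctly.
    """
    prior_odds = freq_het / freq_hom
    likelihood_ratio = (p_dropout_to_allele / p_correct_hom) ** n_reps
    return prior_odds * likelihood_ratio

# Genotype frequencies from the text: 142.144 (0.0315) vs 142.142 (0.0002).
# The two error rates below are hypothetical, chosen only for illustration.
odds = posterior_odds_het_vs_hom(
    freq_het=0.0315,
    freq_hom=0.0002,
    p_dropout_to_allele=0.15,
    p_correct_hom=0.95,
    n_reps=2,
)
print(round(odds, 2))
```

The prior odds of roughly 157 to 1 in favor of the heterozygote mean that two dropouts remain the more plausible explanation even when dropout is fairly unlikely per replicate; with the assumed rates above, the odds still favor the heterozygote by about 3.9 to 1.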

## [1] "Here is the data for the focal sample (67):"
##
##       locus 1   locus 2   locus 3   locus 4   locus 5   locus 6   locus 7   locus 8   locus 9
## rep 1 ""        "142 142" ""        ""        ""        "202 202" "204 204" ""        ""
## rep 2 ""        "142 142" ""        ""        ""        "202 202" "204 204" ""        ""
## rep 3 ""        ""        ""        ""        ""        ""        ""        ""        ""
## rep 4 ""        ""        ""        ""        ""        ""        ""        ""        ""
## rep 5 ""        ""        ""        ""        ""        ""        ""        ""        ""
## rep 6 ""        ""        ""        ""        ""        ""        ""        ""        ""


## rep 7 ""        ""        ""        ""        ""        ""        ""        ""        ""
##
## [1] "This sample did not match with any other samples with probability 0.0075"
##
## [1] "Conditional on not matching other samples, the 3 most probable genotypes were:"
##
##      [,1]      [,2]      [,3]      [,4]      [,5]      [,6]      [,7]      [,8]      [,9]
## [1,] "154 158" "136 142" "207 213" "126 126" "142 144" "202 204" "200 204" "281 283" "258 264"
## [2,] "158 158" "136 142" "213 213" "122 122" "132 132" "202 202" "192 204" "281 281" "266 266"
## [3,] "154 158" "136 142" "211 213" "122 124" "142 144" "202 204" "200 204" "281 287" "266 266"
##
## [1] "These genotypes had posterior probabilities of:"
##
## [1] 0.015037594 0.013157895 0.009398496
##
## [1] "Here is the data for candidate sample 1 (64):"
##
##       locus 1   locus 2   locus 3   locus 4   locus 5   locus 6   locus 7   locus 8   locus 9
## rep 1 "156 158" "144 144" "213 217" "122 126" "132 142" "202 208" "204 204" "281 283" "264 266"
## rep 2 "156 158" "142 144" "207 213" "122 126" "132 142" "202 208" "204 204" "281 283" "264 266"
## rep 3 "156 158" "144 144" "207 213" "122 126" "132 142" "202 208" "204 204" "281 283" "264 266"
## rep 4 "156 158" "142 144" "213 217" "122 126" "132 142" "202 208" "204 204" "281 283" "264 266"
## rep 5 ""        ""        ""        ""        ""        ""        ""        ""        ""
## rep 6 ""        ""        ""        ""        ""        ""        ""        ""        ""
## rep 7 ""        ""        ""        ""        ""        ""        ""        ""        ""
##
## [1] "The focal sample matched this candidate with probability 0.955666666666667"
##
## [1] "Conditional on the focal sample matching this candidate, the 3 most probable genotypes were:"
##
##      [,1]      [,2]      [,3]      [,4]      [,5]      [,6]      [,7]      [,8]      [,9]
## [1,] "156 158" "142 144" "207 213" "122 126" "132 142" "202 208" "204 204" "281 283" "264 266"
## [2,] "156 158" "142 144" "213 217" "122 126" "132 142" "202 208" "204 204" "281 283" "264 266"
## [3,] NA        NA        NA        NA        NA        NA        NA        NA        NA
##
## [1] "These genotypes had posterior probabilities of:"
##


## [1] 0.98282176 0.01717824 NA


##
## [1] "The geneticist-assigned IDs for these samples were:"
##
## [1] NA 181

Sample 141

Sample 141 was originally discarded because it amplified at only 5/9 loci, and in only 2 replications at 3 of these loci. The Genotype SPIM assigns it to the same individual as sample 138, captured in the same trap, with probability 0.99. For this to be a correct match, there were 2 allelic dropouts for sample 141 at locus 8. Given that sample 141 is “low quality”, this is not an unlikely event.

## [1] "Here is the data for the focal sample (141):"
##
##       locus 1   locus 2   locus 3   locus 4   locus 5   locus 6   locus 7   locus 8   locus 9
## rep 1 "158 158" ""        "213 213" "122 122" ""        ""        "196 204" "281 281" ""
## rep 2 "158 158" ""        "213 213" "122 122" ""        ""        "196 196" "281 281" ""
## rep 3 ""        ""        ""        "122 122" ""        ""        "196 196" ""        ""
## rep 4 ""        ""        ""        ""        ""        ""        "204 204" ""        ""
## rep 5 ""        ""        ""        ""        ""        ""        ""        ""        ""
## rep 6 ""        ""        ""        ""        ""        ""        ""        ""        ""
## rep 7 ""        ""        ""        ""        ""        ""        ""        ""        ""
##
## [1] "This sample did not match with any other samples with probability 0.00483333333333333"


##
## [1] "Conditional on not matching other samples, the 3 most probable genotypes were:"
##
##      [,1]      [,2]      [,3]      [,4]      [,5]      [,6]      [,7]      [,8]      [,9]
## [1,] "158 158" "140 140" "213 213" "122 122" "132 142" "204 204" "196 204" "281 281" "264 264"
## [2,] "158 158" "140 140" "213 213" "122 122" "132 132" "204 204" "196 204" "281 281" "264 264"
## [3,] "158 158" "138 140" "213 213" "122 122" "132 132" "204 204" "196 204" "281 281" "258 268"
##
## [1] "These genotypes had posterior probabilities of:"
##
## [1] 0.02380952 0.01785714 0.01190476
##
## [1] "Here is the data for candidate sample 1 (138):"
##
##       locus 1   locus 2   locus 3   locus 4   locus 5   locus 6   locus 7   locus 8   locus 9
## rep 1 "158 158" "128 136" "213 213" "122 122" "132 144" "204 204" "196 204" "281 283" "266 266"
## rep 2 "158 158" "128 136" "213 213" "122 122" "132 144" "204 204" "196 204" "281 283" "266 266"
## rep 3 "158 158" "128 136" "213 213" "122 122" "132 144" "204 204" "196 204" "281 283" "266 266"
## rep 4 "158 158" "128 136" ""        "122 122" "132 144" "204 204" "196 204" ""        "266 266"
## rep 5 ""        ""        ""        ""        ""        ""        ""        ""        ""
## rep 6 ""        ""        ""        ""        ""        ""        ""        ""        ""
## rep 7 ""        ""        ""        ""        ""        ""        ""        ""        ""
##
## [1] "The focal sample matched this candidate with probability 0.986"
##
## [1] "Conditional on the focal sample matching this candidate, the 3 most probable genotypes were:"
##
##      [,1]      [,2]      [,3]      [,4]      [,5]      [,6]      [,7]      [,8]      [,9]
## [1,] "158 158" "128 136" "213 213" "122 122" "132 144" "204 204" "196 204" "281 283" "266 266"
## [2,] "158 158" "128 136" "207 213" "122 122" "132 144" "204 204" "196 204" "281 283" "266 266"
## [3,] "158 158" "128 136" "211 213" "122 122" "132 144" "204 204" "196 204" "281 283" "266 266"
##
## [1] "These genotypes had posterior probabilities of:"
##
## [1] 9.997465e-01 8.451657e-05 8.451657e-05


##
## [1] "The geneticist-assigned IDs for these samples were:"
##
## [1] NA 77

Sample 144

Sample 144 was originally discarded because it amplified at only 2/9 loci, and in only 2 replications at each of these loci. The Genotype SPIM assigns it to the same individual as sample 143, captured in the same trap, with probability 0.98. If this is a correct match, no genotyping errors occurred for sample 144.

## [1] "Here is the data for the focal sample (144):"
##
##       locus 1   locus 2   locus 3   locus 4   locus 5   locus 6   locus 7   locus 8   locus 9
## rep 1 ""        "136 136" ""        ""        "132 132" ""        ""        ""        ""
## rep 2 ""        "136 136" ""        ""        "132 132" ""        ""        ""        ""
## rep 3 ""        ""        ""        ""        ""        ""        ""        ""        ""
## rep 4 ""        ""        ""        ""        ""        ""        ""        ""        ""
## rep 5 ""        ""        ""        ""        ""        ""        ""        ""        ""
## rep 6 ""        ""        ""        ""        ""        ""        ""        ""        ""
## rep 7 ""        ""        ""        ""        ""        ""        ""        ""        ""
##
## [1] "This sample did not match with any other samples with probability 0.02025"
##
## [1] "Conditional on not matching other samples, the 3 most probable genotypes were:"
##
##      [,1]      [,2]      [,3]      [,4]      [,5]      [,6]      [,7]      [,8]      [,9]
## [1,] "154 158" "136 136" "207 213" "122 122" "132 132" "204 204" "200 200" "283 283" "262 266"
## [2,] "154 158" "136 136" "207 213" "122 124" "132 132" "204 204" "204 204" "281 283" "264 264"
## [3,] "154 158" "136 138" "207 211" "124 124" "132 132" "202 204" "200 204" "279 285" "262 262"
##
## [1] "These genotypes had posterior probabilities of:"
##
## [1] 0.007380074 0.007380074 0.007380074
##
## [1] "Here is the data for candidate sample 1 (143):"
##
##       locus 1   locus 2   locus 3   locus 4   locus 5   locus 6   locus 7   locus 8   locus 9
## rep 1 "158 158" "136 136" "207 207" "126 130" "132 132" "204 204" "192 192" "281 281" "264 264"
## rep 2 "158 162" "136 136" "207 207" "126 126" "132 132" "204 204" "192 196" "281 281" "264 264"
## rep 3 "162 162" "136 136" "207 207" "126 130" "132 132" "204 204" "192 196" "281 281" "264 264"
## rep 4 "158 162" "136 136" ""        "126 126" "132 132" "204 204" "192 196" "281 281" "264 264"
## rep 5 ""        ""        ""        ""        ""        ""        ""        ""        ""


## rep 6 ""        ""        ""        ""        ""        ""        ""        ""        ""
## rep 7 ""        ""        ""        ""        ""        ""        ""        ""        ""
##
## [1] "The focal sample matched this candidate with probability 0.977416666666667"
##
## [1] "Conditional on the focal sample matching this candidate, the 3 most probable genotypes were:"
##
##      [,1]      [,2]      [,3]      [,4]      [,5]      [,6]      [,7]      [,8]      [,9]
## [1,] "158 162" "136 136" "207 207" "126 130" "132 132" "204 204" "192 196" "281 281" "264 264"
## [2,] "158 162" "136 136" "207 211" "126 130" "132 132" "204 204" "192 196" "281 281" "264 264"
## [3,] "158 162" "136 136" "207 213" "126 130" "132 132" "204 204" "192 196" "281 281" "264 264"
##
## [1] "These genotypes had posterior probabilities of:"
##
## [1] 0.996674908 0.001364140 0.001278881


##
## [1] "The geneticist-assigned IDs for these samples were:"
##
## [1] NA 79

Sample 191

Sample 191 was originally discarded because it amplified in 3 or more replications at only 5/9 loci. The Genotype SPIM matches this sample with sample 190 with probability 1. For this to be a correct match, there were 2 allelic dropouts for sample 191 at locus 1 and 2 allelic dropouts at locus 7.

## [1] "Here is the data for the focal sample (191):"
##
##       locus 1   locus 2   locus 3   locus 4   locus 5   locus 6   locus 7   locus 8   locus 9
## rep 1 "162 162" "136 136" "213 213" "122 122" "132 142" "204 204" "192 192" ""        "258 258"
## rep 2 "162 162" "136 136" "213 213" "122 122" "132 142" "204 204" "192 192" ""        "258 258"
## rep 3 ""        "136 136" "213 213" "122 122" "132 132" "204 204" ""        ""        ""
## rep 4 ""        "136 136" ""        ""        "132 132" ""        ""        ""        ""
## rep 5 ""        "136 136" ""        ""        ""        ""        ""        ""        ""


## rep 6 ""        ""        ""        ""        ""        ""        ""        ""        ""
## rep 7 ""        ""        ""        ""        ""        ""        ""        ""        ""
##
## [1] "The focal sample always matched with at least one other sample"
##
## [1] "Here is the data for candidate sample 1 (190):"
##
##       locus 1   locus 2   locus 3   locus 4   locus 5   locus 6   locus 7   locus 8   locus 9
## rep 1 "154 162" "136 136" "213 213" "122 122" "132 142" "204 204" "192 200" "281 283" "258 258"
## rep 2 "154 162" "136 136" "213 213" "122 122" "132 142" "204 204" "192 200" "281 283" "258 258"
## rep 3 "154 162" "136 136" "213 213" "122 122" "132 142" "204 204" "192 200" "281 283" "258 258"
## rep 4 "154 162" "136 136" "213 213" "122 122" "132 142" "204 204" "192 200" "281 283" "258 258"
## rep 5 ""        ""        ""        ""        ""        ""        ""        ""        ""
## rep 6 ""        ""        ""        ""        ""        ""        ""        ""        ""
## rep 7 ""        ""        ""        ""        ""        ""        ""        ""        ""
##
## [1] "The focal sample matched this candidate with probability 1"
##
## [1] "Conditional on the focal sample matching this candidate, the 3 most probable genotypes were:"
##
##      [,1]      [,2]      [,3]      [,4]      [,5]      [,6]      [,7]      [,8]      [,9]
## [1,] "154 162" "136 136" "213 213" "122 122" "132 142" "204 204" "192 200" "281 283" "258 258"
## [2,] "154 162" "136 136" "213 213" "122 122" "132 142" "204 204" "192 200" "281 283" "258 268"
## [3,] NA        NA        NA        NA        NA        NA        NA        NA        NA
##
## [1] "These genotypes had posterior probabilities of:"
##
## [1] 9.999167e-01 8.333333e-05 NA


##
## [1] "The geneticist-assigned IDs for these samples were:"
##
## [1] NA 94

Sample 300

Sample 300 was originally discarded because it only amplified at 3/9 loci and only amplified 3 or more times at 1 locus. The Genotype SPIM matches this sample to samples 297, 298, and 299, all captured at the same trap, with probability 0.97. If this is the correct match, there were 2 allelic dropouts for sample 300 at locus 3.
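The amplification counts quoted above can be checked directly from the replicate data. The following is a minimal sketch, assuming each replicate is stored as a list of per-locus scores with an empty string for a failed amplification; the function name `amplification_summary` and the `min_reps` parameter are hypothetical and are not part of the authors' code.

```python
def amplification_summary(reps, min_reps=3):
    """Count, per locus, how many replicates amplified (non-empty score).

    reps: list of replicates, each a list of per-locus score strings,
          where "" means the replicate did not amplify at that locus.
    Returns (per-locus counts, number of loci that amplified at least once,
             number of loci that amplified at least min_reps times).
    """
    n_loci = len(reps[0])
    counts = [sum(1 for rep in reps if rep[j] != "") for j in range(n_loci)]
    loci_amplified = sum(1 for c in counts if c > 0)
    loci_well_replicated = sum(1 for c in counts if c >= min_reps)
    return counts, loci_amplified, loci_well_replicated


# Replicate scores for focal sample 300, transcribed from the output in this section.
sample_300 = [
    ["", "", "213 213", "", "", "204 204", "192 192", "", ""],
    ["", "", "213 213", "", "", "204 204", "192 192", "", ""],
    ["", "", "", "", "", "204 204", "", "", ""],
] + [[""] * 9 for _ in range(4)]  # replicates 4-7 did not amplify

counts, n_amplified, n_well_replicated = amplification_summary(sample_300)
print(counts, n_amplified, n_well_replicated)
```

For sample 300 this gives 3 amplified loci, of which 1 amplified 3 or more times, in agreement with the description above.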

## [1] "Here is the data for the focal sample (300):"
##
##       locus 1   locus 2   locus 3   locus 4   locus 5   locus 6   locus 7   locus 8   locus 9
## rep 1 ""        ""        "213 213" ""        ""        "204 204" "192 192" ""        ""
## rep 2 ""        ""        "213 213" ""        ""        "204 204" "192 192" ""        ""
## rep 3 ""        ""        ""        ""        ""        "204 204" ""        ""        ""
## rep 4 ""        ""        ""        ""        ""        ""        ""        ""        ""
## rep 5 ""        ""        ""        ""        ""        ""        ""        ""        ""
## rep 6 ""        ""        ""        ""        ""        ""        ""        ""        ""
## rep 7 ""        ""        ""        ""        ""        ""        ""        ""        ""
##
## [1] "This sample did not match with any other samples with probability 0.033"
##
## [1] "Conditional on not matching other samples, the 3 most probable genotypes were:"
##
##      [,1]      [,2]      [,3]      [,4]      [,5]      [,6]      [,7]      [,8]      [,9]
## [1,] "154 156" "128 136" "213 213" "122 124" "132 134" "204 204" "192 200" "283 283" "262 264"
## [2,] "154 158" "140 144" "213 213" "122 128" "134 144" "204 204" "192 192" "281 281" "258 258"
## [3,] "154 162" "140 144" "213 213" "122 124" "132 132" "204 204" "192 192" "281 283" "262 264"
##
## [1] "These genotypes had posterior probabilities of:"
##
## [1] 0.005050505 0.005050505 0.005050505
##
## [1] "Here is the data for candidate sample 1 (297):"
##
##       locus 1   locus 2   locus 3   locus 4   locus 5   locus 6   locus 7   locus 8   locus 9
## rep 1 "158 158" "136 136" "211 213" "122 124" "132 144" "204 204" "192 192" "283 283" "262 262"
## rep 2 "158 158" "136 136" "211 213" "122 124" "132 144" "204 204" "192 192" "283 283" "262 262"
## rep 3 "158 158" "136 136" "211 213" "122 124" "132 144" "204 204" "192 192" "283 283" "262 262"
## rep 4 "158 158" "136 136" "211 213" "122 124" "132 144" "204 204" "192 192" "283 283" "262 262"
## rep 5 ""        ""        ""        ""        ""        ""        ""        ""        ""


## rep 6 ""        ""        ""        ""        ""        ""        ""        ""        ""
## rep 7 ""        ""        ""        ""        ""        ""        ""        ""        ""
##
## [1] "The focal sample matched this candidate with probability 0.967"
##
## [1] "Conditional on the focal sample matching this candidate, the 3 most probable genotypes were:"
##
##      [,1]      [,2]      [,3]      [,4]      [,5]      [,6]      [,7]      [,8]      [,9]
## [1,] "158 158" "136 136" "211 213" "122 124" "132 144" "204 204" "192 192" "283 283" "262 262"
## [2,] NA        NA        NA        NA        NA        NA        NA        NA        NA
## [3,] NA        NA        NA        NA        NA        NA        NA        NA        NA
##
## [1] "These genotypes had posterior probabilities of:"
##
## [1] 1 NA NA
## [1] "Here is the data for candidate sample 2 (298):"
##
##       locus 1   locus 2   locus 3   locus 4   locus 5   locus 6   locus 7   locus 8   locus 9
## rep 1 "158 158" "136 136" "211 213" "122 124" "132 144" "204 204" "192 192" "283 283" "262 262"
## rep 2 "158 158" "136 136" "211 213" "122 124" "132 144" "204 204" "192 192" "283 283" "262 262"
## rep 3 "158 158" "136 136" "211 213" "122 124" "132 144" "204 204" "192 192" "283 283" "262 262"
## rep 4 "158 158" "136 136" "211 213" "122 124" "132 144" "204 204" "192 192" "283 283" ""
## rep 5 ""        "136 136" ""        ""        ""        ""        "192 192" ""        "262 262"
## rep 6 ""        ""        ""        ""        ""        ""        ""        ""        ""
## rep 7 ""        ""        ""        ""        ""        ""        ""        ""        ""
##
## [1] "The focal sample matched this candidate with probability 0.967"
##
## [1] "Conditional on the focal sample matching this candidate, the 3 most probable genotypes were:"
##
##      [,1]      [,2]      [,3]      [,4]      [,5]      [,6]      [,7]      [,8]      [,9]
## [1,] "158 158" "136 136" "211 213" "122 124" "132 144" "204 204" "192 192" "283 283" "262 262"
## [2,] NA        NA        NA        NA        NA        NA        NA        NA        NA
## [3,] NA        NA        NA        NA        NA        NA        NA        NA        NA


##
## [1] "These genotypes had posterior probabilities of:"
##
## [1] 1 NA NA
## [1] "Here is the data for candidate sample 3 (299):"
##
##       locus 1   locus 2   locus 3   locus 4   locus 5   locus 6   locus 7   locus 8   locus 9
## rep 1 "158 158" "136 136" "213 213" "122 124" "132 144" "204 204" "192 192" "283 283" "262 262"
## rep 2 "158 158" "136 136" "211 211" "122 124" "132 144" "204 204" "192 192" "283 283" "262 262"
## rep 3 "158 158" "136 136" "211 213" "122 124" "132 144" "204 204" "192 192" "283 283" "262 262"
## rep 4 ""        "136 136" "213 213" "122 124" "132 144" "204 204" "192 192" "283 283" "262 262"
## rep 5 ""        ""        ""        ""        ""        ""        ""        ""        ""
## rep 6 ""        ""        ""        ""        ""        ""        ""        ""        ""
## rep 7 ""        ""        ""        ""        ""        ""        ""        ""        ""
##
## [1] "The focal sample matched this candidate with probability 0.967"
##
## [1] "Conditional on the focal sample matching this candidate, the 3 most probable genotypes were:"
##
##      [,1]      [,2]      [,3]      [,4]      [,5]      [,6]      [,7]      [,8]      [,9]
## [1,] "158 158" "136 136" "211 213" "122 124" "132 144" "204 204" "192 192" "283 283" "262 262"
## [2,] NA        NA        NA        NA        NA        NA        NA        NA        NA
## [3,] NA        NA        NA        NA        NA        NA        NA        NA        NA
##
## [1] "These genotypes had posterior probabilities of:"
##
## [1] 1 NA NA


##
## [1] "The geneticist-assigned IDs for these samples were:"
##
## [1] NA 9 9 9

Sample 304

Sample 304 was originally discarded because it only amplified at 4/9 loci and only amplified 3 or more times at 2 loci. The Genotype SPIM matches this sample to sample 303, captured at the same trap, with probability 1. If this is the correct match, there were 3 allelic dropouts for sample 304 at locus 2. Further, there was 1 allelic dropout and 2 false allele events at locus 9 for sample 304.
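The error counts at locus 9 can be reproduced by comparing each replicate score with the most probable genotype for the matched individual ("262 266"). The sketch below uses a simple classification rule that is an assumption for illustration, not the authors' error model: a score containing an allele absent from the true genotype counts as a false allele event, and a homozygous score for one allele of a true heterozygote counts as an allelic dropout. The function name `classify_replicate` is hypothetical.

```python
from collections import Counter


def classify_replicate(observed, true_genotype):
    """Classify one replicate score against an assumed true genotype.

    Both arguments are strings of two alleles separated by a space;
    an empty observed string means the replicate did not amplify.
    """
    obs = observed.split()
    true = true_genotype.split()
    if not obs:
        return "missing"
    if sorted(obs) == sorted(true):
        return "correct"
    if any(allele not in true for allele in obs):
        # At least one observed allele is not in the true genotype.
        return "false allele"
    # All observed alleles are true alleles but the genotype differs:
    # a true heterozygote was scored as a homozygote.
    return "allelic dropout"


# Locus 9 scores for focal sample 304 (replicates 1-7), from the output in this section.
locus9_reps = ["262 262", "264 266", "262 266", "264 264", "", "", ""]
tally = Counter(classify_replicate(r, "262 266") for r in locus9_reps)
print(dict(tally))
```

Under this rule the tally is 1 allelic dropout, 2 false allele events, and 1 correct score, which agrees with the counts stated for locus 9. A finer scheme could treat the "264 264" score differently, so the rule should be read as one reasonable convention.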

## [1] "Here is the data for the focal sample (304):"
##
##       locus 1   locus 2   locus 3   locus 4   locus 5   locus 6   locus 7   locus 8   locus 9
## rep 1 ""        "136 136" ""        ""        "142 142" ""        ""        "281 281" "262 262"
## rep 2 ""        ""        ""        ""        "142 142" ""        ""        "281 281" "264 266"
## rep 3 ""        "136 136" ""        ""        "142 142" ""        ""        ""        "262 266"
## rep 4 ""        "136 136" ""        ""        ""        ""        ""        ""        "264 264"
## rep 5 ""        ""        ""        ""        ""        ""        ""        ""        ""
## rep 6 ""        ""        ""        ""        ""        ""        ""        ""        ""
## rep 7 ""        ""        ""        ""        ""        ""        ""        ""        ""
##
## [1] "This sample did not match with any other samples with probability 0.00408333333333333"
##
## [1] "Conditional on not matching other samples, the 3 most probable genotypes were:"
##
##      [,1]      [,2]      [,3]      [,4]      [,5]      [,6]      [,7]      [,8]      [,9]
## [1,] "156 156" "136 136" "207 213" "122 128" "142 142" "204 204" "192 196" "281 281" "262 266"
## [2,] "158 158" "136 136" "207 207" "122 122" "142 142" "204 204" "192 204" "281 281" "262 264"
## [3,] "152 154" "136 136" "213 217" "122 122" "142 142" "204 208" "204 204" "281 281" "262 266"
##
## [1] "These genotypes had posterior probabilities of:"
##
## [1] 0.04081633 0.04081633 0.02040816
##
## [1] "Here is the data for candidate sample 1 (303):"
##
##       locus 1   locus 2   locus 3   locus 4   locus 5   locus 6   locus 7   locus 8   locus 9
## rep 1 "158 158" "136 136" "213 213" "124 124" "142 142" "202 204" "204 204" "281 281" "262 266"
## rep 2 "158 158" "136 136" "213 213" "124 124" "142 142" "202 204" "204 204" "281 281" "262 266"
## rep 3 "158 158" "136 136" "213 213" "124 124" "142 142" "202 204" "204 204" "281 281" "262 266"
## rep 4 "158 158" "128 128" "213 213" "124 124" "142 142" "204 204" ""        "281 281" "262 266"


## rep 5 "158 158" "128 128" ""        ""        ""        "202 204" "204 204" ""        ""
## rep 6 ""        "136 136" ""        ""        ""        ""        ""        ""        ""
## rep 7 ""        ""        ""        ""        ""        ""        ""        ""        ""
##
## [1] "The focal sample matched this candidate with probability 0.995916666666667"
##
## [1] "Conditional on the focal sample matching this candidate, the 3 most probable genotypes were:"
##
##      [,1]      [,2]      [,3]      [,4]      [,5]      [,6]      [,7]      [,8]      [,9]
## [1,] "158 158" "136 136" "213 213" "124 124" "142 142" "202 204" "204 204" "281 281" "262 266"
## [2,] "158 158" "128 136" "213 213" "124 124" "142 142" "202 204" "204 204" "281 281" "262 266"
## [3,] "158 158" "136 136" "207 213" "124 124" "142 142" "202 204" "204 204" "281 281" "262 266"
##
## [1] "These genotypes had posterior probabilities of:"
##
## [1] 0.856330014 0.143251611 0.000251025


##
## [1] "The geneticist-assigned IDs for these samples were:"
##
## [1] NA 129

Sample 416

Sample 416 was originally discarded because it only amplified at 5/9 loci and only amplified 3 or more times at 1 locus. The Genotype SPIM matches this sample to samples 418 and 419, captured at a nearby trap, with probability 0.99. If this is the correct match, there were 2 allelic dropouts for sample 416 at locus 6, and 2 allelic dropouts for sample 416 at locus 9.

## [1] "Here is the data for the focal sample (416):"
##
##       locus 1   locus 2   locus 3   locus 4   locus 5   locus 6   locus 7   locus 8   locus 9
## rep 1 ""        ""        ""        "124 124" "142 142" ""        "204 204" ""        "268 268"
## rep 2 ""        ""        ""        "124 124" "142 142" "208 208" "204 204" ""        "268 268"
## rep 3 ""        ""        ""        ""        ""        "208 208" "204 204" ""        ""
## rep 4 ""        ""        ""        ""        ""        ""        "204 204" ""        ""
## rep 5 ""        ""        ""        ""        ""        ""        ""        ""        ""
## rep 6 ""        ""        ""        ""        ""        ""        ""        ""        ""
## rep 7 ""        ""        ""        ""        ""        ""        ""        ""        ""
##
## [1] "This sample did not match with any other samples with probability 0.00808333333333333"
##


## [1] "Conditional on not matching other samples, the 3 most probable genotypes were:"
##
##      [,1]      [,2]      [,3]      [,4]      [,5]      [,6]      [,7]      [,8]      [,9]
## [1,] "154 158" "128 144" "207 213" "124 124" "142 142" "208 208" "204 204" "281 281" "262 268"
## [2,] "154 158" "136 142" "213 213" "124 124" "142 142" "208 208" "204 204" "281 281" "268 268"
## [3,] "156 156" "138 144" "207 207" "124 124" "142 142" "208 208" "204 204" "281 281" "268 268"
##
## [1] "These genotypes had posterior probabilities of:"
##
## [1] 0.02 0.02 0.02
##
## [1] "Here is the data for candidate sample 1 (418):"
##
##       locus 1   locus 2   locus 3   locus 4   locus 5   locus 6   locus 7   locus 8   locus 9
## rep 1 "154 158" "136 144" "207 207" "124 124" "142 142" "204 208" "204 204" "279 279" "268 268"
## rep 2 "154 158" "136 144" "207 207" "124 124" "142 142" "204 204" "204 204" "279 283" "266 268"
## rep 3 "154 158" "136 144" "207 207" "124 124" "142 142" "208 208" "204 204" "279 283" "266 268"
## rep 4 "154 158" "136 144" "207 207" "124 124" "142 142" "204 204" ""        "283 283" "266 268"
## rep 5 ""        ""        ""        ""        ""        ""        ""        ""        ""
## rep 6 ""        ""        ""        ""        ""        ""        ""        ""        ""
## rep 7 ""        ""        ""        ""        ""        ""        ""        ""        ""
##
## [1] "The focal sample matched this candidate with probability 0.988916666666667"
##
## [1] "Conditional on the focal sample matching this candidate, the 3 most probable genotypes were:"
##
##      [,1]      [,2]      [,3]      [,4]      [,5]      [,6]      [,7]      [,8]      [,9]
## [1,] "154 158" "136 144" "207 207" "124 124" "142 142" "204 208" "204 204" "279 283" "266 268"
## [2,] NA        NA        NA        NA        NA        NA        NA        NA        NA
## [3,] NA        NA        NA        NA        NA        NA        NA        NA        NA
##
## [1] "These genotypes had posterior probabilities of:"
##
## [1] 1 NA NA
## [1] "Here is the data for candidate sample 2 (419):"
##
##       locus 1   locus 2   locus 3   locus 4   locus 5   locus 6   locus 7   locus 8   locus 9
## rep 1 "154 158" "136 144" "207 207" "124 124" "142 142" "204 208" "204 204" ""        "266 266"
## rep 2 "154 158" "136 144" "207 207" "124 124" "142 142" "204 208" "204 204" "279 279" "266 268"
## rep 3 "154 158" "136 136" "207 207" "124 124" "142 142" "208 208" "204 204" "279 279" "268 268"
## rep 4 "158 158" "144 144" ""        "124 124" ""        "208 208" "204 204" "279 279" ""
## rep 5 ""        ""        ""        ""        "142 142" ""        ""        ""        ""
## rep 6 ""        ""        ""        ""        ""        ""        ""        ""        ""
## rep 7 ""        ""        ""        ""        ""        ""        ""        ""        ""
##
## [1] "The focal sample matched this candidate with probability 0.988916666666667"
##
## [1] "Conditional on the focal sample matching this candidate, the 3 most probable genotypes were:"
##
##      [,1]      [,2]      [,3]      [,4]      [,5]      [,6]      [,7]      [,8]      [,9]
## [1,] "154 158" "136 144" "207 207" "124 124" "142 142" "204 208" "204 204" "279 283" "266 268"
## [2,] NA        NA        NA        NA        NA        NA        NA        NA        NA
## [3,] NA        NA        NA        NA        NA        NA        NA        NA        NA
##
## [1] "These genotypes had posterior probabilities of:"
##
## [1] 1 NA NA
## [1] "Here is the data for candidate sample 3 (420):"
##
##       locus 1   locus 2   locus 3   locus 4   locus 5   locus 6   locus 7   locus 8   locus 9
## rep 1 "158 158" ""        ""        ""        "142 142" ""        "204 204" ""        ""
## rep 2 "158 158" ""        ""        ""        "142 142" ""        "204 204" ""        ""
## rep 3 ""        ""        ""        ""        "142 142" ""        ""        ""        ""
## rep 4 ""        ""        ""        ""        ""        ""        ""        ""        ""
## rep 5 ""        ""        ""        ""        ""        ""        ""        ""        ""
## rep 6 ""        ""        ""        ""        ""        ""        ""        ""        ""
## rep 7 ""        ""        ""        ""        ""        ""        ""        ""        ""
##
## [1] "The focal sample matched this candidate with probability 0.885"
##
## [1] "Conditional on the focal sample matching this candidate, the 3 most probable genotypes were:"
##
##      [,1]      [,2]      [,3]      [,4]      [,5]      [,6]      [,7]      [,8]      [,9]
## [1,] "154 158" "136 144" "207 207" "124 124" "142 142" "204 208" "204 204" "279 283" "266 268"
## [2,] "154 158" "128 144" "207 213" "124 124" "142 142" "208 208" "204 204" "283 283" "268 268"
## [3,] "158 158" "128 128" "211 213" "124 124" "142 142" "208 208" "204 204" "279 287" "268 268"
##
## [1] "These genotypes had posterior probabilities of:"
##
## [1] 9.968927e-01 9.416196e-05 9.416196e-05


##
## [1] "The geneticist-assigned IDs for these samples were:"
##
## [1] NA 51 51 NA

High probability assignments that discarded samples are unique individuals with no matches to originally used samples

Sample 6

Sample 6 was originally discarded because it did not amplify at any loci. The Genotype SPIM gives a 0.95 probability that this sample does not match any other samples. This high probability assignment was only possible because this sample did not have any nearby neighboring samples to match with.
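This section reports match and non-match probabilities but does not show how they are computed. One common way to obtain such quantities from MCMC output, assumed here purely for illustration, is to take the fraction of posterior draws in which two samples are assigned to the same individual, and the fraction in which the focal sample shares its individual with no other sample. The function `match_probabilities`, the data layout, and the toy draws below are all hypothetical.

```python
def match_probabilities(id_draws, focal):
    """Summarise pairwise matches for one focal sample across posterior draws.

    id_draws: list of dicts, one per posterior draw, mapping a sample label
              to the individual ID assigned to it in that draw.
    Returns (dict of match probability per other sample,
             probability that the focal sample matches no other sample).
    """
    n_draws = len(id_draws)
    others = [s for s in id_draws[0] if s != focal]
    match = {
        s: sum(d[s] == d[focal] for d in id_draws) / n_draws for s in others
    }
    no_match = sum(
        all(d[s] != d[focal] for s in others) for d in id_draws
    ) / n_draws
    return match, no_match


# Toy posterior draws (made up, not from the study).
draws = [
    {"A": 1, "B": 1, "C": 2},
    {"A": 1, "B": 1, "C": 2},
    {"A": 3, "B": 1, "C": 2},
    {"A": 1, "B": 1, "C": 1},
]
match, no_match = match_probabilities(draws, "A")
print(match, no_match)
```

With no amplified loci, as for sample 6, only the spatial location can distinguish a match from a non-match in such a summary, which fits the explanation that the assignment depended on the absence of nearby samples.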

## [1] "Here is the data for the focal sample (6):"
##
##       locus 1   locus 2   locus 3   locus 4   locus 5   locus 6   locus 7   locus 8   locus 9
## rep 1 ""        ""        ""        ""        ""        ""        ""        ""        ""
## rep 2 ""        ""        ""        ""        ""        ""        ""        ""        ""
## rep 3 ""        ""        ""        ""        ""        ""        ""        ""        ""


## rep 4 ""        ""        ""        ""        ""        ""        ""        ""        ""
## rep 5 ""        ""        ""        ""        ""        ""        ""        ""        ""
## rep 6 ""        ""        ""        ""        ""        ""        ""        ""        ""
## rep 7 ""        ""        ""        ""        ""        ""        ""        ""        ""
##
## [1] "This sample did not match with any other samples with probability 0.952"
##
## [1] "Conditional on not matching other samples, the 3 most probable genotypes were:"
##
##      [,1]      [,2]      [,3]      [,4]      [,5]      [,6]      [,7]      [,8]      [,9]
## [1,] "158 158" "140 140" "207 207" "122 122" "132 144" "204 204" "192 204" "279 283" "262 262"
## [2,] "156 158" "128 144" "211 211" "124 124" "132 132" "202 204" "200 204" "283 285" "258 258"
## [3,] "152 154" "128 136" "207 211" "122 124" "132 134" "204 204" "192 204" "283 285" "262 266"
##
## [1] "These genotypes had posterior probabilities of:"
##
## [1] 0.0372500000 0.0103333333 0.0001666667
##
## [1] "This sample did not match with any other samples with probability greater than 0.1"

##
## [1] "The geneticist-assigned IDs for these samples were:"


##
## [1] NA

Sample 17

Sample 17 was originally discarded because it only amplified at 2/9 loci. The Genotype SPIM gives a 0.99 probability that this sample does not match any other samples. The rarity of genotype 213.221 at locus 3 (0.0034) and genotype 100.128 at locus 4 (0.0029) contributed to this high probability assignment of uniqueness. The genotype frequency estimates can be found in the last section of this document.
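To see why two rare genotypes are so informative, the two frequencies quoted above can be multiplied to give the chance that a randomly chosen individual carries both. This is a back-of-the-envelope sketch that assumes the two loci are independent; it is not the Genotype SPIM calculation, which also accounts for genotyping error and spatial location.

```python
# Genotype frequency estimates quoted in the text for sample 17.
freq_locus3 = 0.0034  # genotype 213.221 at locus 3
freq_locus4 = 0.0029  # genotype 100.128 at locus 4

# Assumption: loci are independent, so the joint frequency is the product.
joint_frequency = freq_locus3 * freq_locus4
print(f"{joint_frequency:.2e}")
```

The product is on the order of 1 in 100,000, so even two loci make a chance match with another individual very unlikely under this assumption.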

## [1] "Here is the data for the focal sample (17):"
##
##       locus 1   locus 2   locus 3   locus 4   locus 5   locus 6   locus 7   locus 8   locus 9
## rep 1 ""        ""        "213 221" "100 100" ""        ""        ""        ""        ""
## rep 2 ""        ""        "213 221" "100 128" ""        ""        ""        ""        ""
## rep 3 ""        ""        "213 221" "100 128" ""        ""        ""        ""        ""
## rep 4 ""        ""        "213 221" "100 100" ""        ""        ""        ""        ""
## rep 5 ""        ""        ""        ""        ""        ""        ""        ""        ""
## rep 6 ""        ""        ""        ""        ""        ""        ""        ""        ""
## rep 7 ""        ""        ""        ""        ""        ""        ""        ""        ""
##
## [1] "This sample did not match with any other samples with probability 0.9885"
##
## [1] "Conditional on not matching other samples, the 3 most probable genotypes were:"
##
##      [,1]      [,2]      [,3]      [,4]      [,5]      [,6]      [,7]      [,8]      [,9]
## [1,] "158 158" "136 144" "213 221" "100 128" "132 132" "204 204" "204 204" "281 281" "262 262"
## [2,] "158 158" "136 144" "213 221" "100 128" "132 144" "204 204" "192 204" "279 281" "262 262"
## [3,] "154 156" "144 144" "213 221" "100 128" "142 142" "204 204" "200 204" "281 281" "266 266"
##
## [1] "These genotypes had posterior probabilities of:"
##
## [1] 0.0003333333 0.0003333333 0.0002500000
##
## [1] "This sample did not match with any other samples with probability greater than 0.1"


##
## [1] "The geneticist-assigned IDs for these samples were:"
##
## [1] NA

Genotype frequency estimates

Above, the genotype frequency estimates were referenced. We display these here, with each list element storing the genotype frequency estimates for each locus.

## [[1]]
## 152.152 152.154 152.156 152.158 152.160 152.162 154.154 154.156 154.158 154.160 154.162 156.156
##  0.0004  0.0092  0.0003  0.0084  0.0002  0.0001  0.0286  0.0557  0.2068  0.0002  0.0079  0.0523
## 156.158 156.160 156.162 158.158 158.160 158.162 160.160 160.162 162.162
##  0.0987  0.0003  0.0042  0.4226  0.0042  0.0248  0.0002  0.0002  0.0001
##
## [[2]]
## 128.128 128.136 128.138 128.140 128.142 128.144 136.136 136.138 136.140 136.142 136.144 138.138
##  0.0727  0.1228  0.0172  0.0172  0.0002  0.0868  0.0889  0.0232  0.0191  0.0304  0.1288  0.0274
## 138.140 138.142 138.144 140.140 140.142 140.144 142.142 142.144 144.144


##  0.0147  0.0085  0.0308  0.0147  0.0006  0.0643  0.0002  0.0315  0.0959
##
## [[3]]
## 201.201 201.205 201.207 201.209 201.211 201.213 201.215 201.217 201.219 201.221 201.223 205.205
##  0.0003  0.0002  0.0035  0.0002  0.0001  0.0002  0.0002  0.0001  0.0002  0.0002  0.0003  0.0004
## 205.207 205.209 205.211 205.213 205.215 205.217 205.219 205.221 205.223 207.207 207.209 207.211
##  0.0274  0.0001  0.0085  0.0326  0.0003  0.0001  0.0001  0.0002  0.0003  0.0908  0.0115  0.0948
## 207.213 207.215 207.217 207.219 207.221 207.223 209.209 209.211 209.213 209.215 209.217 209.219
##  0.1821  0.0003  0.0002  0.0002  0.0001  0.0002  0.0018  0.0002  0.0036  0.0002  0.0001  0.0003
## 209.221 209.223 211.211 211.213 211.215 211.217 211.219 211.221 211.223 213.213 213.215 213.217
##  0.0001  0.0002  0.0753  0.0882  0.0048  0.0001  0.0001  0.0002  0.0004  0.1241  0.0109  0.0002
## 213.219 213.221 213.223 215.215 215.217 215.219 215.221 215.223 217.217 217.219 217.221 217.223
##  0.0065  0.0034  0.0053  0.0001  0.0001  0.0001  0.0001  0.0002  0.0002  0.0002  0.0001  0.0006
## 219.219 219.221 219.223 221.221 221.223 223.223
##  0.0003  0.0002  0.0002  0.0001  0.0001  0.0001
##
## [[4]]
## 100.100 100.118 100.120 100.122 100.124 100.126 100.128 100.130 118.118 118.120 118.122 118.124
##  0.0040  0.0001  0.0002  0.0030  0.0002  0.0040  0.0029  0.0004  0.0019  0.0002  0.0002  0.0001
## 118.126 118.128 118.130 120.120 120.122 120.124 120.126 120.128 120.130 122.122 122.124 122.126
##  0.0002  0.0002  0.0002  0.0001  0.0047  0.0002  0.0001  0.0002  0.0002  0.3018  0.2420  0.0479
## 122.128 122.130 124.124 124.126 124.128 124.130 126.126 126.128 126.130 128.128 128.130 130.130
##  0.0091  0.0164  0.1642  0.0253  0.0002  0.0110  0.0180  0.0001  0.0043  0.0004  0.0033  0.0002
##
## [[5]]
## 126.126 126.130 126.132 126.134 126.136 126.138 126.142 126.144 126.148 130.130 130.132 130.134
##  0.0001  0.0001  0.0001  0.0001  0.0001  0.0002  0.0038  0.0003  0.0001  0.0003  0.0039  0.0002
## 130.136 130.138 130.142 130.144 130.148 132.132 132.134 132.136 132.138 132.142 132.144 132.148
##  0.0002  0.0003  0.0002  0.0006  0.0002  0.2013  0.0488  0.0064  0.0004  0.1419  0.1097  0.0044
## 134.134 134.136 134.138 134.142 134.144 134.148 136.136 136.138 136.142 136.144 136.148 138.138
##  0.0313  0.0032  0.0001  0.0194  0.0188  0.0002  0.0004  0.0001  0.0030  0.0001  0.0003  0.0043
## 138.142 138.144 138.148 142.142 142.144 142.148 144.144 144.148 148.148


##  0.0003  0.0004  0.0002  0.1084  0.0751  0.0005  0.0614  0.0001  0.0002
##
## [[6]]
## 196.196 196.198 196.200 196.202 196.204 196.206 196.208 196.210 198.198 198.200 198.202 198.204
##  0.0019  0.0003  0.0001  0.0002  0.0002  0.0004  0.0002  0.0003  0.0001  0.0002  0.0002  0.0041
## 198.206 198.208 198.210 200.200 200.202 200.204 200.206 200.208 200.210 202.202 202.204 202.206
##  0.0001  0.0001  0.0002  0.0047  0.0093  0.0003  0.0001  0.0002  0.0002  0.0093  0.1197  0.0002
## 202.208 202.210 204.204 204.206 204.208 204.210 206.206 206.208 206.210 208.208 208.210 210.210
##  0.0165  0.0030  0.5552  0.0001  0.1094  0.0035  0.0014  0.0002  0.0001  0.0182  0.0002  0.0002
##
## [[7]]
## 188.188 188.192 188.196 188.200 188.204 188.208 192.192 192.196 192.200 192.204 192.208 196.196
##  0.0030  0.0004  0.0002  0.0004  0.0003  0.0002  0.1011  0.0300  0.1135  0.1586  0.0003  0.0046
## 196.200 196.204 196.208 200.200 200.204 200.208 204.204 204.208 208.208
##  0.0194  0.0470  0.0002  0.0586  0.1647  0.0001  0.2160  0.0030  0.0039
##
## [[8]]
## 279.279 279.281 279.283 279.285 279.287 279.291 281.281 281.283 281.285 281.287 281.291 283.283
##  0.0247  0.0756  0.0549  0.0100  0.0001  0.0004  0.2541  0.3004  0.0437  0.0050  0.0034  0.0734
## 283.285 283.287 283.291 285.285 285.287 285.291 287.287 287.291 291.291
##  0.0377  0.0002  0.0079  0.0111  0.0003  0.0002  0.0002  0.0001  0.0042
##
## [[9]]
## 258.258 258.260 258.262 258.264 258.266 258.268 260.260 260.262 260.264 260.266 260.268 262.262
##  0.1037  0.0002  0.0193  0.0360  0.0289  0.0043  0.0034  0.0098  0.0166  0.0107  0.0003  0.1422
## 262.264 262.266 262.268 264.264 264.266 264.268 266.266 266.268 268.268
##  0.0626  0.0269  0.0150  0.1831  0.0740  0.0174  0.1199  0.0116  0.0106
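The genotype labels in the lists above follow a regular pattern: every unordered pair of alleles at a locus, written as "allele1.allele2" with the smaller allele first. The sketch below enumerates these labels for a given allele set, which is a quick way to check that a locus with k alleles has k(k+1)/2 genotype classes. The function name `genotype_labels` is hypothetical, and the allele sets are read off the labels shown for loci 1 and 3.

```python
from itertools import combinations_with_replacement


def genotype_labels(alleles):
    """Return all unordered diploid genotype labels for a set of alleles."""
    ordered = sorted(alleles)
    return [f"{a}.{b}" for a, b in combinations_with_replacement(ordered, 2)]


# Alleles observed in the labels for locus 1 and locus 3.
locus1 = genotype_labels([152, 154, 156, 158, 160, 162])
locus3 = genotype_labels([201, 205, 207, 209, 211, 213, 215, 217, 219, 221, 223])

print(len(locus1), locus1[:3], locus1[-1])
print(len(locus3))
```

Locus 1 has 6 alleles and therefore 21 genotype classes, and locus 3 has 11 alleles and 66 classes, matching the number of entries listed for those loci.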
