hidden diversity in the andes: comparison of species delimitation methods in montane marsupials

Accepted Manuscript

Hidden diversity in the Andes: Comparison of species delimitation methods inmontane marsupials

Thomas C. Giarla, Robert S. Voss, Sharon A. Jansa

PII: S1055-7903(13)00377-1DOI: http://dx.doi.org/10.1016/j.ympev.2013.09.019Reference: YMPEV 4720

To appear in: Molecular Phylogenetics and Evolution

Received Date: 30 April 2013Revised Date: 17 September 2013Accepted Date: 20 September 2013

Please cite this article as: Giarla, T.C., Voss, R.S., Jansa, S.A., Hidden diversity in the Andes: Comparison of speciesdelimitation methods in montane marsupials, Molecular Phylogenetics and Evolution (2013), doi: http://dx.doi.org/10.1016/j.ympev.2013.09.019

This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customerswe are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, andreview of the resulting proof before it is published in its final form. Please note that during the production processerrors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

http://dx.doi.org/10.1016/j.ympev.2013.09.019

http://dx.doi.org/http://dx.doi.org/10.1016/j.ympev.2013.09.019

http://dx.doi.org/http://dx.doi.org/10.1016/j.ympev.2013.09.019

1

Hidden diversity in the Andes: Comparison of species delimitation methods in montane 1

marsupials 2

3

Thomas C. Giarlaa, Robert S. Voss

b, and Sharon A. Jansa

a 4

5

a Department of Ecology, Evolution, and Behavior; and J.F. Bell Museum of Natural History. 6

University of Minnesota, 1987 Upper Buford Circle, St. Paul, MN, 55108, USA 7

8

b Division of Vertebrate Zoology (Mammalogy), American Museum of Natural History, Central 9

Park West at 79th Street, New York, NY, 10024, USA 10

11

Correspondent: 12

Thomas C. Giarla 13

Department of Ecology, Evolution, and Behavior 14

University of Minnesota 15

1987 Upper Buford Circle, Room 100 16

St. Paul, MN 55108 17

[email protected] 18

651-925-6383 19

20

http://ees.elsevier.com/mpe/viewRCResults.aspx?pdf=1&docID=6562&rev=1&fileID=381832&msid=C738BDC5-EADD-484F-B50F-ECFBE122F9D7

2

Abstract 21

22

Cryptic genetic diversity is a significant challenge for systematists faced with ever-23

increasing amounts of DNA sequence data. Computationally intensive coalescent-based analyses 24

involving multiple unlinked loci are the only currently viable methods by which to assess the 25

extent to which phenotypically similar populations (or metapopulations) are genetically distinct 26

lineages. Although coalescent-based approaches have been tested extensively via simulations, 27

few empirical studies have examined the impact of prior assumptions and dataset size on the 28

ability to assess genetic isolation (evolutionary independence) using molecular data alone. Here, 29

we consider the efficacy of two coalescent-based approaches (BPP and SpeDeSTEM) for testing 30

the evolutionary independence of cryptic mtDNA haplogroups within three morphologically 31

diagnosable species of Andean mouse opossums (Thylamys pallidior, T. sponsorius, and T. 32

venustus). Fourteen anonymous nuclear loci, one X-linked nuclear intron, and one mitochondrial 33

gene were analyzed for multiple individuals within each haplogroup of interest. We inferred 34

individual gene trees for each locus and considered all of the nuclear loci jointly in a species-tree 35

analysis. Using only the nuclear loci, we performed ―species validation‖ tests for the cryptic 36

mitochondrial lineages in SpeDeSTEM and BPP. For BPP, we also tested a wide range of prior 37

assumptions, assessed performance of the rjMCMC algorithm, and examined how many loci 38

were necessary to confidently delimit lineages. Results from BPP provided strong support for 39

two independent evolutionary lineages each within T. pallidior, T. sponsorius, and T. venustus, 40

whereas SpeDeSTEM results did not support splitting out mtDNA haplogroups as distinct 41

evolutionary units. For most tests, BPP was robust to prior assumptions, although priors were 42

shown to have an effect on both the strength of lineage recognition among T. venustus 43

3

haplotypes and on the efficiency of the rjMCMC algorithm. Comparisons of results from datasets 44

with different numbers of loci revealed that some cryptic lineages could be confidently delimited 45

with as few as two loci. [Keywords: Species delimitation, cryptic species, anonymous loci, 46

multilocus power analysis, Didelphidae, Thylamys] 47

4

1. Introduction 48

49

The current debate regarding species concepts (reviewed by Hausdorf 2011) shows no 50

sign of abating, but most biologists agree that speciation is an ongoing process, and that 51

contemporary populations of organisms represent evolutionary lineages at different stages of 52

distinctiveness (De Queiroz 2007). For some organisms, the delineation of independent 53

evolutionary lineages seems straightforward due to the presence of diagnostic morphological 54

characters, which can be expected to evolve either by genetic drift after a long history of 55

isolation or by divergent selection on readily observable traits (Lande, 1976). However, for many 56

organisms—perhaps especially animals that do not rely on visual mating cues (Bickford et al., 57

2007)—species recognition based solely on morphological differences can be problematic if 58

diagnostic traits are subtle, occur in commonly overlooked anatomical structures (e.g., soft 59

tissues), or simply are not present. 60

A proliferation of cryptic diversity in a given area could be caused by geographical or 61

ecological factors unique to that region. For example, mountainous areas might harbor more 62

cryptic lineages than lowland areas because rugged terrain and altitudinal zonation of habitats 63

could limit dispersal and thus create more opportunities for allopatric divergence, especially in 64

combination with dynamic climate-change regimes (Roy, 1997; Weir, 2006; Kozak and Wiens, 65

2006; Brumfield and Edwards, 2007; Ribas et al., 2007). In particular, if the ecological niches of 66

allopatric montane populations remain similar (Wiens and Graham, 2005), reliable phenotypic 67

indicators of lineage divergence may not evolve. In situations like this, morphological traits may 68

not be sufficient to detect recently diverged (or incipiently diverging) species. 69

5

Advances in DNA sequencing technologies over the past twenty years have allowed 70

systematists to uncover cryptic genetic diversity at a rapid pace, challenging all researchers who 71

study biological processes at or above the species level to contend with unexpected numbers of 72

putative lineages and candidate species (Beheregaray and Caccone, 2007; Bickford et al., 2007). 73

Mitochondrial DNA (mtDNA) markers have been especially effective for revealing cryptic 74

genetic diversity within animal systems because of the mitochondrial genome’s high mutation 75

rate and rapid coalescence time (Moore, 1995). However, the exclusive use of mtDNA in 76

phylogenetic and phylogeographic work has been criticized (Ballard and Whitlock, 2004; 77

Edwards and Bensch, 2009; Galtier et al., 2009), primarily because single-gene trees can deviate 78

from historical patterns of population branching due to the stochastic nature of coalescence 79

(gene-tree/species-tree incongruence; Degnan and Rosenberg, 2009). Nonetheless, studies that 80

include mtDNA are still common because mtDNA is typically quite variable for most animal 81

taxa, is easy to amplify and sequence, and may be a ―leading indicator‖ of isolation (Zink and 82

Barrowclough, 2008). Many studies that included both mtDNA and multiple unlinked nuclear 83

DNA (nDNA) markers have reported striking examples of cytonuclear discordance (Toews and 84

Brelsford, 2012). The discordance observed between trees inferred from mtDNA and nDNA 85

could be caused by phylogenetic error, incomplete lineage sorting, or lateral gene transfer (Funk 86

and Omland, 2003), so studies based on both types of data are necessary to disentangle the 87

evolutionary processes that resulted in observed patterns. 88

The purpose of this study is to evaluate the evolutionary independence of 89

morphologically cryptic mtDNA haplogroups within a genus of Neotropical marsupials based on 90

data from 15 nuclear loci and one mitochondrial locus. Didelphid marsupials of the genus 91

Thylamys are found in a variety of ecoregions in central and southern South America, but they 92

6

primarily inhabit arid and semiarid open habitats. Three species—T. pallidior, T. sponsorius, and 93

T. venustus—are principally montane (with some populations inhabiting areas up to 4000 meters 94

above sea level in the Andes) and occur in Peru, Bolivia, Chile, and Argentina. In a previous 95

study, we identified multiple allopatric mtDNA haplogroups within each of these species, but we 96

were unable to find morphological characters that could consistently distinguish conspecific 97

haplogroups (Giarla et al., 2010). Nevertheless, the deep mtDNA divergence we observed among 98

conspecific haplogroups (2.5% to 5.4 at the cytochrome b locus) suggests that up to seven 99

independent lineages might be present (Giarla et al., 2010). 100

A variety of methods have recently been proposed to assess the evolutionary 101

independence of putative lineages using DNA sequence data; here we focus on two of the more 102

widely used approaches. The first, Bayesian Phylogenetics and Phylogeography (BPP; Yang and 103

Rannala, 2010), provides a Bayesian approach to ―species delimitation‖ in which both 104

phylogenetic uncertainty and stochastic lineage coalescence are taken into account when testing 105

predefined splits on a single proposed species tree. BPP does this by simultaneously estimating a 106

distribution of genealogies for each locus and fitting that distribution to various permutations of 107

the species tree. The permutations allow the program to test different species tree models, from 108

the assumption of a single-taxon, panmictic population that incorporates all of the putative 109

species to the maximally resolved guide tree with each putative species as a tip. By contrast, 110

SpeDeSTEM (Ence and Carstens, 2011) is a software pipeline based on the multilocus species-111

tree method STEM (Kubatko et al., 2009), in which various permutations and combinations of 112

subpopulations within putative species are assessed for genetic independence using previously 113

estimated gene trees. SpeDeSTEM, unlike BPP, relies on a set of previously estimated gene trees 114

being in place before ―species limits‖ are assessed, so phylogenetic error is not incorporated 115

7

directly into the analysis. This approach, however, has the added benefit of being 116

computationally efficient, which becomes especially important if many sequences or loci are 117

included. 118

Tests of ―species limits‖ (in effect, genetic isolation) using both of these coalescent-based 119

approaches have the potential to validate or refute cryptic lineage diversity that only receives 120

weak support from traditional morphological approaches or that depends on single-gene 121

phylogenetic analyses. Here, we use both approaches to examine the coalescent history of 122

putative lineages within T. pallidior, T. sponsorius, and T. venustus in order to assess their 123

genetic independence. In addition to the evolutionary implications of our results, various aspects 124

of the lineage recognition process were considered and tested in this study, including: (1) Do 125

different coalescent-based ―species validation‖ approaches yield similar results based on the 126

same data? (2) How does the choice of priors affect Bayesian lineage recognition? (3) How 127

many loci are necessary to confidently recognize distinct lineages? And (4), to what extent do 128

partitions based on mtDNA haplotype membership correspond to real evolutionary units? 129

130

2. Materials and methods 131

132

2.1. Nuclear marker design 133

134

Primer pairs for nearly 40 anonymous nuclear loci were developed for testing within 135

Thylamys species. A genomic library of DNA fragments ranging in length from 500 to 1500 bp 136

was developed following a modified version of the genomic library generation protocol of Glenn 137

and Schabel (2005). First, whole genomic DNA was extracted from tissue sample NK22949 138

8

(Thylamys venustus) using a DNeasy Blood and Tissue Kit (Qiagen Inc.). Genomic DNA was 139

digested with restriction enzymes Xmn1 and Rsa and run on an agarose gel. Portions of the gel 140

corresponding to fragment lengths between 500 and 1500 bp were excised and purified using a 141

QIAquick Gel Extraction Kit (Qiagen Inc.). Double-stranded linkers were ligated to the size-142

selected fragments, and PCR was used to amplify the fragments. The resulting amplified 143

fragment library was cloned into E. coli cells using a pGEM-T Vector System (Promega Inc.). 144

After growing overnight, dozens of bacterial colonies containing inserts were picked with sterile 145

toothpicks and immediately added to a PCR reaction mixture (12.5 μL GoTaq Green Master Mix 146

[Promega Inc.], 1.0 μL of 10 μM M13F primer solution, 1.0 μL of 10 μM M13R primer solution, 147

and 10.5 μL water) for colony PCR (5 min. of initial melting at 95°; followed by 35 cycles of 148

melting at 95° for 30 sec., annealing at 55° for 30 sec., and extension at 72° for 1.5 min.; and a 149

final extension for 3 min. at 72°). PCR products were run on an agarose gel, with 700–1000 bp 150

fragments preferentially selected for further development. Selected PCR products were cleaned 151

using Exonuclease I and Shrimp Alkaline Phosphatase (Hanke and Wink, 1994) and sequenced 152

in both directions on an ABI 3730 at the University of Minnesota’s Biomedical Genomics 153

Center. 154

Sequences were assembled in Geneious version 5 (Drummond et al., 2011) and cloning 155

vector regions were trimmed. For 35 genomic regions, forward and reverse primers were 156

designed using the Primer3 (Rozen and Skaletsky, 2000) software plug-in in Geneious with the 157

goal of selecting primers that would amplify products between 500 and 800 bp long. All primers 158

were designed to be approximately 30 bp long in order to achieve a higher amplification success 159

rate and fewer instances of non-specific amplification (Belfiore, 2011). Primers were tested on 160

individuals from each of the three taxonomic species under consideration in this study: Thylamys 161

9

pallidior, T. sponsorius, and T. venustus. If amplification was successful, PCR products were 162

sequenced and aligned to determine if variation was present across the individuals tested. 163

Exemplar sequences from each of the markers developed were used as queries in a BLAST 164

search against GenBank’s nucleotide database to determine if the sequence contained protein-165

coding regions. In addition, a BLAT search (BLAST-Like Alignment Tool; Kent, 2002) was 166

used to identify the number and location of hits within the whole-genome sequence of the 167

didelphid marsupial Monodelphis domestica (Mikkelsen et al., 2007). Some BLAT searches 168

closely matched to multiple regions in the M. domestica genome, suggesting that the sequences 169

could have one or more paralogs that could ultimately be co-amplified if the marker were to be 170

developed further. BLAST searches revealed that a small number of sequences appeared to 171

contain protein-coding regions. Both protein-coding sequences and sequences that might have 172

closely related paralogs were eliminated from the pool of potential markers. Of the 35 primer 173

pairs designed, 14 sets were ultimately chosen for inclusion in this study based on product 174

length, ease of amplification, and presence of variation (Appendix A). 175

Two other markers with smaller effective population sizes (and thus, on average, shorter 176

coalescence times) than autosomal nDNA markers were included in analyses: the mitochondrial 177

protein-coding gene cytochrome b (CYTB) and an intron within the X-linked gene O-linked N-178

acetylglucosamine transferase (OGT). CYTB sequences for the individuals of interest were 179

compiled from a previous study (Giarla et al., 2010). The X-linked intron marker OGT was 180

developed by downloading a subset of single-copy gene sequences from the Ensembl Genome 181

Project (www.ensembl.org) for the Monodelphis domestica X chromosome and sorting by intron 182

size. We identified a ~600 bp intron between exons 8 and 9 of the M. domestica OGT gene and 183

10

designed primers from conserved portions of the flanking exon sequences that aligned across M. 184

domestica and Homo sapiens (Appendix A). 185

186

2.2. Sampling of loci and individuals 187

188

Fifteen nuclear markers (14 anonymous loci and the X-linked intron OGT) and one 189

mitochondrial marker (CYTB) were amplified and sequenced for multiple individuals within 190

Thylamys sponsorius, T. pallidior, and T. venustus, three species that occur in adjacent Andean 191

biomes of northern Argentina, Bolivia, northern Chile, and southern Peru (Fig. 1; Table 1). In a 192

previous analysis of mitochondrial DNA sequence variation, Giarla et al. (2010) observed 193

multiple morphologically undifferentiated mtDNA haplogroups within each of these Andean 194

species. Two haplogroups were identified within T. pallidior and T. sponsorius (designated as 195

―A‖ and ―B‖) and three were identified within T. venustus (designated as ―A‖, ―B‖, and ―C‖). In 196

all but one case (T. sponsorius A, but see Results), mitochondrial haplogroups were 197

monophyletic and well-supported by all nodal metrics (Giarla et al., 2010). Without exception, 198

all haplogroups assigned to the same species are allopatric, although the geographic ranges of 199

haplogroups T. venustus B and T. venustus C are closely juxtaposed near Vallegrande in western 200

Santa Cruz, Bolivia (Giarla et al., 2010). Limited by the availability of high-quality tissue 201

samples, between 4 and 10 individuals were chosen for inclusion in this study from within each 202

haplogroup, for a total of 60 individuals (Table 1). 203

PCR conditions were optimized for each set of primers, resulting in variation among 204

annealing temperatures and reagent concentrations. A typical PCR mixture contained 7.5 μL 205

GoTaq Green Master Mix (Promega Inc.), 0.5 μL of each 10 μM primer solution, and 5.5 μL 206

11

water. A typical thermocycler protocol consisted of 2 min. of initial melting at 95° followed by 207

35 cycles of melting at 95° for 30 sec., annealing at an optimized temperature for 30 sec., and 208

extension at 72° for 1 min., and ending with a single final extension for 7 min. at 72°. 209

Problematic amplifications were re-run using Platinum Taq DNA polymerase (Life Technologies 210

Corp.) with varying Taq concentrations. PCR products were cleaned and sequenced following 211

protocols described above. Chromatograms were compiled and edited in Sequencher 4.7 (Gene 212

Codes Inc.). All sequences used in this study are deposited in Genbank (Accession numbers 213

KF621308 - KF623029; Online Appendix B) 214

215

2.3. Haplotype phasing, sequence alignment, and neutrality tests 216

217

Heterozygous nuclear loci were phased in a three-step process. First, all length-variant 218

heterozygotes were phased using the software Champuru 1.0 (Flot, 2007). Champuru exploits the 219

haplotypic information contained in the overlapping chromatograms of length-variant 220

heterozygotes (Seroussi and Seroussi, 2007; Sorenson and DaCosta, 2011) and parses the 221

haplotypes of any heterozygous single-nucleotide polymorphisms (SNPs) present in the sequence 222

mixture. After running Champuru, remaining unresolved haplotypes were phased using the 223

software package PHASE (Stephens et al., 2001). Input files for PHASE were created using the 224

SeqPhase webserver (Flot, 2010). A ―known haplotype‖ file containing all of the haplotypes 225

resolved by Champuru was included in the PHASE analysis. PHASE was run using the default 226

settings, except the threshold for accepting a haplotype was reduced to 0.7, an appropriate value 227

based on simulation studies (Garrick et al., 2010). Finally, allele-specific sequencing primers 228

were developed for heterozygous sequences that remained unresolved after the first two steps. 229

12

PCR products from the original amplification attempts were re-sequenced with allele-specific 230

primers designed following the recommendations of Scheen et al. (2012). Successfully primed 231

haplotype sequences were added to the ―known haplotype‖ file, and PHASE was re-run with the 232

same settings as before. The four sequences that remained unresolved were removed from all 233

subsequent analyses. 234

Sequences were aligned using Clustal 2.0 (Larkin et al., 2007). Most alignments were 235

trivial and only required minor adjustments by eye. All alignments were tested for recombination 236

using the DSS method (McGuire and Wright, 2000) as implemented in the software package 237

TOPALi v2 (Milne et al., 2009). Neutrality of each locus was tested using Tajima’s D in DnaSP 238

v5 (Librado and Rozas, 2009). 239

240

2.4. Gene trees and “species” trees 241

242

Gene trees sampled across Thylamys pallidior, T. sponsorius, and T. venustus were 243

inferred from fourteen anonymous loci and one X-linked intron. For each locus, nucleotide 244

substitution models were fitted to phased sequence alignments using jModelTest 2.0 (Darriba et 245

al., 2012) and ranked according to the Bayesian Information Criterion (BIC). The best-fitting 246

model was chosen for each dataset, and phylogenetic trees were inferred in Garli 2.0 (Zwickl, 247

2006). All of the default settings in Garli were used, and Garli runs were replicated five times for 248

each locus to ensure consistency. For each locus, we report only the tree that received the highest 249

likelihood. 250

We used BEAST v. 1.7.4 (Drummond et al., 2012) to infer an ultrametric tree of 102 251

Thylamys pallidior, T. sponsorius, and T. venustus CYTB sequences used in a prior study (Giarla 252

13

et al., 2010). In that prior phylogenetic analysis of CYTB, which included all recognized 253

Thylamys species, we found that a partitioning scheme in which each codon position received its 254

own partition was the best fit (Giarla et al., 2010). Here, with a dataset that only included 255

Thylamys pallidior, T. sponsorius, and T. venustus, we used PartitionFinder (Lanfear et al., 2012) 256

to simultaneously evaluate nucleotide substitution models and codon-partitioning schemes. For 257

the BEAST analysis, we assigned the resulting best-fitting model to each partition and selected a 258

lognormal clock model, with the prior on the ucld.mean parameter set to an exponential 259

distribution with a mean equal to 1.0. Default settings were used for all other priors. BEAST was 260

run for 50 million generations, sampling every 50,000, and convergence of parameter estimates 261

was assessed using the program Tracer 1.5 (Rambaut and Drummond, 2007). 262

Before the computational tools to estimate species trees became widely available, 263

conventional multilocus phylogenetic approaches relied on concatenation of disparate genetic 264

loci into one effectively linked locus (Kubatko and Degnan, 2007). Such an approach neglects 265

the independent coalescent history of unlinked genomic regions and can lead to errors in 266

phylogenetic estimation and overconfidence in nodal support metrics (Edwards and Beerli, 2000; 267

Kubatko and Degnan, 2007). To avoid such problems, we estimated a ―species‖ (lineage) tree 268

from the 15 phased nDNA sequence datasets described above using BEAST 1.7.4 (Drummond et 269

al., 2012). Based on our results from individual-gene trees, sequences from two individuals with 270

putative hybrid ancestry were removed from the datasets used to estimate the species tree. We 271

identified these individuals as potential hybrids between T. sponsorius and T. venustus based on 272

the discordance between their positions in the mitochondrial-gene tree and their positions in the 273

nuclear-gene trees (see Results). The *BEAST algorithm (Heled and Drummond, 2010) was 274

implemented, and each sequence was assigned to a putative lineage based on the results from 275

14

analysis of CYTB alone. The same nucleotide substitution model from the Garli analyses were 276

used in BEAST. Each gene tree was tested for rate heterogeneity with likelihood-ratio tests by 277

comparing likelihood scores of trees with and without molecular clock constraints. A strict 278

molecular clock model was used for each locus that did not exhibit significant rate heterogeneity, 279

whereas a lognormal relaxed-clock model was used for each locus that did. All nucleotide 280

substitution and clock models were unlinked across loci, and all clock models were set to 281

―estimate‖ so that the resultant tree would be scaled to substitutions per site. Default settings 282

were used for all priors except for the lognormal clock model’s ucld.mean priors, which were set 283

to exponential distributions with a mean of 1.0. We ran the MCMC chain for 200 million 284

generations, sampling every 20,000th

generation, and assessed mixing in Tracer 1.5. 285

286

2.5. Bayesian “species” delimitation in BPP using nDNA 287

288

―Species‖ limits (lineage membership) based on mtDNA haplogroups were tested with 289

fifteen nuclear loci (mtDNA excluded) using BPP v 2.1 (Yang and Rannala, 2010). The BPP 290

model assumes no exchange of genes between species (Mayr, 1942), that recombination does not 291

exist within sampled loci, that all loci are independent, and that all loci are evolving neutrally. 292

BPP implements a Jukes-Cantor model of nucleotide evolution because all sampled sequences 293

are expected to be closely related. The posterior distribution of two population genetic 294

parameters are sampled: contemporary and ancestral mutation-rate-scaled effective population 295

sizes (θ) and mutation-rate-scaled species divergence times (τ). Priors for all θ parameters and 296

the τ parameter for the root of the species tree are modeled as gamma distributions (α, β), where 297

15

the prior mean = α/β and its variance = α/β2. All other τ parameters in the model were assigned a 298

Dirichlet prior. 299

In the species-delimitation model used by BPP, the variation inherent to the coalescent 300

process is explicitly incorporated, and gene trees are sampled under the constraints of a user-301

defined guide tree. When testing the limits between just two putative species, the guide tree is a 302

trivial two-taxon dichotomy. However, when testing limits between three or more species, the 303

guide tree should be justified based on morphology, geography, or a previously inferred species 304

tree (Yang and Rannala 2010). Unfortunately, mis-specified guide trees can lead to false 305

positives if divergent populations are incorrectly associated as sister taxa (Leaché and Fujita, 306

2010). Given a guide tree, a reversible-jump Markov Chain Monte Carlo (rjMCMC) algorithm 307

moves between different species delimitation models by collapsing and resolving nodes 308

throughout the tree, from a fully resolved model wherein each tip of the guide tree is considered 309

a species, to a fully collapsed model wherein all sequences are considered part of the same 310

species. The rjMCMC chain samples a posterior distribution of speciation probabilities for each 311

split in the pre-specified guide tree. Three separate datasets were initially constructed for species 312

delimitation in BPP: (1) Thylamys pallidior A and B; (2) T. sponsorius A and B; and (3) T. 313

venustus A, B, and C. Sequences were assigned to putative species based on mtDNA 314

haplogroups, and the guide tree for T. venustus was based on the topology of the *BEAST 315

species tree analysis (which also matched our previously studied CYTB topology [Giarla et al., 316

2010: fig. 8]). 317

The rjMCMC algorithm exhibits mixing problems for datasets with a large number of 318

loci and sequences (Yang and Rannala, 2010), as is the case for the three datasets considered 319

here. Poor mixing is manifested by different runs giving substantially different results, or the 320

chain getting ―stuck‖ on the fully resolved or fully collapsed guide tree. During trial runs, these 321

16

behaviors were observed for all three fully phased Thylamys datasets. Considerably better mixing 322

was observed when a smaller, pruned dataset was used. Simulation studies have shown that five 323

sequences per putative population are sufficient for species delimitation when at least 10 loci are 324

used (as in this study), but datasets with even fewer sequences are still successful as long as more 325

loci are included (Yang and Rannala, 2010; Zhang et al., 2011). Our pruned datasets varied in 326

size, but ranged from four sequences (T. pallidior A) to ten sequences (T. pallidior B). one 327

haplotype sequence from each pair of phased sequences for a given diploid individual for all 328

individuals (mostly eliminating duplicate sequences from homozygous individuals). For the X-329

linked intron OGT, males were already represented by one sequence alone, so only sequences 330

from females were pruned by half. Pilot trials using the final pruned datasets reached the same 331

results for different starting trees, suggesting that proper mixing of the rjMCMC algorithm was 332

occurring. 333

334

2.5.1. Effect of priors on species delimitation 335

336

Misspecified priors can have a strong influence on the accuracy of species delimitation 337

(Zhang et al., 2011), so different values for the gamma shape parameters α and β were tested for 338

each dataset. In total, seven different sets of priors were tested (Table 2). The first set of priors 339

considered (Scheme 1: Thylamys-specific) were derived based on Thylamys-specific estimates of 340

means for τ and θ. Jansa et al. (unpublished) inferred a time-scaled phylogeny for Didelphidae 341

based on five protein-coding nuclear genes and five external fossil calibrations; this dated 342

phylogeny included T. pallidior, T. venustus, and two other congeneric species. The average age 343

of the most recent split among these Thylamys species was estimated to be 0.8 Ma, and we 344

17

assume that all splits among the haplogroups analyzed in this report (but not analyzed by Jansa et 345

al.) would have occurred before at least 0.7 Ma. Given this assumption and a nuclear gene 346

mutation rate of ~10-9

substitutions per site (Kumar and Subramanian, 2002), a prior distribution 347

for τ could be approximated with a mean of 0.0007. DNAsp v5 (Librado and Rozas, 2009) was 348

used to identify an initial mean value for Watterson’s estimator of θ (θW; Watterson, 1975) for 349

each of the three final datasets. All three datasets had similar average θW’s across all loci (Table 350

3), and the average across the three datasets is 0.004. Diffuse gamma priors based on these 351

estimates of θ and τ were θ ~ (2, 500) and τ ~ (2, 3000). Because the gamma distribution for 352

the θ prior is based on the same data as used in the BPP runs themselves, Scheme 1 is not a 353

strictly Bayesian approach to fixing priors. To deal with potential prior misspecification, 354

independent prior schemes were considered. Similar to an approach suggested by Leaché and 355

Fujita (2010), six additional sets of priors were tested, in which three different divergence depths 356

(~0.1 Ma, 1.0 Ma, and 10.0 Ma, assuming a nuclear gene mutation rate of ~10-9

) and two 357

different effective population sizes (large, θ ~ 0.1; and small, θ ~ 0.001) were chosen (Table 2). 358

In order to ensure convergence and proper mixing of the rjMCMC algorithm, a total of 359

four BPP runs (together referred to as a ―set‖) were initialized for each of the three datasets by 360

varying the starting tree (fully resolved or collapsed) and the rjMCMC algorithm (algorithm 0 or 361

1 from Yang and Rannala, 2010). During pilot trials, the best mixing was observed for algorithm 362

0 with fine-tune parameter ε = 20 and for algorithm 1 with fine-tune parameters α = 2 and m = 363

0.5. As such, those tuning values were implemented in the corresponding final runs. All runs 364

used the cleandata=1 setting to eliminate gaps; inter-locus rate heterogeneity was modeled using 365

a Dirichlet distribution D(α), where α = 10; and the heredity scalar for the X-linked OGT gene 366

18

was set to 0.75. Each run consisted of a burn-in period of 50,000 steps and a sampling period of 367

500,000 steps (logged every 5), for a total of 100,000 samples. 368

369

2.5.2. Putative hybrids and confirmation of taxonomic species 370

371

Sequences from two individuals assigned to Thylamys venustus C (vouchered by OMNH 372

29966 and MSB 67392; Table 1) exhibited strikingly discordant signals between nuclear and 373

mitochondrial trees. For nearly all of the nuclear trees, one or both of the alleles from these 374

individuals sort with T. sponsorius sequences. Such strong and consistent cytonuclear conflict 375

suggests introgression as opposed to incomplete lineage sorting. Consistent with this 376

interpretation, both specimens are from a region of known geographic range overlap between T. 377

sponsorius and T. venustus, and one of them (OMNH 29966) was previously recognized as a 378

phenotypic intermediate based on craniodental measurements (Giarla et al., 2010: fig. 18). 379

Because these putative hybrids likely represent gene flow between sympatric T. sponsorius and 380

T. venustus, and not gene flow between allopatric haplogroups within T. venustus or T. 381

sponsorius, these individuals were removed from all of the species delimitation analyses that 382

concerned differentiation of haplogroups. To assess the taxonomic distinctiveness of T. 383

sponsorius and T. venustus, a final set of BPP runs was completed on a dataset that grouped 384

together haplogroups within these morphologically diagnosable species and included sequences 385

from the putative hybrids. 386

Although the model assumes no gene flow between putative lineages, simulation studies 387

have found that low levels of gene flow do not hinder species delimitation using BPP (Zhang et 388

al., 2011; Camargo et al., 2012). A large dataset that included all individuals (including putative 389

19

hybrids) and all loci for T. sponsorius and T. venustus exhibited inconsistent results. To attain 390

better mixing, four sequences from each haplogroup were randomly chosen for inclusion in a 391

pruned dataset, along with the sequences from the two putative hybrids. Following the same 392

procedure as described above, seven different prior schemes were tested to account for different 393

depths of divergence and effective population size. In order to ensure consistent results, four 394

rjMCMC runs were initialized for each of the prior schemes following the same procedure as for 395

the haplogroup tests described in the previous section. 396

397

2.5.3. Impact of number of loci on species delimitation 398

399

In order to determine what effect the number of loci used in each BPP run might have on 400

its ability to delimit independent evolutionary units, multilocus power analyses were completed 401

(Roe et al., 2010). For each of the three taxonomic species, loci were randomly removed and 402

BPP was re-run using rjMCMC algorithm 1 (with fine tune parameters α = 2 and m = 0.5), Prior 403

Scheme 1, the same run parameters as the species delimitation tests described above, and a 404

randomly collapsed or resolved guide tree. Six dataset size classes were considered: 15 loci, 12 405

loci, 8 loci, 4 loci, 2 loci, and 1 locus. Within the 12-, 8-, 4-, and 2-locus size classes, four 406

randomly rarefied datasets were constructed as replicates. For the single-locus datasets, all 15 407

loci were considered separately as single-locus replicates. For the complete, 15-locus dataset, the 408

same dataset was used for four replicated BPP runs. The number of sequences per locus was the 409

same across all trials within a given taxonomic species. 410

411

2.5.4. Assessment of mixing 412

20

413

For some datasets it can be difficult to distinguish between a run with poor mixing that is 414

―stuck‖ on a fully resolved species model, versus a run in which the posterior probability of a 415

given split in the tree is so high that the algorithm should not be expected to visit the collapsed 416

trees at an appreciable rate. To further verify that the rjMCMC algorithm was working 417

effectively, nuclear sequences were randomly assigned to putative species within a given dataset, 418

and BPP trials were repeated with Prior Scheme 1. Under this tip-randomization scheme, it is 419

expected that a one-species model (where all of the nodes in the guide tree are collapsed) will be 420

favored over a multi-species model. If the rjMCMC algorithm were to get stuck on a fully 421

resolved guide tree despite randomly assigned sequences, the algorithm could not be expected to 422

display appropriate mixing on the non-randomized datasets. However, if the algorithm is able to 423

move between different levels of guide tree resolution and consistently results in a one-species 424

model (i.e., the expectation for a random mixture of sequences sampled from two reproductively 425

isolated species), the algorithm is behaving properly. 426

427

2.6. Species delimitation in SpeDeSTEM 428

429

SpeDeSTEM (Ence and Carstens, 2011) is a computationally efficient approach to 430

species validation based on the STEM method of species-tree estimation (Kubatko et al., 2009). 431

Sequences are first assigned to putative species, then gene trees are estimated in PAUP* 432

(Swofford, 2002) and input files for STEM are generated for each possible permutation of 433

lineage grouping. SpeDeSTEM extracts the log-likelihood scores from STEM for each 434

permutation and ranks the models according to AIC score. This approach relies on the 435

21

assumption that gene trees are estimated accurately and that the alignment for each locus does 436

not deviate from a molecular clock model. After testing clock models for BEAST (described 437

above), we removed two loci (Anon72 and Anon122) for which we were able to reject a 438

molecular clock model and prepared input files for SpeDeSTEM for the remaining 13 loci. 439

STEM requires an estimate of in order to scale the branch lengths in the species trees it 440

produces. Using DNAsp v5 (Librado and Rozas, 2009), we computed the average across the 13 441

included loci (Table 3), but we ultimately tested different values from 0.01 to 0.05 (in 442

increments of 0.01) in SpeDeSTEM due to numerical issues associated with calculating 443

likelihoods with small values. We ran SpeDeSTEM for 100 replicates, with each replicate 444

including 4 randomly subsampled alleles per putative taxon as suggested by Hird et al. (2010), 445

and tested all 20 possible permutations for haplogroup clustering within taxonomic species. 446

447

3. Results 448

449

3.1. Gene tree and species tree results 450

451

Of the 14 anonymous nDNA markers developed for this study, 11 could be mapped to 452

Monodelphis domestica autosomes (Table 3). The remaining three markers that could not be 453

mapped to the M. domestica genome also did not receive any strong hits to the non-redundant 454

―nr‖ nucleotide BLAST database, suggesting that these genomic regions do not contain protein-455

coding genes. Neutrality tests across all loci using Tajima’s D statistic revealed that none of the 456

loci exhibited significant signs of selection (Table 3), and tests for recombination using the DSS 457

method revealed that only the alignment for Anon-94 exhibited signs of a recombination 458

22

breakpoint. The shorter of the two regions on either side of the breakpoint in Anon-94 was 459

removed from the alignment and from all subsequent analyses. Measured across all sampled 460

sequences and species, θW ranges from 0.003 (the X-linked intron OGT) to 0.021 (Anon-101), 461

and the average over all of the loci is 0.009. 462

For the BEAST analysis of CYTB, the best data partitioning scheme included a separate 463

partition and substitution model for each codon position (Position 1: TrNef+I; Position 2: 464

HKY+I; Position 3: TrN). The ultrametric CYTB tree (Fig. 2) illustrates the relative divergence 465

times of each haplogroup. Thylamys sponsorius haplogroups diverged most recently, whereas the 466

deepest splits within T. venustus and T. pallidior appear to have occurred in the more distant 467

past. Each of the 15 nDNA locus alignments was analyzed separately, and the resultant 468

phylogenetic trees exhibit a wide range of relative substitution rates and topologies (Online 469

Appendix C). Notably, the haplogroup designations apparent in the CYTB tree (Fig. 2) are not 470

consistently resolved in many of the gene trees. In fact, for some of the nuclear loci, the 471

taxonomic species T. pallidior, T. sponsorius, and T. venustus do not sort into monophyletic 472

groups. Two individuals initially identified as Thylamys venustus C based on mtDNA sequences 473

and morphology (tissue nos. Arg 1108 and NK 23992 in Table 1; collecting localities marked 474

with asterisks on Fig. 1) have nuclear sequences that cluster with T. sponsorius, providing 475

evidence for limited introgression between T. sponsorius and T. venustus. Sequences from these 476

individuals were removed from BPP analyses comparing haplogroups within taxonomic species 477

but were retained in the analysis that lumped haplogroups in order to validate the species status 478

of T. venustus versus T. sponsorius. 479

Results from the species-tree analysis of nuclear data in *BEAST (Fig. 3) support the 480

same tree topology as the CYTB tree alone (Fig. 2), with posterior probabilities ≥0.95 at all 481

23

nodes. For Thylamys venustus, haplogroups B and C together formed a clade to the exclusion of 482

haplogroup A, and we used this topology for the T. venustus guide tree in BPP. 483

484

3.2. Species Delimitation using BPP with Fifteen Nuclear Loci 485

486

3.2.1. Performance of the rjMCMC algorithm 487

488

As expected, BPP runs in which sequences were randomly assigned to haplogroups 489

within a taxonomic species never recovered significant support for the defined haplogroups 490

(Table 4), indicating that the dataset size and parameters used were tractable for BPP’s rjMCMC 491

algorithm, at least for Prior Scheme 1. In total, 35 sets of analyses on the non-randomized 492

datasets were conducted (Table 4), with each set comprising 4 replicates that varied by rjMCMC 493

algorithm (0 or 1 from Yang and Rannala, 2010) and collapsed/resolved starting tree. We 494

considered a set inconsistent—and thus mixing poorly—if at least one run strongly supported a 495

different species delimitation scenario from the other runs. Of the 35 sets, three sets using Prior 496

Scheme 6 (―Deep-Large‖) exhibited inconsistent results across replicates (Table 4). The 497

remaining 32 sets converged on similar results across replicates, which indicates efficient mixing 498

of the rjMCMC chain. 499

500

3.2.2. The effect of priors on species delimitation 501

502

The effect of applying different prior schemes on species delimitation could be studied 503

for the 32 sets of analyses that exhibited proper mixing. For Thylamys pallidior and T. 504

24

sponsorius, both sets of haplogroups receive strong support as distinct species, irrespective of 505

prior scheme. Within T. venustus, the sets that exhibited proper mixing consistently reject the 506

model that splits haplogroups A, B, and C into three distinct species. For the deeper split within 507

T. venustus—between haplogroup A and the clade that unites haplogroups B+C—the choice of 508

prior dramatically affects the results. Results based on Prior Schemes 1, 2, and 3 agree that A 509

should be distinct from B+C, but results based on the remaining prior schemes support uniting 510

all of T. venustus haplogroups into one species. In order to confirm that T. sponsorius and T. 511

venustus were, in fact, genetically isolated taxa, separate analyses were conducted on a dataset 512

that pooled all of the haplogroups within each of these taxonomic species. Unlike the previous 513

tests involving T. sponsorius and T. venustus haplogroups, sequences from two individuals with 514

suspected hybrid ancestry were not removed before running BPP. Across all of the prior 515

schemes, BPP consistently resolved T. sponsorius and T. venustus as distinct species at the 516

highest level of support (Table 4). 517

518

3.2.3. Multilocus power analysis 519

520

The effects of randomly removing loci from BPP analyses varied among the three 521

taxonomic species (Fig. 4). At one extreme, the Thylamys sponsorius A and T. sponsorius B 522

were consistently differentiated with only two loci, and seven single-locus replicates received 523

posterior probabilities (PP) for species delimitation >0.95 (Fig. 4b). The distinction between T. 524

pallidior A and T. pallidior B received consistently high support (1.0 PP) with eight or more 525

loci, but two four-locus replicates recovered these groups with >0.95 PP (Fig. 4a). The 526

differentiation of T. venustus haplogroup A from B+C received consistent support (≥0.95 PP) 527

25

with eight loci (Fig. 4c), but support varied (between 0.78 and 1.0 PP) for replicates of 12 loci; 528

for 15 loci, the average support for this split was 0.99 PP. Across all trials, support for 529

differentiation between T. venustus B and T. venustus C never exceeded 0.80 PP, and the 530

maximum number of loci (15) supports differentiation between these two lineages with only 0.59 531

PP (Fig. 4d). Across all species, the single-locus replicates were especially variable, with some 532

single-locus replicates accurately delimiting species with posterior probabilities above 0.95 but 533

most coming nowhere near this threshold. 534

535

3.3. Species Limit Validation using SpeDeSTEM 536

537

When we used Watterson’s estimator of genetic diversity across all species and 538

haplogroups (0.009) as the estimate for in SpeDeSTEM, the program would not work properly, 539

presumably due to computational problems associated with calculating likelihoods when is 540

small (described in the STEM User Manual). Results from our SpeDeSTEM analysis of the 13 541

nDNA loci that fit the assumption of a molecular clock do not support our hypothesis that all of 542

the mtDNA haplogroups within Thylamys pallidior, T. sponsorius, and T. venustus should be 543

considered separate species (Table 5; only results from the analysis in which was set to 0.05 544

are shown). Instead, the model that receives the highest support (52% of the model weighting) 545

unites the haplogroups within T. pallidior and T. sponsorius, and separates out T. venustus B 546

from T. venustus A+C. The latter result contrasts with both our mtDNA topology and species 547

tree analysis, in which T. venustus A is sister to haplogroups B+C. The next two best-supported 548

models (each receiving about 20% of the overall model weighting) are similar to the top choice, 549

but either T. pallidior A and B or T. sponsorius A and B are considered distinct in the models. 550

26

The model that receives the lowest weighting is the one in which all of the haplogroups are 551

considered distinct, and the model supported by BPP is ranked 19th

of the 20 possibilities. 552

553

4. Discussion 554

555

The main goal of this study was to test the evolutionary independence (genetic isolation) 556

of mitochondrial lineages within three montane Thylamys species that were identified in a 557

previous phylogenetic study of this genus (Giarla et al., 2010). To do so, we analyzed 15 nuclear 558

DNA markers using two commonly used coalescent-based approaches: BPP (Yang and Rannala, 559

2010) and SpeDeSTEM (Ence and Carstens, 2011). This study design facilitates an empirical 560

evaluation of analytical approaches to ―species‖ delimitation, and we consider this topic first 561

before addressing the biological and evolutionary implications of our results. 562

563

4.1. Comparison of “species delimitation” methods 564

565

Results from BPP and SpeDeSTEM support different conclusions regarding the number 566

of evolutionarily independent units within the species studied. Whereas BPP supports the 567

recognition of six independent lineages (two within each of the species), SpeDeSTEM supports 568

the recognition of only four (one each for T. pallidior and T. sponsorius plus two within T. 569

venustus). Moreover, the recognition of lineages within T. venustus differs between the two 570

programs: BPP supports the basal split between the mitochondrial lineages A versus B+C, 571

whereas SpeDeSTEM supports a split between B and A+C that was not apparent in analysis of 572

the mitochondrial data or the nDNA species tree analysis. The difference between the lineage-573

27

splitting models supported by the two analyses is significant: the top four models in the 574

SpeDeSTEM analysis encompass 100% of the total model probability, but the model supported 575

by BPP is ranked 19th

out of 20 possible models. Such a large difference suggests that 576

assumptions inherent in the two different approaches to lineage recognition are strongly 577

influencing our results. 578

One significant issue with BPP (or any other Bayesian) analysis concerns the appropriate 579

specification of priors (Zhang et al., 2011). In our case, different priors affect both the 580

probability of delimiting multiple species (Table 4) and the performance of the algorithm (i.e., 581

consistency of rjMCMC results across multiple runs). One reasonable approach to prior 582

specification suggested by Leaché and Fujita (2010) is to analyze data under a range of different 583

prior schemes that encompass multiple speciation scenarios. Although this approach is far 584

superior to assuming a single set of priors, results should still be interpreted carefully. Leaché 585

and Fujita (2010) imply that setting priors for large population size and recent divergence times 586

will be the ―most difficult‖ scenario in which to test species delimitation, but this is not 587

necessarily true. Any prior that is not concordant with a strong signal from the data has the 588

potential to cause problematic results. In particular, an incorrect prior in conjunction with weak 589

signal in the data can yield results strongly biased by the prior (Zhang et al., 2011). 590

For our data, a prior scheme that coupled deep divergence times with a large population 591

size (Scheme 6) seemed especially problematic. For three of our five cases, analyses under this 592

set of priors failed to exhibit adequate mixing of the rjMCMC chain, so the results are unreliable 593

(Table 4). Setting this apparently unreasonable scenario aside, posterior probability estimates of 594

lineage independence (or lack thereof) were reassuringly consistent across a wide range of other 595

prior schemes for most of our cases. The single exception concerns whether to recognize one or 596

28

two independent lineages within T. venustus. The decision here rests on which prior scheme is 597

more reasonable for these data: one that assumes a recent (~0.1 Ma) basal divergence within the 598

taxonomic species versus ones that assume moderate (~1.0 Ma) or deep (~10.0 Ma) divergence, 599

irrespective of effective population size (Table 4). 600

In an attempt to evaluate this question, we estimated priors for our sample by using 601

existing data. To establish a prior for the splitting-time parameter , we used a fossil-calibrated 602

phylogeny of Didelphidae based on five independent nuclear protein-coding genes (Jansa et al., 603

unpublished). For θ, we used the average value of θW from the 15 nuclear loci used in this study. 604

This ―Thylamys-specific‖ prior exhibited good chain mixing in all cases, and recovered 605

consistent posterior probability values for the four cases that were not sensitive to prior 606

specification (Table 4). In the one case where prior specification affected our results (whether to 607

recognize one vs. two lineages of T. venustus), posterior probability estimates derived under this 608

prior scheme were consistent with those derived under prior schemes that assumed a recent 609

divergence time, but differed from those assuming moderate or deep divergences. In this case, 610

the population-size prior seemed to exert relatively little influence, as posterior probability 611

estimates for the two shallow-divergence schemes (large vs. small population size, schemes 2 612

and 3, respectively) were similar. Although using such empirically derived parameter estimates 613

for construction of priors might not be possible for some systems, this approach can help 614

distinguish among reasonable and unreasonable prior schemes. In our case, the deeper 615

divergence priors apparently conflict with the signal in the data, a behavior that was also 616

observed by Zhang et al. (2011) in their simulation study. 617

For SpeDeSTEM, a significant analytical issue concerns the reliability of the method for 618

inferring very recent lineage divergence. Two recent simulation studies testing the efficacy of 619

29

SpeDeSTEM have addressed this issue. Ence and Carstens (2011) found that the method was 620

effective at delimiting species that have diverged as recently as 0.25Ne generations ago, as long 621

as more than 10 loci were used. Camargo et al. (2012), however, found that SpeDeSTEM 622

performed quite poorly in scenarios in which putative species diverged less than 0.75Ne 623

generations ago, even when 10 loci were used. They note that the discrepancy between their 624

simulations and Ence and Carsten’s (2011) may be due to the relative amount of genetic 625

diversity (as approximated by ) in the gene sequences, with low-information alignments leading 626

to increased uncertainty in gene tree estimation (Camargo et al., 2012). Our estimates of and 627

(0.004 and 0.0007, respectively) suggest that all of the putative divergences considered here 628

occurred approximately 0.175Ne generations ago, putting these Thylamys species well below the 629

range in which SpeDeSTEM has been demonstrated to perform acceptably. Moreover, Camargo 630

et al. (2012) found that BPP performed very well across a range of different simulated datasets, 631

even when putative species were modeled to have diverged from one another only very recently. 632

As such, we think it is appropriate to rely on the BPP results over the SpeDeSTEM results, and 633

this study further supports the conclusions about the efficacy of SpeDeSTEM for recent 634

divergences noted by Camargo et al. (2012). It is likely that error in estimation of the gene trees 635

is high enough to obscure the signal of recent divergence and that it is necessary to incorporate 636

this error directly into species validation methods (as done in BPP) in order to recover the signal. 637

638

4.2. Lineage recognition using mitochondrial vs. nuclear loci 639

640

Our multilocus analysis clearly indicates that six of the seven mitochondrial haplotype 641

groups we tested correspond to genetically isolated lineages. The concordance between 642

30

mitochondrial and nuclear-gene analyses in these six cases provides additional evidence that the 643

inferred mitochondrial structure reflects true lineage splits rather than stochastic lineage sorting. 644

However, the two most recently formed mitochondrial haplogroups within T. venustus—even 645

though they are each strongly supported and exhibit relatively high sequence divergence—646

appear simply to reflect stochastic processes. It would be difficult to predict this latter result from 647

our analysis of the mitochondrial data alone. For example, the most recent split in our 648

mitochondrial phylogeny defines the two haplogroups within T. sponsorius (Fig. 2). If the timing 649

of this split reflects the relative recency of population splitting, then we might expect nuclear loci 650

to have had insufficient time to coalesce and define these two lineages. Perhaps surprisingly 651

then, analyses of the nuclear gene data provide very robust support for recognition of these two 652

lineages: the 15-locus BPP analysis supports two lineages of T. sponsorius across the full range 653

of prior schemes (Table 4), and our multilocus power analysis suggests that these two lineages 654

could be identified with datasets comprising as few as two nuclear loci (Figure 4). Conversely, 655

the split between the mitochondrial lineages B and C of T. venustus apparently predates the split 656

within T. sponsorius (Fig. 2), implying a higher probability that nuclear genes would have 657

coalesced for these lineages. Nevertheless, none of the BPP analyses supports this split, even 658

when 15 loci are included. Thus, although it has been argued that mitochondrial data—due to its 659

relatively small effective population size and rapid coalescence time—is a leading marker of 660

within-species genetic structure (Zink and Barrowclough, 2008), our results suggest that it is 661

impossible to distinguish population divergence from stochastic lineage sorting based solely on 662

analyses of mitochondrial loci. 663

Nuclear data are also critically important for addressing the relative timing of lineage 664

splitting events. Notably, for montane species such as these, coincident splits across multiple 665

31

species could reflect a synchronized response to a single or very closely spaced geological or 666

climatic events (Arbogast and Kenagy, 2001; Hickerson et al., 2006; McCulloch et al., 2010; 667

Bell et al., 2012). Point estimates of relative divergence times derived from our mitochondrial 668

DNA tree suggest that the basal divergence events within each of the three taxonomic species 669

were asynchronous, with simultaneous splits in T. pallidior and T. venustus preceding the split 670

within T. sponsorius by a factor of two (Fig. 2). However, the error surrounding these single-671

locus estimates is so large that the possibility of synchronous splitting cannot be rejected (Fig. 5). 672

In contrast, mean estimates of relative lineage divergence time from the 15-locus nuclear gene 673

dataset are all more recent than those derived from the single-locus mitochondrial dataset; 674

moreover, the precision of these nuclear gene estimates appears to be far higher. The error 675

estimates surrounding basal lineage splitting times derived from the 15-locus nuclear dataset are 676

at least five times smaller than the error surrounding estimates derived from the single 677

mitochondrial locus. With these reduced error estimates in hand, rather than failing to reject 678

synchronous splitting, our confidence in the conclusion that these splits were either synchronous 679

or very closely spaced in time is markedly improved. 680

681

4.3. A large number of loci might not be needed to identify cryptic lineages 682

683

At the shallow level of divergence considered here, individual gene trees based on a 684

single nuclear locus are not well resolved and reveal little about potential species limits (Online 685

Appendix C), a result that is in accordance with the expectations of coalescent theory for recently 686

diverged species (Hudson and Coyne, 2002). When considered in combination and as part of a 687

coalescent framework, nuclear genes have much greater power to detect phylogenetic units; by 688

32

integrating over phylogenetic error, the BPP algorithm leverages this power to weigh alternative 689

species delimitation scenarios (Yang and Rannala, 2010). Coalescent-based species delimitation 690

methods—including BPP and SpeDeSTEM—may offer an objective, quantitative method by 691

which to weigh alternative hypotheses of lineage independence (Fujita et al., 2012). However, 692

one potential drawback to these methods is that putative species boundaries and guide trees must 693

be defined in advance, and incorrect assignments may lead to erroneous conclusions. Given these 694

issues, it is evident that preliminary studies using mitochondrial sequences or morphology are 695

still important steps in detecting cryptic diversity. In addition, other multilocus methods for 696

species delimitation that do not rely on a priori species assignments are available to facilitate 697

lineage recognition in potentially cryptic taxa (Pons et al., 2006; O'Meara, 2010; Reid and 698

Carstens, 2012; Rittmeyer and Austin, 2012). Thus, whereas morphology might not detect 699

recently diverged lineages, and mitochondrial sequences might over-split species, multi-locus 700

nuclear data and associated analytical approaches offer a powerful way to interrogate cryptic 701

diversity. 702

With this in mind, the question of how many independent loci—and how many alleles for 703

a given locus—to sample become important analytical concerns (Pluzhnikov and Donnelly, 704

1996; Felsenstein, 2006; Carling and Brumfield, 2007; Brito and Edwards, 2008; Dupuis et al., 705

2012). With massively parallel sequencing platforms becoming more widely available and 706

cheaper to use, it is easier than ever before to generate large quantities of sequence data. A 707

fundamental question remains, however: is it really necessary to use thousands of loci to delimit 708

species when traditional approaches may be cheaper (at least for the time being) and just as 709

effective? Also, if loci are limiting, and incorporating gene tree uncertainty is critical, next-710

generation sequencing data (which might have only one or two SNPs per locus, McCormack et 711

33

al., 2013) may not be the best choice given current analytical tools. Our results indicate that BPP 712

can consistently detect lineage distinctiveness with sequences from as few as two nuclear loci. 713

However, this varied widely, and in some cases more loci were needed. Overall, this finding 714

concurs with simulation studies. Zhang et al. (2011) found that a single locus might be enough to 715

identify some recent speciation events, as long as a large number (ca. 15+) of sequences per 716

putative species are used. Camargo et al. (2012) concurred, and further found that BPP 717

performed better than other coalescent-based methods like SpeDeSTEM in situations with post-718

divergence gene flow and recent splits. SpeDeSTEM, with its reliance on point-estimates for 719

gene trees, would likely perform better with more information-rich alignments (i.e., higher 720

values) because they would garner more robustly supported gene trees. Regardless, our data 721

suggest that a moderate number (rather than thousands) of loci can provide relevant information 722

for identification of cryptic lineages, even when divergence is relatively recent. To save costs, 723

we recommend attempting species delimitation in BPP using a moderate number of loci (e.g., 724

four to eight) before investing in more intensive sequencing efforts. 725

726

4.4. Introgression detected between two species 727

728

Thylamys sponsorius and T. venustus are found in similar habitats on the eastern Andean 729

slopes and foothills of Bolivia and northern Argentina, and they are known to occur 730

sympatrically in at least one sampling area (Giarla et al., 2010). Despite deep divergence in the 731

mtDNA tree, these species are morphologically similar, with no single phenotypic character 732

consistently distinguishing them. Previously, a multivariate morphometric analysis of 733

craniodental characters demonstrated that samples of sequenced specimens could be 734

34

distinguished by linear combinations of measurement variables; these factors were then used to 735

assign taxonomic names to molecular clades by computing scores of unsequenced holotypes 736

(Giarla et al 2010). Two sequenced specimens, however, could not be convincingly assigned to 737

species based on morphometrics alone (Giarla et al., 2010), hinting at the possibility that they 738

might be hybrids. Genetic data reported here provide some support for that possibility. One of 739

these morphometric intermediates (Arg 1108) is also one of the two individuals that clusters with 740

the T. venustus C clade in the CYTB tree but with T. sponsorius in the nDNA gene trees (Online 741

Appendix C). In contrast, NK 23992 is morphometrically assigned to T. venustus with high 742

confidence, but exhibits just as much cytonuclear discordance as Arg 1108. Because Arg 1108 743

exhibits both morphometric intermediacy and strong cytonuclear discordance, we feel confident 744

in concluding that it comes from a population that has experienced introgression. Although NK 745

23992 is confidently assigned to T. venustus based on morphology alone, the strong degree of 746

discord between the CYTB results and the nDNA results also support the notion that T. 747

sponsorius and T. venustus have exchanged genes. 748

A similar pattern could be observed if both populations shared ancestral polymorphisms, 749

but this explanation seems unlikely given the high degree of concordance observed across all of 750

the independent nuclear loci we sequenced. If incomplete lineage sorting were the source of 751

cytonuclear discordance, we would expect less consistent sorting across loci. Given our results, it 752

seems likely that the mitochondrial genome of T. venustus C has introgressed through at least 753

part of the T. sponsorius population. Before further conclusions can be drawn, more focused 754

sampling at putative contact zones in northern Argentina and additional genetic studies are 755

needed. Nonetheless, this is the first reported case of introgression between species in 756

Didelphidae. 757

35

758

4.5. Conclusions: Are these cryptic lineages really species? 759

760

Although BPP and other coalescence-based approaches are often described as methods 761

for ―species delimitation‖ or ―species validation,‖ they effectively test the hypothesis that 762

populations (or metapopulations) of related organisms have ceased to exchange migrants and are 763

therefore isolated genetically (Yang and Rannala, 2010; Ence and Carstens, 2011). Genetic 764

isolation, however, is a necessary but not sufficient property for species recognition because 765

gene flow can be prevented by extrinsic factors such as geographic separation (allopatry), rather 766

than by intrinsic factors that maintain lineage separation. As Yang and Rannala (2010: 9269) 767

remarked, their method is expected to be maximally useful for identifying cryptic species that 768

occur in sympatry (where genetic isolation is most likely due to intrinsic mechanisms), and that 769

species delimitation ―should rely on many [other] kinds of data, such as morphological, 770

behavioral, and geographic evidence.‖ 771

All of the mtDNA haplotypes confirmed as genetically isolated lineages in this report are 772

allopatric, and inference about their status as full species is correspondingly problematic. 773

Because speciation is a process, the discovery of lineages in various stages of divergence is to be 774

expected. Although such discoveries pose challenges for taxonomic interpretation, they are also 775

opportunities for productive research on the sequence of evolutionary intermediates associated 776

with speciation in different plant and animal clades. 777

Names are available for all of the distinct lineages of Thylamys identified by our BPP 778

results. As we have previously discussed (Giarla et al., 2010), the name venustus is based on a 779

type specimen collected within the known geographic range of the lineage we designate as T. 780

36

venustus A, whereas the type of cinderella (currently regarded as a junior synonym of venustus), 781

was collected within the combined geographic ranges of T. venustus B and C. Although sequence 782

data from topotypic specimens would be preferable, taxonomic usage could plausibly be based 783

on these map criteria. Similarly (see Giarla et al. [2010] for type localities), the name janetta 784

could be used for T. sponsorius A, the name sponsorius could be restricted for T. sponsorius B, 785

the name pallidior could be restricted for T. pallidior A, and the name fenestrae could be used 786

for T. pallidior B. 787

For the reasons explained earlier, we do not think that a revised binomial nomenclature 788

based on such assignments is warranted until the hypothesis that these lineages are full species is 789

better corroborated (sensu De Queiroz, 2007). Logical extensions of our results that might 790

provide such corroboration could consist of ecological niche modeling to explore the possibility 791

that each lineage is associated with a distinct range of environmental conditions (as in cryptic 792

salamanders; Rissler and Apodaca, 2007), or more sophisticated phenotypic analyses that might 793

uncover subtle divergence in morphology overlooked in previous revisionary work (as by 794

Berendzen et al., 2009). The latter would be particularly welcome as a basis for more securely 795

associating type material with taxa than the geographic criteria suggested above. Lastly, fresh 796

collections along Andean transects carefully sited to intersect contact zones (if any) between 797

sister lineages could provide crucial evidence for genetic isolation in sympatry or parapatry. 798

Absent such corroboration, taxonomic restraint seems only prudent. 799

800

References 801

802

Arbogast B.S., Kenagy G.J., 2001. Comparative phylogeography as an integrative approach to 803

37

historical biogeography. J. Biogeogr. 28, 819–825. 804

Ballard J.W.O., Whitlock M.C., 2004. The incomplete natural history of mitochondria. Mol. 805

Ecol. 13, 729–744. 806

Beheregaray L.B., Caccone A., 2007. Cryptic biodiversity in a changing world. J. Biol. 6, 9. 807

Belfiore N., 2011. Developing nuclear sequences for species tree estimation in nonmodel 808

organisms: Insights from a case study of Bottae's pocket gopher, Thomomys bottae, in: 809

Knowles L.L., Kubatko L. (Eds.), Estimating Species Trees: Practical and Theoretical 810

Aspects. John Wiley & Sons, Inc., Hoboken, NJ, pp. 175–191. 811

Bell R.C., MacKenzie J.B., Hickerson M.J., Chavarría K.L., Cunningham M., Williams S.E., 812

Moritz C., 2012. Comparative multi-locus phylogeography confirms multiple vicariance 813

events in co-distributed rainforest frogs. Proc. R. Soc. Lond. B. 279, 991–999. 814

Berendzen P.B., Olson W.M., Barron S.M., 2009. The utility of molecular hypotheses for 815

uncovering morphological diversity in the Notropis rubellus species complex 816

(Cypriniformes: Cyprinidae). Copeia. 2009, 661–673. 817

Bickford D., Lohman D.J., Sodhi N.S., Ng P.K.L., Meier R., Winker K., Ingram K.K., Das I., 818

2007. Cryptic species as a window on diversity and conservation. Trends Ecol. Evol. 22, 819

148–155. 820

Brito P.H., Edwards S.V., 2008. Multilocus phylogeography and phylogenetics using sequence-821

based markers. Genetica. 135, 439–455. 822

Brumfield R.T., Edwards S.V., 2007. Evolution into and out of the Andes: a Bayesian analysis of 823

historical diversification in Thamnophilus antshrikes. Evolution. 61, 346–367. 824

Camargo A., Morando M., Avila L.J., Sites J.W. Jr, 2012. Species delimitation with ABC and 825

other coalescent-based methods: A test of accuracy with simulations and an empirical 826

38

example with lizards of the Liolaemus darwinii complex (Squamata: Liolaemidae). 827

Evolution. 66, 2834–2849. 828

Carling M.D., Brumfield R.T., 2007. Gene sampling strategies for multi-locus population 829

estimates of genetic diversity (θ). PLoS ONE. 2, e160. 830

Darriba D., Taboada G.L., Doallo R., Posada D., 2012. jModelTest 2: More models, new 831

heuristics and parallel computing. Nat. Methods. 9, 772. 832

De Queiroz K., 2007. Species concepts and species delimitation. Syst. Biol. 56, 879–886. 833

Degnan J.H., Rosenberg N.A., 2009. Gene tree discordance, phylogenetic inference and the 834

multispecies coalescent. Trends Ecol. Evol. 24, 332–340. 835

Drummond A.J., Ashton B., Buxton S., Cheung M., Cooper A., Heled J., Kearse M., Moir R., 836

Stones-Havas S., Sturrock S., Thierer T., Wilson A. Geneious v5.5.1. 2010. Available from 837

http://www.geneious.com. 838

Drummond A.J., Suchard M.A., Xie D., Rambaut A., 2012. Bayesian phylogenetics with 839

BEAUti and the BEAST 1.7. Mol. Biol. Evol. 29, 1969–1973. 840

Dupuis J.R., Roe A.D., Sperling F.A.H., 2012. Multi-locus species delimitation in closely related 841

animals and fungi: One marker is not enough. Mol. Ecol. 21, 4422–4436. 842

Edwards S.V., Beerli P. 2000. Perspective: gene divergence, population divergence, and the 843

variance in coalescence time in phylogeographic studies. Evolution. 54:1839–1854. 844

Edwards S.V., Bensch S., 2009. Looking forwards or looking backwards in avian 845

phylogeography? A comment on Zink and Barrowclough 2008. Mol. Ecol. 18, 2930–2933. 846

Ence D.D., Carstens B.C., 2011. SpedeSTEM: A rapid and accurate method for species 847

delimitation. Mol. Ecol. Resour. 11, 473–480. 848

Felsenstein J., 2006. Accuracy of coalescent likelihood estimates: do we need more sites, more 849

39

sequences, or more loci? Mol. Biol. Evol. 23, 691–700. 850

Flot J.F., 2007. Champuru 1.0: a computer software for unraveling mixtures of two DNA 851

sequences of unequal lengths. Mol. Ecol. Notes. 7, 974–977. 852

Flot J.F., 2010. SeqPHASE: A web tool for interconverting PHASE input/output files and 853

FASTA sequence alignments. Mol. Ecol. Resour. 10, 162–166. 854

Fujita M.K., Leaché A.D., Burbrink F.T., McGuire J.A., Moritz C., 2012. Coalescent-based 855

species delimitation in an integrative taxonomy. Trends Ecol. Evol. 27, 480–488. 856

Funk D.J., Omland K.E., 2003. Species-level paraphyly and polyphyly: Frequency, causes, and 857

consequences, with insights from animal mitochondrial DNA. Annu. Rev. Ecol. Evol. Syst. 858

34, 397–423. 859

Galtier N., Nabholz B., Glemin S., Hurst G.D.D., 2009. Mitochondrial DNA as a marker of 860

molecular diversity: A reappraisal. Mol. Ecol. 18, 4541–4550. 861

Garrick R.C., Sunnucks P., Dyer R.J., 2010. Nuclear gene phylogeography using PHASE: 862

Dealing with unresolved genotypes, lost alleles, and systematic bias in parameter estimation. 863

BMC Evol. Biol. 10, 118. 864

Giarla T.C., Voss R.S., Jansa S.A., 2010. Species limits and phylogenetic relationships in the 865

didelphid marsupial genus Thylamys based on mitochondrial DNA sequences and 866

morphology. Bull. Am. Mus. Nat. Hist. 1–67. 867

Glenn T.C., Schable N.A., 2005. Isolating microsatellite DNA loci. Meth. Enzymol. 395, 202–868

222. 869

Hanke M., Wink M., 1994. Direct DNA sequencing of PCR-amplified vector inserts following 870

enzymatic degradation of primer and dNTPs. Biotech. 17, 858. 871

Hausdorf B., 2011. Progress toward a general species concept. Evolution. 65, 923–931. 872

40

Heled J., Drummond A.J., 2010. Bayesian inference of species trees from multilocus data. Mol. 873

Biol. Evol. 27, 570–580. 874

Hickerson M.J., Dolman G., Moritz C., 2006. Comparative phylogeographic summary statistics 875

for testing simultaneous vicariance. Mol. Ecol. 15, 209–223. 876

Hird S., Kubatko L., Carstens B., 2010. Rapid and accurate species tree estimation for 877

phylogeographic investigations using replicated subsampling. Mol. Phylogenet. Evol. 57, 878

888–898. 879

Hudson R.R., Coyne J.A., 2002. Mathematical consequences of the genealogical species 880

concept. Evolution. 56, 1557–1565. 881

Jansa, S.A., Barker F.K., Voss R.S. The early diversification history of didelphid marsupials: A 882

window into South America’s ―splendid isolation.‖ Submitted to Evolution. 883

Kent W.J., 2002. BLAT—the BLAST-like alignment tool. Genome Res. 12, 656–664. 884

Kozak K.H., Wiens J.J., 2006. Does niche conservatism promote speciation? A case study in 885

North American salamanders. Evolution. 60, 2604–2621. 886

Kubatko L.S., Carstens B.C., Knowles L.L., 2009. STEM: species tree estimation using 887

maximum likelihood for gene trees under coalescence. Bioinformatics. 25, 971–973. 888

Kubatko L.S., Degnan J.H. 2007. Inconsistency of phylogenetic estimates from concatenated 889

data under coalescence. Syst. Biol. 56:17–24. 890

Kumar S., Subramanian S., 2002. Mutation rates in mammalian genomes. Proc. Natl. Acad. Sci. 891

USA. 99, 803–808. 892

Lande R., 1976. Natural selection and random genetic drift in phenotypic evolution. Evolution. 893

30, 314–334. 894

Lanfear R., Calcott B., Ho S.Y.W., Guindon S., 2012. Partitionfinder: combined selection of 895

41

partitioning schemes and substitution models for phylogenetic analyses. Mol. Biol. Evol. 29, 896

1695–1701. 897

Larkin M.A., Blackshields G., Brown N.P., Chenna R., McGettigan P.A., McWilliam H., 898

Valentin F., Wallace I.M., Wilm A., Lopez R., 2007. Clustal W and Clustal X version 2.0. 899

Bioinformatics. 23, 2947–2948. 900

Leaché A.D., Fujita M.K., 2010. Bayesian species delimitation in West African forest geckos 901

(Hemidactylus fasciatus). Proc. R. Soc. Lond. B. 277, 3071–3077. 902

Librado P., Rozas J., 2009. DnaSP v5: a software for comprehensive analysis of DNA 903

polymorphism data. Bioinformatics. 25, 1451–1452. 904

Mayr E. 1942. Systematics and the origin of species, from the viewpoint of a zoologist. Harvard 905

University Press, Cambridge, MA. 906

McCormack J.E., Hird S.M., Zellmer A.J., Carstens B.C., Brumfield R.T., 2013. Applications of 907

next-generation sequencing to phylogeography and phylogenetics. Mol. Phylogenet. Evol. 908

66, 526–538. 909

McCulloch G.A., Wallis G.P., Waters J.M., 2010. Onset of glaciation drove simultaneous 910

vicariant isolation of Alpine insects in New Zealand. Evolution. 64, 2033–2043. 911

McGuire G., Wright F., 2000. TOPAL 2.0: Improved detection of mosaic sequences within 912

multiple alignments. Bioinformatics. 16, 130–134. 913

Mikkelsen T.S., Wakefield M.J., Aken B., Amemiya C.T., Chang J.L., Duke S., Garber M., 914

Gentles A.J., Goodstadt L., Heger A., 2007. Genome of the marsupial Monodelphis 915

domestica reveals innovation in non-coding sequences. Nature. 447, 167–177. 916

Milne I., Lindner D., Bayer M., Husmeier D., McGuire G., Marshall D.F., Wright F., 2009. 917

TOPALi v2: A rich graphical interface for evolutionary analyses of multiple alignments on 918

42

HPC clusters and multi-core desktops. Bioinformatics. 25, 126–127. 919

Moore W.S., 1995. Inferring phylogenies from mtDNA variation: Mitochondrial-gene trees 920

versus nuclear-gene trees. Evolution. 49, 718–726. 921

O'Meara B.C., 2010. New heuristic methods for joint species delimitation and species tree 922

inference. Syst. Biol. 59, 59–73. 923

Pluzhnikov A., Donnelly P., 1996. Optimal sequencing strategies for surveying molecular 924

genetic diversity. Genetics. 144, 1247–1262. 925

Pons J., Barraclough T., Gomez-Zurita J., Cardoso A., Duran D., Hazell S., Kamoun S., Sumlin 926

W., Vogler A., 2006. Sequence-based species delimitation for the DNA taxonomy of 927

undescribed insects. Syst. Biol. 55, 595–609. 928

Rambaut A., Drummond A. 2007. Tracer v1.5. Available from: http://beast.bio.ed.ac.uk/Tracer. 929

Reid N.M., Carstens B.C., 2012. Phylogenetic estimation error can decrease the accuracy of 930

species delimitation: A Bayesian implementation of the general mixed Yule-coalescent 931

model. BMC Evol. Biol. 12, 196. 932

Ribas C.C., Moyle R.G., Miyaki C.Y., Cracraft J., 2007. The assembly of montane biotas: 933

Linking Andean tectonics and climatic oscillations to independent regimes of diversification 934

in Pionus parrots. Proc. R. Soc. Lond. B. 274, 2399–2408. 935

Rissler L.J., Apodaca J.J., 2007. Adding more ecology into species delimitation: Ecological 936

niche models and phylogeography help define cryptic species in the black salamander 937

(Aneides flavipunctatus). Syst. Biol. 56, 924–942. 938

Rittmeyer E.N., Austin C.C., 2012. The effects of sampling on delimiting species from multi-939

locus sequence data. Mol. Phylogenet. Evol. 65, 451–463. 940

Roe A.D., Rice A.V., Bromilow S.E., Cooke J.E.K., Sperling F.A.H., 2010. Multilocus species 941

43

identification and fungal DNA barcoding: Insights from blue stain fungal symbionts of the 942

mountain pine beetle. Mol. Ecol. Resour. 10, 946–959. 943

Roy M.S., 1997. Recent diversification in African greenbuls (Pycnonotidae: Andropadus) 944

supports a montane speciation model. Proc. Roy. Soc. Lond. B. 264, 1337–1344. 945

Rozen S., Skaletsky H., 2000. Primer3 on the WWW for general users and for biologist 946

programmers. Methods Mol. Biol. 132, 365–386. 947

Scheen A.-C., Pfeil B.E., Petri A., Heidari N., Nylinder S., Oxelman B., 2012. Use of allele-948

specific sequencing primers is an efficient alternative to PCR subcloning of low-copy 949

nuclear genes. Mol. Ecol. Resour. 12, 128–135. 950

Seroussi Y., Seroussi E., 2007. TraceHaplotyper: using direct sequencing to determine the phase 951

of an indel followed by biallelic SNPs. Biotech. 43, 452–456. 952

Sorenson M.D., DaCosta J.M., 2011. Genotyping HapSTR loci: Phase determination from direct 953

sequencing of PCR products. Mol. Ecol. Resour. 11, 1068–1075. 954

Stephens M., Smith N.J., Donnelly P., 2001. A new statistical method for haplotype 955

reconstruction from population data. Am. J. Hum. Genet. 68, 978–989. 956

Swofford D.L., 2002. PAUP*: Phylogenetic analysis using parsimony. Version 4.0b. Sunderland 957

(MA): Sinauer Associates. 958

Toews D.P.L., Brelsford A., 2012. The biogeography of mitochondrial and nuclear discordance 959

in animals. Mol. Ecol. 21, 3907–3930. 960

Watterson G.A., 1975. On the number of segregating sites in genetical models without 961

recombination. Theor. Popul. Biol. 7, 256–276. 962

Weir J.T., 2006. Divergent timing and patterns of species accumulation in lowland and highland 963

neotropical birds. Evolution. 60, 842–855. 964

44

Wiens J.J., Graham C.H., 2005. Niche conservatism: Integrating evolution, ecology, and 965

conservation biology. Annu. Rev. Ecol. Evol. Syst. 36, 519–539. 966

Yang Z., Rannala B., 2010. Bayesian species delimitation using multilocus sequence data. Proc. 967

Natl. Acad. Sci. USA. 107, 9264–9269. 968

Zhang C., Zhang D.X., Zhu T., Yang Z., 2011. Evaluation of a Bayesian coalescent method of 969

species delimitation. Syst. Biol. 60, 747–761. 970

Zink R.M., Barrowclough G.F., 2008. Mitochondrial DNA under siege in avian phylogeography. 971

Mol. Ecol. 17, 2107–2121. 972

Zwickl D., 2006. Genetic algorithm approaches for the phylogenetic analysis of large biological 973

sequence datasets under the maximum likelihood criterion. [PhD dissertation]. Austin (TX): 974

University of Texas at Austin. 975

976

Acknowledgements 977

978

We thank Keith Barker, Courtney Comar, Lorissa Fujishin, and Michael Wells for 979

assistance with laboratory work. Keith Barker, Juan Diaz, and Andrew Simons provided helpful 980

comments on how to improve the manuscript. We are thankful to the many curators and 981

collection managers who loaned tissues that we used in this project, especially Michael Mares 982

and Janet Braun (Sam Noble Museum of Natural History), Joe Cook and Jon Dunnum (Museum 983

of Southwestern Biology), and Bruce Patterson and Bill Stanley (Field Museum of Natural 984

History). This work was supported by the National Science Foundation (grant numbers DEB-985

1110365 to S.A.J. and T.C.G., DEB-0743062 to S.A.J., and DEB-0743039 to R.S.V.), the 986

45

American Society of Mammalogists, the Society of Systematic Biologists, the Bell Museum of 987

Natural History, and the University of Minnesota. 988

989

Appendix A: Primers Used 990

991

992

ThyAnon61-F 5’- TGGAAGAGTTGGGGTTCAAAGTGGGT 993

ThyAnon61-R 5’- TGCTTTCCCATCCATATGCCTTTTGCC 994

ThyAnon72-F 5’- CCCAGCTAGTGAAGAGGACTGTCACC 995

ThyAnon72-R 5’- TGTGGGCTGCTGCTCTTATTGGTAGT 996

ThyAnon78-F 5’- GGGTGTGTGTTGTAATGTGTTGGACA 997

ThyAnon78-R 5’- TGCGTGAGTCTGTATGTGTCTTTATGCG 998

ThyAnon85-F 5’- ATGAGAAAGAGATGTCTTGCAAATCCTAACA 999

ThyAnon85-R 5’- AGCCTTGATTTCTTCATCAGAAAACTGTCTATT 1000

ThyAnon86-F 5’- TCTATGCACTGGGGAATATAAAGGAAGGT 1001

ThyAnon86-R 5’- TTGCTACTGCTAGTGATGCTAATACAATGTT 1002

ThyAnon89-F 5’- TGTGCAATTTTTGTGAATGAAATGTAATGCT 1003

ThyAnon89-R 5’- GGCAGGGACTGGTTTCTTTTTGCCTC 1004

ThyAnon90-F 5’- TGACAGGTGGGTCATTGAGCTATTGTT 1005

ThyAnon90-R 5’- ATGGGCAGGACTAACAAGTTGGGGAA 1006

ThyAnon94-F 5’- TCTGGGAGATGCAGTCCAAGGTTCAAA 1007

ThyAnon94-R 5’- GCCACCAACAATCACTTATGTGGCAGA 1008

ThyAnon98-F 5’- GCAGAATACATGGCACGTACCTGGG 1009

ThyAnon98-R 5’- CCTCAGACACTGTGACCCTAGGCAA 1010

ThyAnon101-F 5’- GGACCAAAGGAATGCAAGTTCAGACCC 1011

ThyAnon101-R 5’- TGGGAGCGGGAGTAAGGGAGTAAGTAA 1012

ThyAnon115-F 5’- AGTGGTATGGTTGGATCAAAGGGCA 1013

ThyAnon115-R 5’- CAGTCACTGTCGTCCACTGTCATAG 1014

ThyAnon121-F 5’- TTCACAGTGATGGTGTCAAGAGAATATCAGG 1015

ThyAnon121-R 5’- AGTTAAAATATGGCACCAGGTAAGGATCTCCA 1016

ThyAnon122-F 5’- TTGAGAAAAGGCAGGGAGGGACCAAAT 1017

ThyAnon122-R 5’- CCTAGGTTCCTCGTGGTAGACACCAAC 1018

ThyAnon128-F 5’- CTTACACCAGGCACCAACTCTGAGACA 1019

ThyAnon128-R 5’- CTCTAAACTGCCATCCCAGGGTCACTC 1020

OGT-F 5’- AAATCATTTCATCGACCTTTCTCAG 1021

OGT-R 5’- ATTCCCTGTAATGGAAAAGCAGC 1022

1023

Figure Captions 1024

1025

46

Figure 1. Map of collecting localities for specimens used in this study. Larger circles with black 1026

outlines indicate specimens for which we sequenced both CYTB and nuclear markers. Smaller 1027

circles with white outlines indicate specimens for which we only have CYTB sequences. 1028

Asterisks are placed by two collecting localities at which putative hybrid specimens were 1029

collected. Map shading reflects elevation, with dark green indicating lowlands, yellow indicating 1030

middle-elevations, and brown/white shading illustrating high/maximum elevations. 1031

1032

Figure 2. Ultrametric phylogenetic tree inferred in BEAST using the complete set of CYTB 1033

sequences from Giarla et al. (2010), illustrating relative divergence times. Tip labels correspond 1034

to tissue voucher numbers (Table 1 and Giarla et al. [2010]: Table 1). Only posterior probability 1035

values for interspecific divergences and the deepest split within each haplogroup are displayed. 1036

Percentages near some nodes reflect the uncorrected average pairwise divergence between the 1037

node’s daughter lineages. Asterisks mark two specimens that cluster with T. venustus C for 1038

CYTB but cluster with T. sponsorius for nearly all nuclear loci (Online Appendix C). 1039

1040

Figure 3. Species tree inferred in *BEAST from 14 anonymous loci and the X-linked intron 1041

OGT, with posterior probabilities at each node. Individuals were assigned to haplogroups based 1042

on the CYTB results. Two individuals of suspected hybrid origin were not included in this 1043

analysis. 1044

1045

Figure 4. Multilocus power analysis in BPP for each split of interest under Prior Scheme 1. Four 1046

randomly rarefied datasets were constructed for the 12-, 8-, 4-, and 2-locus size classes; all 15 1047

single-locus datasets were analyzed. A single complete dataset including all 15 loci together was 1048

47

independently analyzed four times. Each black circle represents the posterior probability of 1049

species delimitation based on a single random replicate dataset. Squares denote the average 1050

posterior probability value across all of the random trials for each dataset size class. 1051

1052

Figure 5. Point estimates (circles) and surrounding error estimates (lines) for the relative 1053

divergence time for the basal-most split within each species. Black dots and lines derive from the 1054

*BEAST analysis of the 15 nuclear loci; gray dots and lines are from the BEAST analysis of the 1055

cytochrome b data. In both analyses, the root node (the split between T. pallidior and T. 1056

venustus) was set at 1.0. 1057

1058

Online Appendix B. Tab-delimitted text file associating GenBank accession numbers with 1059

specimens for each loci. 1060

1061

Online Appendix C. Fifteen maximum-likelihood nuclear gene trees inferred in Garli. Colored 1062

rectangles at each tip denote the mtDNA haplogroup assigned to that individual. Stars and circles 1063

denote two individuals that cluster with T. venustus C in the CYTB tree (Fig. 2) but cluster with 1064

T. sponsorius in many of the gene trees shown here. 1065

1

TABLE 1. List of individuals sequenced and collecting localities. Haplogroup assignments are based on the results from analysis of the

mitochondrial gene cytochrome b (Fig. 2).

Tissue No. Voucher No. Species Haplogroup Country: Province/State a

NBH 76-97 FMNH 162495 pallidior A Bolivia: Tarija

NK 14721 MSB 57003 pallidior A Bolivia: Chuquisaca

NK 23533 MSB 87099 pallidior A Bolivia: Tarija

NK 96072 MSB 133108 pallidior A Chile: Tarapacá

AC 47 MLP 24.X.01.3 pallidior B Argentina: Córdoba

Arg 2690/OCGR 2153 OMNH 29957 pallidior B Argentina: Jujuy

Arg 395/OCGR 343 OMNH 23485 pallidior B Argentina: San Juan

Arg 43/OCGR 43 OMNH 23482 pallidior B Argentina: Mendoza

Arg 6392/OCGR 3957 OMNH 32559 pallidior B Argentina: Salta

Arg 6578/OCGR 7196 OMNH 34903 pallidior B Argentina: Catamarca

Arg 6683/OCGR 7279 OMNH 34908 pallidior B Argentina: Salta

Arg 687/OCGR 624 OMNH 23489 pallidior B Argentina: San Luis

LTU 77 CNP 1919 pallidior B Argentina: Buenos Aires

UP 397 CNP 1921 pallidior B Argentina: Neuquén

BDP 3309 FMNH 162507 sponsorius A Bolivia: Tarija

BDP 3345 FMNH 162505 sponsorius A Bolivia: Tarija

NK 23647 MSB 140295 sponsorius A Bolivia: Tarija




NK 23874 [uncataloged] sponsorius A Bolivia: Tarija

NK 23899 [uncataloged] sponsorius A Bolivia: Tarija



Arg 1449/OCGR 1346 OMNH 29967 sponsorius B Argentina: Tucumán

Arg 1466/OCGR 1363 OMNH 29965 sponsorius B Argentina: Catamarca

Arg 2233/OCGR 1860 IADIZA 4009 sponsorius B Argentina: Tucumán


Tables

2

Arg 2659/OCGR 2144 OMNH 29970 sponsorius B Argentina: Jujuy

Arg 4609/OCGR 3600 OMNH 29974 sponsorius B Argentina: Jujuy


Arg 6499/OCGR 3979 OMNH 32545 sponsorius B Argentina: Salta

Arg 6540/OCGR 3986 OMNH 32548 sponsorius B Argentina: Salta

NBH 20-20 AMNH 103761 venustus A Bolivia: Cochabamba

NK 21546 AMNH 263555 venustus A Bolivia: Chuquisaca

NK 21556 AMNH 263556 venustus A Bolivia: Chuquisaca

NK 25027 MSB 87109 venustus A Bolivia: Cochabamba

NK 30425 AMNH 275427 venustus A Bolivia: Cochabamba

NK 30437 MSB 87100 venustus A Bolivia: Cochabamba

NK 30479 AMNH 275429 venustus A Bolivia: Cochabamba

NK 12114 AMNH 260030 venustus B Bolivia: Santa Cruz

NK 22811 [uncataloged] venustus B Bolivia: Santa Cruz

NK 22813 MSB 87107 venustus B Bolivia: Santa Cruz


NK 22844 AMNH 275428 venustus B Bolivia: Cochabamba

NK 22845 MSB 67001 venustus B Bolivia: Cochabamba


NK 22949 [uncataloged] venustus B Bolivia: Santa Cruz

NK 22952 AMNH 275433 venustus B Bolivia: Santa Cruz


Arg 1108/OCGR 1007 b OMNH 29966 venustus C Argentina: Tucumán

Arg 2496/OCGR 2071 OMNH 29952 venustus C Argentina: Jujuy

NK 12575 AMNH 261245 venustus C Bolivia: Chuquisaca

NK 12637 AMNH 261254 venustus C Bolivia: Chuquisaca

NK 21237 MSB 63260 venustus C Bolivia: Santa Cruz

NK 21368 MSB 63262 venustus C Bolivia: Chuquisaca

NK 21516 MSB 63269 venustus C Bolivia: Chuquisaca

NK 23023 MSB 67005 venustus C Bolivia: Santa Cruz

NK 23392 MSB 67008 venustus C Bolivia: Tarija

NK 23992 b MSB 67392 venustus C Bolivia: Tarija

3

a Precise locality information can be found in Giarla et al. (2010)

b Thylamys venustus C individuals with a possible history of introgression with T. sponsorius B

4

TABLE 2. Prior schemes tested in BPP. Assuming a mutation rate of 10-9

substitutions per site per year (Kumar and Subramanian,

2002) divergence depths indicated as “Shallow” imply a prior with a mean ~0.1 Ma, “Moderate” implies a prior with a mean ~1.0 Ma,

and “Deep” implies a prior with a mean ~10.0 Ma. For effective population sizes within haplogroups, “Large” assumes θ ~ 0.1 and

small assumes θ ~ 0.001. Scheme 1 τ’s were derived from Jansa et al.’s (submitted) divergence time estimates. Priors were modeled as

diffuse gamma distributions G(α, β), where the prior mean = α/β and its variance = α/β2

Scheme Name Divergence Depth (τ) Effective Population Size (θ) Gamma Distribution for prior

Scheme 1 Estimated from other data Estimated from Anon. Locus Data θ ~ G(2, 500) and τ ~ G(2, 3000)

Scheme 2 Shallow Large θ ~ G(1, 10) and τ ~ G(2, 20000)

Scheme 3 Shallow Small θ ~ G(2, 2000) and τ ~ G(2, 20000)

Scheme 4 Moderate Large θ ~ G(1, 10) and τ ~ G(2, 2000)

Scheme 5 Moderate Small θ ~ G(2, 2000) and τ ~ G(2, 2000)

Scheme 6 Deep Large θ ~ G(1, 10) and τ ~ G(1, 10)

Scheme 7 Deep Small θ ~ G(2, 2000) and τ ~ G(1, 10)

5

TABLE 3. Characteristics of nuclear loci used in this study. Exemplar sequences from each locus were mapped to the published

genome sequence of the distantly related didelphid Monodelphis domestica using the best hit from a BLAT search (Kent, 2002).

Locus Name M. domestica Genome Locus Size (bp) Mol. Clock? a Substitution Model

b Tajima's D

b,c W

b

Anon-61 Chromosome 5 546 Yes HKY -0.1912 0.007852

Anon-72 Chromosome 8 675 No K80+G -0.0266 0.01315

Anon-78 Chromosome 2 656 Yes HKY 0.3668 0.005446

Anon-85 Chromosome 3 549 Yes HKY 0.6177 0.004229

Anon-86 Unknown 716 Yes HKY -0.8977 0.005291

Anon-89 Unknown 650 Yes TrN 1.3097 0.004102

Anon-90 Unknown 704 Yes TrN+I -0.7208 0.007745

Anon-94 Chromosome 3 752 Yes HKY+I+G 0.361 0.010313

Anon-98 Chromosome 6 523 Yes TrNef+I -0.2023 0.015538

Anon-101 Chromosome 1 457 Yes HKY+I+G -0.2983 0.021272

Anon-115 Chromosome 1 550 Yes HKY+I 0.6816 0.012835

Anon-121 Chromosome 1 543 Yes K80 1.4407 0.005932

Anon-122 Chromosome 5 723 No TPM1+I+G 0.008 0.019701

Anon-128 Chromosome 4 724 Yes TPM1+I -0.2977 0.005554

OGT X-Linked Intron 653 Yes HKY 0.222 0.003246

a “Yes” denotes that the molecular clock cannot be rejected.

b Measured or fitted across all sequences sampled within T. pallidior, T. sponsorius, and T. venustus combined.

c All values are non-significant.

6

TABLE 4: Results from species validation analysis in BPP of haplogroups within T. pallidior, T. sponsorius, and T. venustus. All

numbers indicate the posterior probability that the pair under consideration can each be considered distinct species, with bolded

numbers indicating support over 0.95 PP. The first row (“Randomized Tips”) contains the results of the BPP trials in which sequences

were randomly assigned to species in order to test performance of the rjMCMC algorithm.

Prior Scheme T. pallidior A vs.

pallidior B

T. sponsorius A vs.

sponsorius B

T. venustus A vs.

venustus B+C

T. venustus B vs.

venustus C

T. venustus vs.

sponsoriusa

Randomized Tipsb 0.02 0.0 0.14 0.03 0.0

1 (Thylamys-Specific) 1.0 1.0 0.99 0.59 1.0

2 (Shallow-Large) 1.0 1.0 0.98 0.77 1.0

3 (Shallow-Small) 1.0 1.0 1.0 0.54 1.0

4 (Moderate-Large) 1.0 1.0 0.91 0.41 1.0

5 (Moderate-Small) 1.0 1.0 0.17 0.15 1.0

6 (Deep-Large) n/ac 1.0 n/a

c n/a

c 1.0

7 (Deep-Small) 1.0 1.0 0.02 0.017 1.0

a Sequences from putative hybrid individuals included

b Sequences randomly assigned to haplotype groups within each taxonomic species; run under Prior Scheme 1 as defined in Table 2.

c Replicated runs within this set exhibited poor mixing

*

*

pallidior A

pallidior B

venustus A

venustus B

venustus C

sponsorius A

sponsorius B

Fig. 1 Color image in print

Argentina

Paraguay

Bolivia

Peru

Chile

T. pallidior A

T. pallidior B

T. sponsorius B

T. sponsorius A

T. venustus A

T. venustus C

T. venustus B

3.9%

0.02 substitutions/site

NK21515

NK14721

NK21516

Arg2690

Arg5350

NK12114

NK23647

Arg5714

NK21367

Arg6703

Arg2659

Arg4603

NK23992

NK22811

Arg1449

UMMZ155836

MVZ115634

NK21655

Arg2233Arg2460

NK21664

NK22946

NK23763

NK25027

NK23874

Arg6392

Arg4037

Arg43

NK30760

NK22986

AMNH186948

RDS18525

NK21620

Arg6540

NK23904

AMNH248705

NK12642

MVZ173937

NK30437

NK23023

Arg6499

NK12575

NK96072

Arg6821

NK21815

LTU77

NK10879

Arg4961

Arg4609

NK23903

Arg1108

NK12552

Arg522

NK23899

NK21368

NK21546

NK22952

Arg996

NBH2020

Arg281

NK21556

Arg2496

NK12638

AMNH185323

NK23730

NK23533

Arg4425

NK21237

NK22813

Arg6876

UP397

NK23392

NK21621

Arg374

MVZ143696

NK22845

Arg6683

Arg687

Arg1466

NK21622

NK23901

NK30761

NK23347

NK23762

NBH7697

BDP3309

Arg6179

FMNH29170

AMNH248704

NK22844

Arg6578

Arg395

AC47

NK30479

MVZ145531

NK30425

NK22949

NK12637

Arg6572

BDP3345

NK23719

NK22815

0.99

1.0

1.0

1.0

1.0

0.92

1.0

1.0

1.0

1.0

1.0

1.0

1.0

**

Figure 2: in color online-only

5.0%

2.5%

5.4%

1.0

1.0

1.0

1.0

0.95

0.0005 substitutions/site

T. sponsorius B

T. venustus A

T. venustus B

T. pallidior B

T. venustus C

T. sponsorius A

T. pallidior A

Figure 3

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15

T. venustus A vs. B+C

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15

T. venustus B vs. C

(a) (b)

(c) (d)

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15

T. sponsorius A vs. B

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15

T. pallidior A vs. B

Number of Loci

Prob

abili

ty o

f Spe

cies

Del

imita

tion

Figure 4 Power Analysis

sponsorius venustus pallidior

00.

20.

40.

60.

81.

0

rela

tive

dive

rgen

ce ti

me

Fig. 5

Figure 5 Divergence Times

pallidior A

pallidior B

venustus A

venustus B

venustus C

sponsorius A

sponsorius B Argentina

Paraguay

Bolivia

Peru

Chile

15 nuclear loci

Not

Gen

etic

ally

Isol

ated

Compiled a dataset of 15 anonymous nuclear loci and used two species

delimitation methods. Six of seven cryptic mitochondrial lineages can be considered putative

species. As few as two nuclear loci can effectively delimit genetically isolated

populations. Mitochondrial introgression is detected between two distinct species.

hidden diversity in the andes: comparison of species delimitation methods in montane marsupials

Documents