evaluating topological conflict in centipede phylogeny ...€¦ · scutigera coleoptrata...

14
Article Evaluating Topological Conflict in Centipede Phylogeny Using Transcriptomic Data Sets Rosa Ferna ´ndez, 1 Christopher E. Laumer, 1 Varpu Vahtera, 1,2 Silvia Libro, 3 Stefan Kaluziak, 3 Prashant P. Sharma, 4 Alicia R. Pe ´rez-Porro, 1,5 Gregory D. Edgecombe, 6 and Gonzalo Giribet* ,1 1 Museum of Comparative Zoology & Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA 2 Zoological Museum, Department of Biology, University of Turku, Turku, Finland 3 Marine Science Center, Northeastern University, Nahant, MA 4 Division of Invertebrate Zoology, American Museum of Natural History, New York, NY 5 Centre d’Estudis Avanc ¸ats de Blanes (CEAB-CSIC), Catalonia, Spain 6 Department of Earth Sciences, The Natural History Museum, London, United Kingdom *Corresponding author: E-mail: [email protected]. Associate editor: Nicolas Vidal Abstract Relationships between the five extant orders of centipedes have been considered solved based on morphology. Phylogenies based on samples of up to a few dozen genes have largely been congruent with the morphological tree apart from an alternative placement of one order, the relictual Craterostigmomorpha, consisting of two species in Tasmania and New Zealand. To address this incongruence, novel transcriptomic data were generated to sample all five orders of centipedes and also used as a test case for studying gene-tree incongruence. Maximum likelihood and Bayesian mixture model analyses of a data set composed of 1,934 orthologs with 45% missing data, as well as the 389 orthologs in the least saturated, stationary quartile, retrieve strong support for a sister-group relationship between Craterostigmomorpha and all other pleurostigmophoran centipedes, of which the latter group is newly named Amalpighiata. The Amalpighiata hypothesis, which shows little gene-tree incongruence and is robust to the influence of among-taxon compositional heterogeneity, implies convergent evolution in several morphological and behavioral characters traditionally used in centipede phylogenetics, such as maternal brood care, but accords with patterns of first appearances in the fossil record. Key words: phylogenomics, incongruence, next-generation sequencing, Illumina, Myriapoda, Chilopoda, molecular dating. Introduction The myriapod Class Chilopoda—centipedes—consists of some 3,500 extant species classified in five extant orders and one extinct order (Edgecombe and Giribet 2007). They have also proven to be an interesting case study from a phy- logenetic perspective because the interrelationships of the living orders have received a high level of consensus based on diverse kinds of morphological evidence. The morpholog- ical scheme has then been used as a benchmark for assessing the efficacy of different genes, kinds of molecular data, or methods of molecular data analysis in resolving the centipede tree (e.g., Regier et al. 2005; Giribet and Edgecombe 2006; Murienne et al. 2010). Over the past few years, molecular and morphological estimates of high-level centipede phylogeny have largely reached agreement apart from the placement of one lineage, the relictual order Craterostig- momorpha. This is composed of two species in the genus Craterostigmus Pocock, 1902 (Pocock 1902) with one species endemic to each of Tasmania and New Zealand (Edgecombe and Giribet 2008). The morphological tree of centipedes reflects the groups established in classifications devised more than a century ago (Pocock 1902; Verhoeff 1902–1925)(fig. 1A). This hypothesis recognizes a fundamental division of Chilopoda into Notostigmophora (composed of the single order Scutigero- morpha, with ~100 species) and Pleurostigmophora, which groups the other four living orders. The names of the two groups reflect a difference in the position of the respiratory openings, either opening dorsally on the tergal plates (in Notostigmophora) or opening above the leg bases in the pleuron (in Pleurostigmophora). Pleurostigmophora in turn divides into Lithobiomorpha and a putative clade that groups the remaining three orders, named Phylactometria based on a shared behavior of maternal care of the eggs and hatchlings (Edgecombe and Giribet 2004). Phylactometria consists of Craterostigmus, Scolopendromorpha, and Geophilomorpha, with the latter two orders almost universally united in a group named Epimorpha based on their strictly epimorphic mode of development, that is, having a fixed number of segments throughout the course of postembryonic development. These relationships were initially defended using nonquan- titative methods applied to either diverse kinds of anatomical and behavioral data (Dohle 1985; Borucki 1996) or detailed studies of particular organ systems (Hilken 1997; Wirkner and ß The Author 2014. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. All rights reserved. For permissions, please e-mail: [email protected] 1500 Mol. Biol. Evol. 31(6):1500–1513 doi:10.1093/molbev/msu108 Advance Access publication March 26, 2014 at Harvard Library on June 12, 2014 http://mbe.oxfordjournals.org/ Downloaded from

Upload: others

Post on 27-Sep-2020

0 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Evaluating Topological Conflict in Centipede Phylogeny ...€¦ · Scutigera coleoptrata Craterostigmus tasmanianus Strigamia maritima Himantarium gabrielis Alipes grandidieri Cryptops

Article

Evaluating Topological Conflict in Centipede Phylogeny UsingTranscriptomic Data SetsRosa Fernandez1 Christopher E Laumer1 Varpu Vahtera12 Silvia Libro3 Stefan Kaluziak3

Prashant P Sharma4 Alicia R Perez-Porro15 Gregory D Edgecombe6 and Gonzalo Giribet1

1Museum of Comparative Zoology amp Department of Organismic and Evolutionary Biology Harvard University Cambridge MA2Zoological Museum Department of Biology University of Turku Turku Finland3Marine Science Center Northeastern University Nahant MA4Division of Invertebrate Zoology American Museum of Natural History New York NY5Centre drsquoEstudis Avancats de Blanes (CEAB-CSIC) Catalonia Spain6Department of Earth Sciences The Natural History Museum London United Kingdom

Corresponding author E-mail ggiribetgharvardedu

Associate editor Nicolas Vidal

Abstract

Relationships between the five extant orders of centipedes have been considered solved based on morphologyPhylogenies based on samples of up to a few dozen genes have largely been congruent with the morphological treeapart from an alternative placement of one order the relictual Craterostigmomorpha consisting of two species inTasmania and New Zealand To address this incongruence novel transcriptomic data were generated to sample allfive orders of centipedes and also used as a test case for studying gene-tree incongruence Maximum likelihood andBayesian mixture model analyses of a data set composed of 1934 orthologs with 45 missing data as well as the 389orthologs in the least saturated stationary quartile retrieve strong support for a sister-group relationship betweenCraterostigmomorpha and all other pleurostigmophoran centipedes of which the latter group is newly namedAmalpighiata The Amalpighiata hypothesis which shows little gene-tree incongruence and is robust to the influenceof among-taxon compositional heterogeneity implies convergent evolution in several morphological and behavioralcharacters traditionally used in centipede phylogenetics such as maternal brood care but accords with patterns of firstappearances in the fossil record

Key words phylogenomics incongruence next-generation sequencing Illumina Myriapoda Chilopoda molecular dating

IntroductionThe myriapod Class Chilopodamdashcentipedesmdashconsists ofsome 3500 extant species classified in five extant ordersand one extinct order (Edgecombe and Giribet 2007) Theyhave also proven to be an interesting case study from a phy-logenetic perspective because the interrelationships of theliving orders have received a high level of consensus basedon diverse kinds of morphological evidence The morpholog-ical scheme has then been used as a benchmark for assessingthe efficacy of different genes kinds of molecular data ormethods of molecular data analysis in resolving the centipedetree (eg Regier et al 2005 Giribet and Edgecombe 2006Murienne et al 2010) Over the past few years molecularand morphological estimates of high-level centipedephylogeny have largely reached agreement apart from theplacement of one lineage the relictual order Craterostig-momorpha This is composed of two species in the genusCraterostigmus Pocock 1902 (Pocock 1902) with one speciesendemic to each of Tasmania and New Zealand (Edgecombeand Giribet 2008)

The morphological tree of centipedes reflects the groupsestablished in classifications devised more than a century ago

(Pocock 1902 Verhoeff 1902ndash1925) (fig 1A) This hypothesisrecognizes a fundamental division of Chilopoda intoNotostigmophora (composed of the single order Scutigero-morpha with ~100 species) and Pleurostigmophora whichgroups the other four living orders The names of the twogroups reflect a difference in the position of the respiratoryopenings either opening dorsally on the tergal plates (inNotostigmophora) or opening above the leg bases in thepleuron (in Pleurostigmophora) Pleurostigmophora in turndivides into Lithobiomorpha and a putative clade that groupsthe remaining three orders named Phylactometria based on ashared behavior of maternal care of the eggs and hatchlings(Edgecombe and Giribet 2004) Phylactometria consists ofCraterostigmus Scolopendromorpha and Geophilomorphawith the latter two orders almost universally united in agroup named Epimorpha based on their strictly epimorphicmode of development that is having a fixed number ofsegments throughout the course of postembryonicdevelopment

These relationships were initially defended using nonquan-titative methods applied to either diverse kinds of anatomicaland behavioral data (Dohle 1985 Borucki 1996) or detailedstudies of particular organ systems (Hilken 1997 Wirkner and

The Author 2014 Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution All rights reserved For permissions pleasee-mail journalspermissionsoupcom

1500 Mol Biol Evol 31(6)1500ndash1513 doi101093molbevmsu108 Advance Access publication March 26 2014

at Harvard L

ibrary on June 12 2014httpm

beoxfordjournalsorgD

ownloaded from

Pass 2002 Muller and Meyer-Rochow 2006) Numerical par-simony approaches supported this same cladogram usingeither a ground pattern coding for the five living orders(Shear and Bonamo 1988) or by coding a larger number ofexemplar species (Edgecombe et al 1999 Edgecombe andGiribet 2004 Giribet and Edgecombe 2006 Murienne et al2010) Other morphological hypotheses have also beenintroduced including a rerooted version of the standardtree (fig 1C) that infers a decrease rather than increase insegment numbers (Ax 2000)

The introduction of molecular data showed striking con-gruence with morphology with respect to most aspects ofcentipede phylogeny (Giribet et al 1999) although in somecases the molecular data alone posed conflict with respect tothe position of Craterostigmomorpha (eg Edgecombe et al1999 fig 2) Early analyses of nuclear protein-encoding genesshowed strong conflict with both ribosomal genes and mor-phology (Shultz and Regier 1997 Regier et al 2005 Giribetand Edgecombe 2006) The most recent molecular phyloge-nies using either a combination of nuclear ribosomal andmitochondrial loci (Murienne et al 2010) or larger compila-tions of nuclear protein-coding genes (Regier et al 2010)converge on trees that agree with morphology in mostrespects but present conflict on the Craterostigmus questionas was also found in an analysis based on nearly completenuclear ribosomal genes across arthropods and other ecdy-sozoans (Mallatt and Giribet 2006) (fig 1B) All these data

sources agree on the basal division of Chilopoda intoNotostigmophora and Pleurostigmophora but conflictremains over whether Lithobiomorpha (supported by mor-phology) or Craterostigmomorpha (supported by molecules)is sister group to the other members of PleurostigmophoraAs well the status of Epimorpha is unclear from existingmolecular data The group has been contradicted by aweakly supported alliance between Lithobiomorpha andScolopendromorpha (Murienne et al 2010) or has not beenfully tested because Geophilomorpha was unsampled (Regieret al 2010)

Here we apply a phylogenomic approach to resolve theseremaining controversies in centipede phylogeny We drawupon a much larger number of genes (1934 orthologs)than has been previously considered to re-evaluate the phy-logenetic position of Craterostigmus calibrate the centipedemolecular clock using modern methods for dating phyloge-nies and discuss the implications of the putative convergenceof morphological and behavioral characters such as maternalcare in centipedes The availability of genome and transcrip-tome data brings novel challenges to tree reconstruction (egcompositional heterogeneity and gene-tree incongruence)we demonstrate that though present in these data suchphenomena do not influence our major phylogeneticconclusions

Results

Transcriptome Assembly Isoform Filtering andOrthology Assignment

cDNA from the nine species used in this study was sequencedon the Illumina HiSeq 2500 platform (150 bp paired-endreads) A summary of the assembly statistics is shown intable 1 Following redundancy reduction open readingframe (ORF) prediction and selection of the longest ORFper putative unigene 9934ndash19621 peptide sequences pertaxon were retained Application of the OrthologousMAtrix (OMA) stand-alone algorithm (Altenhoff et al 20112013) grouped these peptides into a total of 30586 orthologsThree different supermatrices were constructed for the phy-logenetic analyses The first supermatrix was constructed byselecting only the orthologs present in six or more taxa Thenumber of orthologs per taxon in this supermatrix rangedfrom 131 to 1651 with 1934 orthologs in total The com-bined length of the aligned ortholog matrices (before prob-abilistic alignment masking of each individual ortholog withZORRO Wu et al 2012) was 770128 amino acid positionsConcatenation of individually ZORRO-masked orthologalignments yielded a supermatrix of 526145 amino acidsThe number of orthologs represented per taxon variedfrom 530 to 10070 (fig 2) As expected in all cases thelowest values corresponded to Archispirostreptus gigas asits transcriptome was pyrosequenced to relatively lowthroughput on the 454 Life Sciences platform (Meusemannet al 2010) However gene representation in the ingroupspecies of this study was relatively high varying from 1139to 1543 yielding less than 45 missing data To discern ifsubstitutional saturation and compositional heterogeneity

Scutigeromorpha

Lithobiomorpha

Craterostigmomorpha

Scolopendromorpha

Geophilomorpha

Scutigeromorpha

Lithobiomorpha

Craterostigmomorpha

Scolopendromorpha

Geophilomorpha

Scutigeromorpha

Lithobiomorpha

Craterostigmomorpha

Scolopendromorpha

Geophilomorpha

EP

IMO

RP

HA

PH

YLA

CTO

ME

TR

IA

PLE

UR

OS

TIG

MO

PH

OR

A

HE

TE

RO

TE

RG

A

TR

IAK

ON

TAP

OD

A

GO

NO

PO

DO

PH

OR

AE

PIM

OR

PH

A

AM

ALP

IGH

IATA

PLE

UR

OS

TIG

MO

PH

OR

A

A

B

C

FIG 1 Phylogenetic hypotheses of centipede relationships (A) Themorphologically supported tree (eg Pocock 1902) (B) Amalpighiatahypothesis as supported in early molecular analyses (Mallatt and Giribet2006) (C) Axrsquos (2000) morphological hypothesis with tree in (A)rerooted

1501

Evaluation of Topological Conflict in Centipede Phylogeny doi101093molbevmsu108 MBE at H

arvard Library on June 12 2014

httpmbeoxfordjournalsorg

Dow

nloaded from

0606

Peripatopsis capensis

Tetranychus urticae

Ixodes scapularis

Daphnia pulex

Scutigera coleoptrata

Craterostigmus tasmanianus

Strigamia maritima

Himantarium gabrielis

Alipes grandidieri

Cryptops hortensis

10979688

09509289100

1---

1934 orthogroups

131 orthogroups

1001 orthogroups

1543 orthogroups

1199100

EP

IMO

RP

HA

AM

ALP

IGH

IATA

PLE

UR

OS

TIG

MO

PH

OR

A

CH

ILOP

OD

AArchispirostreptus gigas

FIG 2 Bayesian phylogenomic hypothesis of centipede interrelationships Nodal support values (posterior probabilityShimodaira-Hasegawa-likesupportbootstrapbootstrap) of the four different analyses (Bayesian inferenceML with PhyML-PCMAML with RAxMLML with RAxML for thereduced data set) are indicated in each case Ln L PhyML-PCMA 4768683119 Ln L RAxML 4762759821600 Ln L RAxML 799809006206Asterisks indicate support values of 11100100 The size of the circles is proportional to the number of orthologs in each species for the large data setInset indicates the 389-gene-tree topology (reduced data set) with myriapod monophyly

Table 1 Assembly Parameters

N RawReads

N ReadsAT

Discarded

Reads

NRMC NContigs

NBases(Mb)

AverageLength of

Contigs

MaximumContig

Length (bp)

N50 Total NOrthologs

N SelectedOrthologs

MissingData ()

Scutigeracoleoptrata

26072840 25870392 07 23546281 178854 952 5322 16337 684 9932 1472 245

Craterostigmustasmanianus

76698082 31313104 591 30257748 99546 646 6494 18029 964 7686 1430 26

Lithobius forficatus 77628252 75849229 22 72202535 74544 344 4619 11715 530 5495 1267 414

Strigamia maritima mdash mdash mdash mdash 24079 1762 119555 1344237 24745 7583 1543 174

Himantariumgabrielis

54681444 53535366 20 51089766 37200 218 5869 17609 782 5605 1139 454

Alipes grandidieri 32332034 30561861 54 28272411 138229 703 5089 9801 631 9441 1299 359

Cryptops hortensis 34611798 21731914 372 20594894 75753 455 6012 10232 803 8821 1274 391

Peripatopsiscapensis

40774042 40774042 0 mdash 92516 356 3848 10252 386 5148 1001 539

Tetranychus urticae mdash mdash mdash mdash 18313 mdash mdash mdash mdash 7775 683 773

Ixodes scapularis mdash mdash mdash mdash 20473 mdash mdash mdash mdash 6891 1368 274

Daphnia pulex mdash mdash mdash mdash 18989 1972 380007 4193030 49250 10070 1651 126

Archispirostreptusgigas

mdash mdash mdash mdash 4008 199 4958 768 560 530 131 963

NOTEmdashN number AT after thinning and trimming NRMC number of reads matched to contigs Percentage of missing data was calculated in relation to the final matrix(526145 amino acids) Total number of orthologs corresponds to all the orthologs recovered per taxon Number of selected orthologs is based on a minimum taxon occupancyof six (ie only the orthologs shared by at least six taxa were selected for the phylogenomic analyses) Asterisks for chelicerates indicate number of peptide sequences downloadedfrom annotated genome projects for analysis with OMA

1502

Fernandez et al doi101093molbevmsu108 MBE at H

arvard Library on June 12 2014

httpmbeoxfordjournalsorg

Dow

nloaded from

were affecting our phylogenetic results we considered asecond supermatrix that included the 389 genes thatpassed individual tests of compositional heterogeneity andwhich furthermore fell in the least saturated quartile of thisstationary set (see Materials and Methods) Finally to reducethe percentage of missing data a smaller but more completesupermatrix was constructed by choosing a taxon occupancyof 11 (ie orthologs present in 11 or more taxa were selected)and removing A gigas to obviate entirely the effects of thelarge amount of missing data in this taxon This yielded amatrix composed of 61 orthologs with a gene occupancyper species ranging from 89 to 100 The total number ofamino acid sites after ZORRO-masked ortholog concatena-tion for this matrix was 13106

Centipede Phylogeny and Dating

All phylogenetic analyses of the three constructed superma-trices agree on the monophyly of Chilopoda Pleurostig-mophora and a clade composed of LithobiomorphaScolopendromorpha and Geophilomorpha thus placingCraterostigmomorpha as the sister group of the remainingpleurostigmophorans (fig 2 supplementary fig S1Supplementary Material online) Resampling support forChilopoda and for the clade containing LithobiomorphaScolopendromorpha and Geophilomorpha is absolute andis also high for Epimorpha and a clade composed ofCraterostigmomorpha and the remaining pleurostigmophor-ans in most analyses (fig 4)

Both the concatenated matrix overall and each taxonwithin it show evidence of substantial compositional hetero-geneity in a w2 test based on a simulated homogeneous nulldistribution (Foster 2004) (P values for all taxa were lt102)However a quartet supernetwork constructed from a morerestricted subset of genes that individually pass this compo-sitional heterogeneity test and which additionally fall into theleast saturated quartile of these stationary genes (Foster2004) demonstrates a network topology broadly comparable(but see later) with the tree topology of our concatenatedanalyses (fig 5) indicating that the compositional heteroge-neity though present is likely not driving the recoveredtopology

Furthermore although missing data are distributed in acomplex manner across the tree the number of genes poten-tially decisive for each internal node is high (eg gt800 for allnodes in Chilopoda in the 1934 gene analysis) indicating thatthe high support recovered in analyses of our largest matricesis not likely an artifact of the pattern of missing data(DellrsquoAmpio et al 2014 see also supplementary fig S1Supplementary Material online)

Our individual maximum likelihood (ML) gene-tree anal-yses display a complex pattern of topological discordance andmissing data Using quartet supernetworks we visualized thepredominant splits within these gene trees both for the com-plete set considered in our largest supermatrix and for thereduced set of compositionally homogeneous least-saturatedgenes (figs 4 and 5) Network topologies are similar in bothsets although there is a greater degree of conflict displayed

(particularly among outgroup species) in the reduced setperhaps the result of increased sampling error in this morelimited set of slowly evolving genes (Betancur-R et al 2014)Although both supernetworks display a tree-like pattern ofsplits reminiscent in many respects to the concatenated to-pologies (fig 2) they also both suggest the existence of con-siderable among-gene conflict in the relative position ofCraterostigmus with some genes suggesting a sister-grouprelationship with Scutigera in contrast to the concatenatedtopology We therefore quantified the number of genes con-gruent with either topology and also with a number of sa-lient hypotheses on centipede interrelationships (fig 6) Thesegene support frequencies largely favor the splits recovered inthe concatenated topology with for example 303 of genesgrouping Craterostigmomorpha and the remaining pleuros-tigmophorans and only 211 grouping Scutigeromorphaand Craterostigmomorpha However relationships withinthe clade formed by Lithobiomorpha and Epimorphaare somewhat more conflicted with 246 of genes sup-porting Epimorpha versus 213 supporting Lithobiomorpha + Geophilomorpha

The dated phylogenies for the 389 gene matrix undereither an autocorrelated log normal (fig 3 top) or uncorre-lated gamma model (fig 3 bottom) support a diversificationof Chilopoda between the Early Ordovician and the MiddleDevonian diversification of Pleurostigmophora in the MiddleOrdovician to Early Carboniferous and the diversification ofthe clade uniting Lithobiomorpha Scolopendromorpha andGeophilomorpha in the Silurian-Carboniferous The diversifi-cation of Epimorpha ranges from the Early Devonian to theearly Permian (fig 3)

Discussion

Phylogenomic Hypothesis of CentipedeInterrelationships

Resolving the Tree of Life has been prioritized as one of the125 most important unsolved scientific questions in 2005 byScience (Kennedy 2005) and the advent of phylogenomics hasaided in resolving many contentious aspects in animal phy-logeny (eg Philippe and Telford 2006 Dunn et al 2008Bleidorn et al 2009 Hejnol et al 2009 Meusemann et al2010 Kocot et al 2011 Rehm et al 2011 Smith et al 2011Struck et al 2011 Hartmann et al 2012 von Reumont et al2012) This is not without controversy and several phenom-ena unique to genome-scale data have been identified asnegatively impacting tree reconstruction in this paradigmperhaps foremost among these being gene occupancy (miss-ing data) (Roure et al 2013 DellrsquoAmpio et al 2014) taxonsampling (Pick et al 2010) and quality of data (eg properortholog assignment and controls on exogenous contamina-tion) (Philippe et al 2011 Salichos and Rokas 2011) In addi-tion assumptions underlying concatenation (Salichos andRokas 2013) and model (mis-)specification (Lartillot andPhilippe 2008) have also been identified as possible pitfallsfor tree reconstruction in a phylogenomic framework Finallyit has been shown that conflict may exist between classes ofgenes with some advocating exclusive usage of slowly

1503

Evaluation of Topological Conflict in Centipede Phylogeny doi101093molbevmsu108 MBE at H

arvard Library on June 12 2014

httpmbeoxfordjournalsorg

Dow

nloaded from

evolving genes to resolve deep metazoan splits (eg Nosenkoet al 2013) although others believe that slow-evolving genesmay not be able to resolve deep splits due to the lack ofsufficient signal

We have worked carefully to try to address all these issuesin our analyses First we have minimized the amount of miss-ing data and worked with one of the most complete phylo-genomic matrices for nonmodel invertebrates based ongenomes and deeply sequenced transcriptomes with the ex-ception of the millipede outgroup transcriptome(Meusemann et al 2010) Although debate exists as towhether or not large amounts of missing data negativelyaffect phylogeny reconstruction (Hejnol et al 2009Lemmon et al 2009 DellrsquoAmpio et al 2014) Roure et al(2013) concluded that although the effects of missing datacan have a negative impact in certain situations other issueshave a more direct effect including modeling among-sitesubstitutional heterogeneity they recommended graphicallydisplaying the amount of missing data in phylogenetic treesas done for example by Hejnol et al (2009) or as presentedhere (fig 2) Given our low levels of missing data and high ma-trix completeness (supplementary table S2 Supplementary

Material online) it is unlikely that our results are affectedby gene occupancy or missing data Likewise taxon samplinghas been optimized to represent all major centipede lineagesNo major hypothesis on deep centipede phylogenetics re-mains untested with our sampling although it may be advis-able to add additional species for each order in future studiesespecially of Scutigeromorpha and Lithobiomorpha whosepositions show incomplete support or substantial among-gene conflict perhaps because they are represented by asingle species each (figs 4ndash6)

It has been argued that in the case of strong gene-treeincongruence (eg from incomplete lineage sorting horizon-tal gene transfer or cryptic duplicationloss) concatenatedanalysis may be misleading causing an erroneous species treeto be inferred and furthermore with inflated resampling sup-port giving no indication of intergene conflict (eg Jeffroyet al 2006 Kubatko and Degnan 2007 Nosenko et al 2013Salichos and Rokas 2013) Many therefore question exclusivereliance on concatenation and instead argue for the applica-tion of a model of the source of incongruence (eg the emerg-ing species tree-methods based on the multispeciescoalescent Edwards 2009) or for discarding data by selecting

CenozoicMesozoicPaleozoicPrecambrian

C O S D C P T J K P N Q

Peripatopsis capensis

Tetranychus urticae

Ixodes scapularis

Daphnia pulex

Archispirostreptus gigas

Scutigera coeloptrata

Craterostigmus tasmanianus

Strigamia maritima

Himantarium gabrielis

Cryptops hortensis

Alipes grandidieri

0100200300400500600

Peripatopsis capensis

Tetranychus urticae

Ixodes scapularis

Daphnia pulex

Archispirostreptus gigas

Scutigera coeloptrata

Craterostigmus tasmanianus

Strigamia maritima

Himantarium gabrielis

Cryptops hortensis

Alipes grandidieri

FIG 3 Chronogram of centipede evolution for the 389-gene data set with 95 highest posterior density (HPD) bar for the dating under autocorrelatedlog normal (top) and uncorrelated gamma model (bottom) Nodes that were calibrated with fossils are indicated with a star

1504

Fernandez et al doi101093molbevmsu108 MBE at H

arvard Library on June 12 2014

httpmbeoxfordjournalsorg

Dow

nloaded from

genes with strong phylogenetic signal and demonstrated ab-sence of significant incongruence when reconstructing an-cient divergences To address this possibility we alsoadopted a gene-tree perspective inferring individual MLtrees for each gene included in our concatenated matrixHowever many available methods of investigating gene-treeincongruence (including recent metrics such as internodecertainty and its derivatives Salichos and Rokas 2013Salichos et al 2014) can only be calculated for genes thatcontain the same set of taxa an assumption that our dataset (and many similar data sets generated by highly parallelsequencing) assuredly violates

We therefore visualized the dominant bipartitions amongthese gene trees by constructing a supernetwork (a general-ization of supertrees permitting reticulation where intergeneconflicts exist) using the SuperQ method (Grunewald et al2013) which decomposes all gene trees into quartets and

then infers a supernetwork from these quartets assigningedge lengths by examining quartet frequencies We inferredsupernetworks from both ML gene trees (figs 4 and 5) andthe majority rule consensus (MRC) trees from the bootstrapreplicates of each gene (figure not shown) These supernet-works display a predominantly tree-like structure topologi-cally resembling our concatenated species trees howeverthey indicate the presence of intergene conflict in the relativepositions of Craterostigmomorpha and Scutigeromorphawith some genes uniting these taxa as a clade in contrastto our species tree (figs 4 and 5) Interestingly the sameoverarching network structure is present in both the ML-tree-based and bootstrap MRC-based networks indicatingthat these conflicts are not merely the result of stochasticsampling error To examine how these phylogenetic conflictsare distributed among our input genes we employedConclustador which automatically assigns genes into

001Craterostigmus

tasmanianus

Scutigera

coleoptrata

Archispirostreptus gigas

Peripatopsis capensis

Ixodes

scapularis

Tetranychus

urticae

Daphnia pulexStrigamia

maritima

Himantarium

gabrielis

Lithobius

Cryptops

hortensis

Alipes

grandidieri

FIG 5 Supernetwork representation of quartets derived from individual ML gene trees considering only 389 genes showing evidence of compositionalhomogeneity and relatively low substitutional saturation

Craterostigmus

tasmanianus

Scutigera

coleoptrata

Archispirostreptus gigas

Peripatopsis capensis

Ixodes scapularis

Tetranychus urticae

Daphnia pulex

Strigamia maritima

Himantarium gabrielis

Cryptops hortensis

Alipes grandidieri

001

FIG 4 Supernetwork representation of quartets derived from individual ML gene trees for all 1934 genes concatenated in the supermatrix presented infigure 2 Phylogenetic conflict is represented by nontree-like splits (eg ScutigeromorphandashCraterostigmomorpha)

1505

Evaluation of Topological Conflict in Centipede Phylogeny doi101093molbevmsu108 MBE at H

arvard Library on June 12 2014

httpmbeoxfordjournalsorg

Dow

nloaded from

exclusive clusters if they display significant (and well sup-ported) areas of local topological incongruence by examiningbipartition distances among and within pseudoreplicate treesfor each gene (Leigh et al 2011) Intriguingly this method didnot find evidence for any well-supported local incongruencebetween genes in our data set (ie only one cluster containingall genes was selected) Although at first glance this result is atodds with the conflicts observed in our supernetwork analy-ses this seeming paradox could be reconciled if most or allgenes display the same patterns of topological conflict withintheir bootstrap replicates

To address gene-tree conflict and the distribution of miss-ing data in a more quantitative manner we counted thepotentially decisive genes and the frequency of genes withinthis set that are congruent with both the nodes in our con-catenated topology and with several alternative hypotheses(fig 6) Large numbers of genes (from 1115 to 1742 fig 6) areavailable to test interordinal relationships within ChilopodaThe gene support frequencies within these potentially deci-sive sets also clearly favor the nodes within our concatenatedtopology over competing hypotheses although we note thatthe potentially decisive gene sets are not directly comparableacross nodes as they contain different populations of genesdepending on the particular taxonomic distribution of miss-ing data We also emphasize that although the gene supportfrequencies appear relatively low (ie usually lt50) even fornodes in our concatenated topology these frequencies werederived from counts of fully bifurcating ML trees many ofwhich contain splits that are poorly supported due to limited-length alignments andor inappropriate rates of evolution forthe splits in question we hypothesize that these gene supportfrequencies would appear even more decisively in favor of the

concatenated topology if considering only well-supportedsplits

The LG4X + F and CAT + GTR models we employed areboth able to model site-heterogeneous patterns of molecularevolution to a degree and are therefore better able to detectmultiple substitutions an important issue for this data set asthe gene selection procedure we employed was agnostic tothe rate of molecular evolution (Lartillot and Philippe 2004Philippe et al 2011) Both however also assume stationarityof amino acid frequencies through time an assumption re-jected by a sensitive statistical test for the 1934 gene super-matrix To discern whether substitutional saturation andcompositional heterogeneity were influencing our phyloge-netic results we considered a subset of 389 genes that pass in-dividual tests of compositional heterogeneity and whichfurthermore fall in the least saturated quartile of this station-ary set A quartet supernetwork built from ML treesdrawn only within this subset of genes (fig 5) resemblesthe ingroup tree topology inferred from our concatenatedanalyses (fig 2) in supporting the monophyly ofLithobiomorpha + Epimorpha In the 389-gene quartetsupernetwork however we observe essentially the samegene-tree conflict as in the 1934 gene quartet supernetworkthat is disagreement on the relative position ofCraterostigmus with respect to Scutigeromorpha and theclade comprising the remaining pleurostigmophoran ordersThis conflict could be attributable in part to exacerbation ofgene-tree incongruence reported for slowly evolving parti-tions in analyses of genomic data sets due to the confluenteffect on reducing real internode length and the theoreticalexpectation of greater incongruence for short internodes(Salichos and Rokas 2013 but see Betancur-R et al 2014)

Geophilomorpha

Lithobiomorpha

Scolopendromorpha

250

1173

551

1684

213

327

Geophilomorpha

Lithobiomorpha

Scolopendromorpha

200

1115

570

1742

179

327

Craterostigmus tasmanianus

Strigamia maritima

Himantarium gabrielis

Alipes grandidieri

Cryptops hortensis

Scutigera coleoptrata

Archispirostreptus gigas

Daphnia pulex

Ixodes scapularis

Tetranychus urticae

Peripatopsis capensis

593503

848

670

876765

394

1603246

503

876

246

670

394

765

593

380

1254303

413

1365

303

712

1234

577

42

119

353

139

471295

214

1371156

Amalpighiata

Craterostigmomorpha714

1195598

376

1603235

Scutigeromorpha

Amalpighiata

229

1090211

918

1536598

229

1090211

598

Scutigeromorpha

Craterostigmomorpha

388

1207206

1412146

322

206

322

Epimorpha

Lithobiomorpha

Craterostigmomorpha

FIG 6 Counts of potentially decisive genes and (within this set) counts of actually congruent genes computed for each node in the concatenatedtopology (left) as well as for select alternative hypotheses of centipede interrelationships (right) Potentially decisive genes printed below nodecongruent genes in this set printed above node and gene support percentage printed to side

1506

Fernandez et al doi101093molbevmsu108 MBE at H

arvard Library on June 12 2014

httpmbeoxfordjournalsorg

Dow

nloaded from

Finally our amino acid level transcriptome-based phylog-eny is highly similar to previous hypotheses on centipedephylogenetics using much smaller sets of genes (Edgecombeet al 1999 Giribet et al 1999 Edgecombe and Giribet 2004Mallatt and Giribet 2006 Murienne et al 2010) and findsstrong support for such accepted groups as Pleurostigmo-phora Epimorpha Scolopendromorpha and Geophilomorpha thus supporting the idea that our morphologicallysurprising result for Lithobiomorpha + Epimorpha is notartifactual

Evolutionary Implications

Two salient evolutionary implications of this study are 1) thebasal position of Craterostigmomorpha with respect toLithobiomorpha a result anticipated in prior phylogeneticanalyses based on molecular data (Mallatt and Giribet 2006Regier et al 2010) and 2) the superior reconciliation ofthe new topology with the fossil record of centipedes(Edgecombe 2011b) thus improving upon prior dating studies(Murienne et al 2010) The former has consequences in ourinterpretation of an important number of morphologicaltraits thought to be shared by the clade Phylactometria(see Implications for Morphological and BehavioralCharacters section) and requires a taxonomic proposal for aclade including Lithobiomorpha Scolopendromorpha andGeophilomorpha We name this group Amalpighiata onthe basis of the three constituent orders lacking the supernu-merary Malpighian tubules found in Scutigeromorpha andCraterostigmomorpha Amalpighiata appears well supportedand shows no strong conflict among the analyzed genesAmong-gene conflict and also relatively lower nodalsupport in the concatenated analysis exists for the relativepositions of Craterostigmus and Scutigera although the mono-phyly of Pleurostigmophora (Amalpighiata + Craterostigmo-morpha) receives support in several analyses

The second implication of the Amalpighiata hypothesisaccounts for one of the inconsistencies in the centipedefossil record under the Phylactometria hypothesis Stem-group Lithobiomorpha are predicted to have originated byat least the Middle Devonian (under either of the two com-peting hypotheses for the relationships of the DevonianDevonobius) (eg Murienne et al 2010) yet no Paleozoic oreven Mesozoic fossils are known for Lithobiomorpha Theirfossil record does not extend back further than the Eocene(when they are represented by a few species of Lithobiidae inBaltic amber) This is no doubt a gross underestimate of theirreal age even the alternative Amalpighiata tree dates the splitbetween Lithobiomorpha and Epimorpha to at least theUpper Carboniferous (constrained by the earliest knownEpimorpha Scolopendromorpha) (fig 3) Still the Siluro-Devonian fossil record of Chilopoda consists only ofScutigeromorpha and the extinct Devonobiomorpha whichare allied to either Craterostigmomorpha (Borucki 1996) orEpimorpha (Shear and Bonamo 1988 Murienne et al 2010)The former hypothesis is consistent with an earlier divergenceof Craterostigmomorpha than Lithobiomorpha and could

thus account for the lack of mid-Paleozoic fossillithobiomorphs

That said the scattered records of Chilopoda in Mesozoicshales lithographic limestones and ambers which includeexemplars of Scutigeromorpha Scolopendromorpha andGeophilomorpha but not Lithobiomorpha (reviewed byEdgecombe 2011b) suggests that lithobiomorphs are under-represented in the fossil record as a whole and their apparentabsence in the Paleozoic is likely a taphonomic artifact TheAmalpighiata tree dates the origins of centipede orders withbetter fit to their first known fossil occurrences and diver-gence dates are concordant with the fossil record perhapsindicating that the Paleozoic fossil record of centipedes is notas incomplete as previously thought An additional interestingpoint of our data set is that the diversification of myriapods isSilurian-Devonian for the analysis under the autocorrelatedlog normal model and only our uncorrelated gamma modelsupports a Cambrian terrestrialization event as suggestedrecently for this lineage (Rota-Stabelli et al 2013)Additional transcriptomes should allow for a better estima-tion of the diversification dates of each order but should notaffect their origin Should additional taxon sampling continueto yield substantial among-gene incongruence and a succes-sion of interordinal divergences closely spaced together intime consistent with the scenario of an rapid radiation wenote that biological (as opposed to methodologicalartifac-tual) explanations for the incongruence such as incompletelineage sorting may need to be considered

Implications for Morphological andBehavioral Characters

A substantial list of characters has been tabled in defense of asister-group relationship between Craterostigmus andEpimorpha They include the following putative synapomor-phies (Edgecombe 2011a)

1) Brooding involving the mother guarding the egg clusterby wrapping her body around it

2) Extraordinarily high cell numbers in the lateral ocelli(eg more than 1000 retinula cells)

3) Proximal retinula cells partly with monodirectionalrhabdomeres

4) Maxillary nephridia lacking in postembryonic stadia5) Rigidity of the forcipules a character complex including

the forcipular pleurite arching over the coxosternite thehinge of the coxosternite being sclerotized and inflexi-ble the coxosternite deeply embedded into the cuticleabove the first pedigerous trunk segment and a trun-cation of sternal muscles in the forcipular segmentrather than extending into the head

6) Presternites distinct7) Separate sternal and lateral longitudinal muscles8) Lateral testicular vesicles linked by a central posteriorly

extended deferens duct9) Coxal organs confined to the last leg pair

10) Internal valves formed by lips of the ostia projectingdeeply into the heart lumen

1507

Evaluation of Topological Conflict in Centipede Phylogeny doi101093molbevmsu108 MBE at H

arvard Library on June 12 2014

httpmbeoxfordjournalsorg

Dow

nloaded from

Under the alternative Amalpighiata hypothesis in whichLithobiomorpha rather than Craterostigmus is sister group ofEpimorpha these characters are implied to be homoplasticThey are either general characters of Pleurostigmophora as awhole that have been secondarily modified (lost subject toreversal or otherwise transformed) in Lithobiomorpha orthey have been convergently acquired by Craterostigmusand Epimorpha

Craterostigmus has a single anamorphic stage in its lifecycle it hatches from the egg as a 12-legged instar and ac-quires the adult number of 15 leg-pairsmdashthe plesiomorphicnumber of legs in centipedesmdashin the succeeding instar Thiscontrasts with numerous anamorphic stages inScutigeromorpha and Lithobiomorpha which hatch with 4or 6ndash8 leg pairs respectively and keep adding segments untilreaching the definitive 15 leg pairs The Phylactometria hy-pothesis has generally viewed the ldquoreduced hemi-anamorphosisrdquo (Borucki 1996) of Craterostigmus as atransitional stage in the evolution of complete epimorphosisin Epimorpha The alternative hypothesis in whichCraterostigmus is sister group of all otherPleurostigmophora rejects this interpretation in favor oftwo independent reductions of anamorphic instars (or sec-ondary reacquisition of them in Lithobiomorpha) Aproposposterior segmentation in arthropods has been shown to beevolutionarily labile as demonstrated by the extreme case ofposterior segment reduction during mite embryogenesis withconcomitant loss of certain gene expression domains orentire genes altogether in the genome of the miteTetranychus urticae (Grbic et al 2011)

At present few characters have been identified that couldbe reinterpreted as shared derived characters ofLithobiomorpha and Epimorpha One noteworthy exampleis the absence of supernumerary Malpighian tubules InChilopoda these occur only in Scutigeromorpha and inCraterostigmus and their presence in these two ordersalone has been regarded as a plesiomorphic feature(Prunescu and Prunescu 1996 Prunescu and Prunescu2006) Their inferred loss in Lithobiomorpha and Epimorphawould imply a single step under the favored topology in thisstudy and gives the name to the clade Amalpighiata

Other potential apomorphies of Lithobiomorpha andEpimorpha involve characters that are shared byLithobiomorpha and Scolopendromorpha alone and arethus required to be lost or otherwise modified inGeophilomorpha

For example Craterostigmus shares three multicuspedteeth in the mandible with Scutigeromorpha in contrast tofour and five such teeth on opposing mandibles inLithobiomorpha and Scolopendromorpha Geophilomorphaprimitively lack teeth (Koch and Edgecombe 2012) so if atransformation from three teeth (common ancestor ofChilopoda) to fourfive teeth (common ancestorof Lithobiomorpha + Epimorpha) were envisioned the lossof mandibular teeth in Geophilomorpha is a necessary part ofthe transformation series A more homoplastic character thatcould be postulated in support of Amalpighiata as a clade isthe presence of marginated tergites These are lacking in

Craterostigmus are ubiquitous in Lithobiomorpha and arevariably present in Scolopendromorpha (present inScolopocryptopinae and Scolopendridae) Their absence inGeophilomorpha and several blind scolopendromorphclades (Newportiinae Cryptopidae and Plutoniumidae)however means that their status as a character forAmalpighiata would demand a few instances of reversalloss As such though it must be conceded that this schemeintroduces considerable homoplasy from the perspective ofmorphology a few characters can be suggested as potentialsynapomorphies for Amalpighiata

Materials and Methods

Sample Collection

We sampled six specimens representing the main groups ofcentipedes (MCZ voucher numbers in brackets) Scutigeracoleoptrata (IZ-20415) (Scutigeromorpha) Lithobius forficatus(IZ-131534) (Lithobiomorpha) Craterostigmus tasmanianus(IZ-128299) (Craterostigmomorpha) Alipes grandidieri (IZ-130616) Cryptops hortensis (IZ-130583) (Scolopen-dromorpha) and Himantarium gabrielis (IZ-131564) (Geophi-lomorpha) Information about the sampling localitiescollection details can be found in the MCZ online collectionsdatabase (httpmczbasemczharvardedu last accessedApril 1 2014) The following taxa were sampled as outgroupsPeripatopsis capensis (Onychophora a transcriptome) andIxodes scapularis and T urticae (Arthropoda Arachnida andAcari complete genomes) Samples were preserved in RNA-later (Ambion) or flash frozen in liquid nitrogen immediatelyafter collection A portion of the central part of each animal(including the body and legs) or the anterior part (in S coleop-trata including the head and a part of the trunk) was dissectedand stored at80 C until RNA extraction Three more taxaretrieved from external sources were also included Thegenome of Strigamia maritima (Geophilomorpha) was down-loaded from httpmetazoaensemblorgStrigamia_mari-timaInfoIndex (last accessed April 1 2014) The genome ofDaphnia pulex was retrieved from httpmetazoaensemblorgDaphnia_pulexInfoIndex (last accessed April 1 2014)The transcriptome of A gigas generated by 454 pyrosequen-cing was retrieved from Meusemann et al (2010)

mRNA Extraction

Total RNA was extracted with a standard trizol-basedmethod using TRIzol (Life Sciences) following the manufac-turerrsquos protocol Tissue fragments were disrupted with a drillin 500ml of TRIzol using an RNAse-free plastic pestle forgrinding TRIzol was added up to a total volume of 1 mlAfter 5 min incubating at room temperature (RT) 100 ml ofbromochloropropane was mixed by vortexing After incuba-tion at RT for 10 min the samples were centrifuged at16000 rpm for 15 min at 4 C Afterward the upper aqueouslayer was recovered mixed with 500 ml of isopropanol andincubated at 20 C overnight Total RNA precipitation wasmade as follows samples were centrifuged for 15 min at16000 rpm at 4 C the pellet was washed twice with 1 mlof 75 isopropanol and centrifuged at 16000 rpm at 4 C for

1508

Fernandez et al doi101093molbevmsu108 MBE at H

arvard Library on June 12 2014

httpmbeoxfordjournalsorg

Dow

nloaded from

15 and 5 min in the first and second washing step respec-tively air dried and eluted in 100ml of RNA Storage solution(Ambion) mRNA purification was done with the DynabeadsmRNA Purification Kit (Invitrogen) following manufacturerrsquosinstructions After incubation of total RNA at 65 C for 5 minthe samples were incubated for 30 min with 200ml of mag-netic beads in a rocker and washed twice with washing bufferThis step was repeated again with an incubation time of5 min mRNA was eluted in 15ml of Tris-HCl buffer Qualityof mRNA was measured with a pico RNA assay in an Agilent2100 Bioanalyzer (Agilent Technologies) Quantity was mea-sured with a RNA assay in Qubit fluorometer (LifeTechnologies)

cDNA Library Construction and Next-GenerationSequencing

cDNA libraries were constructed with the TruSeq for RNASample Preparation kit (Illumina) following the manufac-turerrsquos instructions Libraries for C tasmanianus and S coleop-trata were constructed in the Apollo 324 automated systemusing the PrepX mRNA kit (IntegenX) The samples weremarked with a different index to allow pooling for sequencingConcentration of the cDNA libraries was measured through adsDNA High Sensitivity (HS) assay in a Qubit fluorometer(Invitrogen) Library quality and size selection were checkedin an Agilent 2100 Bioanalyzer (Agilent Technologies) withthe HS DNA assay The samples were run using the IlluminaHiSeq 2500 platform with paired-end reads of 150 bp at theFAS Center for Systems Biology at Harvard University

Raw Data Sanitation and Sequence Assembly

Demultiplexed Illumina HiSeq 2500 sequencing results inFASTQ format were retrieved from the sequencing facilityvia FTP The files were decompressed and imported into CLCGenomics Workbench 551 retaining read header informa-tion as paired read files Each sample was quality filteredaccording to a threshold average quality score of 30 basedon a Phred scale and adaptor trimmed using a list of knownadaptors provided by Illumina All reads shorter than 25 bpwere discarded and the resulting files were exported as asingle FASTQ file

To simplify the assembly process ribosomal RNA (rRNA)was filtered out All known metazoan rRNA sequences weredownloaded from GenBank and formatted into bowtie indexusing ldquobowtie-buildrdquo Each sample was sequentially aligned to

the index allowing up to two mismatches via Bowtie 100(Langmead et al 2009) retaining all unaligned reads in FASTQformat Unaligned results stored as a single file were im-ported into Geneious 616 (Kearse et al 2012) paired usingread header information and exported as separate files for leftand right mate pairs This process was repeated for eachsample

De novo assemblies (strand specific in C tasmanianus andS coleoptrata) were done individually per organism in Trinity(Grabherr et al 2011 Haas et al 2013) using paired read filesand default parameters Raw reads and assembled sequenceshave been deposited in the National Center for BiotechnologyInformation Sequence Read Archive and TranscriptomeShotgun Assembly databases respectively (table 2)

Identification of Candidate Coding Regions

Redundancy reduction was done with CD-HIT-EST (Fu et al2012) in the raw assemblies (95 global similarity) Resultingassemblies were processed in TransDecoder (Haas et al 2013)to identify candidate ORFs within the transcripts Predictedpeptides were then processed with a further filter to selectonly one peptide per putative unigene by choosing the lon-gest ORF per Trinity subcomponent with a Python scriptthus removing the variation in the coding regions of our as-semblies due to by alternative splicing closely related para-logs and allelic diversity Peptide sequences with all finalcandidate ORFs were retained as multifasta files Sequencedata for Strigamia maritima and D pulex were obtained fromgenome assemblies as opposed to the de novo assembledtranscriptomes Annotated predicted peptide sequences forD pulex were downloaded from ldquowFleaBaserdquo and sequenceswere clustered using CD-HIT at a similarity of 98 for redun-dancy reduction Peptide sequences for Strigamia maritimawere downloaded from Ensembl and also clustered at 98similarity Results for both D pulex and Strigamia maritimawere converted to a single line multifasta file

Orthology Assignment

We assigned predicted ORFs into orthologous groups acrossall samples using OMA stand-alone v099t (Altenhoff et al2011 Altenhoff et al 2013) The advantages of the algorithmof OMA over standard bidirectional best-hit approaches relyon the use of evolutionary distances instead of scores con-sideration of distance inference uncertainty and differentialgene losses and inclusion of many-to-many orthologous re-lations The ortholog matrix is constructed from all-against-all

Table 2 Accession Numbers for the Raw Transcriptomes and Assemblies from the Taxa Newly Sequenced in This Study

BioProject BioSample Experiment Run

Scutigera coleoptrata PRJNA237135 SAMN02614634 SRX462011 SRR1158078

Craterostigmus tasmanianus PRJNA237134 SAMN02614633 SRX461877 SRR1157986

Lithobius forficatus PRJNA237133 SAMN02614632 SRX462145 SRR1159752

Himantarium gabrielis PRJNA237131 SAMN02614631 SRX461787 SRR1159787

Alipes grandidieri PRJNA179374 SAMN01816532 SRX205685 SRR619311

Cryptops hortensis PRJNA237130 SAMN02614630 SRX457664 SRR1153457

Peripatopsis capensis PRJNA236598 SAMN02598680 SRX451023 SRR1145776

1509

Evaluation of Topological Conflict in Centipede Phylogeny doi101093molbevmsu108 MBE at H

arvard Library on June 12 2014

httpmbeoxfordjournalsorg

Dow

nloaded from

Smith-Waterman protein alignments The program identifiesthe so called ldquostable pairsrdquo verifies them and checks againstpotential paralogous genes In a last step cliques of stablepairs are clustered as groups of orthologs All input fileswere single-line multifasta files and the parametersdrw filespecified retained all default settings with the exception ofldquoNPrdquo which was set at 300 We parallelized all-by-all localalignments across 256 cores of a single compute node imple-menting a custom Bash script allowing for execution of inde-pendent threads with at least 3 s between each instance ofOMA to minimize risk of collisions

Phylogenomic Analyses and Congruence Assessment

We constructed three different amino acid supermatricesFirst a large matrix was constructed by concatenating theset of OMA groups containing six or more taxa To increasegene occupancy and to reduce the percentage of missingdata a second matrix was created by selecting the orthologscontained in 11 or more taxa Because of the high percentageof missing data in A gigas this taxon was eliminated from thissupermatrix This supermatrix reconstruction approach wasselected over other available software (eg MARE) because itguarantees a minimum taxon occupancy per orthogrouptherefore resulting in a supermatrix with a good compromisebetween gene and taxon occupancy and missing data

The selected orthogroups (1934 and 61 for both matrixesrespectively) were aligned individually using MUSCLE version36 (Edgar 2004) To increase the signal-to-noise ratio andimprove the discriminatory power of phylogenetic methodswe applied a probabilistic character masking with ZORRO(Wu et al 2012) to account for alignment uncertainty runusing default parameters (and using FastTree 214 [Price et al2010] to produce guide trees) This program first calculatesthe probability of all alignments that pass through a specifiedmatched pair of residues using a pair hidden Markov modelframework and it then compares this value with the fullprobability of all alignments of the pair of sequences Theposterior probability value indicates how reliable the matchis (ie highly reliable if it is close to 1 ambiguous if it is close to0) It then sums up the probability that a particular columnwould appear over the alignment landscape and assigns aconfidence score between 0 and 10 to each column provid-ing an objective measure that has an explicit evolutionarymodel and is mathematically rigorous (Wu et al 2012) Wediscarded the positions assigned a confidence score below athreshold of 5 with a custom Python script prior to concat-enation (using Phyutility 26 Smith and Dunn 2008) and sub-sequent phylogenomic analyses

To discern if substitutional saturation and among-taxoncompositional heterogeneity were affecting our phylogeneticresults we considered a third supermatrix Using the p4Python library (Foster 2004) we performed a test of compo-sitional homogeneity using w2 statistics simulating a null dis-tribution using the TreecompoTestUsingSimulations()method (nSims = 100) We used the best-scoring ML treefrom our RAxML analyses as the phylogram on which to sim-ulate data using optimized parameters of a homogeneous

unpartitioned LG + I + 4 model We also conducted com-positional homogeneity tests on individual ortholog align-ments modeling a homogeneous null distribution on theML tree for each gene using the best-fitting model availablein p4 for that gene We characterized genewise substitutionalsaturation by regressing uncorrected amino acid distances onthe patristic distance implied by each ML tree in a p4 scriptand taking the slope of this linear regression as a measure ofsubstitutional saturation (Jeffroy et al 2006) We calculatedthe third quartile of the slope from among the set of com-positionally homogeneous genes with a slope between 0 and1 and then selected the set of ML trees from genes with aslope above this value (0465) for inclusion in a quartet super-network as described earlier The 389 genes that passed in-dividual tests of compositional heterogeneity and whichfurthermore fell in the least saturated quartile of this station-ary set were concatenated into a third supermatrix

ML inference was conducted with PhyML-PCMA (Zollerand Schneider 2013) and RAxML 775 (Berger et al 2011)Although RAxML uses a predetermined empirical model ofamino acid substitution PhyML-PCMA estimates a modelthrough the use of a principal component (PC) analysisThe obtained PCs describe the substitution rates that covarythe most among different protein families and thereforedefine a semiempirically determined parameterization foran amino acid substitution model specific to each data set(Zoller and Schneider 2013) We selected 20 PCs in thePhyML-PCMA analyses and empirical amino acid frequenciesPROTGAMMALG4XF was selected as the best model ofamino acid substitution for the unpartitioned RAxML analy-ses using a modification of the ProteinModelSelection perlwrapper to RAxML produced by A Stamatakis (httpscoh-itsorgexelixissoftwarehtml last accessed April 1 2014)Bootstrap values were estimated with 1000 replicates underthe rapid bootstrapping algorithm

Bayesian analysis was conducted with PhyloBayes MPI 14e(Lartillot et al 2013) using the site-heterogeneous CAT-GTRmodel of evolution (Lartillot and Philippe 2004) Three inde-pendent Markov chain Monte Carlo (MCMC) chains wererun for 5818ndash7471 cycles The initial 2000 trees (27ndash34)sampled in each MCMC run prior to convergence (judgedwhen maximum bipartition discrepancies across chainslt 01) were discarded as the burn-in period A 50 major-ity-rule consensus tree was then computed from the remain-ing 1537 trees (sampled every 10 cycles) combined from thethree independent runs

To investigate potential incongruence between individualgene trees we also inferred gene trees for each OMA groupincluded in our supermatrix For each aligned ZORRO-masked OMA group we selected a best-fitting model usinga ProteinModelSelection script modified to permit testing ofthe recent LG4M( + F) and LG4X( + F) models (Le et al 2012)Best-scoring ML trees were inferred for each gene under theselected model (with the gamma model of rate variation butno invariant term) from 20 replicates of parsimony startingtrees One hundred traditional (nonrapid) bootstrap repli-cates for each gene were also inferred A majority rule con-sensus for each gene was constructed using the SumTrees

1510

Fernandez et al doi101093molbevmsu108 MBE at H

arvard Library on June 12 2014

httpmbeoxfordjournalsorg

Dow

nloaded from

script from the DendroPy Python library (Sukumaran andHolder 2010) To visualize predominant intergene conflictsfor both ML and bootstrap MRC trees we employedSuperQ v11 (Grunewald et al 2013) selecting the ldquobalancedrdquoedge-weight optimization function and applying no filterSplitsTree v4131 (Huson and Bryant 2006) was then usedto visualize the resulting NEXUS files To investigate the dis-tribution of strongly supported conflict among our genetrees we employed Conclustador v04a (Leigh et al 2011)presently the only automated incongruence-detection algo-rithm readily applicable to data sets of this size We appliedthe spectral clustering algorithm on the bootstrap replicatesfor each OMA group limiting the maximum number of clus-ters to 15

To quantify the number of genes potentially decisive for agiven split and also the number within this set that actuallysupport the split in question (fig 6) we employed a customPython script to parse the set of individual gene trees If a treecontained at least one species in either descendant group fora given node plus at least two distinct species basal to thenode in question it was considered potentially decisive(forming minimally a quartet DellrsquoAmpio et al 2014) if thedescendants were monophyletic with respect to their relativeoutgroups that gene tree was considered congruent with thenode in question (although this count is agnostic to thetopology within the node in question) To quantify the rela-tive support for different hypotheses within our individualgene-tree analysis we displayed the above counts for allnodes in the 1934 gene ML topology (fig 6) and also forselect alternative scenarios of centipede interrelationshipsboth including historical hypotheses (see Introduction)and alternative topologies for Scutigeromorpha andCraterostigmomorpha suggested by our quartet supernet-works All Python custom scripts can be downloaded fromhttpsgithubcomclaumer (last accessed April 1 2014)

Fossil Calibrations

A few Paleozoic fossil occurrences constrain the timing ofdivergences between chilopod orders (for a review of themyriapod fossil record see Shear and Edgecombe 2010) TheSilurondashDevonian genus Crussolum (Shear et al 1998) is iden-tified as a stem-group scutigeromorph displaying derivedcharacters of total-group Scutigeromorpha but lackingsome apomorphies that are shared by all members of thescutigeromorph crown group (Edgecombe 2011b) The bestpreserved fossils of Crussolum are from the Early Devonian(Pragian) and Middle Devonian (Givetian) (Shear et al 1998Anderson and Trewin 2003) but the oldest confidently as-signed record is in the Late Silurian (early Pridolı) LudlowBone Bed in western England (at least 4192 My dating theend of the Pridolı) This dating constrains the split ofScutigeromorpha from Pleurostigmophora

The occurrence of Devonobius delta in the MiddleDevonian of New York at least 3827 My (date for the endof the Givetian) constrains the divergence betweenCraterostigmomorpha and the remaining pleurostigmophor-ans This minimum applies irrespective of whether

Devonobius is sister group of Craterostigmomorpha or ofEpimorpha which were equally parsimonious in previousanalyses (Edgecombe and Giribet 2004) The basal divergencein Epimorpha that is the split between Scolopendromorphaand Geophilomorpha is constrained by the oldest scolopen-dromorph Mazoscolopendra richardsoni in UpperCarboniferous (Westphalian D) deposits of Mazon Creek IL(at least 306 Ma) Available character data forMazoscolopendra are insufficient to establish whether it is astem- or crown-group scolopendromorph and it only pro-vides a minimum age for total-group Scolopendromorpha

Mid-Silurian body fossils of Diplopoda provide constraintson the divergence of Chilopoda and Diplopoda The diver-gence between these two groups is dated by the occurrenceof Archipolypoda (stem-group helminthomorphs) in the lateWenlock or early Ludlow (Wilson and Anderson 2004) Basedon these data (ages based on spores) the split betweenChilopoda and Diplopoda is at least as old as the end ofthe Gorstian Stage (early Ludlow) 4256 My

The split between Onychophora and Arthropoda wasdated between 528 My (the minimum age for Arthropodaused by Lee et al [2013] on the basis of the earliestRusophycus traces) and 558 My used as the root ofPanarthropoda (Lee et al 2013)

Divergence Time Estimates

Divergence dates were estimated using the Bayesian relaxedmolecular clock approach as implemented in PhyloBayesv33f (Lartillot et al 2013) To compare the performance ofthe autocorrelated log normal and uncorrelated gammamodels we estimated the divergence dates under bothmodels in the 389-gene data set A series of calibrationconstraints specified above in ldquoFossil calibrationsrdquo (see alsoMurienne et al 2010 Edgecombe 2011b) were used with softbounds (Yang and Rannala 2006) under a birthndashdeath priorin PhyloBayes this strategy having been found to provide thebest compromise for dating estimates (Inoue et al 2010) Twoindependent MCMC chains were run for 5000ndash6000 cyclessampling posterior rates and dates every 10 cycles The initial20 of the cycles were discarded as burn-in Posteriorestimates of divergence dates were then computed fromapproximately the last 4000 samples of each chain

Supplementary MaterialSupplementary figure S1 and tables S1 and S2 are available atMolecular Biology and Evolution online (httpwwwmbeoxfordjournalsorg)

Acknowledgments

Mark Harvey kindly provided the sample of Craterostigmustasmanianus and Savel Daniels the sample of Peripatopsiscapensis The Research Computing group at the HarvardFaculty of Arts and Sciences was instrumental in facilitatingthe computational resources required Sarah Bastkowski iskindly thanked for her advice on the usage of SuperQ v11This work was supported by internal funds from the Museumof Comparative Zoology and by a postdoctoral fellowship

1511

Evaluation of Topological Conflict in Centipede Phylogeny doi101093molbevmsu108 MBE at H

arvard Library on June 12 2014

httpmbeoxfordjournalsorg

Dow

nloaded from

from the Fundacion Ramon Areces to RF Five anonymousreviewers and the editor provided comments that helped toimprove this article

ReferencesAltenhoff AM Gil M Gonnet GH Dessimoz C 2013 Inferring hierar-

chical orthologous groups from orthologous gene pairs PLoS One 8e53786

Altenhoff AM Schneider A Gonnet GH Dessimoz C 2011 OMA 2011orthology inference among 1000 complete genomes Nucleic AcidsRes 39D289ndashD294

Anderson LI Trewin NH 2003 An Early Devonian arthropod fauna fromthe Windyfield Cherts Aberdeenshire Scotland Palaeontology 46457ndash509

Ax P 2000 Multicellular animals Volume II The phylogenetic system ofthe Metazoa Berlin (Germany) Springer

Berger SA Krompass D Stamatakis A 2011 Performance accuracy andWeb server for evolutionary placement of short sequence readsunder maximum likelihood Syst Biol 60291ndash302

Betancur-R R Naylor G Ortı G 2014 Conserved genes sampling errorand phylogenomic inference Syst Biol 63257ndash262

Bleidorn C Podsiadlowski L Zhong M Eeckhaut I Hartmann SHalanych KM Tiedemann R 2009 On the phylogenetic positionof Myzostomida can 77 genes get it wrong BMC Evol Biol 9150

Borucki H 1996 Evolution und phylogenetisches system der Chilopoda(Mandibulata Tracheata) Vehr naturwiss Ver Hamburg 3595ndash226

DellrsquoAmpio E Meusemann K Szucsich NU Peters RS Meyer B Borner JPetersen M Aberer AJ Stamatakis A Walzl MG et al 2014 Decisivedata sets in phylogenomics lessons from studies on the phyloge-netic relationships of primarily wingless insects Mol Biol Evol 31239ndash249

Dohle W 1985 Phylogenetic pathways in the Chilopoda BijdrDierkunde 5555ndash66

Dunn CW Hejnol A Matus DQ Pang K Browne WE Smith SA SeaverE Rouse GW Obst M Edgecombe GD et al 2008 Broad phyloge-nomic sampling improves resolution of the animal tree of lifeNature 452745ndash749

Edgar RC 2004 MUSCLE multiple sequence alignment with high accu-racy and high throughput Nucleic Acids Res 321792ndash1797

Edgecombe GD 2011a Chilopodamdashphylogeny In Minelli A editorTreatise on zoologymdashanatomy taxonomy biology TheMyriapoda Vol 1 Leiden and Boston Brill p 339ndash354

Edgecombe GD 2011b Chilopodamdashthe fossil history In Minelli Aeditor Treatise on zoologymdashanatomy taxonomy biology TheMyriapoda Vol 1 Leiden and Boston Brill p 355ndash361

Edgecombe GD Giribet G 2004 Adding mitochondrial sequence data(16S rRNA and cytochrome c oxidase subunit I) to the phylogeny ofcentipedes (Myriapoda Chilopoda) an analysis of morphology andfour molecular loci J Zool Syst Evol Res 4289ndash134

Edgecombe GD Giribet G 2007 Evolutionary biology of centipedes(Myriapoda Chilopoda) Annu Rev Entomol 52151ndash170

Edgecombe GD Giribet G 2008 A New Zealand species of thetrans-Tasman centipede order Craterostigmomorpha(ArthropodaChilopoda) corroborated by molecular evidenceInvert Syst 221ndash15

Edgecombe GD Giribet G Wheeler WC 1999 Phylogeny of Chilopodacombining 18S and 28S rRNA sequences and morphology BoletınSociedad Entomol Aragonesa 26293ndash331

Edwards SV 2009 Is a new and general theory of molecular systematicsemerging Evolution 631ndash19

Foster P 2004 Modeling compositional heterogeneity Syst Biol 53485ndash495

Fu LM Niu BF Zhu ZW Wu ST Li WZ 2012 CD-HIT accelerated forclustering the next-generation sequencing data Bioinformatics 283150ndash3152

Giribet G Carranza S Riutort M Baguna J Ribera C 1999 Internalphylogeny of the Chilopoda (Myriapoda Arthropoda) using

complete 18S rDNA and partial 28S rDNA sequences Philos TransR Soc Lond B Biol Sci 354215ndash222

Giribet G Edgecombe GD 2006 Conflict between data sets andphylogeny of centipedes an analysis based on seven genes andmorphology Proc R Soc B Biol Sci 273531ndash538

Grabherr MG Haas BJ Yassour M Levin JZ Thompson DA Amit IAdiconis X Fan L Raychowdhury R Zeng Q et al 2011 Full-length transcriptome assembly from RNA-Seq data without a ref-erence genome Nat Biotechnol 29644ndash652

Grbic M Van Leeuwen T Clark RM Rombauts S Rouze P Grbic VOsborne EJ Dermauw W Ngoc PC Ortego F et al 2011 Thegenome of Tetranychus urticae reveals herbivorous pest adaptationsNature 479487ndash492

Grunewald S Spillner A Bastkowski S Bogershausen A Moulton V2013 SuperQ computing supernetworks from quartets IEEEACMTrans Comput Biol Bioinform 10151ndash160

Haas BJ Papanicolaou A Yassour M Grabherr M Blood PD Bowden JCouger MB Eccles D Li B Lieber M et al 2013 De novo tran-script sequence reconstruction from RNA-seq using the Trinityplatform for reference generation and analysis Nat Protoc 81494ndash1512

Hartmann S Helm C Nickel B Meyer M Struck TH Tiedemann RSelbig J Bleidorn C 2012 Exploiting gene families for phylogenomicanalysis of myzostomid transcriptome data PLoS One 7e29843

Hejnol A Obst M Stamatakis A Ott M Rouse GW Edgecombe GDMartinez P Baguna J Bailly X Jondelius U et al 2009 Assessing theroot of bilaterian animals with scalable phylogenomic methods ProcR Soc B Biol Sci 2764261ndash4270

Hilken G 1997 Tracheal systems in Chilopoda a comparison underphylogenetic aspects Entomol Scand Suppl 5149ndash60

Huson DH Bryant D 2006 Application of phylogenetic networks inevolutionary studies Mol Biol Evol 23254ndash267

Inoue J Donoghue PCJ Yang ZH 2010 The impact of the representationof fossil calibrations on Bayesian estimation of species divergencetimes Syst Biol 5974ndash89

Jeffroy O Brinkmann H Delsuc F Philippe H 2006 Phylogenomics thebeginning of incongruence Trends Genet 22225ndash231

Kearse M Moir R Wilson A Stones-Havas S Cheung M Sturrock SBuxton S Cooper A Markowitz S Duran C et al 2012 GeneiousBasic an integrated and extendable desktop software platform forthe organization and analysis of sequence data Bioinformatics 281647ndash1649

Kennedy D 2005 125 [Editorial] Science 30919Koch M Edgecombe GD 2012 The preoral chamber in geophilomorph

centipedes comparative morphology phylogeny and the evolutionof centipede feeding structures Zool J Linn Soc 1651ndash62

Kocot KM Cannon JT Todt C Citarella MR Kohn AB Meyer A SantosSR Schander C Moroz LL Lieb B et al 2011 Phylogenomics revealsdeep molluscan relationships Nature 477452ndash456

Kubatko LS Degnan JH 2007 Inconsistency of phylogenetic estimatesfrom concatenated data under coalescence Syst Biol 5617ndash24

Langmead B Trapnell C Pop M Salzberg SL 2009 Ultrafast andmemory-efficient alignment of short DNA sequences to thehuman genome Genome Biol 10R25

Lartillot N Philippe H 2004 A Bayesian mixture model for across-siteheterogeneities in the amino-acid replacement process Mol BiolEvol 211095ndash1109

Lartillot N Philippe H 2008 Improvement of molecular phylogeneticinference and the phylogeny of Bilateria Philos Trans R Soc Lond BBiol Sci 3631463ndash1472

Lartillot N Rodrigue N Stubbs D Richer J 2013 PhyloBayes MPI phy-logenetic reconstruction with infinite mixtures of profiles in a par-allel environment Syst Biol 62611ndash615

Le SQ Dang CC Gascuel O 2012 Modeling protein evolution withseveral amino acid replacement matrices depending on site ratesMol Biol Evol 292921ndash2936

Lee MSY Soubrier J Edgecombe GD 2013 Rates of phenotypicand genomic evolution during the Cambrian Explosion Curr Biol231ndash7

1512

Fernandez et al doi101093molbevmsu108 MBE at H

arvard Library on June 12 2014

httpmbeoxfordjournalsorg

Dow

nloaded from

Leigh JW Schliep K Lopez P Bapteste E 2011 Let them fall where theymay congruence analysis in massive phylogenetically messy datasets Mol Biol Evol 282773ndash2785

Lemmon AR Brown JM Stanger-Hall K Lemmon EM 2009The effect of ambiguous data on phylogenetic estimates obtainedby maximum likelihood and Bayesian inference Syst Biol 58130ndash145

Mallatt J Giribet G 2006 Further use of nearly complete 28S and 18SrRNA genes to classify Ecdysozoa 37 more arthropods and a kinor-hynch Mol Phylogenet Evol 40772ndash794

Meusemann K von Reumont BM Simon S Roeding F Strauss S Kuck PEbersberger I Walzl M Pass G Breuers S et al 2010 A phylogenomicapproach to resolve the arthropod tree of life Mol Biol Evol 272451ndash2464

Muller CHG Meyer-Rochow VB 2006 Fine structural organization ofthe lateral ocelli in two species of Scolopendra (ChilopodaPleurostigmophora) an evolutionary evaluation Zoomorphology12513ndash26

Murienne J Edgecombe GD Giribet G 2010 Including secondary struc-ture fossils and molecular dating in the centipede tree of life MolPhylogenet Evol 57301ndash313

Nosenko T Schreiber F Adamska M Adamski M Eitel M Hammel JMaldonado M Muller WE Nickel M Schierwater B et al 2013 Deepmetazoan phylogeny when different genes tell different stories MolPhylogenet Evol 67223ndash233

Philippe H Brinkmann H Lavrov DV Littlewood DTJ Manuel MWorheide G Baurain D 2011 Resolving difficult phylogeneticquestions why more sequences are not enough PLoS Biol 9e1000602

Philippe H Telford MJ 2006 Large-scale sequencing and the new animalphylogeny Trends Ecol Evol 21614ndash620

Pick KS Philippe H Schreiber F Erpenbeck D Jackson DJWrede P Wiens M Alie A Morgenstern B Manuel Met al 2010 Improved phylogenomic taxon sampling notice-ably affects nonbilaterian relationships Mol Biol Evol 271983ndash1987

Pocock RI 1902 A new and annectant type of chilopod Quart J MicroscSci 45417ndash448

Price MN Dehal PS Arkin AP 2010 FastTree 2-approximately maximum-likelihood trees for large alignmentsPLoS One 5e9490

Prunescu C-C Prunescu P 1996 Supernumerary Malpighian tubules inchilopods Acta Myriapodol 169437ndash440

Prunescu C-C Prunescu P 2006 Rudimentary supernumeraryMalpighian tubules in the order Craterostigmomorpha Pocock1902 Norwegian J Entomol 53113ndash118

Regier JC Shultz JW Zwick A Hussey A Ball B Wetzer R Martin JWCunningham CW 2010 Arthropod relationships revealed by phy-logenomic analysis of nuclear protein-coding sequences Nature 4631079ndash1083

Regier JC Wilson HM Shultz JW 2005 Phylogenetic analysis ofMyriapoda using three nuclear protein-coding genes MolPhylogenet Evol 34147ndash158

Rehm P Borner J Meusemann K von Reumont BM Simon S Hadrys HMisof B Burmester T 2011 Dating the arthropod tree based onlarge-scale transcriptome data Mol Phylogenet Evol 61880ndash887

Rota-Stabelli O Daley AC Pisani D 2013 Molecular timetrees reveal aCambrian colonization of land and a new scenario for ecdysozoanevolution Curr Biol 23392ndash398

Roure B Baurain D Philippe H 2013 Impact of missing data on phy-logenies inferred from empirical phylogenomic data sets Mol BiolEvol 30197ndash214

Salichos L Rokas A 2011 Evaluating ortholog prediction algorithms in ayeast model clade PLoS One 6e18755

Salichos L Rokas A 2013 Inferring ancient divergences requires geneswith strong phylogenetic signals Nature 497327ndash331

Salichos L Stamatakis A Rokas A 2014 Novel information theory-basedmeasures for quantifying incongruence among phylogenetic treesMol Biol Evol 31(5)1261ndash1271

Shear WA Bonamo PM 1988 Devonobiomorpha a new order of cen-tipeds (Chilopoda) from the middle Devonian of Gilboa New YorkState USA and the phylogeny of centiped orders Am MusNovitates 29271ndash30

Shear WA Edgecombe GD 2010 The geological record and phylogenyof Myriapoda Arthropod Struct Dev 39174ndash190

Shear WA Jeram AJ Selden PA 1998 Centiped legs (ArthropodaChilopoda Scutigeromorpha) from the Silurian and Devonian ofBritain and the Devonian of North America Am Mus Novitates32311ndash16

Shultz JW Regier JC 1997 Progress toward a molecular phylogeny of thecentipede orders (Chilopoda) Entomol Scand Suppl 5125ndash32

Smith S Wilson NG Goetz F Feehery C Andrade SCS Rouse GWGiribet G Dunn CW 2011 Resolving the evolutionary relationshipsof molluscs with phylogenomic tools Nature 480364ndash367

Smith SA Dunn CW 2008 Phyutility a phyloinformatics tool for treesalignments and molecular data Bioinformatics 24715ndash716

Struck TH Paul C Hill N Hartmann S Hosel C Kube M Lieb B Meyer ATiedemann R Purschke G et al 2011 Phylogenomic analyses un-ravel annelid evolution Nature 47195ndash98

Sukumaran J Holder MT 2010 DendroPy a Python library for phylo-genetic computing Bioinformatics 261569ndash1571

Verhoeff KW 1902ndash1925 Gliederfussler Arthropoda Klasse ChilopodaIn Klassen und Ordnungen des Tierreichs 5 abt 2 Buch 1 Leipzig(Germany) Akademische Verlagsgesellschaft p 725

von Reumont BM Jenner RA Wills MA Dellrsquoampio E Pass GEbersberger I Meyer B Koenemann S Iliffe TM Stamatakis Aet al 2012 Pancrustacean phylogeny in the light of new phyloge-nomic data support for Remipedia as the possible sister group ofHexapoda Mol Biol Evol 291031ndash1045

Wilson HM Anderson LI 2004 Morphology and taxonomy of Paleozoicmillipedes (Diplopoda Chilognatha Archipolypoda) from ScotlandJ Paleontol 78169ndash184

Wirkner CS Pass G 2002 The circulatory system in Chilopodafunctional morphology and phylogenetic aspects Acta Zool 83193ndash202

Wu M Chatterji S Eisen JA 2012 Accounting for alignment uncertaintyin phylogenomics PLoS One 7e30288

Yang Z Rannala B 2006 Bayesian estimation of species divergence timesunder a molecular clock using multiple fossil calibrations with softbounds Mol Biol Evol 23212ndash226

Zoller S Schneider A 2013 Improving phylogenetic inference with asemiempirical amino acid substitution model Mol Biol Evol 30469ndash479

1513

Evaluation of Topological Conflict in Centipede Phylogeny doi101093molbevmsu108 MBE at H

arvard Library on June 12 2014

httpmbeoxfordjournalsorg

Dow

nloaded from

Page 2: Evaluating Topological Conflict in Centipede Phylogeny ...€¦ · Scutigera coleoptrata Craterostigmus tasmanianus Strigamia maritima Himantarium gabrielis Alipes grandidieri Cryptops

Pass 2002 Muller and Meyer-Rochow 2006) Numerical par-simony approaches supported this same cladogram usingeither a ground pattern coding for the five living orders(Shear and Bonamo 1988) or by coding a larger number ofexemplar species (Edgecombe et al 1999 Edgecombe andGiribet 2004 Giribet and Edgecombe 2006 Murienne et al2010) Other morphological hypotheses have also beenintroduced including a rerooted version of the standardtree (fig 1C) that infers a decrease rather than increase insegment numbers (Ax 2000)

The introduction of molecular data showed striking con-gruence with morphology with respect to most aspects ofcentipede phylogeny (Giribet et al 1999) although in somecases the molecular data alone posed conflict with respect tothe position of Craterostigmomorpha (eg Edgecombe et al1999 fig 2) Early analyses of nuclear protein-encoding genesshowed strong conflict with both ribosomal genes and mor-phology (Shultz and Regier 1997 Regier et al 2005 Giribetand Edgecombe 2006) The most recent molecular phyloge-nies using either a combination of nuclear ribosomal andmitochondrial loci (Murienne et al 2010) or larger compila-tions of nuclear protein-coding genes (Regier et al 2010)converge on trees that agree with morphology in mostrespects but present conflict on the Craterostigmus questionas was also found in an analysis based on nearly completenuclear ribosomal genes across arthropods and other ecdy-sozoans (Mallatt and Giribet 2006) (fig 1B) All these data

sources agree on the basal division of Chilopoda intoNotostigmophora and Pleurostigmophora but conflictremains over whether Lithobiomorpha (supported by mor-phology) or Craterostigmomorpha (supported by molecules)is sister group to the other members of PleurostigmophoraAs well the status of Epimorpha is unclear from existingmolecular data The group has been contradicted by aweakly supported alliance between Lithobiomorpha andScolopendromorpha (Murienne et al 2010) or has not beenfully tested because Geophilomorpha was unsampled (Regieret al 2010)

Here we apply a phylogenomic approach to resolve theseremaining controversies in centipede phylogeny We drawupon a much larger number of genes (1934 orthologs)than has been previously considered to re-evaluate the phy-logenetic position of Craterostigmus calibrate the centipedemolecular clock using modern methods for dating phyloge-nies and discuss the implications of the putative convergenceof morphological and behavioral characters such as maternalcare in centipedes The availability of genome and transcrip-tome data brings novel challenges to tree reconstruction (egcompositional heterogeneity and gene-tree incongruence)we demonstrate that though present in these data suchphenomena do not influence our major phylogeneticconclusions

Results

Transcriptome Assembly Isoform Filtering andOrthology Assignment

cDNA from the nine species used in this study was sequencedon the Illumina HiSeq 2500 platform (150 bp paired-endreads) A summary of the assembly statistics is shown intable 1 Following redundancy reduction open readingframe (ORF) prediction and selection of the longest ORFper putative unigene 9934ndash19621 peptide sequences pertaxon were retained Application of the OrthologousMAtrix (OMA) stand-alone algorithm (Altenhoff et al 20112013) grouped these peptides into a total of 30586 orthologsThree different supermatrices were constructed for the phy-logenetic analyses The first supermatrix was constructed byselecting only the orthologs present in six or more taxa Thenumber of orthologs per taxon in this supermatrix rangedfrom 131 to 1651 with 1934 orthologs in total The com-bined length of the aligned ortholog matrices (before prob-abilistic alignment masking of each individual ortholog withZORRO Wu et al 2012) was 770128 amino acid positionsConcatenation of individually ZORRO-masked orthologalignments yielded a supermatrix of 526145 amino acidsThe number of orthologs represented per taxon variedfrom 530 to 10070 (fig 2) As expected in all cases thelowest values corresponded to Archispirostreptus gigas asits transcriptome was pyrosequenced to relatively lowthroughput on the 454 Life Sciences platform (Meusemannet al 2010) However gene representation in the ingroupspecies of this study was relatively high varying from 1139to 1543 yielding less than 45 missing data To discern ifsubstitutional saturation and compositional heterogeneity

Scutigeromorpha

Lithobiomorpha

Craterostigmomorpha

Scolopendromorpha

Geophilomorpha

Scutigeromorpha

Lithobiomorpha

Craterostigmomorpha

Scolopendromorpha

Geophilomorpha

Scutigeromorpha

Lithobiomorpha

Craterostigmomorpha

Scolopendromorpha

Geophilomorpha

EP

IMO

RP

HA

PH

YLA

CTO

ME

TR

IA

PLE

UR

OS

TIG

MO

PH

OR

A

HE

TE

RO

TE

RG

A

TR

IAK

ON

TAP

OD

A

GO

NO

PO

DO

PH

OR

AE

PIM

OR

PH

A

AM

ALP

IGH

IATA

PLE

UR

OS

TIG

MO

PH

OR

A

A

B

C

FIG 1 Phylogenetic hypotheses of centipede relationships (A) Themorphologically supported tree (eg Pocock 1902) (B) Amalpighiatahypothesis as supported in early molecular analyses (Mallatt and Giribet2006) (C) Axrsquos (2000) morphological hypothesis with tree in (A)rerooted

1501

Evaluation of Topological Conflict in Centipede Phylogeny doi101093molbevmsu108 MBE at H

arvard Library on June 12 2014

httpmbeoxfordjournalsorg

Dow

nloaded from

0606

Peripatopsis capensis

Tetranychus urticae

Ixodes scapularis

Daphnia pulex

Scutigera coleoptrata

Craterostigmus tasmanianus

Strigamia maritima

Himantarium gabrielis

Alipes grandidieri

Cryptops hortensis

10979688

09509289100

1---

1934 orthogroups

131 orthogroups

1001 orthogroups

1543 orthogroups

1199100

EP

IMO

RP

HA

AM

ALP

IGH

IATA

PLE

UR

OS

TIG

MO

PH

OR

A

CH

ILOP

OD

AArchispirostreptus gigas

FIG 2 Bayesian phylogenomic hypothesis of centipede interrelationships Nodal support values (posterior probabilityShimodaira-Hasegawa-likesupportbootstrapbootstrap) of the four different analyses (Bayesian inferenceML with PhyML-PCMAML with RAxMLML with RAxML for thereduced data set) are indicated in each case Ln L PhyML-PCMA 4768683119 Ln L RAxML 4762759821600 Ln L RAxML 799809006206Asterisks indicate support values of 11100100 The size of the circles is proportional to the number of orthologs in each species for the large data setInset indicates the 389-gene-tree topology (reduced data set) with myriapod monophyly

Table 1 Assembly Parameters

N RawReads

N ReadsAT

Discarded

Reads

NRMC NContigs

NBases(Mb)

AverageLength of

Contigs

MaximumContig

Length (bp)

N50 Total NOrthologs

N SelectedOrthologs

MissingData ()

Scutigeracoleoptrata

26072840 25870392 07 23546281 178854 952 5322 16337 684 9932 1472 245

Craterostigmustasmanianus

76698082 31313104 591 30257748 99546 646 6494 18029 964 7686 1430 26

Lithobius forficatus 77628252 75849229 22 72202535 74544 344 4619 11715 530 5495 1267 414

Strigamia maritima mdash mdash mdash mdash 24079 1762 119555 1344237 24745 7583 1543 174

Himantariumgabrielis

54681444 53535366 20 51089766 37200 218 5869 17609 782 5605 1139 454

Alipes grandidieri 32332034 30561861 54 28272411 138229 703 5089 9801 631 9441 1299 359

Cryptops hortensis 34611798 21731914 372 20594894 75753 455 6012 10232 803 8821 1274 391

Peripatopsiscapensis

40774042 40774042 0 mdash 92516 356 3848 10252 386 5148 1001 539

Tetranychus urticae mdash mdash mdash mdash 18313 mdash mdash mdash mdash 7775 683 773

Ixodes scapularis mdash mdash mdash mdash 20473 mdash mdash mdash mdash 6891 1368 274

Daphnia pulex mdash mdash mdash mdash 18989 1972 380007 4193030 49250 10070 1651 126

Archispirostreptusgigas

mdash mdash mdash mdash 4008 199 4958 768 560 530 131 963

NOTEmdashN number AT after thinning and trimming NRMC number of reads matched to contigs Percentage of missing data was calculated in relation to the final matrix(526145 amino acids) Total number of orthologs corresponds to all the orthologs recovered per taxon Number of selected orthologs is based on a minimum taxon occupancyof six (ie only the orthologs shared by at least six taxa were selected for the phylogenomic analyses) Asterisks for chelicerates indicate number of peptide sequences downloadedfrom annotated genome projects for analysis with OMA

1502

Fernandez et al doi101093molbevmsu108 MBE at H

arvard Library on June 12 2014

httpmbeoxfordjournalsorg

Dow

nloaded from

were affecting our phylogenetic results we considered asecond supermatrix that included the 389 genes thatpassed individual tests of compositional heterogeneity andwhich furthermore fell in the least saturated quartile of thisstationary set (see Materials and Methods) Finally to reducethe percentage of missing data a smaller but more completesupermatrix was constructed by choosing a taxon occupancyof 11 (ie orthologs present in 11 or more taxa were selected)and removing A gigas to obviate entirely the effects of thelarge amount of missing data in this taxon This yielded amatrix composed of 61 orthologs with a gene occupancyper species ranging from 89 to 100 The total number ofamino acid sites after ZORRO-masked ortholog concatena-tion for this matrix was 13106

Centipede Phylogeny and Dating

All phylogenetic analyses of the three constructed superma-trices agree on the monophyly of Chilopoda Pleurostig-mophora and a clade composed of LithobiomorphaScolopendromorpha and Geophilomorpha thus placingCraterostigmomorpha as the sister group of the remainingpleurostigmophorans (fig 2 supplementary fig S1Supplementary Material online) Resampling support forChilopoda and for the clade containing LithobiomorphaScolopendromorpha and Geophilomorpha is absolute andis also high for Epimorpha and a clade composed ofCraterostigmomorpha and the remaining pleurostigmophor-ans in most analyses (fig 4)

Both the concatenated matrix overall and each taxonwithin it show evidence of substantial compositional hetero-geneity in a w2 test based on a simulated homogeneous nulldistribution (Foster 2004) (P values for all taxa were lt102)However a quartet supernetwork constructed from a morerestricted subset of genes that individually pass this compo-sitional heterogeneity test and which additionally fall into theleast saturated quartile of these stationary genes (Foster2004) demonstrates a network topology broadly comparable(but see later) with the tree topology of our concatenatedanalyses (fig 5) indicating that the compositional heteroge-neity though present is likely not driving the recoveredtopology

Furthermore although missing data are distributed in acomplex manner across the tree the number of genes poten-tially decisive for each internal node is high (eg gt800 for allnodes in Chilopoda in the 1934 gene analysis) indicating thatthe high support recovered in analyses of our largest matricesis not likely an artifact of the pattern of missing data(DellrsquoAmpio et al 2014 see also supplementary fig S1Supplementary Material online)

Our individual maximum likelihood (ML) gene-tree anal-yses display a complex pattern of topological discordance andmissing data Using quartet supernetworks we visualized thepredominant splits within these gene trees both for the com-plete set considered in our largest supermatrix and for thereduced set of compositionally homogeneous least-saturatedgenes (figs 4 and 5) Network topologies are similar in bothsets although there is a greater degree of conflict displayed

(particularly among outgroup species) in the reduced setperhaps the result of increased sampling error in this morelimited set of slowly evolving genes (Betancur-R et al 2014)Although both supernetworks display a tree-like pattern ofsplits reminiscent in many respects to the concatenated to-pologies (fig 2) they also both suggest the existence of con-siderable among-gene conflict in the relative position ofCraterostigmus with some genes suggesting a sister-grouprelationship with Scutigera in contrast to the concatenatedtopology We therefore quantified the number of genes con-gruent with either topology and also with a number of sa-lient hypotheses on centipede interrelationships (fig 6) Thesegene support frequencies largely favor the splits recovered inthe concatenated topology with for example 303 of genesgrouping Craterostigmomorpha and the remaining pleuros-tigmophorans and only 211 grouping Scutigeromorphaand Craterostigmomorpha However relationships withinthe clade formed by Lithobiomorpha and Epimorphaare somewhat more conflicted with 246 of genes sup-porting Epimorpha versus 213 supporting Lithobiomorpha + Geophilomorpha

The dated phylogenies for the 389 gene matrix undereither an autocorrelated log normal (fig 3 top) or uncorre-lated gamma model (fig 3 bottom) support a diversificationof Chilopoda between the Early Ordovician and the MiddleDevonian diversification of Pleurostigmophora in the MiddleOrdovician to Early Carboniferous and the diversification ofthe clade uniting Lithobiomorpha Scolopendromorpha andGeophilomorpha in the Silurian-Carboniferous The diversifi-cation of Epimorpha ranges from the Early Devonian to theearly Permian (fig 3)

Discussion

Phylogenomic Hypothesis of CentipedeInterrelationships

Resolving the Tree of Life has been prioritized as one of the125 most important unsolved scientific questions in 2005 byScience (Kennedy 2005) and the advent of phylogenomics hasaided in resolving many contentious aspects in animal phy-logeny (eg Philippe and Telford 2006 Dunn et al 2008Bleidorn et al 2009 Hejnol et al 2009 Meusemann et al2010 Kocot et al 2011 Rehm et al 2011 Smith et al 2011Struck et al 2011 Hartmann et al 2012 von Reumont et al2012) This is not without controversy and several phenom-ena unique to genome-scale data have been identified asnegatively impacting tree reconstruction in this paradigmperhaps foremost among these being gene occupancy (miss-ing data) (Roure et al 2013 DellrsquoAmpio et al 2014) taxonsampling (Pick et al 2010) and quality of data (eg properortholog assignment and controls on exogenous contamina-tion) (Philippe et al 2011 Salichos and Rokas 2011) In addi-tion assumptions underlying concatenation (Salichos andRokas 2013) and model (mis-)specification (Lartillot andPhilippe 2008) have also been identified as possible pitfallsfor tree reconstruction in a phylogenomic framework Finallyit has been shown that conflict may exist between classes ofgenes with some advocating exclusive usage of slowly

1503

Evaluation of Topological Conflict in Centipede Phylogeny doi101093molbevmsu108 MBE at H

arvard Library on June 12 2014

httpmbeoxfordjournalsorg

Dow

nloaded from

evolving genes to resolve deep metazoan splits (eg Nosenkoet al 2013) although others believe that slow-evolving genesmay not be able to resolve deep splits due to the lack ofsufficient signal

We have worked carefully to try to address all these issuesin our analyses First we have minimized the amount of miss-ing data and worked with one of the most complete phylo-genomic matrices for nonmodel invertebrates based ongenomes and deeply sequenced transcriptomes with the ex-ception of the millipede outgroup transcriptome(Meusemann et al 2010) Although debate exists as towhether or not large amounts of missing data negativelyaffect phylogeny reconstruction (Hejnol et al 2009Lemmon et al 2009 DellrsquoAmpio et al 2014) Roure et al(2013) concluded that although the effects of missing datacan have a negative impact in certain situations other issueshave a more direct effect including modeling among-sitesubstitutional heterogeneity they recommended graphicallydisplaying the amount of missing data in phylogenetic treesas done for example by Hejnol et al (2009) or as presentedhere (fig 2) Given our low levels of missing data and high ma-trix completeness (supplementary table S2 Supplementary

Material online) it is unlikely that our results are affectedby gene occupancy or missing data Likewise taxon samplinghas been optimized to represent all major centipede lineagesNo major hypothesis on deep centipede phylogenetics re-mains untested with our sampling although it may be advis-able to add additional species for each order in future studiesespecially of Scutigeromorpha and Lithobiomorpha whosepositions show incomplete support or substantial among-gene conflict perhaps because they are represented by asingle species each (figs 4ndash6)

It has been argued that in the case of strong gene-treeincongruence (eg from incomplete lineage sorting horizon-tal gene transfer or cryptic duplicationloss) concatenatedanalysis may be misleading causing an erroneous species treeto be inferred and furthermore with inflated resampling sup-port giving no indication of intergene conflict (eg Jeffroyet al 2006 Kubatko and Degnan 2007 Nosenko et al 2013Salichos and Rokas 2013) Many therefore question exclusivereliance on concatenation and instead argue for the applica-tion of a model of the source of incongruence (eg the emerg-ing species tree-methods based on the multispeciescoalescent Edwards 2009) or for discarding data by selecting

CenozoicMesozoicPaleozoicPrecambrian

C O S D C P T J K P N Q

Peripatopsis capensis

Tetranychus urticae

Ixodes scapularis

Daphnia pulex

Archispirostreptus gigas

Scutigera coeloptrata

Craterostigmus tasmanianus

Strigamia maritima

Himantarium gabrielis

Cryptops hortensis

Alipes grandidieri

0100200300400500600

Peripatopsis capensis

Tetranychus urticae

Ixodes scapularis

Daphnia pulex

Archispirostreptus gigas

Scutigera coeloptrata

Craterostigmus tasmanianus

Strigamia maritima

Himantarium gabrielis

Cryptops hortensis

Alipes grandidieri

FIG 3 Chronogram of centipede evolution for the 389-gene data set with 95 highest posterior density (HPD) bar for the dating under autocorrelatedlog normal (top) and uncorrelated gamma model (bottom) Nodes that were calibrated with fossils are indicated with a star

1504

Fernandez et al doi101093molbevmsu108 MBE at H

arvard Library on June 12 2014

httpmbeoxfordjournalsorg

Dow

nloaded from

genes with strong phylogenetic signal and demonstrated ab-sence of significant incongruence when reconstructing an-cient divergences To address this possibility we alsoadopted a gene-tree perspective inferring individual MLtrees for each gene included in our concatenated matrixHowever many available methods of investigating gene-treeincongruence (including recent metrics such as internodecertainty and its derivatives Salichos and Rokas 2013Salichos et al 2014) can only be calculated for genes thatcontain the same set of taxa an assumption that our dataset (and many similar data sets generated by highly parallelsequencing) assuredly violates

We therefore visualized the dominant bipartitions amongthese gene trees by constructing a supernetwork (a general-ization of supertrees permitting reticulation where intergeneconflicts exist) using the SuperQ method (Grunewald et al2013) which decomposes all gene trees into quartets and

then infers a supernetwork from these quartets assigningedge lengths by examining quartet frequencies We inferredsupernetworks from both ML gene trees (figs 4 and 5) andthe majority rule consensus (MRC) trees from the bootstrapreplicates of each gene (figure not shown) These supernet-works display a predominantly tree-like structure topologi-cally resembling our concatenated species trees howeverthey indicate the presence of intergene conflict in the relativepositions of Craterostigmomorpha and Scutigeromorphawith some genes uniting these taxa as a clade in contrastto our species tree (figs 4 and 5) Interestingly the sameoverarching network structure is present in both the ML-tree-based and bootstrap MRC-based networks indicatingthat these conflicts are not merely the result of stochasticsampling error To examine how these phylogenetic conflictsare distributed among our input genes we employedConclustador which automatically assigns genes into

001Craterostigmus

tasmanianus

Scutigera

coleoptrata

Archispirostreptus gigas

Peripatopsis capensis

Ixodes

scapularis

Tetranychus

urticae

Daphnia pulexStrigamia

maritima

Himantarium

gabrielis

Lithobius

Cryptops

hortensis

Alipes

grandidieri

FIG 5 Supernetwork representation of quartets derived from individual ML gene trees considering only 389 genes showing evidence of compositionalhomogeneity and relatively low substitutional saturation

Craterostigmus

tasmanianus

Scutigera

coleoptrata

Archispirostreptus gigas

Peripatopsis capensis

Ixodes scapularis

Tetranychus urticae

Daphnia pulex

Strigamia maritima

Himantarium gabrielis

Cryptops hortensis

Alipes grandidieri

001

FIG 4 Supernetwork representation of quartets derived from individual ML gene trees for all 1934 genes concatenated in the supermatrix presented infigure 2 Phylogenetic conflict is represented by nontree-like splits (eg ScutigeromorphandashCraterostigmomorpha)

1505

Evaluation of Topological Conflict in Centipede Phylogeny doi101093molbevmsu108 MBE at H

arvard Library on June 12 2014

httpmbeoxfordjournalsorg

Dow

nloaded from

exclusive clusters if they display significant (and well sup-ported) areas of local topological incongruence by examiningbipartition distances among and within pseudoreplicate treesfor each gene (Leigh et al 2011) Intriguingly this method didnot find evidence for any well-supported local incongruencebetween genes in our data set (ie only one cluster containingall genes was selected) Although at first glance this result is atodds with the conflicts observed in our supernetwork analy-ses this seeming paradox could be reconciled if most or allgenes display the same patterns of topological conflict withintheir bootstrap replicates

To address gene-tree conflict and the distribution of miss-ing data in a more quantitative manner we counted thepotentially decisive genes and the frequency of genes withinthis set that are congruent with both the nodes in our con-catenated topology and with several alternative hypotheses(fig 6) Large numbers of genes (from 1115 to 1742 fig 6) areavailable to test interordinal relationships within ChilopodaThe gene support frequencies within these potentially deci-sive sets also clearly favor the nodes within our concatenatedtopology over competing hypotheses although we note thatthe potentially decisive gene sets are not directly comparableacross nodes as they contain different populations of genesdepending on the particular taxonomic distribution of miss-ing data We also emphasize that although the gene supportfrequencies appear relatively low (ie usually lt50) even fornodes in our concatenated topology these frequencies werederived from counts of fully bifurcating ML trees many ofwhich contain splits that are poorly supported due to limited-length alignments andor inappropriate rates of evolution forthe splits in question we hypothesize that these gene supportfrequencies would appear even more decisively in favor of the

concatenated topology if considering only well-supportedsplits

The LG4X + F and CAT + GTR models we employed areboth able to model site-heterogeneous patterns of molecularevolution to a degree and are therefore better able to detectmultiple substitutions an important issue for this data set asthe gene selection procedure we employed was agnostic tothe rate of molecular evolution (Lartillot and Philippe 2004Philippe et al 2011) Both however also assume stationarityof amino acid frequencies through time an assumption re-jected by a sensitive statistical test for the 1934 gene super-matrix To discern whether substitutional saturation andcompositional heterogeneity were influencing our phyloge-netic results we considered a subset of 389 genes that pass in-dividual tests of compositional heterogeneity and whichfurthermore fall in the least saturated quartile of this station-ary set A quartet supernetwork built from ML treesdrawn only within this subset of genes (fig 5) resemblesthe ingroup tree topology inferred from our concatenatedanalyses (fig 2) in supporting the monophyly ofLithobiomorpha + Epimorpha In the 389-gene quartetsupernetwork however we observe essentially the samegene-tree conflict as in the 1934 gene quartet supernetworkthat is disagreement on the relative position ofCraterostigmus with respect to Scutigeromorpha and theclade comprising the remaining pleurostigmophoran ordersThis conflict could be attributable in part to exacerbation ofgene-tree incongruence reported for slowly evolving parti-tions in analyses of genomic data sets due to the confluenteffect on reducing real internode length and the theoreticalexpectation of greater incongruence for short internodes(Salichos and Rokas 2013 but see Betancur-R et al 2014)

Geophilomorpha

Lithobiomorpha

Scolopendromorpha

250

1173

551

1684

213

327

Geophilomorpha

Lithobiomorpha

Scolopendromorpha

200

1115

570

1742

179

327

Craterostigmus tasmanianus

Strigamia maritima

Himantarium gabrielis

Alipes grandidieri

Cryptops hortensis

Scutigera coleoptrata

Archispirostreptus gigas

Daphnia pulex

Ixodes scapularis

Tetranychus urticae

Peripatopsis capensis

593503

848

670

876765

394

1603246

503

876

246

670

394

765

593

380

1254303

413

1365

303

712

1234

577

42

119

353

139

471295

214

1371156

Amalpighiata

Craterostigmomorpha714

1195598

376

1603235

Scutigeromorpha

Amalpighiata

229

1090211

918

1536598

229

1090211

598

Scutigeromorpha

Craterostigmomorpha

388

1207206

1412146

322

206

322

Epimorpha

Lithobiomorpha

Craterostigmomorpha

FIG 6 Counts of potentially decisive genes and (within this set) counts of actually congruent genes computed for each node in the concatenatedtopology (left) as well as for select alternative hypotheses of centipede interrelationships (right) Potentially decisive genes printed below nodecongruent genes in this set printed above node and gene support percentage printed to side

1506

Fernandez et al doi101093molbevmsu108 MBE at H

arvard Library on June 12 2014

httpmbeoxfordjournalsorg

Dow

nloaded from

Finally our amino acid level transcriptome-based phylog-eny is highly similar to previous hypotheses on centipedephylogenetics using much smaller sets of genes (Edgecombeet al 1999 Giribet et al 1999 Edgecombe and Giribet 2004Mallatt and Giribet 2006 Murienne et al 2010) and findsstrong support for such accepted groups as Pleurostigmo-phora Epimorpha Scolopendromorpha and Geophilomorpha thus supporting the idea that our morphologicallysurprising result for Lithobiomorpha + Epimorpha is notartifactual

Evolutionary Implications

Two salient evolutionary implications of this study are 1) thebasal position of Craterostigmomorpha with respect toLithobiomorpha a result anticipated in prior phylogeneticanalyses based on molecular data (Mallatt and Giribet 2006Regier et al 2010) and 2) the superior reconciliation ofthe new topology with the fossil record of centipedes(Edgecombe 2011b) thus improving upon prior dating studies(Murienne et al 2010) The former has consequences in ourinterpretation of an important number of morphologicaltraits thought to be shared by the clade Phylactometria(see Implications for Morphological and BehavioralCharacters section) and requires a taxonomic proposal for aclade including Lithobiomorpha Scolopendromorpha andGeophilomorpha We name this group Amalpighiata onthe basis of the three constituent orders lacking the supernu-merary Malpighian tubules found in Scutigeromorpha andCraterostigmomorpha Amalpighiata appears well supportedand shows no strong conflict among the analyzed genesAmong-gene conflict and also relatively lower nodalsupport in the concatenated analysis exists for the relativepositions of Craterostigmus and Scutigera although the mono-phyly of Pleurostigmophora (Amalpighiata + Craterostigmo-morpha) receives support in several analyses

The second implication of the Amalpighiata hypothesisaccounts for one of the inconsistencies in the centipedefossil record under the Phylactometria hypothesis Stem-group Lithobiomorpha are predicted to have originated byat least the Middle Devonian (under either of the two com-peting hypotheses for the relationships of the DevonianDevonobius) (eg Murienne et al 2010) yet no Paleozoic oreven Mesozoic fossils are known for Lithobiomorpha Theirfossil record does not extend back further than the Eocene(when they are represented by a few species of Lithobiidae inBaltic amber) This is no doubt a gross underestimate of theirreal age even the alternative Amalpighiata tree dates the splitbetween Lithobiomorpha and Epimorpha to at least theUpper Carboniferous (constrained by the earliest knownEpimorpha Scolopendromorpha) (fig 3) Still the Siluro-Devonian fossil record of Chilopoda consists only ofScutigeromorpha and the extinct Devonobiomorpha whichare allied to either Craterostigmomorpha (Borucki 1996) orEpimorpha (Shear and Bonamo 1988 Murienne et al 2010)The former hypothesis is consistent with an earlier divergenceof Craterostigmomorpha than Lithobiomorpha and could

thus account for the lack of mid-Paleozoic fossillithobiomorphs

That said the scattered records of Chilopoda in Mesozoicshales lithographic limestones and ambers which includeexemplars of Scutigeromorpha Scolopendromorpha andGeophilomorpha but not Lithobiomorpha (reviewed byEdgecombe 2011b) suggests that lithobiomorphs are under-represented in the fossil record as a whole and their apparentabsence in the Paleozoic is likely a taphonomic artifact TheAmalpighiata tree dates the origins of centipede orders withbetter fit to their first known fossil occurrences and diver-gence dates are concordant with the fossil record perhapsindicating that the Paleozoic fossil record of centipedes is notas incomplete as previously thought An additional interestingpoint of our data set is that the diversification of myriapods isSilurian-Devonian for the analysis under the autocorrelatedlog normal model and only our uncorrelated gamma modelsupports a Cambrian terrestrialization event as suggestedrecently for this lineage (Rota-Stabelli et al 2013)Additional transcriptomes should allow for a better estima-tion of the diversification dates of each order but should notaffect their origin Should additional taxon sampling continueto yield substantial among-gene incongruence and a succes-sion of interordinal divergences closely spaced together intime consistent with the scenario of an rapid radiation wenote that biological (as opposed to methodologicalartifac-tual) explanations for the incongruence such as incompletelineage sorting may need to be considered

Implications for Morphological andBehavioral Characters

A substantial list of characters has been tabled in defense of asister-group relationship between Craterostigmus andEpimorpha They include the following putative synapomor-phies (Edgecombe 2011a)

1) Brooding involving the mother guarding the egg clusterby wrapping her body around it

2) Extraordinarily high cell numbers in the lateral ocelli(eg more than 1000 retinula cells)

3) Proximal retinula cells partly with monodirectionalrhabdomeres

4) Maxillary nephridia lacking in postembryonic stadia5) Rigidity of the forcipules a character complex including

the forcipular pleurite arching over the coxosternite thehinge of the coxosternite being sclerotized and inflexi-ble the coxosternite deeply embedded into the cuticleabove the first pedigerous trunk segment and a trun-cation of sternal muscles in the forcipular segmentrather than extending into the head

6) Presternites distinct7) Separate sternal and lateral longitudinal muscles8) Lateral testicular vesicles linked by a central posteriorly

extended deferens duct9) Coxal organs confined to the last leg pair

10) Internal valves formed by lips of the ostia projectingdeeply into the heart lumen

1507

Evaluation of Topological Conflict in Centipede Phylogeny doi101093molbevmsu108 MBE at H

arvard Library on June 12 2014

httpmbeoxfordjournalsorg

Dow

nloaded from

Under the alternative Amalpighiata hypothesis in whichLithobiomorpha rather than Craterostigmus is sister group ofEpimorpha these characters are implied to be homoplasticThey are either general characters of Pleurostigmophora as awhole that have been secondarily modified (lost subject toreversal or otherwise transformed) in Lithobiomorpha orthey have been convergently acquired by Craterostigmusand Epimorpha

Craterostigmus has a single anamorphic stage in its lifecycle it hatches from the egg as a 12-legged instar and ac-quires the adult number of 15 leg-pairsmdashthe plesiomorphicnumber of legs in centipedesmdashin the succeeding instar Thiscontrasts with numerous anamorphic stages inScutigeromorpha and Lithobiomorpha which hatch with 4or 6ndash8 leg pairs respectively and keep adding segments untilreaching the definitive 15 leg pairs The Phylactometria hy-pothesis has generally viewed the ldquoreduced hemi-anamorphosisrdquo (Borucki 1996) of Craterostigmus as atransitional stage in the evolution of complete epimorphosisin Epimorpha The alternative hypothesis in whichCraterostigmus is sister group of all otherPleurostigmophora rejects this interpretation in favor oftwo independent reductions of anamorphic instars (or sec-ondary reacquisition of them in Lithobiomorpha) Aproposposterior segmentation in arthropods has been shown to beevolutionarily labile as demonstrated by the extreme case ofposterior segment reduction during mite embryogenesis withconcomitant loss of certain gene expression domains orentire genes altogether in the genome of the miteTetranychus urticae (Grbic et al 2011)

At present few characters have been identified that couldbe reinterpreted as shared derived characters ofLithobiomorpha and Epimorpha One noteworthy exampleis the absence of supernumerary Malpighian tubules InChilopoda these occur only in Scutigeromorpha and inCraterostigmus and their presence in these two ordersalone has been regarded as a plesiomorphic feature(Prunescu and Prunescu 1996 Prunescu and Prunescu2006) Their inferred loss in Lithobiomorpha and Epimorphawould imply a single step under the favored topology in thisstudy and gives the name to the clade Amalpighiata

Other potential apomorphies of Lithobiomorpha andEpimorpha involve characters that are shared byLithobiomorpha and Scolopendromorpha alone and arethus required to be lost or otherwise modified inGeophilomorpha

For example Craterostigmus shares three multicuspedteeth in the mandible with Scutigeromorpha in contrast tofour and five such teeth on opposing mandibles inLithobiomorpha and Scolopendromorpha Geophilomorphaprimitively lack teeth (Koch and Edgecombe 2012) so if atransformation from three teeth (common ancestor ofChilopoda) to fourfive teeth (common ancestorof Lithobiomorpha + Epimorpha) were envisioned the lossof mandibular teeth in Geophilomorpha is a necessary part ofthe transformation series A more homoplastic character thatcould be postulated in support of Amalpighiata as a clade isthe presence of marginated tergites These are lacking in

Craterostigmus are ubiquitous in Lithobiomorpha and arevariably present in Scolopendromorpha (present inScolopocryptopinae and Scolopendridae) Their absence inGeophilomorpha and several blind scolopendromorphclades (Newportiinae Cryptopidae and Plutoniumidae)however means that their status as a character forAmalpighiata would demand a few instances of reversalloss As such though it must be conceded that this schemeintroduces considerable homoplasy from the perspective ofmorphology a few characters can be suggested as potentialsynapomorphies for Amalpighiata

Materials and Methods

Sample Collection

We sampled six specimens representing the main groups ofcentipedes (MCZ voucher numbers in brackets) Scutigeracoleoptrata (IZ-20415) (Scutigeromorpha) Lithobius forficatus(IZ-131534) (Lithobiomorpha) Craterostigmus tasmanianus(IZ-128299) (Craterostigmomorpha) Alipes grandidieri (IZ-130616) Cryptops hortensis (IZ-130583) (Scolopen-dromorpha) and Himantarium gabrielis (IZ-131564) (Geophi-lomorpha) Information about the sampling localitiescollection details can be found in the MCZ online collectionsdatabase (httpmczbasemczharvardedu last accessedApril 1 2014) The following taxa were sampled as outgroupsPeripatopsis capensis (Onychophora a transcriptome) andIxodes scapularis and T urticae (Arthropoda Arachnida andAcari complete genomes) Samples were preserved in RNA-later (Ambion) or flash frozen in liquid nitrogen immediatelyafter collection A portion of the central part of each animal(including the body and legs) or the anterior part (in S coleop-trata including the head and a part of the trunk) was dissectedand stored at80 C until RNA extraction Three more taxaretrieved from external sources were also included Thegenome of Strigamia maritima (Geophilomorpha) was down-loaded from httpmetazoaensemblorgStrigamia_mari-timaInfoIndex (last accessed April 1 2014) The genome ofDaphnia pulex was retrieved from httpmetazoaensemblorgDaphnia_pulexInfoIndex (last accessed April 1 2014)The transcriptome of A gigas generated by 454 pyrosequen-cing was retrieved from Meusemann et al (2010)

mRNA Extraction

Total RNA was extracted with a standard trizol-basedmethod using TRIzol (Life Sciences) following the manufac-turerrsquos protocol Tissue fragments were disrupted with a drillin 500ml of TRIzol using an RNAse-free plastic pestle forgrinding TRIzol was added up to a total volume of 1 mlAfter 5 min incubating at room temperature (RT) 100 ml ofbromochloropropane was mixed by vortexing After incuba-tion at RT for 10 min the samples were centrifuged at16000 rpm for 15 min at 4 C Afterward the upper aqueouslayer was recovered mixed with 500 ml of isopropanol andincubated at 20 C overnight Total RNA precipitation wasmade as follows samples were centrifuged for 15 min at16000 rpm at 4 C the pellet was washed twice with 1 mlof 75 isopropanol and centrifuged at 16000 rpm at 4 C for

1508

Fernandez et al doi101093molbevmsu108 MBE at H

arvard Library on June 12 2014

httpmbeoxfordjournalsorg

Dow

nloaded from

15 and 5 min in the first and second washing step respec-tively air dried and eluted in 100ml of RNA Storage solution(Ambion) mRNA purification was done with the DynabeadsmRNA Purification Kit (Invitrogen) following manufacturerrsquosinstructions After incubation of total RNA at 65 C for 5 minthe samples were incubated for 30 min with 200ml of mag-netic beads in a rocker and washed twice with washing bufferThis step was repeated again with an incubation time of5 min mRNA was eluted in 15ml of Tris-HCl buffer Qualityof mRNA was measured with a pico RNA assay in an Agilent2100 Bioanalyzer (Agilent Technologies) Quantity was mea-sured with a RNA assay in Qubit fluorometer (LifeTechnologies)

cDNA Library Construction and Next-GenerationSequencing

cDNA libraries were constructed with the TruSeq for RNASample Preparation kit (Illumina) following the manufac-turerrsquos instructions Libraries for C tasmanianus and S coleop-trata were constructed in the Apollo 324 automated systemusing the PrepX mRNA kit (IntegenX) The samples weremarked with a different index to allow pooling for sequencingConcentration of the cDNA libraries was measured through adsDNA High Sensitivity (HS) assay in a Qubit fluorometer(Invitrogen) Library quality and size selection were checkedin an Agilent 2100 Bioanalyzer (Agilent Technologies) withthe HS DNA assay The samples were run using the IlluminaHiSeq 2500 platform with paired-end reads of 150 bp at theFAS Center for Systems Biology at Harvard University

Raw Data Sanitation and Sequence Assembly

Demultiplexed Illumina HiSeq 2500 sequencing results inFASTQ format were retrieved from the sequencing facilityvia FTP The files were decompressed and imported into CLCGenomics Workbench 551 retaining read header informa-tion as paired read files Each sample was quality filteredaccording to a threshold average quality score of 30 basedon a Phred scale and adaptor trimmed using a list of knownadaptors provided by Illumina All reads shorter than 25 bpwere discarded and the resulting files were exported as asingle FASTQ file

To simplify the assembly process ribosomal RNA (rRNA)was filtered out All known metazoan rRNA sequences weredownloaded from GenBank and formatted into bowtie indexusing ldquobowtie-buildrdquo Each sample was sequentially aligned to

the index allowing up to two mismatches via Bowtie 100(Langmead et al 2009) retaining all unaligned reads in FASTQformat Unaligned results stored as a single file were im-ported into Geneious 616 (Kearse et al 2012) paired usingread header information and exported as separate files for leftand right mate pairs This process was repeated for eachsample

De novo assemblies (strand specific in C tasmanianus andS coleoptrata) were done individually per organism in Trinity(Grabherr et al 2011 Haas et al 2013) using paired read filesand default parameters Raw reads and assembled sequenceshave been deposited in the National Center for BiotechnologyInformation Sequence Read Archive and TranscriptomeShotgun Assembly databases respectively (table 2)

Identification of Candidate Coding Regions

Redundancy reduction was done with CD-HIT-EST (Fu et al2012) in the raw assemblies (95 global similarity) Resultingassemblies were processed in TransDecoder (Haas et al 2013)to identify candidate ORFs within the transcripts Predictedpeptides were then processed with a further filter to selectonly one peptide per putative unigene by choosing the lon-gest ORF per Trinity subcomponent with a Python scriptthus removing the variation in the coding regions of our as-semblies due to by alternative splicing closely related para-logs and allelic diversity Peptide sequences with all finalcandidate ORFs were retained as multifasta files Sequencedata for Strigamia maritima and D pulex were obtained fromgenome assemblies as opposed to the de novo assembledtranscriptomes Annotated predicted peptide sequences forD pulex were downloaded from ldquowFleaBaserdquo and sequenceswere clustered using CD-HIT at a similarity of 98 for redun-dancy reduction Peptide sequences for Strigamia maritimawere downloaded from Ensembl and also clustered at 98similarity Results for both D pulex and Strigamia maritimawere converted to a single line multifasta file

Orthology Assignment

We assigned predicted ORFs into orthologous groups acrossall samples using OMA stand-alone v099t (Altenhoff et al2011 Altenhoff et al 2013) The advantages of the algorithmof OMA over standard bidirectional best-hit approaches relyon the use of evolutionary distances instead of scores con-sideration of distance inference uncertainty and differentialgene losses and inclusion of many-to-many orthologous re-lations The ortholog matrix is constructed from all-against-all

Table 2 Accession Numbers for the Raw Transcriptomes and Assemblies from the Taxa Newly Sequenced in This Study

BioProject BioSample Experiment Run

Scutigera coleoptrata PRJNA237135 SAMN02614634 SRX462011 SRR1158078

Craterostigmus tasmanianus PRJNA237134 SAMN02614633 SRX461877 SRR1157986

Lithobius forficatus PRJNA237133 SAMN02614632 SRX462145 SRR1159752

Himantarium gabrielis PRJNA237131 SAMN02614631 SRX461787 SRR1159787

Alipes grandidieri PRJNA179374 SAMN01816532 SRX205685 SRR619311

Cryptops hortensis PRJNA237130 SAMN02614630 SRX457664 SRR1153457

Peripatopsis capensis PRJNA236598 SAMN02598680 SRX451023 SRR1145776

1509

Evaluation of Topological Conflict in Centipede Phylogeny doi101093molbevmsu108 MBE at H

arvard Library on June 12 2014

httpmbeoxfordjournalsorg

Dow

nloaded from

Smith-Waterman protein alignments The program identifiesthe so called ldquostable pairsrdquo verifies them and checks againstpotential paralogous genes In a last step cliques of stablepairs are clustered as groups of orthologs All input fileswere single-line multifasta files and the parametersdrw filespecified retained all default settings with the exception ofldquoNPrdquo which was set at 300 We parallelized all-by-all localalignments across 256 cores of a single compute node imple-menting a custom Bash script allowing for execution of inde-pendent threads with at least 3 s between each instance ofOMA to minimize risk of collisions

Phylogenomic Analyses and Congruence Assessment

We constructed three different amino acid supermatricesFirst a large matrix was constructed by concatenating theset of OMA groups containing six or more taxa To increasegene occupancy and to reduce the percentage of missingdata a second matrix was created by selecting the orthologscontained in 11 or more taxa Because of the high percentageof missing data in A gigas this taxon was eliminated from thissupermatrix This supermatrix reconstruction approach wasselected over other available software (eg MARE) because itguarantees a minimum taxon occupancy per orthogrouptherefore resulting in a supermatrix with a good compromisebetween gene and taxon occupancy and missing data

The selected orthogroups (1934 and 61 for both matrixesrespectively) were aligned individually using MUSCLE version36 (Edgar 2004) To increase the signal-to-noise ratio andimprove the discriminatory power of phylogenetic methodswe applied a probabilistic character masking with ZORRO(Wu et al 2012) to account for alignment uncertainty runusing default parameters (and using FastTree 214 [Price et al2010] to produce guide trees) This program first calculatesthe probability of all alignments that pass through a specifiedmatched pair of residues using a pair hidden Markov modelframework and it then compares this value with the fullprobability of all alignments of the pair of sequences Theposterior probability value indicates how reliable the matchis (ie highly reliable if it is close to 1 ambiguous if it is close to0) It then sums up the probability that a particular columnwould appear over the alignment landscape and assigns aconfidence score between 0 and 10 to each column provid-ing an objective measure that has an explicit evolutionarymodel and is mathematically rigorous (Wu et al 2012) Wediscarded the positions assigned a confidence score below athreshold of 5 with a custom Python script prior to concat-enation (using Phyutility 26 Smith and Dunn 2008) and sub-sequent phylogenomic analyses

To discern if substitutional saturation and among-taxoncompositional heterogeneity were affecting our phylogeneticresults we considered a third supermatrix Using the p4Python library (Foster 2004) we performed a test of compo-sitional homogeneity using w2 statistics simulating a null dis-tribution using the TreecompoTestUsingSimulations()method (nSims = 100) We used the best-scoring ML treefrom our RAxML analyses as the phylogram on which to sim-ulate data using optimized parameters of a homogeneous

unpartitioned LG + I + 4 model We also conducted com-positional homogeneity tests on individual ortholog align-ments modeling a homogeneous null distribution on theML tree for each gene using the best-fitting model availablein p4 for that gene We characterized genewise substitutionalsaturation by regressing uncorrected amino acid distances onthe patristic distance implied by each ML tree in a p4 scriptand taking the slope of this linear regression as a measure ofsubstitutional saturation (Jeffroy et al 2006) We calculatedthe third quartile of the slope from among the set of com-positionally homogeneous genes with a slope between 0 and1 and then selected the set of ML trees from genes with aslope above this value (0465) for inclusion in a quartet super-network as described earlier The 389 genes that passed in-dividual tests of compositional heterogeneity and whichfurthermore fell in the least saturated quartile of this station-ary set were concatenated into a third supermatrix

ML inference was conducted with PhyML-PCMA (Zollerand Schneider 2013) and RAxML 775 (Berger et al 2011)Although RAxML uses a predetermined empirical model ofamino acid substitution PhyML-PCMA estimates a modelthrough the use of a principal component (PC) analysisThe obtained PCs describe the substitution rates that covarythe most among different protein families and thereforedefine a semiempirically determined parameterization foran amino acid substitution model specific to each data set(Zoller and Schneider 2013) We selected 20 PCs in thePhyML-PCMA analyses and empirical amino acid frequenciesPROTGAMMALG4XF was selected as the best model ofamino acid substitution for the unpartitioned RAxML analy-ses using a modification of the ProteinModelSelection perlwrapper to RAxML produced by A Stamatakis (httpscoh-itsorgexelixissoftwarehtml last accessed April 1 2014)Bootstrap values were estimated with 1000 replicates underthe rapid bootstrapping algorithm

Bayesian analysis was conducted with PhyloBayes MPI 14e(Lartillot et al 2013) using the site-heterogeneous CAT-GTRmodel of evolution (Lartillot and Philippe 2004) Three inde-pendent Markov chain Monte Carlo (MCMC) chains wererun for 5818ndash7471 cycles The initial 2000 trees (27ndash34)sampled in each MCMC run prior to convergence (judgedwhen maximum bipartition discrepancies across chainslt 01) were discarded as the burn-in period A 50 major-ity-rule consensus tree was then computed from the remain-ing 1537 trees (sampled every 10 cycles) combined from thethree independent runs

To investigate potential incongruence between individualgene trees we also inferred gene trees for each OMA groupincluded in our supermatrix For each aligned ZORRO-masked OMA group we selected a best-fitting model usinga ProteinModelSelection script modified to permit testing ofthe recent LG4M( + F) and LG4X( + F) models (Le et al 2012)Best-scoring ML trees were inferred for each gene under theselected model (with the gamma model of rate variation butno invariant term) from 20 replicates of parsimony startingtrees One hundred traditional (nonrapid) bootstrap repli-cates for each gene were also inferred A majority rule con-sensus for each gene was constructed using the SumTrees

1510

Fernandez et al doi101093molbevmsu108 MBE at H

arvard Library on June 12 2014

httpmbeoxfordjournalsorg

Dow

nloaded from

script from the DendroPy Python library (Sukumaran andHolder 2010) To visualize predominant intergene conflictsfor both ML and bootstrap MRC trees we employedSuperQ v11 (Grunewald et al 2013) selecting the ldquobalancedrdquoedge-weight optimization function and applying no filterSplitsTree v4131 (Huson and Bryant 2006) was then usedto visualize the resulting NEXUS files To investigate the dis-tribution of strongly supported conflict among our genetrees we employed Conclustador v04a (Leigh et al 2011)presently the only automated incongruence-detection algo-rithm readily applicable to data sets of this size We appliedthe spectral clustering algorithm on the bootstrap replicatesfor each OMA group limiting the maximum number of clus-ters to 15

To quantify the number of genes potentially decisive for agiven split and also the number within this set that actuallysupport the split in question (fig 6) we employed a customPython script to parse the set of individual gene trees If a treecontained at least one species in either descendant group fora given node plus at least two distinct species basal to thenode in question it was considered potentially decisive(forming minimally a quartet DellrsquoAmpio et al 2014) if thedescendants were monophyletic with respect to their relativeoutgroups that gene tree was considered congruent with thenode in question (although this count is agnostic to thetopology within the node in question) To quantify the rela-tive support for different hypotheses within our individualgene-tree analysis we displayed the above counts for allnodes in the 1934 gene ML topology (fig 6) and also forselect alternative scenarios of centipede interrelationshipsboth including historical hypotheses (see Introduction)and alternative topologies for Scutigeromorpha andCraterostigmomorpha suggested by our quartet supernet-works All Python custom scripts can be downloaded fromhttpsgithubcomclaumer (last accessed April 1 2014)

Fossil Calibrations

A few Paleozoic fossil occurrences constrain the timing ofdivergences between chilopod orders (for a review of themyriapod fossil record see Shear and Edgecombe 2010) TheSilurondashDevonian genus Crussolum (Shear et al 1998) is iden-tified as a stem-group scutigeromorph displaying derivedcharacters of total-group Scutigeromorpha but lackingsome apomorphies that are shared by all members of thescutigeromorph crown group (Edgecombe 2011b) The bestpreserved fossils of Crussolum are from the Early Devonian(Pragian) and Middle Devonian (Givetian) (Shear et al 1998Anderson and Trewin 2003) but the oldest confidently as-signed record is in the Late Silurian (early Pridolı) LudlowBone Bed in western England (at least 4192 My dating theend of the Pridolı) This dating constrains the split ofScutigeromorpha from Pleurostigmophora

The occurrence of Devonobius delta in the MiddleDevonian of New York at least 3827 My (date for the endof the Givetian) constrains the divergence betweenCraterostigmomorpha and the remaining pleurostigmophor-ans This minimum applies irrespective of whether

Devonobius is sister group of Craterostigmomorpha or ofEpimorpha which were equally parsimonious in previousanalyses (Edgecombe and Giribet 2004) The basal divergencein Epimorpha that is the split between Scolopendromorphaand Geophilomorpha is constrained by the oldest scolopen-dromorph Mazoscolopendra richardsoni in UpperCarboniferous (Westphalian D) deposits of Mazon Creek IL(at least 306 Ma) Available character data forMazoscolopendra are insufficient to establish whether it is astem- or crown-group scolopendromorph and it only pro-vides a minimum age for total-group Scolopendromorpha

Mid-Silurian body fossils of Diplopoda provide constraintson the divergence of Chilopoda and Diplopoda The diver-gence between these two groups is dated by the occurrenceof Archipolypoda (stem-group helminthomorphs) in the lateWenlock or early Ludlow (Wilson and Anderson 2004) Basedon these data (ages based on spores) the split betweenChilopoda and Diplopoda is at least as old as the end ofthe Gorstian Stage (early Ludlow) 4256 My

The split between Onychophora and Arthropoda wasdated between 528 My (the minimum age for Arthropodaused by Lee et al [2013] on the basis of the earliestRusophycus traces) and 558 My used as the root ofPanarthropoda (Lee et al 2013)

Divergence Time Estimates

Divergence dates were estimated using the Bayesian relaxedmolecular clock approach as implemented in PhyloBayesv33f (Lartillot et al 2013) To compare the performance ofthe autocorrelated log normal and uncorrelated gammamodels we estimated the divergence dates under bothmodels in the 389-gene data set A series of calibrationconstraints specified above in ldquoFossil calibrationsrdquo (see alsoMurienne et al 2010 Edgecombe 2011b) were used with softbounds (Yang and Rannala 2006) under a birthndashdeath priorin PhyloBayes this strategy having been found to provide thebest compromise for dating estimates (Inoue et al 2010) Twoindependent MCMC chains were run for 5000ndash6000 cyclessampling posterior rates and dates every 10 cycles The initial20 of the cycles were discarded as burn-in Posteriorestimates of divergence dates were then computed fromapproximately the last 4000 samples of each chain

Supplementary MaterialSupplementary figure S1 and tables S1 and S2 are available atMolecular Biology and Evolution online (httpwwwmbeoxfordjournalsorg)

Acknowledgments

Mark Harvey kindly provided the sample of Craterostigmustasmanianus and Savel Daniels the sample of Peripatopsiscapensis The Research Computing group at the HarvardFaculty of Arts and Sciences was instrumental in facilitatingthe computational resources required Sarah Bastkowski iskindly thanked for her advice on the usage of SuperQ v11This work was supported by internal funds from the Museumof Comparative Zoology and by a postdoctoral fellowship

1511

Evaluation of Topological Conflict in Centipede Phylogeny doi101093molbevmsu108 MBE at H

arvard Library on June 12 2014

httpmbeoxfordjournalsorg

Dow

nloaded from

from the Fundacion Ramon Areces to RF Five anonymousreviewers and the editor provided comments that helped toimprove this article

ReferencesAltenhoff AM Gil M Gonnet GH Dessimoz C 2013 Inferring hierar-

chical orthologous groups from orthologous gene pairs PLoS One 8e53786

Altenhoff AM Schneider A Gonnet GH Dessimoz C 2011 OMA 2011orthology inference among 1000 complete genomes Nucleic AcidsRes 39D289ndashD294

Anderson LI Trewin NH 2003 An Early Devonian arthropod fauna fromthe Windyfield Cherts Aberdeenshire Scotland Palaeontology 46457ndash509

Ax P 2000 Multicellular animals Volume II The phylogenetic system ofthe Metazoa Berlin (Germany) Springer

Berger SA Krompass D Stamatakis A 2011 Performance accuracy andWeb server for evolutionary placement of short sequence readsunder maximum likelihood Syst Biol 60291ndash302

Betancur-R R Naylor G Ortı G 2014 Conserved genes sampling errorand phylogenomic inference Syst Biol 63257ndash262

Bleidorn C Podsiadlowski L Zhong M Eeckhaut I Hartmann SHalanych KM Tiedemann R 2009 On the phylogenetic positionof Myzostomida can 77 genes get it wrong BMC Evol Biol 9150

Borucki H 1996 Evolution und phylogenetisches system der Chilopoda(Mandibulata Tracheata) Vehr naturwiss Ver Hamburg 3595ndash226

DellrsquoAmpio E Meusemann K Szucsich NU Peters RS Meyer B Borner JPetersen M Aberer AJ Stamatakis A Walzl MG et al 2014 Decisivedata sets in phylogenomics lessons from studies on the phyloge-netic relationships of primarily wingless insects Mol Biol Evol 31239ndash249

Dohle W 1985 Phylogenetic pathways in the Chilopoda BijdrDierkunde 5555ndash66

Dunn CW Hejnol A Matus DQ Pang K Browne WE Smith SA SeaverE Rouse GW Obst M Edgecombe GD et al 2008 Broad phyloge-nomic sampling improves resolution of the animal tree of lifeNature 452745ndash749

Edgar RC 2004 MUSCLE multiple sequence alignment with high accu-racy and high throughput Nucleic Acids Res 321792ndash1797

Edgecombe GD 2011a Chilopodamdashphylogeny In Minelli A editorTreatise on zoologymdashanatomy taxonomy biology TheMyriapoda Vol 1 Leiden and Boston Brill p 339ndash354

Edgecombe GD 2011b Chilopodamdashthe fossil history In Minelli Aeditor Treatise on zoologymdashanatomy taxonomy biology TheMyriapoda Vol 1 Leiden and Boston Brill p 355ndash361

Edgecombe GD Giribet G 2004 Adding mitochondrial sequence data(16S rRNA and cytochrome c oxidase subunit I) to the phylogeny ofcentipedes (Myriapoda Chilopoda) an analysis of morphology andfour molecular loci J Zool Syst Evol Res 4289ndash134

Edgecombe GD Giribet G 2007 Evolutionary biology of centipedes(Myriapoda Chilopoda) Annu Rev Entomol 52151ndash170

Edgecombe GD Giribet G 2008 A New Zealand species of thetrans-Tasman centipede order Craterostigmomorpha(ArthropodaChilopoda) corroborated by molecular evidenceInvert Syst 221ndash15

Edgecombe GD Giribet G Wheeler WC 1999 Phylogeny of Chilopodacombining 18S and 28S rRNA sequences and morphology BoletınSociedad Entomol Aragonesa 26293ndash331

Edwards SV 2009 Is a new and general theory of molecular systematicsemerging Evolution 631ndash19

Foster P 2004 Modeling compositional heterogeneity Syst Biol 53485ndash495

Fu LM Niu BF Zhu ZW Wu ST Li WZ 2012 CD-HIT accelerated forclustering the next-generation sequencing data Bioinformatics 283150ndash3152

Giribet G Carranza S Riutort M Baguna J Ribera C 1999 Internalphylogeny of the Chilopoda (Myriapoda Arthropoda) using

complete 18S rDNA and partial 28S rDNA sequences Philos TransR Soc Lond B Biol Sci 354215ndash222

Giribet G Edgecombe GD 2006 Conflict between data sets andphylogeny of centipedes an analysis based on seven genes andmorphology Proc R Soc B Biol Sci 273531ndash538

Grabherr MG Haas BJ Yassour M Levin JZ Thompson DA Amit IAdiconis X Fan L Raychowdhury R Zeng Q et al 2011 Full-length transcriptome assembly from RNA-Seq data without a ref-erence genome Nat Biotechnol 29644ndash652

Grbic M Van Leeuwen T Clark RM Rombauts S Rouze P Grbic VOsborne EJ Dermauw W Ngoc PC Ortego F et al 2011 Thegenome of Tetranychus urticae reveals herbivorous pest adaptationsNature 479487ndash492

Grunewald S Spillner A Bastkowski S Bogershausen A Moulton V2013 SuperQ computing supernetworks from quartets IEEEACMTrans Comput Biol Bioinform 10151ndash160

Haas BJ Papanicolaou A Yassour M Grabherr M Blood PD Bowden JCouger MB Eccles D Li B Lieber M et al 2013 De novo tran-script sequence reconstruction from RNA-seq using the Trinityplatform for reference generation and analysis Nat Protoc 81494ndash1512

Hartmann S Helm C Nickel B Meyer M Struck TH Tiedemann RSelbig J Bleidorn C 2012 Exploiting gene families for phylogenomicanalysis of myzostomid transcriptome data PLoS One 7e29843

Hejnol A Obst M Stamatakis A Ott M Rouse GW Edgecombe GDMartinez P Baguna J Bailly X Jondelius U et al 2009 Assessing theroot of bilaterian animals with scalable phylogenomic methods ProcR Soc B Biol Sci 2764261ndash4270

Hilken G 1997 Tracheal systems in Chilopoda a comparison underphylogenetic aspects Entomol Scand Suppl 5149ndash60

Huson DH Bryant D 2006 Application of phylogenetic networks inevolutionary studies Mol Biol Evol 23254ndash267

Inoue J Donoghue PCJ Yang ZH 2010 The impact of the representationof fossil calibrations on Bayesian estimation of species divergencetimes Syst Biol 5974ndash89

Jeffroy O Brinkmann H Delsuc F Philippe H 2006 Phylogenomics thebeginning of incongruence Trends Genet 22225ndash231

Kearse M Moir R Wilson A Stones-Havas S Cheung M Sturrock SBuxton S Cooper A Markowitz S Duran C et al 2012 GeneiousBasic an integrated and extendable desktop software platform forthe organization and analysis of sequence data Bioinformatics 281647ndash1649

Kennedy D 2005 125 [Editorial] Science 30919Koch M Edgecombe GD 2012 The preoral chamber in geophilomorph

centipedes comparative morphology phylogeny and the evolutionof centipede feeding structures Zool J Linn Soc 1651ndash62

Kocot KM Cannon JT Todt C Citarella MR Kohn AB Meyer A SantosSR Schander C Moroz LL Lieb B et al 2011 Phylogenomics revealsdeep molluscan relationships Nature 477452ndash456

Kubatko LS Degnan JH 2007 Inconsistency of phylogenetic estimatesfrom concatenated data under coalescence Syst Biol 5617ndash24

Langmead B Trapnell C Pop M Salzberg SL 2009 Ultrafast andmemory-efficient alignment of short DNA sequences to thehuman genome Genome Biol 10R25

Lartillot N Philippe H 2004 A Bayesian mixture model for across-siteheterogeneities in the amino-acid replacement process Mol BiolEvol 211095ndash1109

Lartillot N Philippe H 2008 Improvement of molecular phylogeneticinference and the phylogeny of Bilateria Philos Trans R Soc Lond BBiol Sci 3631463ndash1472

Lartillot N Rodrigue N Stubbs D Richer J 2013 PhyloBayes MPI phy-logenetic reconstruction with infinite mixtures of profiles in a par-allel environment Syst Biol 62611ndash615

Le SQ Dang CC Gascuel O 2012 Modeling protein evolution withseveral amino acid replacement matrices depending on site ratesMol Biol Evol 292921ndash2936

Lee MSY Soubrier J Edgecombe GD 2013 Rates of phenotypicand genomic evolution during the Cambrian Explosion Curr Biol231ndash7

1512

Fernandez et al doi101093molbevmsu108 MBE at H

arvard Library on June 12 2014

httpmbeoxfordjournalsorg

Dow

nloaded from

Leigh JW Schliep K Lopez P Bapteste E 2011 Let them fall where theymay congruence analysis in massive phylogenetically messy datasets Mol Biol Evol 282773ndash2785

Lemmon AR Brown JM Stanger-Hall K Lemmon EM 2009The effect of ambiguous data on phylogenetic estimates obtainedby maximum likelihood and Bayesian inference Syst Biol 58130ndash145

Mallatt J Giribet G 2006 Further use of nearly complete 28S and 18SrRNA genes to classify Ecdysozoa 37 more arthropods and a kinor-hynch Mol Phylogenet Evol 40772ndash794

Meusemann K von Reumont BM Simon S Roeding F Strauss S Kuck PEbersberger I Walzl M Pass G Breuers S et al 2010 A phylogenomicapproach to resolve the arthropod tree of life Mol Biol Evol 272451ndash2464

Muller CHG Meyer-Rochow VB 2006 Fine structural organization ofthe lateral ocelli in two species of Scolopendra (ChilopodaPleurostigmophora) an evolutionary evaluation Zoomorphology12513ndash26

Murienne J Edgecombe GD Giribet G 2010 Including secondary struc-ture fossils and molecular dating in the centipede tree of life MolPhylogenet Evol 57301ndash313

Nosenko T Schreiber F Adamska M Adamski M Eitel M Hammel JMaldonado M Muller WE Nickel M Schierwater B et al 2013 Deepmetazoan phylogeny when different genes tell different stories MolPhylogenet Evol 67223ndash233

Philippe H Brinkmann H Lavrov DV Littlewood DTJ Manuel MWorheide G Baurain D 2011 Resolving difficult phylogeneticquestions why more sequences are not enough PLoS Biol 9e1000602

Philippe H Telford MJ 2006 Large-scale sequencing and the new animalphylogeny Trends Ecol Evol 21614ndash620

Pick KS Philippe H Schreiber F Erpenbeck D Jackson DJWrede P Wiens M Alie A Morgenstern B Manuel Met al 2010 Improved phylogenomic taxon sampling notice-ably affects nonbilaterian relationships Mol Biol Evol 271983ndash1987

Pocock RI 1902 A new and annectant type of chilopod Quart J MicroscSci 45417ndash448

Price MN Dehal PS Arkin AP 2010 FastTree 2-approximately maximum-likelihood trees for large alignmentsPLoS One 5e9490

Prunescu C-C Prunescu P 1996 Supernumerary Malpighian tubules inchilopods Acta Myriapodol 169437ndash440

Prunescu C-C Prunescu P 2006 Rudimentary supernumeraryMalpighian tubules in the order Craterostigmomorpha Pocock1902 Norwegian J Entomol 53113ndash118

Regier JC Shultz JW Zwick A Hussey A Ball B Wetzer R Martin JWCunningham CW 2010 Arthropod relationships revealed by phy-logenomic analysis of nuclear protein-coding sequences Nature 4631079ndash1083

Regier JC Wilson HM Shultz JW 2005 Phylogenetic analysis ofMyriapoda using three nuclear protein-coding genes MolPhylogenet Evol 34147ndash158

Rehm P Borner J Meusemann K von Reumont BM Simon S Hadrys HMisof B Burmester T 2011 Dating the arthropod tree based onlarge-scale transcriptome data Mol Phylogenet Evol 61880ndash887

Rota-Stabelli O Daley AC Pisani D 2013 Molecular timetrees reveal aCambrian colonization of land and a new scenario for ecdysozoanevolution Curr Biol 23392ndash398

Roure B Baurain D Philippe H 2013 Impact of missing data on phy-logenies inferred from empirical phylogenomic data sets Mol BiolEvol 30197ndash214

Salichos L Rokas A 2011 Evaluating ortholog prediction algorithms in ayeast model clade PLoS One 6e18755

Salichos L Rokas A 2013 Inferring ancient divergences requires geneswith strong phylogenetic signals Nature 497327ndash331

Salichos L Stamatakis A Rokas A 2014 Novel information theory-basedmeasures for quantifying incongruence among phylogenetic treesMol Biol Evol 31(5)1261ndash1271

Shear WA Bonamo PM 1988 Devonobiomorpha a new order of cen-tipeds (Chilopoda) from the middle Devonian of Gilboa New YorkState USA and the phylogeny of centiped orders Am MusNovitates 29271ndash30

Shear WA Edgecombe GD 2010 The geological record and phylogenyof Myriapoda Arthropod Struct Dev 39174ndash190

Shear WA Jeram AJ Selden PA 1998 Centiped legs (ArthropodaChilopoda Scutigeromorpha) from the Silurian and Devonian ofBritain and the Devonian of North America Am Mus Novitates32311ndash16

Shultz JW Regier JC 1997 Progress toward a molecular phylogeny of thecentipede orders (Chilopoda) Entomol Scand Suppl 5125ndash32

Smith S Wilson NG Goetz F Feehery C Andrade SCS Rouse GWGiribet G Dunn CW 2011 Resolving the evolutionary relationshipsof molluscs with phylogenomic tools Nature 480364ndash367

Smith SA Dunn CW 2008 Phyutility a phyloinformatics tool for treesalignments and molecular data Bioinformatics 24715ndash716

Struck TH Paul C Hill N Hartmann S Hosel C Kube M Lieb B Meyer ATiedemann R Purschke G et al 2011 Phylogenomic analyses un-ravel annelid evolution Nature 47195ndash98

Sukumaran J Holder MT 2010 DendroPy a Python library for phylo-genetic computing Bioinformatics 261569ndash1571

Verhoeff KW 1902ndash1925 Gliederfussler Arthropoda Klasse ChilopodaIn Klassen und Ordnungen des Tierreichs 5 abt 2 Buch 1 Leipzig(Germany) Akademische Verlagsgesellschaft p 725

von Reumont BM Jenner RA Wills MA Dellrsquoampio E Pass GEbersberger I Meyer B Koenemann S Iliffe TM Stamatakis Aet al 2012 Pancrustacean phylogeny in the light of new phyloge-nomic data support for Remipedia as the possible sister group ofHexapoda Mol Biol Evol 291031ndash1045

Wilson HM Anderson LI 2004 Morphology and taxonomy of Paleozoicmillipedes (Diplopoda Chilognatha Archipolypoda) from ScotlandJ Paleontol 78169ndash184

Wirkner CS Pass G 2002 The circulatory system in Chilopodafunctional morphology and phylogenetic aspects Acta Zool 83193ndash202

Wu M Chatterji S Eisen JA 2012 Accounting for alignment uncertaintyin phylogenomics PLoS One 7e30288

Yang Z Rannala B 2006 Bayesian estimation of species divergence timesunder a molecular clock using multiple fossil calibrations with softbounds Mol Biol Evol 23212ndash226

Zoller S Schneider A 2013 Improving phylogenetic inference with asemiempirical amino acid substitution model Mol Biol Evol 30469ndash479

1513

Evaluation of Topological Conflict in Centipede Phylogeny doi101093molbevmsu108 MBE at H

arvard Library on June 12 2014

httpmbeoxfordjournalsorg

Dow

nloaded from

Page 3: Evaluating Topological Conflict in Centipede Phylogeny ...€¦ · Scutigera coleoptrata Craterostigmus tasmanianus Strigamia maritima Himantarium gabrielis Alipes grandidieri Cryptops

0606

Peripatopsis capensis

Tetranychus urticae

Ixodes scapularis

Daphnia pulex

Scutigera coleoptrata

Craterostigmus tasmanianus

Strigamia maritima

Himantarium gabrielis

Alipes grandidieri

Cryptops hortensis

10979688

09509289100

1---

1934 orthogroups

131 orthogroups

1001 orthogroups

1543 orthogroups

1199100

EP

IMO

RP

HA

AM

ALP

IGH

IATA

PLE

UR

OS

TIG

MO

PH

OR

A

CH

ILOP

OD

AArchispirostreptus gigas

FIG 2 Bayesian phylogenomic hypothesis of centipede interrelationships Nodal support values (posterior probabilityShimodaira-Hasegawa-likesupportbootstrapbootstrap) of the four different analyses (Bayesian inferenceML with PhyML-PCMAML with RAxMLML with RAxML for thereduced data set) are indicated in each case Ln L PhyML-PCMA 4768683119 Ln L RAxML 4762759821600 Ln L RAxML 799809006206Asterisks indicate support values of 11100100 The size of the circles is proportional to the number of orthologs in each species for the large data setInset indicates the 389-gene-tree topology (reduced data set) with myriapod monophyly

Table 1 Assembly Parameters

N RawReads

N ReadsAT

Discarded

Reads

NRMC NContigs

NBases(Mb)

AverageLength of

Contigs

MaximumContig

Length (bp)

N50 Total NOrthologs

N SelectedOrthologs

MissingData ()

Scutigeracoleoptrata

26072840 25870392 07 23546281 178854 952 5322 16337 684 9932 1472 245

Craterostigmustasmanianus

76698082 31313104 591 30257748 99546 646 6494 18029 964 7686 1430 26

Lithobius forficatus 77628252 75849229 22 72202535 74544 344 4619 11715 530 5495 1267 414

Strigamia maritima mdash mdash mdash mdash 24079 1762 119555 1344237 24745 7583 1543 174

Himantariumgabrielis

54681444 53535366 20 51089766 37200 218 5869 17609 782 5605 1139 454

Alipes grandidieri 32332034 30561861 54 28272411 138229 703 5089 9801 631 9441 1299 359

Cryptops hortensis 34611798 21731914 372 20594894 75753 455 6012 10232 803 8821 1274 391

Peripatopsiscapensis

40774042 40774042 0 mdash 92516 356 3848 10252 386 5148 1001 539

Tetranychus urticae mdash mdash mdash mdash 18313 mdash mdash mdash mdash 7775 683 773

Ixodes scapularis mdash mdash mdash mdash 20473 mdash mdash mdash mdash 6891 1368 274

Daphnia pulex mdash mdash mdash mdash 18989 1972 380007 4193030 49250 10070 1651 126

Archispirostreptusgigas

mdash mdash mdash mdash 4008 199 4958 768 560 530 131 963

NOTEmdashN number AT after thinning and trimming NRMC number of reads matched to contigs Percentage of missing data was calculated in relation to the final matrix(526145 amino acids) Total number of orthologs corresponds to all the orthologs recovered per taxon Number of selected orthologs is based on a minimum taxon occupancyof six (ie only the orthologs shared by at least six taxa were selected for the phylogenomic analyses) Asterisks for chelicerates indicate number of peptide sequences downloadedfrom annotated genome projects for analysis with OMA

1502

Fernandez et al doi101093molbevmsu108 MBE at H

arvard Library on June 12 2014

httpmbeoxfordjournalsorg

Dow

nloaded from

were affecting our phylogenetic results we considered asecond supermatrix that included the 389 genes thatpassed individual tests of compositional heterogeneity andwhich furthermore fell in the least saturated quartile of thisstationary set (see Materials and Methods) Finally to reducethe percentage of missing data a smaller but more completesupermatrix was constructed by choosing a taxon occupancyof 11 (ie orthologs present in 11 or more taxa were selected)and removing A gigas to obviate entirely the effects of thelarge amount of missing data in this taxon This yielded amatrix composed of 61 orthologs with a gene occupancyper species ranging from 89 to 100 The total number ofamino acid sites after ZORRO-masked ortholog concatena-tion for this matrix was 13106

Centipede Phylogeny and Dating

All phylogenetic analyses of the three constructed superma-trices agree on the monophyly of Chilopoda Pleurostig-mophora and a clade composed of LithobiomorphaScolopendromorpha and Geophilomorpha thus placingCraterostigmomorpha as the sister group of the remainingpleurostigmophorans (fig 2 supplementary fig S1Supplementary Material online) Resampling support forChilopoda and for the clade containing LithobiomorphaScolopendromorpha and Geophilomorpha is absolute andis also high for Epimorpha and a clade composed ofCraterostigmomorpha and the remaining pleurostigmophor-ans in most analyses (fig 4)

Both the concatenated matrix overall and each taxonwithin it show evidence of substantial compositional hetero-geneity in a w2 test based on a simulated homogeneous nulldistribution (Foster 2004) (P values for all taxa were lt102)However a quartet supernetwork constructed from a morerestricted subset of genes that individually pass this compo-sitional heterogeneity test and which additionally fall into theleast saturated quartile of these stationary genes (Foster2004) demonstrates a network topology broadly comparable(but see later) with the tree topology of our concatenatedanalyses (fig 5) indicating that the compositional heteroge-neity though present is likely not driving the recoveredtopology

Furthermore although missing data are distributed in acomplex manner across the tree the number of genes poten-tially decisive for each internal node is high (eg gt800 for allnodes in Chilopoda in the 1934 gene analysis) indicating thatthe high support recovered in analyses of our largest matricesis not likely an artifact of the pattern of missing data(DellrsquoAmpio et al 2014 see also supplementary fig S1Supplementary Material online)

Our individual maximum likelihood (ML) gene-tree anal-yses display a complex pattern of topological discordance andmissing data Using quartet supernetworks we visualized thepredominant splits within these gene trees both for the com-plete set considered in our largest supermatrix and for thereduced set of compositionally homogeneous least-saturatedgenes (figs 4 and 5) Network topologies are similar in bothsets although there is a greater degree of conflict displayed

(particularly among outgroup species) in the reduced setperhaps the result of increased sampling error in this morelimited set of slowly evolving genes (Betancur-R et al 2014)Although both supernetworks display a tree-like pattern ofsplits reminiscent in many respects to the concatenated to-pologies (fig 2) they also both suggest the existence of con-siderable among-gene conflict in the relative position ofCraterostigmus with some genes suggesting a sister-grouprelationship with Scutigera in contrast to the concatenatedtopology We therefore quantified the number of genes con-gruent with either topology and also with a number of sa-lient hypotheses on centipede interrelationships (fig 6) Thesegene support frequencies largely favor the splits recovered inthe concatenated topology with for example 303 of genesgrouping Craterostigmomorpha and the remaining pleuros-tigmophorans and only 211 grouping Scutigeromorphaand Craterostigmomorpha However relationships withinthe clade formed by Lithobiomorpha and Epimorphaare somewhat more conflicted with 246 of genes sup-porting Epimorpha versus 213 supporting Lithobiomorpha + Geophilomorpha

The dated phylogenies for the 389 gene matrix undereither an autocorrelated log normal (fig 3 top) or uncorre-lated gamma model (fig 3 bottom) support a diversificationof Chilopoda between the Early Ordovician and the MiddleDevonian diversification of Pleurostigmophora in the MiddleOrdovician to Early Carboniferous and the diversification ofthe clade uniting Lithobiomorpha Scolopendromorpha andGeophilomorpha in the Silurian-Carboniferous The diversifi-cation of Epimorpha ranges from the Early Devonian to theearly Permian (fig 3)

Discussion

Phylogenomic Hypothesis of CentipedeInterrelationships

Resolving the Tree of Life has been prioritized as one of the125 most important unsolved scientific questions in 2005 byScience (Kennedy 2005) and the advent of phylogenomics hasaided in resolving many contentious aspects in animal phy-logeny (eg Philippe and Telford 2006 Dunn et al 2008Bleidorn et al 2009 Hejnol et al 2009 Meusemann et al2010 Kocot et al 2011 Rehm et al 2011 Smith et al 2011Struck et al 2011 Hartmann et al 2012 von Reumont et al2012) This is not without controversy and several phenom-ena unique to genome-scale data have been identified asnegatively impacting tree reconstruction in this paradigmperhaps foremost among these being gene occupancy (miss-ing data) (Roure et al 2013 DellrsquoAmpio et al 2014) taxonsampling (Pick et al 2010) and quality of data (eg properortholog assignment and controls on exogenous contamina-tion) (Philippe et al 2011 Salichos and Rokas 2011) In addi-tion assumptions underlying concatenation (Salichos andRokas 2013) and model (mis-)specification (Lartillot andPhilippe 2008) have also been identified as possible pitfallsfor tree reconstruction in a phylogenomic framework Finallyit has been shown that conflict may exist between classes ofgenes with some advocating exclusive usage of slowly

1503

Evaluation of Topological Conflict in Centipede Phylogeny doi101093molbevmsu108 MBE at H

arvard Library on June 12 2014

httpmbeoxfordjournalsorg

Dow

nloaded from

evolving genes to resolve deep metazoan splits (eg Nosenkoet al 2013) although others believe that slow-evolving genesmay not be able to resolve deep splits due to the lack ofsufficient signal

We have worked carefully to try to address all these issuesin our analyses First we have minimized the amount of miss-ing data and worked with one of the most complete phylo-genomic matrices for nonmodel invertebrates based ongenomes and deeply sequenced transcriptomes with the ex-ception of the millipede outgroup transcriptome(Meusemann et al 2010) Although debate exists as towhether or not large amounts of missing data negativelyaffect phylogeny reconstruction (Hejnol et al 2009Lemmon et al 2009 DellrsquoAmpio et al 2014) Roure et al(2013) concluded that although the effects of missing datacan have a negative impact in certain situations other issueshave a more direct effect including modeling among-sitesubstitutional heterogeneity they recommended graphicallydisplaying the amount of missing data in phylogenetic treesas done for example by Hejnol et al (2009) or as presentedhere (fig 2) Given our low levels of missing data and high ma-trix completeness (supplementary table S2 Supplementary

Material online) it is unlikely that our results are affectedby gene occupancy or missing data Likewise taxon samplinghas been optimized to represent all major centipede lineagesNo major hypothesis on deep centipede phylogenetics re-mains untested with our sampling although it may be advis-able to add additional species for each order in future studiesespecially of Scutigeromorpha and Lithobiomorpha whosepositions show incomplete support or substantial among-gene conflict perhaps because they are represented by asingle species each (figs 4ndash6)

It has been argued that in the case of strong gene-treeincongruence (eg from incomplete lineage sorting horizon-tal gene transfer or cryptic duplicationloss) concatenatedanalysis may be misleading causing an erroneous species treeto be inferred and furthermore with inflated resampling sup-port giving no indication of intergene conflict (eg Jeffroyet al 2006 Kubatko and Degnan 2007 Nosenko et al 2013Salichos and Rokas 2013) Many therefore question exclusivereliance on concatenation and instead argue for the applica-tion of a model of the source of incongruence (eg the emerg-ing species tree-methods based on the multispeciescoalescent Edwards 2009) or for discarding data by selecting

CenozoicMesozoicPaleozoicPrecambrian

C O S D C P T J K P N Q

Peripatopsis capensis

Tetranychus urticae

Ixodes scapularis

Daphnia pulex

Archispirostreptus gigas

Scutigera coeloptrata

Craterostigmus tasmanianus

Strigamia maritima

Himantarium gabrielis

Cryptops hortensis

Alipes grandidieri

0100200300400500600

Peripatopsis capensis

Tetranychus urticae

Ixodes scapularis

Daphnia pulex

Archispirostreptus gigas

Scutigera coeloptrata

Craterostigmus tasmanianus

Strigamia maritima

Himantarium gabrielis

Cryptops hortensis

Alipes grandidieri

FIG 3 Chronogram of centipede evolution for the 389-gene data set with 95 highest posterior density (HPD) bar for the dating under autocorrelatedlog normal (top) and uncorrelated gamma model (bottom) Nodes that were calibrated with fossils are indicated with a star

1504

Fernandez et al doi101093molbevmsu108 MBE at H

arvard Library on June 12 2014

httpmbeoxfordjournalsorg

Dow

nloaded from

genes with strong phylogenetic signal and demonstrated ab-sence of significant incongruence when reconstructing an-cient divergences To address this possibility we alsoadopted a gene-tree perspective inferring individual MLtrees for each gene included in our concatenated matrixHowever many available methods of investigating gene-treeincongruence (including recent metrics such as internodecertainty and its derivatives Salichos and Rokas 2013Salichos et al 2014) can only be calculated for genes thatcontain the same set of taxa an assumption that our dataset (and many similar data sets generated by highly parallelsequencing) assuredly violates

We therefore visualized the dominant bipartitions amongthese gene trees by constructing a supernetwork (a general-ization of supertrees permitting reticulation where intergeneconflicts exist) using the SuperQ method (Grunewald et al2013) which decomposes all gene trees into quartets and

then infers a supernetwork from these quartets assigningedge lengths by examining quartet frequencies We inferredsupernetworks from both ML gene trees (figs 4 and 5) andthe majority rule consensus (MRC) trees from the bootstrapreplicates of each gene (figure not shown) These supernet-works display a predominantly tree-like structure topologi-cally resembling our concatenated species trees howeverthey indicate the presence of intergene conflict in the relativepositions of Craterostigmomorpha and Scutigeromorphawith some genes uniting these taxa as a clade in contrastto our species tree (figs 4 and 5) Interestingly the sameoverarching network structure is present in both the ML-tree-based and bootstrap MRC-based networks indicatingthat these conflicts are not merely the result of stochasticsampling error To examine how these phylogenetic conflictsare distributed among our input genes we employedConclustador which automatically assigns genes into

001Craterostigmus

tasmanianus

Scutigera

coleoptrata

Archispirostreptus gigas

Peripatopsis capensis

Ixodes

scapularis

Tetranychus

urticae

Daphnia pulexStrigamia

maritima

Himantarium

gabrielis

Lithobius

Cryptops

hortensis

Alipes

grandidieri

FIG 5 Supernetwork representation of quartets derived from individual ML gene trees considering only 389 genes showing evidence of compositionalhomogeneity and relatively low substitutional saturation

Craterostigmus

tasmanianus

Scutigera

coleoptrata

Archispirostreptus gigas

Peripatopsis capensis

Ixodes scapularis

Tetranychus urticae

Daphnia pulex

Strigamia maritima

Himantarium gabrielis

Cryptops hortensis

Alipes grandidieri

001

FIG 4 Supernetwork representation of quartets derived from individual ML gene trees for all 1934 genes concatenated in the supermatrix presented infigure 2 Phylogenetic conflict is represented by nontree-like splits (eg ScutigeromorphandashCraterostigmomorpha)

1505

Evaluation of Topological Conflict in Centipede Phylogeny doi101093molbevmsu108 MBE at H

arvard Library on June 12 2014

httpmbeoxfordjournalsorg

Dow

nloaded from

exclusive clusters if they display significant (and well sup-ported) areas of local topological incongruence by examiningbipartition distances among and within pseudoreplicate treesfor each gene (Leigh et al 2011) Intriguingly this method didnot find evidence for any well-supported local incongruencebetween genes in our data set (ie only one cluster containingall genes was selected) Although at first glance this result is atodds with the conflicts observed in our supernetwork analy-ses this seeming paradox could be reconciled if most or allgenes display the same patterns of topological conflict withintheir bootstrap replicates

To address gene-tree conflict and the distribution of miss-ing data in a more quantitative manner we counted thepotentially decisive genes and the frequency of genes withinthis set that are congruent with both the nodes in our con-catenated topology and with several alternative hypotheses(fig 6) Large numbers of genes (from 1115 to 1742 fig 6) areavailable to test interordinal relationships within ChilopodaThe gene support frequencies within these potentially deci-sive sets also clearly favor the nodes within our concatenatedtopology over competing hypotheses although we note thatthe potentially decisive gene sets are not directly comparableacross nodes as they contain different populations of genesdepending on the particular taxonomic distribution of miss-ing data We also emphasize that although the gene supportfrequencies appear relatively low (ie usually lt50) even fornodes in our concatenated topology these frequencies werederived from counts of fully bifurcating ML trees many ofwhich contain splits that are poorly supported due to limited-length alignments andor inappropriate rates of evolution forthe splits in question we hypothesize that these gene supportfrequencies would appear even more decisively in favor of the

concatenated topology if considering only well-supportedsplits

The LG4X + F and CAT + GTR models we employed areboth able to model site-heterogeneous patterns of molecularevolution to a degree and are therefore better able to detectmultiple substitutions an important issue for this data set asthe gene selection procedure we employed was agnostic tothe rate of molecular evolution (Lartillot and Philippe 2004Philippe et al 2011) Both however also assume stationarityof amino acid frequencies through time an assumption re-jected by a sensitive statistical test for the 1934 gene super-matrix To discern whether substitutional saturation andcompositional heterogeneity were influencing our phyloge-netic results we considered a subset of 389 genes that pass in-dividual tests of compositional heterogeneity and whichfurthermore fall in the least saturated quartile of this station-ary set A quartet supernetwork built from ML treesdrawn only within this subset of genes (fig 5) resemblesthe ingroup tree topology inferred from our concatenatedanalyses (fig 2) in supporting the monophyly ofLithobiomorpha + Epimorpha In the 389-gene quartetsupernetwork however we observe essentially the samegene-tree conflict as in the 1934 gene quartet supernetworkthat is disagreement on the relative position ofCraterostigmus with respect to Scutigeromorpha and theclade comprising the remaining pleurostigmophoran ordersThis conflict could be attributable in part to exacerbation ofgene-tree incongruence reported for slowly evolving parti-tions in analyses of genomic data sets due to the confluenteffect on reducing real internode length and the theoreticalexpectation of greater incongruence for short internodes(Salichos and Rokas 2013 but see Betancur-R et al 2014)

Geophilomorpha

Lithobiomorpha

Scolopendromorpha

250

1173

551

1684

213

327

Geophilomorpha

Lithobiomorpha

Scolopendromorpha

200

1115

570

1742

179

327

Craterostigmus tasmanianus

Strigamia maritima

Himantarium gabrielis

Alipes grandidieri

Cryptops hortensis

Scutigera coleoptrata

Archispirostreptus gigas

Daphnia pulex

Ixodes scapularis

Tetranychus urticae

Peripatopsis capensis

593503

848

670

876765

394

1603246

503

876

246

670

394

765

593

380

1254303

413

1365

303

712

1234

577

42

119

353

139

471295

214

1371156

Amalpighiata

Craterostigmomorpha714

1195598

376

1603235

Scutigeromorpha

Amalpighiata

229

1090211

918

1536598

229

1090211

598

Scutigeromorpha

Craterostigmomorpha

388

1207206

1412146

322

206

322

Epimorpha

Lithobiomorpha

Craterostigmomorpha

FIG 6 Counts of potentially decisive genes and (within this set) counts of actually congruent genes computed for each node in the concatenatedtopology (left) as well as for select alternative hypotheses of centipede interrelationships (right) Potentially decisive genes printed below nodecongruent genes in this set printed above node and gene support percentage printed to side

1506

Fernandez et al doi101093molbevmsu108 MBE at H

arvard Library on June 12 2014

httpmbeoxfordjournalsorg

Dow

nloaded from

Finally our amino acid level transcriptome-based phylog-eny is highly similar to previous hypotheses on centipedephylogenetics using much smaller sets of genes (Edgecombeet al 1999 Giribet et al 1999 Edgecombe and Giribet 2004Mallatt and Giribet 2006 Murienne et al 2010) and findsstrong support for such accepted groups as Pleurostigmo-phora Epimorpha Scolopendromorpha and Geophilomorpha thus supporting the idea that our morphologicallysurprising result for Lithobiomorpha + Epimorpha is notartifactual

Evolutionary Implications

Two salient evolutionary implications of this study are 1) thebasal position of Craterostigmomorpha with respect toLithobiomorpha a result anticipated in prior phylogeneticanalyses based on molecular data (Mallatt and Giribet 2006Regier et al 2010) and 2) the superior reconciliation ofthe new topology with the fossil record of centipedes(Edgecombe 2011b) thus improving upon prior dating studies(Murienne et al 2010) The former has consequences in ourinterpretation of an important number of morphologicaltraits thought to be shared by the clade Phylactometria(see Implications for Morphological and BehavioralCharacters section) and requires a taxonomic proposal for aclade including Lithobiomorpha Scolopendromorpha andGeophilomorpha We name this group Amalpighiata onthe basis of the three constituent orders lacking the supernu-merary Malpighian tubules found in Scutigeromorpha andCraterostigmomorpha Amalpighiata appears well supportedand shows no strong conflict among the analyzed genesAmong-gene conflict and also relatively lower nodalsupport in the concatenated analysis exists for the relativepositions of Craterostigmus and Scutigera although the mono-phyly of Pleurostigmophora (Amalpighiata + Craterostigmo-morpha) receives support in several analyses

The second implication of the Amalpighiata hypothesisaccounts for one of the inconsistencies in the centipedefossil record under the Phylactometria hypothesis Stem-group Lithobiomorpha are predicted to have originated byat least the Middle Devonian (under either of the two com-peting hypotheses for the relationships of the DevonianDevonobius) (eg Murienne et al 2010) yet no Paleozoic oreven Mesozoic fossils are known for Lithobiomorpha Theirfossil record does not extend back further than the Eocene(when they are represented by a few species of Lithobiidae inBaltic amber) This is no doubt a gross underestimate of theirreal age even the alternative Amalpighiata tree dates the splitbetween Lithobiomorpha and Epimorpha to at least theUpper Carboniferous (constrained by the earliest knownEpimorpha Scolopendromorpha) (fig 3) Still the Siluro-Devonian fossil record of Chilopoda consists only ofScutigeromorpha and the extinct Devonobiomorpha whichare allied to either Craterostigmomorpha (Borucki 1996) orEpimorpha (Shear and Bonamo 1988 Murienne et al 2010)The former hypothesis is consistent with an earlier divergenceof Craterostigmomorpha than Lithobiomorpha and could

thus account for the lack of mid-Paleozoic fossillithobiomorphs

That said the scattered records of Chilopoda in Mesozoicshales lithographic limestones and ambers which includeexemplars of Scutigeromorpha Scolopendromorpha andGeophilomorpha but not Lithobiomorpha (reviewed byEdgecombe 2011b) suggests that lithobiomorphs are under-represented in the fossil record as a whole and their apparentabsence in the Paleozoic is likely a taphonomic artifact TheAmalpighiata tree dates the origins of centipede orders withbetter fit to their first known fossil occurrences and diver-gence dates are concordant with the fossil record perhapsindicating that the Paleozoic fossil record of centipedes is notas incomplete as previously thought An additional interestingpoint of our data set is that the diversification of myriapods isSilurian-Devonian for the analysis under the autocorrelatedlog normal model and only our uncorrelated gamma modelsupports a Cambrian terrestrialization event as suggestedrecently for this lineage (Rota-Stabelli et al 2013)Additional transcriptomes should allow for a better estima-tion of the diversification dates of each order but should notaffect their origin Should additional taxon sampling continueto yield substantial among-gene incongruence and a succes-sion of interordinal divergences closely spaced together intime consistent with the scenario of an rapid radiation wenote that biological (as opposed to methodologicalartifac-tual) explanations for the incongruence such as incompletelineage sorting may need to be considered

Implications for Morphological andBehavioral Characters

A substantial list of characters has been tabled in defense of asister-group relationship between Craterostigmus andEpimorpha They include the following putative synapomor-phies (Edgecombe 2011a)

1) Brooding involving the mother guarding the egg clusterby wrapping her body around it

2) Extraordinarily high cell numbers in the lateral ocelli(eg more than 1000 retinula cells)

3) Proximal retinula cells partly with monodirectionalrhabdomeres

4) Maxillary nephridia lacking in postembryonic stadia5) Rigidity of the forcipules a character complex including

the forcipular pleurite arching over the coxosternite thehinge of the coxosternite being sclerotized and inflexi-ble the coxosternite deeply embedded into the cuticleabove the first pedigerous trunk segment and a trun-cation of sternal muscles in the forcipular segmentrather than extending into the head

6) Presternites distinct7) Separate sternal and lateral longitudinal muscles8) Lateral testicular vesicles linked by a central posteriorly

extended deferens duct9) Coxal organs confined to the last leg pair

10) Internal valves formed by lips of the ostia projectingdeeply into the heart lumen

1507

Evaluation of Topological Conflict in Centipede Phylogeny doi101093molbevmsu108 MBE at H

arvard Library on June 12 2014

httpmbeoxfordjournalsorg

Dow

nloaded from

Under the alternative Amalpighiata hypothesis in whichLithobiomorpha rather than Craterostigmus is sister group ofEpimorpha these characters are implied to be homoplasticThey are either general characters of Pleurostigmophora as awhole that have been secondarily modified (lost subject toreversal or otherwise transformed) in Lithobiomorpha orthey have been convergently acquired by Craterostigmusand Epimorpha

Craterostigmus has a single anamorphic stage in its lifecycle it hatches from the egg as a 12-legged instar and ac-quires the adult number of 15 leg-pairsmdashthe plesiomorphicnumber of legs in centipedesmdashin the succeeding instar Thiscontrasts with numerous anamorphic stages inScutigeromorpha and Lithobiomorpha which hatch with 4or 6ndash8 leg pairs respectively and keep adding segments untilreaching the definitive 15 leg pairs The Phylactometria hy-pothesis has generally viewed the ldquoreduced hemi-anamorphosisrdquo (Borucki 1996) of Craterostigmus as atransitional stage in the evolution of complete epimorphosisin Epimorpha The alternative hypothesis in whichCraterostigmus is sister group of all otherPleurostigmophora rejects this interpretation in favor oftwo independent reductions of anamorphic instars (or sec-ondary reacquisition of them in Lithobiomorpha) Aproposposterior segmentation in arthropods has been shown to beevolutionarily labile as demonstrated by the extreme case ofposterior segment reduction during mite embryogenesis withconcomitant loss of certain gene expression domains orentire genes altogether in the genome of the miteTetranychus urticae (Grbic et al 2011)

At present few characters have been identified that couldbe reinterpreted as shared derived characters ofLithobiomorpha and Epimorpha One noteworthy exampleis the absence of supernumerary Malpighian tubules InChilopoda these occur only in Scutigeromorpha and inCraterostigmus and their presence in these two ordersalone has been regarded as a plesiomorphic feature(Prunescu and Prunescu 1996 Prunescu and Prunescu2006) Their inferred loss in Lithobiomorpha and Epimorphawould imply a single step under the favored topology in thisstudy and gives the name to the clade Amalpighiata

Other potential apomorphies of Lithobiomorpha andEpimorpha involve characters that are shared byLithobiomorpha and Scolopendromorpha alone and arethus required to be lost or otherwise modified inGeophilomorpha

For example Craterostigmus shares three multicuspedteeth in the mandible with Scutigeromorpha in contrast tofour and five such teeth on opposing mandibles inLithobiomorpha and Scolopendromorpha Geophilomorphaprimitively lack teeth (Koch and Edgecombe 2012) so if atransformation from three teeth (common ancestor ofChilopoda) to fourfive teeth (common ancestorof Lithobiomorpha + Epimorpha) were envisioned the lossof mandibular teeth in Geophilomorpha is a necessary part ofthe transformation series A more homoplastic character thatcould be postulated in support of Amalpighiata as a clade isthe presence of marginated tergites These are lacking in

Craterostigmus are ubiquitous in Lithobiomorpha and arevariably present in Scolopendromorpha (present inScolopocryptopinae and Scolopendridae) Their absence inGeophilomorpha and several blind scolopendromorphclades (Newportiinae Cryptopidae and Plutoniumidae)however means that their status as a character forAmalpighiata would demand a few instances of reversalloss As such though it must be conceded that this schemeintroduces considerable homoplasy from the perspective ofmorphology a few characters can be suggested as potentialsynapomorphies for Amalpighiata

Materials and Methods

Sample Collection

We sampled six specimens representing the main groups ofcentipedes (MCZ voucher numbers in brackets) Scutigeracoleoptrata (IZ-20415) (Scutigeromorpha) Lithobius forficatus(IZ-131534) (Lithobiomorpha) Craterostigmus tasmanianus(IZ-128299) (Craterostigmomorpha) Alipes grandidieri (IZ-130616) Cryptops hortensis (IZ-130583) (Scolopen-dromorpha) and Himantarium gabrielis (IZ-131564) (Geophi-lomorpha) Information about the sampling localitiescollection details can be found in the MCZ online collectionsdatabase (httpmczbasemczharvardedu last accessedApril 1 2014) The following taxa were sampled as outgroupsPeripatopsis capensis (Onychophora a transcriptome) andIxodes scapularis and T urticae (Arthropoda Arachnida andAcari complete genomes) Samples were preserved in RNA-later (Ambion) or flash frozen in liquid nitrogen immediatelyafter collection A portion of the central part of each animal(including the body and legs) or the anterior part (in S coleop-trata including the head and a part of the trunk) was dissectedand stored at80 C until RNA extraction Three more taxaretrieved from external sources were also included Thegenome of Strigamia maritima (Geophilomorpha) was down-loaded from httpmetazoaensemblorgStrigamia_mari-timaInfoIndex (last accessed April 1 2014) The genome ofDaphnia pulex was retrieved from httpmetazoaensemblorgDaphnia_pulexInfoIndex (last accessed April 1 2014)The transcriptome of A gigas generated by 454 pyrosequen-cing was retrieved from Meusemann et al (2010)

mRNA Extraction

Total RNA was extracted with a standard trizol-basedmethod using TRIzol (Life Sciences) following the manufac-turerrsquos protocol Tissue fragments were disrupted with a drillin 500ml of TRIzol using an RNAse-free plastic pestle forgrinding TRIzol was added up to a total volume of 1 mlAfter 5 min incubating at room temperature (RT) 100 ml ofbromochloropropane was mixed by vortexing After incuba-tion at RT for 10 min the samples were centrifuged at16000 rpm for 15 min at 4 C Afterward the upper aqueouslayer was recovered mixed with 500 ml of isopropanol andincubated at 20 C overnight Total RNA precipitation wasmade as follows samples were centrifuged for 15 min at16000 rpm at 4 C the pellet was washed twice with 1 mlof 75 isopropanol and centrifuged at 16000 rpm at 4 C for

1508

Fernandez et al doi101093molbevmsu108 MBE at H

arvard Library on June 12 2014

httpmbeoxfordjournalsorg

Dow

nloaded from

15 and 5 min in the first and second washing step respec-tively air dried and eluted in 100ml of RNA Storage solution(Ambion) mRNA purification was done with the DynabeadsmRNA Purification Kit (Invitrogen) following manufacturerrsquosinstructions After incubation of total RNA at 65 C for 5 minthe samples were incubated for 30 min with 200ml of mag-netic beads in a rocker and washed twice with washing bufferThis step was repeated again with an incubation time of5 min mRNA was eluted in 15ml of Tris-HCl buffer Qualityof mRNA was measured with a pico RNA assay in an Agilent2100 Bioanalyzer (Agilent Technologies) Quantity was mea-sured with a RNA assay in Qubit fluorometer (LifeTechnologies)

cDNA Library Construction and Next-GenerationSequencing

cDNA libraries were constructed with the TruSeq for RNASample Preparation kit (Illumina) following the manufac-turerrsquos instructions Libraries for C tasmanianus and S coleop-trata were constructed in the Apollo 324 automated systemusing the PrepX mRNA kit (IntegenX) The samples weremarked with a different index to allow pooling for sequencingConcentration of the cDNA libraries was measured through adsDNA High Sensitivity (HS) assay in a Qubit fluorometer(Invitrogen) Library quality and size selection were checkedin an Agilent 2100 Bioanalyzer (Agilent Technologies) withthe HS DNA assay The samples were run using the IlluminaHiSeq 2500 platform with paired-end reads of 150 bp at theFAS Center for Systems Biology at Harvard University

Raw Data Sanitation and Sequence Assembly

Demultiplexed Illumina HiSeq 2500 sequencing results inFASTQ format were retrieved from the sequencing facilityvia FTP The files were decompressed and imported into CLCGenomics Workbench 551 retaining read header informa-tion as paired read files Each sample was quality filteredaccording to a threshold average quality score of 30 basedon a Phred scale and adaptor trimmed using a list of knownadaptors provided by Illumina All reads shorter than 25 bpwere discarded and the resulting files were exported as asingle FASTQ file

To simplify the assembly process ribosomal RNA (rRNA)was filtered out All known metazoan rRNA sequences weredownloaded from GenBank and formatted into bowtie indexusing ldquobowtie-buildrdquo Each sample was sequentially aligned to

the index allowing up to two mismatches via Bowtie 100(Langmead et al 2009) retaining all unaligned reads in FASTQformat Unaligned results stored as a single file were im-ported into Geneious 616 (Kearse et al 2012) paired usingread header information and exported as separate files for leftand right mate pairs This process was repeated for eachsample

De novo assemblies (strand specific in C tasmanianus andS coleoptrata) were done individually per organism in Trinity(Grabherr et al 2011 Haas et al 2013) using paired read filesand default parameters Raw reads and assembled sequenceshave been deposited in the National Center for BiotechnologyInformation Sequence Read Archive and TranscriptomeShotgun Assembly databases respectively (table 2)

Identification of Candidate Coding Regions

Redundancy reduction was done with CD-HIT-EST (Fu et al2012) in the raw assemblies (95 global similarity) Resultingassemblies were processed in TransDecoder (Haas et al 2013)to identify candidate ORFs within the transcripts Predictedpeptides were then processed with a further filter to selectonly one peptide per putative unigene by choosing the lon-gest ORF per Trinity subcomponent with a Python scriptthus removing the variation in the coding regions of our as-semblies due to by alternative splicing closely related para-logs and allelic diversity Peptide sequences with all finalcandidate ORFs were retained as multifasta files Sequencedata for Strigamia maritima and D pulex were obtained fromgenome assemblies as opposed to the de novo assembledtranscriptomes Annotated predicted peptide sequences forD pulex were downloaded from ldquowFleaBaserdquo and sequenceswere clustered using CD-HIT at a similarity of 98 for redun-dancy reduction Peptide sequences for Strigamia maritimawere downloaded from Ensembl and also clustered at 98similarity Results for both D pulex and Strigamia maritimawere converted to a single line multifasta file

Orthology Assignment

We assigned predicted ORFs into orthologous groups acrossall samples using OMA stand-alone v099t (Altenhoff et al2011 Altenhoff et al 2013) The advantages of the algorithmof OMA over standard bidirectional best-hit approaches relyon the use of evolutionary distances instead of scores con-sideration of distance inference uncertainty and differentialgene losses and inclusion of many-to-many orthologous re-lations The ortholog matrix is constructed from all-against-all

Table 2 Accession Numbers for the Raw Transcriptomes and Assemblies from the Taxa Newly Sequenced in This Study

BioProject BioSample Experiment Run

Scutigera coleoptrata PRJNA237135 SAMN02614634 SRX462011 SRR1158078

Craterostigmus tasmanianus PRJNA237134 SAMN02614633 SRX461877 SRR1157986

Lithobius forficatus PRJNA237133 SAMN02614632 SRX462145 SRR1159752

Himantarium gabrielis PRJNA237131 SAMN02614631 SRX461787 SRR1159787

Alipes grandidieri PRJNA179374 SAMN01816532 SRX205685 SRR619311

Cryptops hortensis PRJNA237130 SAMN02614630 SRX457664 SRR1153457

Peripatopsis capensis PRJNA236598 SAMN02598680 SRX451023 SRR1145776

1509

Evaluation of Topological Conflict in Centipede Phylogeny doi101093molbevmsu108 MBE at H

arvard Library on June 12 2014

httpmbeoxfordjournalsorg

Dow

nloaded from

Smith-Waterman protein alignments The program identifiesthe so called ldquostable pairsrdquo verifies them and checks againstpotential paralogous genes In a last step cliques of stablepairs are clustered as groups of orthologs All input fileswere single-line multifasta files and the parametersdrw filespecified retained all default settings with the exception ofldquoNPrdquo which was set at 300 We parallelized all-by-all localalignments across 256 cores of a single compute node imple-menting a custom Bash script allowing for execution of inde-pendent threads with at least 3 s between each instance ofOMA to minimize risk of collisions

Phylogenomic Analyses and Congruence Assessment

We constructed three different amino acid supermatricesFirst a large matrix was constructed by concatenating theset of OMA groups containing six or more taxa To increasegene occupancy and to reduce the percentage of missingdata a second matrix was created by selecting the orthologscontained in 11 or more taxa Because of the high percentageof missing data in A gigas this taxon was eliminated from thissupermatrix This supermatrix reconstruction approach wasselected over other available software (eg MARE) because itguarantees a minimum taxon occupancy per orthogrouptherefore resulting in a supermatrix with a good compromisebetween gene and taxon occupancy and missing data

The selected orthogroups (1934 and 61 for both matrixesrespectively) were aligned individually using MUSCLE version36 (Edgar 2004) To increase the signal-to-noise ratio andimprove the discriminatory power of phylogenetic methodswe applied a probabilistic character masking with ZORRO(Wu et al 2012) to account for alignment uncertainty runusing default parameters (and using FastTree 214 [Price et al2010] to produce guide trees) This program first calculatesthe probability of all alignments that pass through a specifiedmatched pair of residues using a pair hidden Markov modelframework and it then compares this value with the fullprobability of all alignments of the pair of sequences Theposterior probability value indicates how reliable the matchis (ie highly reliable if it is close to 1 ambiguous if it is close to0) It then sums up the probability that a particular columnwould appear over the alignment landscape and assigns aconfidence score between 0 and 10 to each column provid-ing an objective measure that has an explicit evolutionarymodel and is mathematically rigorous (Wu et al 2012) Wediscarded the positions assigned a confidence score below athreshold of 5 with a custom Python script prior to concat-enation (using Phyutility 26 Smith and Dunn 2008) and sub-sequent phylogenomic analyses

To discern if substitutional saturation and among-taxoncompositional heterogeneity were affecting our phylogeneticresults we considered a third supermatrix Using the p4Python library (Foster 2004) we performed a test of compo-sitional homogeneity using w2 statistics simulating a null dis-tribution using the TreecompoTestUsingSimulations()method (nSims = 100) We used the best-scoring ML treefrom our RAxML analyses as the phylogram on which to sim-ulate data using optimized parameters of a homogeneous

unpartitioned LG + I + 4 model We also conducted com-positional homogeneity tests on individual ortholog align-ments modeling a homogeneous null distribution on theML tree for each gene using the best-fitting model availablein p4 for that gene We characterized genewise substitutionalsaturation by regressing uncorrected amino acid distances onthe patristic distance implied by each ML tree in a p4 scriptand taking the slope of this linear regression as a measure ofsubstitutional saturation (Jeffroy et al 2006) We calculatedthe third quartile of the slope from among the set of com-positionally homogeneous genes with a slope between 0 and1 and then selected the set of ML trees from genes with aslope above this value (0465) for inclusion in a quartet super-network as described earlier The 389 genes that passed in-dividual tests of compositional heterogeneity and whichfurthermore fell in the least saturated quartile of this station-ary set were concatenated into a third supermatrix

ML inference was conducted with PhyML-PCMA (Zollerand Schneider 2013) and RAxML 775 (Berger et al 2011)Although RAxML uses a predetermined empirical model ofamino acid substitution PhyML-PCMA estimates a modelthrough the use of a principal component (PC) analysisThe obtained PCs describe the substitution rates that covarythe most among different protein families and thereforedefine a semiempirically determined parameterization foran amino acid substitution model specific to each data set(Zoller and Schneider 2013) We selected 20 PCs in thePhyML-PCMA analyses and empirical amino acid frequenciesPROTGAMMALG4XF was selected as the best model ofamino acid substitution for the unpartitioned RAxML analy-ses using a modification of the ProteinModelSelection perlwrapper to RAxML produced by A Stamatakis (httpscoh-itsorgexelixissoftwarehtml last accessed April 1 2014)Bootstrap values were estimated with 1000 replicates underthe rapid bootstrapping algorithm

Bayesian analysis was conducted with PhyloBayes MPI 14e(Lartillot et al 2013) using the site-heterogeneous CAT-GTRmodel of evolution (Lartillot and Philippe 2004) Three inde-pendent Markov chain Monte Carlo (MCMC) chains wererun for 5818ndash7471 cycles The initial 2000 trees (27ndash34)sampled in each MCMC run prior to convergence (judgedwhen maximum bipartition discrepancies across chainslt 01) were discarded as the burn-in period A 50 major-ity-rule consensus tree was then computed from the remain-ing 1537 trees (sampled every 10 cycles) combined from thethree independent runs

To investigate potential incongruence between individualgene trees we also inferred gene trees for each OMA groupincluded in our supermatrix For each aligned ZORRO-masked OMA group we selected a best-fitting model usinga ProteinModelSelection script modified to permit testing ofthe recent LG4M( + F) and LG4X( + F) models (Le et al 2012)Best-scoring ML trees were inferred for each gene under theselected model (with the gamma model of rate variation butno invariant term) from 20 replicates of parsimony startingtrees One hundred traditional (nonrapid) bootstrap repli-cates for each gene were also inferred A majority rule con-sensus for each gene was constructed using the SumTrees

1510

Fernandez et al doi101093molbevmsu108 MBE at H

arvard Library on June 12 2014

httpmbeoxfordjournalsorg

Dow

nloaded from

script from the DendroPy Python library (Sukumaran andHolder 2010) To visualize predominant intergene conflictsfor both ML and bootstrap MRC trees we employedSuperQ v11 (Grunewald et al 2013) selecting the ldquobalancedrdquoedge-weight optimization function and applying no filterSplitsTree v4131 (Huson and Bryant 2006) was then usedto visualize the resulting NEXUS files To investigate the dis-tribution of strongly supported conflict among our genetrees we employed Conclustador v04a (Leigh et al 2011)presently the only automated incongruence-detection algo-rithm readily applicable to data sets of this size We appliedthe spectral clustering algorithm on the bootstrap replicatesfor each OMA group limiting the maximum number of clus-ters to 15

To quantify the number of genes potentially decisive for agiven split and also the number within this set that actuallysupport the split in question (fig 6) we employed a customPython script to parse the set of individual gene trees If a treecontained at least one species in either descendant group fora given node plus at least two distinct species basal to thenode in question it was considered potentially decisive(forming minimally a quartet DellrsquoAmpio et al 2014) if thedescendants were monophyletic with respect to their relativeoutgroups that gene tree was considered congruent with thenode in question (although this count is agnostic to thetopology within the node in question) To quantify the rela-tive support for different hypotheses within our individualgene-tree analysis we displayed the above counts for allnodes in the 1934 gene ML topology (fig 6) and also forselect alternative scenarios of centipede interrelationshipsboth including historical hypotheses (see Introduction)and alternative topologies for Scutigeromorpha andCraterostigmomorpha suggested by our quartet supernet-works All Python custom scripts can be downloaded fromhttpsgithubcomclaumer (last accessed April 1 2014)

Fossil Calibrations

A few Paleozoic fossil occurrences constrain the timing ofdivergences between chilopod orders (for a review of themyriapod fossil record see Shear and Edgecombe 2010) TheSilurondashDevonian genus Crussolum (Shear et al 1998) is iden-tified as a stem-group scutigeromorph displaying derivedcharacters of total-group Scutigeromorpha but lackingsome apomorphies that are shared by all members of thescutigeromorph crown group (Edgecombe 2011b) The bestpreserved fossils of Crussolum are from the Early Devonian(Pragian) and Middle Devonian (Givetian) (Shear et al 1998Anderson and Trewin 2003) but the oldest confidently as-signed record is in the Late Silurian (early Pridolı) LudlowBone Bed in western England (at least 4192 My dating theend of the Pridolı) This dating constrains the split ofScutigeromorpha from Pleurostigmophora

The occurrence of Devonobius delta in the MiddleDevonian of New York at least 3827 My (date for the endof the Givetian) constrains the divergence betweenCraterostigmomorpha and the remaining pleurostigmophor-ans This minimum applies irrespective of whether

Devonobius is sister group of Craterostigmomorpha or ofEpimorpha which were equally parsimonious in previousanalyses (Edgecombe and Giribet 2004) The basal divergencein Epimorpha that is the split between Scolopendromorphaand Geophilomorpha is constrained by the oldest scolopen-dromorph Mazoscolopendra richardsoni in UpperCarboniferous (Westphalian D) deposits of Mazon Creek IL(at least 306 Ma) Available character data forMazoscolopendra are insufficient to establish whether it is astem- or crown-group scolopendromorph and it only pro-vides a minimum age for total-group Scolopendromorpha

Mid-Silurian body fossils of Diplopoda provide constraintson the divergence of Chilopoda and Diplopoda The diver-gence between these two groups is dated by the occurrenceof Archipolypoda (stem-group helminthomorphs) in the lateWenlock or early Ludlow (Wilson and Anderson 2004) Basedon these data (ages based on spores) the split betweenChilopoda and Diplopoda is at least as old as the end ofthe Gorstian Stage (early Ludlow) 4256 My

The split between Onychophora and Arthropoda wasdated between 528 My (the minimum age for Arthropodaused by Lee et al [2013] on the basis of the earliestRusophycus traces) and 558 My used as the root ofPanarthropoda (Lee et al 2013)

Divergence Time Estimates

Divergence dates were estimated using the Bayesian relaxedmolecular clock approach as implemented in PhyloBayesv33f (Lartillot et al 2013) To compare the performance ofthe autocorrelated log normal and uncorrelated gammamodels we estimated the divergence dates under bothmodels in the 389-gene data set A series of calibrationconstraints specified above in ldquoFossil calibrationsrdquo (see alsoMurienne et al 2010 Edgecombe 2011b) were used with softbounds (Yang and Rannala 2006) under a birthndashdeath priorin PhyloBayes this strategy having been found to provide thebest compromise for dating estimates (Inoue et al 2010) Twoindependent MCMC chains were run for 5000ndash6000 cyclessampling posterior rates and dates every 10 cycles The initial20 of the cycles were discarded as burn-in Posteriorestimates of divergence dates were then computed fromapproximately the last 4000 samples of each chain

Supplementary MaterialSupplementary figure S1 and tables S1 and S2 are available atMolecular Biology and Evolution online (httpwwwmbeoxfordjournalsorg)

Acknowledgments

Mark Harvey kindly provided the sample of Craterostigmustasmanianus and Savel Daniels the sample of Peripatopsiscapensis The Research Computing group at the HarvardFaculty of Arts and Sciences was instrumental in facilitatingthe computational resources required Sarah Bastkowski iskindly thanked for her advice on the usage of SuperQ v11This work was supported by internal funds from the Museumof Comparative Zoology and by a postdoctoral fellowship

1511

Evaluation of Topological Conflict in Centipede Phylogeny doi101093molbevmsu108 MBE at H

arvard Library on June 12 2014

httpmbeoxfordjournalsorg

Dow

nloaded from

from the Fundacion Ramon Areces to RF Five anonymousreviewers and the editor provided comments that helped toimprove this article

ReferencesAltenhoff AM Gil M Gonnet GH Dessimoz C 2013 Inferring hierar-

chical orthologous groups from orthologous gene pairs PLoS One 8e53786

Altenhoff AM Schneider A Gonnet GH Dessimoz C 2011 OMA 2011orthology inference among 1000 complete genomes Nucleic AcidsRes 39D289ndashD294

Anderson LI Trewin NH 2003 An Early Devonian arthropod fauna fromthe Windyfield Cherts Aberdeenshire Scotland Palaeontology 46457ndash509

Ax P 2000 Multicellular animals Volume II The phylogenetic system ofthe Metazoa Berlin (Germany) Springer

Berger SA Krompass D Stamatakis A 2011 Performance accuracy andWeb server for evolutionary placement of short sequence readsunder maximum likelihood Syst Biol 60291ndash302

Betancur-R R Naylor G Ortı G 2014 Conserved genes sampling errorand phylogenomic inference Syst Biol 63257ndash262

Bleidorn C Podsiadlowski L Zhong M Eeckhaut I Hartmann SHalanych KM Tiedemann R 2009 On the phylogenetic positionof Myzostomida can 77 genes get it wrong BMC Evol Biol 9150

Borucki H 1996 Evolution und phylogenetisches system der Chilopoda(Mandibulata Tracheata) Vehr naturwiss Ver Hamburg 3595ndash226

DellrsquoAmpio E Meusemann K Szucsich NU Peters RS Meyer B Borner JPetersen M Aberer AJ Stamatakis A Walzl MG et al 2014 Decisivedata sets in phylogenomics lessons from studies on the phyloge-netic relationships of primarily wingless insects Mol Biol Evol 31239ndash249

Dohle W 1985 Phylogenetic pathways in the Chilopoda BijdrDierkunde 5555ndash66

Dunn CW Hejnol A Matus DQ Pang K Browne WE Smith SA SeaverE Rouse GW Obst M Edgecombe GD et al 2008 Broad phyloge-nomic sampling improves resolution of the animal tree of lifeNature 452745ndash749

Edgar RC 2004 MUSCLE multiple sequence alignment with high accu-racy and high throughput Nucleic Acids Res 321792ndash1797

Edgecombe GD 2011a Chilopodamdashphylogeny In Minelli A editorTreatise on zoologymdashanatomy taxonomy biology TheMyriapoda Vol 1 Leiden and Boston Brill p 339ndash354

Edgecombe GD 2011b Chilopodamdashthe fossil history In Minelli Aeditor Treatise on zoologymdashanatomy taxonomy biology TheMyriapoda Vol 1 Leiden and Boston Brill p 355ndash361

Edgecombe GD Giribet G 2004 Adding mitochondrial sequence data(16S rRNA and cytochrome c oxidase subunit I) to the phylogeny ofcentipedes (Myriapoda Chilopoda) an analysis of morphology andfour molecular loci J Zool Syst Evol Res 4289ndash134

Edgecombe GD Giribet G 2007 Evolutionary biology of centipedes(Myriapoda Chilopoda) Annu Rev Entomol 52151ndash170

Edgecombe GD Giribet G 2008 A New Zealand species of thetrans-Tasman centipede order Craterostigmomorpha(ArthropodaChilopoda) corroborated by molecular evidenceInvert Syst 221ndash15

Edgecombe GD Giribet G Wheeler WC 1999 Phylogeny of Chilopodacombining 18S and 28S rRNA sequences and morphology BoletınSociedad Entomol Aragonesa 26293ndash331

Edwards SV 2009 Is a new and general theory of molecular systematicsemerging Evolution 631ndash19

Foster P 2004 Modeling compositional heterogeneity Syst Biol 53485ndash495

Fu LM Niu BF Zhu ZW Wu ST Li WZ 2012 CD-HIT accelerated forclustering the next-generation sequencing data Bioinformatics 283150ndash3152

Giribet G Carranza S Riutort M Baguna J Ribera C 1999 Internalphylogeny of the Chilopoda (Myriapoda Arthropoda) using

complete 18S rDNA and partial 28S rDNA sequences Philos TransR Soc Lond B Biol Sci 354215ndash222

Giribet G Edgecombe GD 2006 Conflict between data sets andphylogeny of centipedes an analysis based on seven genes andmorphology Proc R Soc B Biol Sci 273531ndash538

Grabherr MG Haas BJ Yassour M Levin JZ Thompson DA Amit IAdiconis X Fan L Raychowdhury R Zeng Q et al 2011 Full-length transcriptome assembly from RNA-Seq data without a ref-erence genome Nat Biotechnol 29644ndash652

Grbic M Van Leeuwen T Clark RM Rombauts S Rouze P Grbic VOsborne EJ Dermauw W Ngoc PC Ortego F et al 2011 Thegenome of Tetranychus urticae reveals herbivorous pest adaptationsNature 479487ndash492

Grunewald S Spillner A Bastkowski S Bogershausen A Moulton V2013 SuperQ computing supernetworks from quartets IEEEACMTrans Comput Biol Bioinform 10151ndash160

Haas BJ Papanicolaou A Yassour M Grabherr M Blood PD Bowden JCouger MB Eccles D Li B Lieber M et al 2013 De novo tran-script sequence reconstruction from RNA-seq using the Trinityplatform for reference generation and analysis Nat Protoc 81494ndash1512

Hartmann S Helm C Nickel B Meyer M Struck TH Tiedemann RSelbig J Bleidorn C 2012 Exploiting gene families for phylogenomicanalysis of myzostomid transcriptome data PLoS One 7e29843

Hejnol A Obst M Stamatakis A Ott M Rouse GW Edgecombe GDMartinez P Baguna J Bailly X Jondelius U et al 2009 Assessing theroot of bilaterian animals with scalable phylogenomic methods ProcR Soc B Biol Sci 2764261ndash4270

Hilken G 1997 Tracheal systems in Chilopoda a comparison underphylogenetic aspects Entomol Scand Suppl 5149ndash60

Huson DH Bryant D 2006 Application of phylogenetic networks inevolutionary studies Mol Biol Evol 23254ndash267

Inoue J Donoghue PCJ Yang ZH 2010 The impact of the representationof fossil calibrations on Bayesian estimation of species divergencetimes Syst Biol 5974ndash89

Jeffroy O Brinkmann H Delsuc F Philippe H 2006 Phylogenomics thebeginning of incongruence Trends Genet 22225ndash231

Kearse M Moir R Wilson A Stones-Havas S Cheung M Sturrock SBuxton S Cooper A Markowitz S Duran C et al 2012 GeneiousBasic an integrated and extendable desktop software platform forthe organization and analysis of sequence data Bioinformatics 281647ndash1649

Kennedy D 2005 125 [Editorial] Science 30919Koch M Edgecombe GD 2012 The preoral chamber in geophilomorph

centipedes comparative morphology phylogeny and the evolutionof centipede feeding structures Zool J Linn Soc 1651ndash62

Kocot KM Cannon JT Todt C Citarella MR Kohn AB Meyer A SantosSR Schander C Moroz LL Lieb B et al 2011 Phylogenomics revealsdeep molluscan relationships Nature 477452ndash456

Kubatko LS Degnan JH 2007 Inconsistency of phylogenetic estimatesfrom concatenated data under coalescence Syst Biol 5617ndash24

Langmead B Trapnell C Pop M Salzberg SL 2009 Ultrafast andmemory-efficient alignment of short DNA sequences to thehuman genome Genome Biol 10R25

Lartillot N Philippe H 2004 A Bayesian mixture model for across-siteheterogeneities in the amino-acid replacement process Mol BiolEvol 211095ndash1109

Lartillot N Philippe H 2008 Improvement of molecular phylogeneticinference and the phylogeny of Bilateria Philos Trans R Soc Lond BBiol Sci 3631463ndash1472

Lartillot N Rodrigue N Stubbs D Richer J 2013 PhyloBayes MPI phy-logenetic reconstruction with infinite mixtures of profiles in a par-allel environment Syst Biol 62611ndash615

Le SQ Dang CC Gascuel O 2012 Modeling protein evolution withseveral amino acid replacement matrices depending on site ratesMol Biol Evol 292921ndash2936

Lee MSY Soubrier J Edgecombe GD 2013 Rates of phenotypicand genomic evolution during the Cambrian Explosion Curr Biol231ndash7

1512

Fernandez et al doi101093molbevmsu108 MBE at H

arvard Library on June 12 2014

httpmbeoxfordjournalsorg

Dow

nloaded from

Leigh JW Schliep K Lopez P Bapteste E 2011 Let them fall where theymay congruence analysis in massive phylogenetically messy datasets Mol Biol Evol 282773ndash2785

Lemmon AR Brown JM Stanger-Hall K Lemmon EM 2009The effect of ambiguous data on phylogenetic estimates obtainedby maximum likelihood and Bayesian inference Syst Biol 58130ndash145

Mallatt J Giribet G 2006 Further use of nearly complete 28S and 18SrRNA genes to classify Ecdysozoa 37 more arthropods and a kinor-hynch Mol Phylogenet Evol 40772ndash794

Meusemann K von Reumont BM Simon S Roeding F Strauss S Kuck PEbersberger I Walzl M Pass G Breuers S et al 2010 A phylogenomicapproach to resolve the arthropod tree of life Mol Biol Evol 272451ndash2464

Muller CHG Meyer-Rochow VB 2006 Fine structural organization ofthe lateral ocelli in two species of Scolopendra (ChilopodaPleurostigmophora) an evolutionary evaluation Zoomorphology12513ndash26

Murienne J Edgecombe GD Giribet G 2010 Including secondary struc-ture fossils and molecular dating in the centipede tree of life MolPhylogenet Evol 57301ndash313

Nosenko T Schreiber F Adamska M Adamski M Eitel M Hammel JMaldonado M Muller WE Nickel M Schierwater B et al 2013 Deepmetazoan phylogeny when different genes tell different stories MolPhylogenet Evol 67223ndash233

Philippe H Brinkmann H Lavrov DV Littlewood DTJ Manuel MWorheide G Baurain D 2011 Resolving difficult phylogeneticquestions why more sequences are not enough PLoS Biol 9e1000602

Philippe H Telford MJ 2006 Large-scale sequencing and the new animalphylogeny Trends Ecol Evol 21614ndash620

Pick KS Philippe H Schreiber F Erpenbeck D Jackson DJWrede P Wiens M Alie A Morgenstern B Manuel Met al 2010 Improved phylogenomic taxon sampling notice-ably affects nonbilaterian relationships Mol Biol Evol 271983ndash1987

Pocock RI 1902 A new and annectant type of chilopod Quart J MicroscSci 45417ndash448

Price MN Dehal PS Arkin AP 2010 FastTree 2-approximately maximum-likelihood trees for large alignmentsPLoS One 5e9490

Prunescu C-C Prunescu P 1996 Supernumerary Malpighian tubules inchilopods Acta Myriapodol 169437ndash440

Prunescu C-C Prunescu P 2006 Rudimentary supernumeraryMalpighian tubules in the order Craterostigmomorpha Pocock1902 Norwegian J Entomol 53113ndash118

Regier JC Shultz JW Zwick A Hussey A Ball B Wetzer R Martin JWCunningham CW 2010 Arthropod relationships revealed by phy-logenomic analysis of nuclear protein-coding sequences Nature 4631079ndash1083

Regier JC Wilson HM Shultz JW 2005 Phylogenetic analysis ofMyriapoda using three nuclear protein-coding genes MolPhylogenet Evol 34147ndash158

Rehm P Borner J Meusemann K von Reumont BM Simon S Hadrys HMisof B Burmester T 2011 Dating the arthropod tree based onlarge-scale transcriptome data Mol Phylogenet Evol 61880ndash887

Rota-Stabelli O Daley AC Pisani D 2013 Molecular timetrees reveal aCambrian colonization of land and a new scenario for ecdysozoanevolution Curr Biol 23392ndash398

Roure B Baurain D Philippe H 2013 Impact of missing data on phy-logenies inferred from empirical phylogenomic data sets Mol BiolEvol 30197ndash214

Salichos L Rokas A 2011 Evaluating ortholog prediction algorithms in ayeast model clade PLoS One 6e18755

Salichos L Rokas A 2013 Inferring ancient divergences requires geneswith strong phylogenetic signals Nature 497327ndash331

Salichos L Stamatakis A Rokas A 2014 Novel information theory-basedmeasures for quantifying incongruence among phylogenetic treesMol Biol Evol 31(5)1261ndash1271

Shear WA Bonamo PM 1988 Devonobiomorpha a new order of cen-tipeds (Chilopoda) from the middle Devonian of Gilboa New YorkState USA and the phylogeny of centiped orders Am MusNovitates 29271ndash30

Shear WA Edgecombe GD 2010 The geological record and phylogenyof Myriapoda Arthropod Struct Dev 39174ndash190

Shear WA Jeram AJ Selden PA 1998 Centiped legs (ArthropodaChilopoda Scutigeromorpha) from the Silurian and Devonian ofBritain and the Devonian of North America Am Mus Novitates32311ndash16

Shultz JW Regier JC 1997 Progress toward a molecular phylogeny of thecentipede orders (Chilopoda) Entomol Scand Suppl 5125ndash32

Smith S Wilson NG Goetz F Feehery C Andrade SCS Rouse GWGiribet G Dunn CW 2011 Resolving the evolutionary relationshipsof molluscs with phylogenomic tools Nature 480364ndash367

Smith SA Dunn CW 2008 Phyutility a phyloinformatics tool for treesalignments and molecular data Bioinformatics 24715ndash716

Struck TH Paul C Hill N Hartmann S Hosel C Kube M Lieb B Meyer ATiedemann R Purschke G et al 2011 Phylogenomic analyses un-ravel annelid evolution Nature 47195ndash98

Sukumaran J Holder MT 2010 DendroPy a Python library for phylo-genetic computing Bioinformatics 261569ndash1571

Verhoeff KW 1902ndash1925 Gliederfussler Arthropoda Klasse ChilopodaIn Klassen und Ordnungen des Tierreichs 5 abt 2 Buch 1 Leipzig(Germany) Akademische Verlagsgesellschaft p 725

von Reumont BM Jenner RA Wills MA Dellrsquoampio E Pass GEbersberger I Meyer B Koenemann S Iliffe TM Stamatakis Aet al 2012 Pancrustacean phylogeny in the light of new phyloge-nomic data support for Remipedia as the possible sister group ofHexapoda Mol Biol Evol 291031ndash1045

Wilson HM Anderson LI 2004 Morphology and taxonomy of Paleozoicmillipedes (Diplopoda Chilognatha Archipolypoda) from ScotlandJ Paleontol 78169ndash184

Wirkner CS Pass G 2002 The circulatory system in Chilopodafunctional morphology and phylogenetic aspects Acta Zool 83193ndash202

Wu M Chatterji S Eisen JA 2012 Accounting for alignment uncertaintyin phylogenomics PLoS One 7e30288

Yang Z Rannala B 2006 Bayesian estimation of species divergence timesunder a molecular clock using multiple fossil calibrations with softbounds Mol Biol Evol 23212ndash226

Zoller S Schneider A 2013 Improving phylogenetic inference with asemiempirical amino acid substitution model Mol Biol Evol 30469ndash479

1513

Evaluation of Topological Conflict in Centipede Phylogeny doi101093molbevmsu108 MBE at H

arvard Library on June 12 2014

httpmbeoxfordjournalsorg

Dow

nloaded from

Page 4: Evaluating Topological Conflict in Centipede Phylogeny ...€¦ · Scutigera coleoptrata Craterostigmus tasmanianus Strigamia maritima Himantarium gabrielis Alipes grandidieri Cryptops

were affecting our phylogenetic results we considered asecond supermatrix that included the 389 genes thatpassed individual tests of compositional heterogeneity andwhich furthermore fell in the least saturated quartile of thisstationary set (see Materials and Methods) Finally to reducethe percentage of missing data a smaller but more completesupermatrix was constructed by choosing a taxon occupancyof 11 (ie orthologs present in 11 or more taxa were selected)and removing A gigas to obviate entirely the effects of thelarge amount of missing data in this taxon This yielded amatrix composed of 61 orthologs with a gene occupancyper species ranging from 89 to 100 The total number ofamino acid sites after ZORRO-masked ortholog concatena-tion for this matrix was 13106

Centipede Phylogeny and Dating

All phylogenetic analyses of the three constructed superma-trices agree on the monophyly of Chilopoda Pleurostig-mophora and a clade composed of LithobiomorphaScolopendromorpha and Geophilomorpha thus placingCraterostigmomorpha as the sister group of the remainingpleurostigmophorans (fig 2 supplementary fig S1Supplementary Material online) Resampling support forChilopoda and for the clade containing LithobiomorphaScolopendromorpha and Geophilomorpha is absolute andis also high for Epimorpha and a clade composed ofCraterostigmomorpha and the remaining pleurostigmophor-ans in most analyses (fig 4)

Both the concatenated matrix overall and each taxonwithin it show evidence of substantial compositional hetero-geneity in a w2 test based on a simulated homogeneous nulldistribution (Foster 2004) (P values for all taxa were lt102)However a quartet supernetwork constructed from a morerestricted subset of genes that individually pass this compo-sitional heterogeneity test and which additionally fall into theleast saturated quartile of these stationary genes (Foster2004) demonstrates a network topology broadly comparable(but see later) with the tree topology of our concatenatedanalyses (fig 5) indicating that the compositional heteroge-neity though present is likely not driving the recoveredtopology

Furthermore although missing data are distributed in acomplex manner across the tree the number of genes poten-tially decisive for each internal node is high (eg gt800 for allnodes in Chilopoda in the 1934 gene analysis) indicating thatthe high support recovered in analyses of our largest matricesis not likely an artifact of the pattern of missing data(DellrsquoAmpio et al 2014 see also supplementary fig S1Supplementary Material online)

Our individual maximum likelihood (ML) gene-tree anal-yses display a complex pattern of topological discordance andmissing data Using quartet supernetworks we visualized thepredominant splits within these gene trees both for the com-plete set considered in our largest supermatrix and for thereduced set of compositionally homogeneous least-saturatedgenes (figs 4 and 5) Network topologies are similar in bothsets although there is a greater degree of conflict displayed

(particularly among outgroup species) in the reduced setperhaps the result of increased sampling error in this morelimited set of slowly evolving genes (Betancur-R et al 2014)Although both supernetworks display a tree-like pattern ofsplits reminiscent in many respects to the concatenated to-pologies (fig 2) they also both suggest the existence of con-siderable among-gene conflict in the relative position ofCraterostigmus with some genes suggesting a sister-grouprelationship with Scutigera in contrast to the concatenatedtopology We therefore quantified the number of genes con-gruent with either topology and also with a number of sa-lient hypotheses on centipede interrelationships (fig 6) Thesegene support frequencies largely favor the splits recovered inthe concatenated topology with for example 303 of genesgrouping Craterostigmomorpha and the remaining pleuros-tigmophorans and only 211 grouping Scutigeromorphaand Craterostigmomorpha However relationships withinthe clade formed by Lithobiomorpha and Epimorphaare somewhat more conflicted with 246 of genes sup-porting Epimorpha versus 213 supporting Lithobiomorpha + Geophilomorpha

The dated phylogenies for the 389 gene matrix undereither an autocorrelated log normal (fig 3 top) or uncorre-lated gamma model (fig 3 bottom) support a diversificationof Chilopoda between the Early Ordovician and the MiddleDevonian diversification of Pleurostigmophora in the MiddleOrdovician to Early Carboniferous and the diversification ofthe clade uniting Lithobiomorpha Scolopendromorpha andGeophilomorpha in the Silurian-Carboniferous The diversifi-cation of Epimorpha ranges from the Early Devonian to theearly Permian (fig 3)

Discussion

Phylogenomic Hypothesis of CentipedeInterrelationships

Resolving the Tree of Life has been prioritized as one of the125 most important unsolved scientific questions in 2005 byScience (Kennedy 2005) and the advent of phylogenomics hasaided in resolving many contentious aspects in animal phy-logeny (eg Philippe and Telford 2006 Dunn et al 2008Bleidorn et al 2009 Hejnol et al 2009 Meusemann et al2010 Kocot et al 2011 Rehm et al 2011 Smith et al 2011Struck et al 2011 Hartmann et al 2012 von Reumont et al2012) This is not without controversy and several phenom-ena unique to genome-scale data have been identified asnegatively impacting tree reconstruction in this paradigmperhaps foremost among these being gene occupancy (miss-ing data) (Roure et al 2013 DellrsquoAmpio et al 2014) taxonsampling (Pick et al 2010) and quality of data (eg properortholog assignment and controls on exogenous contamina-tion) (Philippe et al 2011 Salichos and Rokas 2011) In addi-tion assumptions underlying concatenation (Salichos andRokas 2013) and model (mis-)specification (Lartillot andPhilippe 2008) have also been identified as possible pitfallsfor tree reconstruction in a phylogenomic framework Finallyit has been shown that conflict may exist between classes ofgenes with some advocating exclusive usage of slowly

1503

Evaluation of Topological Conflict in Centipede Phylogeny doi101093molbevmsu108 MBE at H

arvard Library on June 12 2014

httpmbeoxfordjournalsorg

Dow

nloaded from

evolving genes to resolve deep metazoan splits (eg Nosenkoet al 2013) although others believe that slow-evolving genesmay not be able to resolve deep splits due to the lack ofsufficient signal

We have worked carefully to try to address all these issuesin our analyses First we have minimized the amount of miss-ing data and worked with one of the most complete phylo-genomic matrices for nonmodel invertebrates based ongenomes and deeply sequenced transcriptomes with the ex-ception of the millipede outgroup transcriptome(Meusemann et al 2010) Although debate exists as towhether or not large amounts of missing data negativelyaffect phylogeny reconstruction (Hejnol et al 2009Lemmon et al 2009 DellrsquoAmpio et al 2014) Roure et al(2013) concluded that although the effects of missing datacan have a negative impact in certain situations other issueshave a more direct effect including modeling among-sitesubstitutional heterogeneity they recommended graphicallydisplaying the amount of missing data in phylogenetic treesas done for example by Hejnol et al (2009) or as presentedhere (fig 2) Given our low levels of missing data and high ma-trix completeness (supplementary table S2 Supplementary

Material online) it is unlikely that our results are affectedby gene occupancy or missing data Likewise taxon samplinghas been optimized to represent all major centipede lineagesNo major hypothesis on deep centipede phylogenetics re-mains untested with our sampling although it may be advis-able to add additional species for each order in future studiesespecially of Scutigeromorpha and Lithobiomorpha whosepositions show incomplete support or substantial among-gene conflict perhaps because they are represented by asingle species each (figs 4ndash6)

It has been argued that in the case of strong gene-treeincongruence (eg from incomplete lineage sorting horizon-tal gene transfer or cryptic duplicationloss) concatenatedanalysis may be misleading causing an erroneous species treeto be inferred and furthermore with inflated resampling sup-port giving no indication of intergene conflict (eg Jeffroyet al 2006 Kubatko and Degnan 2007 Nosenko et al 2013Salichos and Rokas 2013) Many therefore question exclusivereliance on concatenation and instead argue for the applica-tion of a model of the source of incongruence (eg the emerg-ing species tree-methods based on the multispeciescoalescent Edwards 2009) or for discarding data by selecting

CenozoicMesozoicPaleozoicPrecambrian

C O S D C P T J K P N Q

Peripatopsis capensis

Tetranychus urticae

Ixodes scapularis

Daphnia pulex

Archispirostreptus gigas

Scutigera coeloptrata

Craterostigmus tasmanianus

Strigamia maritima

Himantarium gabrielis

Cryptops hortensis

Alipes grandidieri

0100200300400500600

Peripatopsis capensis

Tetranychus urticae

Ixodes scapularis

Daphnia pulex

Archispirostreptus gigas

Scutigera coeloptrata

Craterostigmus tasmanianus

Strigamia maritima

Himantarium gabrielis

Cryptops hortensis

Alipes grandidieri

FIG 3 Chronogram of centipede evolution for the 389-gene data set with 95 highest posterior density (HPD) bar for the dating under autocorrelatedlog normal (top) and uncorrelated gamma model (bottom) Nodes that were calibrated with fossils are indicated with a star

1504

Fernandez et al doi101093molbevmsu108 MBE at H

arvard Library on June 12 2014

httpmbeoxfordjournalsorg

Dow

nloaded from

genes with strong phylogenetic signal and demonstrated ab-sence of significant incongruence when reconstructing an-cient divergences To address this possibility we alsoadopted a gene-tree perspective inferring individual MLtrees for each gene included in our concatenated matrixHowever many available methods of investigating gene-treeincongruence (including recent metrics such as internodecertainty and its derivatives Salichos and Rokas 2013Salichos et al 2014) can only be calculated for genes thatcontain the same set of taxa an assumption that our dataset (and many similar data sets generated by highly parallelsequencing) assuredly violates

We therefore visualized the dominant bipartitions amongthese gene trees by constructing a supernetwork (a general-ization of supertrees permitting reticulation where intergeneconflicts exist) using the SuperQ method (Grunewald et al2013) which decomposes all gene trees into quartets and

then infers a supernetwork from these quartets assigningedge lengths by examining quartet frequencies We inferredsupernetworks from both ML gene trees (figs 4 and 5) andthe majority rule consensus (MRC) trees from the bootstrapreplicates of each gene (figure not shown) These supernet-works display a predominantly tree-like structure topologi-cally resembling our concatenated species trees howeverthey indicate the presence of intergene conflict in the relativepositions of Craterostigmomorpha and Scutigeromorphawith some genes uniting these taxa as a clade in contrastto our species tree (figs 4 and 5) Interestingly the sameoverarching network structure is present in both the ML-tree-based and bootstrap MRC-based networks indicatingthat these conflicts are not merely the result of stochasticsampling error To examine how these phylogenetic conflictsare distributed among our input genes we employedConclustador which automatically assigns genes into

001Craterostigmus

tasmanianus

Scutigera

coleoptrata

Archispirostreptus gigas

Peripatopsis capensis

Ixodes

scapularis

Tetranychus

urticae

Daphnia pulexStrigamia

maritima

Himantarium

gabrielis

Lithobius

Cryptops

hortensis

Alipes

grandidieri

FIG 5 Supernetwork representation of quartets derived from individual ML gene trees considering only 389 genes showing evidence of compositionalhomogeneity and relatively low substitutional saturation

Craterostigmus

tasmanianus

Scutigera

coleoptrata

Archispirostreptus gigas

Peripatopsis capensis

Ixodes scapularis

Tetranychus urticae

Daphnia pulex

Strigamia maritima

Himantarium gabrielis

Cryptops hortensis

Alipes grandidieri

001

FIG 4 Supernetwork representation of quartets derived from individual ML gene trees for all 1934 genes concatenated in the supermatrix presented infigure 2 Phylogenetic conflict is represented by nontree-like splits (eg ScutigeromorphandashCraterostigmomorpha)

1505

Evaluation of Topological Conflict in Centipede Phylogeny doi101093molbevmsu108 MBE at H

arvard Library on June 12 2014

httpmbeoxfordjournalsorg

Dow

nloaded from

exclusive clusters if they display significant (and well sup-ported) areas of local topological incongruence by examiningbipartition distances among and within pseudoreplicate treesfor each gene (Leigh et al 2011) Intriguingly this method didnot find evidence for any well-supported local incongruencebetween genes in our data set (ie only one cluster containingall genes was selected) Although at first glance this result is atodds with the conflicts observed in our supernetwork analy-ses this seeming paradox could be reconciled if most or allgenes display the same patterns of topological conflict withintheir bootstrap replicates

To address gene-tree conflict and the distribution of miss-ing data in a more quantitative manner we counted thepotentially decisive genes and the frequency of genes withinthis set that are congruent with both the nodes in our con-catenated topology and with several alternative hypotheses(fig 6) Large numbers of genes (from 1115 to 1742 fig 6) areavailable to test interordinal relationships within ChilopodaThe gene support frequencies within these potentially deci-sive sets also clearly favor the nodes within our concatenatedtopology over competing hypotheses although we note thatthe potentially decisive gene sets are not directly comparableacross nodes as they contain different populations of genesdepending on the particular taxonomic distribution of miss-ing data We also emphasize that although the gene supportfrequencies appear relatively low (ie usually lt50) even fornodes in our concatenated topology these frequencies werederived from counts of fully bifurcating ML trees many ofwhich contain splits that are poorly supported due to limited-length alignments andor inappropriate rates of evolution forthe splits in question we hypothesize that these gene supportfrequencies would appear even more decisively in favor of the

concatenated topology if considering only well-supportedsplits

The LG4X + F and CAT + GTR models we employed areboth able to model site-heterogeneous patterns of molecularevolution to a degree and are therefore better able to detectmultiple substitutions an important issue for this data set asthe gene selection procedure we employed was agnostic tothe rate of molecular evolution (Lartillot and Philippe 2004Philippe et al 2011) Both however also assume stationarityof amino acid frequencies through time an assumption re-jected by a sensitive statistical test for the 1934 gene super-matrix To discern whether substitutional saturation andcompositional heterogeneity were influencing our phyloge-netic results we considered a subset of 389 genes that pass in-dividual tests of compositional heterogeneity and whichfurthermore fall in the least saturated quartile of this station-ary set A quartet supernetwork built from ML treesdrawn only within this subset of genes (fig 5) resemblesthe ingroup tree topology inferred from our concatenatedanalyses (fig 2) in supporting the monophyly ofLithobiomorpha + Epimorpha In the 389-gene quartetsupernetwork however we observe essentially the samegene-tree conflict as in the 1934 gene quartet supernetworkthat is disagreement on the relative position ofCraterostigmus with respect to Scutigeromorpha and theclade comprising the remaining pleurostigmophoran ordersThis conflict could be attributable in part to exacerbation ofgene-tree incongruence reported for slowly evolving parti-tions in analyses of genomic data sets due to the confluenteffect on reducing real internode length and the theoreticalexpectation of greater incongruence for short internodes(Salichos and Rokas 2013 but see Betancur-R et al 2014)

Geophilomorpha

Lithobiomorpha

Scolopendromorpha

250

1173

551

1684

213

327

Geophilomorpha

Lithobiomorpha

Scolopendromorpha

200

1115

570

1742

179

327

Craterostigmus tasmanianus

Strigamia maritima

Himantarium gabrielis

Alipes grandidieri

Cryptops hortensis

Scutigera coleoptrata

Archispirostreptus gigas

Daphnia pulex

Ixodes scapularis

Tetranychus urticae

Peripatopsis capensis

593503

848

670

876765

394

1603246

503

876

246

670

394

765

593

380

1254303

413

1365

303

712

1234

577

42

119

353

139

471295

214

1371156

Amalpighiata

Craterostigmomorpha714

1195598

376

1603235

Scutigeromorpha

Amalpighiata

229

1090211

918

1536598

229

1090211

598

Scutigeromorpha

Craterostigmomorpha

388

1207206

1412146

322

206

322

Epimorpha

Lithobiomorpha

Craterostigmomorpha

FIG 6 Counts of potentially decisive genes and (within this set) counts of actually congruent genes computed for each node in the concatenatedtopology (left) as well as for select alternative hypotheses of centipede interrelationships (right) Potentially decisive genes printed below nodecongruent genes in this set printed above node and gene support percentage printed to side

1506

Fernandez et al doi101093molbevmsu108 MBE at H

arvard Library on June 12 2014

httpmbeoxfordjournalsorg

Dow

nloaded from

Finally our amino acid level transcriptome-based phylog-eny is highly similar to previous hypotheses on centipedephylogenetics using much smaller sets of genes (Edgecombeet al 1999 Giribet et al 1999 Edgecombe and Giribet 2004Mallatt and Giribet 2006 Murienne et al 2010) and findsstrong support for such accepted groups as Pleurostigmo-phora Epimorpha Scolopendromorpha and Geophilomorpha thus supporting the idea that our morphologicallysurprising result for Lithobiomorpha + Epimorpha is notartifactual

Evolutionary Implications

Two salient evolutionary implications of this study are 1) thebasal position of Craterostigmomorpha with respect toLithobiomorpha a result anticipated in prior phylogeneticanalyses based on molecular data (Mallatt and Giribet 2006Regier et al 2010) and 2) the superior reconciliation ofthe new topology with the fossil record of centipedes(Edgecombe 2011b) thus improving upon prior dating studies(Murienne et al 2010) The former has consequences in ourinterpretation of an important number of morphologicaltraits thought to be shared by the clade Phylactometria(see Implications for Morphological and BehavioralCharacters section) and requires a taxonomic proposal for aclade including Lithobiomorpha Scolopendromorpha andGeophilomorpha We name this group Amalpighiata onthe basis of the three constituent orders lacking the supernu-merary Malpighian tubules found in Scutigeromorpha andCraterostigmomorpha Amalpighiata appears well supportedand shows no strong conflict among the analyzed genesAmong-gene conflict and also relatively lower nodalsupport in the concatenated analysis exists for the relativepositions of Craterostigmus and Scutigera although the mono-phyly of Pleurostigmophora (Amalpighiata + Craterostigmo-morpha) receives support in several analyses

The second implication of the Amalpighiata hypothesisaccounts for one of the inconsistencies in the centipedefossil record under the Phylactometria hypothesis Stem-group Lithobiomorpha are predicted to have originated byat least the Middle Devonian (under either of the two com-peting hypotheses for the relationships of the DevonianDevonobius) (eg Murienne et al 2010) yet no Paleozoic oreven Mesozoic fossils are known for Lithobiomorpha Theirfossil record does not extend back further than the Eocene(when they are represented by a few species of Lithobiidae inBaltic amber) This is no doubt a gross underestimate of theirreal age even the alternative Amalpighiata tree dates the splitbetween Lithobiomorpha and Epimorpha to at least theUpper Carboniferous (constrained by the earliest knownEpimorpha Scolopendromorpha) (fig 3) Still the Siluro-Devonian fossil record of Chilopoda consists only ofScutigeromorpha and the extinct Devonobiomorpha whichare allied to either Craterostigmomorpha (Borucki 1996) orEpimorpha (Shear and Bonamo 1988 Murienne et al 2010)The former hypothesis is consistent with an earlier divergenceof Craterostigmomorpha than Lithobiomorpha and could

thus account for the lack of mid-Paleozoic fossillithobiomorphs

That said the scattered records of Chilopoda in Mesozoicshales lithographic limestones and ambers which includeexemplars of Scutigeromorpha Scolopendromorpha andGeophilomorpha but not Lithobiomorpha (reviewed byEdgecombe 2011b) suggests that lithobiomorphs are under-represented in the fossil record as a whole and their apparentabsence in the Paleozoic is likely a taphonomic artifact TheAmalpighiata tree dates the origins of centipede orders withbetter fit to their first known fossil occurrences and diver-gence dates are concordant with the fossil record perhapsindicating that the Paleozoic fossil record of centipedes is notas incomplete as previously thought An additional interestingpoint of our data set is that the diversification of myriapods isSilurian-Devonian for the analysis under the autocorrelatedlog normal model and only our uncorrelated gamma modelsupports a Cambrian terrestrialization event as suggestedrecently for this lineage (Rota-Stabelli et al 2013)Additional transcriptomes should allow for a better estima-tion of the diversification dates of each order but should notaffect their origin Should additional taxon sampling continueto yield substantial among-gene incongruence and a succes-sion of interordinal divergences closely spaced together intime consistent with the scenario of an rapid radiation wenote that biological (as opposed to methodologicalartifac-tual) explanations for the incongruence such as incompletelineage sorting may need to be considered

Implications for Morphological andBehavioral Characters

A substantial list of characters has been tabled in defense of asister-group relationship between Craterostigmus andEpimorpha They include the following putative synapomor-phies (Edgecombe 2011a)

1) Brooding involving the mother guarding the egg clusterby wrapping her body around it

2) Extraordinarily high cell numbers in the lateral ocelli(eg more than 1000 retinula cells)

3) Proximal retinula cells partly with monodirectionalrhabdomeres

4) Maxillary nephridia lacking in postembryonic stadia5) Rigidity of the forcipules a character complex including

the forcipular pleurite arching over the coxosternite thehinge of the coxosternite being sclerotized and inflexi-ble the coxosternite deeply embedded into the cuticleabove the first pedigerous trunk segment and a trun-cation of sternal muscles in the forcipular segmentrather than extending into the head

6) Presternites distinct7) Separate sternal and lateral longitudinal muscles8) Lateral testicular vesicles linked by a central posteriorly

extended deferens duct9) Coxal organs confined to the last leg pair

10) Internal valves formed by lips of the ostia projectingdeeply into the heart lumen

1507

Evaluation of Topological Conflict in Centipede Phylogeny doi101093molbevmsu108 MBE at H

arvard Library on June 12 2014

httpmbeoxfordjournalsorg

Dow

nloaded from

Under the alternative Amalpighiata hypothesis in whichLithobiomorpha rather than Craterostigmus is sister group ofEpimorpha these characters are implied to be homoplasticThey are either general characters of Pleurostigmophora as awhole that have been secondarily modified (lost subject toreversal or otherwise transformed) in Lithobiomorpha orthey have been convergently acquired by Craterostigmusand Epimorpha

Craterostigmus has a single anamorphic stage in its lifecycle it hatches from the egg as a 12-legged instar and ac-quires the adult number of 15 leg-pairsmdashthe plesiomorphicnumber of legs in centipedesmdashin the succeeding instar Thiscontrasts with numerous anamorphic stages inScutigeromorpha and Lithobiomorpha which hatch with 4or 6ndash8 leg pairs respectively and keep adding segments untilreaching the definitive 15 leg pairs The Phylactometria hy-pothesis has generally viewed the ldquoreduced hemi-anamorphosisrdquo (Borucki 1996) of Craterostigmus as atransitional stage in the evolution of complete epimorphosisin Epimorpha The alternative hypothesis in whichCraterostigmus is sister group of all otherPleurostigmophora rejects this interpretation in favor oftwo independent reductions of anamorphic instars (or sec-ondary reacquisition of them in Lithobiomorpha) Aproposposterior segmentation in arthropods has been shown to beevolutionarily labile as demonstrated by the extreme case ofposterior segment reduction during mite embryogenesis withconcomitant loss of certain gene expression domains orentire genes altogether in the genome of the miteTetranychus urticae (Grbic et al 2011)

At present few characters have been identified that couldbe reinterpreted as shared derived characters ofLithobiomorpha and Epimorpha One noteworthy exampleis the absence of supernumerary Malpighian tubules InChilopoda these occur only in Scutigeromorpha and inCraterostigmus and their presence in these two ordersalone has been regarded as a plesiomorphic feature(Prunescu and Prunescu 1996 Prunescu and Prunescu2006) Their inferred loss in Lithobiomorpha and Epimorphawould imply a single step under the favored topology in thisstudy and gives the name to the clade Amalpighiata

Other potential apomorphies of Lithobiomorpha andEpimorpha involve characters that are shared byLithobiomorpha and Scolopendromorpha alone and arethus required to be lost or otherwise modified inGeophilomorpha

For example Craterostigmus shares three multicuspedteeth in the mandible with Scutigeromorpha in contrast tofour and five such teeth on opposing mandibles inLithobiomorpha and Scolopendromorpha Geophilomorphaprimitively lack teeth (Koch and Edgecombe 2012) so if atransformation from three teeth (common ancestor ofChilopoda) to fourfive teeth (common ancestorof Lithobiomorpha + Epimorpha) were envisioned the lossof mandibular teeth in Geophilomorpha is a necessary part ofthe transformation series A more homoplastic character thatcould be postulated in support of Amalpighiata as a clade isthe presence of marginated tergites These are lacking in

Craterostigmus are ubiquitous in Lithobiomorpha and arevariably present in Scolopendromorpha (present inScolopocryptopinae and Scolopendridae) Their absence inGeophilomorpha and several blind scolopendromorphclades (Newportiinae Cryptopidae and Plutoniumidae)however means that their status as a character forAmalpighiata would demand a few instances of reversalloss As such though it must be conceded that this schemeintroduces considerable homoplasy from the perspective ofmorphology a few characters can be suggested as potentialsynapomorphies for Amalpighiata

Materials and Methods

Sample Collection

We sampled six specimens representing the main groups ofcentipedes (MCZ voucher numbers in brackets) Scutigeracoleoptrata (IZ-20415) (Scutigeromorpha) Lithobius forficatus(IZ-131534) (Lithobiomorpha) Craterostigmus tasmanianus(IZ-128299) (Craterostigmomorpha) Alipes grandidieri (IZ-130616) Cryptops hortensis (IZ-130583) (Scolopen-dromorpha) and Himantarium gabrielis (IZ-131564) (Geophi-lomorpha) Information about the sampling localitiescollection details can be found in the MCZ online collectionsdatabase (httpmczbasemczharvardedu last accessedApril 1 2014) The following taxa were sampled as outgroupsPeripatopsis capensis (Onychophora a transcriptome) andIxodes scapularis and T urticae (Arthropoda Arachnida andAcari complete genomes) Samples were preserved in RNA-later (Ambion) or flash frozen in liquid nitrogen immediatelyafter collection A portion of the central part of each animal(including the body and legs) or the anterior part (in S coleop-trata including the head and a part of the trunk) was dissectedand stored at80 C until RNA extraction Three more taxaretrieved from external sources were also included Thegenome of Strigamia maritima (Geophilomorpha) was down-loaded from httpmetazoaensemblorgStrigamia_mari-timaInfoIndex (last accessed April 1 2014) The genome ofDaphnia pulex was retrieved from httpmetazoaensemblorgDaphnia_pulexInfoIndex (last accessed April 1 2014)The transcriptome of A gigas generated by 454 pyrosequen-cing was retrieved from Meusemann et al (2010)

mRNA Extraction

Total RNA was extracted with a standard trizol-basedmethod using TRIzol (Life Sciences) following the manufac-turerrsquos protocol Tissue fragments were disrupted with a drillin 500ml of TRIzol using an RNAse-free plastic pestle forgrinding TRIzol was added up to a total volume of 1 mlAfter 5 min incubating at room temperature (RT) 100 ml ofbromochloropropane was mixed by vortexing After incuba-tion at RT for 10 min the samples were centrifuged at16000 rpm for 15 min at 4 C Afterward the upper aqueouslayer was recovered mixed with 500 ml of isopropanol andincubated at 20 C overnight Total RNA precipitation wasmade as follows samples were centrifuged for 15 min at16000 rpm at 4 C the pellet was washed twice with 1 mlof 75 isopropanol and centrifuged at 16000 rpm at 4 C for

1508

Fernandez et al doi101093molbevmsu108 MBE at H

arvard Library on June 12 2014

httpmbeoxfordjournalsorg

Dow

nloaded from

15 and 5 min in the first and second washing step respec-tively air dried and eluted in 100ml of RNA Storage solution(Ambion) mRNA purification was done with the DynabeadsmRNA Purification Kit (Invitrogen) following manufacturerrsquosinstructions After incubation of total RNA at 65 C for 5 minthe samples were incubated for 30 min with 200ml of mag-netic beads in a rocker and washed twice with washing bufferThis step was repeated again with an incubation time of5 min mRNA was eluted in 15ml of Tris-HCl buffer Qualityof mRNA was measured with a pico RNA assay in an Agilent2100 Bioanalyzer (Agilent Technologies) Quantity was mea-sured with a RNA assay in Qubit fluorometer (LifeTechnologies)

cDNA Library Construction and Next-GenerationSequencing

cDNA libraries were constructed with the TruSeq for RNASample Preparation kit (Illumina) following the manufac-turerrsquos instructions Libraries for C tasmanianus and S coleop-trata were constructed in the Apollo 324 automated systemusing the PrepX mRNA kit (IntegenX) The samples weremarked with a different index to allow pooling for sequencingConcentration of the cDNA libraries was measured through adsDNA High Sensitivity (HS) assay in a Qubit fluorometer(Invitrogen) Library quality and size selection were checkedin an Agilent 2100 Bioanalyzer (Agilent Technologies) withthe HS DNA assay The samples were run using the IlluminaHiSeq 2500 platform with paired-end reads of 150 bp at theFAS Center for Systems Biology at Harvard University

Raw Data Sanitation and Sequence Assembly

Demultiplexed Illumina HiSeq 2500 sequencing results inFASTQ format were retrieved from the sequencing facilityvia FTP The files were decompressed and imported into CLCGenomics Workbench 551 retaining read header informa-tion as paired read files Each sample was quality filteredaccording to a threshold average quality score of 30 basedon a Phred scale and adaptor trimmed using a list of knownadaptors provided by Illumina All reads shorter than 25 bpwere discarded and the resulting files were exported as asingle FASTQ file

To simplify the assembly process ribosomal RNA (rRNA)was filtered out All known metazoan rRNA sequences weredownloaded from GenBank and formatted into bowtie indexusing ldquobowtie-buildrdquo Each sample was sequentially aligned to

the index allowing up to two mismatches via Bowtie 100(Langmead et al 2009) retaining all unaligned reads in FASTQformat Unaligned results stored as a single file were im-ported into Geneious 616 (Kearse et al 2012) paired usingread header information and exported as separate files for leftand right mate pairs This process was repeated for eachsample

De novo assemblies (strand specific in C tasmanianus andS coleoptrata) were done individually per organism in Trinity(Grabherr et al 2011 Haas et al 2013) using paired read filesand default parameters Raw reads and assembled sequenceshave been deposited in the National Center for BiotechnologyInformation Sequence Read Archive and TranscriptomeShotgun Assembly databases respectively (table 2)

Identification of Candidate Coding Regions

Redundancy reduction was done with CD-HIT-EST (Fu et al2012) in the raw assemblies (95 global similarity) Resultingassemblies were processed in TransDecoder (Haas et al 2013)to identify candidate ORFs within the transcripts Predictedpeptides were then processed with a further filter to selectonly one peptide per putative unigene by choosing the lon-gest ORF per Trinity subcomponent with a Python scriptthus removing the variation in the coding regions of our as-semblies due to by alternative splicing closely related para-logs and allelic diversity Peptide sequences with all finalcandidate ORFs were retained as multifasta files Sequencedata for Strigamia maritima and D pulex were obtained fromgenome assemblies as opposed to the de novo assembledtranscriptomes Annotated predicted peptide sequences forD pulex were downloaded from ldquowFleaBaserdquo and sequenceswere clustered using CD-HIT at a similarity of 98 for redun-dancy reduction Peptide sequences for Strigamia maritimawere downloaded from Ensembl and also clustered at 98similarity Results for both D pulex and Strigamia maritimawere converted to a single line multifasta file

Orthology Assignment

We assigned predicted ORFs into orthologous groups acrossall samples using OMA stand-alone v099t (Altenhoff et al2011 Altenhoff et al 2013) The advantages of the algorithmof OMA over standard bidirectional best-hit approaches relyon the use of evolutionary distances instead of scores con-sideration of distance inference uncertainty and differentialgene losses and inclusion of many-to-many orthologous re-lations The ortholog matrix is constructed from all-against-all

Table 2 Accession Numbers for the Raw Transcriptomes and Assemblies from the Taxa Newly Sequenced in This Study

BioProject BioSample Experiment Run

Scutigera coleoptrata PRJNA237135 SAMN02614634 SRX462011 SRR1158078

Craterostigmus tasmanianus PRJNA237134 SAMN02614633 SRX461877 SRR1157986

Lithobius forficatus PRJNA237133 SAMN02614632 SRX462145 SRR1159752

Himantarium gabrielis PRJNA237131 SAMN02614631 SRX461787 SRR1159787

Alipes grandidieri PRJNA179374 SAMN01816532 SRX205685 SRR619311

Cryptops hortensis PRJNA237130 SAMN02614630 SRX457664 SRR1153457

Peripatopsis capensis PRJNA236598 SAMN02598680 SRX451023 SRR1145776

1509

Evaluation of Topological Conflict in Centipede Phylogeny doi101093molbevmsu108 MBE at H

arvard Library on June 12 2014

httpmbeoxfordjournalsorg

Dow

nloaded from

Smith-Waterman protein alignments The program identifiesthe so called ldquostable pairsrdquo verifies them and checks againstpotential paralogous genes In a last step cliques of stablepairs are clustered as groups of orthologs All input fileswere single-line multifasta files and the parametersdrw filespecified retained all default settings with the exception ofldquoNPrdquo which was set at 300 We parallelized all-by-all localalignments across 256 cores of a single compute node imple-menting a custom Bash script allowing for execution of inde-pendent threads with at least 3 s between each instance ofOMA to minimize risk of collisions

Phylogenomic Analyses and Congruence Assessment

We constructed three different amino acid supermatricesFirst a large matrix was constructed by concatenating theset of OMA groups containing six or more taxa To increasegene occupancy and to reduce the percentage of missingdata a second matrix was created by selecting the orthologscontained in 11 or more taxa Because of the high percentageof missing data in A gigas this taxon was eliminated from thissupermatrix This supermatrix reconstruction approach wasselected over other available software (eg MARE) because itguarantees a minimum taxon occupancy per orthogrouptherefore resulting in a supermatrix with a good compromisebetween gene and taxon occupancy and missing data

The selected orthogroups (1934 and 61 for both matrixesrespectively) were aligned individually using MUSCLE version36 (Edgar 2004) To increase the signal-to-noise ratio andimprove the discriminatory power of phylogenetic methodswe applied a probabilistic character masking with ZORRO(Wu et al 2012) to account for alignment uncertainty runusing default parameters (and using FastTree 214 [Price et al2010] to produce guide trees) This program first calculatesthe probability of all alignments that pass through a specifiedmatched pair of residues using a pair hidden Markov modelframework and it then compares this value with the fullprobability of all alignments of the pair of sequences Theposterior probability value indicates how reliable the matchis (ie highly reliable if it is close to 1 ambiguous if it is close to0) It then sums up the probability that a particular columnwould appear over the alignment landscape and assigns aconfidence score between 0 and 10 to each column provid-ing an objective measure that has an explicit evolutionarymodel and is mathematically rigorous (Wu et al 2012) Wediscarded the positions assigned a confidence score below athreshold of 5 with a custom Python script prior to concat-enation (using Phyutility 26 Smith and Dunn 2008) and sub-sequent phylogenomic analyses

To discern if substitutional saturation and among-taxoncompositional heterogeneity were affecting our phylogeneticresults we considered a third supermatrix Using the p4Python library (Foster 2004) we performed a test of compo-sitional homogeneity using w2 statistics simulating a null dis-tribution using the TreecompoTestUsingSimulations()method (nSims = 100) We used the best-scoring ML treefrom our RAxML analyses as the phylogram on which to sim-ulate data using optimized parameters of a homogeneous

unpartitioned LG + I + 4 model We also conducted com-positional homogeneity tests on individual ortholog align-ments modeling a homogeneous null distribution on theML tree for each gene using the best-fitting model availablein p4 for that gene We characterized genewise substitutionalsaturation by regressing uncorrected amino acid distances onthe patristic distance implied by each ML tree in a p4 scriptand taking the slope of this linear regression as a measure ofsubstitutional saturation (Jeffroy et al 2006) We calculatedthe third quartile of the slope from among the set of com-positionally homogeneous genes with a slope between 0 and1 and then selected the set of ML trees from genes with aslope above this value (0465) for inclusion in a quartet super-network as described earlier The 389 genes that passed in-dividual tests of compositional heterogeneity and whichfurthermore fell in the least saturated quartile of this station-ary set were concatenated into a third supermatrix

ML inference was conducted with PhyML-PCMA (Zollerand Schneider 2013) and RAxML 775 (Berger et al 2011)Although RAxML uses a predetermined empirical model ofamino acid substitution PhyML-PCMA estimates a modelthrough the use of a principal component (PC) analysisThe obtained PCs describe the substitution rates that covarythe most among different protein families and thereforedefine a semiempirically determined parameterization foran amino acid substitution model specific to each data set(Zoller and Schneider 2013) We selected 20 PCs in thePhyML-PCMA analyses and empirical amino acid frequenciesPROTGAMMALG4XF was selected as the best model ofamino acid substitution for the unpartitioned RAxML analy-ses using a modification of the ProteinModelSelection perlwrapper to RAxML produced by A Stamatakis (httpscoh-itsorgexelixissoftwarehtml last accessed April 1 2014)Bootstrap values were estimated with 1000 replicates underthe rapid bootstrapping algorithm

Bayesian analysis was conducted with PhyloBayes MPI 14e(Lartillot et al 2013) using the site-heterogeneous CAT-GTRmodel of evolution (Lartillot and Philippe 2004) Three inde-pendent Markov chain Monte Carlo (MCMC) chains wererun for 5818ndash7471 cycles The initial 2000 trees (27ndash34)sampled in each MCMC run prior to convergence (judgedwhen maximum bipartition discrepancies across chainslt 01) were discarded as the burn-in period A 50 major-ity-rule consensus tree was then computed from the remain-ing 1537 trees (sampled every 10 cycles) combined from thethree independent runs

To investigate potential incongruence between individualgene trees we also inferred gene trees for each OMA groupincluded in our supermatrix For each aligned ZORRO-masked OMA group we selected a best-fitting model usinga ProteinModelSelection script modified to permit testing ofthe recent LG4M( + F) and LG4X( + F) models (Le et al 2012)Best-scoring ML trees were inferred for each gene under theselected model (with the gamma model of rate variation butno invariant term) from 20 replicates of parsimony startingtrees One hundred traditional (nonrapid) bootstrap repli-cates for each gene were also inferred A majority rule con-sensus for each gene was constructed using the SumTrees

1510

Fernandez et al doi101093molbevmsu108 MBE at H

arvard Library on June 12 2014

httpmbeoxfordjournalsorg

Dow

nloaded from

script from the DendroPy Python library (Sukumaran andHolder 2010) To visualize predominant intergene conflictsfor both ML and bootstrap MRC trees we employedSuperQ v11 (Grunewald et al 2013) selecting the ldquobalancedrdquoedge-weight optimization function and applying no filterSplitsTree v4131 (Huson and Bryant 2006) was then usedto visualize the resulting NEXUS files To investigate the dis-tribution of strongly supported conflict among our genetrees we employed Conclustador v04a (Leigh et al 2011)presently the only automated incongruence-detection algo-rithm readily applicable to data sets of this size We appliedthe spectral clustering algorithm on the bootstrap replicatesfor each OMA group limiting the maximum number of clus-ters to 15

To quantify the number of genes potentially decisive for agiven split and also the number within this set that actuallysupport the split in question (fig 6) we employed a customPython script to parse the set of individual gene trees If a treecontained at least one species in either descendant group fora given node plus at least two distinct species basal to thenode in question it was considered potentially decisive(forming minimally a quartet DellrsquoAmpio et al 2014) if thedescendants were monophyletic with respect to their relativeoutgroups that gene tree was considered congruent with thenode in question (although this count is agnostic to thetopology within the node in question) To quantify the rela-tive support for different hypotheses within our individualgene-tree analysis we displayed the above counts for allnodes in the 1934 gene ML topology (fig 6) and also forselect alternative scenarios of centipede interrelationshipsboth including historical hypotheses (see Introduction)and alternative topologies for Scutigeromorpha andCraterostigmomorpha suggested by our quartet supernet-works All Python custom scripts can be downloaded fromhttpsgithubcomclaumer (last accessed April 1 2014)

Fossil Calibrations

A few Paleozoic fossil occurrences constrain the timing ofdivergences between chilopod orders (for a review of themyriapod fossil record see Shear and Edgecombe 2010) TheSilurondashDevonian genus Crussolum (Shear et al 1998) is iden-tified as a stem-group scutigeromorph displaying derivedcharacters of total-group Scutigeromorpha but lackingsome apomorphies that are shared by all members of thescutigeromorph crown group (Edgecombe 2011b) The bestpreserved fossils of Crussolum are from the Early Devonian(Pragian) and Middle Devonian (Givetian) (Shear et al 1998Anderson and Trewin 2003) but the oldest confidently as-signed record is in the Late Silurian (early Pridolı) LudlowBone Bed in western England (at least 4192 My dating theend of the Pridolı) This dating constrains the split ofScutigeromorpha from Pleurostigmophora

The occurrence of Devonobius delta in the MiddleDevonian of New York at least 3827 My (date for the endof the Givetian) constrains the divergence betweenCraterostigmomorpha and the remaining pleurostigmophor-ans This minimum applies irrespective of whether

Devonobius is sister group of Craterostigmomorpha or ofEpimorpha which were equally parsimonious in previousanalyses (Edgecombe and Giribet 2004) The basal divergencein Epimorpha that is the split between Scolopendromorphaand Geophilomorpha is constrained by the oldest scolopen-dromorph Mazoscolopendra richardsoni in UpperCarboniferous (Westphalian D) deposits of Mazon Creek IL(at least 306 Ma) Available character data forMazoscolopendra are insufficient to establish whether it is astem- or crown-group scolopendromorph and it only pro-vides a minimum age for total-group Scolopendromorpha

Mid-Silurian body fossils of Diplopoda provide constraintson the divergence of Chilopoda and Diplopoda The diver-gence between these two groups is dated by the occurrenceof Archipolypoda (stem-group helminthomorphs) in the lateWenlock or early Ludlow (Wilson and Anderson 2004) Basedon these data (ages based on spores) the split betweenChilopoda and Diplopoda is at least as old as the end ofthe Gorstian Stage (early Ludlow) 4256 My

The split between Onychophora and Arthropoda wasdated between 528 My (the minimum age for Arthropodaused by Lee et al [2013] on the basis of the earliestRusophycus traces) and 558 My used as the root ofPanarthropoda (Lee et al 2013)

Divergence Time Estimates

Divergence dates were estimated using the Bayesian relaxedmolecular clock approach as implemented in PhyloBayesv33f (Lartillot et al 2013) To compare the performance ofthe autocorrelated log normal and uncorrelated gammamodels we estimated the divergence dates under bothmodels in the 389-gene data set A series of calibrationconstraints specified above in ldquoFossil calibrationsrdquo (see alsoMurienne et al 2010 Edgecombe 2011b) were used with softbounds (Yang and Rannala 2006) under a birthndashdeath priorin PhyloBayes this strategy having been found to provide thebest compromise for dating estimates (Inoue et al 2010) Twoindependent MCMC chains were run for 5000ndash6000 cyclessampling posterior rates and dates every 10 cycles The initial20 of the cycles were discarded as burn-in Posteriorestimates of divergence dates were then computed fromapproximately the last 4000 samples of each chain

Supplementary MaterialSupplementary figure S1 and tables S1 and S2 are available atMolecular Biology and Evolution online (httpwwwmbeoxfordjournalsorg)

Acknowledgments

Mark Harvey kindly provided the sample of Craterostigmustasmanianus and Savel Daniels the sample of Peripatopsiscapensis The Research Computing group at the HarvardFaculty of Arts and Sciences was instrumental in facilitatingthe computational resources required Sarah Bastkowski iskindly thanked for her advice on the usage of SuperQ v11This work was supported by internal funds from the Museumof Comparative Zoology and by a postdoctoral fellowship

1511

Evaluation of Topological Conflict in Centipede Phylogeny doi101093molbevmsu108 MBE at H

arvard Library on June 12 2014

httpmbeoxfordjournalsorg

Dow

nloaded from

from the Fundacion Ramon Areces to RF Five anonymousreviewers and the editor provided comments that helped toimprove this article

ReferencesAltenhoff AM Gil M Gonnet GH Dessimoz C 2013 Inferring hierar-

chical orthologous groups from orthologous gene pairs PLoS One 8e53786

Altenhoff AM Schneider A Gonnet GH Dessimoz C 2011 OMA 2011orthology inference among 1000 complete genomes Nucleic AcidsRes 39D289ndashD294

Anderson LI Trewin NH 2003 An Early Devonian arthropod fauna fromthe Windyfield Cherts Aberdeenshire Scotland Palaeontology 46457ndash509

Ax P 2000 Multicellular animals Volume II The phylogenetic system ofthe Metazoa Berlin (Germany) Springer

Berger SA Krompass D Stamatakis A 2011 Performance accuracy andWeb server for evolutionary placement of short sequence readsunder maximum likelihood Syst Biol 60291ndash302

Betancur-R R Naylor G Ortı G 2014 Conserved genes sampling errorand phylogenomic inference Syst Biol 63257ndash262

Bleidorn C Podsiadlowski L Zhong M Eeckhaut I Hartmann SHalanych KM Tiedemann R 2009 On the phylogenetic positionof Myzostomida can 77 genes get it wrong BMC Evol Biol 9150

Borucki H 1996 Evolution und phylogenetisches system der Chilopoda(Mandibulata Tracheata) Vehr naturwiss Ver Hamburg 3595ndash226

DellrsquoAmpio E Meusemann K Szucsich NU Peters RS Meyer B Borner JPetersen M Aberer AJ Stamatakis A Walzl MG et al 2014 Decisivedata sets in phylogenomics lessons from studies on the phyloge-netic relationships of primarily wingless insects Mol Biol Evol 31239ndash249

Dohle W 1985 Phylogenetic pathways in the Chilopoda BijdrDierkunde 5555ndash66

Dunn CW Hejnol A Matus DQ Pang K Browne WE Smith SA SeaverE Rouse GW Obst M Edgecombe GD et al 2008 Broad phyloge-nomic sampling improves resolution of the animal tree of lifeNature 452745ndash749

Edgar RC 2004 MUSCLE multiple sequence alignment with high accu-racy and high throughput Nucleic Acids Res 321792ndash1797

Edgecombe GD 2011a Chilopodamdashphylogeny In Minelli A editorTreatise on zoologymdashanatomy taxonomy biology TheMyriapoda Vol 1 Leiden and Boston Brill p 339ndash354

Edgecombe GD 2011b Chilopodamdashthe fossil history In Minelli Aeditor Treatise on zoologymdashanatomy taxonomy biology TheMyriapoda Vol 1 Leiden and Boston Brill p 355ndash361

Edgecombe GD Giribet G 2004 Adding mitochondrial sequence data(16S rRNA and cytochrome c oxidase subunit I) to the phylogeny ofcentipedes (Myriapoda Chilopoda) an analysis of morphology andfour molecular loci J Zool Syst Evol Res 4289ndash134

Edgecombe GD Giribet G 2007 Evolutionary biology of centipedes(Myriapoda Chilopoda) Annu Rev Entomol 52151ndash170

Edgecombe GD Giribet G 2008 A New Zealand species of thetrans-Tasman centipede order Craterostigmomorpha(ArthropodaChilopoda) corroborated by molecular evidenceInvert Syst 221ndash15

Edgecombe GD Giribet G Wheeler WC 1999 Phylogeny of Chilopodacombining 18S and 28S rRNA sequences and morphology BoletınSociedad Entomol Aragonesa 26293ndash331

Edwards SV 2009 Is a new and general theory of molecular systematicsemerging Evolution 631ndash19

Foster P 2004 Modeling compositional heterogeneity Syst Biol 53485ndash495

Fu LM Niu BF Zhu ZW Wu ST Li WZ 2012 CD-HIT accelerated forclustering the next-generation sequencing data Bioinformatics 283150ndash3152

Giribet G Carranza S Riutort M Baguna J Ribera C 1999 Internalphylogeny of the Chilopoda (Myriapoda Arthropoda) using

complete 18S rDNA and partial 28S rDNA sequences Philos TransR Soc Lond B Biol Sci 354215ndash222

Giribet G Edgecombe GD 2006 Conflict between data sets andphylogeny of centipedes an analysis based on seven genes andmorphology Proc R Soc B Biol Sci 273531ndash538

Grabherr MG Haas BJ Yassour M Levin JZ Thompson DA Amit IAdiconis X Fan L Raychowdhury R Zeng Q et al 2011 Full-length transcriptome assembly from RNA-Seq data without a ref-erence genome Nat Biotechnol 29644ndash652

Grbic M Van Leeuwen T Clark RM Rombauts S Rouze P Grbic VOsborne EJ Dermauw W Ngoc PC Ortego F et al 2011 Thegenome of Tetranychus urticae reveals herbivorous pest adaptationsNature 479487ndash492

Grunewald S Spillner A Bastkowski S Bogershausen A Moulton V2013 SuperQ computing supernetworks from quartets IEEEACMTrans Comput Biol Bioinform 10151ndash160

Haas BJ Papanicolaou A Yassour M Grabherr M Blood PD Bowden JCouger MB Eccles D Li B Lieber M et al 2013 De novo tran-script sequence reconstruction from RNA-seq using the Trinityplatform for reference generation and analysis Nat Protoc 81494ndash1512

Hartmann S Helm C Nickel B Meyer M Struck TH Tiedemann RSelbig J Bleidorn C 2012 Exploiting gene families for phylogenomicanalysis of myzostomid transcriptome data PLoS One 7e29843

Hejnol A Obst M Stamatakis A Ott M Rouse GW Edgecombe GDMartinez P Baguna J Bailly X Jondelius U et al 2009 Assessing theroot of bilaterian animals with scalable phylogenomic methods ProcR Soc B Biol Sci 2764261ndash4270

Hilken G 1997 Tracheal systems in Chilopoda a comparison underphylogenetic aspects Entomol Scand Suppl 5149ndash60

Huson DH Bryant D 2006 Application of phylogenetic networks inevolutionary studies Mol Biol Evol 23254ndash267

Inoue J Donoghue PCJ Yang ZH 2010 The impact of the representationof fossil calibrations on Bayesian estimation of species divergencetimes Syst Biol 5974ndash89

Jeffroy O Brinkmann H Delsuc F Philippe H 2006 Phylogenomics thebeginning of incongruence Trends Genet 22225ndash231

Kearse M Moir R Wilson A Stones-Havas S Cheung M Sturrock SBuxton S Cooper A Markowitz S Duran C et al 2012 GeneiousBasic an integrated and extendable desktop software platform forthe organization and analysis of sequence data Bioinformatics 281647ndash1649

Kennedy D 2005 125 [Editorial] Science 30919Koch M Edgecombe GD 2012 The preoral chamber in geophilomorph

centipedes comparative morphology phylogeny and the evolutionof centipede feeding structures Zool J Linn Soc 1651ndash62

Kocot KM Cannon JT Todt C Citarella MR Kohn AB Meyer A SantosSR Schander C Moroz LL Lieb B et al 2011 Phylogenomics revealsdeep molluscan relationships Nature 477452ndash456

Kubatko LS Degnan JH 2007 Inconsistency of phylogenetic estimatesfrom concatenated data under coalescence Syst Biol 5617ndash24

Langmead B Trapnell C Pop M Salzberg SL 2009 Ultrafast andmemory-efficient alignment of short DNA sequences to thehuman genome Genome Biol 10R25

Lartillot N Philippe H 2004 A Bayesian mixture model for across-siteheterogeneities in the amino-acid replacement process Mol BiolEvol 211095ndash1109

Lartillot N Philippe H 2008 Improvement of molecular phylogeneticinference and the phylogeny of Bilateria Philos Trans R Soc Lond BBiol Sci 3631463ndash1472

Lartillot N Rodrigue N Stubbs D Richer J 2013 PhyloBayes MPI phy-logenetic reconstruction with infinite mixtures of profiles in a par-allel environment Syst Biol 62611ndash615

Le SQ Dang CC Gascuel O 2012 Modeling protein evolution withseveral amino acid replacement matrices depending on site ratesMol Biol Evol 292921ndash2936

Lee MSY Soubrier J Edgecombe GD 2013 Rates of phenotypicand genomic evolution during the Cambrian Explosion Curr Biol231ndash7

1512

Fernandez et al doi101093molbevmsu108 MBE at H

arvard Library on June 12 2014

httpmbeoxfordjournalsorg

Dow

nloaded from

Leigh JW Schliep K Lopez P Bapteste E 2011 Let them fall where theymay congruence analysis in massive phylogenetically messy datasets Mol Biol Evol 282773ndash2785

Lemmon AR Brown JM Stanger-Hall K Lemmon EM 2009The effect of ambiguous data on phylogenetic estimates obtainedby maximum likelihood and Bayesian inference Syst Biol 58130ndash145

Mallatt J Giribet G 2006 Further use of nearly complete 28S and 18SrRNA genes to classify Ecdysozoa 37 more arthropods and a kinor-hynch Mol Phylogenet Evol 40772ndash794

Meusemann K von Reumont BM Simon S Roeding F Strauss S Kuck PEbersberger I Walzl M Pass G Breuers S et al 2010 A phylogenomicapproach to resolve the arthropod tree of life Mol Biol Evol 272451ndash2464

Muller CHG Meyer-Rochow VB 2006 Fine structural organization ofthe lateral ocelli in two species of Scolopendra (ChilopodaPleurostigmophora) an evolutionary evaluation Zoomorphology12513ndash26

Murienne J Edgecombe GD Giribet G 2010 Including secondary struc-ture fossils and molecular dating in the centipede tree of life MolPhylogenet Evol 57301ndash313

Nosenko T Schreiber F Adamska M Adamski M Eitel M Hammel JMaldonado M Muller WE Nickel M Schierwater B et al 2013 Deepmetazoan phylogeny when different genes tell different stories MolPhylogenet Evol 67223ndash233

Philippe H Brinkmann H Lavrov DV Littlewood DTJ Manuel MWorheide G Baurain D 2011 Resolving difficult phylogeneticquestions why more sequences are not enough PLoS Biol 9e1000602

Philippe H Telford MJ 2006 Large-scale sequencing and the new animalphylogeny Trends Ecol Evol 21614ndash620

Pick KS Philippe H Schreiber F Erpenbeck D Jackson DJWrede P Wiens M Alie A Morgenstern B Manuel Met al 2010 Improved phylogenomic taxon sampling notice-ably affects nonbilaterian relationships Mol Biol Evol 271983ndash1987

Pocock RI 1902 A new and annectant type of chilopod Quart J MicroscSci 45417ndash448

Price MN Dehal PS Arkin AP 2010 FastTree 2-approximately maximum-likelihood trees for large alignmentsPLoS One 5e9490

Prunescu C-C Prunescu P 1996 Supernumerary Malpighian tubules inchilopods Acta Myriapodol 169437ndash440

Prunescu C-C Prunescu P 2006 Rudimentary supernumeraryMalpighian tubules in the order Craterostigmomorpha Pocock1902 Norwegian J Entomol 53113ndash118

Regier JC Shultz JW Zwick A Hussey A Ball B Wetzer R Martin JWCunningham CW 2010 Arthropod relationships revealed by phy-logenomic analysis of nuclear protein-coding sequences Nature 4631079ndash1083

Regier JC Wilson HM Shultz JW 2005 Phylogenetic analysis ofMyriapoda using three nuclear protein-coding genes MolPhylogenet Evol 34147ndash158

Rehm P Borner J Meusemann K von Reumont BM Simon S Hadrys HMisof B Burmester T 2011 Dating the arthropod tree based onlarge-scale transcriptome data Mol Phylogenet Evol 61880ndash887

Rota-Stabelli O Daley AC Pisani D 2013 Molecular timetrees reveal aCambrian colonization of land and a new scenario for ecdysozoanevolution Curr Biol 23392ndash398

Roure B Baurain D Philippe H 2013 Impact of missing data on phy-logenies inferred from empirical phylogenomic data sets Mol BiolEvol 30197ndash214

Salichos L Rokas A 2011 Evaluating ortholog prediction algorithms in ayeast model clade PLoS One 6e18755

Salichos L Rokas A 2013 Inferring ancient divergences requires geneswith strong phylogenetic signals Nature 497327ndash331

Salichos L Stamatakis A Rokas A 2014 Novel information theory-basedmeasures for quantifying incongruence among phylogenetic treesMol Biol Evol 31(5)1261ndash1271

Shear WA Bonamo PM 1988 Devonobiomorpha a new order of cen-tipeds (Chilopoda) from the middle Devonian of Gilboa New YorkState USA and the phylogeny of centiped orders Am MusNovitates 29271ndash30

Shear WA Edgecombe GD 2010 The geological record and phylogenyof Myriapoda Arthropod Struct Dev 39174ndash190

Shear WA Jeram AJ Selden PA 1998 Centiped legs (ArthropodaChilopoda Scutigeromorpha) from the Silurian and Devonian ofBritain and the Devonian of North America Am Mus Novitates32311ndash16

Shultz JW Regier JC 1997 Progress toward a molecular phylogeny of thecentipede orders (Chilopoda) Entomol Scand Suppl 5125ndash32

Smith S Wilson NG Goetz F Feehery C Andrade SCS Rouse GWGiribet G Dunn CW 2011 Resolving the evolutionary relationshipsof molluscs with phylogenomic tools Nature 480364ndash367

Smith SA Dunn CW 2008 Phyutility a phyloinformatics tool for treesalignments and molecular data Bioinformatics 24715ndash716

Struck TH Paul C Hill N Hartmann S Hosel C Kube M Lieb B Meyer ATiedemann R Purschke G et al 2011 Phylogenomic analyses un-ravel annelid evolution Nature 47195ndash98

Sukumaran J Holder MT 2010 DendroPy a Python library for phylo-genetic computing Bioinformatics 261569ndash1571

Verhoeff KW 1902ndash1925 Gliederfussler Arthropoda Klasse ChilopodaIn Klassen und Ordnungen des Tierreichs 5 abt 2 Buch 1 Leipzig(Germany) Akademische Verlagsgesellschaft p 725

von Reumont BM Jenner RA Wills MA Dellrsquoampio E Pass GEbersberger I Meyer B Koenemann S Iliffe TM Stamatakis Aet al 2012 Pancrustacean phylogeny in the light of new phyloge-nomic data support for Remipedia as the possible sister group ofHexapoda Mol Biol Evol 291031ndash1045

Wilson HM Anderson LI 2004 Morphology and taxonomy of Paleozoicmillipedes (Diplopoda Chilognatha Archipolypoda) from ScotlandJ Paleontol 78169ndash184

Wirkner CS Pass G 2002 The circulatory system in Chilopodafunctional morphology and phylogenetic aspects Acta Zool 83193ndash202

Wu M Chatterji S Eisen JA 2012 Accounting for alignment uncertaintyin phylogenomics PLoS One 7e30288

Yang Z Rannala B 2006 Bayesian estimation of species divergence timesunder a molecular clock using multiple fossil calibrations with softbounds Mol Biol Evol 23212ndash226

Zoller S Schneider A 2013 Improving phylogenetic inference with asemiempirical amino acid substitution model Mol Biol Evol 30469ndash479

1513

Evaluation of Topological Conflict in Centipede Phylogeny doi101093molbevmsu108 MBE at H

arvard Library on June 12 2014

httpmbeoxfordjournalsorg

Dow

nloaded from

Page 5: Evaluating Topological Conflict in Centipede Phylogeny ...€¦ · Scutigera coleoptrata Craterostigmus tasmanianus Strigamia maritima Himantarium gabrielis Alipes grandidieri Cryptops

evolving genes to resolve deep metazoan splits (eg Nosenkoet al 2013) although others believe that slow-evolving genesmay not be able to resolve deep splits due to the lack ofsufficient signal

We have worked carefully to try to address all these issuesin our analyses First we have minimized the amount of miss-ing data and worked with one of the most complete phylo-genomic matrices for nonmodel invertebrates based ongenomes and deeply sequenced transcriptomes with the ex-ception of the millipede outgroup transcriptome(Meusemann et al 2010) Although debate exists as towhether or not large amounts of missing data negativelyaffect phylogeny reconstruction (Hejnol et al 2009Lemmon et al 2009 DellrsquoAmpio et al 2014) Roure et al(2013) concluded that although the effects of missing datacan have a negative impact in certain situations other issueshave a more direct effect including modeling among-sitesubstitutional heterogeneity they recommended graphicallydisplaying the amount of missing data in phylogenetic treesas done for example by Hejnol et al (2009) or as presentedhere (fig 2) Given our low levels of missing data and high ma-trix completeness (supplementary table S2 Supplementary

Material online) it is unlikely that our results are affectedby gene occupancy or missing data Likewise taxon samplinghas been optimized to represent all major centipede lineagesNo major hypothesis on deep centipede phylogenetics re-mains untested with our sampling although it may be advis-able to add additional species for each order in future studiesespecially of Scutigeromorpha and Lithobiomorpha whosepositions show incomplete support or substantial among-gene conflict perhaps because they are represented by asingle species each (figs 4ndash6)

It has been argued that in the case of strong gene-treeincongruence (eg from incomplete lineage sorting horizon-tal gene transfer or cryptic duplicationloss) concatenatedanalysis may be misleading causing an erroneous species treeto be inferred and furthermore with inflated resampling sup-port giving no indication of intergene conflict (eg Jeffroyet al 2006 Kubatko and Degnan 2007 Nosenko et al 2013Salichos and Rokas 2013) Many therefore question exclusivereliance on concatenation and instead argue for the applica-tion of a model of the source of incongruence (eg the emerg-ing species tree-methods based on the multispeciescoalescent Edwards 2009) or for discarding data by selecting

CenozoicMesozoicPaleozoicPrecambrian

C O S D C P T J K P N Q

Peripatopsis capensis

Tetranychus urticae

Ixodes scapularis

Daphnia pulex

Archispirostreptus gigas

Scutigera coeloptrata

Craterostigmus tasmanianus

Strigamia maritima

Himantarium gabrielis

Cryptops hortensis

Alipes grandidieri

0100200300400500600

Peripatopsis capensis

Tetranychus urticae

Ixodes scapularis

Daphnia pulex

Archispirostreptus gigas

Scutigera coeloptrata

Craterostigmus tasmanianus

Strigamia maritima

Himantarium gabrielis

Cryptops hortensis

Alipes grandidieri

FIG 3 Chronogram of centipede evolution for the 389-gene data set with 95 highest posterior density (HPD) bar for the dating under autocorrelatedlog normal (top) and uncorrelated gamma model (bottom) Nodes that were calibrated with fossils are indicated with a star

1504

Fernandez et al doi101093molbevmsu108 MBE at H

arvard Library on June 12 2014

httpmbeoxfordjournalsorg

Dow

nloaded from

genes with strong phylogenetic signal and demonstrated ab-sence of significant incongruence when reconstructing an-cient divergences To address this possibility we alsoadopted a gene-tree perspective inferring individual MLtrees for each gene included in our concatenated matrixHowever many available methods of investigating gene-treeincongruence (including recent metrics such as internodecertainty and its derivatives Salichos and Rokas 2013Salichos et al 2014) can only be calculated for genes thatcontain the same set of taxa an assumption that our dataset (and many similar data sets generated by highly parallelsequencing) assuredly violates

We therefore visualized the dominant bipartitions amongthese gene trees by constructing a supernetwork (a general-ization of supertrees permitting reticulation where intergeneconflicts exist) using the SuperQ method (Grunewald et al2013) which decomposes all gene trees into quartets and

then infers a supernetwork from these quartets assigningedge lengths by examining quartet frequencies We inferredsupernetworks from both ML gene trees (figs 4 and 5) andthe majority rule consensus (MRC) trees from the bootstrapreplicates of each gene (figure not shown) These supernet-works display a predominantly tree-like structure topologi-cally resembling our concatenated species trees howeverthey indicate the presence of intergene conflict in the relativepositions of Craterostigmomorpha and Scutigeromorphawith some genes uniting these taxa as a clade in contrastto our species tree (figs 4 and 5) Interestingly the sameoverarching network structure is present in both the ML-tree-based and bootstrap MRC-based networks indicatingthat these conflicts are not merely the result of stochasticsampling error To examine how these phylogenetic conflictsare distributed among our input genes we employedConclustador which automatically assigns genes into

001Craterostigmus

tasmanianus

Scutigera

coleoptrata

Archispirostreptus gigas

Peripatopsis capensis

Ixodes

scapularis

Tetranychus

urticae

Daphnia pulexStrigamia

maritima

Himantarium

gabrielis

Lithobius

Cryptops

hortensis

Alipes

grandidieri

FIG 5 Supernetwork representation of quartets derived from individual ML gene trees considering only 389 genes showing evidence of compositionalhomogeneity and relatively low substitutional saturation

Craterostigmus

tasmanianus

Scutigera

coleoptrata

Archispirostreptus gigas

Peripatopsis capensis

Ixodes scapularis

Tetranychus urticae

Daphnia pulex

Strigamia maritima

Himantarium gabrielis

Cryptops hortensis

Alipes grandidieri

001

FIG 4 Supernetwork representation of quartets derived from individual ML gene trees for all 1934 genes concatenated in the supermatrix presented infigure 2 Phylogenetic conflict is represented by nontree-like splits (eg ScutigeromorphandashCraterostigmomorpha)

1505

Evaluation of Topological Conflict in Centipede Phylogeny doi101093molbevmsu108 MBE at H

arvard Library on June 12 2014

httpmbeoxfordjournalsorg

Dow

nloaded from

exclusive clusters if they display significant (and well sup-ported) areas of local topological incongruence by examiningbipartition distances among and within pseudoreplicate treesfor each gene (Leigh et al 2011) Intriguingly this method didnot find evidence for any well-supported local incongruencebetween genes in our data set (ie only one cluster containingall genes was selected) Although at first glance this result is atodds with the conflicts observed in our supernetwork analy-ses this seeming paradox could be reconciled if most or allgenes display the same patterns of topological conflict withintheir bootstrap replicates

To address gene-tree conflict and the distribution of miss-ing data in a more quantitative manner we counted thepotentially decisive genes and the frequency of genes withinthis set that are congruent with both the nodes in our con-catenated topology and with several alternative hypotheses(fig 6) Large numbers of genes (from 1115 to 1742 fig 6) areavailable to test interordinal relationships within ChilopodaThe gene support frequencies within these potentially deci-sive sets also clearly favor the nodes within our concatenatedtopology over competing hypotheses although we note thatthe potentially decisive gene sets are not directly comparableacross nodes as they contain different populations of genesdepending on the particular taxonomic distribution of miss-ing data We also emphasize that although the gene supportfrequencies appear relatively low (ie usually lt50) even fornodes in our concatenated topology these frequencies werederived from counts of fully bifurcating ML trees many ofwhich contain splits that are poorly supported due to limited-length alignments andor inappropriate rates of evolution forthe splits in question we hypothesize that these gene supportfrequencies would appear even more decisively in favor of the

concatenated topology if considering only well-supportedsplits

The LG4X + F and CAT + GTR models we employed areboth able to model site-heterogeneous patterns of molecularevolution to a degree and are therefore better able to detectmultiple substitutions an important issue for this data set asthe gene selection procedure we employed was agnostic tothe rate of molecular evolution (Lartillot and Philippe 2004Philippe et al 2011) Both however also assume stationarityof amino acid frequencies through time an assumption re-jected by a sensitive statistical test for the 1934 gene super-matrix To discern whether substitutional saturation andcompositional heterogeneity were influencing our phyloge-netic results we considered a subset of 389 genes that pass in-dividual tests of compositional heterogeneity and whichfurthermore fall in the least saturated quartile of this station-ary set A quartet supernetwork built from ML treesdrawn only within this subset of genes (fig 5) resemblesthe ingroup tree topology inferred from our concatenatedanalyses (fig 2) in supporting the monophyly ofLithobiomorpha + Epimorpha In the 389-gene quartetsupernetwork however we observe essentially the samegene-tree conflict as in the 1934 gene quartet supernetworkthat is disagreement on the relative position ofCraterostigmus with respect to Scutigeromorpha and theclade comprising the remaining pleurostigmophoran ordersThis conflict could be attributable in part to exacerbation ofgene-tree incongruence reported for slowly evolving parti-tions in analyses of genomic data sets due to the confluenteffect on reducing real internode length and the theoreticalexpectation of greater incongruence for short internodes(Salichos and Rokas 2013 but see Betancur-R et al 2014)

Geophilomorpha

Lithobiomorpha

Scolopendromorpha

250

1173

551

1684

213

327

Geophilomorpha

Lithobiomorpha

Scolopendromorpha

200

1115

570

1742

179

327

Craterostigmus tasmanianus

Strigamia maritima

Himantarium gabrielis

Alipes grandidieri

Cryptops hortensis

Scutigera coleoptrata

Archispirostreptus gigas

Daphnia pulex

Ixodes scapularis

Tetranychus urticae

Peripatopsis capensis

593503

848

670

876765

394

1603246

503

876

246

670

394

765

593

380

1254303

413

1365

303

712

1234

577

42

119

353

139

471295

214

1371156

Amalpighiata

Craterostigmomorpha714

1195598

376

1603235

Scutigeromorpha

Amalpighiata

229

1090211

918

1536598

229

1090211

598

Scutigeromorpha

Craterostigmomorpha

388

1207206

1412146

322

206

322

Epimorpha

Lithobiomorpha

Craterostigmomorpha

FIG 6 Counts of potentially decisive genes and (within this set) counts of actually congruent genes computed for each node in the concatenatedtopology (left) as well as for select alternative hypotheses of centipede interrelationships (right) Potentially decisive genes printed below nodecongruent genes in this set printed above node and gene support percentage printed to side

1506

Fernandez et al doi101093molbevmsu108 MBE at H

arvard Library on June 12 2014

httpmbeoxfordjournalsorg

Dow

nloaded from

Finally our amino acid level transcriptome-based phylog-eny is highly similar to previous hypotheses on centipedephylogenetics using much smaller sets of genes (Edgecombeet al 1999 Giribet et al 1999 Edgecombe and Giribet 2004Mallatt and Giribet 2006 Murienne et al 2010) and findsstrong support for such accepted groups as Pleurostigmo-phora Epimorpha Scolopendromorpha and Geophilomorpha thus supporting the idea that our morphologicallysurprising result for Lithobiomorpha + Epimorpha is notartifactual

Evolutionary Implications

Two salient evolutionary implications of this study are 1) thebasal position of Craterostigmomorpha with respect toLithobiomorpha a result anticipated in prior phylogeneticanalyses based on molecular data (Mallatt and Giribet 2006Regier et al 2010) and 2) the superior reconciliation ofthe new topology with the fossil record of centipedes(Edgecombe 2011b) thus improving upon prior dating studies(Murienne et al 2010) The former has consequences in ourinterpretation of an important number of morphologicaltraits thought to be shared by the clade Phylactometria(see Implications for Morphological and BehavioralCharacters section) and requires a taxonomic proposal for aclade including Lithobiomorpha Scolopendromorpha andGeophilomorpha We name this group Amalpighiata onthe basis of the three constituent orders lacking the supernu-merary Malpighian tubules found in Scutigeromorpha andCraterostigmomorpha Amalpighiata appears well supportedand shows no strong conflict among the analyzed genesAmong-gene conflict and also relatively lower nodalsupport in the concatenated analysis exists for the relativepositions of Craterostigmus and Scutigera although the mono-phyly of Pleurostigmophora (Amalpighiata + Craterostigmo-morpha) receives support in several analyses

The second implication of the Amalpighiata hypothesisaccounts for one of the inconsistencies in the centipedefossil record under the Phylactometria hypothesis Stem-group Lithobiomorpha are predicted to have originated byat least the Middle Devonian (under either of the two com-peting hypotheses for the relationships of the DevonianDevonobius) (eg Murienne et al 2010) yet no Paleozoic oreven Mesozoic fossils are known for Lithobiomorpha Theirfossil record does not extend back further than the Eocene(when they are represented by a few species of Lithobiidae inBaltic amber) This is no doubt a gross underestimate of theirreal age even the alternative Amalpighiata tree dates the splitbetween Lithobiomorpha and Epimorpha to at least theUpper Carboniferous (constrained by the earliest knownEpimorpha Scolopendromorpha) (fig 3) Still the Siluro-Devonian fossil record of Chilopoda consists only ofScutigeromorpha and the extinct Devonobiomorpha whichare allied to either Craterostigmomorpha (Borucki 1996) orEpimorpha (Shear and Bonamo 1988 Murienne et al 2010)The former hypothesis is consistent with an earlier divergenceof Craterostigmomorpha than Lithobiomorpha and could

thus account for the lack of mid-Paleozoic fossillithobiomorphs

That said the scattered records of Chilopoda in Mesozoicshales lithographic limestones and ambers which includeexemplars of Scutigeromorpha Scolopendromorpha andGeophilomorpha but not Lithobiomorpha (reviewed byEdgecombe 2011b) suggests that lithobiomorphs are under-represented in the fossil record as a whole and their apparentabsence in the Paleozoic is likely a taphonomic artifact TheAmalpighiata tree dates the origins of centipede orders withbetter fit to their first known fossil occurrences and diver-gence dates are concordant with the fossil record perhapsindicating that the Paleozoic fossil record of centipedes is notas incomplete as previously thought An additional interestingpoint of our data set is that the diversification of myriapods isSilurian-Devonian for the analysis under the autocorrelatedlog normal model and only our uncorrelated gamma modelsupports a Cambrian terrestrialization event as suggestedrecently for this lineage (Rota-Stabelli et al 2013)Additional transcriptomes should allow for a better estima-tion of the diversification dates of each order but should notaffect their origin Should additional taxon sampling continueto yield substantial among-gene incongruence and a succes-sion of interordinal divergences closely spaced together intime consistent with the scenario of an rapid radiation wenote that biological (as opposed to methodologicalartifac-tual) explanations for the incongruence such as incompletelineage sorting may need to be considered

Implications for Morphological andBehavioral Characters

A substantial list of characters has been tabled in defense of asister-group relationship between Craterostigmus andEpimorpha They include the following putative synapomor-phies (Edgecombe 2011a)

1) Brooding involving the mother guarding the egg clusterby wrapping her body around it

2) Extraordinarily high cell numbers in the lateral ocelli(eg more than 1000 retinula cells)

3) Proximal retinula cells partly with monodirectionalrhabdomeres

4) Maxillary nephridia lacking in postembryonic stadia5) Rigidity of the forcipules a character complex including

the forcipular pleurite arching over the coxosternite thehinge of the coxosternite being sclerotized and inflexi-ble the coxosternite deeply embedded into the cuticleabove the first pedigerous trunk segment and a trun-cation of sternal muscles in the forcipular segmentrather than extending into the head

6) Presternites distinct7) Separate sternal and lateral longitudinal muscles8) Lateral testicular vesicles linked by a central posteriorly

extended deferens duct9) Coxal organs confined to the last leg pair

10) Internal valves formed by lips of the ostia projectingdeeply into the heart lumen

1507

Evaluation of Topological Conflict in Centipede Phylogeny doi101093molbevmsu108 MBE at H

arvard Library on June 12 2014

httpmbeoxfordjournalsorg

Dow

nloaded from

Under the alternative Amalpighiata hypothesis in whichLithobiomorpha rather than Craterostigmus is sister group ofEpimorpha these characters are implied to be homoplasticThey are either general characters of Pleurostigmophora as awhole that have been secondarily modified (lost subject toreversal or otherwise transformed) in Lithobiomorpha orthey have been convergently acquired by Craterostigmusand Epimorpha

Craterostigmus has a single anamorphic stage in its lifecycle it hatches from the egg as a 12-legged instar and ac-quires the adult number of 15 leg-pairsmdashthe plesiomorphicnumber of legs in centipedesmdashin the succeeding instar Thiscontrasts with numerous anamorphic stages inScutigeromorpha and Lithobiomorpha which hatch with 4or 6ndash8 leg pairs respectively and keep adding segments untilreaching the definitive 15 leg pairs The Phylactometria hy-pothesis has generally viewed the ldquoreduced hemi-anamorphosisrdquo (Borucki 1996) of Craterostigmus as atransitional stage in the evolution of complete epimorphosisin Epimorpha The alternative hypothesis in whichCraterostigmus is sister group of all otherPleurostigmophora rejects this interpretation in favor oftwo independent reductions of anamorphic instars (or sec-ondary reacquisition of them in Lithobiomorpha) Aproposposterior segmentation in arthropods has been shown to beevolutionarily labile as demonstrated by the extreme case ofposterior segment reduction during mite embryogenesis withconcomitant loss of certain gene expression domains orentire genes altogether in the genome of the miteTetranychus urticae (Grbic et al 2011)

At present few characters have been identified that couldbe reinterpreted as shared derived characters ofLithobiomorpha and Epimorpha One noteworthy exampleis the absence of supernumerary Malpighian tubules InChilopoda these occur only in Scutigeromorpha and inCraterostigmus and their presence in these two ordersalone has been regarded as a plesiomorphic feature(Prunescu and Prunescu 1996 Prunescu and Prunescu2006) Their inferred loss in Lithobiomorpha and Epimorphawould imply a single step under the favored topology in thisstudy and gives the name to the clade Amalpighiata

Other potential apomorphies of Lithobiomorpha andEpimorpha involve characters that are shared byLithobiomorpha and Scolopendromorpha alone and arethus required to be lost or otherwise modified inGeophilomorpha

For example Craterostigmus shares three multicuspedteeth in the mandible with Scutigeromorpha in contrast tofour and five such teeth on opposing mandibles inLithobiomorpha and Scolopendromorpha Geophilomorphaprimitively lack teeth (Koch and Edgecombe 2012) so if atransformation from three teeth (common ancestor ofChilopoda) to fourfive teeth (common ancestorof Lithobiomorpha + Epimorpha) were envisioned the lossof mandibular teeth in Geophilomorpha is a necessary part ofthe transformation series A more homoplastic character thatcould be postulated in support of Amalpighiata as a clade isthe presence of marginated tergites These are lacking in

Craterostigmus are ubiquitous in Lithobiomorpha and arevariably present in Scolopendromorpha (present inScolopocryptopinae and Scolopendridae) Their absence inGeophilomorpha and several blind scolopendromorphclades (Newportiinae Cryptopidae and Plutoniumidae)however means that their status as a character forAmalpighiata would demand a few instances of reversalloss As such though it must be conceded that this schemeintroduces considerable homoplasy from the perspective ofmorphology a few characters can be suggested as potentialsynapomorphies for Amalpighiata

Materials and Methods

Sample Collection

We sampled six specimens representing the main groups ofcentipedes (MCZ voucher numbers in brackets) Scutigeracoleoptrata (IZ-20415) (Scutigeromorpha) Lithobius forficatus(IZ-131534) (Lithobiomorpha) Craterostigmus tasmanianus(IZ-128299) (Craterostigmomorpha) Alipes grandidieri (IZ-130616) Cryptops hortensis (IZ-130583) (Scolopen-dromorpha) and Himantarium gabrielis (IZ-131564) (Geophi-lomorpha) Information about the sampling localitiescollection details can be found in the MCZ online collectionsdatabase (httpmczbasemczharvardedu last accessedApril 1 2014) The following taxa were sampled as outgroupsPeripatopsis capensis (Onychophora a transcriptome) andIxodes scapularis and T urticae (Arthropoda Arachnida andAcari complete genomes) Samples were preserved in RNA-later (Ambion) or flash frozen in liquid nitrogen immediatelyafter collection A portion of the central part of each animal(including the body and legs) or the anterior part (in S coleop-trata including the head and a part of the trunk) was dissectedand stored at80 C until RNA extraction Three more taxaretrieved from external sources were also included Thegenome of Strigamia maritima (Geophilomorpha) was down-loaded from httpmetazoaensemblorgStrigamia_mari-timaInfoIndex (last accessed April 1 2014) The genome ofDaphnia pulex was retrieved from httpmetazoaensemblorgDaphnia_pulexInfoIndex (last accessed April 1 2014)The transcriptome of A gigas generated by 454 pyrosequen-cing was retrieved from Meusemann et al (2010)

mRNA Extraction

Total RNA was extracted with a standard trizol-basedmethod using TRIzol (Life Sciences) following the manufac-turerrsquos protocol Tissue fragments were disrupted with a drillin 500ml of TRIzol using an RNAse-free plastic pestle forgrinding TRIzol was added up to a total volume of 1 mlAfter 5 min incubating at room temperature (RT) 100 ml ofbromochloropropane was mixed by vortexing After incuba-tion at RT for 10 min the samples were centrifuged at16000 rpm for 15 min at 4 C Afterward the upper aqueouslayer was recovered mixed with 500 ml of isopropanol andincubated at 20 C overnight Total RNA precipitation wasmade as follows samples were centrifuged for 15 min at16000 rpm at 4 C the pellet was washed twice with 1 mlof 75 isopropanol and centrifuged at 16000 rpm at 4 C for

1508

Fernandez et al doi101093molbevmsu108 MBE at H

arvard Library on June 12 2014

httpmbeoxfordjournalsorg

Dow

nloaded from

15 and 5 min in the first and second washing step respec-tively air dried and eluted in 100ml of RNA Storage solution(Ambion) mRNA purification was done with the DynabeadsmRNA Purification Kit (Invitrogen) following manufacturerrsquosinstructions After incubation of total RNA at 65 C for 5 minthe samples were incubated for 30 min with 200ml of mag-netic beads in a rocker and washed twice with washing bufferThis step was repeated again with an incubation time of5 min mRNA was eluted in 15ml of Tris-HCl buffer Qualityof mRNA was measured with a pico RNA assay in an Agilent2100 Bioanalyzer (Agilent Technologies) Quantity was mea-sured with a RNA assay in Qubit fluorometer (LifeTechnologies)

cDNA Library Construction and Next-GenerationSequencing

cDNA libraries were constructed with the TruSeq for RNASample Preparation kit (Illumina) following the manufac-turerrsquos instructions Libraries for C tasmanianus and S coleop-trata were constructed in the Apollo 324 automated systemusing the PrepX mRNA kit (IntegenX) The samples weremarked with a different index to allow pooling for sequencingConcentration of the cDNA libraries was measured through adsDNA High Sensitivity (HS) assay in a Qubit fluorometer(Invitrogen) Library quality and size selection were checkedin an Agilent 2100 Bioanalyzer (Agilent Technologies) withthe HS DNA assay The samples were run using the IlluminaHiSeq 2500 platform with paired-end reads of 150 bp at theFAS Center for Systems Biology at Harvard University

Raw Data Sanitation and Sequence Assembly

Demultiplexed Illumina HiSeq 2500 sequencing results inFASTQ format were retrieved from the sequencing facilityvia FTP The files were decompressed and imported into CLCGenomics Workbench 551 retaining read header informa-tion as paired read files Each sample was quality filteredaccording to a threshold average quality score of 30 basedon a Phred scale and adaptor trimmed using a list of knownadaptors provided by Illumina All reads shorter than 25 bpwere discarded and the resulting files were exported as asingle FASTQ file

To simplify the assembly process ribosomal RNA (rRNA)was filtered out All known metazoan rRNA sequences weredownloaded from GenBank and formatted into bowtie indexusing ldquobowtie-buildrdquo Each sample was sequentially aligned to

the index allowing up to two mismatches via Bowtie 100(Langmead et al 2009) retaining all unaligned reads in FASTQformat Unaligned results stored as a single file were im-ported into Geneious 616 (Kearse et al 2012) paired usingread header information and exported as separate files for leftand right mate pairs This process was repeated for eachsample

De novo assemblies (strand specific in C tasmanianus andS coleoptrata) were done individually per organism in Trinity(Grabherr et al 2011 Haas et al 2013) using paired read filesand default parameters Raw reads and assembled sequenceshave been deposited in the National Center for BiotechnologyInformation Sequence Read Archive and TranscriptomeShotgun Assembly databases respectively (table 2)

Identification of Candidate Coding Regions

Redundancy reduction was done with CD-HIT-EST (Fu et al2012) in the raw assemblies (95 global similarity) Resultingassemblies were processed in TransDecoder (Haas et al 2013)to identify candidate ORFs within the transcripts Predictedpeptides were then processed with a further filter to selectonly one peptide per putative unigene by choosing the lon-gest ORF per Trinity subcomponent with a Python scriptthus removing the variation in the coding regions of our as-semblies due to by alternative splicing closely related para-logs and allelic diversity Peptide sequences with all finalcandidate ORFs were retained as multifasta files Sequencedata for Strigamia maritima and D pulex were obtained fromgenome assemblies as opposed to the de novo assembledtranscriptomes Annotated predicted peptide sequences forD pulex were downloaded from ldquowFleaBaserdquo and sequenceswere clustered using CD-HIT at a similarity of 98 for redun-dancy reduction Peptide sequences for Strigamia maritimawere downloaded from Ensembl and also clustered at 98similarity Results for both D pulex and Strigamia maritimawere converted to a single line multifasta file

Orthology Assignment

We assigned predicted ORFs into orthologous groups acrossall samples using OMA stand-alone v099t (Altenhoff et al2011 Altenhoff et al 2013) The advantages of the algorithmof OMA over standard bidirectional best-hit approaches relyon the use of evolutionary distances instead of scores con-sideration of distance inference uncertainty and differentialgene losses and inclusion of many-to-many orthologous re-lations The ortholog matrix is constructed from all-against-all

Table 2 Accession Numbers for the Raw Transcriptomes and Assemblies from the Taxa Newly Sequenced in This Study

BioProject BioSample Experiment Run

Scutigera coleoptrata PRJNA237135 SAMN02614634 SRX462011 SRR1158078

Craterostigmus tasmanianus PRJNA237134 SAMN02614633 SRX461877 SRR1157986

Lithobius forficatus PRJNA237133 SAMN02614632 SRX462145 SRR1159752

Himantarium gabrielis PRJNA237131 SAMN02614631 SRX461787 SRR1159787

Alipes grandidieri PRJNA179374 SAMN01816532 SRX205685 SRR619311

Cryptops hortensis PRJNA237130 SAMN02614630 SRX457664 SRR1153457

Peripatopsis capensis PRJNA236598 SAMN02598680 SRX451023 SRR1145776

1509

Evaluation of Topological Conflict in Centipede Phylogeny doi101093molbevmsu108 MBE at H

arvard Library on June 12 2014

httpmbeoxfordjournalsorg

Dow

nloaded from

Smith-Waterman protein alignments The program identifiesthe so called ldquostable pairsrdquo verifies them and checks againstpotential paralogous genes In a last step cliques of stablepairs are clustered as groups of orthologs All input fileswere single-line multifasta files and the parametersdrw filespecified retained all default settings with the exception ofldquoNPrdquo which was set at 300 We parallelized all-by-all localalignments across 256 cores of a single compute node imple-menting a custom Bash script allowing for execution of inde-pendent threads with at least 3 s between each instance ofOMA to minimize risk of collisions

Phylogenomic Analyses and Congruence Assessment

We constructed three different amino acid supermatricesFirst a large matrix was constructed by concatenating theset of OMA groups containing six or more taxa To increasegene occupancy and to reduce the percentage of missingdata a second matrix was created by selecting the orthologscontained in 11 or more taxa Because of the high percentageof missing data in A gigas this taxon was eliminated from thissupermatrix This supermatrix reconstruction approach wasselected over other available software (eg MARE) because itguarantees a minimum taxon occupancy per orthogrouptherefore resulting in a supermatrix with a good compromisebetween gene and taxon occupancy and missing data

The selected orthogroups (1934 and 61 for both matrixesrespectively) were aligned individually using MUSCLE version36 (Edgar 2004) To increase the signal-to-noise ratio andimprove the discriminatory power of phylogenetic methodswe applied a probabilistic character masking with ZORRO(Wu et al 2012) to account for alignment uncertainty runusing default parameters (and using FastTree 214 [Price et al2010] to produce guide trees) This program first calculatesthe probability of all alignments that pass through a specifiedmatched pair of residues using a pair hidden Markov modelframework and it then compares this value with the fullprobability of all alignments of the pair of sequences Theposterior probability value indicates how reliable the matchis (ie highly reliable if it is close to 1 ambiguous if it is close to0) It then sums up the probability that a particular columnwould appear over the alignment landscape and assigns aconfidence score between 0 and 10 to each column provid-ing an objective measure that has an explicit evolutionarymodel and is mathematically rigorous (Wu et al 2012) Wediscarded the positions assigned a confidence score below athreshold of 5 with a custom Python script prior to concat-enation (using Phyutility 26 Smith and Dunn 2008) and sub-sequent phylogenomic analyses

To discern if substitutional saturation and among-taxoncompositional heterogeneity were affecting our phylogeneticresults we considered a third supermatrix Using the p4Python library (Foster 2004) we performed a test of compo-sitional homogeneity using w2 statistics simulating a null dis-tribution using the TreecompoTestUsingSimulations()method (nSims = 100) We used the best-scoring ML treefrom our RAxML analyses as the phylogram on which to sim-ulate data using optimized parameters of a homogeneous

unpartitioned LG + I + 4 model We also conducted com-positional homogeneity tests on individual ortholog align-ments modeling a homogeneous null distribution on theML tree for each gene using the best-fitting model availablein p4 for that gene We characterized genewise substitutionalsaturation by regressing uncorrected amino acid distances onthe patristic distance implied by each ML tree in a p4 scriptand taking the slope of this linear regression as a measure ofsubstitutional saturation (Jeffroy et al 2006) We calculatedthe third quartile of the slope from among the set of com-positionally homogeneous genes with a slope between 0 and1 and then selected the set of ML trees from genes with aslope above this value (0465) for inclusion in a quartet super-network as described earlier The 389 genes that passed in-dividual tests of compositional heterogeneity and whichfurthermore fell in the least saturated quartile of this station-ary set were concatenated into a third supermatrix

ML inference was conducted with PhyML-PCMA (Zollerand Schneider 2013) and RAxML 775 (Berger et al 2011)Although RAxML uses a predetermined empirical model ofamino acid substitution PhyML-PCMA estimates a modelthrough the use of a principal component (PC) analysisThe obtained PCs describe the substitution rates that covarythe most among different protein families and thereforedefine a semiempirically determined parameterization foran amino acid substitution model specific to each data set(Zoller and Schneider 2013) We selected 20 PCs in thePhyML-PCMA analyses and empirical amino acid frequenciesPROTGAMMALG4XF was selected as the best model ofamino acid substitution for the unpartitioned RAxML analy-ses using a modification of the ProteinModelSelection perlwrapper to RAxML produced by A Stamatakis (httpscoh-itsorgexelixissoftwarehtml last accessed April 1 2014)Bootstrap values were estimated with 1000 replicates underthe rapid bootstrapping algorithm

Bayesian analysis was conducted with PhyloBayes MPI 14e(Lartillot et al 2013) using the site-heterogeneous CAT-GTRmodel of evolution (Lartillot and Philippe 2004) Three inde-pendent Markov chain Monte Carlo (MCMC) chains wererun for 5818ndash7471 cycles The initial 2000 trees (27ndash34)sampled in each MCMC run prior to convergence (judgedwhen maximum bipartition discrepancies across chainslt 01) were discarded as the burn-in period A 50 major-ity-rule consensus tree was then computed from the remain-ing 1537 trees (sampled every 10 cycles) combined from thethree independent runs

To investigate potential incongruence between individualgene trees we also inferred gene trees for each OMA groupincluded in our supermatrix For each aligned ZORRO-masked OMA group we selected a best-fitting model usinga ProteinModelSelection script modified to permit testing ofthe recent LG4M( + F) and LG4X( + F) models (Le et al 2012)Best-scoring ML trees were inferred for each gene under theselected model (with the gamma model of rate variation butno invariant term) from 20 replicates of parsimony startingtrees One hundred traditional (nonrapid) bootstrap repli-cates for each gene were also inferred A majority rule con-sensus for each gene was constructed using the SumTrees

1510

Fernandez et al doi101093molbevmsu108 MBE at H

arvard Library on June 12 2014

httpmbeoxfordjournalsorg

Dow

nloaded from

script from the DendroPy Python library (Sukumaran andHolder 2010) To visualize predominant intergene conflictsfor both ML and bootstrap MRC trees we employedSuperQ v11 (Grunewald et al 2013) selecting the ldquobalancedrdquoedge-weight optimization function and applying no filterSplitsTree v4131 (Huson and Bryant 2006) was then usedto visualize the resulting NEXUS files To investigate the dis-tribution of strongly supported conflict among our genetrees we employed Conclustador v04a (Leigh et al 2011)presently the only automated incongruence-detection algo-rithm readily applicable to data sets of this size We appliedthe spectral clustering algorithm on the bootstrap replicatesfor each OMA group limiting the maximum number of clus-ters to 15

To quantify the number of genes potentially decisive for agiven split and also the number within this set that actuallysupport the split in question (fig 6) we employed a customPython script to parse the set of individual gene trees If a treecontained at least one species in either descendant group fora given node plus at least two distinct species basal to thenode in question it was considered potentially decisive(forming minimally a quartet DellrsquoAmpio et al 2014) if thedescendants were monophyletic with respect to their relativeoutgroups that gene tree was considered congruent with thenode in question (although this count is agnostic to thetopology within the node in question) To quantify the rela-tive support for different hypotheses within our individualgene-tree analysis we displayed the above counts for allnodes in the 1934 gene ML topology (fig 6) and also forselect alternative scenarios of centipede interrelationshipsboth including historical hypotheses (see Introduction)and alternative topologies for Scutigeromorpha andCraterostigmomorpha suggested by our quartet supernet-works All Python custom scripts can be downloaded fromhttpsgithubcomclaumer (last accessed April 1 2014)

Fossil Calibrations

A few Paleozoic fossil occurrences constrain the timing ofdivergences between chilopod orders (for a review of themyriapod fossil record see Shear and Edgecombe 2010) TheSilurondashDevonian genus Crussolum (Shear et al 1998) is iden-tified as a stem-group scutigeromorph displaying derivedcharacters of total-group Scutigeromorpha but lackingsome apomorphies that are shared by all members of thescutigeromorph crown group (Edgecombe 2011b) The bestpreserved fossils of Crussolum are from the Early Devonian(Pragian) and Middle Devonian (Givetian) (Shear et al 1998Anderson and Trewin 2003) but the oldest confidently as-signed record is in the Late Silurian (early Pridolı) LudlowBone Bed in western England (at least 4192 My dating theend of the Pridolı) This dating constrains the split ofScutigeromorpha from Pleurostigmophora

The occurrence of Devonobius delta in the MiddleDevonian of New York at least 3827 My (date for the endof the Givetian) constrains the divergence betweenCraterostigmomorpha and the remaining pleurostigmophor-ans This minimum applies irrespective of whether

Devonobius is sister group of Craterostigmomorpha or ofEpimorpha which were equally parsimonious in previousanalyses (Edgecombe and Giribet 2004) The basal divergencein Epimorpha that is the split between Scolopendromorphaand Geophilomorpha is constrained by the oldest scolopen-dromorph Mazoscolopendra richardsoni in UpperCarboniferous (Westphalian D) deposits of Mazon Creek IL(at least 306 Ma) Available character data forMazoscolopendra are insufficient to establish whether it is astem- or crown-group scolopendromorph and it only pro-vides a minimum age for total-group Scolopendromorpha

Mid-Silurian body fossils of Diplopoda provide constraintson the divergence of Chilopoda and Diplopoda The diver-gence between these two groups is dated by the occurrenceof Archipolypoda (stem-group helminthomorphs) in the lateWenlock or early Ludlow (Wilson and Anderson 2004) Basedon these data (ages based on spores) the split betweenChilopoda and Diplopoda is at least as old as the end ofthe Gorstian Stage (early Ludlow) 4256 My

The split between Onychophora and Arthropoda wasdated between 528 My (the minimum age for Arthropodaused by Lee et al [2013] on the basis of the earliestRusophycus traces) and 558 My used as the root ofPanarthropoda (Lee et al 2013)

Divergence Time Estimates

Divergence dates were estimated using the Bayesian relaxedmolecular clock approach as implemented in PhyloBayesv33f (Lartillot et al 2013) To compare the performance ofthe autocorrelated log normal and uncorrelated gammamodels we estimated the divergence dates under bothmodels in the 389-gene data set A series of calibrationconstraints specified above in ldquoFossil calibrationsrdquo (see alsoMurienne et al 2010 Edgecombe 2011b) were used with softbounds (Yang and Rannala 2006) under a birthndashdeath priorin PhyloBayes this strategy having been found to provide thebest compromise for dating estimates (Inoue et al 2010) Twoindependent MCMC chains were run for 5000ndash6000 cyclessampling posterior rates and dates every 10 cycles The initial20 of the cycles were discarded as burn-in Posteriorestimates of divergence dates were then computed fromapproximately the last 4000 samples of each chain

Supplementary MaterialSupplementary figure S1 and tables S1 and S2 are available atMolecular Biology and Evolution online (httpwwwmbeoxfordjournalsorg)

Acknowledgments

Mark Harvey kindly provided the sample of Craterostigmustasmanianus and Savel Daniels the sample of Peripatopsiscapensis The Research Computing group at the HarvardFaculty of Arts and Sciences was instrumental in facilitatingthe computational resources required Sarah Bastkowski iskindly thanked for her advice on the usage of SuperQ v11This work was supported by internal funds from the Museumof Comparative Zoology and by a postdoctoral fellowship

1511

Evaluation of Topological Conflict in Centipede Phylogeny doi101093molbevmsu108 MBE at H

arvard Library on June 12 2014

httpmbeoxfordjournalsorg

Dow

nloaded from

from the Fundacion Ramon Areces to RF Five anonymousreviewers and the editor provided comments that helped toimprove this article

ReferencesAltenhoff AM Gil M Gonnet GH Dessimoz C 2013 Inferring hierar-

chical orthologous groups from orthologous gene pairs PLoS One 8e53786

Altenhoff AM Schneider A Gonnet GH Dessimoz C 2011 OMA 2011orthology inference among 1000 complete genomes Nucleic AcidsRes 39D289ndashD294

Anderson LI Trewin NH 2003 An Early Devonian arthropod fauna fromthe Windyfield Cherts Aberdeenshire Scotland Palaeontology 46457ndash509

Ax P 2000 Multicellular animals Volume II The phylogenetic system ofthe Metazoa Berlin (Germany) Springer

Berger SA Krompass D Stamatakis A 2011 Performance accuracy andWeb server for evolutionary placement of short sequence readsunder maximum likelihood Syst Biol 60291ndash302

Betancur-R R Naylor G Ortı G 2014 Conserved genes sampling errorand phylogenomic inference Syst Biol 63257ndash262

Bleidorn C Podsiadlowski L Zhong M Eeckhaut I Hartmann SHalanych KM Tiedemann R 2009 On the phylogenetic positionof Myzostomida can 77 genes get it wrong BMC Evol Biol 9150

Borucki H 1996 Evolution und phylogenetisches system der Chilopoda(Mandibulata Tracheata) Vehr naturwiss Ver Hamburg 3595ndash226

DellrsquoAmpio E Meusemann K Szucsich NU Peters RS Meyer B Borner JPetersen M Aberer AJ Stamatakis A Walzl MG et al 2014 Decisivedata sets in phylogenomics lessons from studies on the phyloge-netic relationships of primarily wingless insects Mol Biol Evol 31239ndash249

Dohle W 1985 Phylogenetic pathways in the Chilopoda BijdrDierkunde 5555ndash66

Dunn CW Hejnol A Matus DQ Pang K Browne WE Smith SA SeaverE Rouse GW Obst M Edgecombe GD et al 2008 Broad phyloge-nomic sampling improves resolution of the animal tree of lifeNature 452745ndash749

Edgar RC 2004 MUSCLE multiple sequence alignment with high accu-racy and high throughput Nucleic Acids Res 321792ndash1797

Edgecombe GD 2011a Chilopodamdashphylogeny In Minelli A editorTreatise on zoologymdashanatomy taxonomy biology TheMyriapoda Vol 1 Leiden and Boston Brill p 339ndash354

Edgecombe GD 2011b Chilopodamdashthe fossil history In Minelli Aeditor Treatise on zoologymdashanatomy taxonomy biology TheMyriapoda Vol 1 Leiden and Boston Brill p 355ndash361

Edgecombe GD Giribet G 2004 Adding mitochondrial sequence data(16S rRNA and cytochrome c oxidase subunit I) to the phylogeny ofcentipedes (Myriapoda Chilopoda) an analysis of morphology andfour molecular loci J Zool Syst Evol Res 4289ndash134

Edgecombe GD Giribet G 2007 Evolutionary biology of centipedes(Myriapoda Chilopoda) Annu Rev Entomol 52151ndash170

Edgecombe GD Giribet G 2008 A New Zealand species of thetrans-Tasman centipede order Craterostigmomorpha(ArthropodaChilopoda) corroborated by molecular evidenceInvert Syst 221ndash15

Edgecombe GD Giribet G Wheeler WC 1999 Phylogeny of Chilopodacombining 18S and 28S rRNA sequences and morphology BoletınSociedad Entomol Aragonesa 26293ndash331

Edwards SV 2009 Is a new and general theory of molecular systematicsemerging Evolution 631ndash19

Foster P 2004 Modeling compositional heterogeneity Syst Biol 53485ndash495

Fu LM Niu BF Zhu ZW Wu ST Li WZ 2012 CD-HIT accelerated forclustering the next-generation sequencing data Bioinformatics 283150ndash3152

Giribet G Carranza S Riutort M Baguna J Ribera C 1999 Internalphylogeny of the Chilopoda (Myriapoda Arthropoda) using

complete 18S rDNA and partial 28S rDNA sequences Philos TransR Soc Lond B Biol Sci 354215ndash222

Giribet G Edgecombe GD 2006 Conflict between data sets andphylogeny of centipedes an analysis based on seven genes andmorphology Proc R Soc B Biol Sci 273531ndash538

Grabherr MG Haas BJ Yassour M Levin JZ Thompson DA Amit IAdiconis X Fan L Raychowdhury R Zeng Q et al 2011 Full-length transcriptome assembly from RNA-Seq data without a ref-erence genome Nat Biotechnol 29644ndash652

Grbic M Van Leeuwen T Clark RM Rombauts S Rouze P Grbic VOsborne EJ Dermauw W Ngoc PC Ortego F et al 2011 Thegenome of Tetranychus urticae reveals herbivorous pest adaptationsNature 479487ndash492

Grunewald S Spillner A Bastkowski S Bogershausen A Moulton V2013 SuperQ computing supernetworks from quartets IEEEACMTrans Comput Biol Bioinform 10151ndash160

Haas BJ Papanicolaou A Yassour M Grabherr M Blood PD Bowden JCouger MB Eccles D Li B Lieber M et al 2013 De novo tran-script sequence reconstruction from RNA-seq using the Trinityplatform for reference generation and analysis Nat Protoc 81494ndash1512

Hartmann S Helm C Nickel B Meyer M Struck TH Tiedemann RSelbig J Bleidorn C 2012 Exploiting gene families for phylogenomicanalysis of myzostomid transcriptome data PLoS One 7e29843

Hejnol A Obst M Stamatakis A Ott M Rouse GW Edgecombe GDMartinez P Baguna J Bailly X Jondelius U et al 2009 Assessing theroot of bilaterian animals with scalable phylogenomic methods ProcR Soc B Biol Sci 2764261ndash4270

Hilken G 1997 Tracheal systems in Chilopoda a comparison underphylogenetic aspects Entomol Scand Suppl 5149ndash60

Huson DH Bryant D 2006 Application of phylogenetic networks inevolutionary studies Mol Biol Evol 23254ndash267

Inoue J Donoghue PCJ Yang ZH 2010 The impact of the representationof fossil calibrations on Bayesian estimation of species divergencetimes Syst Biol 5974ndash89

Jeffroy O Brinkmann H Delsuc F Philippe H 2006 Phylogenomics thebeginning of incongruence Trends Genet 22225ndash231

Kearse M Moir R Wilson A Stones-Havas S Cheung M Sturrock SBuxton S Cooper A Markowitz S Duran C et al 2012 GeneiousBasic an integrated and extendable desktop software platform forthe organization and analysis of sequence data Bioinformatics 281647ndash1649

Kennedy D 2005 125 [Editorial] Science 30919Koch M Edgecombe GD 2012 The preoral chamber in geophilomorph

centipedes comparative morphology phylogeny and the evolutionof centipede feeding structures Zool J Linn Soc 1651ndash62

Kocot KM Cannon JT Todt C Citarella MR Kohn AB Meyer A SantosSR Schander C Moroz LL Lieb B et al 2011 Phylogenomics revealsdeep molluscan relationships Nature 477452ndash456

Kubatko LS Degnan JH 2007 Inconsistency of phylogenetic estimatesfrom concatenated data under coalescence Syst Biol 5617ndash24

Langmead B Trapnell C Pop M Salzberg SL 2009 Ultrafast andmemory-efficient alignment of short DNA sequences to thehuman genome Genome Biol 10R25

Lartillot N Philippe H 2004 A Bayesian mixture model for across-siteheterogeneities in the amino-acid replacement process Mol BiolEvol 211095ndash1109

Lartillot N Philippe H 2008 Improvement of molecular phylogeneticinference and the phylogeny of Bilateria Philos Trans R Soc Lond BBiol Sci 3631463ndash1472

Lartillot N Rodrigue N Stubbs D Richer J 2013 PhyloBayes MPI phy-logenetic reconstruction with infinite mixtures of profiles in a par-allel environment Syst Biol 62611ndash615

Le SQ Dang CC Gascuel O 2012 Modeling protein evolution withseveral amino acid replacement matrices depending on site ratesMol Biol Evol 292921ndash2936

Lee MSY Soubrier J Edgecombe GD 2013 Rates of phenotypicand genomic evolution during the Cambrian Explosion Curr Biol231ndash7

1512

Fernandez et al doi101093molbevmsu108 MBE at H

arvard Library on June 12 2014

httpmbeoxfordjournalsorg

Dow

nloaded from

Leigh JW Schliep K Lopez P Bapteste E 2011 Let them fall where theymay congruence analysis in massive phylogenetically messy datasets Mol Biol Evol 282773ndash2785

Lemmon AR Brown JM Stanger-Hall K Lemmon EM 2009The effect of ambiguous data on phylogenetic estimates obtainedby maximum likelihood and Bayesian inference Syst Biol 58130ndash145

Mallatt J Giribet G 2006 Further use of nearly complete 28S and 18SrRNA genes to classify Ecdysozoa 37 more arthropods and a kinor-hynch Mol Phylogenet Evol 40772ndash794

Meusemann K von Reumont BM Simon S Roeding F Strauss S Kuck PEbersberger I Walzl M Pass G Breuers S et al 2010 A phylogenomicapproach to resolve the arthropod tree of life Mol Biol Evol 272451ndash2464

Muller CHG Meyer-Rochow VB 2006 Fine structural organization ofthe lateral ocelli in two species of Scolopendra (ChilopodaPleurostigmophora) an evolutionary evaluation Zoomorphology12513ndash26

Murienne J Edgecombe GD Giribet G 2010 Including secondary struc-ture fossils and molecular dating in the centipede tree of life MolPhylogenet Evol 57301ndash313

Nosenko T Schreiber F Adamska M Adamski M Eitel M Hammel JMaldonado M Muller WE Nickel M Schierwater B et al 2013 Deepmetazoan phylogeny when different genes tell different stories MolPhylogenet Evol 67223ndash233

Philippe H Brinkmann H Lavrov DV Littlewood DTJ Manuel MWorheide G Baurain D 2011 Resolving difficult phylogeneticquestions why more sequences are not enough PLoS Biol 9e1000602

Philippe H Telford MJ 2006 Large-scale sequencing and the new animalphylogeny Trends Ecol Evol 21614ndash620

Pick KS Philippe H Schreiber F Erpenbeck D Jackson DJWrede P Wiens M Alie A Morgenstern B Manuel Met al 2010 Improved phylogenomic taxon sampling notice-ably affects nonbilaterian relationships Mol Biol Evol 271983ndash1987

Pocock RI 1902 A new and annectant type of chilopod Quart J MicroscSci 45417ndash448

Price MN Dehal PS Arkin AP 2010 FastTree 2-approximately maximum-likelihood trees for large alignmentsPLoS One 5e9490

Prunescu C-C Prunescu P 1996 Supernumerary Malpighian tubules inchilopods Acta Myriapodol 169437ndash440

Prunescu C-C Prunescu P 2006 Rudimentary supernumeraryMalpighian tubules in the order Craterostigmomorpha Pocock1902 Norwegian J Entomol 53113ndash118

Regier JC Shultz JW Zwick A Hussey A Ball B Wetzer R Martin JWCunningham CW 2010 Arthropod relationships revealed by phy-logenomic analysis of nuclear protein-coding sequences Nature 4631079ndash1083

Regier JC Wilson HM Shultz JW 2005 Phylogenetic analysis ofMyriapoda using three nuclear protein-coding genes MolPhylogenet Evol 34147ndash158

Rehm P Borner J Meusemann K von Reumont BM Simon S Hadrys HMisof B Burmester T 2011 Dating the arthropod tree based onlarge-scale transcriptome data Mol Phylogenet Evol 61880ndash887

Rota-Stabelli O Daley AC Pisani D 2013 Molecular timetrees reveal aCambrian colonization of land and a new scenario for ecdysozoanevolution Curr Biol 23392ndash398

Roure B Baurain D Philippe H 2013 Impact of missing data on phy-logenies inferred from empirical phylogenomic data sets Mol BiolEvol 30197ndash214

Salichos L Rokas A 2011 Evaluating ortholog prediction algorithms in ayeast model clade PLoS One 6e18755

Salichos L Rokas A 2013 Inferring ancient divergences requires geneswith strong phylogenetic signals Nature 497327ndash331

Salichos L Stamatakis A Rokas A 2014 Novel information theory-basedmeasures for quantifying incongruence among phylogenetic treesMol Biol Evol 31(5)1261ndash1271

Shear WA Bonamo PM 1988 Devonobiomorpha a new order of cen-tipeds (Chilopoda) from the middle Devonian of Gilboa New YorkState USA and the phylogeny of centiped orders Am MusNovitates 29271ndash30

Shear WA Edgecombe GD 2010 The geological record and phylogenyof Myriapoda Arthropod Struct Dev 39174ndash190

Shear WA Jeram AJ Selden PA 1998 Centiped legs (ArthropodaChilopoda Scutigeromorpha) from the Silurian and Devonian ofBritain and the Devonian of North America Am Mus Novitates32311ndash16

Shultz JW Regier JC 1997 Progress toward a molecular phylogeny of thecentipede orders (Chilopoda) Entomol Scand Suppl 5125ndash32

Smith S Wilson NG Goetz F Feehery C Andrade SCS Rouse GWGiribet G Dunn CW 2011 Resolving the evolutionary relationshipsof molluscs with phylogenomic tools Nature 480364ndash367

Smith SA Dunn CW 2008 Phyutility a phyloinformatics tool for treesalignments and molecular data Bioinformatics 24715ndash716

Struck TH Paul C Hill N Hartmann S Hosel C Kube M Lieb B Meyer ATiedemann R Purschke G et al 2011 Phylogenomic analyses un-ravel annelid evolution Nature 47195ndash98

Sukumaran J Holder MT 2010 DendroPy a Python library for phylo-genetic computing Bioinformatics 261569ndash1571

Verhoeff KW 1902ndash1925 Gliederfussler Arthropoda Klasse ChilopodaIn Klassen und Ordnungen des Tierreichs 5 abt 2 Buch 1 Leipzig(Germany) Akademische Verlagsgesellschaft p 725

von Reumont BM Jenner RA Wills MA Dellrsquoampio E Pass GEbersberger I Meyer B Koenemann S Iliffe TM Stamatakis Aet al 2012 Pancrustacean phylogeny in the light of new phyloge-nomic data support for Remipedia as the possible sister group ofHexapoda Mol Biol Evol 291031ndash1045

Wilson HM Anderson LI 2004 Morphology and taxonomy of Paleozoicmillipedes (Diplopoda Chilognatha Archipolypoda) from ScotlandJ Paleontol 78169ndash184

Wirkner CS Pass G 2002 The circulatory system in Chilopodafunctional morphology and phylogenetic aspects Acta Zool 83193ndash202

Wu M Chatterji S Eisen JA 2012 Accounting for alignment uncertaintyin phylogenomics PLoS One 7e30288

Yang Z Rannala B 2006 Bayesian estimation of species divergence timesunder a molecular clock using multiple fossil calibrations with softbounds Mol Biol Evol 23212ndash226

Zoller S Schneider A 2013 Improving phylogenetic inference with asemiempirical amino acid substitution model Mol Biol Evol 30469ndash479

1513

Evaluation of Topological Conflict in Centipede Phylogeny doi101093molbevmsu108 MBE at H

arvard Library on June 12 2014

httpmbeoxfordjournalsorg

Dow

nloaded from

Page 6: Evaluating Topological Conflict in Centipede Phylogeny ...€¦ · Scutigera coleoptrata Craterostigmus tasmanianus Strigamia maritima Himantarium gabrielis Alipes grandidieri Cryptops

genes with strong phylogenetic signal and demonstrated ab-sence of significant incongruence when reconstructing an-cient divergences To address this possibility we alsoadopted a gene-tree perspective inferring individual MLtrees for each gene included in our concatenated matrixHowever many available methods of investigating gene-treeincongruence (including recent metrics such as internodecertainty and its derivatives Salichos and Rokas 2013Salichos et al 2014) can only be calculated for genes thatcontain the same set of taxa an assumption that our dataset (and many similar data sets generated by highly parallelsequencing) assuredly violates

We therefore visualized the dominant bipartitions amongthese gene trees by constructing a supernetwork (a general-ization of supertrees permitting reticulation where intergeneconflicts exist) using the SuperQ method (Grunewald et al2013) which decomposes all gene trees into quartets and

then infers a supernetwork from these quartets assigningedge lengths by examining quartet frequencies We inferredsupernetworks from both ML gene trees (figs 4 and 5) andthe majority rule consensus (MRC) trees from the bootstrapreplicates of each gene (figure not shown) These supernet-works display a predominantly tree-like structure topologi-cally resembling our concatenated species trees howeverthey indicate the presence of intergene conflict in the relativepositions of Craterostigmomorpha and Scutigeromorphawith some genes uniting these taxa as a clade in contrastto our species tree (figs 4 and 5) Interestingly the sameoverarching network structure is present in both the ML-tree-based and bootstrap MRC-based networks indicatingthat these conflicts are not merely the result of stochasticsampling error To examine how these phylogenetic conflictsare distributed among our input genes we employedConclustador which automatically assigns genes into

001Craterostigmus

tasmanianus

Scutigera

coleoptrata

Archispirostreptus gigas

Peripatopsis capensis

Ixodes

scapularis

Tetranychus

urticae

Daphnia pulexStrigamia

maritima

Himantarium

gabrielis

Lithobius

Cryptops

hortensis

Alipes

grandidieri

FIG 5 Supernetwork representation of quartets derived from individual ML gene trees considering only 389 genes showing evidence of compositionalhomogeneity and relatively low substitutional saturation

Craterostigmus

tasmanianus

Scutigera

coleoptrata

Archispirostreptus gigas

Peripatopsis capensis

Ixodes scapularis

Tetranychus urticae

Daphnia pulex

Strigamia maritima

Himantarium gabrielis

Cryptops hortensis

Alipes grandidieri

001

FIG 4 Supernetwork representation of quartets derived from individual ML gene trees for all 1934 genes concatenated in the supermatrix presented infigure 2 Phylogenetic conflict is represented by nontree-like splits (eg ScutigeromorphandashCraterostigmomorpha)

1505

Evaluation of Topological Conflict in Centipede Phylogeny doi101093molbevmsu108 MBE at H

arvard Library on June 12 2014

httpmbeoxfordjournalsorg

Dow

nloaded from

exclusive clusters if they display significant (and well sup-ported) areas of local topological incongruence by examiningbipartition distances among and within pseudoreplicate treesfor each gene (Leigh et al 2011) Intriguingly this method didnot find evidence for any well-supported local incongruencebetween genes in our data set (ie only one cluster containingall genes was selected) Although at first glance this result is atodds with the conflicts observed in our supernetwork analy-ses this seeming paradox could be reconciled if most or allgenes display the same patterns of topological conflict withintheir bootstrap replicates

To address gene-tree conflict and the distribution of miss-ing data in a more quantitative manner we counted thepotentially decisive genes and the frequency of genes withinthis set that are congruent with both the nodes in our con-catenated topology and with several alternative hypotheses(fig 6) Large numbers of genes (from 1115 to 1742 fig 6) areavailable to test interordinal relationships within ChilopodaThe gene support frequencies within these potentially deci-sive sets also clearly favor the nodes within our concatenatedtopology over competing hypotheses although we note thatthe potentially decisive gene sets are not directly comparableacross nodes as they contain different populations of genesdepending on the particular taxonomic distribution of miss-ing data We also emphasize that although the gene supportfrequencies appear relatively low (ie usually lt50) even fornodes in our concatenated topology these frequencies werederived from counts of fully bifurcating ML trees many ofwhich contain splits that are poorly supported due to limited-length alignments andor inappropriate rates of evolution forthe splits in question we hypothesize that these gene supportfrequencies would appear even more decisively in favor of the

concatenated topology if considering only well-supportedsplits

The LG4X + F and CAT + GTR models we employed areboth able to model site-heterogeneous patterns of molecularevolution to a degree and are therefore better able to detectmultiple substitutions an important issue for this data set asthe gene selection procedure we employed was agnostic tothe rate of molecular evolution (Lartillot and Philippe 2004Philippe et al 2011) Both however also assume stationarityof amino acid frequencies through time an assumption re-jected by a sensitive statistical test for the 1934 gene super-matrix To discern whether substitutional saturation andcompositional heterogeneity were influencing our phyloge-netic results we considered a subset of 389 genes that pass in-dividual tests of compositional heterogeneity and whichfurthermore fall in the least saturated quartile of this station-ary set A quartet supernetwork built from ML treesdrawn only within this subset of genes (fig 5) resemblesthe ingroup tree topology inferred from our concatenatedanalyses (fig 2) in supporting the monophyly ofLithobiomorpha + Epimorpha In the 389-gene quartetsupernetwork however we observe essentially the samegene-tree conflict as in the 1934 gene quartet supernetworkthat is disagreement on the relative position ofCraterostigmus with respect to Scutigeromorpha and theclade comprising the remaining pleurostigmophoran ordersThis conflict could be attributable in part to exacerbation ofgene-tree incongruence reported for slowly evolving parti-tions in analyses of genomic data sets due to the confluenteffect on reducing real internode length and the theoreticalexpectation of greater incongruence for short internodes(Salichos and Rokas 2013 but see Betancur-R et al 2014)

Geophilomorpha

Lithobiomorpha

Scolopendromorpha

250

1173

551

1684

213

327

Geophilomorpha

Lithobiomorpha

Scolopendromorpha

200

1115

570

1742

179

327

Craterostigmus tasmanianus

Strigamia maritima

Himantarium gabrielis

Alipes grandidieri

Cryptops hortensis

Scutigera coleoptrata

Archispirostreptus gigas

Daphnia pulex

Ixodes scapularis

Tetranychus urticae

Peripatopsis capensis

593503

848

670

876765

394

1603246

503

876

246

670

394

765

593

380

1254303

413

1365

303

712

1234

577

42

119

353

139

471295

214

1371156

Amalpighiata

Craterostigmomorpha714

1195598

376

1603235

Scutigeromorpha

Amalpighiata

229

1090211

918

1536598

229

1090211

598

Scutigeromorpha

Craterostigmomorpha

388

1207206

1412146

322

206

322

Epimorpha

Lithobiomorpha

Craterostigmomorpha

FIG 6 Counts of potentially decisive genes and (within this set) counts of actually congruent genes computed for each node in the concatenatedtopology (left) as well as for select alternative hypotheses of centipede interrelationships (right) Potentially decisive genes printed below nodecongruent genes in this set printed above node and gene support percentage printed to side

1506

Fernandez et al doi101093molbevmsu108 MBE at H

arvard Library on June 12 2014

httpmbeoxfordjournalsorg

Dow

nloaded from

Finally our amino acid level transcriptome-based phylog-eny is highly similar to previous hypotheses on centipedephylogenetics using much smaller sets of genes (Edgecombeet al 1999 Giribet et al 1999 Edgecombe and Giribet 2004Mallatt and Giribet 2006 Murienne et al 2010) and findsstrong support for such accepted groups as Pleurostigmo-phora Epimorpha Scolopendromorpha and Geophilomorpha thus supporting the idea that our morphologicallysurprising result for Lithobiomorpha + Epimorpha is notartifactual

Evolutionary Implications

Two salient evolutionary implications of this study are 1) thebasal position of Craterostigmomorpha with respect toLithobiomorpha a result anticipated in prior phylogeneticanalyses based on molecular data (Mallatt and Giribet 2006Regier et al 2010) and 2) the superior reconciliation ofthe new topology with the fossil record of centipedes(Edgecombe 2011b) thus improving upon prior dating studies(Murienne et al 2010) The former has consequences in ourinterpretation of an important number of morphologicaltraits thought to be shared by the clade Phylactometria(see Implications for Morphological and BehavioralCharacters section) and requires a taxonomic proposal for aclade including Lithobiomorpha Scolopendromorpha andGeophilomorpha We name this group Amalpighiata onthe basis of the three constituent orders lacking the supernu-merary Malpighian tubules found in Scutigeromorpha andCraterostigmomorpha Amalpighiata appears well supportedand shows no strong conflict among the analyzed genesAmong-gene conflict and also relatively lower nodalsupport in the concatenated analysis exists for the relativepositions of Craterostigmus and Scutigera although the mono-phyly of Pleurostigmophora (Amalpighiata + Craterostigmo-morpha) receives support in several analyses

The second implication of the Amalpighiata hypothesisaccounts for one of the inconsistencies in the centipedefossil record under the Phylactometria hypothesis Stem-group Lithobiomorpha are predicted to have originated byat least the Middle Devonian (under either of the two com-peting hypotheses for the relationships of the DevonianDevonobius) (eg Murienne et al 2010) yet no Paleozoic oreven Mesozoic fossils are known for Lithobiomorpha Theirfossil record does not extend back further than the Eocene(when they are represented by a few species of Lithobiidae inBaltic amber) This is no doubt a gross underestimate of theirreal age even the alternative Amalpighiata tree dates the splitbetween Lithobiomorpha and Epimorpha to at least theUpper Carboniferous (constrained by the earliest knownEpimorpha Scolopendromorpha) (fig 3) Still the Siluro-Devonian fossil record of Chilopoda consists only ofScutigeromorpha and the extinct Devonobiomorpha whichare allied to either Craterostigmomorpha (Borucki 1996) orEpimorpha (Shear and Bonamo 1988 Murienne et al 2010)The former hypothesis is consistent with an earlier divergenceof Craterostigmomorpha than Lithobiomorpha and could

thus account for the lack of mid-Paleozoic fossillithobiomorphs

That said the scattered records of Chilopoda in Mesozoicshales lithographic limestones and ambers which includeexemplars of Scutigeromorpha Scolopendromorpha andGeophilomorpha but not Lithobiomorpha (reviewed byEdgecombe 2011b) suggests that lithobiomorphs are under-represented in the fossil record as a whole and their apparentabsence in the Paleozoic is likely a taphonomic artifact TheAmalpighiata tree dates the origins of centipede orders withbetter fit to their first known fossil occurrences and diver-gence dates are concordant with the fossil record perhapsindicating that the Paleozoic fossil record of centipedes is notas incomplete as previously thought An additional interestingpoint of our data set is that the diversification of myriapods isSilurian-Devonian for the analysis under the autocorrelatedlog normal model and only our uncorrelated gamma modelsupports a Cambrian terrestrialization event as suggestedrecently for this lineage (Rota-Stabelli et al 2013)Additional transcriptomes should allow for a better estima-tion of the diversification dates of each order but should notaffect their origin Should additional taxon sampling continueto yield substantial among-gene incongruence and a succes-sion of interordinal divergences closely spaced together intime consistent with the scenario of an rapid radiation wenote that biological (as opposed to methodologicalartifac-tual) explanations for the incongruence such as incompletelineage sorting may need to be considered

Implications for Morphological andBehavioral Characters

A substantial list of characters has been tabled in defense of asister-group relationship between Craterostigmus andEpimorpha They include the following putative synapomor-phies (Edgecombe 2011a)

1) Brooding involving the mother guarding the egg clusterby wrapping her body around it

2) Extraordinarily high cell numbers in the lateral ocelli(eg more than 1000 retinula cells)

3) Proximal retinula cells partly with monodirectionalrhabdomeres

4) Maxillary nephridia lacking in postembryonic stadia5) Rigidity of the forcipules a character complex including

the forcipular pleurite arching over the coxosternite thehinge of the coxosternite being sclerotized and inflexi-ble the coxosternite deeply embedded into the cuticleabove the first pedigerous trunk segment and a trun-cation of sternal muscles in the forcipular segmentrather than extending into the head

6) Presternites distinct7) Separate sternal and lateral longitudinal muscles8) Lateral testicular vesicles linked by a central posteriorly

extended deferens duct9) Coxal organs confined to the last leg pair

10) Internal valves formed by lips of the ostia projectingdeeply into the heart lumen

1507

Evaluation of Topological Conflict in Centipede Phylogeny doi101093molbevmsu108 MBE at H

arvard Library on June 12 2014

httpmbeoxfordjournalsorg

Dow

nloaded from

Under the alternative Amalpighiata hypothesis in whichLithobiomorpha rather than Craterostigmus is sister group ofEpimorpha these characters are implied to be homoplasticThey are either general characters of Pleurostigmophora as awhole that have been secondarily modified (lost subject toreversal or otherwise transformed) in Lithobiomorpha orthey have been convergently acquired by Craterostigmusand Epimorpha

Craterostigmus has a single anamorphic stage in its lifecycle it hatches from the egg as a 12-legged instar and ac-quires the adult number of 15 leg-pairsmdashthe plesiomorphicnumber of legs in centipedesmdashin the succeeding instar Thiscontrasts with numerous anamorphic stages inScutigeromorpha and Lithobiomorpha which hatch with 4or 6ndash8 leg pairs respectively and keep adding segments untilreaching the definitive 15 leg pairs The Phylactometria hy-pothesis has generally viewed the ldquoreduced hemi-anamorphosisrdquo (Borucki 1996) of Craterostigmus as atransitional stage in the evolution of complete epimorphosisin Epimorpha The alternative hypothesis in whichCraterostigmus is sister group of all otherPleurostigmophora rejects this interpretation in favor oftwo independent reductions of anamorphic instars (or sec-ondary reacquisition of them in Lithobiomorpha) Aproposposterior segmentation in arthropods has been shown to beevolutionarily labile as demonstrated by the extreme case ofposterior segment reduction during mite embryogenesis withconcomitant loss of certain gene expression domains orentire genes altogether in the genome of the miteTetranychus urticae (Grbic et al 2011)

At present few characters have been identified that couldbe reinterpreted as shared derived characters ofLithobiomorpha and Epimorpha One noteworthy exampleis the absence of supernumerary Malpighian tubules InChilopoda these occur only in Scutigeromorpha and inCraterostigmus and their presence in these two ordersalone has been regarded as a plesiomorphic feature(Prunescu and Prunescu 1996 Prunescu and Prunescu2006) Their inferred loss in Lithobiomorpha and Epimorphawould imply a single step under the favored topology in thisstudy and gives the name to the clade Amalpighiata

Other potential apomorphies of Lithobiomorpha andEpimorpha involve characters that are shared byLithobiomorpha and Scolopendromorpha alone and arethus required to be lost or otherwise modified inGeophilomorpha

For example Craterostigmus shares three multicuspedteeth in the mandible with Scutigeromorpha in contrast tofour and five such teeth on opposing mandibles inLithobiomorpha and Scolopendromorpha Geophilomorphaprimitively lack teeth (Koch and Edgecombe 2012) so if atransformation from three teeth (common ancestor ofChilopoda) to fourfive teeth (common ancestorof Lithobiomorpha + Epimorpha) were envisioned the lossof mandibular teeth in Geophilomorpha is a necessary part ofthe transformation series A more homoplastic character thatcould be postulated in support of Amalpighiata as a clade isthe presence of marginated tergites These are lacking in

Craterostigmus are ubiquitous in Lithobiomorpha and arevariably present in Scolopendromorpha (present inScolopocryptopinae and Scolopendridae) Their absence inGeophilomorpha and several blind scolopendromorphclades (Newportiinae Cryptopidae and Plutoniumidae)however means that their status as a character forAmalpighiata would demand a few instances of reversalloss As such though it must be conceded that this schemeintroduces considerable homoplasy from the perspective ofmorphology a few characters can be suggested as potentialsynapomorphies for Amalpighiata

Materials and Methods

Sample Collection

We sampled six specimens representing the main groups ofcentipedes (MCZ voucher numbers in brackets) Scutigeracoleoptrata (IZ-20415) (Scutigeromorpha) Lithobius forficatus(IZ-131534) (Lithobiomorpha) Craterostigmus tasmanianus(IZ-128299) (Craterostigmomorpha) Alipes grandidieri (IZ-130616) Cryptops hortensis (IZ-130583) (Scolopen-dromorpha) and Himantarium gabrielis (IZ-131564) (Geophi-lomorpha) Information about the sampling localitiescollection details can be found in the MCZ online collectionsdatabase (httpmczbasemczharvardedu last accessedApril 1 2014) The following taxa were sampled as outgroupsPeripatopsis capensis (Onychophora a transcriptome) andIxodes scapularis and T urticae (Arthropoda Arachnida andAcari complete genomes) Samples were preserved in RNA-later (Ambion) or flash frozen in liquid nitrogen immediatelyafter collection A portion of the central part of each animal(including the body and legs) or the anterior part (in S coleop-trata including the head and a part of the trunk) was dissectedand stored at80 C until RNA extraction Three more taxaretrieved from external sources were also included Thegenome of Strigamia maritima (Geophilomorpha) was down-loaded from httpmetazoaensemblorgStrigamia_mari-timaInfoIndex (last accessed April 1 2014) The genome ofDaphnia pulex was retrieved from httpmetazoaensemblorgDaphnia_pulexInfoIndex (last accessed April 1 2014)The transcriptome of A gigas generated by 454 pyrosequen-cing was retrieved from Meusemann et al (2010)

mRNA Extraction

Total RNA was extracted with a standard trizol-basedmethod using TRIzol (Life Sciences) following the manufac-turerrsquos protocol Tissue fragments were disrupted with a drillin 500ml of TRIzol using an RNAse-free plastic pestle forgrinding TRIzol was added up to a total volume of 1 mlAfter 5 min incubating at room temperature (RT) 100 ml ofbromochloropropane was mixed by vortexing After incuba-tion at RT for 10 min the samples were centrifuged at16000 rpm for 15 min at 4 C Afterward the upper aqueouslayer was recovered mixed with 500 ml of isopropanol andincubated at 20 C overnight Total RNA precipitation wasmade as follows samples were centrifuged for 15 min at16000 rpm at 4 C the pellet was washed twice with 1 mlof 75 isopropanol and centrifuged at 16000 rpm at 4 C for

1508

Fernandez et al doi101093molbevmsu108 MBE at H

arvard Library on June 12 2014

httpmbeoxfordjournalsorg

Dow

nloaded from

15 and 5 min in the first and second washing step respec-tively air dried and eluted in 100ml of RNA Storage solution(Ambion) mRNA purification was done with the DynabeadsmRNA Purification Kit (Invitrogen) following manufacturerrsquosinstructions After incubation of total RNA at 65 C for 5 minthe samples were incubated for 30 min with 200ml of mag-netic beads in a rocker and washed twice with washing bufferThis step was repeated again with an incubation time of5 min mRNA was eluted in 15ml of Tris-HCl buffer Qualityof mRNA was measured with a pico RNA assay in an Agilent2100 Bioanalyzer (Agilent Technologies) Quantity was mea-sured with a RNA assay in Qubit fluorometer (LifeTechnologies)

cDNA Library Construction and Next-GenerationSequencing

cDNA libraries were constructed with the TruSeq for RNASample Preparation kit (Illumina) following the manufac-turerrsquos instructions Libraries for C tasmanianus and S coleop-trata were constructed in the Apollo 324 automated systemusing the PrepX mRNA kit (IntegenX) The samples weremarked with a different index to allow pooling for sequencingConcentration of the cDNA libraries was measured through adsDNA High Sensitivity (HS) assay in a Qubit fluorometer(Invitrogen) Library quality and size selection were checkedin an Agilent 2100 Bioanalyzer (Agilent Technologies) withthe HS DNA assay The samples were run using the IlluminaHiSeq 2500 platform with paired-end reads of 150 bp at theFAS Center for Systems Biology at Harvard University

Raw Data Sanitation and Sequence Assembly

Demultiplexed Illumina HiSeq 2500 sequencing results inFASTQ format were retrieved from the sequencing facilityvia FTP The files were decompressed and imported into CLCGenomics Workbench 551 retaining read header informa-tion as paired read files Each sample was quality filteredaccording to a threshold average quality score of 30 basedon a Phred scale and adaptor trimmed using a list of knownadaptors provided by Illumina All reads shorter than 25 bpwere discarded and the resulting files were exported as asingle FASTQ file

To simplify the assembly process ribosomal RNA (rRNA)was filtered out All known metazoan rRNA sequences weredownloaded from GenBank and formatted into bowtie indexusing ldquobowtie-buildrdquo Each sample was sequentially aligned to

the index allowing up to two mismatches via Bowtie 100(Langmead et al 2009) retaining all unaligned reads in FASTQformat Unaligned results stored as a single file were im-ported into Geneious 616 (Kearse et al 2012) paired usingread header information and exported as separate files for leftand right mate pairs This process was repeated for eachsample

De novo assemblies (strand specific in C tasmanianus andS coleoptrata) were done individually per organism in Trinity(Grabherr et al 2011 Haas et al 2013) using paired read filesand default parameters Raw reads and assembled sequenceshave been deposited in the National Center for BiotechnologyInformation Sequence Read Archive and TranscriptomeShotgun Assembly databases respectively (table 2)

Identification of Candidate Coding Regions

Redundancy reduction was done with CD-HIT-EST (Fu et al2012) in the raw assemblies (95 global similarity) Resultingassemblies were processed in TransDecoder (Haas et al 2013)to identify candidate ORFs within the transcripts Predictedpeptides were then processed with a further filter to selectonly one peptide per putative unigene by choosing the lon-gest ORF per Trinity subcomponent with a Python scriptthus removing the variation in the coding regions of our as-semblies due to by alternative splicing closely related para-logs and allelic diversity Peptide sequences with all finalcandidate ORFs were retained as multifasta files Sequencedata for Strigamia maritima and D pulex were obtained fromgenome assemblies as opposed to the de novo assembledtranscriptomes Annotated predicted peptide sequences forD pulex were downloaded from ldquowFleaBaserdquo and sequenceswere clustered using CD-HIT at a similarity of 98 for redun-dancy reduction Peptide sequences for Strigamia maritimawere downloaded from Ensembl and also clustered at 98similarity Results for both D pulex and Strigamia maritimawere converted to a single line multifasta file

Orthology Assignment

We assigned predicted ORFs into orthologous groups acrossall samples using OMA stand-alone v099t (Altenhoff et al2011 Altenhoff et al 2013) The advantages of the algorithmof OMA over standard bidirectional best-hit approaches relyon the use of evolutionary distances instead of scores con-sideration of distance inference uncertainty and differentialgene losses and inclusion of many-to-many orthologous re-lations The ortholog matrix is constructed from all-against-all

Table 2 Accession Numbers for the Raw Transcriptomes and Assemblies from the Taxa Newly Sequenced in This Study

BioProject BioSample Experiment Run

Scutigera coleoptrata PRJNA237135 SAMN02614634 SRX462011 SRR1158078

Craterostigmus tasmanianus PRJNA237134 SAMN02614633 SRX461877 SRR1157986

Lithobius forficatus PRJNA237133 SAMN02614632 SRX462145 SRR1159752

Himantarium gabrielis PRJNA237131 SAMN02614631 SRX461787 SRR1159787

Alipes grandidieri PRJNA179374 SAMN01816532 SRX205685 SRR619311

Cryptops hortensis PRJNA237130 SAMN02614630 SRX457664 SRR1153457

Peripatopsis capensis PRJNA236598 SAMN02598680 SRX451023 SRR1145776

1509

Evaluation of Topological Conflict in Centipede Phylogeny doi101093molbevmsu108 MBE at H

arvard Library on June 12 2014

httpmbeoxfordjournalsorg

Dow

nloaded from

Smith-Waterman protein alignments The program identifiesthe so called ldquostable pairsrdquo verifies them and checks againstpotential paralogous genes In a last step cliques of stablepairs are clustered as groups of orthologs All input fileswere single-line multifasta files and the parametersdrw filespecified retained all default settings with the exception ofldquoNPrdquo which was set at 300 We parallelized all-by-all localalignments across 256 cores of a single compute node imple-menting a custom Bash script allowing for execution of inde-pendent threads with at least 3 s between each instance ofOMA to minimize risk of collisions

Phylogenomic Analyses and Congruence Assessment

We constructed three different amino acid supermatricesFirst a large matrix was constructed by concatenating theset of OMA groups containing six or more taxa To increasegene occupancy and to reduce the percentage of missingdata a second matrix was created by selecting the orthologscontained in 11 or more taxa Because of the high percentageof missing data in A gigas this taxon was eliminated from thissupermatrix This supermatrix reconstruction approach wasselected over other available software (eg MARE) because itguarantees a minimum taxon occupancy per orthogrouptherefore resulting in a supermatrix with a good compromisebetween gene and taxon occupancy and missing data

The selected orthogroups (1934 and 61 for both matrixesrespectively) were aligned individually using MUSCLE version36 (Edgar 2004) To increase the signal-to-noise ratio andimprove the discriminatory power of phylogenetic methodswe applied a probabilistic character masking with ZORRO(Wu et al 2012) to account for alignment uncertainty runusing default parameters (and using FastTree 214 [Price et al2010] to produce guide trees) This program first calculatesthe probability of all alignments that pass through a specifiedmatched pair of residues using a pair hidden Markov modelframework and it then compares this value with the fullprobability of all alignments of the pair of sequences Theposterior probability value indicates how reliable the matchis (ie highly reliable if it is close to 1 ambiguous if it is close to0) It then sums up the probability that a particular columnwould appear over the alignment landscape and assigns aconfidence score between 0 and 10 to each column provid-ing an objective measure that has an explicit evolutionarymodel and is mathematically rigorous (Wu et al 2012) Wediscarded the positions assigned a confidence score below athreshold of 5 with a custom Python script prior to concat-enation (using Phyutility 26 Smith and Dunn 2008) and sub-sequent phylogenomic analyses

To discern if substitutional saturation and among-taxoncompositional heterogeneity were affecting our phylogeneticresults we considered a third supermatrix Using the p4Python library (Foster 2004) we performed a test of compo-sitional homogeneity using w2 statistics simulating a null dis-tribution using the TreecompoTestUsingSimulations()method (nSims = 100) We used the best-scoring ML treefrom our RAxML analyses as the phylogram on which to sim-ulate data using optimized parameters of a homogeneous

unpartitioned LG + I + 4 model We also conducted com-positional homogeneity tests on individual ortholog align-ments modeling a homogeneous null distribution on theML tree for each gene using the best-fitting model availablein p4 for that gene We characterized genewise substitutionalsaturation by regressing uncorrected amino acid distances onthe patristic distance implied by each ML tree in a p4 scriptand taking the slope of this linear regression as a measure ofsubstitutional saturation (Jeffroy et al 2006) We calculatedthe third quartile of the slope from among the set of com-positionally homogeneous genes with a slope between 0 and1 and then selected the set of ML trees from genes with aslope above this value (0465) for inclusion in a quartet super-network as described earlier The 389 genes that passed in-dividual tests of compositional heterogeneity and whichfurthermore fell in the least saturated quartile of this station-ary set were concatenated into a third supermatrix

ML inference was conducted with PhyML-PCMA (Zollerand Schneider 2013) and RAxML 775 (Berger et al 2011)Although RAxML uses a predetermined empirical model ofamino acid substitution PhyML-PCMA estimates a modelthrough the use of a principal component (PC) analysisThe obtained PCs describe the substitution rates that covarythe most among different protein families and thereforedefine a semiempirically determined parameterization foran amino acid substitution model specific to each data set(Zoller and Schneider 2013) We selected 20 PCs in thePhyML-PCMA analyses and empirical amino acid frequenciesPROTGAMMALG4XF was selected as the best model ofamino acid substitution for the unpartitioned RAxML analy-ses using a modification of the ProteinModelSelection perlwrapper to RAxML produced by A Stamatakis (httpscoh-itsorgexelixissoftwarehtml last accessed April 1 2014)Bootstrap values were estimated with 1000 replicates underthe rapid bootstrapping algorithm

Bayesian analysis was conducted with PhyloBayes MPI 14e(Lartillot et al 2013) using the site-heterogeneous CAT-GTRmodel of evolution (Lartillot and Philippe 2004) Three inde-pendent Markov chain Monte Carlo (MCMC) chains wererun for 5818ndash7471 cycles The initial 2000 trees (27ndash34)sampled in each MCMC run prior to convergence (judgedwhen maximum bipartition discrepancies across chainslt 01) were discarded as the burn-in period A 50 major-ity-rule consensus tree was then computed from the remain-ing 1537 trees (sampled every 10 cycles) combined from thethree independent runs

To investigate potential incongruence between individualgene trees we also inferred gene trees for each OMA groupincluded in our supermatrix For each aligned ZORRO-masked OMA group we selected a best-fitting model usinga ProteinModelSelection script modified to permit testing ofthe recent LG4M( + F) and LG4X( + F) models (Le et al 2012)Best-scoring ML trees were inferred for each gene under theselected model (with the gamma model of rate variation butno invariant term) from 20 replicates of parsimony startingtrees One hundred traditional (nonrapid) bootstrap repli-cates for each gene were also inferred A majority rule con-sensus for each gene was constructed using the SumTrees

1510

Fernandez et al doi101093molbevmsu108 MBE at H

arvard Library on June 12 2014

httpmbeoxfordjournalsorg

Dow

nloaded from

script from the DendroPy Python library (Sukumaran andHolder 2010) To visualize predominant intergene conflictsfor both ML and bootstrap MRC trees we employedSuperQ v11 (Grunewald et al 2013) selecting the ldquobalancedrdquoedge-weight optimization function and applying no filterSplitsTree v4131 (Huson and Bryant 2006) was then usedto visualize the resulting NEXUS files To investigate the dis-tribution of strongly supported conflict among our genetrees we employed Conclustador v04a (Leigh et al 2011)presently the only automated incongruence-detection algo-rithm readily applicable to data sets of this size We appliedthe spectral clustering algorithm on the bootstrap replicatesfor each OMA group limiting the maximum number of clus-ters to 15

To quantify the number of genes potentially decisive for agiven split and also the number within this set that actuallysupport the split in question (fig 6) we employed a customPython script to parse the set of individual gene trees If a treecontained at least one species in either descendant group fora given node plus at least two distinct species basal to thenode in question it was considered potentially decisive(forming minimally a quartet DellrsquoAmpio et al 2014) if thedescendants were monophyletic with respect to their relativeoutgroups that gene tree was considered congruent with thenode in question (although this count is agnostic to thetopology within the node in question) To quantify the rela-tive support for different hypotheses within our individualgene-tree analysis we displayed the above counts for allnodes in the 1934 gene ML topology (fig 6) and also forselect alternative scenarios of centipede interrelationshipsboth including historical hypotheses (see Introduction)and alternative topologies for Scutigeromorpha andCraterostigmomorpha suggested by our quartet supernet-works All Python custom scripts can be downloaded fromhttpsgithubcomclaumer (last accessed April 1 2014)

Fossil Calibrations

A few Paleozoic fossil occurrences constrain the timing ofdivergences between chilopod orders (for a review of themyriapod fossil record see Shear and Edgecombe 2010) TheSilurondashDevonian genus Crussolum (Shear et al 1998) is iden-tified as a stem-group scutigeromorph displaying derivedcharacters of total-group Scutigeromorpha but lackingsome apomorphies that are shared by all members of thescutigeromorph crown group (Edgecombe 2011b) The bestpreserved fossils of Crussolum are from the Early Devonian(Pragian) and Middle Devonian (Givetian) (Shear et al 1998Anderson and Trewin 2003) but the oldest confidently as-signed record is in the Late Silurian (early Pridolı) LudlowBone Bed in western England (at least 4192 My dating theend of the Pridolı) This dating constrains the split ofScutigeromorpha from Pleurostigmophora

The occurrence of Devonobius delta in the MiddleDevonian of New York at least 3827 My (date for the endof the Givetian) constrains the divergence betweenCraterostigmomorpha and the remaining pleurostigmophor-ans This minimum applies irrespective of whether

Devonobius is sister group of Craterostigmomorpha or ofEpimorpha which were equally parsimonious in previousanalyses (Edgecombe and Giribet 2004) The basal divergencein Epimorpha that is the split between Scolopendromorphaand Geophilomorpha is constrained by the oldest scolopen-dromorph Mazoscolopendra richardsoni in UpperCarboniferous (Westphalian D) deposits of Mazon Creek IL(at least 306 Ma) Available character data forMazoscolopendra are insufficient to establish whether it is astem- or crown-group scolopendromorph and it only pro-vides a minimum age for total-group Scolopendromorpha

Mid-Silurian body fossils of Diplopoda provide constraintson the divergence of Chilopoda and Diplopoda The diver-gence between these two groups is dated by the occurrenceof Archipolypoda (stem-group helminthomorphs) in the lateWenlock or early Ludlow (Wilson and Anderson 2004) Basedon these data (ages based on spores) the split betweenChilopoda and Diplopoda is at least as old as the end ofthe Gorstian Stage (early Ludlow) 4256 My

The split between Onychophora and Arthropoda wasdated between 528 My (the minimum age for Arthropodaused by Lee et al [2013] on the basis of the earliestRusophycus traces) and 558 My used as the root ofPanarthropoda (Lee et al 2013)

Divergence Time Estimates

Divergence dates were estimated using the Bayesian relaxedmolecular clock approach as implemented in PhyloBayesv33f (Lartillot et al 2013) To compare the performance ofthe autocorrelated log normal and uncorrelated gammamodels we estimated the divergence dates under bothmodels in the 389-gene data set A series of calibrationconstraints specified above in ldquoFossil calibrationsrdquo (see alsoMurienne et al 2010 Edgecombe 2011b) were used with softbounds (Yang and Rannala 2006) under a birthndashdeath priorin PhyloBayes this strategy having been found to provide thebest compromise for dating estimates (Inoue et al 2010) Twoindependent MCMC chains were run for 5000ndash6000 cyclessampling posterior rates and dates every 10 cycles The initial20 of the cycles were discarded as burn-in Posteriorestimates of divergence dates were then computed fromapproximately the last 4000 samples of each chain

Supplementary MaterialSupplementary figure S1 and tables S1 and S2 are available atMolecular Biology and Evolution online (httpwwwmbeoxfordjournalsorg)

Acknowledgments

Mark Harvey kindly provided the sample of Craterostigmustasmanianus and Savel Daniels the sample of Peripatopsiscapensis The Research Computing group at the HarvardFaculty of Arts and Sciences was instrumental in facilitatingthe computational resources required Sarah Bastkowski iskindly thanked for her advice on the usage of SuperQ v11This work was supported by internal funds from the Museumof Comparative Zoology and by a postdoctoral fellowship

1511

Evaluation of Topological Conflict in Centipede Phylogeny doi101093molbevmsu108 MBE at H

arvard Library on June 12 2014

httpmbeoxfordjournalsorg

Dow

nloaded from

from the Fundacion Ramon Areces to RF Five anonymousreviewers and the editor provided comments that helped toimprove this article

ReferencesAltenhoff AM Gil M Gonnet GH Dessimoz C 2013 Inferring hierar-

chical orthologous groups from orthologous gene pairs PLoS One 8e53786

Altenhoff AM Schneider A Gonnet GH Dessimoz C 2011 OMA 2011orthology inference among 1000 complete genomes Nucleic AcidsRes 39D289ndashD294

Anderson LI Trewin NH 2003 An Early Devonian arthropod fauna fromthe Windyfield Cherts Aberdeenshire Scotland Palaeontology 46457ndash509

Ax P 2000 Multicellular animals Volume II The phylogenetic system ofthe Metazoa Berlin (Germany) Springer

Berger SA Krompass D Stamatakis A 2011 Performance accuracy andWeb server for evolutionary placement of short sequence readsunder maximum likelihood Syst Biol 60291ndash302

Betancur-R R Naylor G Ortı G 2014 Conserved genes sampling errorand phylogenomic inference Syst Biol 63257ndash262

Bleidorn C Podsiadlowski L Zhong M Eeckhaut I Hartmann SHalanych KM Tiedemann R 2009 On the phylogenetic positionof Myzostomida can 77 genes get it wrong BMC Evol Biol 9150

Borucki H 1996 Evolution und phylogenetisches system der Chilopoda(Mandibulata Tracheata) Vehr naturwiss Ver Hamburg 3595ndash226

DellrsquoAmpio E Meusemann K Szucsich NU Peters RS Meyer B Borner JPetersen M Aberer AJ Stamatakis A Walzl MG et al 2014 Decisivedata sets in phylogenomics lessons from studies on the phyloge-netic relationships of primarily wingless insects Mol Biol Evol 31239ndash249

Dohle W 1985 Phylogenetic pathways in the Chilopoda BijdrDierkunde 5555ndash66

Dunn CW Hejnol A Matus DQ Pang K Browne WE Smith SA SeaverE Rouse GW Obst M Edgecombe GD et al 2008 Broad phyloge-nomic sampling improves resolution of the animal tree of lifeNature 452745ndash749

Edgar RC 2004 MUSCLE multiple sequence alignment with high accu-racy and high throughput Nucleic Acids Res 321792ndash1797

Edgecombe GD 2011a Chilopodamdashphylogeny In Minelli A editorTreatise on zoologymdashanatomy taxonomy biology TheMyriapoda Vol 1 Leiden and Boston Brill p 339ndash354

Edgecombe GD 2011b Chilopodamdashthe fossil history In Minelli Aeditor Treatise on zoologymdashanatomy taxonomy biology TheMyriapoda Vol 1 Leiden and Boston Brill p 355ndash361

Edgecombe GD Giribet G 2004 Adding mitochondrial sequence data(16S rRNA and cytochrome c oxidase subunit I) to the phylogeny ofcentipedes (Myriapoda Chilopoda) an analysis of morphology andfour molecular loci J Zool Syst Evol Res 4289ndash134

Edgecombe GD Giribet G 2007 Evolutionary biology of centipedes(Myriapoda Chilopoda) Annu Rev Entomol 52151ndash170

Edgecombe GD Giribet G 2008 A New Zealand species of thetrans-Tasman centipede order Craterostigmomorpha(ArthropodaChilopoda) corroborated by molecular evidenceInvert Syst 221ndash15

Edgecombe GD Giribet G Wheeler WC 1999 Phylogeny of Chilopodacombining 18S and 28S rRNA sequences and morphology BoletınSociedad Entomol Aragonesa 26293ndash331

Edwards SV 2009 Is a new and general theory of molecular systematicsemerging Evolution 631ndash19

Foster P 2004 Modeling compositional heterogeneity Syst Biol 53485ndash495

Fu LM Niu BF Zhu ZW Wu ST Li WZ 2012 CD-HIT accelerated forclustering the next-generation sequencing data Bioinformatics 283150ndash3152

Giribet G Carranza S Riutort M Baguna J Ribera C 1999 Internalphylogeny of the Chilopoda (Myriapoda Arthropoda) using

complete 18S rDNA and partial 28S rDNA sequences Philos TransR Soc Lond B Biol Sci 354215ndash222

Giribet G Edgecombe GD 2006 Conflict between data sets andphylogeny of centipedes an analysis based on seven genes andmorphology Proc R Soc B Biol Sci 273531ndash538

Grabherr MG Haas BJ Yassour M Levin JZ Thompson DA Amit IAdiconis X Fan L Raychowdhury R Zeng Q et al 2011 Full-length transcriptome assembly from RNA-Seq data without a ref-erence genome Nat Biotechnol 29644ndash652

Grbic M Van Leeuwen T Clark RM Rombauts S Rouze P Grbic VOsborne EJ Dermauw W Ngoc PC Ortego F et al 2011 Thegenome of Tetranychus urticae reveals herbivorous pest adaptationsNature 479487ndash492

Grunewald S Spillner A Bastkowski S Bogershausen A Moulton V2013 SuperQ computing supernetworks from quartets IEEEACMTrans Comput Biol Bioinform 10151ndash160

Haas BJ Papanicolaou A Yassour M Grabherr M Blood PD Bowden JCouger MB Eccles D Li B Lieber M et al 2013 De novo tran-script sequence reconstruction from RNA-seq using the Trinityplatform for reference generation and analysis Nat Protoc 81494ndash1512

Hartmann S Helm C Nickel B Meyer M Struck TH Tiedemann RSelbig J Bleidorn C 2012 Exploiting gene families for phylogenomicanalysis of myzostomid transcriptome data PLoS One 7e29843

Hejnol A Obst M Stamatakis A Ott M Rouse GW Edgecombe GDMartinez P Baguna J Bailly X Jondelius U et al 2009 Assessing theroot of bilaterian animals with scalable phylogenomic methods ProcR Soc B Biol Sci 2764261ndash4270

Hilken G 1997 Tracheal systems in Chilopoda a comparison underphylogenetic aspects Entomol Scand Suppl 5149ndash60

Huson DH Bryant D 2006 Application of phylogenetic networks inevolutionary studies Mol Biol Evol 23254ndash267

Inoue J Donoghue PCJ Yang ZH 2010 The impact of the representationof fossil calibrations on Bayesian estimation of species divergencetimes Syst Biol 5974ndash89

Jeffroy O Brinkmann H Delsuc F Philippe H 2006 Phylogenomics thebeginning of incongruence Trends Genet 22225ndash231

Kearse M Moir R Wilson A Stones-Havas S Cheung M Sturrock SBuxton S Cooper A Markowitz S Duran C et al 2012 GeneiousBasic an integrated and extendable desktop software platform forthe organization and analysis of sequence data Bioinformatics 281647ndash1649

Kennedy D 2005 125 [Editorial] Science 30919Koch M Edgecombe GD 2012 The preoral chamber in geophilomorph

centipedes comparative morphology phylogeny and the evolutionof centipede feeding structures Zool J Linn Soc 1651ndash62

Kocot KM Cannon JT Todt C Citarella MR Kohn AB Meyer A SantosSR Schander C Moroz LL Lieb B et al 2011 Phylogenomics revealsdeep molluscan relationships Nature 477452ndash456

Kubatko LS Degnan JH 2007 Inconsistency of phylogenetic estimatesfrom concatenated data under coalescence Syst Biol 5617ndash24

Langmead B Trapnell C Pop M Salzberg SL 2009 Ultrafast andmemory-efficient alignment of short DNA sequences to thehuman genome Genome Biol 10R25

Lartillot N Philippe H 2004 A Bayesian mixture model for across-siteheterogeneities in the amino-acid replacement process Mol BiolEvol 211095ndash1109

Lartillot N Philippe H 2008 Improvement of molecular phylogeneticinference and the phylogeny of Bilateria Philos Trans R Soc Lond BBiol Sci 3631463ndash1472

Lartillot N Rodrigue N Stubbs D Richer J 2013 PhyloBayes MPI phy-logenetic reconstruction with infinite mixtures of profiles in a par-allel environment Syst Biol 62611ndash615

Le SQ Dang CC Gascuel O 2012 Modeling protein evolution withseveral amino acid replacement matrices depending on site ratesMol Biol Evol 292921ndash2936

Lee MSY Soubrier J Edgecombe GD 2013 Rates of phenotypicand genomic evolution during the Cambrian Explosion Curr Biol231ndash7

1512

Fernandez et al doi101093molbevmsu108 MBE at H

arvard Library on June 12 2014

httpmbeoxfordjournalsorg

Dow

nloaded from

Leigh JW Schliep K Lopez P Bapteste E 2011 Let them fall where theymay congruence analysis in massive phylogenetically messy datasets Mol Biol Evol 282773ndash2785

Lemmon AR Brown JM Stanger-Hall K Lemmon EM 2009The effect of ambiguous data on phylogenetic estimates obtainedby maximum likelihood and Bayesian inference Syst Biol 58130ndash145

Mallatt J Giribet G 2006 Further use of nearly complete 28S and 18SrRNA genes to classify Ecdysozoa 37 more arthropods and a kinor-hynch Mol Phylogenet Evol 40772ndash794

Meusemann K von Reumont BM Simon S Roeding F Strauss S Kuck PEbersberger I Walzl M Pass G Breuers S et al 2010 A phylogenomicapproach to resolve the arthropod tree of life Mol Biol Evol 272451ndash2464

Muller CHG Meyer-Rochow VB 2006 Fine structural organization ofthe lateral ocelli in two species of Scolopendra (ChilopodaPleurostigmophora) an evolutionary evaluation Zoomorphology12513ndash26

Murienne J Edgecombe GD Giribet G 2010 Including secondary struc-ture fossils and molecular dating in the centipede tree of life MolPhylogenet Evol 57301ndash313

Nosenko T Schreiber F Adamska M Adamski M Eitel M Hammel JMaldonado M Muller WE Nickel M Schierwater B et al 2013 Deepmetazoan phylogeny when different genes tell different stories MolPhylogenet Evol 67223ndash233

Philippe H Brinkmann H Lavrov DV Littlewood DTJ Manuel MWorheide G Baurain D 2011 Resolving difficult phylogeneticquestions why more sequences are not enough PLoS Biol 9e1000602

Philippe H Telford MJ 2006 Large-scale sequencing and the new animalphylogeny Trends Ecol Evol 21614ndash620

Pick KS Philippe H Schreiber F Erpenbeck D Jackson DJWrede P Wiens M Alie A Morgenstern B Manuel Met al 2010 Improved phylogenomic taxon sampling notice-ably affects nonbilaterian relationships Mol Biol Evol 271983ndash1987

Pocock RI 1902 A new and annectant type of chilopod Quart J MicroscSci 45417ndash448

Price MN Dehal PS Arkin AP 2010 FastTree 2-approximately maximum-likelihood trees for large alignmentsPLoS One 5e9490

Prunescu C-C Prunescu P 1996 Supernumerary Malpighian tubules inchilopods Acta Myriapodol 169437ndash440

Prunescu C-C Prunescu P 2006 Rudimentary supernumeraryMalpighian tubules in the order Craterostigmomorpha Pocock1902 Norwegian J Entomol 53113ndash118

Regier JC Shultz JW Zwick A Hussey A Ball B Wetzer R Martin JWCunningham CW 2010 Arthropod relationships revealed by phy-logenomic analysis of nuclear protein-coding sequences Nature 4631079ndash1083

Regier JC Wilson HM Shultz JW 2005 Phylogenetic analysis ofMyriapoda using three nuclear protein-coding genes MolPhylogenet Evol 34147ndash158

Rehm P Borner J Meusemann K von Reumont BM Simon S Hadrys HMisof B Burmester T 2011 Dating the arthropod tree based onlarge-scale transcriptome data Mol Phylogenet Evol 61880ndash887

Rota-Stabelli O Daley AC Pisani D 2013 Molecular timetrees reveal aCambrian colonization of land and a new scenario for ecdysozoanevolution Curr Biol 23392ndash398

Roure B Baurain D Philippe H 2013 Impact of missing data on phy-logenies inferred from empirical phylogenomic data sets Mol BiolEvol 30197ndash214

Salichos L Rokas A 2011 Evaluating ortholog prediction algorithms in ayeast model clade PLoS One 6e18755

Salichos L Rokas A 2013 Inferring ancient divergences requires geneswith strong phylogenetic signals Nature 497327ndash331

Salichos L Stamatakis A Rokas A 2014 Novel information theory-basedmeasures for quantifying incongruence among phylogenetic treesMol Biol Evol 31(5)1261ndash1271

Shear WA Bonamo PM 1988 Devonobiomorpha a new order of cen-tipeds (Chilopoda) from the middle Devonian of Gilboa New YorkState USA and the phylogeny of centiped orders Am MusNovitates 29271ndash30

Shear WA Edgecombe GD 2010 The geological record and phylogenyof Myriapoda Arthropod Struct Dev 39174ndash190

Shear WA Jeram AJ Selden PA 1998 Centiped legs (ArthropodaChilopoda Scutigeromorpha) from the Silurian and Devonian ofBritain and the Devonian of North America Am Mus Novitates32311ndash16

Shultz JW Regier JC 1997 Progress toward a molecular phylogeny of thecentipede orders (Chilopoda) Entomol Scand Suppl 5125ndash32

Smith S Wilson NG Goetz F Feehery C Andrade SCS Rouse GWGiribet G Dunn CW 2011 Resolving the evolutionary relationshipsof molluscs with phylogenomic tools Nature 480364ndash367

Smith SA Dunn CW 2008 Phyutility a phyloinformatics tool for treesalignments and molecular data Bioinformatics 24715ndash716

Struck TH Paul C Hill N Hartmann S Hosel C Kube M Lieb B Meyer ATiedemann R Purschke G et al 2011 Phylogenomic analyses un-ravel annelid evolution Nature 47195ndash98

Sukumaran J Holder MT 2010 DendroPy a Python library for phylo-genetic computing Bioinformatics 261569ndash1571

Verhoeff KW 1902ndash1925 Gliederfussler Arthropoda Klasse ChilopodaIn Klassen und Ordnungen des Tierreichs 5 abt 2 Buch 1 Leipzig(Germany) Akademische Verlagsgesellschaft p 725

von Reumont BM Jenner RA Wills MA Dellrsquoampio E Pass GEbersberger I Meyer B Koenemann S Iliffe TM Stamatakis Aet al 2012 Pancrustacean phylogeny in the light of new phyloge-nomic data support for Remipedia as the possible sister group ofHexapoda Mol Biol Evol 291031ndash1045

Wilson HM Anderson LI 2004 Morphology and taxonomy of Paleozoicmillipedes (Diplopoda Chilognatha Archipolypoda) from ScotlandJ Paleontol 78169ndash184

Wirkner CS Pass G 2002 The circulatory system in Chilopodafunctional morphology and phylogenetic aspects Acta Zool 83193ndash202

Wu M Chatterji S Eisen JA 2012 Accounting for alignment uncertaintyin phylogenomics PLoS One 7e30288

Yang Z Rannala B 2006 Bayesian estimation of species divergence timesunder a molecular clock using multiple fossil calibrations with softbounds Mol Biol Evol 23212ndash226

Zoller S Schneider A 2013 Improving phylogenetic inference with asemiempirical amino acid substitution model Mol Biol Evol 30469ndash479

1513

Evaluation of Topological Conflict in Centipede Phylogeny doi101093molbevmsu108 MBE at H

arvard Library on June 12 2014

httpmbeoxfordjournalsorg

Dow

nloaded from

Page 7: Evaluating Topological Conflict in Centipede Phylogeny ...€¦ · Scutigera coleoptrata Craterostigmus tasmanianus Strigamia maritima Himantarium gabrielis Alipes grandidieri Cryptops

exclusive clusters if they display significant (and well sup-ported) areas of local topological incongruence by examiningbipartition distances among and within pseudoreplicate treesfor each gene (Leigh et al 2011) Intriguingly this method didnot find evidence for any well-supported local incongruencebetween genes in our data set (ie only one cluster containingall genes was selected) Although at first glance this result is atodds with the conflicts observed in our supernetwork analy-ses this seeming paradox could be reconciled if most or allgenes display the same patterns of topological conflict withintheir bootstrap replicates

To address gene-tree conflict and the distribution of miss-ing data in a more quantitative manner we counted thepotentially decisive genes and the frequency of genes withinthis set that are congruent with both the nodes in our con-catenated topology and with several alternative hypotheses(fig 6) Large numbers of genes (from 1115 to 1742 fig 6) areavailable to test interordinal relationships within ChilopodaThe gene support frequencies within these potentially deci-sive sets also clearly favor the nodes within our concatenatedtopology over competing hypotheses although we note thatthe potentially decisive gene sets are not directly comparableacross nodes as they contain different populations of genesdepending on the particular taxonomic distribution of miss-ing data We also emphasize that although the gene supportfrequencies appear relatively low (ie usually lt50) even fornodes in our concatenated topology these frequencies werederived from counts of fully bifurcating ML trees many ofwhich contain splits that are poorly supported due to limited-length alignments andor inappropriate rates of evolution forthe splits in question we hypothesize that these gene supportfrequencies would appear even more decisively in favor of the

concatenated topology if considering only well-supportedsplits

The LG4X + F and CAT + GTR models we employed areboth able to model site-heterogeneous patterns of molecularevolution to a degree and are therefore better able to detectmultiple substitutions an important issue for this data set asthe gene selection procedure we employed was agnostic tothe rate of molecular evolution (Lartillot and Philippe 2004Philippe et al 2011) Both however also assume stationarityof amino acid frequencies through time an assumption re-jected by a sensitive statistical test for the 1934 gene super-matrix To discern whether substitutional saturation andcompositional heterogeneity were influencing our phyloge-netic results we considered a subset of 389 genes that pass in-dividual tests of compositional heterogeneity and whichfurthermore fall in the least saturated quartile of this station-ary set A quartet supernetwork built from ML treesdrawn only within this subset of genes (fig 5) resemblesthe ingroup tree topology inferred from our concatenatedanalyses (fig 2) in supporting the monophyly ofLithobiomorpha + Epimorpha In the 389-gene quartetsupernetwork however we observe essentially the samegene-tree conflict as in the 1934 gene quartet supernetworkthat is disagreement on the relative position ofCraterostigmus with respect to Scutigeromorpha and theclade comprising the remaining pleurostigmophoran ordersThis conflict could be attributable in part to exacerbation ofgene-tree incongruence reported for slowly evolving parti-tions in analyses of genomic data sets due to the confluenteffect on reducing real internode length and the theoreticalexpectation of greater incongruence for short internodes(Salichos and Rokas 2013 but see Betancur-R et al 2014)

Geophilomorpha

Lithobiomorpha

Scolopendromorpha

250

1173

551

1684

213

327

Geophilomorpha

Lithobiomorpha

Scolopendromorpha

200

1115

570

1742

179

327

Craterostigmus tasmanianus

Strigamia maritima

Himantarium gabrielis

Alipes grandidieri

Cryptops hortensis

Scutigera coleoptrata

Archispirostreptus gigas

Daphnia pulex

Ixodes scapularis

Tetranychus urticae

Peripatopsis capensis

593503

848

670

876765

394

1603246

503

876

246

670

394

765

593

380

1254303

413

1365

303

712

1234

577

42

119

353

139

471295

214

1371156

Amalpighiata

Craterostigmomorpha714

1195598

376

1603235

Scutigeromorpha

Amalpighiata

229

1090211

918

1536598

229

1090211

598

Scutigeromorpha

Craterostigmomorpha

388

1207206

1412146

322

206

322

Epimorpha

Lithobiomorpha

Craterostigmomorpha

FIG 6 Counts of potentially decisive genes and (within this set) counts of actually congruent genes computed for each node in the concatenatedtopology (left) as well as for select alternative hypotheses of centipede interrelationships (right) Potentially decisive genes printed below nodecongruent genes in this set printed above node and gene support percentage printed to side

1506

Fernandez et al doi101093molbevmsu108 MBE at H

arvard Library on June 12 2014

httpmbeoxfordjournalsorg

Dow

nloaded from

Finally our amino acid level transcriptome-based phylog-eny is highly similar to previous hypotheses on centipedephylogenetics using much smaller sets of genes (Edgecombeet al 1999 Giribet et al 1999 Edgecombe and Giribet 2004Mallatt and Giribet 2006 Murienne et al 2010) and findsstrong support for such accepted groups as Pleurostigmo-phora Epimorpha Scolopendromorpha and Geophilomorpha thus supporting the idea that our morphologicallysurprising result for Lithobiomorpha + Epimorpha is notartifactual

Evolutionary Implications

Two salient evolutionary implications of this study are 1) thebasal position of Craterostigmomorpha with respect toLithobiomorpha a result anticipated in prior phylogeneticanalyses based on molecular data (Mallatt and Giribet 2006Regier et al 2010) and 2) the superior reconciliation ofthe new topology with the fossil record of centipedes(Edgecombe 2011b) thus improving upon prior dating studies(Murienne et al 2010) The former has consequences in ourinterpretation of an important number of morphologicaltraits thought to be shared by the clade Phylactometria(see Implications for Morphological and BehavioralCharacters section) and requires a taxonomic proposal for aclade including Lithobiomorpha Scolopendromorpha andGeophilomorpha We name this group Amalpighiata onthe basis of the three constituent orders lacking the supernu-merary Malpighian tubules found in Scutigeromorpha andCraterostigmomorpha Amalpighiata appears well supportedand shows no strong conflict among the analyzed genesAmong-gene conflict and also relatively lower nodalsupport in the concatenated analysis exists for the relativepositions of Craterostigmus and Scutigera although the mono-phyly of Pleurostigmophora (Amalpighiata + Craterostigmo-morpha) receives support in several analyses

The second implication of the Amalpighiata hypothesisaccounts for one of the inconsistencies in the centipedefossil record under the Phylactometria hypothesis Stem-group Lithobiomorpha are predicted to have originated byat least the Middle Devonian (under either of the two com-peting hypotheses for the relationships of the DevonianDevonobius) (eg Murienne et al 2010) yet no Paleozoic oreven Mesozoic fossils are known for Lithobiomorpha Theirfossil record does not extend back further than the Eocene(when they are represented by a few species of Lithobiidae inBaltic amber) This is no doubt a gross underestimate of theirreal age even the alternative Amalpighiata tree dates the splitbetween Lithobiomorpha and Epimorpha to at least theUpper Carboniferous (constrained by the earliest knownEpimorpha Scolopendromorpha) (fig 3) Still the Siluro-Devonian fossil record of Chilopoda consists only ofScutigeromorpha and the extinct Devonobiomorpha whichare allied to either Craterostigmomorpha (Borucki 1996) orEpimorpha (Shear and Bonamo 1988 Murienne et al 2010)The former hypothesis is consistent with an earlier divergenceof Craterostigmomorpha than Lithobiomorpha and could

thus account for the lack of mid-Paleozoic fossillithobiomorphs

That said the scattered records of Chilopoda in Mesozoicshales lithographic limestones and ambers which includeexemplars of Scutigeromorpha Scolopendromorpha andGeophilomorpha but not Lithobiomorpha (reviewed byEdgecombe 2011b) suggests that lithobiomorphs are under-represented in the fossil record as a whole and their apparentabsence in the Paleozoic is likely a taphonomic artifact TheAmalpighiata tree dates the origins of centipede orders withbetter fit to their first known fossil occurrences and diver-gence dates are concordant with the fossil record perhapsindicating that the Paleozoic fossil record of centipedes is notas incomplete as previously thought An additional interestingpoint of our data set is that the diversification of myriapods isSilurian-Devonian for the analysis under the autocorrelatedlog normal model and only our uncorrelated gamma modelsupports a Cambrian terrestrialization event as suggestedrecently for this lineage (Rota-Stabelli et al 2013)Additional transcriptomes should allow for a better estima-tion of the diversification dates of each order but should notaffect their origin Should additional taxon sampling continueto yield substantial among-gene incongruence and a succes-sion of interordinal divergences closely spaced together intime consistent with the scenario of an rapid radiation wenote that biological (as opposed to methodologicalartifac-tual) explanations for the incongruence such as incompletelineage sorting may need to be considered

Implications for Morphological andBehavioral Characters

A substantial list of characters has been tabled in defense of asister-group relationship between Craterostigmus andEpimorpha They include the following putative synapomor-phies (Edgecombe 2011a)

1) Brooding involving the mother guarding the egg clusterby wrapping her body around it

2) Extraordinarily high cell numbers in the lateral ocelli(eg more than 1000 retinula cells)

3) Proximal retinula cells partly with monodirectionalrhabdomeres

4) Maxillary nephridia lacking in postembryonic stadia5) Rigidity of the forcipules a character complex including

the forcipular pleurite arching over the coxosternite thehinge of the coxosternite being sclerotized and inflexi-ble the coxosternite deeply embedded into the cuticleabove the first pedigerous trunk segment and a trun-cation of sternal muscles in the forcipular segmentrather than extending into the head

6) Presternites distinct7) Separate sternal and lateral longitudinal muscles8) Lateral testicular vesicles linked by a central posteriorly

extended deferens duct9) Coxal organs confined to the last leg pair

10) Internal valves formed by lips of the ostia projectingdeeply into the heart lumen

1507

Evaluation of Topological Conflict in Centipede Phylogeny doi101093molbevmsu108 MBE at H

arvard Library on June 12 2014

httpmbeoxfordjournalsorg

Dow

nloaded from

Under the alternative Amalpighiata hypothesis in whichLithobiomorpha rather than Craterostigmus is sister group ofEpimorpha these characters are implied to be homoplasticThey are either general characters of Pleurostigmophora as awhole that have been secondarily modified (lost subject toreversal or otherwise transformed) in Lithobiomorpha orthey have been convergently acquired by Craterostigmusand Epimorpha

Craterostigmus has a single anamorphic stage in its lifecycle it hatches from the egg as a 12-legged instar and ac-quires the adult number of 15 leg-pairsmdashthe plesiomorphicnumber of legs in centipedesmdashin the succeeding instar Thiscontrasts with numerous anamorphic stages inScutigeromorpha and Lithobiomorpha which hatch with 4or 6ndash8 leg pairs respectively and keep adding segments untilreaching the definitive 15 leg pairs The Phylactometria hy-pothesis has generally viewed the ldquoreduced hemi-anamorphosisrdquo (Borucki 1996) of Craterostigmus as atransitional stage in the evolution of complete epimorphosisin Epimorpha The alternative hypothesis in whichCraterostigmus is sister group of all otherPleurostigmophora rejects this interpretation in favor oftwo independent reductions of anamorphic instars (or sec-ondary reacquisition of them in Lithobiomorpha) Aproposposterior segmentation in arthropods has been shown to beevolutionarily labile as demonstrated by the extreme case ofposterior segment reduction during mite embryogenesis withconcomitant loss of certain gene expression domains orentire genes altogether in the genome of the miteTetranychus urticae (Grbic et al 2011)

At present few characters have been identified that couldbe reinterpreted as shared derived characters ofLithobiomorpha and Epimorpha One noteworthy exampleis the absence of supernumerary Malpighian tubules InChilopoda these occur only in Scutigeromorpha and inCraterostigmus and their presence in these two ordersalone has been regarded as a plesiomorphic feature(Prunescu and Prunescu 1996 Prunescu and Prunescu2006) Their inferred loss in Lithobiomorpha and Epimorphawould imply a single step under the favored topology in thisstudy and gives the name to the clade Amalpighiata

Other potential apomorphies of Lithobiomorpha andEpimorpha involve characters that are shared byLithobiomorpha and Scolopendromorpha alone and arethus required to be lost or otherwise modified inGeophilomorpha

For example Craterostigmus shares three multicuspedteeth in the mandible with Scutigeromorpha in contrast tofour and five such teeth on opposing mandibles inLithobiomorpha and Scolopendromorpha Geophilomorphaprimitively lack teeth (Koch and Edgecombe 2012) so if atransformation from three teeth (common ancestor ofChilopoda) to fourfive teeth (common ancestorof Lithobiomorpha + Epimorpha) were envisioned the lossof mandibular teeth in Geophilomorpha is a necessary part ofthe transformation series A more homoplastic character thatcould be postulated in support of Amalpighiata as a clade isthe presence of marginated tergites These are lacking in

Craterostigmus are ubiquitous in Lithobiomorpha and arevariably present in Scolopendromorpha (present inScolopocryptopinae and Scolopendridae) Their absence inGeophilomorpha and several blind scolopendromorphclades (Newportiinae Cryptopidae and Plutoniumidae)however means that their status as a character forAmalpighiata would demand a few instances of reversalloss As such though it must be conceded that this schemeintroduces considerable homoplasy from the perspective ofmorphology a few characters can be suggested as potentialsynapomorphies for Amalpighiata

Materials and Methods

Sample Collection

We sampled six specimens representing the main groups ofcentipedes (MCZ voucher numbers in brackets) Scutigeracoleoptrata (IZ-20415) (Scutigeromorpha) Lithobius forficatus(IZ-131534) (Lithobiomorpha) Craterostigmus tasmanianus(IZ-128299) (Craterostigmomorpha) Alipes grandidieri (IZ-130616) Cryptops hortensis (IZ-130583) (Scolopen-dromorpha) and Himantarium gabrielis (IZ-131564) (Geophi-lomorpha) Information about the sampling localitiescollection details can be found in the MCZ online collectionsdatabase (httpmczbasemczharvardedu last accessedApril 1 2014) The following taxa were sampled as outgroupsPeripatopsis capensis (Onychophora a transcriptome) andIxodes scapularis and T urticae (Arthropoda Arachnida andAcari complete genomes) Samples were preserved in RNA-later (Ambion) or flash frozen in liquid nitrogen immediatelyafter collection A portion of the central part of each animal(including the body and legs) or the anterior part (in S coleop-trata including the head and a part of the trunk) was dissectedand stored at80 C until RNA extraction Three more taxaretrieved from external sources were also included Thegenome of Strigamia maritima (Geophilomorpha) was down-loaded from httpmetazoaensemblorgStrigamia_mari-timaInfoIndex (last accessed April 1 2014) The genome ofDaphnia pulex was retrieved from httpmetazoaensemblorgDaphnia_pulexInfoIndex (last accessed April 1 2014)The transcriptome of A gigas generated by 454 pyrosequen-cing was retrieved from Meusemann et al (2010)

mRNA Extraction

Total RNA was extracted with a standard trizol-basedmethod using TRIzol (Life Sciences) following the manufac-turerrsquos protocol Tissue fragments were disrupted with a drillin 500ml of TRIzol using an RNAse-free plastic pestle forgrinding TRIzol was added up to a total volume of 1 mlAfter 5 min incubating at room temperature (RT) 100 ml ofbromochloropropane was mixed by vortexing After incuba-tion at RT for 10 min the samples were centrifuged at16000 rpm for 15 min at 4 C Afterward the upper aqueouslayer was recovered mixed with 500 ml of isopropanol andincubated at 20 C overnight Total RNA precipitation wasmade as follows samples were centrifuged for 15 min at16000 rpm at 4 C the pellet was washed twice with 1 mlof 75 isopropanol and centrifuged at 16000 rpm at 4 C for

1508

Fernandez et al doi101093molbevmsu108 MBE at H

arvard Library on June 12 2014

httpmbeoxfordjournalsorg

Dow

nloaded from

15 and 5 min in the first and second washing step respec-tively air dried and eluted in 100ml of RNA Storage solution(Ambion) mRNA purification was done with the DynabeadsmRNA Purification Kit (Invitrogen) following manufacturerrsquosinstructions After incubation of total RNA at 65 C for 5 minthe samples were incubated for 30 min with 200ml of mag-netic beads in a rocker and washed twice with washing bufferThis step was repeated again with an incubation time of5 min mRNA was eluted in 15ml of Tris-HCl buffer Qualityof mRNA was measured with a pico RNA assay in an Agilent2100 Bioanalyzer (Agilent Technologies) Quantity was mea-sured with a RNA assay in Qubit fluorometer (LifeTechnologies)

cDNA Library Construction and Next-GenerationSequencing

cDNA libraries were constructed with the TruSeq for RNASample Preparation kit (Illumina) following the manufac-turerrsquos instructions Libraries for C tasmanianus and S coleop-trata were constructed in the Apollo 324 automated systemusing the PrepX mRNA kit (IntegenX) The samples weremarked with a different index to allow pooling for sequencingConcentration of the cDNA libraries was measured through adsDNA High Sensitivity (HS) assay in a Qubit fluorometer(Invitrogen) Library quality and size selection were checkedin an Agilent 2100 Bioanalyzer (Agilent Technologies) withthe HS DNA assay The samples were run using the IlluminaHiSeq 2500 platform with paired-end reads of 150 bp at theFAS Center for Systems Biology at Harvard University

Raw Data Sanitation and Sequence Assembly

Demultiplexed Illumina HiSeq 2500 sequencing results inFASTQ format were retrieved from the sequencing facilityvia FTP The files were decompressed and imported into CLCGenomics Workbench 551 retaining read header informa-tion as paired read files Each sample was quality filteredaccording to a threshold average quality score of 30 basedon a Phred scale and adaptor trimmed using a list of knownadaptors provided by Illumina All reads shorter than 25 bpwere discarded and the resulting files were exported as asingle FASTQ file

To simplify the assembly process ribosomal RNA (rRNA)was filtered out All known metazoan rRNA sequences weredownloaded from GenBank and formatted into bowtie indexusing ldquobowtie-buildrdquo Each sample was sequentially aligned to

the index allowing up to two mismatches via Bowtie 100(Langmead et al 2009) retaining all unaligned reads in FASTQformat Unaligned results stored as a single file were im-ported into Geneious 616 (Kearse et al 2012) paired usingread header information and exported as separate files for leftand right mate pairs This process was repeated for eachsample

De novo assemblies (strand specific in C tasmanianus andS coleoptrata) were done individually per organism in Trinity(Grabherr et al 2011 Haas et al 2013) using paired read filesand default parameters Raw reads and assembled sequenceshave been deposited in the National Center for BiotechnologyInformation Sequence Read Archive and TranscriptomeShotgun Assembly databases respectively (table 2)

Identification of Candidate Coding Regions

Redundancy reduction was done with CD-HIT-EST (Fu et al2012) in the raw assemblies (95 global similarity) Resultingassemblies were processed in TransDecoder (Haas et al 2013)to identify candidate ORFs within the transcripts Predictedpeptides were then processed with a further filter to selectonly one peptide per putative unigene by choosing the lon-gest ORF per Trinity subcomponent with a Python scriptthus removing the variation in the coding regions of our as-semblies due to by alternative splicing closely related para-logs and allelic diversity Peptide sequences with all finalcandidate ORFs were retained as multifasta files Sequencedata for Strigamia maritima and D pulex were obtained fromgenome assemblies as opposed to the de novo assembledtranscriptomes Annotated predicted peptide sequences forD pulex were downloaded from ldquowFleaBaserdquo and sequenceswere clustered using CD-HIT at a similarity of 98 for redun-dancy reduction Peptide sequences for Strigamia maritimawere downloaded from Ensembl and also clustered at 98similarity Results for both D pulex and Strigamia maritimawere converted to a single line multifasta file

Orthology Assignment

We assigned predicted ORFs into orthologous groups acrossall samples using OMA stand-alone v099t (Altenhoff et al2011 Altenhoff et al 2013) The advantages of the algorithmof OMA over standard bidirectional best-hit approaches relyon the use of evolutionary distances instead of scores con-sideration of distance inference uncertainty and differentialgene losses and inclusion of many-to-many orthologous re-lations The ortholog matrix is constructed from all-against-all

Table 2 Accession Numbers for the Raw Transcriptomes and Assemblies from the Taxa Newly Sequenced in This Study

BioProject BioSample Experiment Run

Scutigera coleoptrata PRJNA237135 SAMN02614634 SRX462011 SRR1158078

Craterostigmus tasmanianus PRJNA237134 SAMN02614633 SRX461877 SRR1157986

Lithobius forficatus PRJNA237133 SAMN02614632 SRX462145 SRR1159752

Himantarium gabrielis PRJNA237131 SAMN02614631 SRX461787 SRR1159787

Alipes grandidieri PRJNA179374 SAMN01816532 SRX205685 SRR619311

Cryptops hortensis PRJNA237130 SAMN02614630 SRX457664 SRR1153457

Peripatopsis capensis PRJNA236598 SAMN02598680 SRX451023 SRR1145776

1509

Evaluation of Topological Conflict in Centipede Phylogeny doi101093molbevmsu108 MBE at H

arvard Library on June 12 2014

httpmbeoxfordjournalsorg

Dow

nloaded from

Smith-Waterman protein alignments The program identifiesthe so called ldquostable pairsrdquo verifies them and checks againstpotential paralogous genes In a last step cliques of stablepairs are clustered as groups of orthologs All input fileswere single-line multifasta files and the parametersdrw filespecified retained all default settings with the exception ofldquoNPrdquo which was set at 300 We parallelized all-by-all localalignments across 256 cores of a single compute node imple-menting a custom Bash script allowing for execution of inde-pendent threads with at least 3 s between each instance ofOMA to minimize risk of collisions

Phylogenomic Analyses and Congruence Assessment

We constructed three different amino acid supermatricesFirst a large matrix was constructed by concatenating theset of OMA groups containing six or more taxa To increasegene occupancy and to reduce the percentage of missingdata a second matrix was created by selecting the orthologscontained in 11 or more taxa Because of the high percentageof missing data in A gigas this taxon was eliminated from thissupermatrix This supermatrix reconstruction approach wasselected over other available software (eg MARE) because itguarantees a minimum taxon occupancy per orthogrouptherefore resulting in a supermatrix with a good compromisebetween gene and taxon occupancy and missing data

The selected orthogroups (1934 and 61 for both matrixesrespectively) were aligned individually using MUSCLE version36 (Edgar 2004) To increase the signal-to-noise ratio andimprove the discriminatory power of phylogenetic methodswe applied a probabilistic character masking with ZORRO(Wu et al 2012) to account for alignment uncertainty runusing default parameters (and using FastTree 214 [Price et al2010] to produce guide trees) This program first calculatesthe probability of all alignments that pass through a specifiedmatched pair of residues using a pair hidden Markov modelframework and it then compares this value with the fullprobability of all alignments of the pair of sequences Theposterior probability value indicates how reliable the matchis (ie highly reliable if it is close to 1 ambiguous if it is close to0) It then sums up the probability that a particular columnwould appear over the alignment landscape and assigns aconfidence score between 0 and 10 to each column provid-ing an objective measure that has an explicit evolutionarymodel and is mathematically rigorous (Wu et al 2012) Wediscarded the positions assigned a confidence score below athreshold of 5 with a custom Python script prior to concat-enation (using Phyutility 26 Smith and Dunn 2008) and sub-sequent phylogenomic analyses

To discern if substitutional saturation and among-taxoncompositional heterogeneity were affecting our phylogeneticresults we considered a third supermatrix Using the p4Python library (Foster 2004) we performed a test of compo-sitional homogeneity using w2 statistics simulating a null dis-tribution using the TreecompoTestUsingSimulations()method (nSims = 100) We used the best-scoring ML treefrom our RAxML analyses as the phylogram on which to sim-ulate data using optimized parameters of a homogeneous

unpartitioned LG + I + 4 model We also conducted com-positional homogeneity tests on individual ortholog align-ments modeling a homogeneous null distribution on theML tree for each gene using the best-fitting model availablein p4 for that gene We characterized genewise substitutionalsaturation by regressing uncorrected amino acid distances onthe patristic distance implied by each ML tree in a p4 scriptand taking the slope of this linear regression as a measure ofsubstitutional saturation (Jeffroy et al 2006) We calculatedthe third quartile of the slope from among the set of com-positionally homogeneous genes with a slope between 0 and1 and then selected the set of ML trees from genes with aslope above this value (0465) for inclusion in a quartet super-network as described earlier The 389 genes that passed in-dividual tests of compositional heterogeneity and whichfurthermore fell in the least saturated quartile of this station-ary set were concatenated into a third supermatrix

ML inference was conducted with PhyML-PCMA (Zollerand Schneider 2013) and RAxML 775 (Berger et al 2011)Although RAxML uses a predetermined empirical model ofamino acid substitution PhyML-PCMA estimates a modelthrough the use of a principal component (PC) analysisThe obtained PCs describe the substitution rates that covarythe most among different protein families and thereforedefine a semiempirically determined parameterization foran amino acid substitution model specific to each data set(Zoller and Schneider 2013) We selected 20 PCs in thePhyML-PCMA analyses and empirical amino acid frequenciesPROTGAMMALG4XF was selected as the best model ofamino acid substitution for the unpartitioned RAxML analy-ses using a modification of the ProteinModelSelection perlwrapper to RAxML produced by A Stamatakis (httpscoh-itsorgexelixissoftwarehtml last accessed April 1 2014)Bootstrap values were estimated with 1000 replicates underthe rapid bootstrapping algorithm

Bayesian analysis was conducted with PhyloBayes MPI 14e(Lartillot et al 2013) using the site-heterogeneous CAT-GTRmodel of evolution (Lartillot and Philippe 2004) Three inde-pendent Markov chain Monte Carlo (MCMC) chains wererun for 5818ndash7471 cycles The initial 2000 trees (27ndash34)sampled in each MCMC run prior to convergence (judgedwhen maximum bipartition discrepancies across chainslt 01) were discarded as the burn-in period A 50 major-ity-rule consensus tree was then computed from the remain-ing 1537 trees (sampled every 10 cycles) combined from thethree independent runs

To investigate potential incongruence between individualgene trees we also inferred gene trees for each OMA groupincluded in our supermatrix For each aligned ZORRO-masked OMA group we selected a best-fitting model usinga ProteinModelSelection script modified to permit testing ofthe recent LG4M( + F) and LG4X( + F) models (Le et al 2012)Best-scoring ML trees were inferred for each gene under theselected model (with the gamma model of rate variation butno invariant term) from 20 replicates of parsimony startingtrees One hundred traditional (nonrapid) bootstrap repli-cates for each gene were also inferred A majority rule con-sensus for each gene was constructed using the SumTrees

1510

Fernandez et al doi101093molbevmsu108 MBE at H

arvard Library on June 12 2014

httpmbeoxfordjournalsorg

Dow

nloaded from

script from the DendroPy Python library (Sukumaran andHolder 2010) To visualize predominant intergene conflictsfor both ML and bootstrap MRC trees we employedSuperQ v11 (Grunewald et al 2013) selecting the ldquobalancedrdquoedge-weight optimization function and applying no filterSplitsTree v4131 (Huson and Bryant 2006) was then usedto visualize the resulting NEXUS files To investigate the dis-tribution of strongly supported conflict among our genetrees we employed Conclustador v04a (Leigh et al 2011)presently the only automated incongruence-detection algo-rithm readily applicable to data sets of this size We appliedthe spectral clustering algorithm on the bootstrap replicatesfor each OMA group limiting the maximum number of clus-ters to 15

To quantify the number of genes potentially decisive for agiven split and also the number within this set that actuallysupport the split in question (fig 6) we employed a customPython script to parse the set of individual gene trees If a treecontained at least one species in either descendant group fora given node plus at least two distinct species basal to thenode in question it was considered potentially decisive(forming minimally a quartet DellrsquoAmpio et al 2014) if thedescendants were monophyletic with respect to their relativeoutgroups that gene tree was considered congruent with thenode in question (although this count is agnostic to thetopology within the node in question) To quantify the rela-tive support for different hypotheses within our individualgene-tree analysis we displayed the above counts for allnodes in the 1934 gene ML topology (fig 6) and also forselect alternative scenarios of centipede interrelationshipsboth including historical hypotheses (see Introduction)and alternative topologies for Scutigeromorpha andCraterostigmomorpha suggested by our quartet supernet-works All Python custom scripts can be downloaded fromhttpsgithubcomclaumer (last accessed April 1 2014)

Fossil Calibrations

A few Paleozoic fossil occurrences constrain the timing ofdivergences between chilopod orders (for a review of themyriapod fossil record see Shear and Edgecombe 2010) TheSilurondashDevonian genus Crussolum (Shear et al 1998) is iden-tified as a stem-group scutigeromorph displaying derivedcharacters of total-group Scutigeromorpha but lackingsome apomorphies that are shared by all members of thescutigeromorph crown group (Edgecombe 2011b) The bestpreserved fossils of Crussolum are from the Early Devonian(Pragian) and Middle Devonian (Givetian) (Shear et al 1998Anderson and Trewin 2003) but the oldest confidently as-signed record is in the Late Silurian (early Pridolı) LudlowBone Bed in western England (at least 4192 My dating theend of the Pridolı) This dating constrains the split ofScutigeromorpha from Pleurostigmophora

The occurrence of Devonobius delta in the MiddleDevonian of New York at least 3827 My (date for the endof the Givetian) constrains the divergence betweenCraterostigmomorpha and the remaining pleurostigmophor-ans This minimum applies irrespective of whether

Devonobius is sister group of Craterostigmomorpha or ofEpimorpha which were equally parsimonious in previousanalyses (Edgecombe and Giribet 2004) The basal divergencein Epimorpha that is the split between Scolopendromorphaand Geophilomorpha is constrained by the oldest scolopen-dromorph Mazoscolopendra richardsoni in UpperCarboniferous (Westphalian D) deposits of Mazon Creek IL(at least 306 Ma) Available character data forMazoscolopendra are insufficient to establish whether it is astem- or crown-group scolopendromorph and it only pro-vides a minimum age for total-group Scolopendromorpha

Mid-Silurian body fossils of Diplopoda provide constraintson the divergence of Chilopoda and Diplopoda The diver-gence between these two groups is dated by the occurrenceof Archipolypoda (stem-group helminthomorphs) in the lateWenlock or early Ludlow (Wilson and Anderson 2004) Basedon these data (ages based on spores) the split betweenChilopoda and Diplopoda is at least as old as the end ofthe Gorstian Stage (early Ludlow) 4256 My

The split between Onychophora and Arthropoda wasdated between 528 My (the minimum age for Arthropodaused by Lee et al [2013] on the basis of the earliestRusophycus traces) and 558 My used as the root ofPanarthropoda (Lee et al 2013)

Divergence Time Estimates

Divergence dates were estimated using the Bayesian relaxedmolecular clock approach as implemented in PhyloBayesv33f (Lartillot et al 2013) To compare the performance ofthe autocorrelated log normal and uncorrelated gammamodels we estimated the divergence dates under bothmodels in the 389-gene data set A series of calibrationconstraints specified above in ldquoFossil calibrationsrdquo (see alsoMurienne et al 2010 Edgecombe 2011b) were used with softbounds (Yang and Rannala 2006) under a birthndashdeath priorin PhyloBayes this strategy having been found to provide thebest compromise for dating estimates (Inoue et al 2010) Twoindependent MCMC chains were run for 5000ndash6000 cyclessampling posterior rates and dates every 10 cycles The initial20 of the cycles were discarded as burn-in Posteriorestimates of divergence dates were then computed fromapproximately the last 4000 samples of each chain

Supplementary MaterialSupplementary figure S1 and tables S1 and S2 are available atMolecular Biology and Evolution online (httpwwwmbeoxfordjournalsorg)

Acknowledgments

Mark Harvey kindly provided the sample of Craterostigmustasmanianus and Savel Daniels the sample of Peripatopsiscapensis The Research Computing group at the HarvardFaculty of Arts and Sciences was instrumental in facilitatingthe computational resources required Sarah Bastkowski iskindly thanked for her advice on the usage of SuperQ v11This work was supported by internal funds from the Museumof Comparative Zoology and by a postdoctoral fellowship

1511

Evaluation of Topological Conflict in Centipede Phylogeny doi101093molbevmsu108 MBE at H

arvard Library on June 12 2014

httpmbeoxfordjournalsorg

Dow

nloaded from

from the Fundacion Ramon Areces to RF Five anonymousreviewers and the editor provided comments that helped toimprove this article

ReferencesAltenhoff AM Gil M Gonnet GH Dessimoz C 2013 Inferring hierar-

chical orthologous groups from orthologous gene pairs PLoS One 8e53786

Altenhoff AM Schneider A Gonnet GH Dessimoz C 2011 OMA 2011orthology inference among 1000 complete genomes Nucleic AcidsRes 39D289ndashD294

Anderson LI Trewin NH 2003 An Early Devonian arthropod fauna fromthe Windyfield Cherts Aberdeenshire Scotland Palaeontology 46457ndash509

Ax P 2000 Multicellular animals Volume II The phylogenetic system ofthe Metazoa Berlin (Germany) Springer

Berger SA Krompass D Stamatakis A 2011 Performance accuracy andWeb server for evolutionary placement of short sequence readsunder maximum likelihood Syst Biol 60291ndash302

Betancur-R R Naylor G Ortı G 2014 Conserved genes sampling errorand phylogenomic inference Syst Biol 63257ndash262

Bleidorn C Podsiadlowski L Zhong M Eeckhaut I Hartmann SHalanych KM Tiedemann R 2009 On the phylogenetic positionof Myzostomida can 77 genes get it wrong BMC Evol Biol 9150

Borucki H 1996 Evolution und phylogenetisches system der Chilopoda(Mandibulata Tracheata) Vehr naturwiss Ver Hamburg 3595ndash226

DellrsquoAmpio E Meusemann K Szucsich NU Peters RS Meyer B Borner JPetersen M Aberer AJ Stamatakis A Walzl MG et al 2014 Decisivedata sets in phylogenomics lessons from studies on the phyloge-netic relationships of primarily wingless insects Mol Biol Evol 31239ndash249

Dohle W 1985 Phylogenetic pathways in the Chilopoda BijdrDierkunde 5555ndash66

Dunn CW Hejnol A Matus DQ Pang K Browne WE Smith SA SeaverE Rouse GW Obst M Edgecombe GD et al 2008 Broad phyloge-nomic sampling improves resolution of the animal tree of lifeNature 452745ndash749

Edgar RC 2004 MUSCLE multiple sequence alignment with high accu-racy and high throughput Nucleic Acids Res 321792ndash1797

Edgecombe GD 2011a Chilopodamdashphylogeny In Minelli A editorTreatise on zoologymdashanatomy taxonomy biology TheMyriapoda Vol 1 Leiden and Boston Brill p 339ndash354

Edgecombe GD 2011b Chilopodamdashthe fossil history In Minelli Aeditor Treatise on zoologymdashanatomy taxonomy biology TheMyriapoda Vol 1 Leiden and Boston Brill p 355ndash361

Edgecombe GD Giribet G 2004 Adding mitochondrial sequence data(16S rRNA and cytochrome c oxidase subunit I) to the phylogeny ofcentipedes (Myriapoda Chilopoda) an analysis of morphology andfour molecular loci J Zool Syst Evol Res 4289ndash134

Edgecombe GD Giribet G 2007 Evolutionary biology of centipedes(Myriapoda Chilopoda) Annu Rev Entomol 52151ndash170

Edgecombe GD Giribet G 2008 A New Zealand species of thetrans-Tasman centipede order Craterostigmomorpha(ArthropodaChilopoda) corroborated by molecular evidenceInvert Syst 221ndash15

Edgecombe GD Giribet G Wheeler WC 1999 Phylogeny of Chilopodacombining 18S and 28S rRNA sequences and morphology BoletınSociedad Entomol Aragonesa 26293ndash331

Edwards SV 2009 Is a new and general theory of molecular systematicsemerging Evolution 631ndash19

Foster P 2004 Modeling compositional heterogeneity Syst Biol 53485ndash495

Fu LM Niu BF Zhu ZW Wu ST Li WZ 2012 CD-HIT accelerated forclustering the next-generation sequencing data Bioinformatics 283150ndash3152

Giribet G Carranza S Riutort M Baguna J Ribera C 1999 Internalphylogeny of the Chilopoda (Myriapoda Arthropoda) using

complete 18S rDNA and partial 28S rDNA sequences Philos TransR Soc Lond B Biol Sci 354215ndash222

Giribet G Edgecombe GD 2006 Conflict between data sets andphylogeny of centipedes an analysis based on seven genes andmorphology Proc R Soc B Biol Sci 273531ndash538

Grabherr MG Haas BJ Yassour M Levin JZ Thompson DA Amit IAdiconis X Fan L Raychowdhury R Zeng Q et al 2011 Full-length transcriptome assembly from RNA-Seq data without a ref-erence genome Nat Biotechnol 29644ndash652

Grbic M Van Leeuwen T Clark RM Rombauts S Rouze P Grbic VOsborne EJ Dermauw W Ngoc PC Ortego F et al 2011 Thegenome of Tetranychus urticae reveals herbivorous pest adaptationsNature 479487ndash492

Grunewald S Spillner A Bastkowski S Bogershausen A Moulton V2013 SuperQ computing supernetworks from quartets IEEEACMTrans Comput Biol Bioinform 10151ndash160

Haas BJ Papanicolaou A Yassour M Grabherr M Blood PD Bowden JCouger MB Eccles D Li B Lieber M et al 2013 De novo tran-script sequence reconstruction from RNA-seq using the Trinityplatform for reference generation and analysis Nat Protoc 81494ndash1512

Hartmann S Helm C Nickel B Meyer M Struck TH Tiedemann RSelbig J Bleidorn C 2012 Exploiting gene families for phylogenomicanalysis of myzostomid transcriptome data PLoS One 7e29843

Hejnol A Obst M Stamatakis A Ott M Rouse GW Edgecombe GDMartinez P Baguna J Bailly X Jondelius U et al 2009 Assessing theroot of bilaterian animals with scalable phylogenomic methods ProcR Soc B Biol Sci 2764261ndash4270

Hilken G 1997 Tracheal systems in Chilopoda a comparison underphylogenetic aspects Entomol Scand Suppl 5149ndash60

Huson DH Bryant D 2006 Application of phylogenetic networks inevolutionary studies Mol Biol Evol 23254ndash267

Inoue J Donoghue PCJ Yang ZH 2010 The impact of the representationof fossil calibrations on Bayesian estimation of species divergencetimes Syst Biol 5974ndash89

Jeffroy O Brinkmann H Delsuc F Philippe H 2006 Phylogenomics thebeginning of incongruence Trends Genet 22225ndash231

Kearse M Moir R Wilson A Stones-Havas S Cheung M Sturrock SBuxton S Cooper A Markowitz S Duran C et al 2012 GeneiousBasic an integrated and extendable desktop software platform forthe organization and analysis of sequence data Bioinformatics 281647ndash1649

Kennedy D 2005 125 [Editorial] Science 30919Koch M Edgecombe GD 2012 The preoral chamber in geophilomorph

centipedes comparative morphology phylogeny and the evolutionof centipede feeding structures Zool J Linn Soc 1651ndash62

Kocot KM Cannon JT Todt C Citarella MR Kohn AB Meyer A SantosSR Schander C Moroz LL Lieb B et al 2011 Phylogenomics revealsdeep molluscan relationships Nature 477452ndash456

Kubatko LS Degnan JH 2007 Inconsistency of phylogenetic estimatesfrom concatenated data under coalescence Syst Biol 5617ndash24

Langmead B Trapnell C Pop M Salzberg SL 2009 Ultrafast andmemory-efficient alignment of short DNA sequences to thehuman genome Genome Biol 10R25

Lartillot N Philippe H 2004 A Bayesian mixture model for across-siteheterogeneities in the amino-acid replacement process Mol BiolEvol 211095ndash1109

Lartillot N Philippe H 2008 Improvement of molecular phylogeneticinference and the phylogeny of Bilateria Philos Trans R Soc Lond BBiol Sci 3631463ndash1472

Lartillot N Rodrigue N Stubbs D Richer J 2013 PhyloBayes MPI phy-logenetic reconstruction with infinite mixtures of profiles in a par-allel environment Syst Biol 62611ndash615

Le SQ Dang CC Gascuel O 2012 Modeling protein evolution withseveral amino acid replacement matrices depending on site ratesMol Biol Evol 292921ndash2936

Lee MSY Soubrier J Edgecombe GD 2013 Rates of phenotypicand genomic evolution during the Cambrian Explosion Curr Biol231ndash7

1512

Fernandez et al doi101093molbevmsu108 MBE at H

arvard Library on June 12 2014

httpmbeoxfordjournalsorg

Dow

nloaded from

Leigh JW Schliep K Lopez P Bapteste E 2011 Let them fall where theymay congruence analysis in massive phylogenetically messy datasets Mol Biol Evol 282773ndash2785

Lemmon AR Brown JM Stanger-Hall K Lemmon EM 2009The effect of ambiguous data on phylogenetic estimates obtainedby maximum likelihood and Bayesian inference Syst Biol 58130ndash145

Mallatt J Giribet G 2006 Further use of nearly complete 28S and 18SrRNA genes to classify Ecdysozoa 37 more arthropods and a kinor-hynch Mol Phylogenet Evol 40772ndash794

Meusemann K von Reumont BM Simon S Roeding F Strauss S Kuck PEbersberger I Walzl M Pass G Breuers S et al 2010 A phylogenomicapproach to resolve the arthropod tree of life Mol Biol Evol 272451ndash2464

Muller CHG Meyer-Rochow VB 2006 Fine structural organization ofthe lateral ocelli in two species of Scolopendra (ChilopodaPleurostigmophora) an evolutionary evaluation Zoomorphology12513ndash26

Murienne J Edgecombe GD Giribet G 2010 Including secondary struc-ture fossils and molecular dating in the centipede tree of life MolPhylogenet Evol 57301ndash313

Nosenko T Schreiber F Adamska M Adamski M Eitel M Hammel JMaldonado M Muller WE Nickel M Schierwater B et al 2013 Deepmetazoan phylogeny when different genes tell different stories MolPhylogenet Evol 67223ndash233

Philippe H Brinkmann H Lavrov DV Littlewood DTJ Manuel MWorheide G Baurain D 2011 Resolving difficult phylogeneticquestions why more sequences are not enough PLoS Biol 9e1000602

Philippe H Telford MJ 2006 Large-scale sequencing and the new animalphylogeny Trends Ecol Evol 21614ndash620

Pick KS Philippe H Schreiber F Erpenbeck D Jackson DJWrede P Wiens M Alie A Morgenstern B Manuel Met al 2010 Improved phylogenomic taxon sampling notice-ably affects nonbilaterian relationships Mol Biol Evol 271983ndash1987

Pocock RI 1902 A new and annectant type of chilopod Quart J MicroscSci 45417ndash448

Price MN Dehal PS Arkin AP 2010 FastTree 2-approximately maximum-likelihood trees for large alignmentsPLoS One 5e9490

Prunescu C-C Prunescu P 1996 Supernumerary Malpighian tubules inchilopods Acta Myriapodol 169437ndash440

Prunescu C-C Prunescu P 2006 Rudimentary supernumeraryMalpighian tubules in the order Craterostigmomorpha Pocock1902 Norwegian J Entomol 53113ndash118

Regier JC Shultz JW Zwick A Hussey A Ball B Wetzer R Martin JWCunningham CW 2010 Arthropod relationships revealed by phy-logenomic analysis of nuclear protein-coding sequences Nature 4631079ndash1083

Regier JC Wilson HM Shultz JW 2005 Phylogenetic analysis ofMyriapoda using three nuclear protein-coding genes MolPhylogenet Evol 34147ndash158

Rehm P Borner J Meusemann K von Reumont BM Simon S Hadrys HMisof B Burmester T 2011 Dating the arthropod tree based onlarge-scale transcriptome data Mol Phylogenet Evol 61880ndash887

Rota-Stabelli O Daley AC Pisani D 2013 Molecular timetrees reveal aCambrian colonization of land and a new scenario for ecdysozoanevolution Curr Biol 23392ndash398

Roure B Baurain D Philippe H 2013 Impact of missing data on phy-logenies inferred from empirical phylogenomic data sets Mol BiolEvol 30197ndash214

Salichos L Rokas A 2011 Evaluating ortholog prediction algorithms in ayeast model clade PLoS One 6e18755

Salichos L Rokas A 2013 Inferring ancient divergences requires geneswith strong phylogenetic signals Nature 497327ndash331

Salichos L Stamatakis A Rokas A 2014 Novel information theory-basedmeasures for quantifying incongruence among phylogenetic treesMol Biol Evol 31(5)1261ndash1271

Shear WA Bonamo PM 1988 Devonobiomorpha a new order of cen-tipeds (Chilopoda) from the middle Devonian of Gilboa New YorkState USA and the phylogeny of centiped orders Am MusNovitates 29271ndash30

Shear WA Edgecombe GD 2010 The geological record and phylogenyof Myriapoda Arthropod Struct Dev 39174ndash190

Shear WA Jeram AJ Selden PA 1998 Centiped legs (ArthropodaChilopoda Scutigeromorpha) from the Silurian and Devonian ofBritain and the Devonian of North America Am Mus Novitates32311ndash16

Shultz JW Regier JC 1997 Progress toward a molecular phylogeny of thecentipede orders (Chilopoda) Entomol Scand Suppl 5125ndash32

Smith S Wilson NG Goetz F Feehery C Andrade SCS Rouse GWGiribet G Dunn CW 2011 Resolving the evolutionary relationshipsof molluscs with phylogenomic tools Nature 480364ndash367

Smith SA Dunn CW 2008 Phyutility a phyloinformatics tool for treesalignments and molecular data Bioinformatics 24715ndash716

Struck TH Paul C Hill N Hartmann S Hosel C Kube M Lieb B Meyer ATiedemann R Purschke G et al 2011 Phylogenomic analyses un-ravel annelid evolution Nature 47195ndash98

Sukumaran J Holder MT 2010 DendroPy a Python library for phylo-genetic computing Bioinformatics 261569ndash1571

Verhoeff KW 1902ndash1925 Gliederfussler Arthropoda Klasse ChilopodaIn Klassen und Ordnungen des Tierreichs 5 abt 2 Buch 1 Leipzig(Germany) Akademische Verlagsgesellschaft p 725

von Reumont BM Jenner RA Wills MA Dellrsquoampio E Pass GEbersberger I Meyer B Koenemann S Iliffe TM Stamatakis Aet al 2012 Pancrustacean phylogeny in the light of new phyloge-nomic data support for Remipedia as the possible sister group ofHexapoda Mol Biol Evol 291031ndash1045

Wilson HM Anderson LI 2004 Morphology and taxonomy of Paleozoicmillipedes (Diplopoda Chilognatha Archipolypoda) from ScotlandJ Paleontol 78169ndash184

Wirkner CS Pass G 2002 The circulatory system in Chilopodafunctional morphology and phylogenetic aspects Acta Zool 83193ndash202

Wu M Chatterji S Eisen JA 2012 Accounting for alignment uncertaintyin phylogenomics PLoS One 7e30288

Yang Z Rannala B 2006 Bayesian estimation of species divergence timesunder a molecular clock using multiple fossil calibrations with softbounds Mol Biol Evol 23212ndash226

Zoller S Schneider A 2013 Improving phylogenetic inference with asemiempirical amino acid substitution model Mol Biol Evol 30469ndash479

1513

Evaluation of Topological Conflict in Centipede Phylogeny doi101093molbevmsu108 MBE at H

arvard Library on June 12 2014

httpmbeoxfordjournalsorg

Dow

nloaded from

Page 8: Evaluating Topological Conflict in Centipede Phylogeny ...€¦ · Scutigera coleoptrata Craterostigmus tasmanianus Strigamia maritima Himantarium gabrielis Alipes grandidieri Cryptops

Finally our amino acid level transcriptome-based phylog-eny is highly similar to previous hypotheses on centipedephylogenetics using much smaller sets of genes (Edgecombeet al 1999 Giribet et al 1999 Edgecombe and Giribet 2004Mallatt and Giribet 2006 Murienne et al 2010) and findsstrong support for such accepted groups as Pleurostigmo-phora Epimorpha Scolopendromorpha and Geophilomorpha thus supporting the idea that our morphologicallysurprising result for Lithobiomorpha + Epimorpha is notartifactual

Evolutionary Implications

Two salient evolutionary implications of this study are 1) thebasal position of Craterostigmomorpha with respect toLithobiomorpha a result anticipated in prior phylogeneticanalyses based on molecular data (Mallatt and Giribet 2006Regier et al 2010) and 2) the superior reconciliation ofthe new topology with the fossil record of centipedes(Edgecombe 2011b) thus improving upon prior dating studies(Murienne et al 2010) The former has consequences in ourinterpretation of an important number of morphologicaltraits thought to be shared by the clade Phylactometria(see Implications for Morphological and BehavioralCharacters section) and requires a taxonomic proposal for aclade including Lithobiomorpha Scolopendromorpha andGeophilomorpha We name this group Amalpighiata onthe basis of the three constituent orders lacking the supernu-merary Malpighian tubules found in Scutigeromorpha andCraterostigmomorpha Amalpighiata appears well supportedand shows no strong conflict among the analyzed genesAmong-gene conflict and also relatively lower nodalsupport in the concatenated analysis exists for the relativepositions of Craterostigmus and Scutigera although the mono-phyly of Pleurostigmophora (Amalpighiata + Craterostigmo-morpha) receives support in several analyses

The second implication of the Amalpighiata hypothesisaccounts for one of the inconsistencies in the centipedefossil record under the Phylactometria hypothesis Stem-group Lithobiomorpha are predicted to have originated byat least the Middle Devonian (under either of the two com-peting hypotheses for the relationships of the DevonianDevonobius) (eg Murienne et al 2010) yet no Paleozoic oreven Mesozoic fossils are known for Lithobiomorpha Theirfossil record does not extend back further than the Eocene(when they are represented by a few species of Lithobiidae inBaltic amber) This is no doubt a gross underestimate of theirreal age even the alternative Amalpighiata tree dates the splitbetween Lithobiomorpha and Epimorpha to at least theUpper Carboniferous (constrained by the earliest knownEpimorpha Scolopendromorpha) (fig 3) Still the Siluro-Devonian fossil record of Chilopoda consists only ofScutigeromorpha and the extinct Devonobiomorpha whichare allied to either Craterostigmomorpha (Borucki 1996) orEpimorpha (Shear and Bonamo 1988 Murienne et al 2010)The former hypothesis is consistent with an earlier divergenceof Craterostigmomorpha than Lithobiomorpha and could

thus account for the lack of mid-Paleozoic fossillithobiomorphs

That said the scattered records of Chilopoda in Mesozoicshales lithographic limestones and ambers which includeexemplars of Scutigeromorpha Scolopendromorpha andGeophilomorpha but not Lithobiomorpha (reviewed byEdgecombe 2011b) suggests that lithobiomorphs are under-represented in the fossil record as a whole and their apparentabsence in the Paleozoic is likely a taphonomic artifact TheAmalpighiata tree dates the origins of centipede orders withbetter fit to their first known fossil occurrences and diver-gence dates are concordant with the fossil record perhapsindicating that the Paleozoic fossil record of centipedes is notas incomplete as previously thought An additional interestingpoint of our data set is that the diversification of myriapods isSilurian-Devonian for the analysis under the autocorrelatedlog normal model and only our uncorrelated gamma modelsupports a Cambrian terrestrialization event as suggestedrecently for this lineage (Rota-Stabelli et al 2013)Additional transcriptomes should allow for a better estima-tion of the diversification dates of each order but should notaffect their origin Should additional taxon sampling continueto yield substantial among-gene incongruence and a succes-sion of interordinal divergences closely spaced together intime consistent with the scenario of an rapid radiation wenote that biological (as opposed to methodologicalartifac-tual) explanations for the incongruence such as incompletelineage sorting may need to be considered

Implications for Morphological andBehavioral Characters

A substantial list of characters has been tabled in defense of asister-group relationship between Craterostigmus andEpimorpha They include the following putative synapomor-phies (Edgecombe 2011a)

1) Brooding involving the mother guarding the egg clusterby wrapping her body around it

2) Extraordinarily high cell numbers in the lateral ocelli(eg more than 1000 retinula cells)

3) Proximal retinula cells partly with monodirectionalrhabdomeres

4) Maxillary nephridia lacking in postembryonic stadia5) Rigidity of the forcipules a character complex including

the forcipular pleurite arching over the coxosternite thehinge of the coxosternite being sclerotized and inflexi-ble the coxosternite deeply embedded into the cuticleabove the first pedigerous trunk segment and a trun-cation of sternal muscles in the forcipular segmentrather than extending into the head

6) Presternites distinct7) Separate sternal and lateral longitudinal muscles8) Lateral testicular vesicles linked by a central posteriorly

extended deferens duct9) Coxal organs confined to the last leg pair

10) Internal valves formed by lips of the ostia projectingdeeply into the heart lumen

1507

Evaluation of Topological Conflict in Centipede Phylogeny doi101093molbevmsu108 MBE at H

arvard Library on June 12 2014

httpmbeoxfordjournalsorg

Dow

nloaded from

Under the alternative Amalpighiata hypothesis in whichLithobiomorpha rather than Craterostigmus is sister group ofEpimorpha these characters are implied to be homoplasticThey are either general characters of Pleurostigmophora as awhole that have been secondarily modified (lost subject toreversal or otherwise transformed) in Lithobiomorpha orthey have been convergently acquired by Craterostigmusand Epimorpha

Craterostigmus has a single anamorphic stage in its lifecycle it hatches from the egg as a 12-legged instar and ac-quires the adult number of 15 leg-pairsmdashthe plesiomorphicnumber of legs in centipedesmdashin the succeeding instar Thiscontrasts with numerous anamorphic stages inScutigeromorpha and Lithobiomorpha which hatch with 4or 6ndash8 leg pairs respectively and keep adding segments untilreaching the definitive 15 leg pairs The Phylactometria hy-pothesis has generally viewed the ldquoreduced hemi-anamorphosisrdquo (Borucki 1996) of Craterostigmus as atransitional stage in the evolution of complete epimorphosisin Epimorpha The alternative hypothesis in whichCraterostigmus is sister group of all otherPleurostigmophora rejects this interpretation in favor oftwo independent reductions of anamorphic instars (or sec-ondary reacquisition of them in Lithobiomorpha) Aproposposterior segmentation in arthropods has been shown to beevolutionarily labile as demonstrated by the extreme case ofposterior segment reduction during mite embryogenesis withconcomitant loss of certain gene expression domains orentire genes altogether in the genome of the miteTetranychus urticae (Grbic et al 2011)

At present few characters have been identified that couldbe reinterpreted as shared derived characters ofLithobiomorpha and Epimorpha One noteworthy exampleis the absence of supernumerary Malpighian tubules InChilopoda these occur only in Scutigeromorpha and inCraterostigmus and their presence in these two ordersalone has been regarded as a plesiomorphic feature(Prunescu and Prunescu 1996 Prunescu and Prunescu2006) Their inferred loss in Lithobiomorpha and Epimorphawould imply a single step under the favored topology in thisstudy and gives the name to the clade Amalpighiata

Other potential apomorphies of Lithobiomorpha andEpimorpha involve characters that are shared byLithobiomorpha and Scolopendromorpha alone and arethus required to be lost or otherwise modified inGeophilomorpha

For example Craterostigmus shares three multicuspedteeth in the mandible with Scutigeromorpha in contrast tofour and five such teeth on opposing mandibles inLithobiomorpha and Scolopendromorpha Geophilomorphaprimitively lack teeth (Koch and Edgecombe 2012) so if atransformation from three teeth (common ancestor ofChilopoda) to fourfive teeth (common ancestorof Lithobiomorpha + Epimorpha) were envisioned the lossof mandibular teeth in Geophilomorpha is a necessary part ofthe transformation series A more homoplastic character thatcould be postulated in support of Amalpighiata as a clade isthe presence of marginated tergites These are lacking in

Craterostigmus are ubiquitous in Lithobiomorpha and arevariably present in Scolopendromorpha (present inScolopocryptopinae and Scolopendridae) Their absence inGeophilomorpha and several blind scolopendromorphclades (Newportiinae Cryptopidae and Plutoniumidae)however means that their status as a character forAmalpighiata would demand a few instances of reversalloss As such though it must be conceded that this schemeintroduces considerable homoplasy from the perspective ofmorphology a few characters can be suggested as potentialsynapomorphies for Amalpighiata

Materials and Methods

Sample Collection

We sampled six specimens representing the main groups ofcentipedes (MCZ voucher numbers in brackets) Scutigeracoleoptrata (IZ-20415) (Scutigeromorpha) Lithobius forficatus(IZ-131534) (Lithobiomorpha) Craterostigmus tasmanianus(IZ-128299) (Craterostigmomorpha) Alipes grandidieri (IZ-130616) Cryptops hortensis (IZ-130583) (Scolopen-dromorpha) and Himantarium gabrielis (IZ-131564) (Geophi-lomorpha) Information about the sampling localitiescollection details can be found in the MCZ online collectionsdatabase (httpmczbasemczharvardedu last accessedApril 1 2014) The following taxa were sampled as outgroupsPeripatopsis capensis (Onychophora a transcriptome) andIxodes scapularis and T urticae (Arthropoda Arachnida andAcari complete genomes) Samples were preserved in RNA-later (Ambion) or flash frozen in liquid nitrogen immediatelyafter collection A portion of the central part of each animal(including the body and legs) or the anterior part (in S coleop-trata including the head and a part of the trunk) was dissectedand stored at80 C until RNA extraction Three more taxaretrieved from external sources were also included Thegenome of Strigamia maritima (Geophilomorpha) was down-loaded from httpmetazoaensemblorgStrigamia_mari-timaInfoIndex (last accessed April 1 2014) The genome ofDaphnia pulex was retrieved from httpmetazoaensemblorgDaphnia_pulexInfoIndex (last accessed April 1 2014)The transcriptome of A gigas generated by 454 pyrosequen-cing was retrieved from Meusemann et al (2010)

mRNA Extraction

Total RNA was extracted with a standard trizol-basedmethod using TRIzol (Life Sciences) following the manufac-turerrsquos protocol Tissue fragments were disrupted with a drillin 500ml of TRIzol using an RNAse-free plastic pestle forgrinding TRIzol was added up to a total volume of 1 mlAfter 5 min incubating at room temperature (RT) 100 ml ofbromochloropropane was mixed by vortexing After incuba-tion at RT for 10 min the samples were centrifuged at16000 rpm for 15 min at 4 C Afterward the upper aqueouslayer was recovered mixed with 500 ml of isopropanol andincubated at 20 C overnight Total RNA precipitation wasmade as follows samples were centrifuged for 15 min at16000 rpm at 4 C the pellet was washed twice with 1 mlof 75 isopropanol and centrifuged at 16000 rpm at 4 C for

1508

Fernandez et al doi101093molbevmsu108 MBE at H

arvard Library on June 12 2014

httpmbeoxfordjournalsorg

Dow

nloaded from

15 and 5 min in the first and second washing step respec-tively air dried and eluted in 100ml of RNA Storage solution(Ambion) mRNA purification was done with the DynabeadsmRNA Purification Kit (Invitrogen) following manufacturerrsquosinstructions After incubation of total RNA at 65 C for 5 minthe samples were incubated for 30 min with 200ml of mag-netic beads in a rocker and washed twice with washing bufferThis step was repeated again with an incubation time of5 min mRNA was eluted in 15ml of Tris-HCl buffer Qualityof mRNA was measured with a pico RNA assay in an Agilent2100 Bioanalyzer (Agilent Technologies) Quantity was mea-sured with a RNA assay in Qubit fluorometer (LifeTechnologies)

cDNA Library Construction and Next-GenerationSequencing

cDNA libraries were constructed with the TruSeq for RNASample Preparation kit (Illumina) following the manufac-turerrsquos instructions Libraries for C tasmanianus and S coleop-trata were constructed in the Apollo 324 automated systemusing the PrepX mRNA kit (IntegenX) The samples weremarked with a different index to allow pooling for sequencingConcentration of the cDNA libraries was measured through adsDNA High Sensitivity (HS) assay in a Qubit fluorometer(Invitrogen) Library quality and size selection were checkedin an Agilent 2100 Bioanalyzer (Agilent Technologies) withthe HS DNA assay The samples were run using the IlluminaHiSeq 2500 platform with paired-end reads of 150 bp at theFAS Center for Systems Biology at Harvard University

Raw Data Sanitation and Sequence Assembly

Demultiplexed Illumina HiSeq 2500 sequencing results inFASTQ format were retrieved from the sequencing facilityvia FTP The files were decompressed and imported into CLCGenomics Workbench 551 retaining read header informa-tion as paired read files Each sample was quality filteredaccording to a threshold average quality score of 30 basedon a Phred scale and adaptor trimmed using a list of knownadaptors provided by Illumina All reads shorter than 25 bpwere discarded and the resulting files were exported as asingle FASTQ file

To simplify the assembly process ribosomal RNA (rRNA)was filtered out All known metazoan rRNA sequences weredownloaded from GenBank and formatted into bowtie indexusing ldquobowtie-buildrdquo Each sample was sequentially aligned to

the index allowing up to two mismatches via Bowtie 100(Langmead et al 2009) retaining all unaligned reads in FASTQformat Unaligned results stored as a single file were im-ported into Geneious 616 (Kearse et al 2012) paired usingread header information and exported as separate files for leftand right mate pairs This process was repeated for eachsample

De novo assemblies (strand specific in C tasmanianus andS coleoptrata) were done individually per organism in Trinity(Grabherr et al 2011 Haas et al 2013) using paired read filesand default parameters Raw reads and assembled sequenceshave been deposited in the National Center for BiotechnologyInformation Sequence Read Archive and TranscriptomeShotgun Assembly databases respectively (table 2)

Identification of Candidate Coding Regions

Redundancy reduction was done with CD-HIT-EST (Fu et al2012) in the raw assemblies (95 global similarity) Resultingassemblies were processed in TransDecoder (Haas et al 2013)to identify candidate ORFs within the transcripts Predictedpeptides were then processed with a further filter to selectonly one peptide per putative unigene by choosing the lon-gest ORF per Trinity subcomponent with a Python scriptthus removing the variation in the coding regions of our as-semblies due to by alternative splicing closely related para-logs and allelic diversity Peptide sequences with all finalcandidate ORFs were retained as multifasta files Sequencedata for Strigamia maritima and D pulex were obtained fromgenome assemblies as opposed to the de novo assembledtranscriptomes Annotated predicted peptide sequences forD pulex were downloaded from ldquowFleaBaserdquo and sequenceswere clustered using CD-HIT at a similarity of 98 for redun-dancy reduction Peptide sequences for Strigamia maritimawere downloaded from Ensembl and also clustered at 98similarity Results for both D pulex and Strigamia maritimawere converted to a single line multifasta file

Orthology Assignment

We assigned predicted ORFs into orthologous groups acrossall samples using OMA stand-alone v099t (Altenhoff et al2011 Altenhoff et al 2013) The advantages of the algorithmof OMA over standard bidirectional best-hit approaches relyon the use of evolutionary distances instead of scores con-sideration of distance inference uncertainty and differentialgene losses and inclusion of many-to-many orthologous re-lations The ortholog matrix is constructed from all-against-all

Table 2 Accession Numbers for the Raw Transcriptomes and Assemblies from the Taxa Newly Sequenced in This Study

BioProject BioSample Experiment Run

Scutigera coleoptrata PRJNA237135 SAMN02614634 SRX462011 SRR1158078

Craterostigmus tasmanianus PRJNA237134 SAMN02614633 SRX461877 SRR1157986

Lithobius forficatus PRJNA237133 SAMN02614632 SRX462145 SRR1159752

Himantarium gabrielis PRJNA237131 SAMN02614631 SRX461787 SRR1159787

Alipes grandidieri PRJNA179374 SAMN01816532 SRX205685 SRR619311

Cryptops hortensis PRJNA237130 SAMN02614630 SRX457664 SRR1153457

Peripatopsis capensis PRJNA236598 SAMN02598680 SRX451023 SRR1145776

1509

Evaluation of Topological Conflict in Centipede Phylogeny doi101093molbevmsu108 MBE at H

arvard Library on June 12 2014

httpmbeoxfordjournalsorg

Dow

nloaded from

Smith-Waterman protein alignments The program identifiesthe so called ldquostable pairsrdquo verifies them and checks againstpotential paralogous genes In a last step cliques of stablepairs are clustered as groups of orthologs All input fileswere single-line multifasta files and the parametersdrw filespecified retained all default settings with the exception ofldquoNPrdquo which was set at 300 We parallelized all-by-all localalignments across 256 cores of a single compute node imple-menting a custom Bash script allowing for execution of inde-pendent threads with at least 3 s between each instance ofOMA to minimize risk of collisions

Phylogenomic Analyses and Congruence Assessment

We constructed three different amino acid supermatricesFirst a large matrix was constructed by concatenating theset of OMA groups containing six or more taxa To increasegene occupancy and to reduce the percentage of missingdata a second matrix was created by selecting the orthologscontained in 11 or more taxa Because of the high percentageof missing data in A gigas this taxon was eliminated from thissupermatrix This supermatrix reconstruction approach wasselected over other available software (eg MARE) because itguarantees a minimum taxon occupancy per orthogrouptherefore resulting in a supermatrix with a good compromisebetween gene and taxon occupancy and missing data

The selected orthogroups (1934 and 61 for both matrixesrespectively) were aligned individually using MUSCLE version36 (Edgar 2004) To increase the signal-to-noise ratio andimprove the discriminatory power of phylogenetic methodswe applied a probabilistic character masking with ZORRO(Wu et al 2012) to account for alignment uncertainty runusing default parameters (and using FastTree 214 [Price et al2010] to produce guide trees) This program first calculatesthe probability of all alignments that pass through a specifiedmatched pair of residues using a pair hidden Markov modelframework and it then compares this value with the fullprobability of all alignments of the pair of sequences Theposterior probability value indicates how reliable the matchis (ie highly reliable if it is close to 1 ambiguous if it is close to0) It then sums up the probability that a particular columnwould appear over the alignment landscape and assigns aconfidence score between 0 and 10 to each column provid-ing an objective measure that has an explicit evolutionarymodel and is mathematically rigorous (Wu et al 2012) Wediscarded the positions assigned a confidence score below athreshold of 5 with a custom Python script prior to concat-enation (using Phyutility 26 Smith and Dunn 2008) and sub-sequent phylogenomic analyses

To discern if substitutional saturation and among-taxoncompositional heterogeneity were affecting our phylogeneticresults we considered a third supermatrix Using the p4Python library (Foster 2004) we performed a test of compo-sitional homogeneity using w2 statistics simulating a null dis-tribution using the TreecompoTestUsingSimulations()method (nSims = 100) We used the best-scoring ML treefrom our RAxML analyses as the phylogram on which to sim-ulate data using optimized parameters of a homogeneous

unpartitioned LG + I + 4 model We also conducted com-positional homogeneity tests on individual ortholog align-ments modeling a homogeneous null distribution on theML tree for each gene using the best-fitting model availablein p4 for that gene We characterized genewise substitutionalsaturation by regressing uncorrected amino acid distances onthe patristic distance implied by each ML tree in a p4 scriptand taking the slope of this linear regression as a measure ofsubstitutional saturation (Jeffroy et al 2006) We calculatedthe third quartile of the slope from among the set of com-positionally homogeneous genes with a slope between 0 and1 and then selected the set of ML trees from genes with aslope above this value (0465) for inclusion in a quartet super-network as described earlier The 389 genes that passed in-dividual tests of compositional heterogeneity and whichfurthermore fell in the least saturated quartile of this station-ary set were concatenated into a third supermatrix

ML inference was conducted with PhyML-PCMA (Zollerand Schneider 2013) and RAxML 775 (Berger et al 2011)Although RAxML uses a predetermined empirical model ofamino acid substitution PhyML-PCMA estimates a modelthrough the use of a principal component (PC) analysisThe obtained PCs describe the substitution rates that covarythe most among different protein families and thereforedefine a semiempirically determined parameterization foran amino acid substitution model specific to each data set(Zoller and Schneider 2013) We selected 20 PCs in thePhyML-PCMA analyses and empirical amino acid frequenciesPROTGAMMALG4XF was selected as the best model ofamino acid substitution for the unpartitioned RAxML analy-ses using a modification of the ProteinModelSelection perlwrapper to RAxML produced by A Stamatakis (httpscoh-itsorgexelixissoftwarehtml last accessed April 1 2014)Bootstrap values were estimated with 1000 replicates underthe rapid bootstrapping algorithm

Bayesian analysis was conducted with PhyloBayes MPI 14e(Lartillot et al 2013) using the site-heterogeneous CAT-GTRmodel of evolution (Lartillot and Philippe 2004) Three inde-pendent Markov chain Monte Carlo (MCMC) chains wererun for 5818ndash7471 cycles The initial 2000 trees (27ndash34)sampled in each MCMC run prior to convergence (judgedwhen maximum bipartition discrepancies across chainslt 01) were discarded as the burn-in period A 50 major-ity-rule consensus tree was then computed from the remain-ing 1537 trees (sampled every 10 cycles) combined from thethree independent runs

To investigate potential incongruence between individualgene trees we also inferred gene trees for each OMA groupincluded in our supermatrix For each aligned ZORRO-masked OMA group we selected a best-fitting model usinga ProteinModelSelection script modified to permit testing ofthe recent LG4M( + F) and LG4X( + F) models (Le et al 2012)Best-scoring ML trees were inferred for each gene under theselected model (with the gamma model of rate variation butno invariant term) from 20 replicates of parsimony startingtrees One hundred traditional (nonrapid) bootstrap repli-cates for each gene were also inferred A majority rule con-sensus for each gene was constructed using the SumTrees

1510

Fernandez et al doi101093molbevmsu108 MBE at H

arvard Library on June 12 2014

httpmbeoxfordjournalsorg

Dow

nloaded from

script from the DendroPy Python library (Sukumaran andHolder 2010) To visualize predominant intergene conflictsfor both ML and bootstrap MRC trees we employedSuperQ v11 (Grunewald et al 2013) selecting the ldquobalancedrdquoedge-weight optimization function and applying no filterSplitsTree v4131 (Huson and Bryant 2006) was then usedto visualize the resulting NEXUS files To investigate the dis-tribution of strongly supported conflict among our genetrees we employed Conclustador v04a (Leigh et al 2011)presently the only automated incongruence-detection algo-rithm readily applicable to data sets of this size We appliedthe spectral clustering algorithm on the bootstrap replicatesfor each OMA group limiting the maximum number of clus-ters to 15

To quantify the number of genes potentially decisive for agiven split and also the number within this set that actuallysupport the split in question (fig 6) we employed a customPython script to parse the set of individual gene trees If a treecontained at least one species in either descendant group fora given node plus at least two distinct species basal to thenode in question it was considered potentially decisive(forming minimally a quartet DellrsquoAmpio et al 2014) if thedescendants were monophyletic with respect to their relativeoutgroups that gene tree was considered congruent with thenode in question (although this count is agnostic to thetopology within the node in question) To quantify the rela-tive support for different hypotheses within our individualgene-tree analysis we displayed the above counts for allnodes in the 1934 gene ML topology (fig 6) and also forselect alternative scenarios of centipede interrelationshipsboth including historical hypotheses (see Introduction)and alternative topologies for Scutigeromorpha andCraterostigmomorpha suggested by our quartet supernet-works All Python custom scripts can be downloaded fromhttpsgithubcomclaumer (last accessed April 1 2014)

Fossil Calibrations

A few Paleozoic fossil occurrences constrain the timing ofdivergences between chilopod orders (for a review of themyriapod fossil record see Shear and Edgecombe 2010) TheSilurondashDevonian genus Crussolum (Shear et al 1998) is iden-tified as a stem-group scutigeromorph displaying derivedcharacters of total-group Scutigeromorpha but lackingsome apomorphies that are shared by all members of thescutigeromorph crown group (Edgecombe 2011b) The bestpreserved fossils of Crussolum are from the Early Devonian(Pragian) and Middle Devonian (Givetian) (Shear et al 1998Anderson and Trewin 2003) but the oldest confidently as-signed record is in the Late Silurian (early Pridolı) LudlowBone Bed in western England (at least 4192 My dating theend of the Pridolı) This dating constrains the split ofScutigeromorpha from Pleurostigmophora

The occurrence of Devonobius delta in the MiddleDevonian of New York at least 3827 My (date for the endof the Givetian) constrains the divergence betweenCraterostigmomorpha and the remaining pleurostigmophor-ans This minimum applies irrespective of whether

Devonobius is sister group of Craterostigmomorpha or ofEpimorpha which were equally parsimonious in previousanalyses (Edgecombe and Giribet 2004) The basal divergencein Epimorpha that is the split between Scolopendromorphaand Geophilomorpha is constrained by the oldest scolopen-dromorph Mazoscolopendra richardsoni in UpperCarboniferous (Westphalian D) deposits of Mazon Creek IL(at least 306 Ma) Available character data forMazoscolopendra are insufficient to establish whether it is astem- or crown-group scolopendromorph and it only pro-vides a minimum age for total-group Scolopendromorpha

Mid-Silurian body fossils of Diplopoda provide constraintson the divergence of Chilopoda and Diplopoda The diver-gence between these two groups is dated by the occurrenceof Archipolypoda (stem-group helminthomorphs) in the lateWenlock or early Ludlow (Wilson and Anderson 2004) Basedon these data (ages based on spores) the split betweenChilopoda and Diplopoda is at least as old as the end ofthe Gorstian Stage (early Ludlow) 4256 My

The split between Onychophora and Arthropoda wasdated between 528 My (the minimum age for Arthropodaused by Lee et al [2013] on the basis of the earliestRusophycus traces) and 558 My used as the root ofPanarthropoda (Lee et al 2013)

Divergence Time Estimates

Divergence dates were estimated using the Bayesian relaxedmolecular clock approach as implemented in PhyloBayesv33f (Lartillot et al 2013) To compare the performance ofthe autocorrelated log normal and uncorrelated gammamodels we estimated the divergence dates under bothmodels in the 389-gene data set A series of calibrationconstraints specified above in ldquoFossil calibrationsrdquo (see alsoMurienne et al 2010 Edgecombe 2011b) were used with softbounds (Yang and Rannala 2006) under a birthndashdeath priorin PhyloBayes this strategy having been found to provide thebest compromise for dating estimates (Inoue et al 2010) Twoindependent MCMC chains were run for 5000ndash6000 cyclessampling posterior rates and dates every 10 cycles The initial20 of the cycles were discarded as burn-in Posteriorestimates of divergence dates were then computed fromapproximately the last 4000 samples of each chain

Supplementary MaterialSupplementary figure S1 and tables S1 and S2 are available atMolecular Biology and Evolution online (httpwwwmbeoxfordjournalsorg)

Acknowledgments

Mark Harvey kindly provided the sample of Craterostigmustasmanianus and Savel Daniels the sample of Peripatopsiscapensis The Research Computing group at the HarvardFaculty of Arts and Sciences was instrumental in facilitatingthe computational resources required Sarah Bastkowski iskindly thanked for her advice on the usage of SuperQ v11This work was supported by internal funds from the Museumof Comparative Zoology and by a postdoctoral fellowship

1511

Evaluation of Topological Conflict in Centipede Phylogeny doi101093molbevmsu108 MBE at H

arvard Library on June 12 2014

httpmbeoxfordjournalsorg

Dow

nloaded from

from the Fundacion Ramon Areces to RF Five anonymousreviewers and the editor provided comments that helped toimprove this article

ReferencesAltenhoff AM Gil M Gonnet GH Dessimoz C 2013 Inferring hierar-

chical orthologous groups from orthologous gene pairs PLoS One 8e53786

Altenhoff AM Schneider A Gonnet GH Dessimoz C 2011 OMA 2011orthology inference among 1000 complete genomes Nucleic AcidsRes 39D289ndashD294

Anderson LI Trewin NH 2003 An Early Devonian arthropod fauna fromthe Windyfield Cherts Aberdeenshire Scotland Palaeontology 46457ndash509

Ax P 2000 Multicellular animals Volume II The phylogenetic system ofthe Metazoa Berlin (Germany) Springer

Berger SA Krompass D Stamatakis A 2011 Performance accuracy andWeb server for evolutionary placement of short sequence readsunder maximum likelihood Syst Biol 60291ndash302

Betancur-R R Naylor G Ortı G 2014 Conserved genes sampling errorand phylogenomic inference Syst Biol 63257ndash262

Bleidorn C Podsiadlowski L Zhong M Eeckhaut I Hartmann SHalanych KM Tiedemann R 2009 On the phylogenetic positionof Myzostomida can 77 genes get it wrong BMC Evol Biol 9150

Borucki H 1996 Evolution und phylogenetisches system der Chilopoda(Mandibulata Tracheata) Vehr naturwiss Ver Hamburg 3595ndash226

DellrsquoAmpio E Meusemann K Szucsich NU Peters RS Meyer B Borner JPetersen M Aberer AJ Stamatakis A Walzl MG et al 2014 Decisivedata sets in phylogenomics lessons from studies on the phyloge-netic relationships of primarily wingless insects Mol Biol Evol 31239ndash249

Dohle W 1985 Phylogenetic pathways in the Chilopoda BijdrDierkunde 5555ndash66

Dunn CW Hejnol A Matus DQ Pang K Browne WE Smith SA SeaverE Rouse GW Obst M Edgecombe GD et al 2008 Broad phyloge-nomic sampling improves resolution of the animal tree of lifeNature 452745ndash749

Edgar RC 2004 MUSCLE multiple sequence alignment with high accu-racy and high throughput Nucleic Acids Res 321792ndash1797

Edgecombe GD 2011a Chilopodamdashphylogeny In Minelli A editorTreatise on zoologymdashanatomy taxonomy biology TheMyriapoda Vol 1 Leiden and Boston Brill p 339ndash354

Edgecombe GD 2011b Chilopodamdashthe fossil history In Minelli Aeditor Treatise on zoologymdashanatomy taxonomy biology TheMyriapoda Vol 1 Leiden and Boston Brill p 355ndash361

Edgecombe GD Giribet G 2004 Adding mitochondrial sequence data(16S rRNA and cytochrome c oxidase subunit I) to the phylogeny ofcentipedes (Myriapoda Chilopoda) an analysis of morphology andfour molecular loci J Zool Syst Evol Res 4289ndash134

Edgecombe GD Giribet G 2007 Evolutionary biology of centipedes(Myriapoda Chilopoda) Annu Rev Entomol 52151ndash170

Edgecombe GD Giribet G 2008 A New Zealand species of thetrans-Tasman centipede order Craterostigmomorpha(ArthropodaChilopoda) corroborated by molecular evidenceInvert Syst 221ndash15

Edgecombe GD Giribet G Wheeler WC 1999 Phylogeny of Chilopodacombining 18S and 28S rRNA sequences and morphology BoletınSociedad Entomol Aragonesa 26293ndash331

Edwards SV 2009 Is a new and general theory of molecular systematicsemerging Evolution 631ndash19

Foster P 2004 Modeling compositional heterogeneity Syst Biol 53485ndash495

Fu LM Niu BF Zhu ZW Wu ST Li WZ 2012 CD-HIT accelerated forclustering the next-generation sequencing data Bioinformatics 283150ndash3152

Giribet G Carranza S Riutort M Baguna J Ribera C 1999 Internalphylogeny of the Chilopoda (Myriapoda Arthropoda) using

complete 18S rDNA and partial 28S rDNA sequences Philos TransR Soc Lond B Biol Sci 354215ndash222

Giribet G Edgecombe GD 2006 Conflict between data sets andphylogeny of centipedes an analysis based on seven genes andmorphology Proc R Soc B Biol Sci 273531ndash538

Grabherr MG Haas BJ Yassour M Levin JZ Thompson DA Amit IAdiconis X Fan L Raychowdhury R Zeng Q et al 2011 Full-length transcriptome assembly from RNA-Seq data without a ref-erence genome Nat Biotechnol 29644ndash652

Grbic M Van Leeuwen T Clark RM Rombauts S Rouze P Grbic VOsborne EJ Dermauw W Ngoc PC Ortego F et al 2011 Thegenome of Tetranychus urticae reveals herbivorous pest adaptationsNature 479487ndash492

Grunewald S Spillner A Bastkowski S Bogershausen A Moulton V2013 SuperQ computing supernetworks from quartets IEEEACMTrans Comput Biol Bioinform 10151ndash160

Haas BJ Papanicolaou A Yassour M Grabherr M Blood PD Bowden JCouger MB Eccles D Li B Lieber M et al 2013 De novo tran-script sequence reconstruction from RNA-seq using the Trinityplatform for reference generation and analysis Nat Protoc 81494ndash1512

Hartmann S Helm C Nickel B Meyer M Struck TH Tiedemann RSelbig J Bleidorn C 2012 Exploiting gene families for phylogenomicanalysis of myzostomid transcriptome data PLoS One 7e29843

Hejnol A Obst M Stamatakis A Ott M Rouse GW Edgecombe GDMartinez P Baguna J Bailly X Jondelius U et al 2009 Assessing theroot of bilaterian animals with scalable phylogenomic methods ProcR Soc B Biol Sci 2764261ndash4270

Hilken G 1997 Tracheal systems in Chilopoda a comparison underphylogenetic aspects Entomol Scand Suppl 5149ndash60

Huson DH Bryant D 2006 Application of phylogenetic networks inevolutionary studies Mol Biol Evol 23254ndash267

Inoue J Donoghue PCJ Yang ZH 2010 The impact of the representationof fossil calibrations on Bayesian estimation of species divergencetimes Syst Biol 5974ndash89

Jeffroy O Brinkmann H Delsuc F Philippe H 2006 Phylogenomics thebeginning of incongruence Trends Genet 22225ndash231

Kearse M Moir R Wilson A Stones-Havas S Cheung M Sturrock SBuxton S Cooper A Markowitz S Duran C et al 2012 GeneiousBasic an integrated and extendable desktop software platform forthe organization and analysis of sequence data Bioinformatics 281647ndash1649

Kennedy D 2005 125 [Editorial] Science 30919Koch M Edgecombe GD 2012 The preoral chamber in geophilomorph

centipedes comparative morphology phylogeny and the evolutionof centipede feeding structures Zool J Linn Soc 1651ndash62

Kocot KM Cannon JT Todt C Citarella MR Kohn AB Meyer A SantosSR Schander C Moroz LL Lieb B et al 2011 Phylogenomics revealsdeep molluscan relationships Nature 477452ndash456

Kubatko LS Degnan JH 2007 Inconsistency of phylogenetic estimatesfrom concatenated data under coalescence Syst Biol 5617ndash24

Langmead B Trapnell C Pop M Salzberg SL 2009 Ultrafast andmemory-efficient alignment of short DNA sequences to thehuman genome Genome Biol 10R25

Lartillot N Philippe H 2004 A Bayesian mixture model for across-siteheterogeneities in the amino-acid replacement process Mol BiolEvol 211095ndash1109

Lartillot N Philippe H 2008 Improvement of molecular phylogeneticinference and the phylogeny of Bilateria Philos Trans R Soc Lond BBiol Sci 3631463ndash1472

Lartillot N Rodrigue N Stubbs D Richer J 2013 PhyloBayes MPI phy-logenetic reconstruction with infinite mixtures of profiles in a par-allel environment Syst Biol 62611ndash615

Le SQ Dang CC Gascuel O 2012 Modeling protein evolution withseveral amino acid replacement matrices depending on site ratesMol Biol Evol 292921ndash2936

Lee MSY Soubrier J Edgecombe GD 2013 Rates of phenotypicand genomic evolution during the Cambrian Explosion Curr Biol231ndash7

1512

Fernandez et al doi101093molbevmsu108 MBE at H

arvard Library on June 12 2014

httpmbeoxfordjournalsorg

Dow

nloaded from

Leigh JW Schliep K Lopez P Bapteste E 2011 Let them fall where theymay congruence analysis in massive phylogenetically messy datasets Mol Biol Evol 282773ndash2785

Lemmon AR Brown JM Stanger-Hall K Lemmon EM 2009The effect of ambiguous data on phylogenetic estimates obtainedby maximum likelihood and Bayesian inference Syst Biol 58130ndash145

Mallatt J Giribet G 2006 Further use of nearly complete 28S and 18SrRNA genes to classify Ecdysozoa 37 more arthropods and a kinor-hynch Mol Phylogenet Evol 40772ndash794

Meusemann K von Reumont BM Simon S Roeding F Strauss S Kuck PEbersberger I Walzl M Pass G Breuers S et al 2010 A phylogenomicapproach to resolve the arthropod tree of life Mol Biol Evol 272451ndash2464

Muller CHG Meyer-Rochow VB 2006 Fine structural organization ofthe lateral ocelli in two species of Scolopendra (ChilopodaPleurostigmophora) an evolutionary evaluation Zoomorphology12513ndash26

Murienne J Edgecombe GD Giribet G 2010 Including secondary struc-ture fossils and molecular dating in the centipede tree of life MolPhylogenet Evol 57301ndash313

Nosenko T Schreiber F Adamska M Adamski M Eitel M Hammel JMaldonado M Muller WE Nickel M Schierwater B et al 2013 Deepmetazoan phylogeny when different genes tell different stories MolPhylogenet Evol 67223ndash233

Philippe H Brinkmann H Lavrov DV Littlewood DTJ Manuel MWorheide G Baurain D 2011 Resolving difficult phylogeneticquestions why more sequences are not enough PLoS Biol 9e1000602

Philippe H Telford MJ 2006 Large-scale sequencing and the new animalphylogeny Trends Ecol Evol 21614ndash620

Pick KS Philippe H Schreiber F Erpenbeck D Jackson DJWrede P Wiens M Alie A Morgenstern B Manuel Met al 2010 Improved phylogenomic taxon sampling notice-ably affects nonbilaterian relationships Mol Biol Evol 271983ndash1987

Pocock RI 1902 A new and annectant type of chilopod Quart J MicroscSci 45417ndash448

Price MN Dehal PS Arkin AP 2010 FastTree 2-approximately maximum-likelihood trees for large alignmentsPLoS One 5e9490

Prunescu C-C Prunescu P 1996 Supernumerary Malpighian tubules inchilopods Acta Myriapodol 169437ndash440

Prunescu C-C Prunescu P 2006 Rudimentary supernumeraryMalpighian tubules in the order Craterostigmomorpha Pocock1902 Norwegian J Entomol 53113ndash118

Regier JC Shultz JW Zwick A Hussey A Ball B Wetzer R Martin JWCunningham CW 2010 Arthropod relationships revealed by phy-logenomic analysis of nuclear protein-coding sequences Nature 4631079ndash1083

Regier JC Wilson HM Shultz JW 2005 Phylogenetic analysis ofMyriapoda using three nuclear protein-coding genes MolPhylogenet Evol 34147ndash158

Rehm P Borner J Meusemann K von Reumont BM Simon S Hadrys HMisof B Burmester T 2011 Dating the arthropod tree based onlarge-scale transcriptome data Mol Phylogenet Evol 61880ndash887

Rota-Stabelli O Daley AC Pisani D 2013 Molecular timetrees reveal aCambrian colonization of land and a new scenario for ecdysozoanevolution Curr Biol 23392ndash398

Roure B Baurain D Philippe H 2013 Impact of missing data on phy-logenies inferred from empirical phylogenomic data sets Mol BiolEvol 30197ndash214

Salichos L Rokas A 2011 Evaluating ortholog prediction algorithms in ayeast model clade PLoS One 6e18755

Salichos L Rokas A 2013 Inferring ancient divergences requires geneswith strong phylogenetic signals Nature 497327ndash331

Salichos L Stamatakis A Rokas A 2014 Novel information theory-basedmeasures for quantifying incongruence among phylogenetic treesMol Biol Evol 31(5)1261ndash1271

Shear WA Bonamo PM 1988 Devonobiomorpha a new order of cen-tipeds (Chilopoda) from the middle Devonian of Gilboa New YorkState USA and the phylogeny of centiped orders Am MusNovitates 29271ndash30

Shear WA Edgecombe GD 2010 The geological record and phylogenyof Myriapoda Arthropod Struct Dev 39174ndash190

Shear WA Jeram AJ Selden PA 1998 Centiped legs (ArthropodaChilopoda Scutigeromorpha) from the Silurian and Devonian ofBritain and the Devonian of North America Am Mus Novitates32311ndash16

Shultz JW Regier JC 1997 Progress toward a molecular phylogeny of thecentipede orders (Chilopoda) Entomol Scand Suppl 5125ndash32

Smith S Wilson NG Goetz F Feehery C Andrade SCS Rouse GWGiribet G Dunn CW 2011 Resolving the evolutionary relationshipsof molluscs with phylogenomic tools Nature 480364ndash367

Smith SA Dunn CW 2008 Phyutility a phyloinformatics tool for treesalignments and molecular data Bioinformatics 24715ndash716

Struck TH Paul C Hill N Hartmann S Hosel C Kube M Lieb B Meyer ATiedemann R Purschke G et al 2011 Phylogenomic analyses un-ravel annelid evolution Nature 47195ndash98

Sukumaran J Holder MT 2010 DendroPy a Python library for phylo-genetic computing Bioinformatics 261569ndash1571

Verhoeff KW 1902ndash1925 Gliederfussler Arthropoda Klasse ChilopodaIn Klassen und Ordnungen des Tierreichs 5 abt 2 Buch 1 Leipzig(Germany) Akademische Verlagsgesellschaft p 725

von Reumont BM Jenner RA Wills MA Dellrsquoampio E Pass GEbersberger I Meyer B Koenemann S Iliffe TM Stamatakis Aet al 2012 Pancrustacean phylogeny in the light of new phyloge-nomic data support for Remipedia as the possible sister group ofHexapoda Mol Biol Evol 291031ndash1045

Wilson HM Anderson LI 2004 Morphology and taxonomy of Paleozoicmillipedes (Diplopoda Chilognatha Archipolypoda) from ScotlandJ Paleontol 78169ndash184

Wirkner CS Pass G 2002 The circulatory system in Chilopodafunctional morphology and phylogenetic aspects Acta Zool 83193ndash202

Wu M Chatterji S Eisen JA 2012 Accounting for alignment uncertaintyin phylogenomics PLoS One 7e30288

Yang Z Rannala B 2006 Bayesian estimation of species divergence timesunder a molecular clock using multiple fossil calibrations with softbounds Mol Biol Evol 23212ndash226

Zoller S Schneider A 2013 Improving phylogenetic inference with asemiempirical amino acid substitution model Mol Biol Evol 30469ndash479

1513

Evaluation of Topological Conflict in Centipede Phylogeny doi101093molbevmsu108 MBE at H

arvard Library on June 12 2014

httpmbeoxfordjournalsorg

Dow

nloaded from

Page 9: Evaluating Topological Conflict in Centipede Phylogeny ...€¦ · Scutigera coleoptrata Craterostigmus tasmanianus Strigamia maritima Himantarium gabrielis Alipes grandidieri Cryptops

Under the alternative Amalpighiata hypothesis in whichLithobiomorpha rather than Craterostigmus is sister group ofEpimorpha these characters are implied to be homoplasticThey are either general characters of Pleurostigmophora as awhole that have been secondarily modified (lost subject toreversal or otherwise transformed) in Lithobiomorpha orthey have been convergently acquired by Craterostigmusand Epimorpha

Craterostigmus has a single anamorphic stage in its lifecycle it hatches from the egg as a 12-legged instar and ac-quires the adult number of 15 leg-pairsmdashthe plesiomorphicnumber of legs in centipedesmdashin the succeeding instar Thiscontrasts with numerous anamorphic stages inScutigeromorpha and Lithobiomorpha which hatch with 4or 6ndash8 leg pairs respectively and keep adding segments untilreaching the definitive 15 leg pairs The Phylactometria hy-pothesis has generally viewed the ldquoreduced hemi-anamorphosisrdquo (Borucki 1996) of Craterostigmus as atransitional stage in the evolution of complete epimorphosisin Epimorpha The alternative hypothesis in whichCraterostigmus is sister group of all otherPleurostigmophora rejects this interpretation in favor oftwo independent reductions of anamorphic instars (or sec-ondary reacquisition of them in Lithobiomorpha) Aproposposterior segmentation in arthropods has been shown to beevolutionarily labile as demonstrated by the extreme case ofposterior segment reduction during mite embryogenesis withconcomitant loss of certain gene expression domains orentire genes altogether in the genome of the miteTetranychus urticae (Grbic et al 2011)

At present few characters have been identified that couldbe reinterpreted as shared derived characters ofLithobiomorpha and Epimorpha One noteworthy exampleis the absence of supernumerary Malpighian tubules InChilopoda these occur only in Scutigeromorpha and inCraterostigmus and their presence in these two ordersalone has been regarded as a plesiomorphic feature(Prunescu and Prunescu 1996 Prunescu and Prunescu2006) Their inferred loss in Lithobiomorpha and Epimorphawould imply a single step under the favored topology in thisstudy and gives the name to the clade Amalpighiata

Other potential apomorphies of Lithobiomorpha andEpimorpha involve characters that are shared byLithobiomorpha and Scolopendromorpha alone and arethus required to be lost or otherwise modified inGeophilomorpha

For example Craterostigmus shares three multicuspedteeth in the mandible with Scutigeromorpha in contrast tofour and five such teeth on opposing mandibles inLithobiomorpha and Scolopendromorpha Geophilomorphaprimitively lack teeth (Koch and Edgecombe 2012) so if atransformation from three teeth (common ancestor ofChilopoda) to fourfive teeth (common ancestorof Lithobiomorpha + Epimorpha) were envisioned the lossof mandibular teeth in Geophilomorpha is a necessary part ofthe transformation series A more homoplastic character thatcould be postulated in support of Amalpighiata as a clade isthe presence of marginated tergites These are lacking in

Craterostigmus are ubiquitous in Lithobiomorpha and arevariably present in Scolopendromorpha (present inScolopocryptopinae and Scolopendridae) Their absence inGeophilomorpha and several blind scolopendromorphclades (Newportiinae Cryptopidae and Plutoniumidae)however means that their status as a character forAmalpighiata would demand a few instances of reversalloss As such though it must be conceded that this schemeintroduces considerable homoplasy from the perspective ofmorphology a few characters can be suggested as potentialsynapomorphies for Amalpighiata

Materials and Methods

Sample Collection

We sampled six specimens representing the main groups ofcentipedes (MCZ voucher numbers in brackets) Scutigeracoleoptrata (IZ-20415) (Scutigeromorpha) Lithobius forficatus(IZ-131534) (Lithobiomorpha) Craterostigmus tasmanianus(IZ-128299) (Craterostigmomorpha) Alipes grandidieri (IZ-130616) Cryptops hortensis (IZ-130583) (Scolopen-dromorpha) and Himantarium gabrielis (IZ-131564) (Geophi-lomorpha) Information about the sampling localitiescollection details can be found in the MCZ online collectionsdatabase (httpmczbasemczharvardedu last accessedApril 1 2014) The following taxa were sampled as outgroupsPeripatopsis capensis (Onychophora a transcriptome) andIxodes scapularis and T urticae (Arthropoda Arachnida andAcari complete genomes) Samples were preserved in RNA-later (Ambion) or flash frozen in liquid nitrogen immediatelyafter collection A portion of the central part of each animal(including the body and legs) or the anterior part (in S coleop-trata including the head and a part of the trunk) was dissectedand stored at80 C until RNA extraction Three more taxaretrieved from external sources were also included Thegenome of Strigamia maritima (Geophilomorpha) was down-loaded from httpmetazoaensemblorgStrigamia_mari-timaInfoIndex (last accessed April 1 2014) The genome ofDaphnia pulex was retrieved from httpmetazoaensemblorgDaphnia_pulexInfoIndex (last accessed April 1 2014)The transcriptome of A gigas generated by 454 pyrosequen-cing was retrieved from Meusemann et al (2010)

mRNA Extraction

Total RNA was extracted with a standard trizol-basedmethod using TRIzol (Life Sciences) following the manufac-turerrsquos protocol Tissue fragments were disrupted with a drillin 500ml of TRIzol using an RNAse-free plastic pestle forgrinding TRIzol was added up to a total volume of 1 mlAfter 5 min incubating at room temperature (RT) 100 ml ofbromochloropropane was mixed by vortexing After incuba-tion at RT for 10 min the samples were centrifuged at16000 rpm for 15 min at 4 C Afterward the upper aqueouslayer was recovered mixed with 500 ml of isopropanol andincubated at 20 C overnight Total RNA precipitation wasmade as follows samples were centrifuged for 15 min at16000 rpm at 4 C the pellet was washed twice with 1 mlof 75 isopropanol and centrifuged at 16000 rpm at 4 C for

1508

Fernandez et al doi101093molbevmsu108 MBE at H

arvard Library on June 12 2014

httpmbeoxfordjournalsorg

Dow

nloaded from

15 and 5 min in the first and second washing step respec-tively air dried and eluted in 100ml of RNA Storage solution(Ambion) mRNA purification was done with the DynabeadsmRNA Purification Kit (Invitrogen) following manufacturerrsquosinstructions After incubation of total RNA at 65 C for 5 minthe samples were incubated for 30 min with 200ml of mag-netic beads in a rocker and washed twice with washing bufferThis step was repeated again with an incubation time of5 min mRNA was eluted in 15ml of Tris-HCl buffer Qualityof mRNA was measured with a pico RNA assay in an Agilent2100 Bioanalyzer (Agilent Technologies) Quantity was mea-sured with a RNA assay in Qubit fluorometer (LifeTechnologies)

cDNA Library Construction and Next-GenerationSequencing

cDNA libraries were constructed with the TruSeq for RNASample Preparation kit (Illumina) following the manufac-turerrsquos instructions Libraries for C tasmanianus and S coleop-trata were constructed in the Apollo 324 automated systemusing the PrepX mRNA kit (IntegenX) The samples weremarked with a different index to allow pooling for sequencingConcentration of the cDNA libraries was measured through adsDNA High Sensitivity (HS) assay in a Qubit fluorometer(Invitrogen) Library quality and size selection were checkedin an Agilent 2100 Bioanalyzer (Agilent Technologies) withthe HS DNA assay The samples were run using the IlluminaHiSeq 2500 platform with paired-end reads of 150 bp at theFAS Center for Systems Biology at Harvard University

Raw Data Sanitation and Sequence Assembly

Demultiplexed Illumina HiSeq 2500 sequencing results inFASTQ format were retrieved from the sequencing facilityvia FTP The files were decompressed and imported into CLCGenomics Workbench 551 retaining read header informa-tion as paired read files Each sample was quality filteredaccording to a threshold average quality score of 30 basedon a Phred scale and adaptor trimmed using a list of knownadaptors provided by Illumina All reads shorter than 25 bpwere discarded and the resulting files were exported as asingle FASTQ file

To simplify the assembly process ribosomal RNA (rRNA)was filtered out All known metazoan rRNA sequences weredownloaded from GenBank and formatted into bowtie indexusing ldquobowtie-buildrdquo Each sample was sequentially aligned to

the index allowing up to two mismatches via Bowtie 100(Langmead et al 2009) retaining all unaligned reads in FASTQformat Unaligned results stored as a single file were im-ported into Geneious 616 (Kearse et al 2012) paired usingread header information and exported as separate files for leftand right mate pairs This process was repeated for eachsample

De novo assemblies (strand specific in C tasmanianus andS coleoptrata) were done individually per organism in Trinity(Grabherr et al 2011 Haas et al 2013) using paired read filesand default parameters Raw reads and assembled sequenceshave been deposited in the National Center for BiotechnologyInformation Sequence Read Archive and TranscriptomeShotgun Assembly databases respectively (table 2)

Identification of Candidate Coding Regions

Redundancy reduction was done with CD-HIT-EST (Fu et al2012) in the raw assemblies (95 global similarity) Resultingassemblies were processed in TransDecoder (Haas et al 2013)to identify candidate ORFs within the transcripts Predictedpeptides were then processed with a further filter to selectonly one peptide per putative unigene by choosing the lon-gest ORF per Trinity subcomponent with a Python scriptthus removing the variation in the coding regions of our as-semblies due to by alternative splicing closely related para-logs and allelic diversity Peptide sequences with all finalcandidate ORFs were retained as multifasta files Sequencedata for Strigamia maritima and D pulex were obtained fromgenome assemblies as opposed to the de novo assembledtranscriptomes Annotated predicted peptide sequences forD pulex were downloaded from ldquowFleaBaserdquo and sequenceswere clustered using CD-HIT at a similarity of 98 for redun-dancy reduction Peptide sequences for Strigamia maritimawere downloaded from Ensembl and also clustered at 98similarity Results for both D pulex and Strigamia maritimawere converted to a single line multifasta file

Orthology Assignment

We assigned predicted ORFs into orthologous groups acrossall samples using OMA stand-alone v099t (Altenhoff et al2011 Altenhoff et al 2013) The advantages of the algorithmof OMA over standard bidirectional best-hit approaches relyon the use of evolutionary distances instead of scores con-sideration of distance inference uncertainty and differentialgene losses and inclusion of many-to-many orthologous re-lations The ortholog matrix is constructed from all-against-all

Table 2 Accession Numbers for the Raw Transcriptomes and Assemblies from the Taxa Newly Sequenced in This Study

BioProject BioSample Experiment Run

Scutigera coleoptrata PRJNA237135 SAMN02614634 SRX462011 SRR1158078

Craterostigmus tasmanianus PRJNA237134 SAMN02614633 SRX461877 SRR1157986

Lithobius forficatus PRJNA237133 SAMN02614632 SRX462145 SRR1159752

Himantarium gabrielis PRJNA237131 SAMN02614631 SRX461787 SRR1159787

Alipes grandidieri PRJNA179374 SAMN01816532 SRX205685 SRR619311

Cryptops hortensis PRJNA237130 SAMN02614630 SRX457664 SRR1153457

Peripatopsis capensis PRJNA236598 SAMN02598680 SRX451023 SRR1145776

1509

Evaluation of Topological Conflict in Centipede Phylogeny doi101093molbevmsu108 MBE at H

arvard Library on June 12 2014

httpmbeoxfordjournalsorg

Dow

nloaded from

Smith-Waterman protein alignments The program identifiesthe so called ldquostable pairsrdquo verifies them and checks againstpotential paralogous genes In a last step cliques of stablepairs are clustered as groups of orthologs All input fileswere single-line multifasta files and the parametersdrw filespecified retained all default settings with the exception ofldquoNPrdquo which was set at 300 We parallelized all-by-all localalignments across 256 cores of a single compute node imple-menting a custom Bash script allowing for execution of inde-pendent threads with at least 3 s between each instance ofOMA to minimize risk of collisions

Phylogenomic Analyses and Congruence Assessment

We constructed three different amino acid supermatricesFirst a large matrix was constructed by concatenating theset of OMA groups containing six or more taxa To increasegene occupancy and to reduce the percentage of missingdata a second matrix was created by selecting the orthologscontained in 11 or more taxa Because of the high percentageof missing data in A gigas this taxon was eliminated from thissupermatrix This supermatrix reconstruction approach wasselected over other available software (eg MARE) because itguarantees a minimum taxon occupancy per orthogrouptherefore resulting in a supermatrix with a good compromisebetween gene and taxon occupancy and missing data

The selected orthogroups (1934 and 61 for both matrixesrespectively) were aligned individually using MUSCLE version36 (Edgar 2004) To increase the signal-to-noise ratio andimprove the discriminatory power of phylogenetic methodswe applied a probabilistic character masking with ZORRO(Wu et al 2012) to account for alignment uncertainty runusing default parameters (and using FastTree 214 [Price et al2010] to produce guide trees) This program first calculatesthe probability of all alignments that pass through a specifiedmatched pair of residues using a pair hidden Markov modelframework and it then compares this value with the fullprobability of all alignments of the pair of sequences Theposterior probability value indicates how reliable the matchis (ie highly reliable if it is close to 1 ambiguous if it is close to0) It then sums up the probability that a particular columnwould appear over the alignment landscape and assigns aconfidence score between 0 and 10 to each column provid-ing an objective measure that has an explicit evolutionarymodel and is mathematically rigorous (Wu et al 2012) Wediscarded the positions assigned a confidence score below athreshold of 5 with a custom Python script prior to concat-enation (using Phyutility 26 Smith and Dunn 2008) and sub-sequent phylogenomic analyses

To discern if substitutional saturation and among-taxoncompositional heterogeneity were affecting our phylogeneticresults we considered a third supermatrix Using the p4Python library (Foster 2004) we performed a test of compo-sitional homogeneity using w2 statistics simulating a null dis-tribution using the TreecompoTestUsingSimulations()method (nSims = 100) We used the best-scoring ML treefrom our RAxML analyses as the phylogram on which to sim-ulate data using optimized parameters of a homogeneous

unpartitioned LG + I + 4 model We also conducted com-positional homogeneity tests on individual ortholog align-ments modeling a homogeneous null distribution on theML tree for each gene using the best-fitting model availablein p4 for that gene We characterized genewise substitutionalsaturation by regressing uncorrected amino acid distances onthe patristic distance implied by each ML tree in a p4 scriptand taking the slope of this linear regression as a measure ofsubstitutional saturation (Jeffroy et al 2006) We calculatedthe third quartile of the slope from among the set of com-positionally homogeneous genes with a slope between 0 and1 and then selected the set of ML trees from genes with aslope above this value (0465) for inclusion in a quartet super-network as described earlier The 389 genes that passed in-dividual tests of compositional heterogeneity and whichfurthermore fell in the least saturated quartile of this station-ary set were concatenated into a third supermatrix

ML inference was conducted with PhyML-PCMA (Zollerand Schneider 2013) and RAxML 775 (Berger et al 2011)Although RAxML uses a predetermined empirical model ofamino acid substitution PhyML-PCMA estimates a modelthrough the use of a principal component (PC) analysisThe obtained PCs describe the substitution rates that covarythe most among different protein families and thereforedefine a semiempirically determined parameterization foran amino acid substitution model specific to each data set(Zoller and Schneider 2013) We selected 20 PCs in thePhyML-PCMA analyses and empirical amino acid frequenciesPROTGAMMALG4XF was selected as the best model ofamino acid substitution for the unpartitioned RAxML analy-ses using a modification of the ProteinModelSelection perlwrapper to RAxML produced by A Stamatakis (httpscoh-itsorgexelixissoftwarehtml last accessed April 1 2014)Bootstrap values were estimated with 1000 replicates underthe rapid bootstrapping algorithm

Bayesian analysis was conducted with PhyloBayes MPI 14e(Lartillot et al 2013) using the site-heterogeneous CAT-GTRmodel of evolution (Lartillot and Philippe 2004) Three inde-pendent Markov chain Monte Carlo (MCMC) chains wererun for 5818ndash7471 cycles The initial 2000 trees (27ndash34)sampled in each MCMC run prior to convergence (judgedwhen maximum bipartition discrepancies across chainslt 01) were discarded as the burn-in period A 50 major-ity-rule consensus tree was then computed from the remain-ing 1537 trees (sampled every 10 cycles) combined from thethree independent runs

To investigate potential incongruence between individualgene trees we also inferred gene trees for each OMA groupincluded in our supermatrix For each aligned ZORRO-masked OMA group we selected a best-fitting model usinga ProteinModelSelection script modified to permit testing ofthe recent LG4M( + F) and LG4X( + F) models (Le et al 2012)Best-scoring ML trees were inferred for each gene under theselected model (with the gamma model of rate variation butno invariant term) from 20 replicates of parsimony startingtrees One hundred traditional (nonrapid) bootstrap repli-cates for each gene were also inferred A majority rule con-sensus for each gene was constructed using the SumTrees

1510

Fernandez et al doi101093molbevmsu108 MBE at H

arvard Library on June 12 2014

httpmbeoxfordjournalsorg

Dow

nloaded from

script from the DendroPy Python library (Sukumaran andHolder 2010) To visualize predominant intergene conflictsfor both ML and bootstrap MRC trees we employedSuperQ v11 (Grunewald et al 2013) selecting the ldquobalancedrdquoedge-weight optimization function and applying no filterSplitsTree v4131 (Huson and Bryant 2006) was then usedto visualize the resulting NEXUS files To investigate the dis-tribution of strongly supported conflict among our genetrees we employed Conclustador v04a (Leigh et al 2011)presently the only automated incongruence-detection algo-rithm readily applicable to data sets of this size We appliedthe spectral clustering algorithm on the bootstrap replicatesfor each OMA group limiting the maximum number of clus-ters to 15

To quantify the number of genes potentially decisive for agiven split and also the number within this set that actuallysupport the split in question (fig 6) we employed a customPython script to parse the set of individual gene trees If a treecontained at least one species in either descendant group fora given node plus at least two distinct species basal to thenode in question it was considered potentially decisive(forming minimally a quartet DellrsquoAmpio et al 2014) if thedescendants were monophyletic with respect to their relativeoutgroups that gene tree was considered congruent with thenode in question (although this count is agnostic to thetopology within the node in question) To quantify the rela-tive support for different hypotheses within our individualgene-tree analysis we displayed the above counts for allnodes in the 1934 gene ML topology (fig 6) and also forselect alternative scenarios of centipede interrelationshipsboth including historical hypotheses (see Introduction)and alternative topologies for Scutigeromorpha andCraterostigmomorpha suggested by our quartet supernet-works All Python custom scripts can be downloaded fromhttpsgithubcomclaumer (last accessed April 1 2014)

Fossil Calibrations

A few Paleozoic fossil occurrences constrain the timing ofdivergences between chilopod orders (for a review of themyriapod fossil record see Shear and Edgecombe 2010) TheSilurondashDevonian genus Crussolum (Shear et al 1998) is iden-tified as a stem-group scutigeromorph displaying derivedcharacters of total-group Scutigeromorpha but lackingsome apomorphies that are shared by all members of thescutigeromorph crown group (Edgecombe 2011b) The bestpreserved fossils of Crussolum are from the Early Devonian(Pragian) and Middle Devonian (Givetian) (Shear et al 1998Anderson and Trewin 2003) but the oldest confidently as-signed record is in the Late Silurian (early Pridolı) LudlowBone Bed in western England (at least 4192 My dating theend of the Pridolı) This dating constrains the split ofScutigeromorpha from Pleurostigmophora

The occurrence of Devonobius delta in the MiddleDevonian of New York at least 3827 My (date for the endof the Givetian) constrains the divergence betweenCraterostigmomorpha and the remaining pleurostigmophor-ans This minimum applies irrespective of whether

Devonobius is sister group of Craterostigmomorpha or ofEpimorpha which were equally parsimonious in previousanalyses (Edgecombe and Giribet 2004) The basal divergencein Epimorpha that is the split between Scolopendromorphaand Geophilomorpha is constrained by the oldest scolopen-dromorph Mazoscolopendra richardsoni in UpperCarboniferous (Westphalian D) deposits of Mazon Creek IL(at least 306 Ma) Available character data forMazoscolopendra are insufficient to establish whether it is astem- or crown-group scolopendromorph and it only pro-vides a minimum age for total-group Scolopendromorpha

Mid-Silurian body fossils of Diplopoda provide constraintson the divergence of Chilopoda and Diplopoda The diver-gence between these two groups is dated by the occurrenceof Archipolypoda (stem-group helminthomorphs) in the lateWenlock or early Ludlow (Wilson and Anderson 2004) Basedon these data (ages based on spores) the split betweenChilopoda and Diplopoda is at least as old as the end ofthe Gorstian Stage (early Ludlow) 4256 My

The split between Onychophora and Arthropoda wasdated between 528 My (the minimum age for Arthropodaused by Lee et al [2013] on the basis of the earliestRusophycus traces) and 558 My used as the root ofPanarthropoda (Lee et al 2013)

Divergence Time Estimates

Divergence dates were estimated using the Bayesian relaxedmolecular clock approach as implemented in PhyloBayesv33f (Lartillot et al 2013) To compare the performance ofthe autocorrelated log normal and uncorrelated gammamodels we estimated the divergence dates under bothmodels in the 389-gene data set A series of calibrationconstraints specified above in ldquoFossil calibrationsrdquo (see alsoMurienne et al 2010 Edgecombe 2011b) were used with softbounds (Yang and Rannala 2006) under a birthndashdeath priorin PhyloBayes this strategy having been found to provide thebest compromise for dating estimates (Inoue et al 2010) Twoindependent MCMC chains were run for 5000ndash6000 cyclessampling posterior rates and dates every 10 cycles The initial20 of the cycles were discarded as burn-in Posteriorestimates of divergence dates were then computed fromapproximately the last 4000 samples of each chain

Supplementary MaterialSupplementary figure S1 and tables S1 and S2 are available atMolecular Biology and Evolution online (httpwwwmbeoxfordjournalsorg)

Acknowledgments

Mark Harvey kindly provided the sample of Craterostigmustasmanianus and Savel Daniels the sample of Peripatopsiscapensis The Research Computing group at the HarvardFaculty of Arts and Sciences was instrumental in facilitatingthe computational resources required Sarah Bastkowski iskindly thanked for her advice on the usage of SuperQ v11This work was supported by internal funds from the Museumof Comparative Zoology and by a postdoctoral fellowship

1511

Evaluation of Topological Conflict in Centipede Phylogeny doi101093molbevmsu108 MBE at H

arvard Library on June 12 2014

httpmbeoxfordjournalsorg

Dow

nloaded from

from the Fundacion Ramon Areces to RF Five anonymousreviewers and the editor provided comments that helped toimprove this article

ReferencesAltenhoff AM Gil M Gonnet GH Dessimoz C 2013 Inferring hierar-

chical orthologous groups from orthologous gene pairs PLoS One 8e53786

Altenhoff AM Schneider A Gonnet GH Dessimoz C 2011 OMA 2011orthology inference among 1000 complete genomes Nucleic AcidsRes 39D289ndashD294

Anderson LI Trewin NH 2003 An Early Devonian arthropod fauna fromthe Windyfield Cherts Aberdeenshire Scotland Palaeontology 46457ndash509

Ax P 2000 Multicellular animals Volume II The phylogenetic system ofthe Metazoa Berlin (Germany) Springer

Berger SA Krompass D Stamatakis A 2011 Performance accuracy andWeb server for evolutionary placement of short sequence readsunder maximum likelihood Syst Biol 60291ndash302

Betancur-R R Naylor G Ortı G 2014 Conserved genes sampling errorand phylogenomic inference Syst Biol 63257ndash262

Bleidorn C Podsiadlowski L Zhong M Eeckhaut I Hartmann SHalanych KM Tiedemann R 2009 On the phylogenetic positionof Myzostomida can 77 genes get it wrong BMC Evol Biol 9150

Borucki H 1996 Evolution und phylogenetisches system der Chilopoda(Mandibulata Tracheata) Vehr naturwiss Ver Hamburg 3595ndash226

DellrsquoAmpio E Meusemann K Szucsich NU Peters RS Meyer B Borner JPetersen M Aberer AJ Stamatakis A Walzl MG et al 2014 Decisivedata sets in phylogenomics lessons from studies on the phyloge-netic relationships of primarily wingless insects Mol Biol Evol 31239ndash249

Dohle W 1985 Phylogenetic pathways in the Chilopoda BijdrDierkunde 5555ndash66

Dunn CW Hejnol A Matus DQ Pang K Browne WE Smith SA SeaverE Rouse GW Obst M Edgecombe GD et al 2008 Broad phyloge-nomic sampling improves resolution of the animal tree of lifeNature 452745ndash749

Edgar RC 2004 MUSCLE multiple sequence alignment with high accu-racy and high throughput Nucleic Acids Res 321792ndash1797

Edgecombe GD 2011a Chilopodamdashphylogeny In Minelli A editorTreatise on zoologymdashanatomy taxonomy biology TheMyriapoda Vol 1 Leiden and Boston Brill p 339ndash354

Edgecombe GD 2011b Chilopodamdashthe fossil history In Minelli Aeditor Treatise on zoologymdashanatomy taxonomy biology TheMyriapoda Vol 1 Leiden and Boston Brill p 355ndash361

Edgecombe GD Giribet G 2004 Adding mitochondrial sequence data(16S rRNA and cytochrome c oxidase subunit I) to the phylogeny ofcentipedes (Myriapoda Chilopoda) an analysis of morphology andfour molecular loci J Zool Syst Evol Res 4289ndash134

Edgecombe GD Giribet G 2007 Evolutionary biology of centipedes(Myriapoda Chilopoda) Annu Rev Entomol 52151ndash170

Edgecombe GD Giribet G 2008 A New Zealand species of thetrans-Tasman centipede order Craterostigmomorpha(ArthropodaChilopoda) corroborated by molecular evidenceInvert Syst 221ndash15

Edgecombe GD Giribet G Wheeler WC 1999 Phylogeny of Chilopodacombining 18S and 28S rRNA sequences and morphology BoletınSociedad Entomol Aragonesa 26293ndash331

Edwards SV 2009 Is a new and general theory of molecular systematicsemerging Evolution 631ndash19

Foster P 2004 Modeling compositional heterogeneity Syst Biol 53485ndash495

Fu LM Niu BF Zhu ZW Wu ST Li WZ 2012 CD-HIT accelerated forclustering the next-generation sequencing data Bioinformatics 283150ndash3152

Giribet G Carranza S Riutort M Baguna J Ribera C 1999 Internalphylogeny of the Chilopoda (Myriapoda Arthropoda) using

complete 18S rDNA and partial 28S rDNA sequences Philos TransR Soc Lond B Biol Sci 354215ndash222

Giribet G Edgecombe GD 2006 Conflict between data sets andphylogeny of centipedes an analysis based on seven genes andmorphology Proc R Soc B Biol Sci 273531ndash538

Grabherr MG Haas BJ Yassour M Levin JZ Thompson DA Amit IAdiconis X Fan L Raychowdhury R Zeng Q et al 2011 Full-length transcriptome assembly from RNA-Seq data without a ref-erence genome Nat Biotechnol 29644ndash652

Grbic M Van Leeuwen T Clark RM Rombauts S Rouze P Grbic VOsborne EJ Dermauw W Ngoc PC Ortego F et al 2011 Thegenome of Tetranychus urticae reveals herbivorous pest adaptationsNature 479487ndash492

Grunewald S Spillner A Bastkowski S Bogershausen A Moulton V2013 SuperQ computing supernetworks from quartets IEEEACMTrans Comput Biol Bioinform 10151ndash160

Haas BJ Papanicolaou A Yassour M Grabherr M Blood PD Bowden JCouger MB Eccles D Li B Lieber M et al 2013 De novo tran-script sequence reconstruction from RNA-seq using the Trinityplatform for reference generation and analysis Nat Protoc 81494ndash1512

Hartmann S Helm C Nickel B Meyer M Struck TH Tiedemann RSelbig J Bleidorn C 2012 Exploiting gene families for phylogenomicanalysis of myzostomid transcriptome data PLoS One 7e29843

Hejnol A Obst M Stamatakis A Ott M Rouse GW Edgecombe GDMartinez P Baguna J Bailly X Jondelius U et al 2009 Assessing theroot of bilaterian animals with scalable phylogenomic methods ProcR Soc B Biol Sci 2764261ndash4270

Hilken G 1997 Tracheal systems in Chilopoda a comparison underphylogenetic aspects Entomol Scand Suppl 5149ndash60

Huson DH Bryant D 2006 Application of phylogenetic networks inevolutionary studies Mol Biol Evol 23254ndash267

Inoue J Donoghue PCJ Yang ZH 2010 The impact of the representationof fossil calibrations on Bayesian estimation of species divergencetimes Syst Biol 5974ndash89

Jeffroy O Brinkmann H Delsuc F Philippe H 2006 Phylogenomics thebeginning of incongruence Trends Genet 22225ndash231

Kearse M Moir R Wilson A Stones-Havas S Cheung M Sturrock SBuxton S Cooper A Markowitz S Duran C et al 2012 GeneiousBasic an integrated and extendable desktop software platform forthe organization and analysis of sequence data Bioinformatics 281647ndash1649

Kennedy D 2005 125 [Editorial] Science 30919Koch M Edgecombe GD 2012 The preoral chamber in geophilomorph

centipedes comparative morphology phylogeny and the evolutionof centipede feeding structures Zool J Linn Soc 1651ndash62

Kocot KM Cannon JT Todt C Citarella MR Kohn AB Meyer A SantosSR Schander C Moroz LL Lieb B et al 2011 Phylogenomics revealsdeep molluscan relationships Nature 477452ndash456

Kubatko LS Degnan JH 2007 Inconsistency of phylogenetic estimatesfrom concatenated data under coalescence Syst Biol 5617ndash24

Langmead B Trapnell C Pop M Salzberg SL 2009 Ultrafast andmemory-efficient alignment of short DNA sequences to thehuman genome Genome Biol 10R25

Lartillot N Philippe H 2004 A Bayesian mixture model for across-siteheterogeneities in the amino-acid replacement process Mol BiolEvol 211095ndash1109

Lartillot N Philippe H 2008 Improvement of molecular phylogeneticinference and the phylogeny of Bilateria Philos Trans R Soc Lond BBiol Sci 3631463ndash1472

Lartillot N Rodrigue N Stubbs D Richer J 2013 PhyloBayes MPI phy-logenetic reconstruction with infinite mixtures of profiles in a par-allel environment Syst Biol 62611ndash615

Le SQ Dang CC Gascuel O 2012 Modeling protein evolution withseveral amino acid replacement matrices depending on site ratesMol Biol Evol 292921ndash2936

Lee MSY Soubrier J Edgecombe GD 2013 Rates of phenotypicand genomic evolution during the Cambrian Explosion Curr Biol231ndash7

1512

Fernandez et al doi101093molbevmsu108 MBE at H

arvard Library on June 12 2014

httpmbeoxfordjournalsorg

Dow

nloaded from

Leigh JW Schliep K Lopez P Bapteste E 2011 Let them fall where theymay congruence analysis in massive phylogenetically messy datasets Mol Biol Evol 282773ndash2785

Lemmon AR Brown JM Stanger-Hall K Lemmon EM 2009The effect of ambiguous data on phylogenetic estimates obtainedby maximum likelihood and Bayesian inference Syst Biol 58130ndash145

Mallatt J Giribet G 2006 Further use of nearly complete 28S and 18SrRNA genes to classify Ecdysozoa 37 more arthropods and a kinor-hynch Mol Phylogenet Evol 40772ndash794

Meusemann K von Reumont BM Simon S Roeding F Strauss S Kuck PEbersberger I Walzl M Pass G Breuers S et al 2010 A phylogenomicapproach to resolve the arthropod tree of life Mol Biol Evol 272451ndash2464

Muller CHG Meyer-Rochow VB 2006 Fine structural organization ofthe lateral ocelli in two species of Scolopendra (ChilopodaPleurostigmophora) an evolutionary evaluation Zoomorphology12513ndash26

Murienne J Edgecombe GD Giribet G 2010 Including secondary struc-ture fossils and molecular dating in the centipede tree of life MolPhylogenet Evol 57301ndash313

Nosenko T Schreiber F Adamska M Adamski M Eitel M Hammel JMaldonado M Muller WE Nickel M Schierwater B et al 2013 Deepmetazoan phylogeny when different genes tell different stories MolPhylogenet Evol 67223ndash233

Philippe H Brinkmann H Lavrov DV Littlewood DTJ Manuel MWorheide G Baurain D 2011 Resolving difficult phylogeneticquestions why more sequences are not enough PLoS Biol 9e1000602

Philippe H Telford MJ 2006 Large-scale sequencing and the new animalphylogeny Trends Ecol Evol 21614ndash620

Pick KS Philippe H Schreiber F Erpenbeck D Jackson DJWrede P Wiens M Alie A Morgenstern B Manuel Met al 2010 Improved phylogenomic taxon sampling notice-ably affects nonbilaterian relationships Mol Biol Evol 271983ndash1987

Pocock RI 1902 A new and annectant type of chilopod Quart J MicroscSci 45417ndash448

Price MN Dehal PS Arkin AP 2010 FastTree 2-approximately maximum-likelihood trees for large alignmentsPLoS One 5e9490

Prunescu C-C Prunescu P 1996 Supernumerary Malpighian tubules inchilopods Acta Myriapodol 169437ndash440

Prunescu C-C Prunescu P 2006 Rudimentary supernumeraryMalpighian tubules in the order Craterostigmomorpha Pocock1902 Norwegian J Entomol 53113ndash118

Regier JC Shultz JW Zwick A Hussey A Ball B Wetzer R Martin JWCunningham CW 2010 Arthropod relationships revealed by phy-logenomic analysis of nuclear protein-coding sequences Nature 4631079ndash1083

Regier JC Wilson HM Shultz JW 2005 Phylogenetic analysis ofMyriapoda using three nuclear protein-coding genes MolPhylogenet Evol 34147ndash158

Rehm P Borner J Meusemann K von Reumont BM Simon S Hadrys HMisof B Burmester T 2011 Dating the arthropod tree based onlarge-scale transcriptome data Mol Phylogenet Evol 61880ndash887

Rota-Stabelli O Daley AC Pisani D 2013 Molecular timetrees reveal aCambrian colonization of land and a new scenario for ecdysozoanevolution Curr Biol 23392ndash398

Roure B Baurain D Philippe H 2013 Impact of missing data on phy-logenies inferred from empirical phylogenomic data sets Mol BiolEvol 30197ndash214

Salichos L Rokas A 2011 Evaluating ortholog prediction algorithms in ayeast model clade PLoS One 6e18755

Salichos L Rokas A 2013 Inferring ancient divergences requires geneswith strong phylogenetic signals Nature 497327ndash331

Salichos L Stamatakis A Rokas A 2014 Novel information theory-basedmeasures for quantifying incongruence among phylogenetic treesMol Biol Evol 31(5)1261ndash1271

Shear WA Bonamo PM 1988 Devonobiomorpha a new order of cen-tipeds (Chilopoda) from the middle Devonian of Gilboa New YorkState USA and the phylogeny of centiped orders Am MusNovitates 29271ndash30

Shear WA Edgecombe GD 2010 The geological record and phylogenyof Myriapoda Arthropod Struct Dev 39174ndash190

Shear WA Jeram AJ Selden PA 1998 Centiped legs (ArthropodaChilopoda Scutigeromorpha) from the Silurian and Devonian ofBritain and the Devonian of North America Am Mus Novitates32311ndash16

Shultz JW Regier JC 1997 Progress toward a molecular phylogeny of thecentipede orders (Chilopoda) Entomol Scand Suppl 5125ndash32

Smith S Wilson NG Goetz F Feehery C Andrade SCS Rouse GWGiribet G Dunn CW 2011 Resolving the evolutionary relationshipsof molluscs with phylogenomic tools Nature 480364ndash367

Smith SA Dunn CW 2008 Phyutility a phyloinformatics tool for treesalignments and molecular data Bioinformatics 24715ndash716

Struck TH Paul C Hill N Hartmann S Hosel C Kube M Lieb B Meyer ATiedemann R Purschke G et al 2011 Phylogenomic analyses un-ravel annelid evolution Nature 47195ndash98

Sukumaran J Holder MT 2010 DendroPy a Python library for phylo-genetic computing Bioinformatics 261569ndash1571

Verhoeff KW 1902ndash1925 Gliederfussler Arthropoda Klasse ChilopodaIn Klassen und Ordnungen des Tierreichs 5 abt 2 Buch 1 Leipzig(Germany) Akademische Verlagsgesellschaft p 725

von Reumont BM Jenner RA Wills MA Dellrsquoampio E Pass GEbersberger I Meyer B Koenemann S Iliffe TM Stamatakis Aet al 2012 Pancrustacean phylogeny in the light of new phyloge-nomic data support for Remipedia as the possible sister group ofHexapoda Mol Biol Evol 291031ndash1045

Wilson HM Anderson LI 2004 Morphology and taxonomy of Paleozoicmillipedes (Diplopoda Chilognatha Archipolypoda) from ScotlandJ Paleontol 78169ndash184

Wirkner CS Pass G 2002 The circulatory system in Chilopodafunctional morphology and phylogenetic aspects Acta Zool 83193ndash202

Wu M Chatterji S Eisen JA 2012 Accounting for alignment uncertaintyin phylogenomics PLoS One 7e30288

Yang Z Rannala B 2006 Bayesian estimation of species divergence timesunder a molecular clock using multiple fossil calibrations with softbounds Mol Biol Evol 23212ndash226

Zoller S Schneider A 2013 Improving phylogenetic inference with asemiempirical amino acid substitution model Mol Biol Evol 30469ndash479

1513

Evaluation of Topological Conflict in Centipede Phylogeny doi101093molbevmsu108 MBE at H

arvard Library on June 12 2014

httpmbeoxfordjournalsorg

Dow

nloaded from

Page 10: Evaluating Topological Conflict in Centipede Phylogeny ...€¦ · Scutigera coleoptrata Craterostigmus tasmanianus Strigamia maritima Himantarium gabrielis Alipes grandidieri Cryptops

15 and 5 min in the first and second washing step respec-tively air dried and eluted in 100ml of RNA Storage solution(Ambion) mRNA purification was done with the DynabeadsmRNA Purification Kit (Invitrogen) following manufacturerrsquosinstructions After incubation of total RNA at 65 C for 5 minthe samples were incubated for 30 min with 200ml of mag-netic beads in a rocker and washed twice with washing bufferThis step was repeated again with an incubation time of5 min mRNA was eluted in 15ml of Tris-HCl buffer Qualityof mRNA was measured with a pico RNA assay in an Agilent2100 Bioanalyzer (Agilent Technologies) Quantity was mea-sured with a RNA assay in Qubit fluorometer (LifeTechnologies)

cDNA Library Construction and Next-GenerationSequencing

cDNA libraries were constructed with the TruSeq for RNASample Preparation kit (Illumina) following the manufac-turerrsquos instructions Libraries for C tasmanianus and S coleop-trata were constructed in the Apollo 324 automated systemusing the PrepX mRNA kit (IntegenX) The samples weremarked with a different index to allow pooling for sequencingConcentration of the cDNA libraries was measured through adsDNA High Sensitivity (HS) assay in a Qubit fluorometer(Invitrogen) Library quality and size selection were checkedin an Agilent 2100 Bioanalyzer (Agilent Technologies) withthe HS DNA assay The samples were run using the IlluminaHiSeq 2500 platform with paired-end reads of 150 bp at theFAS Center for Systems Biology at Harvard University

Raw Data Sanitation and Sequence Assembly

Demultiplexed Illumina HiSeq 2500 sequencing results inFASTQ format were retrieved from the sequencing facilityvia FTP The files were decompressed and imported into CLCGenomics Workbench 551 retaining read header informa-tion as paired read files Each sample was quality filteredaccording to a threshold average quality score of 30 basedon a Phred scale and adaptor trimmed using a list of knownadaptors provided by Illumina All reads shorter than 25 bpwere discarded and the resulting files were exported as asingle FASTQ file

To simplify the assembly process ribosomal RNA (rRNA)was filtered out All known metazoan rRNA sequences weredownloaded from GenBank and formatted into bowtie indexusing ldquobowtie-buildrdquo Each sample was sequentially aligned to

the index allowing up to two mismatches via Bowtie 100(Langmead et al 2009) retaining all unaligned reads in FASTQformat Unaligned results stored as a single file were im-ported into Geneious 616 (Kearse et al 2012) paired usingread header information and exported as separate files for leftand right mate pairs This process was repeated for eachsample

De novo assemblies (strand specific in C tasmanianus andS coleoptrata) were done individually per organism in Trinity(Grabherr et al 2011 Haas et al 2013) using paired read filesand default parameters Raw reads and assembled sequenceshave been deposited in the National Center for BiotechnologyInformation Sequence Read Archive and TranscriptomeShotgun Assembly databases respectively (table 2)

Identification of Candidate Coding Regions

Redundancy reduction was done with CD-HIT-EST (Fu et al2012) in the raw assemblies (95 global similarity) Resultingassemblies were processed in TransDecoder (Haas et al 2013)to identify candidate ORFs within the transcripts Predictedpeptides were then processed with a further filter to selectonly one peptide per putative unigene by choosing the lon-gest ORF per Trinity subcomponent with a Python scriptthus removing the variation in the coding regions of our as-semblies due to by alternative splicing closely related para-logs and allelic diversity Peptide sequences with all finalcandidate ORFs were retained as multifasta files Sequencedata for Strigamia maritima and D pulex were obtained fromgenome assemblies as opposed to the de novo assembledtranscriptomes Annotated predicted peptide sequences forD pulex were downloaded from ldquowFleaBaserdquo and sequenceswere clustered using CD-HIT at a similarity of 98 for redun-dancy reduction Peptide sequences for Strigamia maritimawere downloaded from Ensembl and also clustered at 98similarity Results for both D pulex and Strigamia maritimawere converted to a single line multifasta file

Orthology Assignment

We assigned predicted ORFs into orthologous groups acrossall samples using OMA stand-alone v099t (Altenhoff et al2011 Altenhoff et al 2013) The advantages of the algorithmof OMA over standard bidirectional best-hit approaches relyon the use of evolutionary distances instead of scores con-sideration of distance inference uncertainty and differentialgene losses and inclusion of many-to-many orthologous re-lations The ortholog matrix is constructed from all-against-all

Table 2 Accession Numbers for the Raw Transcriptomes and Assemblies from the Taxa Newly Sequenced in This Study

BioProject BioSample Experiment Run

Scutigera coleoptrata PRJNA237135 SAMN02614634 SRX462011 SRR1158078

Craterostigmus tasmanianus PRJNA237134 SAMN02614633 SRX461877 SRR1157986

Lithobius forficatus PRJNA237133 SAMN02614632 SRX462145 SRR1159752

Himantarium gabrielis PRJNA237131 SAMN02614631 SRX461787 SRR1159787

Alipes grandidieri PRJNA179374 SAMN01816532 SRX205685 SRR619311

Cryptops hortensis PRJNA237130 SAMN02614630 SRX457664 SRR1153457

Peripatopsis capensis PRJNA236598 SAMN02598680 SRX451023 SRR1145776

1509

Evaluation of Topological Conflict in Centipede Phylogeny doi101093molbevmsu108 MBE at H

arvard Library on June 12 2014

httpmbeoxfordjournalsorg

Dow

nloaded from

Smith-Waterman protein alignments The program identifiesthe so called ldquostable pairsrdquo verifies them and checks againstpotential paralogous genes In a last step cliques of stablepairs are clustered as groups of orthologs All input fileswere single-line multifasta files and the parametersdrw filespecified retained all default settings with the exception ofldquoNPrdquo which was set at 300 We parallelized all-by-all localalignments across 256 cores of a single compute node imple-menting a custom Bash script allowing for execution of inde-pendent threads with at least 3 s between each instance ofOMA to minimize risk of collisions

Phylogenomic Analyses and Congruence Assessment

We constructed three different amino acid supermatricesFirst a large matrix was constructed by concatenating theset of OMA groups containing six or more taxa To increasegene occupancy and to reduce the percentage of missingdata a second matrix was created by selecting the orthologscontained in 11 or more taxa Because of the high percentageof missing data in A gigas this taxon was eliminated from thissupermatrix This supermatrix reconstruction approach wasselected over other available software (eg MARE) because itguarantees a minimum taxon occupancy per orthogrouptherefore resulting in a supermatrix with a good compromisebetween gene and taxon occupancy and missing data

The selected orthogroups (1934 and 61 for both matrixesrespectively) were aligned individually using MUSCLE version36 (Edgar 2004) To increase the signal-to-noise ratio andimprove the discriminatory power of phylogenetic methodswe applied a probabilistic character masking with ZORRO(Wu et al 2012) to account for alignment uncertainty runusing default parameters (and using FastTree 214 [Price et al2010] to produce guide trees) This program first calculatesthe probability of all alignments that pass through a specifiedmatched pair of residues using a pair hidden Markov modelframework and it then compares this value with the fullprobability of all alignments of the pair of sequences Theposterior probability value indicates how reliable the matchis (ie highly reliable if it is close to 1 ambiguous if it is close to0) It then sums up the probability that a particular columnwould appear over the alignment landscape and assigns aconfidence score between 0 and 10 to each column provid-ing an objective measure that has an explicit evolutionarymodel and is mathematically rigorous (Wu et al 2012) Wediscarded the positions assigned a confidence score below athreshold of 5 with a custom Python script prior to concat-enation (using Phyutility 26 Smith and Dunn 2008) and sub-sequent phylogenomic analyses

To discern if substitutional saturation and among-taxoncompositional heterogeneity were affecting our phylogeneticresults we considered a third supermatrix Using the p4Python library (Foster 2004) we performed a test of compo-sitional homogeneity using w2 statistics simulating a null dis-tribution using the TreecompoTestUsingSimulations()method (nSims = 100) We used the best-scoring ML treefrom our RAxML analyses as the phylogram on which to sim-ulate data using optimized parameters of a homogeneous

unpartitioned LG + I + 4 model We also conducted com-positional homogeneity tests on individual ortholog align-ments modeling a homogeneous null distribution on theML tree for each gene using the best-fitting model availablein p4 for that gene We characterized genewise substitutionalsaturation by regressing uncorrected amino acid distances onthe patristic distance implied by each ML tree in a p4 scriptand taking the slope of this linear regression as a measure ofsubstitutional saturation (Jeffroy et al 2006) We calculatedthe third quartile of the slope from among the set of com-positionally homogeneous genes with a slope between 0 and1 and then selected the set of ML trees from genes with aslope above this value (0465) for inclusion in a quartet super-network as described earlier The 389 genes that passed in-dividual tests of compositional heterogeneity and whichfurthermore fell in the least saturated quartile of this station-ary set were concatenated into a third supermatrix

ML inference was conducted with PhyML-PCMA (Zollerand Schneider 2013) and RAxML 775 (Berger et al 2011)Although RAxML uses a predetermined empirical model ofamino acid substitution PhyML-PCMA estimates a modelthrough the use of a principal component (PC) analysisThe obtained PCs describe the substitution rates that covarythe most among different protein families and thereforedefine a semiempirically determined parameterization foran amino acid substitution model specific to each data set(Zoller and Schneider 2013) We selected 20 PCs in thePhyML-PCMA analyses and empirical amino acid frequenciesPROTGAMMALG4XF was selected as the best model ofamino acid substitution for the unpartitioned RAxML analy-ses using a modification of the ProteinModelSelection perlwrapper to RAxML produced by A Stamatakis (httpscoh-itsorgexelixissoftwarehtml last accessed April 1 2014)Bootstrap values were estimated with 1000 replicates underthe rapid bootstrapping algorithm

Bayesian analysis was conducted with PhyloBayes MPI 14e(Lartillot et al 2013) using the site-heterogeneous CAT-GTRmodel of evolution (Lartillot and Philippe 2004) Three inde-pendent Markov chain Monte Carlo (MCMC) chains wererun for 5818ndash7471 cycles The initial 2000 trees (27ndash34)sampled in each MCMC run prior to convergence (judgedwhen maximum bipartition discrepancies across chainslt 01) were discarded as the burn-in period A 50 major-ity-rule consensus tree was then computed from the remain-ing 1537 trees (sampled every 10 cycles) combined from thethree independent runs

To investigate potential incongruence between individualgene trees we also inferred gene trees for each OMA groupincluded in our supermatrix For each aligned ZORRO-masked OMA group we selected a best-fitting model usinga ProteinModelSelection script modified to permit testing ofthe recent LG4M( + F) and LG4X( + F) models (Le et al 2012)Best-scoring ML trees were inferred for each gene under theselected model (with the gamma model of rate variation butno invariant term) from 20 replicates of parsimony startingtrees One hundred traditional (nonrapid) bootstrap repli-cates for each gene were also inferred A majority rule con-sensus for each gene was constructed using the SumTrees

1510

Fernandez et al doi101093molbevmsu108 MBE at H

arvard Library on June 12 2014

httpmbeoxfordjournalsorg

Dow

nloaded from

script from the DendroPy Python library (Sukumaran andHolder 2010) To visualize predominant intergene conflictsfor both ML and bootstrap MRC trees we employedSuperQ v11 (Grunewald et al 2013) selecting the ldquobalancedrdquoedge-weight optimization function and applying no filterSplitsTree v4131 (Huson and Bryant 2006) was then usedto visualize the resulting NEXUS files To investigate the dis-tribution of strongly supported conflict among our genetrees we employed Conclustador v04a (Leigh et al 2011)presently the only automated incongruence-detection algo-rithm readily applicable to data sets of this size We appliedthe spectral clustering algorithm on the bootstrap replicatesfor each OMA group limiting the maximum number of clus-ters to 15

To quantify the number of genes potentially decisive for agiven split and also the number within this set that actuallysupport the split in question (fig 6) we employed a customPython script to parse the set of individual gene trees If a treecontained at least one species in either descendant group fora given node plus at least two distinct species basal to thenode in question it was considered potentially decisive(forming minimally a quartet DellrsquoAmpio et al 2014) if thedescendants were monophyletic with respect to their relativeoutgroups that gene tree was considered congruent with thenode in question (although this count is agnostic to thetopology within the node in question) To quantify the rela-tive support for different hypotheses within our individualgene-tree analysis we displayed the above counts for allnodes in the 1934 gene ML topology (fig 6) and also forselect alternative scenarios of centipede interrelationshipsboth including historical hypotheses (see Introduction)and alternative topologies for Scutigeromorpha andCraterostigmomorpha suggested by our quartet supernet-works All Python custom scripts can be downloaded fromhttpsgithubcomclaumer (last accessed April 1 2014)

Fossil Calibrations

A few Paleozoic fossil occurrences constrain the timing ofdivergences between chilopod orders (for a review of themyriapod fossil record see Shear and Edgecombe 2010) TheSilurondashDevonian genus Crussolum (Shear et al 1998) is iden-tified as a stem-group scutigeromorph displaying derivedcharacters of total-group Scutigeromorpha but lackingsome apomorphies that are shared by all members of thescutigeromorph crown group (Edgecombe 2011b) The bestpreserved fossils of Crussolum are from the Early Devonian(Pragian) and Middle Devonian (Givetian) (Shear et al 1998Anderson and Trewin 2003) but the oldest confidently as-signed record is in the Late Silurian (early Pridolı) LudlowBone Bed in western England (at least 4192 My dating theend of the Pridolı) This dating constrains the split ofScutigeromorpha from Pleurostigmophora

The occurrence of Devonobius delta in the MiddleDevonian of New York at least 3827 My (date for the endof the Givetian) constrains the divergence betweenCraterostigmomorpha and the remaining pleurostigmophor-ans This minimum applies irrespective of whether

Devonobius is sister group of Craterostigmomorpha or ofEpimorpha which were equally parsimonious in previousanalyses (Edgecombe and Giribet 2004) The basal divergencein Epimorpha that is the split between Scolopendromorphaand Geophilomorpha is constrained by the oldest scolopen-dromorph Mazoscolopendra richardsoni in UpperCarboniferous (Westphalian D) deposits of Mazon Creek IL(at least 306 Ma) Available character data forMazoscolopendra are insufficient to establish whether it is astem- or crown-group scolopendromorph and it only pro-vides a minimum age for total-group Scolopendromorpha

Mid-Silurian body fossils of Diplopoda provide constraintson the divergence of Chilopoda and Diplopoda The diver-gence between these two groups is dated by the occurrenceof Archipolypoda (stem-group helminthomorphs) in the lateWenlock or early Ludlow (Wilson and Anderson 2004) Basedon these data (ages based on spores) the split betweenChilopoda and Diplopoda is at least as old as the end ofthe Gorstian Stage (early Ludlow) 4256 My

The split between Onychophora and Arthropoda wasdated between 528 My (the minimum age for Arthropodaused by Lee et al [2013] on the basis of the earliestRusophycus traces) and 558 My used as the root ofPanarthropoda (Lee et al 2013)

Divergence Time Estimates

Divergence dates were estimated using the Bayesian relaxedmolecular clock approach as implemented in PhyloBayesv33f (Lartillot et al 2013) To compare the performance ofthe autocorrelated log normal and uncorrelated gammamodels we estimated the divergence dates under bothmodels in the 389-gene data set A series of calibrationconstraints specified above in ldquoFossil calibrationsrdquo (see alsoMurienne et al 2010 Edgecombe 2011b) were used with softbounds (Yang and Rannala 2006) under a birthndashdeath priorin PhyloBayes this strategy having been found to provide thebest compromise for dating estimates (Inoue et al 2010) Twoindependent MCMC chains were run for 5000ndash6000 cyclessampling posterior rates and dates every 10 cycles The initial20 of the cycles were discarded as burn-in Posteriorestimates of divergence dates were then computed fromapproximately the last 4000 samples of each chain

Supplementary MaterialSupplementary figure S1 and tables S1 and S2 are available atMolecular Biology and Evolution online (httpwwwmbeoxfordjournalsorg)

Acknowledgments

Mark Harvey kindly provided the sample of Craterostigmustasmanianus and Savel Daniels the sample of Peripatopsiscapensis The Research Computing group at the HarvardFaculty of Arts and Sciences was instrumental in facilitatingthe computational resources required Sarah Bastkowski iskindly thanked for her advice on the usage of SuperQ v11This work was supported by internal funds from the Museumof Comparative Zoology and by a postdoctoral fellowship

1511

Evaluation of Topological Conflict in Centipede Phylogeny doi101093molbevmsu108 MBE at H

arvard Library on June 12 2014

httpmbeoxfordjournalsorg

Dow

nloaded from

from the Fundacion Ramon Areces to RF Five anonymousreviewers and the editor provided comments that helped toimprove this article

ReferencesAltenhoff AM Gil M Gonnet GH Dessimoz C 2013 Inferring hierar-

chical orthologous groups from orthologous gene pairs PLoS One 8e53786

Altenhoff AM Schneider A Gonnet GH Dessimoz C 2011 OMA 2011orthology inference among 1000 complete genomes Nucleic AcidsRes 39D289ndashD294

Anderson LI Trewin NH 2003 An Early Devonian arthropod fauna fromthe Windyfield Cherts Aberdeenshire Scotland Palaeontology 46457ndash509

Ax P 2000 Multicellular animals Volume II The phylogenetic system ofthe Metazoa Berlin (Germany) Springer

Berger SA Krompass D Stamatakis A 2011 Performance accuracy andWeb server for evolutionary placement of short sequence readsunder maximum likelihood Syst Biol 60291ndash302

Betancur-R R Naylor G Ortı G 2014 Conserved genes sampling errorand phylogenomic inference Syst Biol 63257ndash262

Bleidorn C Podsiadlowski L Zhong M Eeckhaut I Hartmann SHalanych KM Tiedemann R 2009 On the phylogenetic positionof Myzostomida can 77 genes get it wrong BMC Evol Biol 9150

Borucki H 1996 Evolution und phylogenetisches system der Chilopoda(Mandibulata Tracheata) Vehr naturwiss Ver Hamburg 3595ndash226

DellrsquoAmpio E Meusemann K Szucsich NU Peters RS Meyer B Borner JPetersen M Aberer AJ Stamatakis A Walzl MG et al 2014 Decisivedata sets in phylogenomics lessons from studies on the phyloge-netic relationships of primarily wingless insects Mol Biol Evol 31239ndash249

Dohle W 1985 Phylogenetic pathways in the Chilopoda BijdrDierkunde 5555ndash66

Dunn CW Hejnol A Matus DQ Pang K Browne WE Smith SA SeaverE Rouse GW Obst M Edgecombe GD et al 2008 Broad phyloge-nomic sampling improves resolution of the animal tree of lifeNature 452745ndash749

Edgar RC 2004 MUSCLE multiple sequence alignment with high accu-racy and high throughput Nucleic Acids Res 321792ndash1797

Edgecombe GD 2011a Chilopodamdashphylogeny In Minelli A editorTreatise on zoologymdashanatomy taxonomy biology TheMyriapoda Vol 1 Leiden and Boston Brill p 339ndash354

Edgecombe GD 2011b Chilopodamdashthe fossil history In Minelli Aeditor Treatise on zoologymdashanatomy taxonomy biology TheMyriapoda Vol 1 Leiden and Boston Brill p 355ndash361

Edgecombe GD Giribet G 2004 Adding mitochondrial sequence data(16S rRNA and cytochrome c oxidase subunit I) to the phylogeny ofcentipedes (Myriapoda Chilopoda) an analysis of morphology andfour molecular loci J Zool Syst Evol Res 4289ndash134

Edgecombe GD Giribet G 2007 Evolutionary biology of centipedes(Myriapoda Chilopoda) Annu Rev Entomol 52151ndash170

Edgecombe GD Giribet G 2008 A New Zealand species of thetrans-Tasman centipede order Craterostigmomorpha(ArthropodaChilopoda) corroborated by molecular evidenceInvert Syst 221ndash15

Edgecombe GD Giribet G Wheeler WC 1999 Phylogeny of Chilopodacombining 18S and 28S rRNA sequences and morphology BoletınSociedad Entomol Aragonesa 26293ndash331

Edwards SV 2009 Is a new and general theory of molecular systematicsemerging Evolution 631ndash19

Foster P 2004 Modeling compositional heterogeneity Syst Biol 53485ndash495

Fu LM Niu BF Zhu ZW Wu ST Li WZ 2012 CD-HIT accelerated forclustering the next-generation sequencing data Bioinformatics 283150ndash3152

Giribet G Carranza S Riutort M Baguna J Ribera C 1999 Internalphylogeny of the Chilopoda (Myriapoda Arthropoda) using

complete 18S rDNA and partial 28S rDNA sequences Philos TransR Soc Lond B Biol Sci 354215ndash222

Giribet G Edgecombe GD 2006 Conflict between data sets andphylogeny of centipedes an analysis based on seven genes andmorphology Proc R Soc B Biol Sci 273531ndash538

Grabherr MG Haas BJ Yassour M Levin JZ Thompson DA Amit IAdiconis X Fan L Raychowdhury R Zeng Q et al 2011 Full-length transcriptome assembly from RNA-Seq data without a ref-erence genome Nat Biotechnol 29644ndash652

Grbic M Van Leeuwen T Clark RM Rombauts S Rouze P Grbic VOsborne EJ Dermauw W Ngoc PC Ortego F et al 2011 Thegenome of Tetranychus urticae reveals herbivorous pest adaptationsNature 479487ndash492

Grunewald S Spillner A Bastkowski S Bogershausen A Moulton V2013 SuperQ computing supernetworks from quartets IEEEACMTrans Comput Biol Bioinform 10151ndash160

Haas BJ Papanicolaou A Yassour M Grabherr M Blood PD Bowden JCouger MB Eccles D Li B Lieber M et al 2013 De novo tran-script sequence reconstruction from RNA-seq using the Trinityplatform for reference generation and analysis Nat Protoc 81494ndash1512

Hartmann S Helm C Nickel B Meyer M Struck TH Tiedemann RSelbig J Bleidorn C 2012 Exploiting gene families for phylogenomicanalysis of myzostomid transcriptome data PLoS One 7e29843

Hejnol A Obst M Stamatakis A Ott M Rouse GW Edgecombe GDMartinez P Baguna J Bailly X Jondelius U et al 2009 Assessing theroot of bilaterian animals with scalable phylogenomic methods ProcR Soc B Biol Sci 2764261ndash4270

Hilken G 1997 Tracheal systems in Chilopoda a comparison underphylogenetic aspects Entomol Scand Suppl 5149ndash60

Huson DH Bryant D 2006 Application of phylogenetic networks inevolutionary studies Mol Biol Evol 23254ndash267

Inoue J Donoghue PCJ Yang ZH 2010 The impact of the representationof fossil calibrations on Bayesian estimation of species divergencetimes Syst Biol 5974ndash89

Jeffroy O Brinkmann H Delsuc F Philippe H 2006 Phylogenomics thebeginning of incongruence Trends Genet 22225ndash231

Kearse M Moir R Wilson A Stones-Havas S Cheung M Sturrock SBuxton S Cooper A Markowitz S Duran C et al 2012 GeneiousBasic an integrated and extendable desktop software platform forthe organization and analysis of sequence data Bioinformatics 281647ndash1649

Kennedy D 2005 125 [Editorial] Science 30919Koch M Edgecombe GD 2012 The preoral chamber in geophilomorph

centipedes comparative morphology phylogeny and the evolutionof centipede feeding structures Zool J Linn Soc 1651ndash62

Kocot KM Cannon JT Todt C Citarella MR Kohn AB Meyer A SantosSR Schander C Moroz LL Lieb B et al 2011 Phylogenomics revealsdeep molluscan relationships Nature 477452ndash456

Kubatko LS Degnan JH 2007 Inconsistency of phylogenetic estimatesfrom concatenated data under coalescence Syst Biol 5617ndash24

Langmead B Trapnell C Pop M Salzberg SL 2009 Ultrafast andmemory-efficient alignment of short DNA sequences to thehuman genome Genome Biol 10R25

Lartillot N Philippe H 2004 A Bayesian mixture model for across-siteheterogeneities in the amino-acid replacement process Mol BiolEvol 211095ndash1109

Lartillot N Philippe H 2008 Improvement of molecular phylogeneticinference and the phylogeny of Bilateria Philos Trans R Soc Lond BBiol Sci 3631463ndash1472

Lartillot N Rodrigue N Stubbs D Richer J 2013 PhyloBayes MPI phy-logenetic reconstruction with infinite mixtures of profiles in a par-allel environment Syst Biol 62611ndash615

Le SQ Dang CC Gascuel O 2012 Modeling protein evolution withseveral amino acid replacement matrices depending on site ratesMol Biol Evol 292921ndash2936

Lee MSY Soubrier J Edgecombe GD 2013 Rates of phenotypicand genomic evolution during the Cambrian Explosion Curr Biol231ndash7

1512

Fernandez et al doi101093molbevmsu108 MBE at H

arvard Library on June 12 2014

httpmbeoxfordjournalsorg

Dow

nloaded from

Leigh JW Schliep K Lopez P Bapteste E 2011 Let them fall where theymay congruence analysis in massive phylogenetically messy datasets Mol Biol Evol 282773ndash2785

Lemmon AR Brown JM Stanger-Hall K Lemmon EM 2009The effect of ambiguous data on phylogenetic estimates obtainedby maximum likelihood and Bayesian inference Syst Biol 58130ndash145

Mallatt J Giribet G 2006 Further use of nearly complete 28S and 18SrRNA genes to classify Ecdysozoa 37 more arthropods and a kinor-hynch Mol Phylogenet Evol 40772ndash794

Meusemann K von Reumont BM Simon S Roeding F Strauss S Kuck PEbersberger I Walzl M Pass G Breuers S et al 2010 A phylogenomicapproach to resolve the arthropod tree of life Mol Biol Evol 272451ndash2464

Muller CHG Meyer-Rochow VB 2006 Fine structural organization ofthe lateral ocelli in two species of Scolopendra (ChilopodaPleurostigmophora) an evolutionary evaluation Zoomorphology12513ndash26

Murienne J Edgecombe GD Giribet G 2010 Including secondary struc-ture fossils and molecular dating in the centipede tree of life MolPhylogenet Evol 57301ndash313

Nosenko T Schreiber F Adamska M Adamski M Eitel M Hammel JMaldonado M Muller WE Nickel M Schierwater B et al 2013 Deepmetazoan phylogeny when different genes tell different stories MolPhylogenet Evol 67223ndash233

Philippe H Brinkmann H Lavrov DV Littlewood DTJ Manuel MWorheide G Baurain D 2011 Resolving difficult phylogeneticquestions why more sequences are not enough PLoS Biol 9e1000602

Philippe H Telford MJ 2006 Large-scale sequencing and the new animalphylogeny Trends Ecol Evol 21614ndash620

Pick KS Philippe H Schreiber F Erpenbeck D Jackson DJWrede P Wiens M Alie A Morgenstern B Manuel Met al 2010 Improved phylogenomic taxon sampling notice-ably affects nonbilaterian relationships Mol Biol Evol 271983ndash1987

Pocock RI 1902 A new and annectant type of chilopod Quart J MicroscSci 45417ndash448

Price MN Dehal PS Arkin AP 2010 FastTree 2-approximately maximum-likelihood trees for large alignmentsPLoS One 5e9490

Prunescu C-C Prunescu P 1996 Supernumerary Malpighian tubules inchilopods Acta Myriapodol 169437ndash440

Prunescu C-C Prunescu P 2006 Rudimentary supernumeraryMalpighian tubules in the order Craterostigmomorpha Pocock1902 Norwegian J Entomol 53113ndash118

Regier JC Shultz JW Zwick A Hussey A Ball B Wetzer R Martin JWCunningham CW 2010 Arthropod relationships revealed by phy-logenomic analysis of nuclear protein-coding sequences Nature 4631079ndash1083

Regier JC Wilson HM Shultz JW 2005 Phylogenetic analysis ofMyriapoda using three nuclear protein-coding genes MolPhylogenet Evol 34147ndash158

Rehm P Borner J Meusemann K von Reumont BM Simon S Hadrys HMisof B Burmester T 2011 Dating the arthropod tree based onlarge-scale transcriptome data Mol Phylogenet Evol 61880ndash887

Rota-Stabelli O Daley AC Pisani D 2013 Molecular timetrees reveal aCambrian colonization of land and a new scenario for ecdysozoanevolution Curr Biol 23392ndash398

Roure B Baurain D Philippe H 2013 Impact of missing data on phy-logenies inferred from empirical phylogenomic data sets Mol BiolEvol 30197ndash214

Salichos L Rokas A 2011 Evaluating ortholog prediction algorithms in ayeast model clade PLoS One 6e18755

Salichos L Rokas A 2013 Inferring ancient divergences requires geneswith strong phylogenetic signals Nature 497327ndash331

Salichos L Stamatakis A Rokas A 2014 Novel information theory-basedmeasures for quantifying incongruence among phylogenetic treesMol Biol Evol 31(5)1261ndash1271

Shear WA Bonamo PM 1988 Devonobiomorpha a new order of cen-tipeds (Chilopoda) from the middle Devonian of Gilboa New YorkState USA and the phylogeny of centiped orders Am MusNovitates 29271ndash30

Shear WA Edgecombe GD 2010 The geological record and phylogenyof Myriapoda Arthropod Struct Dev 39174ndash190

Shear WA Jeram AJ Selden PA 1998 Centiped legs (ArthropodaChilopoda Scutigeromorpha) from the Silurian and Devonian ofBritain and the Devonian of North America Am Mus Novitates32311ndash16

Shultz JW Regier JC 1997 Progress toward a molecular phylogeny of thecentipede orders (Chilopoda) Entomol Scand Suppl 5125ndash32

Smith S Wilson NG Goetz F Feehery C Andrade SCS Rouse GWGiribet G Dunn CW 2011 Resolving the evolutionary relationshipsof molluscs with phylogenomic tools Nature 480364ndash367

Smith SA Dunn CW 2008 Phyutility a phyloinformatics tool for treesalignments and molecular data Bioinformatics 24715ndash716

Struck TH Paul C Hill N Hartmann S Hosel C Kube M Lieb B Meyer ATiedemann R Purschke G et al 2011 Phylogenomic analyses un-ravel annelid evolution Nature 47195ndash98

Sukumaran J Holder MT 2010 DendroPy a Python library for phylo-genetic computing Bioinformatics 261569ndash1571

Verhoeff KW 1902ndash1925 Gliederfussler Arthropoda Klasse ChilopodaIn Klassen und Ordnungen des Tierreichs 5 abt 2 Buch 1 Leipzig(Germany) Akademische Verlagsgesellschaft p 725

von Reumont BM Jenner RA Wills MA Dellrsquoampio E Pass GEbersberger I Meyer B Koenemann S Iliffe TM Stamatakis Aet al 2012 Pancrustacean phylogeny in the light of new phyloge-nomic data support for Remipedia as the possible sister group ofHexapoda Mol Biol Evol 291031ndash1045

Wilson HM Anderson LI 2004 Morphology and taxonomy of Paleozoicmillipedes (Diplopoda Chilognatha Archipolypoda) from ScotlandJ Paleontol 78169ndash184

Wirkner CS Pass G 2002 The circulatory system in Chilopodafunctional morphology and phylogenetic aspects Acta Zool 83193ndash202

Wu M Chatterji S Eisen JA 2012 Accounting for alignment uncertaintyin phylogenomics PLoS One 7e30288

Yang Z Rannala B 2006 Bayesian estimation of species divergence timesunder a molecular clock using multiple fossil calibrations with softbounds Mol Biol Evol 23212ndash226

Zoller S Schneider A 2013 Improving phylogenetic inference with asemiempirical amino acid substitution model Mol Biol Evol 30469ndash479

1513

Evaluation of Topological Conflict in Centipede Phylogeny doi101093molbevmsu108 MBE at H

arvard Library on June 12 2014

httpmbeoxfordjournalsorg

Dow

nloaded from

Page 11: Evaluating Topological Conflict in Centipede Phylogeny ...€¦ · Scutigera coleoptrata Craterostigmus tasmanianus Strigamia maritima Himantarium gabrielis Alipes grandidieri Cryptops

Smith-Waterman protein alignments The program identifiesthe so called ldquostable pairsrdquo verifies them and checks againstpotential paralogous genes In a last step cliques of stablepairs are clustered as groups of orthologs All input fileswere single-line multifasta files and the parametersdrw filespecified retained all default settings with the exception ofldquoNPrdquo which was set at 300 We parallelized all-by-all localalignments across 256 cores of a single compute node imple-menting a custom Bash script allowing for execution of inde-pendent threads with at least 3 s between each instance ofOMA to minimize risk of collisions

Phylogenomic Analyses and Congruence Assessment

We constructed three different amino acid supermatricesFirst a large matrix was constructed by concatenating theset of OMA groups containing six or more taxa To increasegene occupancy and to reduce the percentage of missingdata a second matrix was created by selecting the orthologscontained in 11 or more taxa Because of the high percentageof missing data in A gigas this taxon was eliminated from thissupermatrix This supermatrix reconstruction approach wasselected over other available software (eg MARE) because itguarantees a minimum taxon occupancy per orthogrouptherefore resulting in a supermatrix with a good compromisebetween gene and taxon occupancy and missing data

The selected orthogroups (1934 and 61 for both matrixesrespectively) were aligned individually using MUSCLE version36 (Edgar 2004) To increase the signal-to-noise ratio andimprove the discriminatory power of phylogenetic methodswe applied a probabilistic character masking with ZORRO(Wu et al 2012) to account for alignment uncertainty runusing default parameters (and using FastTree 214 [Price et al2010] to produce guide trees) This program first calculatesthe probability of all alignments that pass through a specifiedmatched pair of residues using a pair hidden Markov modelframework and it then compares this value with the fullprobability of all alignments of the pair of sequences Theposterior probability value indicates how reliable the matchis (ie highly reliable if it is close to 1 ambiguous if it is close to0) It then sums up the probability that a particular columnwould appear over the alignment landscape and assigns aconfidence score between 0 and 10 to each column provid-ing an objective measure that has an explicit evolutionarymodel and is mathematically rigorous (Wu et al 2012) Wediscarded the positions assigned a confidence score below athreshold of 5 with a custom Python script prior to concat-enation (using Phyutility 26 Smith and Dunn 2008) and sub-sequent phylogenomic analyses

To discern if substitutional saturation and among-taxoncompositional heterogeneity were affecting our phylogeneticresults we considered a third supermatrix Using the p4Python library (Foster 2004) we performed a test of compo-sitional homogeneity using w2 statistics simulating a null dis-tribution using the TreecompoTestUsingSimulations()method (nSims = 100) We used the best-scoring ML treefrom our RAxML analyses as the phylogram on which to sim-ulate data using optimized parameters of a homogeneous

unpartitioned LG + I + 4 model We also conducted com-positional homogeneity tests on individual ortholog align-ments modeling a homogeneous null distribution on theML tree for each gene using the best-fitting model availablein p4 for that gene We characterized genewise substitutionalsaturation by regressing uncorrected amino acid distances onthe patristic distance implied by each ML tree in a p4 scriptand taking the slope of this linear regression as a measure ofsubstitutional saturation (Jeffroy et al 2006) We calculatedthe third quartile of the slope from among the set of com-positionally homogeneous genes with a slope between 0 and1 and then selected the set of ML trees from genes with aslope above this value (0465) for inclusion in a quartet super-network as described earlier The 389 genes that passed in-dividual tests of compositional heterogeneity and whichfurthermore fell in the least saturated quartile of this station-ary set were concatenated into a third supermatrix

ML inference was conducted with PhyML-PCMA (Zollerand Schneider 2013) and RAxML 775 (Berger et al 2011)Although RAxML uses a predetermined empirical model ofamino acid substitution PhyML-PCMA estimates a modelthrough the use of a principal component (PC) analysisThe obtained PCs describe the substitution rates that covarythe most among different protein families and thereforedefine a semiempirically determined parameterization foran amino acid substitution model specific to each data set(Zoller and Schneider 2013) We selected 20 PCs in thePhyML-PCMA analyses and empirical amino acid frequenciesPROTGAMMALG4XF was selected as the best model ofamino acid substitution for the unpartitioned RAxML analy-ses using a modification of the ProteinModelSelection perlwrapper to RAxML produced by A Stamatakis (httpscoh-itsorgexelixissoftwarehtml last accessed April 1 2014)Bootstrap values were estimated with 1000 replicates underthe rapid bootstrapping algorithm

Bayesian analysis was conducted with PhyloBayes MPI 14e(Lartillot et al 2013) using the site-heterogeneous CAT-GTRmodel of evolution (Lartillot and Philippe 2004) Three inde-pendent Markov chain Monte Carlo (MCMC) chains wererun for 5818ndash7471 cycles The initial 2000 trees (27ndash34)sampled in each MCMC run prior to convergence (judgedwhen maximum bipartition discrepancies across chainslt 01) were discarded as the burn-in period A 50 major-ity-rule consensus tree was then computed from the remain-ing 1537 trees (sampled every 10 cycles) combined from thethree independent runs

To investigate potential incongruence between individualgene trees we also inferred gene trees for each OMA groupincluded in our supermatrix For each aligned ZORRO-masked OMA group we selected a best-fitting model usinga ProteinModelSelection script modified to permit testing ofthe recent LG4M( + F) and LG4X( + F) models (Le et al 2012)Best-scoring ML trees were inferred for each gene under theselected model (with the gamma model of rate variation butno invariant term) from 20 replicates of parsimony startingtrees One hundred traditional (nonrapid) bootstrap repli-cates for each gene were also inferred A majority rule con-sensus for each gene was constructed using the SumTrees

1510

Fernandez et al doi101093molbevmsu108 MBE at H

arvard Library on June 12 2014

httpmbeoxfordjournalsorg

Dow

nloaded from

script from the DendroPy Python library (Sukumaran andHolder 2010) To visualize predominant intergene conflictsfor both ML and bootstrap MRC trees we employedSuperQ v11 (Grunewald et al 2013) selecting the ldquobalancedrdquoedge-weight optimization function and applying no filterSplitsTree v4131 (Huson and Bryant 2006) was then usedto visualize the resulting NEXUS files To investigate the dis-tribution of strongly supported conflict among our genetrees we employed Conclustador v04a (Leigh et al 2011)presently the only automated incongruence-detection algo-rithm readily applicable to data sets of this size We appliedthe spectral clustering algorithm on the bootstrap replicatesfor each OMA group limiting the maximum number of clus-ters to 15

To quantify the number of genes potentially decisive for agiven split and also the number within this set that actuallysupport the split in question (fig 6) we employed a customPython script to parse the set of individual gene trees If a treecontained at least one species in either descendant group fora given node plus at least two distinct species basal to thenode in question it was considered potentially decisive(forming minimally a quartet DellrsquoAmpio et al 2014) if thedescendants were monophyletic with respect to their relativeoutgroups that gene tree was considered congruent with thenode in question (although this count is agnostic to thetopology within the node in question) To quantify the rela-tive support for different hypotheses within our individualgene-tree analysis we displayed the above counts for allnodes in the 1934 gene ML topology (fig 6) and also forselect alternative scenarios of centipede interrelationshipsboth including historical hypotheses (see Introduction)and alternative topologies for Scutigeromorpha andCraterostigmomorpha suggested by our quartet supernet-works All Python custom scripts can be downloaded fromhttpsgithubcomclaumer (last accessed April 1 2014)

Fossil Calibrations

A few Paleozoic fossil occurrences constrain the timing ofdivergences between chilopod orders (for a review of themyriapod fossil record see Shear and Edgecombe 2010) TheSilurondashDevonian genus Crussolum (Shear et al 1998) is iden-tified as a stem-group scutigeromorph displaying derivedcharacters of total-group Scutigeromorpha but lackingsome apomorphies that are shared by all members of thescutigeromorph crown group (Edgecombe 2011b) The bestpreserved fossils of Crussolum are from the Early Devonian(Pragian) and Middle Devonian (Givetian) (Shear et al 1998Anderson and Trewin 2003) but the oldest confidently as-signed record is in the Late Silurian (early Pridolı) LudlowBone Bed in western England (at least 4192 My dating theend of the Pridolı) This dating constrains the split ofScutigeromorpha from Pleurostigmophora

The occurrence of Devonobius delta in the MiddleDevonian of New York at least 3827 My (date for the endof the Givetian) constrains the divergence betweenCraterostigmomorpha and the remaining pleurostigmophor-ans This minimum applies irrespective of whether

Devonobius is sister group of Craterostigmomorpha or ofEpimorpha which were equally parsimonious in previousanalyses (Edgecombe and Giribet 2004) The basal divergencein Epimorpha that is the split between Scolopendromorphaand Geophilomorpha is constrained by the oldest scolopen-dromorph Mazoscolopendra richardsoni in UpperCarboniferous (Westphalian D) deposits of Mazon Creek IL(at least 306 Ma) Available character data forMazoscolopendra are insufficient to establish whether it is astem- or crown-group scolopendromorph and it only pro-vides a minimum age for total-group Scolopendromorpha

Mid-Silurian body fossils of Diplopoda provide constraintson the divergence of Chilopoda and Diplopoda The diver-gence between these two groups is dated by the occurrenceof Archipolypoda (stem-group helminthomorphs) in the lateWenlock or early Ludlow (Wilson and Anderson 2004) Basedon these data (ages based on spores) the split betweenChilopoda and Diplopoda is at least as old as the end ofthe Gorstian Stage (early Ludlow) 4256 My

The split between Onychophora and Arthropoda wasdated between 528 My (the minimum age for Arthropodaused by Lee et al [2013] on the basis of the earliestRusophycus traces) and 558 My used as the root ofPanarthropoda (Lee et al 2013)

Divergence Time Estimates

Divergence dates were estimated using the Bayesian relaxedmolecular clock approach as implemented in PhyloBayesv33f (Lartillot et al 2013) To compare the performance ofthe autocorrelated log normal and uncorrelated gammamodels we estimated the divergence dates under bothmodels in the 389-gene data set A series of calibrationconstraints specified above in ldquoFossil calibrationsrdquo (see alsoMurienne et al 2010 Edgecombe 2011b) were used with softbounds (Yang and Rannala 2006) under a birthndashdeath priorin PhyloBayes this strategy having been found to provide thebest compromise for dating estimates (Inoue et al 2010) Twoindependent MCMC chains were run for 5000ndash6000 cyclessampling posterior rates and dates every 10 cycles The initial20 of the cycles were discarded as burn-in Posteriorestimates of divergence dates were then computed fromapproximately the last 4000 samples of each chain

Supplementary MaterialSupplementary figure S1 and tables S1 and S2 are available atMolecular Biology and Evolution online (httpwwwmbeoxfordjournalsorg)

Acknowledgments

Mark Harvey kindly provided the sample of Craterostigmustasmanianus and Savel Daniels the sample of Peripatopsiscapensis The Research Computing group at the HarvardFaculty of Arts and Sciences was instrumental in facilitatingthe computational resources required Sarah Bastkowski iskindly thanked for her advice on the usage of SuperQ v11This work was supported by internal funds from the Museumof Comparative Zoology and by a postdoctoral fellowship

1511

Evaluation of Topological Conflict in Centipede Phylogeny doi101093molbevmsu108 MBE at H

arvard Library on June 12 2014

httpmbeoxfordjournalsorg

Dow

nloaded from

from the Fundacion Ramon Areces to RF Five anonymousreviewers and the editor provided comments that helped toimprove this article

ReferencesAltenhoff AM Gil M Gonnet GH Dessimoz C 2013 Inferring hierar-

chical orthologous groups from orthologous gene pairs PLoS One 8e53786

Altenhoff AM Schneider A Gonnet GH Dessimoz C 2011 OMA 2011orthology inference among 1000 complete genomes Nucleic AcidsRes 39D289ndashD294

Anderson LI Trewin NH 2003 An Early Devonian arthropod fauna fromthe Windyfield Cherts Aberdeenshire Scotland Palaeontology 46457ndash509

Ax P 2000 Multicellular animals Volume II The phylogenetic system ofthe Metazoa Berlin (Germany) Springer

Berger SA Krompass D Stamatakis A 2011 Performance accuracy andWeb server for evolutionary placement of short sequence readsunder maximum likelihood Syst Biol 60291ndash302

Betancur-R R Naylor G Ortı G 2014 Conserved genes sampling errorand phylogenomic inference Syst Biol 63257ndash262

Bleidorn C Podsiadlowski L Zhong M Eeckhaut I Hartmann SHalanych KM Tiedemann R 2009 On the phylogenetic positionof Myzostomida can 77 genes get it wrong BMC Evol Biol 9150

Borucki H 1996 Evolution und phylogenetisches system der Chilopoda(Mandibulata Tracheata) Vehr naturwiss Ver Hamburg 3595ndash226

DellrsquoAmpio E Meusemann K Szucsich NU Peters RS Meyer B Borner JPetersen M Aberer AJ Stamatakis A Walzl MG et al 2014 Decisivedata sets in phylogenomics lessons from studies on the phyloge-netic relationships of primarily wingless insects Mol Biol Evol 31239ndash249

Dohle W 1985 Phylogenetic pathways in the Chilopoda BijdrDierkunde 5555ndash66

Dunn CW Hejnol A Matus DQ Pang K Browne WE Smith SA SeaverE Rouse GW Obst M Edgecombe GD et al 2008 Broad phyloge-nomic sampling improves resolution of the animal tree of lifeNature 452745ndash749

Edgar RC 2004 MUSCLE multiple sequence alignment with high accu-racy and high throughput Nucleic Acids Res 321792ndash1797

Edgecombe GD 2011a Chilopodamdashphylogeny In Minelli A editorTreatise on zoologymdashanatomy taxonomy biology TheMyriapoda Vol 1 Leiden and Boston Brill p 339ndash354

Edgecombe GD 2011b Chilopodamdashthe fossil history In Minelli Aeditor Treatise on zoologymdashanatomy taxonomy biology TheMyriapoda Vol 1 Leiden and Boston Brill p 355ndash361

Edgecombe GD Giribet G 2004 Adding mitochondrial sequence data(16S rRNA and cytochrome c oxidase subunit I) to the phylogeny ofcentipedes (Myriapoda Chilopoda) an analysis of morphology andfour molecular loci J Zool Syst Evol Res 4289ndash134

Edgecombe GD Giribet G 2007 Evolutionary biology of centipedes(Myriapoda Chilopoda) Annu Rev Entomol 52151ndash170

Edgecombe GD Giribet G 2008 A New Zealand species of thetrans-Tasman centipede order Craterostigmomorpha(ArthropodaChilopoda) corroborated by molecular evidenceInvert Syst 221ndash15

Edgecombe GD Giribet G Wheeler WC 1999 Phylogeny of Chilopodacombining 18S and 28S rRNA sequences and morphology BoletınSociedad Entomol Aragonesa 26293ndash331

Edwards SV 2009 Is a new and general theory of molecular systematicsemerging Evolution 631ndash19

Foster P 2004 Modeling compositional heterogeneity Syst Biol 53485ndash495

Fu LM Niu BF Zhu ZW Wu ST Li WZ 2012 CD-HIT accelerated forclustering the next-generation sequencing data Bioinformatics 283150ndash3152

Giribet G Carranza S Riutort M Baguna J Ribera C 1999 Internalphylogeny of the Chilopoda (Myriapoda Arthropoda) using

complete 18S rDNA and partial 28S rDNA sequences Philos TransR Soc Lond B Biol Sci 354215ndash222

Giribet G Edgecombe GD 2006 Conflict between data sets andphylogeny of centipedes an analysis based on seven genes andmorphology Proc R Soc B Biol Sci 273531ndash538

Grabherr MG Haas BJ Yassour M Levin JZ Thompson DA Amit IAdiconis X Fan L Raychowdhury R Zeng Q et al 2011 Full-length transcriptome assembly from RNA-Seq data without a ref-erence genome Nat Biotechnol 29644ndash652

Grbic M Van Leeuwen T Clark RM Rombauts S Rouze P Grbic VOsborne EJ Dermauw W Ngoc PC Ortego F et al 2011 Thegenome of Tetranychus urticae reveals herbivorous pest adaptationsNature 479487ndash492

Grunewald S Spillner A Bastkowski S Bogershausen A Moulton V2013 SuperQ computing supernetworks from quartets IEEEACMTrans Comput Biol Bioinform 10151ndash160

Haas BJ Papanicolaou A Yassour M Grabherr M Blood PD Bowden JCouger MB Eccles D Li B Lieber M et al 2013 De novo tran-script sequence reconstruction from RNA-seq using the Trinityplatform for reference generation and analysis Nat Protoc 81494ndash1512

Hartmann S Helm C Nickel B Meyer M Struck TH Tiedemann RSelbig J Bleidorn C 2012 Exploiting gene families for phylogenomicanalysis of myzostomid transcriptome data PLoS One 7e29843

Hejnol A Obst M Stamatakis A Ott M Rouse GW Edgecombe GDMartinez P Baguna J Bailly X Jondelius U et al 2009 Assessing theroot of bilaterian animals with scalable phylogenomic methods ProcR Soc B Biol Sci 2764261ndash4270

Hilken G 1997 Tracheal systems in Chilopoda a comparison underphylogenetic aspects Entomol Scand Suppl 5149ndash60

Huson DH Bryant D 2006 Application of phylogenetic networks inevolutionary studies Mol Biol Evol 23254ndash267

Inoue J Donoghue PCJ Yang ZH 2010 The impact of the representationof fossil calibrations on Bayesian estimation of species divergencetimes Syst Biol 5974ndash89

Jeffroy O Brinkmann H Delsuc F Philippe H 2006 Phylogenomics thebeginning of incongruence Trends Genet 22225ndash231

Kearse M Moir R Wilson A Stones-Havas S Cheung M Sturrock SBuxton S Cooper A Markowitz S Duran C et al 2012 GeneiousBasic an integrated and extendable desktop software platform forthe organization and analysis of sequence data Bioinformatics 281647ndash1649

Kennedy D 2005 125 [Editorial] Science 30919Koch M Edgecombe GD 2012 The preoral chamber in geophilomorph

centipedes comparative morphology phylogeny and the evolutionof centipede feeding structures Zool J Linn Soc 1651ndash62

Kocot KM Cannon JT Todt C Citarella MR Kohn AB Meyer A SantosSR Schander C Moroz LL Lieb B et al 2011 Phylogenomics revealsdeep molluscan relationships Nature 477452ndash456

Kubatko LS Degnan JH 2007 Inconsistency of phylogenetic estimatesfrom concatenated data under coalescence Syst Biol 5617ndash24

Langmead B Trapnell C Pop M Salzberg SL 2009 Ultrafast andmemory-efficient alignment of short DNA sequences to thehuman genome Genome Biol 10R25

Lartillot N Philippe H 2004 A Bayesian mixture model for across-siteheterogeneities in the amino-acid replacement process Mol BiolEvol 211095ndash1109

Lartillot N Philippe H 2008 Improvement of molecular phylogeneticinference and the phylogeny of Bilateria Philos Trans R Soc Lond BBiol Sci 3631463ndash1472

Lartillot N Rodrigue N Stubbs D Richer J 2013 PhyloBayes MPI phy-logenetic reconstruction with infinite mixtures of profiles in a par-allel environment Syst Biol 62611ndash615

Le SQ Dang CC Gascuel O 2012 Modeling protein evolution withseveral amino acid replacement matrices depending on site ratesMol Biol Evol 292921ndash2936

Lee MSY Soubrier J Edgecombe GD 2013 Rates of phenotypicand genomic evolution during the Cambrian Explosion Curr Biol231ndash7

1512

Fernandez et al doi101093molbevmsu108 MBE at H

arvard Library on June 12 2014

httpmbeoxfordjournalsorg

Dow

nloaded from

Leigh JW Schliep K Lopez P Bapteste E 2011 Let them fall where theymay congruence analysis in massive phylogenetically messy datasets Mol Biol Evol 282773ndash2785

Lemmon AR Brown JM Stanger-Hall K Lemmon EM 2009The effect of ambiguous data on phylogenetic estimates obtainedby maximum likelihood and Bayesian inference Syst Biol 58130ndash145

Mallatt J Giribet G 2006 Further use of nearly complete 28S and 18SrRNA genes to classify Ecdysozoa 37 more arthropods and a kinor-hynch Mol Phylogenet Evol 40772ndash794

Meusemann K von Reumont BM Simon S Roeding F Strauss S Kuck PEbersberger I Walzl M Pass G Breuers S et al 2010 A phylogenomicapproach to resolve the arthropod tree of life Mol Biol Evol 272451ndash2464

Muller CHG Meyer-Rochow VB 2006 Fine structural organization ofthe lateral ocelli in two species of Scolopendra (ChilopodaPleurostigmophora) an evolutionary evaluation Zoomorphology12513ndash26

Murienne J Edgecombe GD Giribet G 2010 Including secondary struc-ture fossils and molecular dating in the centipede tree of life MolPhylogenet Evol 57301ndash313

Nosenko T Schreiber F Adamska M Adamski M Eitel M Hammel JMaldonado M Muller WE Nickel M Schierwater B et al 2013 Deepmetazoan phylogeny when different genes tell different stories MolPhylogenet Evol 67223ndash233

Philippe H Brinkmann H Lavrov DV Littlewood DTJ Manuel MWorheide G Baurain D 2011 Resolving difficult phylogeneticquestions why more sequences are not enough PLoS Biol 9e1000602

Philippe H Telford MJ 2006 Large-scale sequencing and the new animalphylogeny Trends Ecol Evol 21614ndash620

Pick KS Philippe H Schreiber F Erpenbeck D Jackson DJWrede P Wiens M Alie A Morgenstern B Manuel Met al 2010 Improved phylogenomic taxon sampling notice-ably affects nonbilaterian relationships Mol Biol Evol 271983ndash1987

Pocock RI 1902 A new and annectant type of chilopod Quart J MicroscSci 45417ndash448

Price MN Dehal PS Arkin AP 2010 FastTree 2-approximately maximum-likelihood trees for large alignmentsPLoS One 5e9490

Prunescu C-C Prunescu P 1996 Supernumerary Malpighian tubules inchilopods Acta Myriapodol 169437ndash440

Prunescu C-C Prunescu P 2006 Rudimentary supernumeraryMalpighian tubules in the order Craterostigmomorpha Pocock1902 Norwegian J Entomol 53113ndash118

Regier JC Shultz JW Zwick A Hussey A Ball B Wetzer R Martin JWCunningham CW 2010 Arthropod relationships revealed by phy-logenomic analysis of nuclear protein-coding sequences Nature 4631079ndash1083

Regier JC Wilson HM Shultz JW 2005 Phylogenetic analysis ofMyriapoda using three nuclear protein-coding genes MolPhylogenet Evol 34147ndash158

Rehm P Borner J Meusemann K von Reumont BM Simon S Hadrys HMisof B Burmester T 2011 Dating the arthropod tree based onlarge-scale transcriptome data Mol Phylogenet Evol 61880ndash887

Rota-Stabelli O Daley AC Pisani D 2013 Molecular timetrees reveal aCambrian colonization of land and a new scenario for ecdysozoanevolution Curr Biol 23392ndash398

Roure B Baurain D Philippe H 2013 Impact of missing data on phy-logenies inferred from empirical phylogenomic data sets Mol BiolEvol 30197ndash214

Salichos L Rokas A 2011 Evaluating ortholog prediction algorithms in ayeast model clade PLoS One 6e18755

Salichos L Rokas A 2013 Inferring ancient divergences requires geneswith strong phylogenetic signals Nature 497327ndash331

Salichos L Stamatakis A Rokas A 2014 Novel information theory-basedmeasures for quantifying incongruence among phylogenetic treesMol Biol Evol 31(5)1261ndash1271

Shear WA Bonamo PM 1988 Devonobiomorpha a new order of cen-tipeds (Chilopoda) from the middle Devonian of Gilboa New YorkState USA and the phylogeny of centiped orders Am MusNovitates 29271ndash30

Shear WA Edgecombe GD 2010 The geological record and phylogenyof Myriapoda Arthropod Struct Dev 39174ndash190

Shear WA Jeram AJ Selden PA 1998 Centiped legs (ArthropodaChilopoda Scutigeromorpha) from the Silurian and Devonian ofBritain and the Devonian of North America Am Mus Novitates32311ndash16

Shultz JW Regier JC 1997 Progress toward a molecular phylogeny of thecentipede orders (Chilopoda) Entomol Scand Suppl 5125ndash32

Smith S Wilson NG Goetz F Feehery C Andrade SCS Rouse GWGiribet G Dunn CW 2011 Resolving the evolutionary relationshipsof molluscs with phylogenomic tools Nature 480364ndash367

Smith SA Dunn CW 2008 Phyutility a phyloinformatics tool for treesalignments and molecular data Bioinformatics 24715ndash716

Struck TH Paul C Hill N Hartmann S Hosel C Kube M Lieb B Meyer ATiedemann R Purschke G et al 2011 Phylogenomic analyses un-ravel annelid evolution Nature 47195ndash98

Sukumaran J Holder MT 2010 DendroPy a Python library for phylo-genetic computing Bioinformatics 261569ndash1571

Verhoeff KW 1902ndash1925 Gliederfussler Arthropoda Klasse ChilopodaIn Klassen und Ordnungen des Tierreichs 5 abt 2 Buch 1 Leipzig(Germany) Akademische Verlagsgesellschaft p 725

von Reumont BM Jenner RA Wills MA Dellrsquoampio E Pass GEbersberger I Meyer B Koenemann S Iliffe TM Stamatakis Aet al 2012 Pancrustacean phylogeny in the light of new phyloge-nomic data support for Remipedia as the possible sister group ofHexapoda Mol Biol Evol 291031ndash1045

Wilson HM Anderson LI 2004 Morphology and taxonomy of Paleozoicmillipedes (Diplopoda Chilognatha Archipolypoda) from ScotlandJ Paleontol 78169ndash184

Wirkner CS Pass G 2002 The circulatory system in Chilopodafunctional morphology and phylogenetic aspects Acta Zool 83193ndash202

Wu M Chatterji S Eisen JA 2012 Accounting for alignment uncertaintyin phylogenomics PLoS One 7e30288

Yang Z Rannala B 2006 Bayesian estimation of species divergence timesunder a molecular clock using multiple fossil calibrations with softbounds Mol Biol Evol 23212ndash226

Zoller S Schneider A 2013 Improving phylogenetic inference with asemiempirical amino acid substitution model Mol Biol Evol 30469ndash479

1513

Evaluation of Topological Conflict in Centipede Phylogeny doi101093molbevmsu108 MBE at H

arvard Library on June 12 2014

httpmbeoxfordjournalsorg

Dow

nloaded from

Page 12: Evaluating Topological Conflict in Centipede Phylogeny ...€¦ · Scutigera coleoptrata Craterostigmus tasmanianus Strigamia maritima Himantarium gabrielis Alipes grandidieri Cryptops

script from the DendroPy Python library (Sukumaran andHolder 2010) To visualize predominant intergene conflictsfor both ML and bootstrap MRC trees we employedSuperQ v11 (Grunewald et al 2013) selecting the ldquobalancedrdquoedge-weight optimization function and applying no filterSplitsTree v4131 (Huson and Bryant 2006) was then usedto visualize the resulting NEXUS files To investigate the dis-tribution of strongly supported conflict among our genetrees we employed Conclustador v04a (Leigh et al 2011)presently the only automated incongruence-detection algo-rithm readily applicable to data sets of this size We appliedthe spectral clustering algorithm on the bootstrap replicatesfor each OMA group limiting the maximum number of clus-ters to 15

To quantify the number of genes potentially decisive for agiven split and also the number within this set that actuallysupport the split in question (fig 6) we employed a customPython script to parse the set of individual gene trees If a treecontained at least one species in either descendant group fora given node plus at least two distinct species basal to thenode in question it was considered potentially decisive(forming minimally a quartet DellrsquoAmpio et al 2014) if thedescendants were monophyletic with respect to their relativeoutgroups that gene tree was considered congruent with thenode in question (although this count is agnostic to thetopology within the node in question) To quantify the rela-tive support for different hypotheses within our individualgene-tree analysis we displayed the above counts for allnodes in the 1934 gene ML topology (fig 6) and also forselect alternative scenarios of centipede interrelationshipsboth including historical hypotheses (see Introduction)and alternative topologies for Scutigeromorpha andCraterostigmomorpha suggested by our quartet supernet-works All Python custom scripts can be downloaded fromhttpsgithubcomclaumer (last accessed April 1 2014)

Fossil Calibrations

A few Paleozoic fossil occurrences constrain the timing ofdivergences between chilopod orders (for a review of themyriapod fossil record see Shear and Edgecombe 2010) TheSilurondashDevonian genus Crussolum (Shear et al 1998) is iden-tified as a stem-group scutigeromorph displaying derivedcharacters of total-group Scutigeromorpha but lackingsome apomorphies that are shared by all members of thescutigeromorph crown group (Edgecombe 2011b) The bestpreserved fossils of Crussolum are from the Early Devonian(Pragian) and Middle Devonian (Givetian) (Shear et al 1998Anderson and Trewin 2003) but the oldest confidently as-signed record is in the Late Silurian (early Pridolı) LudlowBone Bed in western England (at least 4192 My dating theend of the Pridolı) This dating constrains the split ofScutigeromorpha from Pleurostigmophora

The occurrence of Devonobius delta in the MiddleDevonian of New York at least 3827 My (date for the endof the Givetian) constrains the divergence betweenCraterostigmomorpha and the remaining pleurostigmophor-ans This minimum applies irrespective of whether

Devonobius is sister group of Craterostigmomorpha or ofEpimorpha which were equally parsimonious in previousanalyses (Edgecombe and Giribet 2004) The basal divergencein Epimorpha that is the split between Scolopendromorphaand Geophilomorpha is constrained by the oldest scolopen-dromorph Mazoscolopendra richardsoni in UpperCarboniferous (Westphalian D) deposits of Mazon Creek IL(at least 306 Ma) Available character data forMazoscolopendra are insufficient to establish whether it is astem- or crown-group scolopendromorph and it only pro-vides a minimum age for total-group Scolopendromorpha

Mid-Silurian body fossils of Diplopoda provide constraintson the divergence of Chilopoda and Diplopoda The diver-gence between these two groups is dated by the occurrenceof Archipolypoda (stem-group helminthomorphs) in the lateWenlock or early Ludlow (Wilson and Anderson 2004) Basedon these data (ages based on spores) the split betweenChilopoda and Diplopoda is at least as old as the end ofthe Gorstian Stage (early Ludlow) 4256 My

The split between Onychophora and Arthropoda wasdated between 528 My (the minimum age for Arthropodaused by Lee et al [2013] on the basis of the earliestRusophycus traces) and 558 My used as the root ofPanarthropoda (Lee et al 2013)

Divergence Time Estimates

Divergence dates were estimated using the Bayesian relaxedmolecular clock approach as implemented in PhyloBayesv33f (Lartillot et al 2013) To compare the performance ofthe autocorrelated log normal and uncorrelated gammamodels we estimated the divergence dates under bothmodels in the 389-gene data set A series of calibrationconstraints specified above in ldquoFossil calibrationsrdquo (see alsoMurienne et al 2010 Edgecombe 2011b) were used with softbounds (Yang and Rannala 2006) under a birthndashdeath priorin PhyloBayes this strategy having been found to provide thebest compromise for dating estimates (Inoue et al 2010) Twoindependent MCMC chains were run for 5000ndash6000 cyclessampling posterior rates and dates every 10 cycles The initial20 of the cycles were discarded as burn-in Posteriorestimates of divergence dates were then computed fromapproximately the last 4000 samples of each chain

Supplementary MaterialSupplementary figure S1 and tables S1 and S2 are available atMolecular Biology and Evolution online (httpwwwmbeoxfordjournalsorg)

Acknowledgments

Mark Harvey kindly provided the sample of Craterostigmustasmanianus and Savel Daniels the sample of Peripatopsiscapensis The Research Computing group at the HarvardFaculty of Arts and Sciences was instrumental in facilitatingthe computational resources required Sarah Bastkowski iskindly thanked for her advice on the usage of SuperQ v11This work was supported by internal funds from the Museumof Comparative Zoology and by a postdoctoral fellowship

1511

Evaluation of Topological Conflict in Centipede Phylogeny doi101093molbevmsu108 MBE at H

arvard Library on June 12 2014

httpmbeoxfordjournalsorg

Dow

nloaded from

from the Fundacion Ramon Areces to RF Five anonymousreviewers and the editor provided comments that helped toimprove this article

ReferencesAltenhoff AM Gil M Gonnet GH Dessimoz C 2013 Inferring hierar-

chical orthologous groups from orthologous gene pairs PLoS One 8e53786

Altenhoff AM Schneider A Gonnet GH Dessimoz C 2011 OMA 2011orthology inference among 1000 complete genomes Nucleic AcidsRes 39D289ndashD294

Anderson LI Trewin NH 2003 An Early Devonian arthropod fauna fromthe Windyfield Cherts Aberdeenshire Scotland Palaeontology 46457ndash509

Ax P 2000 Multicellular animals Volume II The phylogenetic system ofthe Metazoa Berlin (Germany) Springer

Berger SA Krompass D Stamatakis A 2011 Performance accuracy andWeb server for evolutionary placement of short sequence readsunder maximum likelihood Syst Biol 60291ndash302

Betancur-R R Naylor G Ortı G 2014 Conserved genes sampling errorand phylogenomic inference Syst Biol 63257ndash262

Bleidorn C Podsiadlowski L Zhong M Eeckhaut I Hartmann SHalanych KM Tiedemann R 2009 On the phylogenetic positionof Myzostomida can 77 genes get it wrong BMC Evol Biol 9150

Borucki H 1996 Evolution und phylogenetisches system der Chilopoda(Mandibulata Tracheata) Vehr naturwiss Ver Hamburg 3595ndash226

DellrsquoAmpio E Meusemann K Szucsich NU Peters RS Meyer B Borner JPetersen M Aberer AJ Stamatakis A Walzl MG et al 2014 Decisivedata sets in phylogenomics lessons from studies on the phyloge-netic relationships of primarily wingless insects Mol Biol Evol 31239ndash249

Dohle W 1985 Phylogenetic pathways in the Chilopoda BijdrDierkunde 5555ndash66

Dunn CW Hejnol A Matus DQ Pang K Browne WE Smith SA SeaverE Rouse GW Obst M Edgecombe GD et al 2008 Broad phyloge-nomic sampling improves resolution of the animal tree of lifeNature 452745ndash749

Edgar RC 2004 MUSCLE multiple sequence alignment with high accu-racy and high throughput Nucleic Acids Res 321792ndash1797

Edgecombe GD 2011a Chilopodamdashphylogeny In Minelli A editorTreatise on zoologymdashanatomy taxonomy biology TheMyriapoda Vol 1 Leiden and Boston Brill p 339ndash354

Edgecombe GD 2011b Chilopodamdashthe fossil history In Minelli Aeditor Treatise on zoologymdashanatomy taxonomy biology TheMyriapoda Vol 1 Leiden and Boston Brill p 355ndash361

Edgecombe GD Giribet G 2004 Adding mitochondrial sequence data(16S rRNA and cytochrome c oxidase subunit I) to the phylogeny ofcentipedes (Myriapoda Chilopoda) an analysis of morphology andfour molecular loci J Zool Syst Evol Res 4289ndash134

Edgecombe GD Giribet G 2007 Evolutionary biology of centipedes(Myriapoda Chilopoda) Annu Rev Entomol 52151ndash170

Edgecombe GD Giribet G 2008 A New Zealand species of thetrans-Tasman centipede order Craterostigmomorpha(ArthropodaChilopoda) corroborated by molecular evidenceInvert Syst 221ndash15

Edgecombe GD Giribet G Wheeler WC 1999 Phylogeny of Chilopodacombining 18S and 28S rRNA sequences and morphology BoletınSociedad Entomol Aragonesa 26293ndash331

Edwards SV 2009 Is a new and general theory of molecular systematicsemerging Evolution 631ndash19

Foster P 2004 Modeling compositional heterogeneity Syst Biol 53485ndash495

Fu LM Niu BF Zhu ZW Wu ST Li WZ 2012 CD-HIT accelerated forclustering the next-generation sequencing data Bioinformatics 283150ndash3152

Giribet G Carranza S Riutort M Baguna J Ribera C 1999 Internalphylogeny of the Chilopoda (Myriapoda Arthropoda) using

complete 18S rDNA and partial 28S rDNA sequences Philos TransR Soc Lond B Biol Sci 354215ndash222

Giribet G Edgecombe GD 2006 Conflict between data sets andphylogeny of centipedes an analysis based on seven genes andmorphology Proc R Soc B Biol Sci 273531ndash538

Grabherr MG Haas BJ Yassour M Levin JZ Thompson DA Amit IAdiconis X Fan L Raychowdhury R Zeng Q et al 2011 Full-length transcriptome assembly from RNA-Seq data without a ref-erence genome Nat Biotechnol 29644ndash652

Grbic M Van Leeuwen T Clark RM Rombauts S Rouze P Grbic VOsborne EJ Dermauw W Ngoc PC Ortego F et al 2011 Thegenome of Tetranychus urticae reveals herbivorous pest adaptationsNature 479487ndash492

Grunewald S Spillner A Bastkowski S Bogershausen A Moulton V2013 SuperQ computing supernetworks from quartets IEEEACMTrans Comput Biol Bioinform 10151ndash160

Haas BJ Papanicolaou A Yassour M Grabherr M Blood PD Bowden JCouger MB Eccles D Li B Lieber M et al 2013 De novo tran-script sequence reconstruction from RNA-seq using the Trinityplatform for reference generation and analysis Nat Protoc 81494ndash1512

Hartmann S Helm C Nickel B Meyer M Struck TH Tiedemann RSelbig J Bleidorn C 2012 Exploiting gene families for phylogenomicanalysis of myzostomid transcriptome data PLoS One 7e29843

Hejnol A Obst M Stamatakis A Ott M Rouse GW Edgecombe GDMartinez P Baguna J Bailly X Jondelius U et al 2009 Assessing theroot of bilaterian animals with scalable phylogenomic methods ProcR Soc B Biol Sci 2764261ndash4270

Hilken G 1997 Tracheal systems in Chilopoda a comparison underphylogenetic aspects Entomol Scand Suppl 5149ndash60

Huson DH Bryant D 2006 Application of phylogenetic networks inevolutionary studies Mol Biol Evol 23254ndash267

Inoue J Donoghue PCJ Yang ZH 2010 The impact of the representationof fossil calibrations on Bayesian estimation of species divergencetimes Syst Biol 5974ndash89

Jeffroy O Brinkmann H Delsuc F Philippe H 2006 Phylogenomics thebeginning of incongruence Trends Genet 22225ndash231

Kearse M Moir R Wilson A Stones-Havas S Cheung M Sturrock SBuxton S Cooper A Markowitz S Duran C et al 2012 GeneiousBasic an integrated and extendable desktop software platform forthe organization and analysis of sequence data Bioinformatics 281647ndash1649

Kennedy D 2005 125 [Editorial] Science 30919Koch M Edgecombe GD 2012 The preoral chamber in geophilomorph

centipedes comparative morphology phylogeny and the evolutionof centipede feeding structures Zool J Linn Soc 1651ndash62

Kocot KM Cannon JT Todt C Citarella MR Kohn AB Meyer A SantosSR Schander C Moroz LL Lieb B et al 2011 Phylogenomics revealsdeep molluscan relationships Nature 477452ndash456

Kubatko LS Degnan JH 2007 Inconsistency of phylogenetic estimatesfrom concatenated data under coalescence Syst Biol 5617ndash24

Langmead B Trapnell C Pop M Salzberg SL 2009 Ultrafast andmemory-efficient alignment of short DNA sequences to thehuman genome Genome Biol 10R25

Lartillot N Philippe H 2004 A Bayesian mixture model for across-siteheterogeneities in the amino-acid replacement process Mol BiolEvol 211095ndash1109

Lartillot N Philippe H 2008 Improvement of molecular phylogeneticinference and the phylogeny of Bilateria Philos Trans R Soc Lond BBiol Sci 3631463ndash1472

Lartillot N Rodrigue N Stubbs D Richer J 2013 PhyloBayes MPI phy-logenetic reconstruction with infinite mixtures of profiles in a par-allel environment Syst Biol 62611ndash615

Le SQ Dang CC Gascuel O 2012 Modeling protein evolution withseveral amino acid replacement matrices depending on site ratesMol Biol Evol 292921ndash2936

Lee MSY Soubrier J Edgecombe GD 2013 Rates of phenotypicand genomic evolution during the Cambrian Explosion Curr Biol231ndash7

1512

Fernandez et al doi101093molbevmsu108 MBE at H

arvard Library on June 12 2014

httpmbeoxfordjournalsorg

Dow

nloaded from

Leigh JW Schliep K Lopez P Bapteste E 2011 Let them fall where theymay congruence analysis in massive phylogenetically messy datasets Mol Biol Evol 282773ndash2785

Lemmon AR Brown JM Stanger-Hall K Lemmon EM 2009The effect of ambiguous data on phylogenetic estimates obtainedby maximum likelihood and Bayesian inference Syst Biol 58130ndash145

Mallatt J Giribet G 2006 Further use of nearly complete 28S and 18SrRNA genes to classify Ecdysozoa 37 more arthropods and a kinor-hynch Mol Phylogenet Evol 40772ndash794

Meusemann K von Reumont BM Simon S Roeding F Strauss S Kuck PEbersberger I Walzl M Pass G Breuers S et al 2010 A phylogenomicapproach to resolve the arthropod tree of life Mol Biol Evol 272451ndash2464

Muller CHG Meyer-Rochow VB 2006 Fine structural organization ofthe lateral ocelli in two species of Scolopendra (ChilopodaPleurostigmophora) an evolutionary evaluation Zoomorphology12513ndash26

Murienne J Edgecombe GD Giribet G 2010 Including secondary struc-ture fossils and molecular dating in the centipede tree of life MolPhylogenet Evol 57301ndash313

Nosenko T Schreiber F Adamska M Adamski M Eitel M Hammel JMaldonado M Muller WE Nickel M Schierwater B et al 2013 Deepmetazoan phylogeny when different genes tell different stories MolPhylogenet Evol 67223ndash233

Philippe H Brinkmann H Lavrov DV Littlewood DTJ Manuel MWorheide G Baurain D 2011 Resolving difficult phylogeneticquestions why more sequences are not enough PLoS Biol 9e1000602

Philippe H Telford MJ 2006 Large-scale sequencing and the new animalphylogeny Trends Ecol Evol 21614ndash620

Pick KS Philippe H Schreiber F Erpenbeck D Jackson DJWrede P Wiens M Alie A Morgenstern B Manuel Met al 2010 Improved phylogenomic taxon sampling notice-ably affects nonbilaterian relationships Mol Biol Evol 271983ndash1987

Pocock RI 1902 A new and annectant type of chilopod Quart J MicroscSci 45417ndash448

Price MN Dehal PS Arkin AP 2010 FastTree 2-approximately maximum-likelihood trees for large alignmentsPLoS One 5e9490

Prunescu C-C Prunescu P 1996 Supernumerary Malpighian tubules inchilopods Acta Myriapodol 169437ndash440

Prunescu C-C Prunescu P 2006 Rudimentary supernumeraryMalpighian tubules in the order Craterostigmomorpha Pocock1902 Norwegian J Entomol 53113ndash118

Regier JC Shultz JW Zwick A Hussey A Ball B Wetzer R Martin JWCunningham CW 2010 Arthropod relationships revealed by phy-logenomic analysis of nuclear protein-coding sequences Nature 4631079ndash1083

Regier JC Wilson HM Shultz JW 2005 Phylogenetic analysis ofMyriapoda using three nuclear protein-coding genes MolPhylogenet Evol 34147ndash158

Rehm P Borner J Meusemann K von Reumont BM Simon S Hadrys HMisof B Burmester T 2011 Dating the arthropod tree based onlarge-scale transcriptome data Mol Phylogenet Evol 61880ndash887

Rota-Stabelli O Daley AC Pisani D 2013 Molecular timetrees reveal aCambrian colonization of land and a new scenario for ecdysozoanevolution Curr Biol 23392ndash398

Roure B Baurain D Philippe H 2013 Impact of missing data on phy-logenies inferred from empirical phylogenomic data sets Mol BiolEvol 30197ndash214

Salichos L Rokas A 2011 Evaluating ortholog prediction algorithms in ayeast model clade PLoS One 6e18755

Salichos L Rokas A 2013 Inferring ancient divergences requires geneswith strong phylogenetic signals Nature 497327ndash331

Salichos L Stamatakis A Rokas A 2014 Novel information theory-basedmeasures for quantifying incongruence among phylogenetic treesMol Biol Evol 31(5)1261ndash1271

Shear WA Bonamo PM 1988 Devonobiomorpha a new order of cen-tipeds (Chilopoda) from the middle Devonian of Gilboa New YorkState USA and the phylogeny of centiped orders Am MusNovitates 29271ndash30

Shear WA Edgecombe GD 2010 The geological record and phylogenyof Myriapoda Arthropod Struct Dev 39174ndash190

Shear WA Jeram AJ Selden PA 1998 Centiped legs (ArthropodaChilopoda Scutigeromorpha) from the Silurian and Devonian ofBritain and the Devonian of North America Am Mus Novitates32311ndash16

Shultz JW Regier JC 1997 Progress toward a molecular phylogeny of thecentipede orders (Chilopoda) Entomol Scand Suppl 5125ndash32

Smith S Wilson NG Goetz F Feehery C Andrade SCS Rouse GWGiribet G Dunn CW 2011 Resolving the evolutionary relationshipsof molluscs with phylogenomic tools Nature 480364ndash367

Smith SA Dunn CW 2008 Phyutility a phyloinformatics tool for treesalignments and molecular data Bioinformatics 24715ndash716

Struck TH Paul C Hill N Hartmann S Hosel C Kube M Lieb B Meyer ATiedemann R Purschke G et al 2011 Phylogenomic analyses un-ravel annelid evolution Nature 47195ndash98

Sukumaran J Holder MT 2010 DendroPy a Python library for phylo-genetic computing Bioinformatics 261569ndash1571

Verhoeff KW 1902ndash1925 Gliederfussler Arthropoda Klasse ChilopodaIn Klassen und Ordnungen des Tierreichs 5 abt 2 Buch 1 Leipzig(Germany) Akademische Verlagsgesellschaft p 725

von Reumont BM Jenner RA Wills MA Dellrsquoampio E Pass GEbersberger I Meyer B Koenemann S Iliffe TM Stamatakis Aet al 2012 Pancrustacean phylogeny in the light of new phyloge-nomic data support for Remipedia as the possible sister group ofHexapoda Mol Biol Evol 291031ndash1045

Wilson HM Anderson LI 2004 Morphology and taxonomy of Paleozoicmillipedes (Diplopoda Chilognatha Archipolypoda) from ScotlandJ Paleontol 78169ndash184

Wirkner CS Pass G 2002 The circulatory system in Chilopodafunctional morphology and phylogenetic aspects Acta Zool 83193ndash202

Wu M Chatterji S Eisen JA 2012 Accounting for alignment uncertaintyin phylogenomics PLoS One 7e30288

Yang Z Rannala B 2006 Bayesian estimation of species divergence timesunder a molecular clock using multiple fossil calibrations with softbounds Mol Biol Evol 23212ndash226

Zoller S Schneider A 2013 Improving phylogenetic inference with asemiempirical amino acid substitution model Mol Biol Evol 30469ndash479

1513

Evaluation of Topological Conflict in Centipede Phylogeny doi101093molbevmsu108 MBE at H

arvard Library on June 12 2014

httpmbeoxfordjournalsorg

Dow

nloaded from

Page 13: Evaluating Topological Conflict in Centipede Phylogeny ...€¦ · Scutigera coleoptrata Craterostigmus tasmanianus Strigamia maritima Himantarium gabrielis Alipes grandidieri Cryptops

from the Fundacion Ramon Areces to RF Five anonymousreviewers and the editor provided comments that helped toimprove this article

ReferencesAltenhoff AM Gil M Gonnet GH Dessimoz C 2013 Inferring hierar-

chical orthologous groups from orthologous gene pairs PLoS One 8e53786

Altenhoff AM Schneider A Gonnet GH Dessimoz C 2011 OMA 2011orthology inference among 1000 complete genomes Nucleic AcidsRes 39D289ndashD294

Anderson LI Trewin NH 2003 An Early Devonian arthropod fauna fromthe Windyfield Cherts Aberdeenshire Scotland Palaeontology 46457ndash509

Ax P 2000 Multicellular animals Volume II The phylogenetic system ofthe Metazoa Berlin (Germany) Springer

Berger SA Krompass D Stamatakis A 2011 Performance accuracy andWeb server for evolutionary placement of short sequence readsunder maximum likelihood Syst Biol 60291ndash302

Betancur-R R Naylor G Ortı G 2014 Conserved genes sampling errorand phylogenomic inference Syst Biol 63257ndash262

Bleidorn C Podsiadlowski L Zhong M Eeckhaut I Hartmann SHalanych KM Tiedemann R 2009 On the phylogenetic positionof Myzostomida can 77 genes get it wrong BMC Evol Biol 9150

Borucki H 1996 Evolution und phylogenetisches system der Chilopoda(Mandibulata Tracheata) Vehr naturwiss Ver Hamburg 3595ndash226

DellrsquoAmpio E Meusemann K Szucsich NU Peters RS Meyer B Borner JPetersen M Aberer AJ Stamatakis A Walzl MG et al 2014 Decisivedata sets in phylogenomics lessons from studies on the phyloge-netic relationships of primarily wingless insects Mol Biol Evol 31239ndash249

Dohle W 1985 Phylogenetic pathways in the Chilopoda BijdrDierkunde 5555ndash66

Dunn CW Hejnol A Matus DQ Pang K Browne WE Smith SA SeaverE Rouse GW Obst M Edgecombe GD et al 2008 Broad phyloge-nomic sampling improves resolution of the animal tree of lifeNature 452745ndash749

Edgar RC 2004 MUSCLE multiple sequence alignment with high accu-racy and high throughput Nucleic Acids Res 321792ndash1797

Edgecombe GD 2011a Chilopodamdashphylogeny In Minelli A editorTreatise on zoologymdashanatomy taxonomy biology TheMyriapoda Vol 1 Leiden and Boston Brill p 339ndash354

Edgecombe GD 2011b Chilopodamdashthe fossil history In Minelli Aeditor Treatise on zoologymdashanatomy taxonomy biology TheMyriapoda Vol 1 Leiden and Boston Brill p 355ndash361

Edgecombe GD Giribet G 2004 Adding mitochondrial sequence data(16S rRNA and cytochrome c oxidase subunit I) to the phylogeny ofcentipedes (Myriapoda Chilopoda) an analysis of morphology andfour molecular loci J Zool Syst Evol Res 4289ndash134

Edgecombe GD Giribet G 2007 Evolutionary biology of centipedes(Myriapoda Chilopoda) Annu Rev Entomol 52151ndash170

Edgecombe GD Giribet G 2008 A New Zealand species of thetrans-Tasman centipede order Craterostigmomorpha(ArthropodaChilopoda) corroborated by molecular evidenceInvert Syst 221ndash15

Edgecombe GD Giribet G Wheeler WC 1999 Phylogeny of Chilopodacombining 18S and 28S rRNA sequences and morphology BoletınSociedad Entomol Aragonesa 26293ndash331

Edwards SV 2009 Is a new and general theory of molecular systematicsemerging Evolution 631ndash19

Foster P 2004 Modeling compositional heterogeneity Syst Biol 53485ndash495

Fu LM Niu BF Zhu ZW Wu ST Li WZ 2012 CD-HIT accelerated forclustering the next-generation sequencing data Bioinformatics 283150ndash3152

Giribet G Carranza S Riutort M Baguna J Ribera C 1999 Internalphylogeny of the Chilopoda (Myriapoda Arthropoda) using

complete 18S rDNA and partial 28S rDNA sequences Philos TransR Soc Lond B Biol Sci 354215ndash222

Giribet G Edgecombe GD 2006 Conflict between data sets andphylogeny of centipedes an analysis based on seven genes andmorphology Proc R Soc B Biol Sci 273531ndash538

Grabherr MG Haas BJ Yassour M Levin JZ Thompson DA Amit IAdiconis X Fan L Raychowdhury R Zeng Q et al 2011 Full-length transcriptome assembly from RNA-Seq data without a ref-erence genome Nat Biotechnol 29644ndash652

Grbic M Van Leeuwen T Clark RM Rombauts S Rouze P Grbic VOsborne EJ Dermauw W Ngoc PC Ortego F et al 2011 Thegenome of Tetranychus urticae reveals herbivorous pest adaptationsNature 479487ndash492

Grunewald S Spillner A Bastkowski S Bogershausen A Moulton V2013 SuperQ computing supernetworks from quartets IEEEACMTrans Comput Biol Bioinform 10151ndash160

Haas BJ Papanicolaou A Yassour M Grabherr M Blood PD Bowden JCouger MB Eccles D Li B Lieber M et al 2013 De novo tran-script sequence reconstruction from RNA-seq using the Trinityplatform for reference generation and analysis Nat Protoc 81494ndash1512

Hartmann S Helm C Nickel B Meyer M Struck TH Tiedemann RSelbig J Bleidorn C 2012 Exploiting gene families for phylogenomicanalysis of myzostomid transcriptome data PLoS One 7e29843

Hejnol A Obst M Stamatakis A Ott M Rouse GW Edgecombe GDMartinez P Baguna J Bailly X Jondelius U et al 2009 Assessing theroot of bilaterian animals with scalable phylogenomic methods ProcR Soc B Biol Sci 2764261ndash4270

Hilken G 1997 Tracheal systems in Chilopoda a comparison underphylogenetic aspects Entomol Scand Suppl 5149ndash60

Huson DH Bryant D 2006 Application of phylogenetic networks inevolutionary studies Mol Biol Evol 23254ndash267

Inoue J Donoghue PCJ Yang ZH 2010 The impact of the representationof fossil calibrations on Bayesian estimation of species divergencetimes Syst Biol 5974ndash89

Jeffroy O Brinkmann H Delsuc F Philippe H 2006 Phylogenomics thebeginning of incongruence Trends Genet 22225ndash231

Kearse M Moir R Wilson A Stones-Havas S Cheung M Sturrock SBuxton S Cooper A Markowitz S Duran C et al 2012 GeneiousBasic an integrated and extendable desktop software platform forthe organization and analysis of sequence data Bioinformatics 281647ndash1649

Kennedy D 2005 125 [Editorial] Science 30919Koch M Edgecombe GD 2012 The preoral chamber in geophilomorph

centipedes comparative morphology phylogeny and the evolutionof centipede feeding structures Zool J Linn Soc 1651ndash62

Kocot KM Cannon JT Todt C Citarella MR Kohn AB Meyer A SantosSR Schander C Moroz LL Lieb B et al 2011 Phylogenomics revealsdeep molluscan relationships Nature 477452ndash456

Kubatko LS Degnan JH 2007 Inconsistency of phylogenetic estimatesfrom concatenated data under coalescence Syst Biol 5617ndash24

Langmead B Trapnell C Pop M Salzberg SL 2009 Ultrafast andmemory-efficient alignment of short DNA sequences to thehuman genome Genome Biol 10R25

Lartillot N Philippe H 2004 A Bayesian mixture model for across-siteheterogeneities in the amino-acid replacement process Mol BiolEvol 211095ndash1109

Lartillot N Philippe H 2008 Improvement of molecular phylogeneticinference and the phylogeny of Bilateria Philos Trans R Soc Lond BBiol Sci 3631463ndash1472

Lartillot N Rodrigue N Stubbs D Richer J 2013 PhyloBayes MPI phy-logenetic reconstruction with infinite mixtures of profiles in a par-allel environment Syst Biol 62611ndash615

Le SQ Dang CC Gascuel O 2012 Modeling protein evolution withseveral amino acid replacement matrices depending on site ratesMol Biol Evol 292921ndash2936

Lee MSY Soubrier J Edgecombe GD 2013 Rates of phenotypicand genomic evolution during the Cambrian Explosion Curr Biol231ndash7

1512

Fernandez et al doi101093molbevmsu108 MBE at H

arvard Library on June 12 2014

httpmbeoxfordjournalsorg

Dow

nloaded from

Leigh JW Schliep K Lopez P Bapteste E 2011 Let them fall where theymay congruence analysis in massive phylogenetically messy datasets Mol Biol Evol 282773ndash2785

Lemmon AR Brown JM Stanger-Hall K Lemmon EM 2009The effect of ambiguous data on phylogenetic estimates obtainedby maximum likelihood and Bayesian inference Syst Biol 58130ndash145

Mallatt J Giribet G 2006 Further use of nearly complete 28S and 18SrRNA genes to classify Ecdysozoa 37 more arthropods and a kinor-hynch Mol Phylogenet Evol 40772ndash794

Meusemann K von Reumont BM Simon S Roeding F Strauss S Kuck PEbersberger I Walzl M Pass G Breuers S et al 2010 A phylogenomicapproach to resolve the arthropod tree of life Mol Biol Evol 272451ndash2464

Muller CHG Meyer-Rochow VB 2006 Fine structural organization ofthe lateral ocelli in two species of Scolopendra (ChilopodaPleurostigmophora) an evolutionary evaluation Zoomorphology12513ndash26

Murienne J Edgecombe GD Giribet G 2010 Including secondary struc-ture fossils and molecular dating in the centipede tree of life MolPhylogenet Evol 57301ndash313

Nosenko T Schreiber F Adamska M Adamski M Eitel M Hammel JMaldonado M Muller WE Nickel M Schierwater B et al 2013 Deepmetazoan phylogeny when different genes tell different stories MolPhylogenet Evol 67223ndash233

Philippe H Brinkmann H Lavrov DV Littlewood DTJ Manuel MWorheide G Baurain D 2011 Resolving difficult phylogeneticquestions why more sequences are not enough PLoS Biol 9e1000602

Philippe H Telford MJ 2006 Large-scale sequencing and the new animalphylogeny Trends Ecol Evol 21614ndash620

Pick KS Philippe H Schreiber F Erpenbeck D Jackson DJWrede P Wiens M Alie A Morgenstern B Manuel Met al 2010 Improved phylogenomic taxon sampling notice-ably affects nonbilaterian relationships Mol Biol Evol 271983ndash1987

Pocock RI 1902 A new and annectant type of chilopod Quart J MicroscSci 45417ndash448

Price MN Dehal PS Arkin AP 2010 FastTree 2-approximately maximum-likelihood trees for large alignmentsPLoS One 5e9490

Prunescu C-C Prunescu P 1996 Supernumerary Malpighian tubules inchilopods Acta Myriapodol 169437ndash440

Prunescu C-C Prunescu P 2006 Rudimentary supernumeraryMalpighian tubules in the order Craterostigmomorpha Pocock1902 Norwegian J Entomol 53113ndash118

Regier JC Shultz JW Zwick A Hussey A Ball B Wetzer R Martin JWCunningham CW 2010 Arthropod relationships revealed by phy-logenomic analysis of nuclear protein-coding sequences Nature 4631079ndash1083

Regier JC Wilson HM Shultz JW 2005 Phylogenetic analysis ofMyriapoda using three nuclear protein-coding genes MolPhylogenet Evol 34147ndash158

Rehm P Borner J Meusemann K von Reumont BM Simon S Hadrys HMisof B Burmester T 2011 Dating the arthropod tree based onlarge-scale transcriptome data Mol Phylogenet Evol 61880ndash887

Rota-Stabelli O Daley AC Pisani D 2013 Molecular timetrees reveal aCambrian colonization of land and a new scenario for ecdysozoanevolution Curr Biol 23392ndash398

Roure B Baurain D Philippe H 2013 Impact of missing data on phy-logenies inferred from empirical phylogenomic data sets Mol BiolEvol 30197ndash214

Salichos L Rokas A 2011 Evaluating ortholog prediction algorithms in ayeast model clade PLoS One 6e18755

Salichos L Rokas A 2013 Inferring ancient divergences requires geneswith strong phylogenetic signals Nature 497327ndash331

Salichos L Stamatakis A Rokas A 2014 Novel information theory-basedmeasures for quantifying incongruence among phylogenetic treesMol Biol Evol 31(5)1261ndash1271

Shear WA Bonamo PM 1988 Devonobiomorpha a new order of cen-tipeds (Chilopoda) from the middle Devonian of Gilboa New YorkState USA and the phylogeny of centiped orders Am MusNovitates 29271ndash30

Shear WA Edgecombe GD 2010 The geological record and phylogenyof Myriapoda Arthropod Struct Dev 39174ndash190

Shear WA Jeram AJ Selden PA 1998 Centiped legs (ArthropodaChilopoda Scutigeromorpha) from the Silurian and Devonian ofBritain and the Devonian of North America Am Mus Novitates32311ndash16

Shultz JW Regier JC 1997 Progress toward a molecular phylogeny of thecentipede orders (Chilopoda) Entomol Scand Suppl 5125ndash32

Smith S Wilson NG Goetz F Feehery C Andrade SCS Rouse GWGiribet G Dunn CW 2011 Resolving the evolutionary relationshipsof molluscs with phylogenomic tools Nature 480364ndash367

Smith SA Dunn CW 2008 Phyutility a phyloinformatics tool for treesalignments and molecular data Bioinformatics 24715ndash716

Struck TH Paul C Hill N Hartmann S Hosel C Kube M Lieb B Meyer ATiedemann R Purschke G et al 2011 Phylogenomic analyses un-ravel annelid evolution Nature 47195ndash98

Sukumaran J Holder MT 2010 DendroPy a Python library for phylo-genetic computing Bioinformatics 261569ndash1571

Verhoeff KW 1902ndash1925 Gliederfussler Arthropoda Klasse ChilopodaIn Klassen und Ordnungen des Tierreichs 5 abt 2 Buch 1 Leipzig(Germany) Akademische Verlagsgesellschaft p 725

von Reumont BM Jenner RA Wills MA Dellrsquoampio E Pass GEbersberger I Meyer B Koenemann S Iliffe TM Stamatakis Aet al 2012 Pancrustacean phylogeny in the light of new phyloge-nomic data support for Remipedia as the possible sister group ofHexapoda Mol Biol Evol 291031ndash1045

Wilson HM Anderson LI 2004 Morphology and taxonomy of Paleozoicmillipedes (Diplopoda Chilognatha Archipolypoda) from ScotlandJ Paleontol 78169ndash184

Wirkner CS Pass G 2002 The circulatory system in Chilopodafunctional morphology and phylogenetic aspects Acta Zool 83193ndash202

Wu M Chatterji S Eisen JA 2012 Accounting for alignment uncertaintyin phylogenomics PLoS One 7e30288

Yang Z Rannala B 2006 Bayesian estimation of species divergence timesunder a molecular clock using multiple fossil calibrations with softbounds Mol Biol Evol 23212ndash226

Zoller S Schneider A 2013 Improving phylogenetic inference with asemiempirical amino acid substitution model Mol Biol Evol 30469ndash479

1513

Evaluation of Topological Conflict in Centipede Phylogeny doi101093molbevmsu108 MBE at H

arvard Library on June 12 2014

httpmbeoxfordjournalsorg

Dow

nloaded from

Page 14: Evaluating Topological Conflict in Centipede Phylogeny ...€¦ · Scutigera coleoptrata Craterostigmus tasmanianus Strigamia maritima Himantarium gabrielis Alipes grandidieri Cryptops

Leigh JW Schliep K Lopez P Bapteste E 2011 Let them fall where theymay congruence analysis in massive phylogenetically messy datasets Mol Biol Evol 282773ndash2785

Lemmon AR Brown JM Stanger-Hall K Lemmon EM 2009The effect of ambiguous data on phylogenetic estimates obtainedby maximum likelihood and Bayesian inference Syst Biol 58130ndash145

Mallatt J Giribet G 2006 Further use of nearly complete 28S and 18SrRNA genes to classify Ecdysozoa 37 more arthropods and a kinor-hynch Mol Phylogenet Evol 40772ndash794

Meusemann K von Reumont BM Simon S Roeding F Strauss S Kuck PEbersberger I Walzl M Pass G Breuers S et al 2010 A phylogenomicapproach to resolve the arthropod tree of life Mol Biol Evol 272451ndash2464

Muller CHG Meyer-Rochow VB 2006 Fine structural organization ofthe lateral ocelli in two species of Scolopendra (ChilopodaPleurostigmophora) an evolutionary evaluation Zoomorphology12513ndash26

Murienne J Edgecombe GD Giribet G 2010 Including secondary struc-ture fossils and molecular dating in the centipede tree of life MolPhylogenet Evol 57301ndash313

Nosenko T Schreiber F Adamska M Adamski M Eitel M Hammel JMaldonado M Muller WE Nickel M Schierwater B et al 2013 Deepmetazoan phylogeny when different genes tell different stories MolPhylogenet Evol 67223ndash233

Philippe H Brinkmann H Lavrov DV Littlewood DTJ Manuel MWorheide G Baurain D 2011 Resolving difficult phylogeneticquestions why more sequences are not enough PLoS Biol 9e1000602

Philippe H Telford MJ 2006 Large-scale sequencing and the new animalphylogeny Trends Ecol Evol 21614ndash620

Pick KS Philippe H Schreiber F Erpenbeck D Jackson DJWrede P Wiens M Alie A Morgenstern B Manuel Met al 2010 Improved phylogenomic taxon sampling notice-ably affects nonbilaterian relationships Mol Biol Evol 271983ndash1987

Pocock RI 1902 A new and annectant type of chilopod Quart J MicroscSci 45417ndash448

Price MN Dehal PS Arkin AP 2010 FastTree 2-approximately maximum-likelihood trees for large alignmentsPLoS One 5e9490

Prunescu C-C Prunescu P 1996 Supernumerary Malpighian tubules inchilopods Acta Myriapodol 169437ndash440

Prunescu C-C Prunescu P 2006 Rudimentary supernumeraryMalpighian tubules in the order Craterostigmomorpha Pocock1902 Norwegian J Entomol 53113ndash118

Regier JC Shultz JW Zwick A Hussey A Ball B Wetzer R Martin JWCunningham CW 2010 Arthropod relationships revealed by phy-logenomic analysis of nuclear protein-coding sequences Nature 4631079ndash1083

Regier JC Wilson HM Shultz JW 2005 Phylogenetic analysis ofMyriapoda using three nuclear protein-coding genes MolPhylogenet Evol 34147ndash158

Rehm P Borner J Meusemann K von Reumont BM Simon S Hadrys HMisof B Burmester T 2011 Dating the arthropod tree based onlarge-scale transcriptome data Mol Phylogenet Evol 61880ndash887

Rota-Stabelli O Daley AC Pisani D 2013 Molecular timetrees reveal aCambrian colonization of land and a new scenario for ecdysozoanevolution Curr Biol 23392ndash398

Roure B Baurain D Philippe H 2013 Impact of missing data on phy-logenies inferred from empirical phylogenomic data sets Mol BiolEvol 30197ndash214

Salichos L Rokas A 2011 Evaluating ortholog prediction algorithms in ayeast model clade PLoS One 6e18755

Salichos L Rokas A 2013 Inferring ancient divergences requires geneswith strong phylogenetic signals Nature 497327ndash331

Salichos L Stamatakis A Rokas A 2014 Novel information theory-basedmeasures for quantifying incongruence among phylogenetic treesMol Biol Evol 31(5)1261ndash1271

Shear WA Bonamo PM 1988 Devonobiomorpha a new order of cen-tipeds (Chilopoda) from the middle Devonian of Gilboa New YorkState USA and the phylogeny of centiped orders Am MusNovitates 29271ndash30

Shear WA Edgecombe GD 2010 The geological record and phylogenyof Myriapoda Arthropod Struct Dev 39174ndash190

Shear WA Jeram AJ Selden PA 1998 Centiped legs (ArthropodaChilopoda Scutigeromorpha) from the Silurian and Devonian ofBritain and the Devonian of North America Am Mus Novitates32311ndash16

Shultz JW Regier JC 1997 Progress toward a molecular phylogeny of thecentipede orders (Chilopoda) Entomol Scand Suppl 5125ndash32

Smith S Wilson NG Goetz F Feehery C Andrade SCS Rouse GWGiribet G Dunn CW 2011 Resolving the evolutionary relationshipsof molluscs with phylogenomic tools Nature 480364ndash367

Smith SA Dunn CW 2008 Phyutility a phyloinformatics tool for treesalignments and molecular data Bioinformatics 24715ndash716

Struck TH Paul C Hill N Hartmann S Hosel C Kube M Lieb B Meyer ATiedemann R Purschke G et al 2011 Phylogenomic analyses un-ravel annelid evolution Nature 47195ndash98

Sukumaran J Holder MT 2010 DendroPy a Python library for phylo-genetic computing Bioinformatics 261569ndash1571

Verhoeff KW 1902ndash1925 Gliederfussler Arthropoda Klasse ChilopodaIn Klassen und Ordnungen des Tierreichs 5 abt 2 Buch 1 Leipzig(Germany) Akademische Verlagsgesellschaft p 725

von Reumont BM Jenner RA Wills MA Dellrsquoampio E Pass GEbersberger I Meyer B Koenemann S Iliffe TM Stamatakis Aet al 2012 Pancrustacean phylogeny in the light of new phyloge-nomic data support for Remipedia as the possible sister group ofHexapoda Mol Biol Evol 291031ndash1045

Wilson HM Anderson LI 2004 Morphology and taxonomy of Paleozoicmillipedes (Diplopoda Chilognatha Archipolypoda) from ScotlandJ Paleontol 78169ndash184

Wirkner CS Pass G 2002 The circulatory system in Chilopodafunctional morphology and phylogenetic aspects Acta Zool 83193ndash202

Wu M Chatterji S Eisen JA 2012 Accounting for alignment uncertaintyin phylogenomics PLoS One 7e30288

Yang Z Rannala B 2006 Bayesian estimation of species divergence timesunder a molecular clock using multiple fossil calibrations with softbounds Mol Biol Evol 23212ndash226

Zoller S Schneider A 2013 Improving phylogenetic inference with asemiempirical amino acid substitution model Mol Biol Evol 30469ndash479

1513

Evaluation of Topological Conflict in Centipede Phylogeny doi101093molbevmsu108 MBE at H

arvard Library on June 12 2014

httpmbeoxfordjournalsorg

Dow

nloaded from