

Journal of Human Evolution xxx (2013) 1–13

Contents lists available at ScienceDirect

Journal of Human Evolution

journal homepage: www.elsevier.com/locate/jhevol

Evaluating the use of pairwise dissimilarity metrics in paleoanthropology

Adam D. Gordon a,*, Bernard Wood b

a Department of Anthropology, CAS 237, University at Albany–SUNY, 1400 Washington Ave., Albany, NY 12222, USA
b Center for the Advanced Study of Hominid Paleobiology, George Washington University, 2114 G Street NW, Washington, DC 20052, USA

Article info

Article history:
Received 16 July 2013
Accepted 6 August 2013
Available online xxx

Keywords:
Log se_m
STET
s_LR
Alpha taxonomy
Hominins
Conspecificity
Shape

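* Corresponding author.
E-mail addresses: [email protected] (A.D. Gordon), [email protected] (B. Wood).

0047-2484/$ – see front matter © 2013 Elsevier Ltd. All rights reserved.
http://dx.doi.org/10.1016/j.jhevol.2013.08.002

Please cite this article in press as: Gordon, A.D., Wood, B., Evaluating the use of pairwise dissimilarity metrics in paleoanthropology, Journal of Human Evolution (2013), http://dx.doi.org/10.1016/j.jhevol.2013.08.002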

Abstract

Questions of alpha taxonomy are best addressed by comparing unknown specimens to samples of the taxa to which they might belong. However, analysis of the hominin fossil record is riddled with methods that claim to evaluate whether pairs of individual fossils belong to the same species. Two such methods, log se_m and the related STET method, have been introduced and used in studies of fossil hominins. Both methods attempt to quantify morphological dissimilarity for a pair of fossils and then evaluate a null hypothesis of conspecificity using the assumption that pairs of fossils that fall beneath a predefined dissimilarity threshold are likely to belong to the same species, whereas pairs of fossils above that threshold are likely to belong to different species. In this contribution, we address (1) whether these particular methods do what they claim to do, and (2) whether such approaches can ever reliably address the question of conspecificity. We show that log se_m and STET do not reliably measure deviations from shape similarity, and that values of these measures for any pair of fossils are highly dependent upon the number of variables compared. To address these issues we develop a measure of shape dissimilarity, the Standard Deviation of Logged Ratios (s_LR). We suggest that while pairwise dissimilarity metrics that accurately measure deviations from isometry (e.g., s_LR) may be useful for addressing some questions that relate to morphological variation, no pairwise method can reliably answer the question of whether two fossils are conspecific.

© 2013 Elsevier Ltd. All rights reserved.

Introduction

It has long been recognized that individuals within the same species vary in their expression of phenotypic traits (Darwin, 1859). As a consequence, when investigating the alpha taxonomy of new fossil discoveries, it is always desirable to compare the new, unknown, material to samples of each of the taxa to which it might belong. However, the fact remains that when it comes to relatively complete crania, many proposed fossil hominin species are effectively represented by single specimens (e.g., ARA-VP-6/500, Ardipithecus ramidus; KNM-WT 40000, Kenyanthropus platyops; KNM-WT 17000, Paranthropus aethiopicus; TM 266-01-060-1, Sahelanthropus tchadensis) (Smith, 2005). Furthermore, when we look back over the history of discovery of what are now broadly accepted and well-characterized early hominin taxa with respectable samples (e.g., Australopithecus africanus, Paranthropus boisei), for a long time in the course of the accumulation of their hypodigms they were effectively represented by a single relatively well-preserved specimen (e.g., respectively, Sts 5 and OH 5). As a result, although it is far from an ideal situation, there have been several attempts over the years to quantify shape differences between pairs of crania in an attempt to say something about the taxonomic affinities of individual fossil specimens. Here we evaluate two current methods against two criteria: (1) do they accurately and adequately measure shape dissimilarity, and (2) if they do, are they capable of testing hypotheses about alpha taxonomy?

Over the past two decades, two such techniques, both based on regressions of paired interlandmark distances, have been developed to address the question of whether pairs of fossils belong to the same species: the Logarithm of the Standard Error of the Slope method (log se_m) (Thackeray et al., 1997) and the Standard Error Test (STET) (Wolpoff and Lee, 2001). Thackeray and colleagues' technique quantifies shape dissimilarity between sets of homologous interlandmark distances for two specimens as the base 10 log-transformed standard error of the slope of a line passed through the paired raw data points (Fig. 1 illustrates this for a pair of modern human crania). Using a data set of 1260 specimens from 70 vertebrate and invertebrate species, they arrived at a mean value of log se_m for all pairs of conspecific crania across all taxa (Thackeray et al., 1997). Thackeray later refined the sample to include only vertebrates, and based on this taxonomically diverse data set, suggested that “when comparisons are made between any two specimens of the same species, log se_m approximates a ‘biological species constant’ (T = −1.61) across evolutionary time and geographical space”, which “facilitates the assessment of probabilities of conspecificity between any two fossils” (Thackeray, 2007: 489). Notably, neither of these papers (nor subsequent works applying this technique) tested whether log se_m is a constant value within the same pair of specimens given different numbers of paired homologous measurements, let alone whether the value is constant across time and space. As will be discussed further below, STET is similar to log se_m, although it differs in the sense that it takes into account the asymmetrical nature of ordinary least squares regression. Given that log se_m and STET have seen a fair amount of use in addressing questions of hominin alpha taxonomy in recent years (see below), we believe it is important to evaluate whether or not they are appropriate for that task.

Figure 1. Illustration of log se_m. Consider a pair of crania where one cranium is larger in most measurements than the other. In this case, 56 interlandmark distances were measured on two modern human crania. Paired homologous measurements are plotted against each other (open circles), e.g., orbital height from cranium 1 is plotted against orbital height from cranium 2, cranial length from cranium 1 is plotted against cranial length from cranium 2, etc. An ordinary least squares regression is then calculated for these data points, as is the logarithm of the standard error of the slope (i.e., log se_m). The standard error of the slope is dependent on three factors: the slope of the regression line (m), the number of data points (i.e., the number of measurements, k, which is 56 in this case), and the coefficient of determination (r²), as follows:

$$se_m = \frac{m}{\sqrt{k-2}}\,\sqrt{\frac{1}{r^2} - 1}$$

Thus the value of log se_m decreases when the slope decreases, the number of measurements increases, or r² increases. If all pairs of measurements were perfectly correlated (gray points), they would sit directly on the regression line and have an infinitely negative log se_m value (because se_m = 0). The actual measurements have a lower correlation (open circles) and thus they sit farther from the line and have a higher log se_m. As proposed by Thackeray et al. (1997), a higher value of log se_m is a measure of greater shape dissimilarity between two crania. Note that swapping the axes upon which the measurements are plotted does not affect k or r², but it does affect the slope (in this case, changing it from a value greater than one to a value less than one), which in turn affects the value of log se_m (see text for discussion of STET, which was developed to take this asymmetry into account).
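The expression for the standard error of the slope given in the Fig. 1 caption can be checked numerically. The Python sketch below is illustrative only and is not part of the original analyses: the function names and the six paired measurements are hypothetical. It compares the caption's closed form with the textbook standard error computed from the regression residuals, and shows that, with the slope and r² held fixed, se_m shrinks as k grows.

```python
import math

def ols_fit(x, y):
    """Ordinary least squares fit of y on x (with intercept).

    Returns (slope, r_squared, se_slope), where se_slope is the textbook
    standard error of the slope computed from the residual sum of squares.
    """
    k = len(x)
    mean_x = sum(x) / k
    mean_y = sum(y) / k
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)
    rss = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    se_slope = math.sqrt(rss / ((k - 2) * sxx))
    return slope, r_squared, se_slope

def se_m_closed_form(slope, k, r_squared):
    """se_m = (m / sqrt(k - 2)) * sqrt(1 / r^2 - 1), as in the Fig. 1 caption."""
    return (abs(slope) / math.sqrt(k - 2)) * math.sqrt(1.0 / r_squared - 1.0)

# Hypothetical paired measurements (mm) for two crania; not data from the study.
cranium_1 = [10.0, 25.0, 40.0, 95.0, 130.0, 180.0]
cranium_2 = [12.0, 27.5, 41.0, 101.0, 139.0, 186.0]

slope, r_squared, se_direct = ols_fit(cranium_1, cranium_2)
se_closed = se_m_closed_form(slope, len(cranium_1), r_squared)
log_se_m = math.log10(se_closed)

# Holding the slope and r^2 fixed, se_m shrinks as the number of
# measurements k grows.
se_k10 = se_m_closed_form(1.0, 10, 0.99)
se_k50 = se_m_closed_form(1.0, 50, 0.99)
```

The absolute value of the slope is used in the closed form so that the sketch also behaves sensibly for a negative slope; that choice is an assumption of this example rather than something stated in the caption.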

These two methods have mainly been used to address the taxonomic affiliations of early African hominin crania and the question of conspecificity of Neanderthals and anatomically modern humans (e.g., Thackeray et al., 1997, 2005; Wolpoff and Lee, 2001, 2006; Lee and Wolpoff, 2005, 2007; Thackeray, 2007, 2010; Thackeray and Prat, 2009; Cofran and Thackeray, 2010; Houghton and Thackeray, 2011; Thackeray and Odes, 2013). These analyses are typically presented by their proponents as tests of the null hypothesis of conspecificity for two crania, although they are more accurately represented as assessments of whether a metric of shape dissimilarity between two crania exceeds some empirically-defined threshold. In the studies cited above, the authors find that these methods generally fail to reject the ‘null hypothesis of conspecificity’ (i.e., shape dissimilarity does not exceed some threshold value) (e.g., Wolpoff and Lee, 2001, 2006; Lee and Wolpoff, 2005, 2007), even when the crania being compared are generally recognized to belong to separate species (e.g., comparisons between three crania attributed to Australopithecus sediba, Au. africanus, and Homo habilis, Thackeray, 2010). It is important to keep in mind (as we discuss further below) that it cannot be simply assumed that tests based on morphological dissimilarity are adequate for testing alpha taxonomic hypotheses.

Our purpose in evaluating these methods is not to judge them in terms of the ongoing debate among morphologists who prefer to measure shape using methods that consider the spatial relationships among all landmarks on sets of objects (such as crania) when translated into a common coordinate system (i.e., geometric morphometrics), with the attendant issues of how to deal with the effects of translation, rotation, and size when aligning homologous landmarks; versus those who prefer to measure shape using sets of interlandmark distances (e.g., as used in Euclidean Distance Matrix Analysis [EDMA] approaches), which avoids the issues related to spatial position and orientation, but arguably also discards potentially important information regarding the relative position of landmarks and semi-landmarks (for discussions of these issues see Lele and Richtsmeier, 2001; and the articles in Slice, 2005). Instead, we seek merely to demonstrate whether or not log se_m and STET do what their proponents claim they do in the context of analyses based on interlandmark distances.

Both log se_m and STET purport to measure shape dissimilarity for a pair of fossils, usually crania, based on comparing measurements from each fossil for a set of k variables (where k is the number of variables being compared, the variables themselves typically being linear distances; e.g., cranial length, maximum cranial breadth, etc.). These dissimilarity metrics are both based on the standard error of the slope for an ordinary least squares regression of k linear measurements for one fossil against the matching homologous linear measurements of a second fossil, as follows:

$$\log se_m = \log_{10}\left(se_m[xy]\right)$$

$$\mathrm{STET} = 100 \times \left[\left(se_m[xy]\right)^2 + \left(se_m[yx]\right)^2\right]^{0.5}$$

where se_m[xy] is the standard error (se) of the slope (m) for the measurements of fossil y regressed against the measurements of fossil x, and se_m[yx] is the standard error of the slope for fossil x regressed against fossil y (see Fig. 1 caption for the equation for the standard error of the slope).
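As a minimal sketch of the two definitions above (illustrative only; the helper names and the measurement values are hypothetical and not drawn from the study), both metrics can be computed from the standard error of the ordinary least squares slope. The example also shows the asymmetry that STET was designed to absorb: log se_m changes when the two fossils swap axes, whereas STET does not.

```python
import math

def se_slope(x, y):
    """Standard error of the OLS slope for y regressed on x (with intercept)."""
    k = len(x)
    mean_x = sum(x) / k
    mean_y = sum(y) / k
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    rss = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    return math.sqrt(rss / ((k - 2) * sxx))

def log_se_m(x, y):
    """log10 of se_m[xy]: fossil y regressed against fossil x."""
    return math.log10(se_slope(x, y))

def stet(x, y):
    """STET = 100 * sqrt(se_m[xy]^2 + se_m[yx]^2)."""
    return 100.0 * math.sqrt(se_slope(x, y) ** 2 + se_slope(y, x) ** 2)

# Hypothetical homologous measurements (mm) for two fossils.
fossil_x = [10.0, 25.0, 40.0, 95.0, 130.0, 180.0]
fossil_y = [12.0, 27.5, 41.0, 101.0, 139.0, 186.0]

forward = log_se_m(fossil_x, fossil_y)   # y on x
reverse = log_se_m(fossil_y, fossil_x)   # x on y
stet_xy = stet(fossil_x, fossil_y)
stet_yx = stet(fossil_y, fossil_x)
```

In this example the larger specimen is regressed on the smaller one in the forward direction, so the forward slope exceeds one and the forward log se_m is the larger of the two values.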

Two related assumptions underlie the log se_m method, either or both of which may be incorrect. First, when two fossils are compared, if all of the data points representing the paired measurements fall exactly on a regression line that passes through them, then the two fossils are identical in shape. Second, and conversely, a relatively high standard error of the slope for such a regression would indicate substantial scatter about the line and thus large deviations from shape similarity between two specimens. Here we use ‘similarity’ in its geometric sense (i.e., objects that are identical in shape but may differ in size). Assuming that these assumptions are correct, the argument goes that evidence of such morphological dissimilarity would indicate a low probability that the two fossils could be sampled from the same species (Thackeray, 1997; Thackeray et al., 1997; Aiello et al., 2000; Lee, 2011). A strict reading of Thackeray’s (2007) ‘biological constant’ concept would suggest that there is a threshold value for all vertebrates beyond which two specimens could not belong to the same species. As we suggest above, the validity of these arguments has yet to be demonstrated. The STET is a slightly different metric based on the same underlying assumptions as log se_m and it is used by its proponents in the same sort of significance tests. Because the value of the slope and its standard error are dependent on which cranium is placed on the Y-axis, and because the decision of which fossil to place upon the Y-axis is an arbitrary one, STET was developed to take into account the standard error of the slope for both regressions (i.e., y on x and x on y) (Lee, 2011).

Table 1. Cranial specimens used to evaluate how well morphological dissimilarity tracks taxonomic identity.

| Analysis | Species | n_F | n_M | Source | Collection |
|---|---|---|---|---|---|
| Intergeneric comparisons | Colobus guereza | 12 | 12 | Wood (1976) | Osteology Collection, National Museum, Nairobi, Kenya |
| | Gorilla gorilla | 17 | 20 | Wood (1976) | Powell-Cotton Collection, Birchington, Kent, England |
| | Homo sapiens | 35 | 40 | Wood (1976) | R.A. Dart Collection, Department of Anatomy, University of Witwatersrand, South Africa |
| | Pan troglodytes | 22 | 13 | Wood (1976) | Powell-Cotton Collection, Birchington, Kent, England |
| | Pongo pygmaeus | 21 | 20 | Wood et al. (1991) | Rijksmuseum von Natuurlijke Historie (now Naturalis), Leiden |
| Intrageneric comparisons | Pan paniscus | 19 | 15 | Brown and Maeda (2009) | Royal Museum of Central Africa, Tervuren, Belgium |
| | Pan troglodytes | 19 | 15 | Brown and Maeda (2009) | Powell-Cotton Collection, Birchington, Kent, England |

Abbreviations: n_F, number of females; n_M, number of males.

It should be noted that log se_m and STET are unusual applications of regression in that the true sample size for any comparison using these methods is n = 2, the number of fossils being compared, not k, the number of variables plotted against each other; the set of k variables is an arbitrary selection of the potentially infinite number of variables that could be measured on any pair of fossils. For this reason, it is not appropriate to consider significance tests as calculated directly for the regression of the k variables, and indeed, proponents of these techniques do not do so. Rather, proponents compare values of log se_m or STET from a pair of fossils to a range of values derived from reference samples of known taxonomic affiliation to determine whether the particular pair of fossils under consideration show similar levels of shape difference or not. According to practitioners of these methods, values that fall within or below the range of 95% of values found in pairs of crania known to be conspecific are taken to indicate a high probability of conspecificity, while values that exceed the maximum value found in 95% of pairs of conspecifics are interpreted as meaning that the pair of fossils in question are unlikely to belong to the same species (Aiello et al., 2000; Lee, 2011).
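The reference-sample procedure described above can be sketched as follows. This is a hedged illustration rather than the practitioners' actual implementation: the reference crania are simulated, the function names are hypothetical, and the nearest-rank rule used to locate the 95% cutoff is an assumption, since the text does not specify how the percentile is computed.

```python
import itertools
import math
import random

def log_se_m(x, y):
    """log10 of the standard error of the OLS slope for y regressed on x."""
    k = len(x)
    mean_x, mean_y = sum(x) / k, sum(y) / k
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    rss = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    return math.log10(math.sqrt(rss / ((k - 2) * sxx)))

def conspecific_threshold(reference, metric, coverage=0.95):
    """Nearest-rank cutoff of the metric over all pairs in a reference sample.

    Returns (threshold, sorted pairwise values).
    """
    values = sorted(metric(a, b) for a, b in itertools.combinations(reference, 2))
    rank = math.ceil(coverage * len(values))
    return values[rank - 1], values

def exceeds_threshold(specimen_a, specimen_b, metric, threshold):
    """True if the pair is more dissimilar than the reference cutoff."""
    return metric(specimen_a, specimen_b) > threshold

# Simulated single-species reference sample (hypothetical values, in mm).
rng = random.Random(0)
template = [20.0, 35.0, 50.0, 80.0, 110.0, 140.0, 180.0]

def simulate_cranium():
    size = rng.uniform(0.9, 1.1)  # overall size difference between individuals
    return [size * t * (1.0 + rng.gauss(0.0, 0.03)) for t in template]

reference_sample = [simulate_cranium() for _ in range(12)]
threshold, pair_values = conspecific_threshold(reference_sample, log_se_m)

fossil_a, fossil_b = simulate_cranium(), simulate_cranium()
flagged = exceeds_threshold(fossil_a, fossil_b, log_se_m, threshold)
```

Note that the decision rests entirely on where one pairwise value falls relative to a distribution of other pairwise values; the k measurements within a pair are never treated as independent observations.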

The focus of the present study differs from earlier work that addressed the empirical performance of log se_m (Aiello et al., 2000) and STET (Lee, 2011) when attempting to identify conspecificity in pairs of fossils. Below we demonstrate from a theoretical perspective that log se_m and STET do not reliably measure deviation from shape similarity between pairs of fossils, and we show empirically that properties such as high variability and dependence upon number of included variables (k) significantly impair the utility of these techniques as descriptive statistics. We go on to present a modified version of these dissimilarity metrics that is robust to these particular criticisms and compare it to log se_m and STET. Finally, we address the questions of whether any pairwise tests of shape similarity reliably inform questions about alpha taxonomy, and what the appropriate application of a pairwise dissimilarity metric is in the study of fossil hominins.

Materials and methods

Two sets of analyses were conducted for the present study. The first compared three metrics that purport to measure morphological dissimilarity. The second evaluated the performance of dissimilarity metrics when attempting to determine whether pairs of crania are conspecific or not. Different data sets were used to address these questions. Each analysis is described in detail below. All analyses were performed in the statistical programming language R, version 2.15.0 (R Development Core Team, 2012).

Cranial measurements used in the first set of analyses were drawn from Howells’ craniometric data set (Howells, 1973, 1989, 1996), available for download at http://konig.la.utk.edu/howells.htm and http://web.utk.edu/wauerbach/HOWL.htm. The sample used here represents 2524 modern human crania (1156 females and 1368 males) drawn from 30 populations. Analysis is limited to 56 linear interlandmark measurements of the cranium (angles and indices available in the full data set are excluded).

Two data sets were used in the second set of analyses. The first set of cranial measurements represents five primate species whose sample composition was selected to minimize inter-population morphological variation within each species (Table 1; data provided in Supplementary Online Material [SOM], Table 1). All samples are geographically restricted: the Colobus guereza sample includes only monkeys from the vicinity of Thompson’s Falls, Kenya; the Gorilla gorilla and Pan troglodytes samples each represent a single subspecies (G. g. gorilla and P. t. troglodytes) from forests in and bordering on modern Cameroon; the Pongo pygmaeus sample is from Borneo; and the Homo sapiens sample is from dissecting room cadavers of Nguni and Sotho peoples from the University of the Witwatersrand, South Africa (Wood, 1975, 1976; Wood et al., 1991). All of the species samples are of crania housed in single collections (Table 1). All of the non-human primate specimens are wild-shot animals. Twenty-four linear interlandmark cranial variables were used in this analysis. Other analyses based on these measurements have been published previously (Wood, 1976; Wood et al., 1991). The second data set was drawn from a published set of 23 cranial measurements of P. paniscus and P. troglodytes (Brown and Maeda, 2009), and was constructed so that the sample sizes were equal for both species (Table 1; data provided in SOM Table 2). The samples are restricted in the same manner as above, with the P. troglodytes sample again comprising a single subspecies (P. t. troglodytes) from forests in modern Cameroon; all specimens were wild-shot and each species sample was housed in a single collection. Together, these two data sets were used to evaluate the performance of dissimilarity metrics when used to compare crania belonging to different genera as opposed to those belonging to different species within the same genus.

Figure 3. Bivariate plot of 56 linear measurements in mm from one cranium plotted against the corresponding measurements in another cranium for 5000 randomly selected pairs of modern human crania. Data points are semi-transparent, so darker areas reflect higher densities of data points. Note the presence of heteroscedasticity: there is greater variability in measurements in the Y-axis at higher values along the X-axis (e.g., compare height of vertical white dotted lines at x = 5 mm and x = 125 mm).

Results and discussion

Measuring shape dissimilarity

In order for two objects to be the same shape, they may differ in size, but in all respects each object must be an exactly scaled-up or scaled-down version of the other. When considering sets of interlandmark distances (as log se_m and STET do), shape can only be the same if all of the measurements in the same dimension (e.g., all linear) for a set of variables in one object are equal to the corresponding measurements in the other object, multiplied by a scaling constant such that:

$$y = mx$$

where m is a constant value (i.e., all measurements scale isometrically with each other; note that y and x are not logged in this example) (Jungers, 1985; Jungers et al., 1995). In this case, y represents the measurements for a set of variables from one cranium (e.g., cranial length, cranial width, cranial height, etc.) and x represents the corresponding measurements in another cranium. Note that this shape similarity only applies to those parts of the crania that have been measured and included in the analysis.

With the addition of an error term (ε) to the right hand side of the equation, this becomes the equation for a least squares regression constrained to pass through the origin:

$$y = mx + \varepsilon$$

A quantification of deviations from this regression model could arguably be interpreted as a metric of shape dissimilarity. However, log se_m is not based on this regression model, but rather on one that also includes an intercept term, as follows:

$$y = mx + c + \varepsilon$$

where c is the Y-intercept (Thackeray et al., 1997).

Measuring deviations from such a regression does not measure deviation from shape similarity, as could be argued to be the case when using a regression constrained to pass through the origin. Instead, when an intercept term is included, points that fall exactly on a regression line may be drawn from two very different shapes. Fig. 2 illustrates an extreme example. Another way to think of the importance of the distinction between regression models that include an intercept and those that do not is to consider what happens when data points fall on the intercept itself. For two crania that have data points that all fall exactly on a regression line with a non-zero intercept, and thus would be considered identical in shape according to log se_m and STET, it would be possible for a measurement to have a zero value for one specimen and a non-zero value for the other (e.g., anterior projection of the supraorbital torus in various hominin crania). This is what occurs at the Y-intercept when it is not zero. Clearly two crania that produced such a pattern would not be identically shaped. Because log se_m and STET are both based on the standard error of the slope for regression models that include an intercept, they are measures of deviation from regressions that may or may not describe shape similarity, depending on whether or not the intercept is exactly zero.

Figure 2. Comparison of homologous measurements for two idealized primate crania (in profile). Note the difference in shape between the monkey-like cranium on the left and the australopith-like cranium on the right. Despite the different shapes of these crania, when the four highlighted measurements are plotted against each other (as non-transformed, not logged data), all four data points fall exactly on a regression line that does not pass through the origin (solid diagonal line) although they do not fall on a regression line that is constrained to pass through the origin (dashed diagonal line). According to log se_m and STET, the shape of these two crania would be considered to be identical.

Table 2. Properties of the three dissimilarity metrics.

| Dissimilarity metric | Lower limit | Value indicating identical shape | Upper limit | Value stays the same if X and Y crania are swapped? |
|---|---|---|---|---|
| log se_m | −infinity | −infinity (a) | infinity | No |
| STET | 0 | 0 (a) | infinity | Yes |
| s_LR | 0 | 0 | infinity | Yes |

(a) The values indicating identical shape for log se_m and STET only apply if the regression intercept is zero. Otherwise, this value simply indicates that all data points plot exactly on a regression line, not that the two crania are identical in shape.
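The intercept problem can be made concrete with a small numerical example in the spirit of Fig. 2. The sketch below is illustrative only, with hypothetical measurement values and function names: four paired measurements lie exactly on a line with a non-zero intercept, so a regression with an intercept leaves no residual scatter (se_m = 0, log se_m infinitely negative), even though the ratios between homologous measurements vary widely and the points do not fit a line through the origin.

```python
def rss_with_intercept(x, y):
    """Residual sum of squares for an OLS fit of y on x that includes an intercept."""
    k = len(x)
    mean_x = sum(x) / k
    mean_y = sum(y) / k
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))

def rss_through_origin(x, y):
    """Residual sum of squares for a least squares fit constrained to y = m * x."""
    slope = sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)
    return sum((yi - slope * xi) ** 2 for xi, yi in zip(x, y))

# Hypothetical measurements (mm): specimen_2 = 0.5 * specimen_1 + 20 exactly.
specimen_1 = [20.0, 40.0, 80.0, 160.0]
specimen_2 = [0.5 * v + 20.0 for v in specimen_1]

# Under identical shape every ratio would be the same constant; here they
# range from 1.5 down to 0.625.
ratios = [b / a for a, b in zip(specimen_1, specimen_2)]

rss_free = rss_with_intercept(specimen_1, specimen_2)
rss_origin = rss_through_origin(specimen_1, specimen_2)
```

Because the standard error of the slope is proportional to the square root of the residual sum of squares, a zero residual sum of squares in the intercept model corresponds to the "identical shape" value in Table 2 despite the clearly non-isometric ratios.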

One possible way these methods might be adjusted to correct this particular problem is by calculating the standard error of the slope from a regression constrained to pass through the origin. However, a further problem is introduced by the presence of heteroscedasticity. Fig. 3 plots the 56 linear measurements of Howells’ (1996) data set for 5000 randomly selected pairs of modern human crania. As shown in this figure, larger measurements tend to be more variable than smaller measurements. Such heteroscedasticity (unequal variation across the range of data in a regression) violates a primary assumption of least squares regression because values at the high end of the X-axis tend to have greater leverage than values at the low end, thereby introducing bias into estimates of the slope and thus the standard error of the slope. In other words, larger measurements (e.g., maximum cranium length) have a greater impact on both the slope and the standard error of the slope than do smaller measurements (e.g., interorbital breadth) simply because they have higher absolute values and thus higher absolute levels of variation, even though their relative levels of variation may be the same or less than those of smaller measurements. This is an inherent problem for both log se_m and STET.

A related measure of shape dissimilarity that avoids this problem can be developed through consideration of the question of scaling in logarithmic space, where by definition, two sets of log-transformed homologous measurements from objects that are identical in shape (but may differ in size) will scale isometrically with each other and thus fall on a regression line with a slope equal to one (Jungers, 1985). Log-transforming variables addresses the leverage problem associated with large measurements because differences in logged measurements represent proportional differences in raw data, not absolute differences, and thus all variables that have the same level of relative variation (such as measured by coefficients of variation) will have equal variability when logged, regardless of the difference of scale for the raw variables. As shown in Appendices A and B, the standard deviation of the logged ratios of paired homologous measurements is mathematically equivalent to the standard deviation of residuals from a regression of log-transformed values with a slope equal to one; this metric (1) measures relative variation as opposed to absolute variation and thus does not have the leverage problems associated with log se_m and STET, and (2) accurately measures deviations from isometry (i.e., from shape similarity), with larger values indicating greater difference between shapes. We refer to this measure of dissimilarity between two objects as the Standard Deviation of Logged Ratios, which we abbreviate as s_LR (calculation given in Appendix, Equation B.25). While s_LR can be calculated using any base for the logarithm, we use base 10 logarithms for ease of comparison with log se_m.
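A minimal sketch of s_LR as described above follows. It is illustrative only: Equation B.25 is not reproduced in this section, so the use of the sample standard deviation (k − 1 denominator) is an assumption, and the function names and measurement values are hypothetical. The second function computes the standard deviation of residuals from a slope-one regression in log space so that the stated equivalence can be checked numerically.

```python
import math
import statistics

def s_lr(x, y):
    """Standard deviation of the base 10 logged ratios of paired measurements.

    Assumes the sample standard deviation (k - 1 denominator).
    """
    logged_ratios = [math.log10(yi / xi) for xi, yi in zip(x, y)]
    return statistics.stdev(logged_ratios)

def slope_one_residual_sd(x, y):
    """SD of residuals from log10(y) = log10(x) + c, i.e., slope fixed at one."""
    differences = [math.log10(yi) - math.log10(xi) for xi, yi in zip(x, y)]
    # With the slope fixed at one, the least squares intercept c is the
    # mean of the logged differences.
    intercept = statistics.mean(differences)
    residuals = [d - intercept for d in differences]
    return math.sqrt(sum(r * r for r in residuals) / (len(residuals) - 1))

# Hypothetical homologous measurements (mm) for two crania.
cranium_a = [10.0, 25.0, 40.0, 95.0, 130.0, 180.0]
cranium_b = [12.0, 27.5, 41.0, 101.0, 139.0, 186.0]
scaled_copy = [2.0 * v for v in cranium_a]    # same shape, twice the size

dissimilarity = s_lr(cranium_a, cranium_b)
swapped = s_lr(cranium_b, cranium_a)          # axes exchanged
residual_sd = slope_one_residual_sd(cranium_a, cranium_b)
same_shape = s_lr(cranium_a, scaled_copy)     # isometric pair
```

Swapping the two crania only changes the sign of each logged ratio, which leaves the standard deviation unchanged, and an exactly scaled copy yields a value of zero, consistent with the properties listed for s_LR in Table 2.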

Comparing shape dissimilarity metrics

A comparison of some of the basic properties of the three dissimilarity metrics is provided in Table 2. Leaving aside for the moment the question of whether or not log sem and STET actually measure deviations from shape similarity, it is worthwhile to consider the stability of the three metrics across different values of k, the number of variables used in the calculation of dissimilarity, given that a log sem value of −1.61 has been argued to be a biological constant for the amount of variation between two conspecific crania regardless of how many measurements are considered (Thackeray, 2007). Within a single comparison between two crania, any number of variables greater than two can be compared, with no theoretical upper limit. However, the ability of shape dissimilarity metrics to capture shape information related to taxonomic differences has been observed to improve as the number of variables increases, as one might expect (Lee, 2011). Here we compare log sem, STET, and sLR across a wide range of k (the number of variables) to address variation in dissimilarity values for the same pair of crania as k increases.

This analysis used a single pair of crania in the calculation of all three dissimilarity metrics across the values of k: one male and one female modern human cranium from the same population (Norse) in the Howells (1996) data set (Howells cranial specimens numbers 32 and 57, the crania whose measurements are shown in Fig. 1). A total of 56 linear variables were available for both crania. An iterative procedure was implemented in which subsets of variables of sample size k were selected from the full set of 56 variables. We increased k from 10 to 54 variables in increments of two. At each value of k, 5000 subsets of variables were randomly selected. For example, when k = 10, one subset might include glabella-occipital length, naso-occipital length, and eight other variables; a second subset might include nine of those same variables but replace naso-occipital length with palate breadth. The three dissimilarity metrics under consideration (STET, log sem, and sLR) were calculated for each subset of variables. In all cases, female cranial measurements were on the X-axis and male measurements were on the Y-axis. Thus variation in log sem was due solely to the variables selected for inclusion and not due to the male sometimes being placed on the X-axis and sometimes on the Y-axis; as a rule, values for STET and sLR do not vary with respect to which cranium is placed on the Y-axis (Table 2).
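The iterative procedure above might be sketched as follows for sLR alone. The measurements are randomly generated stand-ins rather than the Howells data, only 200 subsets are drawn per value of k (instead of 5000) to keep the example quick, and the names `resample_metric` and `means_by_k`, along with the n − 1 denominator in `s_lr`, are assumptions made for illustration.

```python
import math
import random
import statistics

def s_lr(a, b):
    """Standard deviation of log10 ratios (assumed n - 1 denominator)."""
    return statistics.stdev(math.log10(bi / ai) for ai, bi in zip(a, b))

def resample_metric(a, b, k, n_subsets, metric, rng):
    """Metric values for n_subsets random k-variable subsets of one pair."""
    indices = range(len(a))
    values = []
    for _ in range(n_subsets):
        chosen = rng.sample(indices, k)   # k distinct variables, no replacement
        values.append(metric([a[i] for i in chosen], [b[i] for i in chosen]))
    return values

rng = random.Random(0)
# Hypothetical stand-ins for 56 paired linear measurements (mm).
female = [rng.uniform(10.0, 190.0) for _ in range(56)]
male = [v * 1.05 * 10 ** rng.gauss(0.0, 0.03) for v in female]

# k runs from 10 to 54 in increments of two, as in the text.
means_by_k = {
    k: statistics.mean(resample_metric(female, male, k, 200, s_lr, rng))
    for k in range(10, 55, 2)
}
for k, mean_value in means_by_k.items():
    print(k, round(mean_value, 4))
```

Passing the metric in as a function is a design choice of the sketch: the same loop could be reused for other pairwise metrics once they are defined.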

The results of applying this procedure to measurements drawn from the same pair of modern human crania are shown in Fig. 4. The observed value of log sem, STET, or sLR for the full set of 56 variables is shown as a gray line in each plot. Means were calculated for each set of 5000 shape dissimilarity values for every value of k, and those means are indicated by the white lines in Fig. 4. Mean dissimilarity values decrease as the number of variables increases for both log sem and STET (Fig. 4a and b), despite the fact that all dissimilarity values are calculated for the same pair of crania. This is an undesirable property of these metrics, although predictable because the standard error of the slope upon which they are based is directly proportional to $1/\sqrt{k-2}$. In contrast, mean sLR is essentially stable across all values of k (Fig. 4c).

Figure 4. Comparison of log sem (a and d), STET (b and e), and sLR (c and f) values for the same pair of modern human crania when subsets of variables (k = 10 to k = 54 in increments of two variables) are chosen from the full set of 56 linear variables. For a given number of variables (k), 5000 subsets containing k variables were selected from the full data set. Gray lines track the value of log sem, STET, or sLR for the full set of 56 variables; white lines follow the mean of 5000 randomly sampled sets of variables at intervals increasing from 10 to 54 variables. Top plots (a–c) scale the Y-axis to the range for each metric to identify trends; bottom plots (d–f) plot all metrics in the same scale to show relative variability. Note that STET values are logged in plot e so that they may be plotted in the same units as log sem and sLR. In contrast with log sem and STET, sLR values are stable across all values of k (no appreciable trend in mean with increasing sample size) and they exhibit a relatively low level of variability at any value of k.

In order to compare relative variability, log sem, STET, and sLR are plotted in the same scale for direct comparison (lower row of Fig. 4). Both log sem and sLR are measured in log10 ratio units (log sem being the base 10 logarithm of a standard deviation for the slope of a regression line, which is a ratio, and sLR being the standard deviation of the base 10 logarithm of a ratio); STET is measured in ratio units, so base 10 logarithms of STET values were plotted in Fig. 4e. When viewed at the same scale (Fig. 4d–f), it is apparent that log sem and STET values exhibit much higher levels of variability at all values of k than does sLR. This variability is due at least in part to the nature of the regression model used: when relatively few variables are included (i.e., when k is low) there is greater variability in the placement of the Y-intercept, and thus greater variability in how accurately the regression slope (against which measurements are being compared as a measure of dissimilarity) describes shape equivalence. In contrast, sLR always compares measurements against shape equivalence as measured by isometric scaling, and thus its variability is low compared with the other two metrics. However, there is also a further slight decrease in variability of sLR as k increases due to an increasing amount of information about overall shape as more variables are included and an increasing overall similarity between data sets in terms of the specific k measurements being compared (Fig. 4c and f). This phenomenon is probably also partially responsible for the decrease in variability of log sem and STET at higher values of k.
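The dependence of log sem and STET on k can be traced to the textbook expression for the standard error of an ordinary least squares slope, shown here as a worked equation. The notation ($b$ for the fitted slope, $\hat{y}_i$ for fitted values, $\bar{x}$ for the mean of the X-axis measurements) is introduced for illustration and is assumed, not confirmed, to match the regression model defined earlier in the paper.

```latex
\mathrm{SE}(b) \;=\; \frac{1}{\sqrt{k-2}}\,
\sqrt{\frac{\sum_{i=1}^{k}\left(y_i-\hat{y}_i\right)^{2}}
           {\sum_{i=1}^{k}\left(x_i-\bar{x}\right)^{2}}}
```

The leading factor shrinks as more variables are added even when the pair of crania is unchanged. sLR, being a standard deviation rather than a standard error, carries no such factor.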

As illustrated above, sLR exhibits both relatively low variability and relative stability in central tendency with respect to the number of variables included in its calculation. In contrast, log sem and STET are both highly variable and have central tendencies that are strongly influenced by the number and the nature of the variables used in their calculation. These properties, combined with the observation that log sem and STET do not actually measure deviation from shape similarity in most cases, whereas sLR always does, mean that sLR should be favored over log sem and STET as a measure of shape dissimilarity when comparing sets of homologous interlandmark distances between any pair of objects (e.g., crania). However, this should not be construed as an endorsement of the use of any of these three metrics for addressing questions of alpha taxonomy in general, nor conspecificity for pairs of fossils in particular.

Does shape similarity imply conspecificity?

To empirically address the question of how well dissimilarity metrics identify whether or not two crania are conspecific, dissimilarity values (sLR, log sem, and STET) were first calculated for all pairs of crania within and between species in the five species data set. For the 212 crania in the comparative data set there are 22,366 unique pairs of crania, of which 5132 are conspecific pairs and 17,234 are pairs of crania belonging to different species (all of which belong to different genera). As described above, samples were selected such that inter-population variation was minimized within each species sample. This should result in relatively low measures of dissimilarity within each species sample. As a consequence, dissimilarity values for pairs of crania belonging to different species are more likely to be outside the range of conspecific values than if species samples had included more variation (as might result from the inclusion of multiple subspecies, e.g., Pilbrow, 2006, 2010). Stated in the framework of hypothesis testing used by the proponents of these methods (ignoring for the moment whether such a framework is justified on theoretical grounds), the sample has been selected to minimize variation within species and thus minimize the occurrence of Type II error; i.e., failure to reject the null hypothesis that two crania belong to the same species when in fact they belong to two different species.
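The pair counts quoted above can be checked with a short worked equation: the number of unique pairs among 212 crania is the binomial coefficient, and it equals the sum of the conspecific and non-conspecific counts given in the text.

```latex
\binom{212}{2} \;=\; \frac{212 \times 211}{2} \;=\; 22{,}366 \;=\; 5132 + 17{,}234
```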

As expected, mean dissimilarity values are lower for comparisons within species than between genera (Table 3). Also, as one might predict, the highest intraspecific dissimilarity values tend to occur in comparisons of male and female crania in species with high degrees of sexual dimorphism in shape (e.g., Gorilla gorilla and Pongo pygmaeus; Table 3, Fig. 5).

Table 3. Summary of dissimilarity values calculated for all unique pairs in the intergeneric comparative crania data set.

| Taxon | Sexes | sLR Mean | sLR Range | sLR S.D. | STET Mean | STET Range | STET S.D. | log sem Mean | log sem Range | log sem S.D. |
|---|---|---|---|---|---|---|---|---|---|---|
| Colobus guereza | F–F | 0.030 | (0.019–0.049) | 0.006 | 2.62 | (1.77–4.07) | 0.501 | −1.74 | (−1.92 to −1.54) | 0.084 |
| | M–M | 0.033 | (0.019–0.050) | 0.007 | 3.20 | (1.50–4.66) | 0.655 | −1.65 | (−1.98 to −1.48) | 0.097 |
| | F–M | 0.036 | (0.017–0.058) | 0.007 | 3.53 | (1.97–5.72) | 0.723 | −1.65 | (−1.89 to −1.45) | 0.086 |
| Homo sapiens | F–F | 0.040 | (0.018–0.072) | 0.010 | 3.25 | (1.65–5.07) | 0.641 | −1.65 | (−1.95 to −1.45) | 0.089 |
| | M–M | 0.037 | (0.014–0.075) | 0.009 | 2.99 | (0.827–5.50) | 0.654 | −1.69 | (−2.24 to −1.43) | 0.097 |
| | F–M | 0.040 | (0.017–0.082) | 0.010 | 3.21 | (1.52–5.48) | 0.645 | −1.68 | (−2.00 to −1.42) | 0.090 |
| Pan troglodytes | F–F | 0.036 | (0.021–0.056) | 0.007 | 3.45 | (2.14–5.68) | 0.570 | −1.62 | (−1.83 to −1.37) | 0.077 |
| | M–M | 0.037 | (0.021–0.057) | 0.007 | 3.70 | (2.37–5.45) | 0.536 | −1.58 | (−1.78 to −1.41) | 0.063 |
| | F–M | 0.042 | (0.023–0.067) | 0.009 | 4.14 | (2.30–6.38) | 0.794 | −1.56 | (−1.81 to −1.34) | 0.089 |
| Gorilla gorilla | F–F | 0.035 | (0.019–0.062) | 0.008 | 3.96 | (2.21–6.61) | 0.792 | −1.56 | (−1.82 to −1.33) | 0.085 |
| | M–M | 0.045 | (0.021–0.076) | 0.011 | 5.11 | (2.57–8.35) | 1.20 | −1.45 | (−1.77 to −1.18) | 0.113 |
| | F–M | 0.056 | (0.024–0.088) | 0.012 | 6.44 | (2.86–10.4) | 1.51 | −1.46 | (−1.75 to −1.27) | 0.092 |
| Pongo pygmaeus | F–F | 0.052 | (0.024–0.102) | 0.016 | 5.04 | (2.59–10.3) | 1.51 | −1.48 | (−1.75 to −1.18) | 0.117 |
| | M–M | 0.067 | (0.030–0.128) | 0.020 | 6.33 | (2.72–13.4) | 1.98 | −1.40 | (−1.71 to −1.09) | 0.117 |
| | F–M | 0.065 | (0.032–0.127) | 0.016 | 6.66 | (3.03–13.5) | 1.84 | −1.40 | (−1.68 to −1.13) | 0.097 |
| Mixed genera | F–F | 0.107 | (0.037–0.169) | 0.030 | 11.9 | (4.37–23.0) | 4.14 | −1.18 | (−1.76 to −0.840) | 0.222 |
| | M–M | 0.126 | (0.035–0.199) | 0.032 | 14.3 | (4.44–24.6) | 4.10 | −1.10 | (−1.78 to −0.800) | 0.233 |
| | F–M | 0.117 | (0.031–0.199) | 0.031 | 13.4 | (3.48–24.6) | 4.21 | −1.13 | (−1.74 to −0.800) | 0.231 |

Means, ranges, and standard deviations are reported for each dissimilarity metric. Comparisons are summarized within each combination of sexes (i.e., comparisons between pairs of females, pairs of males, or one female and one male) and within each species. Comparisons between crania belonging to different species are summarized within each combination of sexes (last three rows of the table).

Figure 5. Summary of log sem (a), STET (b), and sLR (c) values for all pairwise cranial comparisons within and between species and sexes in the broad-scale comparative sample, as well as Y-intercepts for ordinary least squares regressions of raw data (d). Abbreviations: F–F, comparisons between pairs of female crania; M–M, comparisons between pairs of male crania; F–M, comparisons between one female and one male cranium. Box plots indicate range of values following R defaults (vertical lines in boxes indicate medians, boxes show interquartile range, whiskers extend to the most extreme value that falls within 150% of the interquartile range, and circles are values that fall outside of that range). Species are arranged from top to bottom as mean sLR values for female-male comparisons increase. All comparisons between two crania belonging to different species (and thus different genera in this sample) are grouped together at the bottom of the figure. Vertical dashed line indicates the value of log sem, STET, or sLR that is greater than or equal to 95% of intraspecific comparisons. Note that Y-intercepts can be quite different from zero, in which case the regression lines used by log sem and STET do not describe shape similarity.

Table 4. Summary of dissimilarity values calculated for all unique pairs in the intrageneric comparative crania data set.

| Taxon | Sexes | sLR Mean | sLR Range | sLR S.D. | STET Mean | STET Range | STET S.D. | log sem Mean | log sem Range | log sem S.D. |
|---|---|---|---|---|---|---|---|---|---|---|
| Pan paniscus | F–F | 0.042 | (0.022–0.071) | 0.011 | 3.63 | (2.22–5.92) | 0.779 | −1.61 | (−1.81 to −1.36) | 0.087 |
| | M–M | 0.040 | (0.020–0.076) | 0.012 | 3.48 | (1.99–5.60) | 0.742 | −1.62 | (−1.84 to −1.41) | 0.095 |
| | F–M | 0.041 | (0.019–0.077) | 0.011 | 3.57 | (2.17–6.05) | 0.754 | −1.61 | (−1.84 to −1.32) | 0.094 |
| Pan troglodytes | F–F | 0.043 | (0.025–0.071) | 0.009 | 3.57 | (2.03–6.35) | 0.795 | −1.62 | (−1.86 to −1.35) | 0.100 |
| | M–M | 0.043 | (0.022–0.070) | 0.010 | 3.69 | (2.11–6.07) | 0.898 | −1.60 | (−1.85 to −1.35) | 0.109 |
| | F–M | 0.045 | (0.023–0.081) | 0.011 | 3.88 | (2.07–7.28) | 1.03 | −1.60 | (−1.86 to −1.28) | 0.116 |
| Mixed species | F–F | 0.055 | (0.023–0.094) | 0.013 | 5.32 | (2.49–8.43) | 1.19 | −1.49 | (−1.82 to −1.28) | 0.096 |
| | M–M | 0.060 | (0.033–0.091) | 0.013 | 5.68 | (3.28–9.09) | 1.16 | −1.48 | (−1.69 to −1.32) | 0.081 |
| | F–M | 0.058 | (0.029–0.096) | 0.013 | 5.54 | (2.38–9.32) | 1.23 | −1.48 | (−1.81 to −1.27) | 0.089 |

Means, ranges, and standard deviations are reported for each dissimilarity metric.

An exact randomization approach was used to develop a significance test that proponents of these methods would describe as a test of the null hypothesis of conspecificity. More accurately, it is a test to determine the probability that the level of shape dissimilarity observed in one pair of crania could be sampled from randomly drawn pairs of crania known to be conspecific. For sLR and STET, the 5132 values for all unique pairs of conspecific crania were ordered from smallest to largest. For log sem, 10,264 values were calculated because the value of log sem differs depending on which cranium is placed on the Y-axis. The value that was greater than or equal to 95% of all conspecific values was identified for each dissimilarity index. Under an exact randomization test, dissimilarity values between pairs of test crania that are larger than this value would have p-values less than 0.05 for tests of the null hypothesis. This threshold corresponding to alpha = 0.05 for each dissimilarity metric is plotted as a vertical dashed line in Fig. 5a–c.
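The threshold and p-value described above could be sketched as follows. The reference values are synthetic placeholders rather than the 5132 conspecific values from the study, and the function names, the rank convention used for the 95% cut-off, and the "at least as large" definition of the p-value are assumptions made for illustration.

```python
import math

def threshold_95(conspecific_values, fraction=0.95):
    """Smallest observed value that is >= `fraction` of the reference values."""
    ordered = sorted(conspecific_values)
    rank = math.ceil(fraction * len(ordered))   # 1-based rank in the ordering
    return ordered[rank - 1]

def randomization_p(observed, conspecific_values):
    """Proportion of reference values at least as large as the observed one."""
    hits = sum(1 for v in conspecific_values if v >= observed)
    return hits / len(conspecific_values)

# Synthetic stand-in for dissimilarity values of known conspecific pairs.
reference = [i / 1000 for i in range(1, 101)]

cutoff = threshold_95(reference)
print(cutoff)
print(randomization_p(0.0955, reference))   # a test pair just above the cutoff
```

Note that the function only reports how unusual a dissimilarity value is relative to the chosen reference sample; it makes no statement about species membership.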

Dissimilarity metrics calculated for pairs of crania drawn from different genera were then compared against the threshold values. For sLR, 10.1% of comparisons between non-congeners have lower sLR values (i.e., are more similar in shape) than the highest 5.0% of conspecific sLR values (Fig. 5c). If we accept for the moment the interpretation that this is a test of the null hypothesis of conspecificity, then these results show that a significance test with alpha = 0.05 for comparison of crania that belong to different genera would fail to reject the null hypothesis that they fall within the same species 10.1% of the time (i.e., high Type II error). Comparisons between non-congeners would fail to reject the null hypothesis that they come from the same species for 13.1% of cranial pairs using STET (Fig. 5b), and for 24.9% of cranial pairs using log sem (Fig. 5a). The Type II error rate for log sem is nearly twice that of STET because of the great disparity in the two possible log sem values for many comparisons between non-congeners (as the value depends on which cranium is on the Y-axis). In cases of such disparity, the STET value effectively averages the two log sem values and is generally higher than the STET threshold value, while one of the two log sem values is often below the log sem threshold. Furthermore, it is clear that the Y-intercept of ordinary least squares (OLS) regressions of the raw data is rarely equal to zero (Fig. 5d). As explained and illustrated in Fig. 2, regressions of raw data must pass through the origin in order to describe isometry. Therefore, log sem and STET are usually not measuring deviations from shape similarity, regardless of whether the two crania being compared are conspecific or not.

Figure 6. Summary of log sem (a), STET (b), and sLR (c) values for all pairwise cranial comparisons within and between species and sexes in the Pan comparative sample, as well as Y-intercepts for ordinary least squares regressions of raw data (d). Abbreviations and box plot details are as in Fig. 5. Vertical dashed line indicates the value of log sem, STET, or sLR that is greater than or equal to 95% of intraspecific comparisons. Approximately half or more of comparisons between one bonobo cranium and one chimpanzee cranium have lower dissimilarity values than that threshold.

It should also be noted that the highest sLR values for non-congeners tend to be for comparisons between pairs of crania that obviously belong to different species, such as comparisons between orangutans and black-and-white colobus monkeys. Therefore, it is likely that the percentage of Type II errors would be considerably higher for sets of cranial pairs that do belong to separate species, but which are similar enough that they could plausibly be thought to potentially belong to the same species (and thus would be likely to be compared using one of these methods). Not coincidentally, these are precisely the types of comparisons in which these methods have been employed in analyses of the hominin fossil record. In order to address this question, we performed the same type of analysis using a sample of bonobo and common chimpanzee cranial measurements (Table 4, Fig. 6). As expected, many comparisons between one bonobo cranium and one common chimpanzee cranium resulted in dissimilarity values well within the range of conspecific comparisons, and a large percentage of interspecific comparisons were not more dissimilar than 95% of conspecific comparisons: 53.5% of log sem values (Fig. 6a), 46.3% of STET values (Fig. 6b), and 69.3% of sLR values (Fig. 6c) fell below the 95% threshold. If one believes these to be ‘tests of the null hypothesis of conspecificity,’ then they fail to reject that null hypothesis nearly half the time or more when comparing crania that actually belong to different species. Even if these types of analyses could be argued to test such a null hypothesis, a position we do not endorse (see below), it is important to remember that failure to reject a null hypothesis is quite different from demonstrating the null hypothesis to be true.
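The percentages reported above are, in effect, the share of between-species comparisons that do not exceed the conspecific threshold. A minimal sketch of that tally follows; the dissimilarity values and threshold are hypothetical, and the function name `fail_to_reject_rate` is an assumption.

```python
def fail_to_reject_rate(between_species_values, threshold):
    """Share of between-species pairs not exceeding the conspecific threshold."""
    not_exceeding = sum(1 for v in between_species_values if v <= threshold)
    return not_exceeding / len(between_species_values)

# Hypothetical dissimilarity values for pairs known to span two species,
# and a hypothetical 95% threshold from a conspecific reference sample.
between_species = [0.03, 0.05, 0.07, 0.09]
conspecific_threshold = 0.06

print(fail_to_reject_rate(between_species, conspecific_threshold))
```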

Although proponents of these methods discuss significance tests in terms of measuring the probability of conspecificity for a pair of crania, this is not an accurate characterization of these tests. More accurately, for a given set of measurements, they calculate the probability of randomly selecting from a particular sample of conspecifics a pair of crania that shows as much, or more, deviation from shape similarity as the pair of test crania. However, for any particular set of measurements some species will be more variable in shape than others (see Table 3), meaning that there is no single value that is shared among closely related taxa that one can point to as a standard amount of shape dissimilarity that is present within a species, let alone a biological constant for all vertebrates (contra Thackeray, 2007). Furthermore, the variability in shape present in a particular set of measurements for any given analysis may have little or nothing to do with the shape variation relevant to species differences, in that some measurements will exhibit high intraspecific variation relative to interspecific variation (e.g., Wood et al., 1991). In the absence of a priori knowledge that a particular set of interlandmark measurements is more variable between species than within species for the set of taxa under consideration, it is difficult, if not impossible, to justify the interpretation that a significance test as performed above constitutes a test that informs on conspecificity as opposed to simply shape similarity for a specific set of measurements. It is worth re-emphasizing that even if a method reliably compares the shape differences between members of the same, or different taxa, those shape differences are a post-hoc consequence of alpha taxonomic decisions that were based on a much wider range of evidence. Shape differences are emergent properties of taxonomic decisions. Thus, even if a method is effective at quantifying shape differences, it is logically fallacious to conclude that the method can be converted into a tool for making reliable taxonomic decisions.

Conclusions

We have demonstrated that because of the regression model used by previous investigators, neither log sem nor STET reliably measure dissimilarity from shape equivalence. Furthermore, the value calculated for either of these metrics for any particular pair of crania has been shown to be highly dependent on the number of variables involved in the comparison (as shown in the trends across different values of k in Fig. 4), and on the specific set of variables considered (as shown by the high variation in log sem and STET for any particular value of k in Fig. 4). We offer the general caution that any new metric should first be shown from theoretical and empirical perspectives to measure what it purports to measure.¹ We have done so with a new metric (sLR) that more reliably measures deviations from shape equivalence for pairs of objects as measured by deviations from isometric similarity, and we suggest that the sLR should be used instead of log sem or STET if the goal is to quantify morphological dissimilarity between two fossils as represented by overlapping sets of interlandmark distances. However, even when one is able to accurately measure shape dissimilarity, this rarely, if ever, translates into being able to reliably determine whether pairs of crania are conspecific.

¹ Another recent instance of the lack of theoretical validation is in the application of yet another slightly different metric, the Average Index of Relative Difference, introduced by Van Arsdale and Wolpoff (2013) to test the single lineage hypothesis in Homo. That metric purports to measure size variation between pairs of crania, but it can be shown that two isometrically scaled crania of different sizes would produce the same non-zero value for that metric as two very differently-shaped crania of equal size.

With regard to the ability of any dissimilarity metric to determine whether pairs of crania are conspecific, it is not altogether surprising that significance tests based on these metrics perform so poorly. Although central tendencies of shape often differ significantly among species, many morphological analyses of closely-related primate taxa show overlap between specimens of different species regardless of whether the morphology in question is dental, cranial, or postcranial (e.g., Pan, 2006; Richmond and Jungers, 2008; Baab and McNulty, 2009). For example, in a study of cranial shape in fossil hominins we found excellent distinction between ‘grades’ of hominins (gracile and robust australopiths, earlier Homo, and later Homo), but found that Homo habilis sensu lato specimens do not cluster together as a unit distinct from all Homo erectus s.l. crania. Instead, some H. habilis s.l. crania are more similar in shape to some H. erectus s.l. crania than to other ‘habilines,’ and some H. erectus s.l. crania are more similar in shape to some ‘habilines’ than to other ‘erectines’ (Gordon et al., 2008). This overlap is precisely why pair-based methods might be expected to perform poorly, as illustrated here by the analysis of Pan paniscus and P. troglodytes.

Not surprisingly, published studies employing log sem or STET often find no significant difference between pairs of fossil hominin crania (e.g., Wolpoff and Lee, 2001, 2006; Lee and Wolpoff, 2005, 2007). For example, when log sem was used to compare the Australopithecus sediba cranium MH1 against crania from Au. africanus and Homo habilis, the ‘null hypothesis of conspecificity’ was not rejected in either case (Thackeray, 2010). These results do not indicate that all such comparisons are actually occurring between conspecifics, but rather that pairwise techniques are ill-suited to address questions of alpha taxonomy. Furthermore, in a comparison of log sem with three other pairwise techniques used to attempt to determine conspecificity in pairs of hominin crania, Aiello and colleagues found that each of the four methods identified different sets of cranial pairs that were most likely to be conspecific, and different sets of cranial pairs that were most likely to belong to different species (Aiello et al., 2000), further confirming the abysmally poor performance of pairwise techniques in this arena.

When addressing questions of alpha taxonomy, rather than using pairwise comparison techniques that are particularly susceptible to overlap in shape space between species (even if those techniques accurately measure dissimilarity), it is more appropriate to use techniques that allow for the consideration of multiple crania at the same time. For example, hierarchical techniques such as cluster analysis (CA) allow for the grouping of shape-similar specimens independent of any a priori knowledge of taxonomy for any specimen, and discriminant function analysis (DFA) allows for the comparison of unknown specimens with groups of known specimens based on those traits that exhibit the greatest between-group variability relative to within-group variability. It is also important to note that regression-based dissimilarity metrics cannot take into account non-metric traits, whereas techniques such as CA and DFA can by including such traits using binary dummy variables. These latter techniques are not without shortcomings, most notably the need for all variables to be represented for all specimens in an analysis (a tall order for fossil samples, and probably one of the primary reasons that the pairwise methods evaluated here were developed in the first place). However, methods such as CA and DFA can and should be extended to address the missing data problem so that assessment of taxonomic identity for fossil crania of unknown species can be based on comparison with groups of fossils rather than individual specimens.

Does this mean that pairwise dissimilarity metrics based on comparisons of paired landmarks should not be used under any circumstances? Despite the fact that publications that use these types of methods to assess conspecificity continue to be added to the literature (e.g., Thackeray and Odes, 2013; Van Arsdale and Wolpoff, 2013), the results of this study indicate that none of the methods discussed above are generally appropriate for answering the question of whether two fossil hominin crania are likely to be conspecific or not. A possible exception might be made in the rare situation in which researchers have compelling a priori reasons to believe that the particular set of measurements to be compared between two fossil crania should be much more variable between closely-related species than within species, and they also have compelling reasons to believe that their comparative sample preserves a level of intraspecific variation equal to that expected in their fossil species. That said, it is difficult to imagine how one could know the level of intraspecific variability to expect for a species currently represented by a single specimen, for by definition this state of affairs precludes the possibility of estimating intraspecific variation. In any event, taxonomic decisions are not mechanistic and should be informed by a range of morphological evidence that may include, but which is by no means restricted to, morphometric data.

Of course, we can envision situations unrelated to taxonomic questions where it would be useful to have a metric that accurately measures deviations from shape similarity for a pair of specimens and a specific set of measurements. Many questions can only be addressed through the comparison of pairwise distance matrices, such as questions relating to differences associated with physical distance between sites, number of biogeographic barriers between sites, or some other measure. For example, one could imagine investigating whether particular differences in the shape of the neurocranium between specimens are correlated with their geographic distribution. This is a question that would typically be addressed with a Mantel test to compare two pairwise distance matrices: one morphological and one geographical. In this example, one of several possible ways that one could construct a morphological pairwise distance matrix would be to use sLR or another metric that accurately measures deviation from isometric similarity for a particular set of measurements related to the shape of the neurocranium that is shared by all of the specimens in the analysis. In such scenarios, it is not the taxonomic identity of specimens that is under investigation, but rather the relationship of differences in shape to differences in some other variable. Under these conditions, a shape dissimilarity metric such as sLR (rather than log sem or STET) could be particularly useful. But on the basis of the work presented above, we believe that none of these types of dissimilarity metrics should be used to assess the probability of conspecificity in pairs of fossils.
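The Mantel test mentioned above can be sketched as a simple permutation procedure: correlate the off-diagonal entries of the two distance matrices, then repeatedly relabel the specimens in one matrix to build a null distribution. This is a generic illustration, not the authors' analysis; the matrices are synthetic, and the function names, the one-sided p-value, and the default of 999 permutations are assumptions.

```python
import math
import random

def upper_triangle(matrix, order):
    """Entries above the diagonal after relabeling specimens by `order`."""
    n = len(order)
    return [matrix[order[i]][order[j]] for i in range(n) for j in range(i + 1, n)]

def pearson(u, v):
    """Pearson correlation between two equal-length lists."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    var_u = sum((a - mu) ** 2 for a in u)
    var_v = sum((b - mv) ** 2 for b in v)
    return cov / math.sqrt(var_u * var_v)

def mantel(shape_dist, geo_dist, n_perm=999, seed=0):
    """One-sided permutation Mantel test; returns (r, p)."""
    rng = random.Random(seed)
    identity = list(range(len(shape_dist)))
    geo_vec = upper_triangle(geo_dist, identity)
    r_obs = pearson(upper_triangle(shape_dist, identity), geo_vec)
    at_least = 1   # the observed arrangement counts as one permutation
    for _ in range(n_perm):
        order = identity[:]
        rng.shuffle(order)   # relabel rows and columns together
        if pearson(upper_triangle(shape_dist, order), geo_vec) >= r_obs:
            at_least += 1
    return r_obs, at_least / (n_perm + 1)

# Synthetic example: six specimens along a transect; shape distance grows
# with geographic distance plus a small deterministic disturbance.
positions = [0.0, 1.0, 3.0, 6.0, 10.0, 15.0]
n = len(positions)
geo = [[abs(positions[i] - positions[j]) for j in range(n)] for i in range(n)]
shape = [[0.0 if i == j else 0.002 * geo[i][j] + 0.001 * ((i * j) % 3)
          for j in range(n)] for i in range(n)]

r, p = mantel(shape, geo)
print(r, p)
```

In practice the morphological matrix would hold a pairwise metric such as sLR for every pair of specimens; permuting specimen labels, rather than individual distances, preserves the dependence among entries that share a specimen.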

Acknowledgments

Work on this manuscript was made possible by a Wenner-Gren Hunt Postdoctoral Fellowship to ADG and support from the College of Arts and Sciences at the University at Albany. BW was supported by the GW Provost via Signature Program funding. This manuscript also benefited from comments on earlier versions from David Polly, Fred Bookstein, and anonymous reviewers.

Appendix A. Verbal description of sLR

By definition, two sets of homologous measurements from objects that are identical in shape but not necessarily the same size will scale isometrically with each other. It can easily be shown that such shape similarity, where y = mx when original linear measurements (i.e., not log-transformed) are plotted against each other, will result in measurements falling on a line with a slope equal to one when logged data are plotted against each other (Jungers, 1985):

log y ¼ log mþ log x

In log space, $\log m$ is the intercept of the line and it corresponds to the logarithm of the scaling coefficient $m$, which is the proportional size difference between the two specimens. Therefore, an alternative method for measuring shape dissimilarity between pairs of specimens can be developed based on the deviation of specimen measurements from a theoretical line of isometry in log space.

Logarithmically transforming data also removes the undue influence that larger measurements have on regressions of raw data (i.e., leverage due to heteroscedasticity). Differences in logged measurements represent proportional differences in raw data, not absolute differences, and thus all variables that have the same level of relative variation (such as measured by coefficients of variation) will have equal variability when logged, regardless of the difference of scale for the raw variables. For example, a 20% difference in interorbital breadth would be treated exactly the same as a 20% difference in cranial length in terms of its contribution to deviation from shape similarity in log space. When considering how data points might deviate from isometry, there is no reason to believe that larger measurements or smaller measurements as a group should be scaling differently from shape similarity, but rather that measurements associated with particular regions of the cranium (e.g., measurements related to midfacial projection) may deviate from isometry. However, since both relatively large and relatively small measurements may be collected from any given region, these deviations from isometry will be scattered throughout the regression of measurements from one cranium on measurements from another cranium (i.e., some deviations from isometry will be at the low end of the X-axis and others will be at the high end of the X-axis) and thus will not have a predictable effect on the slope. Therefore, rather than testing to see if a regression slope differs from isometry, it is more appropriate to ask how much the paired measurements themselves deviate from an isometric scaling relationship; i.e., the standard deviation of the residuals about a regression line with the slope constrained to one.
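The 20% example can be checked numerically. The short Python sketch below uses made-up measurement values (20 mm and 180 mm are illustrative assumptions, not data from this study) to show that equal proportional differences produce equal differences in log space even though the absolute differences are very unequal.

```python
import math

# Hypothetical raw measurements in millimetres (illustrative values only).
interorbital_breadth = 20.0
cranial_length = 180.0

def log_difference(value, proportional_change):
    """Difference in log10 space produced by scaling `value` by a proportion."""
    return math.log10(value * proportional_change) - math.log10(value)

# A 20% increase applied to a small and a large measurement.
small_log_diff = log_difference(interorbital_breadth, 1.20)
large_log_diff = log_difference(cranial_length, 1.20)

# The corresponding absolute differences in the raw data.
small_raw_diff = interorbital_breadth * 1.20 - interorbital_breadth
large_raw_diff = cranial_length * 1.20 - cranial_length

print(small_log_diff, large_log_diff)  # equal up to floating-point error
print(small_raw_diff, large_raw_diff)  # very different in millimetres
```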

If overall cranial size for each cranium is calculated as the geometric mean of all measurements following Jungers et al. (1995), it can be shown that the best fit estimator of an isometric scaling line for two crania is a line with a slope equal to one that has a Y-intercept equal to the log of the ratio of the cranial size of the Y-axis specimen to the cranial size of the X-axis specimen (Appendix B, Equation (B.14)). Residuals from this line are equal to the log ratio of Mosimann shape variables (Mosimann, 1970) for the Y and X crania; i.e., each residual quantifies the proportional deviation from isometry for that particular variable (Appendix B, Equation (B.22)). These residuals have the additional desirable property that they represent proportional deviations from ratios (Smith, 1999). This is desirable because equal proportional differences in measurements will produce identical residuals in log space, regardless of whether the original measurements are on the order of microns or meters. It is relative not absolute variability that is considered. Equal weight is given to proportional variation, regardless of whether that variation occurs in large measurements like maximum cranial length, or in small measurements such as interorbital breadth.

Residuals from the isometric scaling line in log space may be summarized by their standard deviation. When the standard deviation of residuals is relatively high, there is more deviation from shape similarity for the comparison of the two crania. When the standard deviation is relatively low, there is little deviation from shape similarity. When the standard deviation is equal to zero, then all measurements fall exactly on the line and thus the two crania are identical in shape with respect to the measurements under consideration. It can further be shown that this standard deviation of residuals is the same regardless of which cranium is placed on the Y-axis (Appendix B, Equation (B.24)). Finally, it can be demonstrated that the standard deviation of residuals from the regression of logged measurements against each other is mathematically equivalent to the standard deviation of the logged ratios of the original measurements (Appendix B, Equation (B.23)). Therefore, we refer to this measure of dissimilarity between two objects as the Standard Deviation of Logged Ratios (sLR), calculated as shown in Appendix B, Equation (B.25). As noted in the main text, sLR can be calculated using any base for the logarithm, but we use base 10 logarithms to facilitate comparison with log sem.
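The verbal description above translates into very little code. The following Python sketch is offered as one possible reading of it, not as the authors' own implementation: the function name `slr`, the input checks, and the use of the sample standard deviation (`ddof=1`) are assumptions, and the three crania are invented values chosen only to illustrate the properties described in the text.

```python
import numpy as np

def slr(y, x, ddof=1):
    """Standard Deviation of Logged Ratios (base 10) for two specimens.

    `y` and `x` hold the same homologous linear measurements, in the same
    order, for the two specimens. `ddof=1` (sample SD) is an assumption.
    """
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    if y.ndim != 1 or y.shape != x.shape or y.size < 2:
        raise ValueError("need two equal-length vectors of at least two measurements")
    if np.any(y <= 0) or np.any(x <= 0):
        raise ValueError("measurements must be positive to be log-transformed")
    return float(np.std(np.log10(y / x), ddof=ddof))

# Hypothetical measurements (mm); not data from this study.
cranium_1 = np.array([180.0, 130.0, 95.0, 24.0, 110.0])
cranium_2 = 1.15 * cranium_1                                        # same shape, 15% larger
cranium_3 = cranium_1 * np.array([1.00, 1.10, 0.95, 1.20, 1.05])    # different proportions

print(slr(cranium_2, cranium_1))  # effectively zero: identical shape
print(slr(cranium_3, cranium_1))  # greater than zero: shapes differ
print(slr(cranium_1, cranium_3))  # same value with the specimens swapped
```

In this sketch a pure size difference leaves the value at zero (up to floating-point error), and swapping the two specimens does not change the result, matching the two properties stated above.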

Appendix B. Mathematical derivation of sLR

For any two crania that are identical in shape but not necessarily identical in size, a set of homologous measurements must all scale identically for the two crania such that:

$$y = mx \tag{B.1}$$

where $x$ is the vector of measurements from cranium 1, $y$ is the vector of measurements from cranium 2, and $m$ is a scaling constant describing the size of cranium 2 relative to cranium 1. This relationship between paired measurements for two identically-shaped crania may also be expressed in terms of logged data, as follows:

$$\log y = \log(mx) \tag{B.2}$$

$$\log y = \log m + \log x \tag{B.3}$$

Substituting $X$ for $\log x$, $Y$ for $\log y$, and $b$ for $\log m$ into Equation (B.3) yields:

$$Y = b + X \tag{B.4}$$

With the addition of an error term ($\varepsilon$), Equation (B.4) is identical to an ordinary least squares regression in which the slope has been constrained to equal one:

$$Y = b + X + \varepsilon \tag{B.5}$$

Thus if two shapes are identical, all log-transformed data points will fall precisely on a regression line with an isometric slope (the isometric scaling slope is equal to one because all variables are measured in the same dimension, i.e., all linear measurements), but most pairs of crania will not be exactly the same shape. Deviations from shared identical shape between two crania may be measured by residuals from an isometric scaling relationship. In this case, the regression model is fit by estimating only the intercept, not the slope (which is set equal to one as shown in Equation (B.5)). To derive the estimator of the intercept, we minimize the sum of squared residuals from the regression line. Equation (B.5) can be rewritten as follows:

$$\varepsilon = Y - X - b \tag{B.6}$$

The sum of squared residuals is calculated as shown in Equation (B.7),

$$\sum_{i=1}^{n} \varepsilon_i^2 = \sum_{i=1}^{n} (Y_i - X_i - b)^2 \tag{B.7}$$

where $n$ is the number of pairs of measurements used in the regression. Equation (B.7) can be transformed as follows:

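$$\sum_{i=1}^{n} \varepsilon_i^2 = \sum_{i=1}^{n} \left( Y_i^2 + X_i^2 + b^2 - 2X_iY_i - 2bY_i + 2bX_i \right) \tag{B.8}$$

$$\sum_{i=1}^{n} \varepsilon_i^2 = \sum_{i=1}^{n} \left( Y_i^2 + X_i^2 - 2X_iY_i \right) + \sum_{i=1}^{n} b^2 - \sum_{i=1}^{n} 2b(Y_i - X_i) \tag{B.9}$$

$$\sum_{i=1}^{n} \varepsilon_i^2 = \sum_{i=1}^{n} (Y_i - X_i)^2 + nb^2 - 2b\sum_{i=1}^{n} (Y_i - X_i) \tag{B.10}$$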

Taking the first derivative of Equation (B.10) with respect to $b$ and setting it equal to zero to get the estimator for $b$ yields the following:

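$$0 = 0 + 2n\hat{b} - 2\sum_{i=1}^{n} (Y_i - X_i) \tag{B.11}$$

$$n\hat{b} = \sum_{i=1}^{n} (Y_i - X_i) \tag{B.12}$$

$$\hat{b} = \frac{1}{n}\sum_{i=1}^{n} Y_i - \frac{1}{n}\sum_{i=1}^{n} X_i \tag{B.13}$$

$$\hat{b} = \bar{Y} - \bar{X} \tag{B.14}$$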

As shown in Equation (B.14), the estimator for the intercept is equal to the mean of logged values from cranium 2 minus the mean of the logged values from cranium 1. We can further investigate this relationship by converting back to original variables:

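$$\widehat{\log m} = \log(\mathrm{GM}[y]) - \log(\mathrm{GM}[x]) \tag{B.15}$$

$$\widehat{\log m} = \log\left( \frac{\mathrm{GM}[y]}{\mathrm{GM}[x]} \right) \tag{B.16}$$

$$\hat{m} = \frac{\mathrm{GM}[y]}{\mathrm{GM}[x]} \tag{B.17}$$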

where GM is the geometric mean. Equation (B.17) states that the scaling constant between the two crania for the non-log-transformed measurements is equal to the ratio of the geometric mean of the measurements of cranium 2 to the geometric mean of the measurements of cranium 1; i.e., the ratio of cranium 2 size to cranium 1 size where size is represented as the geometric mean of all measurements for a given cranium (Mosimann, 1970; Jungers et al., 1995).
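The result in Equations (B.14) through (B.17) can be checked numerically. The Python sketch below uses invented measurements (not data from this study) and hypothetical helper names (`ssr`, `geometric_mean`); it confirms that the difference of logged means minimizes the sum of squared residuals for a slope fixed at one, and that back-transforming that intercept gives the ratio of geometric means.

```python
import numpy as np

# Hypothetical homologous measurements (mm) for two crania.
x = np.array([180.0, 130.0, 95.0, 24.0, 110.0])    # cranium 1
y = np.array([201.0, 152.0, 101.0, 30.0, 125.0])   # cranium 2

X, Y = np.log10(x), np.log10(y)

def ssr(b):
    """Sum of squared residuals for intercept b with slope fixed at one (B.7)."""
    return float(np.sum((Y - X - b) ** 2))

def geometric_mean(v):
    """Geometric mean computed via the mean of natural logs."""
    return float(np.exp(np.mean(np.log(v))))

b_hat = float(Y.mean() - X.mean())                  # Equation (B.14)
m_hat = geometric_mean(y) / geometric_mean(x)       # Equation (B.17)

print(b_hat, np.log10(m_hat))                       # the two values agree
print(ssr(b_hat), ssr(b_hat + 0.01), ssr(b_hat - 0.01))  # the first is smallest
```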


Furthermore, residuals from a regression of logged data that is constrained to an isometric slope can be expressed in terms of Mosimann shape variables. Replacing $b$ in Equation (B.6) with its estimator from Equation (B.14) yields:

$$\varepsilon = Y - X - (\bar{Y} - \bar{X}) \tag{B.18}$$

Equation (B.18) can be reworked as follows:

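$$\varepsilon = (Y - \bar{Y}) - (X - \bar{X}) \tag{B.19}$$

$$\varepsilon = [\log y - \log(\mathrm{GM}[y])] - [\log x - \log(\mathrm{GM}[x])] \tag{B.20}$$

$$\varepsilon = \log\left( \frac{y}{\mathrm{GM}[y]} \right) - \log\left( \frac{x}{\mathrm{GM}[x]} \right) \tag{B.21}$$

$$\varepsilon = \log\left( \frac{y / \mathrm{GM}[y]}{x / \mathrm{GM}[x]} \right) \tag{B.22}$$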

Equation (B.22) states that the residuals from a regression of logged cranial measurements with the slope set equal to one are equal to the log of the ratios of size-adjusted shape variables (Mosimann shape variables) for cranium 2 to size-adjusted shape variables for cranium 1. The mean of the residuals (and thus the mean of the logged shape ratios) will always be zero. However, the more a set of measurements differs from shape similarity between two crania, the higher the standard deviation of the residuals.
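As an illustration of Equations (B.18) and (B.22), the Python sketch below computes the residuals two ways, once from the constrained regression and once from Mosimann shape variables, and checks that they coincide and average to zero. The measurements are invented for the purpose of the example and the helper name `geometric_mean` is hypothetical.

```python
import numpy as np

# Hypothetical homologous measurements (mm) for two crania.
x = np.array([180.0, 130.0, 95.0, 24.0, 110.0])    # cranium 1
y = np.array([201.0, 152.0, 101.0, 30.0, 125.0])   # cranium 2

def geometric_mean(v):
    """Geometric mean computed via the mean of natural logs."""
    return np.exp(np.mean(np.log(v)))

# Residuals from the regression with slope fixed at one, Equation (B.18).
X, Y = np.log10(x), np.log10(y)
residuals_regression = Y - X - (Y.mean() - X.mean())

# Logged ratios of Mosimann shape variables, Equation (B.22).
shape_x = x / geometric_mean(x)
shape_y = y / geometric_mean(y)
residuals_shape = np.log10(shape_y / shape_x)

print(residuals_regression)
print(residuals_shape)          # same values, up to floating-point error
print(residuals_shape.mean())   # effectively zero
```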

It follows from the above that if one wants to measure morphological dissimilarity between two crania (or any other objects from which paired measurements can be collected), one can use the standard deviation of the residuals of logged data from an isometric scaling line with an intercept equal to the logged ratio of geometric means of the original measurements. This metric is equal to the standard deviation of the logged ratios of shape variables in cranium 2 to cranium 1, which can be thought of as an average measure of proportional deviation from shape similarity between two crania across a set of measurements. It can also be demonstrated that the standard deviation of the logged ratios of shape variables is equal to the standard deviation of the logged ratios of original size variables, i.e.:

$$\mathrm{VAR}(\varepsilon) = \mathrm{VAR}\left[ (Y - \bar{Y}) - (X - \bar{X}) \right] = \mathrm{VAR}[Y - X] \tag{B.23}$$

It should also be noted that:

$$\mathrm{VAR}[Y - X] = \mathrm{VAR}[X - Y] \tag{B.24}$$

which is to say that the standard deviation is the same regardless of whether a cranium is placed on the X-axis or on the Y-axis.
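The reasoning behind Equations (B.23) and (B.24) can be spelled out using two standard properties of variance: subtracting a constant leaves it unchanged, and multiplying by a constant scales it by the square of that constant. In the sketch below, $c$ is shorthand introduced here for the constant $\bar{Y} - \bar{X}$; it is not a symbol used elsewhere in the derivation.

```latex
\begin{aligned}
\mathrm{VAR}(\varepsilon)
  &= \mathrm{VAR}\left[ (Y - X) - c \right], \qquad c = \bar{Y} - \bar{X} \\
  &= \mathrm{VAR}[Y - X] \\[4pt]
\mathrm{VAR}[X - Y]
  &= \mathrm{VAR}\left[ -(Y - X) \right]
   = (-1)^2 \, \mathrm{VAR}[Y - X]
   = \mathrm{VAR}[Y - X]
\end{aligned}
```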

The practical implications of Equations (B.23) and (B.24) are that it is not necessary to calculate residuals from the regression, but one may get the same value by simply calculating the logged ratio for every pair of measured values present in the two crania and taking the standard deviation, and that this value does not depend on which cranium is selected for the numerator versus the denominator.

Because of the equivalence of the standard deviation of residuals from the regression and the standard deviation of logged ratios of the original measurements, we hereafter refer to this dissimilarity metric as the Standard Deviation of Logged Ratios, which we abbreviate as sLR, and which is calculated as:

$$s_{LR} = \sqrt{\mathrm{VAR}[Y - X]} = \sqrt{\mathrm{VAR}\left[ \log\left( \frac{y}{x} \right) \right]} \tag{B.25}$$


Appendix C. Supplementary material

Supplementary material associated with this article can be found in the online version at http://dx.doi.org/10.1016/j.jhevol.2013.08.002.

References

Aiello, L.C., Collard, M., Thackeray, J.F., Wood, B.A., 2000. Assessing exact randomization-based methods for determining the taxonomic significance of variability in the human fossil record. S. Afr. J. Sci. 96, 179–183.

Baab, K.L., McNulty, K.P., 2009. Size, shape, and asymmetry in fossil hominins: the status of the LB1 cranium based on 3D morphometric analyses. J. Hum. Evol. 57, 608–622.

Brown, P., Maeda, T., 2009. Liang Bua Homo floresiensis mandibles and mandibular teeth: a contribution to the comparative morphology of a new hominin species. J. Hum. Evol. 57, 571–596.

Cofran, Z., Thackeray, J.F., 2010. One or two species? A morphometry comparison between robust australopithecines from Kromdraai and Swartkrans. S. Afr. J. Sci. 106, 40–43.

Darwin, C., 1859. On the Origin of Species by Means of Natural Selection, or the Preservation of Favoured Races in the Struggle for Life. John Murray, London.

Gordon, A.D., Nevell, L., Wood, B., 2008. The Homo floresiensis cranium (LB1): size, scaling, and early Homo affinities. Proc. Natl. Acad. Sci. 105, 4650–4655.

Houghton, K., Thackeray, J.F., 2011. Morphometric comparisons between crania of Late Pleistocene Homo sapiens from Border Cave (BC 1), Tuinplaas (TP 1) and modern southern African populations. Trans. R. Soc. S. Afr. 66, 2011.

Howells, W.W., 1973. Cranial Variation in Man. A Study by Multivariate Analysis of Patterns of Differences among Recent Human Populations. Peabody Museum, Cambridge.

Howells, W.W., 1989. Skull Shapes and the Map. Craniometric Analyses in the Dispersion of Modern Homo. Peabody Museum, Cambridge.

Howells, W.W., 1996. Howells’ craniometric data on the internet. Am. J. Phys. Anthropol. 101, 441–442.

Jungers, W.L., 1985. Body size and scaling of limb proportions in primates. In: Jungers, W.L. (Ed.), Size and Scaling in Primate Biology. Plenum Press, New York and London, pp. 345–381.

Jungers, W.L., Falsetti, A.B., Wall, C.E., 1995. Shape, relative size, and size-adjustments in morphometrics. Yearb. Phys. Anthropol. 38, 137–161.

Lee, S.-H., 2011. How many variables are too few? Effect of sample size in STET, a method to test conspecificity for pairs of unknown species. PaleoAnthropology 2011, 260–267.

Lee, S.-H., Wolpoff, M.H., 2005. Habiline variation: a new approach using STET. Theory Biosci. 124, 25–40.

Lee, S.-H., Wolpoff, M.H., 2007. Herto and the Neandertals: what can a 160,000-year-old African tell us about European Neandertal evolution? In: Sankhyan, A.R., Rao, V.R. (Eds.), Human Origins, Genome and People of India: Genomic, Palaeontological and Archaeological Perspectives. Allied Publishers, New Delhi (India), pp. 329–336.

Lele, S.R., Richtsmeier, J.T., 2001. An Invariant Approach to Statistical Analysis of Shapes. Chapman and Hall/CRC, London.

Mosimann, J.E., 1970. Size allometry: size and shape variables with characterizations of the lognormal and generalized gamma distributions. J. Am. Statist. Assoc. 65, 930–945.

Pan, R., 2006. Dental morphometric variation between African and Asian colobines, with special reference to the other Old World monkeys. J. Morphol. 267, 1087–1098.

Pilbrow, V., 2006. Population systematics of chimpanzees using molar morphometrics. J. Hum. Evol. 51, 646–662.

Pilbrow, V., 2010. Dental and phylogeographic patterns of variation in gorillas. J. Hum. Evol. 59, 16–34.

R Development Core Team, 2012. R: a Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna.

Richmond, B.G., Jungers, W.L., 2008. Orrorin tugenensis femoral morphology and the evolution of hominin bipedalism. Science 319, 1662–1665.

Slice, D.E. (Ed.), 2005. Modern Morphometrics in Physical Anthropology. Kluwer Academic/Plenum Publishers, New York.

Smith, R.J., 1999. Statistics of sexual size dimorphism. J. Hum. Evol. 36, 423–459.

Smith, R.J., 2005. Species recognition in paleoanthropology: implications of small sample sizes. In: Lieberman, D.E., Smith, R.J., Kelley, J. (Eds.), Interpreting the Past: Essays on Human, Primate, and Mammal Evolution in Honor of David Pilbeam. Brill Academic Publishers, Boston, pp. 207–219.

Thackeray, J.F., 1997. Probabilities of conspecificity. Nature 390, 30–31.

Thackeray, J.F., 2007. Approximation of a biological species constant? S. Afr. J. Sci. 103, 489.

Thackeray, J.F., 2010. Comparisons between Australopithecus sediba (MH1) and other hominin taxa, in the context of probabilities of conspecificity. S. Afr. J. Sci. 106, 1–2.

Thackeray, J.F., Odes, E., 2013. Morphometric analysis of Early Pleistocene African hominin crania in the context of a statistical (probabilistic) definition of a species. Antiquity 87. http://antiquity.ac.uk/projgall/thackeray335/.

Thackeray, J.F., Prat, S., 2009. Chimpanzee subspecies and ‘robust’ australopithecine holotypes, in the context of comments by Darwin. S. Afr. J. Sci. 105, 463–464.


Thackeray, J.F., Bellamy, C.L., Bellars, D., Bronner, G., Bronner, L., Chimimba, C., Fourie, H., Kemp, A., Kruger, M., Plug, I., Prinsloo, S., Toms, R., van Zyl, A.J., Whiting, M.J., 1997. Probabilities of conspecificity: application of a morphometric technique to modern taxa and fossil specimens attributed to Australopithecus and Homo. S. Afr. J. Sci. 93, 195–196.

Thackeray, J.F., Maureille, B., Vandermeersch, B., Braga, J., Chaix, R., 2005. Morphometric comparisons between Neanderthals and ‘anatomically modern’ Homo sapiens from Europe and the Near East. Annls. Transv. Mus. 42, 47–51.

Van Arsdale, A.P., Wolpoff, M.H., 2013. A single lineage in early Pleistocene Homo: size variation continuity in early Pleistocene Homo crania from East Africa and Georgia. Evolution 67, 841–850.


Wolpoff, M.H., Lee, S.-H., 2001. The Late Pleistocene human species of Israel. Bull. Mém. Soc. Anthropol. Paris 13, 291–310.

Wolpoff, M.H., Lee, S.-H., 2006. Variation in the habiline crania – must it be taxonomic? Hum. Evol. 21, 71–84.

Wood, B.A., 1975. An analysis of sexual dimorphism in primates. Ph.D. Dissertation. University of London.

Wood, B.A., 1976. The nature and basis of sexual dimorphism in the primate skeleton. J. Zool. 180, 15–34.

Wood, B.A., Li, Y., Willoughby, C., 1991. Intraspecific variation and sexual dimorphism in cranial and dental variables among higher primates and their bearing on the hominid fossil record. J. Anat. 174, 185–205.
