atkinsongray2006a how old is ie lang fam
8/7/2019 AtkinsonGray2006a How old is IE lang Fam
Chapter 8
How Old is the Indo-European Language Family? Illumination or More Moths to the Flame?
Quentin D. Atkinson & Russell D. Gray
1. An electric light on a summer night

The origin of Indo-European has recently been described as 'one of the most intensively studied, yet still most recalcitrant problems of historical linguistics' (Diamond & Bellwood 2003, 601). Despite over 200 years of scrutiny, scholars have been unable to locate the origin of Indo-European definitively in time or place. Theories have been put forward advocating ages ranging from 4000 to 23,000 years (Otte 1997), with hypothesized homelands including Central Europe (Devoto 1962), the Balkans (Diakonov 1984), and even India (Kumar 1999). Mallory (1989) acknowledges 14 distinct homeland hypotheses since 1960 alone. He rather colourfully remarks that 'the quest for the origins of the Indo-Europeans has all the fascination of an electric light in the open air on a summer night: it tends to attract every species of scholar or would-be savant who can take pen to hand' (Mallory 1989, 143).

Unfortunately, archaeological, genetic and linguistic research on Indo-European origins has so far proved inconclusive. Whilst numerous theories of Indo-European origin have been proposed, they have proven difficult to test. In this chapter, we outline how techniques derived from evolutionary biology can be adapted to test between competing hypotheses about the age of the Indo-European language family. We argue that these techniques are a useful supplement to traditional methods in historical linguistics. This chapter is a development and extension of previous work on the application of phylogenetic methods to the study of language evolution (Gray & Atkinson 2003; Atkinson & Gray 2006; Atkinson et al. 2005).

2. Two theories

There are currently two main theories about the origin of Indo-European. The first theory, put forward by Marija Gimbutas (1973a,b) on the basis of linguistic and archaeological evidence, links Proto-Indo-European (the hypothesized ancestral Indo-European tongue) with the Kurgan culture of southern Russia and the Ukraine. The Kurgans were a group of semi-nomadic, pastoralist, warrior-horsemen who expanded from their homeland in the Russian steppes during the fifth and sixth millennia BP, conquering Danubian Europe, Central Asia and India, and later the Balkans and Anatolia. This expansion is thought to roughly match the accepted ancestral range of Indo-European (Trask 1996). As well as the apparent geographical congruence between Kurgan and Indo-European territories, there is linguistic evidence for an association between the two cultures. Words for supposed Kurgan technological innovations are notably consistent across widely divergent Indo-European sub-families. These include terms for wheel (*rotho, *kW(e)kWlo), axle (*akslo), yoke (*jugo), horse (*ekwo) and to go, transport in a vehicle (*wegh: Mallory 1989; Campbell 2004). It is argued that these words and associated technologies must have been present in the Proto-Indo-European culture and that they were likely to have been Kurgan in origin. Hence, the argument goes, the Indo-European language family is no older than 5000-6000 BP. Mallory (1989) argues for a similar time and place of Indo-European origin, a region around the Black Sea about 5000-6000 BP (although he is more cautious and refrains from identifying Proto-Indo-European with a specific culture such as the Kurgans).

The second theory, proposed by archaeologist Colin Renfrew (1987), holds that Indo-European languages spread, not with marauding Russian horsemen, but with the expansion of agriculture from Anatolia between 8000 and 9500 years ago. Radiocarbon analysis of the earliest Neolithic sites across Europe provides a fairly detailed chronology of agricultural dispersal. This archaeological evidence indicates that agriculture spread from Anatolia, arriving in Greece at some time during the ninth millennium BP and reaching as far as the British Isles by 5500 BP (Gkiasta et al. 2003). Renfrew maintains that the linguistic argument
for the Kurgan theory is based on only limited evidence for a few enigmatic Proto-Indo-European word forms. He points out that parallel semantic shifts or widespread borrowing can produce similar word forms across different languages without requiring that an ancestral term was present in the proto-language. Renfrew also challenges the idea that Kurgan social structure and technology was sufficiently advanced to allow them to conquer whole continents in a time when even small cities did not exist. Far more credible, he argues, is that Proto-Indo-European spread with the spread of agriculture, a scenario that is also thought to have occurred across the Pacific (Bellwood 1991; 1994), Southeast Asia (Glover & Higham 1996) and sub-Saharan Africa (Holden 2002). On the basis of linguistic evidence, Diakonov (1984) also argues for an early Indo-European spread with agriculture but places the homeland in the Balkans, a position that may be reconcilable with Renfrew's theory.

The debate about Indo-European origins thus centres on archaeological evidence for two population expansions, both implying very different timescales: the Kurgan theory with a date of 5000-6000 BP, and the Anatolian theory with a date of 8000-9500 BP. One way of potentially resolving the debate is to look outside the archaeological record for independent evidence that allows us to test between these two time depths. Genetic studies offer one potential source of evidence. Unfortunately, due to problems associated with admixture, slow rates of genetic change and the relatively recent timescales involved, genetic analyses have been unable to resolve the debate (Cavalli-Sforza et al. 1994; Rosser et al. 2000). Another potential line of evidence is contained in the languages themselves, and it is the linguistic evidence we shall turn to now.
3. The demise of glottochronology and the rise of computational biology

Languages, like genes, chronicle their evolutionary history. Languages, however, change much faster than genes and so contain more information at shallower time depths. Conventional means of linguistic inquiry, like the comparative method, are able to infer ancestral relationships between languages but cannot provide an absolute estimate of time depth. An alternative approach is Morris Swadesh's (1952; 1955) lexicostatistics and its derivative glottochronology. These methods use lexical data to determine language relationships and to estimate absolute divergence times. Lexicostatistical methods infer language trees on the basis of the percentage of shared cognates between languages: the more similar the languages, the more closely they are related. Words are judged to be cognate if they can be shown to be related via a pattern of systematic sound correspondences and have similar meanings (see Fig. 8.1 for some examples). This information can be used to construct evolutionary language trees.

Glottochronology is an extension of this approach to estimate divergence times under the assumption of a glottoclock, or constant rate of language change. The following formula can be used to relate language similarity to time along an exponential decay curve:

$$t = \frac{\log C}{2 \log r}$$

where t is time depth in millennia, C is the percentage of cognates shared and r is the universal constant or rate of retention (the expected proportion of cognates remaining after 1000 years of separation: Swadesh 1955). Usually, analyses are restricted to the Swadesh word list, a collection of 100-200 basic meanings that are thought to be relatively culturally universal, stable and resistant to borrowing. These include kinship terms (e.g. mother, father), terms for body parts (e.g. hand, mouth, hair), numerals and basic verbs (e.g. to drink, to sleep, to burn). For the Swadesh 200 word list, a value of 81 per cent is often used for r.
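As a minimal sketch of the glottochronological formula, the function below computes t from C and r. It assumes that C and r are supplied as proportions between 0 and 1 rather than as percentages; the function name and the input check are illustrative choices, not part of Swadesh's method.

```python
import math

def glottochronology_age(shared, retention=0.81):
    """Time depth in millennia from t = log C / (2 log r).

    shared    -- proportion of cognates shared by two languages (C)
    retention -- proportion of cognates retained per millennium (r)
    """
    if not (0.0 < shared <= 1.0 and 0.0 < retention < 1.0):
        raise ValueError("shared must be in (0, 1] and retention in (0, 1)")
    return math.log(shared) / (2.0 * math.log(retention))

# Two languages that each retain 81 per cent of the ancestral list
# are expected to share about 0.81 * 0.81 of it after one millennium.
age = glottochronology_age(0.81 * 0.81)
```

The factor of 2 in the denominator reflects that both descendant lineages have been changing since the split.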
Linguists have identified a number of serious problems with the glottochronological approach:
1. Much of the information in the lexical data is lost when word information is reduced to percentage similarity scores between languages (Steel et al. 1988).
2. The methods used to construct evolutionary trees from language distance matrices have been shown to produce inaccurate results, particularly where rates of change vary (Blust 2000).
3. Languages do not always evolve at a constant rate. Bergsland & Vogt (1962) compared present-day languages with their archaic forms and found evidence for significant rate variation between languages. For example, Icelandic and Norwegian were compared to their common ancestor, Old Norse, spoken roughly 1000 years ago. Norwegian has retained 81 per cent of the vocabulary of Old Norse, correctly suggesting an age of approximately 1000 years. However, Icelandic has retained over 95 per cent of the Old Norse vocabulary, falsely suggesting that Icelandic split from Old Norse less than 200 years ago.
4. Languages do not always evolve in a tree-like manner (Bateman et al. 1990; Hjelmslev 1958). Borrowing between languages can produce erroneous (or, in extreme cases, meaningless) language trees. Also, widespread borrowing can bias divergence time estimates by making languages seem more similar (and hence younger) than they really are.
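The rate-variation problem in point 3 can be illustrated with a short calculation. This sketch assumes that a comparison between a language and its own ancestor involves a single lineage, so the factor of 2 is dropped and t = log C / log r; that single-lineage form and the use of exactly 0.95 for Icelandic (the text says "over 95 per cent") are assumptions made for illustration.

```python
import math

def age_from_ancestor(retained, retention=0.81):
    """Millennia separating a language from its ancestor: t = log C / log r."""
    return math.log(retained) / math.log(retention)

def implied_retention(retained, millennia):
    """Retention rate per millennium implied by a known separation time."""
    return retained ** (1.0 / millennia)

# Proportion of Old Norse vocabulary retained (values quoted in the text).
norwegian_age = age_from_ancestor(0.81)   # matches the known 1000 years
icelandic_age = age_from_ancestor(0.95)   # far too young under a fixed clock

# With the true separation fixed at one millennium, the two languages
# imply clearly different retention rates.
norwegian_rate = implied_retention(0.81, 1.0)
icelandic_rate = implied_retention(0.95, 1.0)
```

Under a single fixed rate the Icelandic figure comes out at a fraction of the true age, which is the bias Bergsland & Vogt describe.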
These problems have led many linguists to completely abandon any attempt to derive dates from lexical data. For example, Clackson (2000, 451) claims that the data and methods do not allow the question 'When was Proto-Indo-European spoken?' to be answered in any really meaningful or helpful way.

Fortunately, none of these problems are unique to linguistics. It is ironic that whilst computational methods in historical linguistics have fallen out of favour over the last half-century, computational biology has thrived. In much the same way as linguists use information about current and historically attested languages to infer their history, evolutionary biologists use DNA sequence, morphological and sometimes behavioural data to construct evolutionary trees of biological species. Questions of relatedness and divergence dates are of interest to biologists just as they are to linguists. As a result biologists must also deal with the problems outlined above: nucleotide sequence information is lost when data is analyzed as distance matrices (Steel et al. 1988); distance-based tree-building methods may not accurately reconstruct phylogeny (Kuhner & Felsenstein 1994); different genes (and nucleotides) evolve at different rates and these rates can vary through time (Excoffier & Yang 1999); and finally, evolution is not always tree-like due to phenomena such as hybridization and horizontal gene transfer (Faguy & Doolittle 2000).

Despite these obstacles, computational methods have revolutionized evolutionary biology. Rather than giving up and declaring that time-depth estimates are intractable, biologists have developed techniques to overcome each problem. Here, we describe how these biological methods can be adapted and applied to lexical data to answer the question 'How old is the Indo-European language family?'
4. From word lists to binary matrices

In order to estimate phylogenies accurately we need to overcome the problem of information loss encountered in lexicostatistics and glottochronology. This requires a large data set with individual character state information for each language. Lexical data are ideal because there are a large number of well-studied characters available and these can be divided into meaningful evolutionary units known as cognate sets (as described above, words are judged to be cognate if they can be shown to be related via a pattern of systematic sound correspondences and have similar meaning). Cognate words from different languages can be grouped into cognate sets that reflect patterns of inheritance. Owing to the possibility of unintuitive or misleading similarities between words from different languages, expert knowledge of the sound changes involved is required in order to make cognacy judgements accurately. For example, knowledge of regular sound correspondences between the languages is required to ascertain that the English word when is cognate with Greek pote of the same meaning. Conversely, English have is not cognate with Latin habere despite similar word form and meaning.

To estimate tree topology and branch lengths accurately requires a large amount of data. Our data was taken from the Dyen et al. (1992) Indo-European lexical database, which contains expert cognacy judgements for 200 Swadesh list terms in 95 languages. Dyen et al. (1997) identified eleven languages as less reliable and hence they were not included in the analysis presented here. Three extinct languages (Hittite, Tocharian A and Tocharian B) were added to the database in an attempt to improve the resolution of basal relationships in the inferred phylogeny. Multiple references were used to corroborate cognacy judgements (Adams 1999; Gamkrelidze & Ivanov 1995; Guterbock & Hoffner 1986; Hoffner 1967; Tischler 1973; 1997). For each meaning in the database, languages were grouped into cognate sets. Some examples are shown in Figure 8.1.

By restricting analyses to basic vocabulary such as the Swadesh word list the influence of borrowing can be minimized. For example, although English is a Germanic language, it has borrowed around 60 per cent of its total lexicon from French and Latin. However, only about 6 per cent of English entries in the Swadesh 200 word list are clear Romance language borrowings (Embleton 1986). Known borrowings were not coded as cognate in the Dyen et al. database. For example, the English word mountain was not coded as cognate with French montagne, since it was obviously borrowed from French into English after the Norman invasion. Any remaining reticulation can be detected using biological methods such as split decomposition, which can identify conflicting signal. The issue of borrowing in lexical data is discussed in more detail by Holden & Gray (Chapter 2 this volume; see also Bryant et al. 2005).

We can represent the information in Figure 8.1 most simply as binary characters in a matrix, where the presence or absence of a particular cognate set in a particular language is denoted by a 1 or 0 respectively. Figure 8.2 shows a binary representation of the cognate information from Figure 8.1. Using this coding procedure we produced a matrix of 2449 cognacy judgements across 87 languages. Alternative coding methods are also possible, such as representing the data as 200 meaning categories each with multiple character states. It has been argued that semantic categories are the fundamental objects of linguistic
change (see Evans et al. Chapter 9 this volume) and that binary coding of the presence or absence of cognate sets is thus inappropriate. However, cognate sets constitute discrete, relatively unambiguous heritable evolutionary units with a birth and death (see Nicholls & Gray, Chapter 14 this volume) and there is no reason to suppose they are any more or less fundamental to language evolution than semantic categories. Further, coding the data as semantic categories makes it difficult to deal with polymorphisms (i.e. when a language has more than one word for a given meaning, e.g. for the meaning sea German has both See and Meer). It also significantly increases the number of parameters required to model the process of evolution. Pagel (2000) points out that, if each word requires a different set of rate parameters, then for just 200 words in 40 languages there are 1278 parameters to estimate. We thus used a binary coding of cognate presence/absence information to represent linguistic change in our analysis.

As well as avoiding the problem of information loss, analyzing cognate presence/absence information allows us to explicitly model the evolutionary process. Unlike lexicostatistics and glottochronology, we do not count the number of cognates shared between languages, nor do we calculate pairwise distances between languages. Instead, the distribution of cognates is mapped onto an evolutionary language tree (see Fig. 8.3), and likely character state changes are inferred across the whole tree.
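The binary coding described in this section can be sketched in a few lines. The data below are the cognate-set numbers of Figure 8.1; the dictionary layout and the function name are illustrative assumptions, not the format of the Dyen et al. database.

```python
# Cognate-set numbers per language and meaning, transcribed from Figure 8.1.
# A language may hold more than one set for a meaning (German 'sea').
COGNATES = {
    "English": {"here": [1], "sea": [5], "water": [9], "when": [12]},
    "German": {"here": [1], "sea": [5, 6], "water": [9], "when": [12]},
    "French": {"here": [2], "sea": [6], "water": [10], "when": [12]},
    "Italian": {"here": [2], "sea": [6], "water": [10], "when": [12]},
    "Modern Greek": {"here": [3], "sea": [7], "water": [11], "when": [12]},
    "Hittite": {"here": [4], "sea": [8], "water": [9], "when": [12]},
}

def binary_matrix(cognates):
    """Return (sorted cognate-set ids, {language: presence/absence row})."""
    all_sets = sorted(
        {s for meanings in cognates.values() for ids in meanings.values() for s in ids}
    )
    rows = {}
    for language, meanings in cognates.items():
        present = {s for ids in meanings.values() for s in ids}
        rows[language] = [1 if s in present else 0 for s in all_sets]
    return all_sets, rows

cognate_sets, matrix = binary_matrix(COGNATES)
```

Each column of the resulting matrix is one cognate set, so a polymorphism such as German See/Meer simply yields two 1s in the same row.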
Meaning        here          sea            water       when
English        here¹         sea⁵           water⁹      when¹²
German         hier¹         See⁵, Meer⁶    Wasser⁹     wann¹²
French         ici²          mer⁶           eau¹⁰       quand¹²
Italian        qui², qua²    mare⁶          acqua¹⁰     quando¹²
Modern Greek   edo³          thalassa⁷      nero¹¹      pote¹²
Hittite        ka⁴           aruna⁸         watar⁹      kuwapi¹²

Figure 8.1. Selection of languages and Swadesh list terms. Cognacy is indicated by the numbers in superscript.

Meaning       here       sea        water      when
Cognate set   1 2 3 4    5 6 7 8    9 10 11    12
English       1 0 0 0    1 0 0 0    1  0  0     1
German        1 0 0 0    1 1 0 0    1  0  0     1
French        0 1 0 0    0 1 0 0    0  1  0     1
Italian       0 1 0 0    0 1 0 0    0  1  0     1
Greek         0 0 1 0    0 0 1 0    0  0  1     1
Hittite       0 0 0 1    0 0 0 1    1  0  0     1

Figure 8.2. Cognate sets from Figure 8.1 expressed in a binary matrix showing cognate presence (1) or absence (0).

Figure 8.3. Character states for cognate sets 5 (black) and 6 (grey) from Figure 8.1 are shown mapped onto a hypothetical tree. Implied character state changes can then be reconstructed on the tree. The black and grey bands show a likely point at which cognate sets 5 and 6 were gained. We can use this information to evaluate possible evolutionary scenarios.

5. Models are lies that lead us to the truth

When biologists model evolution, they lie: they lie about the independence of character state changes across sites; they lie about the homogeneity of substitution mechanisms; and they lie about the importance of selection pressure on substitution rates. But these are lies that lead us to the truth. Biological research is based on a strategy of model-building and statistical inference that has proved highly successful (Hillis 1992; Hillis et al. 1996; Pagel 1999). The goal for biologists is not to construct a model so complex that it captures every nuance and vagary of the evolutionary process, but rather to find the simplest model available that can reliably estimate the parameters of interest.
Model choice is thus a balance between over- and under-fitting parameters (Burnham & Anderson 1998). Adding extra parameters can improve the apparent fit of a model to data; however, sampling error is also increased because there are more unknown parameters to estimate (Swofford et al. 1996). Depending on the question we are trying to answer, this added uncertainty can prevent us from estimating the model parameters from the data to within a useful margin of error. In many cases, adding just a few extra parameters can create a computationally intractable problem. Conversely, a model that is too simple can produce biased results if it fails to capture an important part of the process (Burnham & Anderson 1998). There is thus a compromise between biased estimates and variance inflation.

The strategy that has proved successful in biology is to start with a simple model that captures some of the fundamental processes involved and increase the complexity as necessary. For example, nucleotide substitution models range from a simple equal-rates model (Jukes & Cantor 1969), to more complex models that allow for differences in transition/transversion rates, unequal character state frequencies, site-specific rates, and auto-correlation between sites (Swofford et al. 1996). Although even the most complicated models are simplifications of the process of evolution, often the simplest substitution model captures enough of what is going on to allow biologists to extract a meaningful signal from the data. Levins (1966) gives three reasons why we should use a simple model. First, violations of the assumptions of the model are expected to cancel each other out. Second, small errors in the model should result in small errors in the conclusions. And third, by comparing multiple models with reality we can determine which aspects of the process are important.

Likelihood evolutionary modelling has become the method of choice in phylogenetics (Swofford et al. 1996). The likelihood approach to phylogenetic reconstruction allows us to explicitly model the process of language evolution. The method is based on the premise that we should favour the tree topology/topologies and branch lengths that make our observed data most likely, given the data and assumptions of our model, i.e. we should favour the tree with the highest likelihood score. We can evaluate possible tree topologies for a given model and data by modelling the sequence of cognate gains and losses across the trees.
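To make the idea of scoring trees by likelihood concrete, here is a minimal sketch for a single cognate set on two candidate three-language trees. It assumes a generic two-state (absent/present) continuous-time model with a stationary presence frequency `pi1` and an overall `rate`, evaluated with the standard pruning recursion; the parameter values, branch lengths and tuple encoding of trees are hypothetical and are not taken from the chapter's analysis.

```python
import math
from itertools import product

def transition_probs(t, pi1=0.5, rate=1.0):
    """P[i][j]: probability of state j after time t, starting from state i."""
    pi0 = 1.0 - pi1
    decay = math.exp(-rate * t)
    return [
        [pi0 + pi1 * decay, pi1 * (1.0 - decay)],
        [pi0 * (1.0 - decay), pi1 + pi0 * decay],
    ]

def partial(node, states, pi1, rate):
    """Likelihood of the data below `node`, for each state at `node`."""
    if isinstance(node, str):  # a leaf: the observed state has probability 1
        return [1.0 if states[node] == s else 0.0 for s in (0, 1)]
    left, t_left, right, t_right = node
    p_left = transition_probs(t_left, pi1, rate)
    p_right = transition_probs(t_right, pi1, rate)
    below_left = partial(left, states, pi1, rate)
    below_right = partial(right, states, pi1, rate)
    return [
        sum(p_left[s][x] * below_left[x] for x in (0, 1))
        * sum(p_right[s][y] * below_right[y] for y in (0, 1))
        for s in (0, 1)
    ]

def site_likelihood(tree, states, pi1=0.5, rate=1.0):
    """Probability of one presence/absence pattern given the tree."""
    root = partial(tree, states, pi1, rate)
    return (1.0 - pi1) * root[0] + pi1 * root[1]

# Trees are (left, branch_length, right, branch_length); lengths are made up.
tree_a = (("English", 0.1, "German", 0.1), 0.4, "French", 0.5)
tree_b = (("English", 0.1, "French", 0.1), 0.4, "German", 0.5)

# Cognate set 5 of Figure 8.1: present in English and German, absent in French.
pattern = {"English": 1, "German": 1, "French": 0}
like_a = site_likelihood(tree_a, pattern)
like_b = site_likelihood(tree_b, pattern)

# Sanity check: the probabilities of all possible patterns sum to 1.
total = sum(
    site_likelihood(tree_a, dict(zip(("English", "German", "French"), combo)))
    for combo in product((0, 1), repeat=3)
)
```

The tree that groups English with German needs no change on a short branch to explain this pattern, so it receives the higher likelihood; a full analysis multiplies such per-character likelihoods across all cognate sets.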
Likelihood models have a number of advantages over other approaches. First, we can work with explicit models of evolution and test between competing models. The assumptions of the method are thus overt and easily verifiable. Second, we can increase the complexity of the model as required. For example, as explained below, we were able to test for the influence of rate variation between cognate sets and, as a result, incorporate this into the analysis using a gamma distribution. And third, model parameters can be estimated from the data itself, thus avoiding restrictive a priori assumptions about the evolutionary processes involved (Pagel 1997).

Likelihood models of evolution are usually expressed as a rate matrix representing the relative rates of all possible character state changes. Here, we are interested in the processes of cognate gain and loss, respectively represented by 0 to 1 changes and 1 to 0 changes on the tree (see Fig. 8.3). We can model this process effectively with a relatively simple two-state time-reversible model of lexical evolution (shown in Fig. 8.4). We extended this simple model by adding a gamma shape parameter (α) to allow rates of change to vary between cognate sets according to a gamma distribution. This was implemented after a likelihood ratio test (Goldman 1993) showed that adding the gamma shape parameter significantly improved the ability of the model to explain the data (χ² = 108, df = 1, p
the process of linguistic change, we explicitly reject Warnow et al.'s (Chapter 7 this volume) counsel of despair, that language evolution is so idiosyncratic and unconstrained that inferring divergence dates is impossible. Language evolution is subject to real-world constraints, such as human language acquisition, expressiveness, intelligibility, and generation time. We cannot help but quote Ringe et al. (2002, 61) on this point:

Languages replicate themselves (and thus survive from generation to generation) through a process of native-language acquisition by children. Importantly for historical linguistics, that process is tightly constrained.

These constraints create underlying commonalities in the evolutionary process that we can, and should, be trying to model.

Evans et al. (Chapter 10 this volume) argue that our model is patently inappropriate because it assumes that all characters are independent. In biology, this is known as the I.I.D. (identically and independently distributed) assumption. Evans et al. correctly point out that the I.I.D. assumption is violated when individual meanings in the Swadesh word list are broken up into characters representing multiple cognate sets. Specifically, if a particular cognate set is present in a language, it will be less likely that other cognate sets for the same meaning will also be present. However, we do not think that this lack of independence biases our results. The issue of independence will be dealt with in detail in section 10.2.
6. Bayesian inference of phylogeny

It is not usually computationally feasible to evaluate the likelihood of all possible language trees: for 87 languages there are over 1 × 10^155 possible rooted trees. Further, the vast number of possibilities combined with finite data means that inferring a single tree will be misleading: there will always be uncertainty in the topology and branch lengths. If we are to use our results to test hypotheses we need to use heuristic methods to search through tree-space and quantify this phylogenetic uncertainty. Bayesian inference is an alternative approach to phylogenetic analysis that allows us to draw inferences from a large amount of data using powerful probabilistic models without searching for the optimal tree (Huelsenbeck et al. 2001). In this approach trees are sampled according to their posterior probabilities. The posterior probability of a tree (the probability of the tree given the priors, data and the model) is related by Bayes's theorem to its likelihood score (the probability of the data given the tree) and its prior probability (a reflection of any prior knowledge about tree topology that is to be included in the analysis). Unfortunately, we cannot evaluate this function analytically. However, we can use a Markov Chain Monte Carlo (MCMC: Metropolis et al. 1953) algorithm to generate a sample of trees in which the frequency distribution of the sample is an approximation of the posterior probability distribution of the trees (Huelsenbeck et al. 2001). To do this, we used MrBayes, a Bayesian phylogenetic inference programme (Huelsenbeck & Ronquist 2001).
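The size of tree-space quoted above can be checked with a short calculation. This sketch assumes the count refers to rooted, strictly bifurcating trees with labelled tips, for which the standard formula is the double factorial (2n - 3)!!; the function name is illustrative.

```python
def rooted_tree_count(n_taxa):
    """Number of rooted bifurcating labelled trees: 3 * 5 * ... * (2n - 3)."""
    count = 1
    for k in range(3, 2 * n_taxa - 2, 2):
        count *= k
    return count

# Small cases are easy to verify by hand: 3 taxa -> 3 trees, 4 taxa -> 15.
small = [rooted_tree_count(n) for n in (2, 3, 4, 5)]

# For the 87 languages analysed here the count has 156 decimal digits.
n_trees = rooted_tree_count(87)
n_digits = len(str(n_trees))
```

Python integers have arbitrary precision, so the exact count is available even though it is far too large to enumerate.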
Figure 8.5. The gamma distribution, used to model rate variation between sites. Three possible values for α are shown. For small values of α (e.g. α = 0.5), most cognate sets evolve slowly, but a few can evolve at higher rates. As α increases, the distribution becomes more peaked and symmetrical around a rate of 1, i.e. rates become more equal (e.g. α = 50).

MrBayes uses MCMC algorithms to search through the realm of possible trees. From a random starting tree, changes are proposed to the tree topology, branch lengths and model parameters according to a specified prior distribution of the parameters. The changes are either accepted or rejected based on the likelihood of the resulting evolutionary reconstruction, i.e. reconstructions that give higher likelihood scores tend to be favoured. In this way the chain quickly goes from sampling random trees to sampling those trees which best explain the data. After an initial burn-in period, trees begin to be sampled in proportion to their likelihood given the data. This produces a distribution of trees. A useful way to summarize this distribution is with a consensus tree or consensus network (Holland & Moulton 2003) depicting uncertainty in the reconstructed relationships. These graphs are, however, just useful pictorial summaries of the analysis. The fundamental output of the analysis is the distribution of trees.
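The accept/reject logic of MCMC sampling can be sketched with a toy Metropolis sampler. This is not how MrBayes is implemented: it assumes just three candidate trees with made-up unnormalized posterior weights and a uniform proposal, purely to show that sample frequencies after burn-in approximate the posterior.

```python
import random
from collections import Counter

def metropolis_sample(weights, steps, burn_in, seed=0):
    """Sample states in proportion to `weights` with a Metropolis chain."""
    rng = random.Random(seed)
    states = list(weights)
    current = rng.choice(states)          # random starting tree
    samples = []
    for step in range(steps):
        proposal = rng.choice(states)     # symmetric (uniform) proposal
        accept_prob = min(1.0, weights[proposal] / weights[current])
        if rng.random() < accept_prob:
            current = proposal            # better trees are always accepted
        if step >= burn_in:               # discard the burn-in period
            samples.append(current)
    return Counter(samples)

# Hypothetical unnormalized posterior weights for three candidate trees.
weights = {"tree_A": 6.0, "tree_B": 3.0, "tree_C": 1.0}
counts = metropolis_sample(weights, steps=50000, burn_in=1000)
total = sum(counts.values())
frequencies = {tree: counts[tree] / total for tree in weights}
```

Only ratios of weights enter the acceptance step, which is why the normalizing constant of the posterior never has to be computed.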
The consensus network from a Bayesian sample distribution of 100 trees is shown in Figure 8.6. The values next to splits give an indication of the uncertainty associated with each split (the posterior probability, derived from the percentage of trees in the Bayesian distribution that contain the split). For example, the value 41 next to the parallel lines separating Italic and Celtic from the rest of the Indo-European sub-families indicates that that split was present in 41 per cent of the trees in the sample distribution. Similarly, the split grouping Italic and Germanic languages was present in 46 per cent of the sample distribution.
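Support values of this kind are simple frequencies over the sampled trees. The sketch below assumes rooted trees written as nested tuples and counts clades rather than unrooted splits, which is a simplification; the four-tree sample is invented for illustration and does not reproduce the values in Figure 8.6.

```python
from collections import Counter

def clades(tree):
    """Return (leaves under tree, set of clades with more than one leaf)."""
    if isinstance(tree, str):
        return frozenset([tree]), set()
    leaves = frozenset()
    found = set()
    for child in tree:
        child_leaves, child_clades = clades(child)
        leaves |= child_leaves
        found |= child_clades
    found.add(leaves)
    return leaves, found

def clade_support(trees):
    """Percentage of sampled trees that contain each clade."""
    counts = Counter()
    for tree in trees:
        counts.update(clades(tree)[1])
    return {clade: 100.0 * n / len(trees) for clade, n in counts.items()}

# A made-up posterior sample of four trees over four sub-families.
sample = [
    ((("Italic", "Celtic"), "Germanic"), "Slavic"),
    ((("Italic", "Germanic"), "Celtic"), "Slavic"),
    ((("Italic", "Celtic"), "Germanic"), "Slavic"),
    (("Italic", "Celtic"), ("Germanic", "Slavic")),
]
support = clade_support(sample)
italo_celtic = support[frozenset({"Italic", "Celtic"})]
italo_germanic = support[frozenset({"Italic", "Germanic"})]
```

Conflicting groupings, such as Italic with Celtic versus Italic with Germanic, can both receive non-zero support, which is what a consensus network displays.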
Figure 8.6. Consensus network from the Bayesian MCMC sample of trees. Values express the posterior probability of each split (values above 90 per cent are not indicated). A threshold of 10 per cent was used to draw this splits graph, i.e. only those splits occurring in at least 10 per cent of the observed trees are shown in the graph. Branch lengths represent the median number of reconstructed substitutions per site across the sample distribution.

7. Rate variation and estimating dates

There are at least two types of rate variation in lexical evolution. First, rate variation can occur between cognates. For example, even in the Swadesh word list, the Indo-European word for five is highly conserved (1 cognate set) whilst the word for dirty is highly variable (27 cognate sets). This is akin to site-specific rate variation in biology. Biologists can account for this type of rate variation by allowing a distribution of rates. As mentioned above, we used a model of cognate evolution that allowed for gamma distributed rate variation between cognates.
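The effect of the gamma shape parameter on rate variation can be seen by drawing rates directly. This sketch assumes the usual convention in which the gamma distribution is scaled to have a mean rate of 1 (shape α, scale 1/α); the sample size and seed are arbitrary.

```python
import random
import statistics

def sample_rates(alpha, n, seed=0):
    """Draw n relative rates from a gamma distribution with mean 1."""
    rng = random.Random(seed)
    # random.gammavariate takes (shape, scale); mean = shape * scale = 1.
    return [rng.gammavariate(alpha, 1.0 / alpha) for _ in range(n)]

skewed = sample_rates(alpha=0.5, n=20000)   # strong rate variation
even = sample_rates(alpha=50.0, n=20000)    # nearly equal rates

summary = {
    "skewed_mean": statistics.mean(skewed),
    "skewed_median": statistics.median(skewed),
    "even_mean": statistics.mean(even),
    "even_median": statistics.median(even),
}
```

With a small α the median falls well below the mean because most characters change slowly while a few change very fast; with a large α the two nearly coincide.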
Second, rates of lexical evolution can vary through time and between lineages. Clearly this will cause problems if we are trying to estimate absolute divergence times on the inferred phylogenies since inferred branch lengths are not directly proportional to time. Again, biologists have developed a number of methods for dealing with this type of rate variation. We used the penalized-likelihood method of rate smoothing implemented in r8s (version 1.7; Sanderson 2002a), to allow for rate variation across each tree. Sanderson (2002b) has shown that, under conditions of rate variation, the penalized-likelihood rate-smoothing algorithm performs significantly better than methods that assume a constant rate of evolution.

In order to infer absolute divergence times, we first need to calibrate rates of evolution by constraining the age of known points on each tree in accordance with historically attested dates. For example, the Romance languages (derived from Latin) probably began to diverge prior to the fall of the Roman Empire. We can thus constrain the age of the node corresponding to the most recent common ancestor of the Romance languages to within the range implied by our historical knowledge (see Fig. 8.7). We constrained the age of 14 such nodes on the tree in accordance with historical evidence (see Atkinson & Gray 2006). These known node ages were then combined with branch-length information to estimate rates of evolution across each tree. The penalized-likelihood model allows rates to vary across the tree whilst incorporating a roughness penalty that costs the model more if rates vary excessively from branch to branch. This procedure allows us to derive age estimates for each node on the tree.

Figure 8.8 shows the consensus tree for the initial Bayesian sample distribution of 1000 trees,¹ with branch lengths drawn proportional to time. The posterior probability values above each internal branch give an indication of the uncertainty associated with each clade on the consensus tree (the percentage of trees in the Bayesian distribution that contain the clade). For example, the value 67 above the branch leading to the Italo-Celto-Germanic clade indicates that that clade was present in 67 per cent of the trees in the sample distribution. We can derive age estimates from this tree, including an age of 8700 BP at the base of the tree, within the range predicted by the Anatolian farming theory of Indo-European origin.
A single divergence time, with no estimate of the error associated with the calculation, is of limited value. To test between historical hypotheses we need some measure of the error associated with the date estimates. Specifically, uncertainty in the phylogeny gives rise to a corresponding uncertainty in age estimates. In order to account for phylogenetic uncertainty we estimated the age at the base of the trees in the post-burn-in Bayesian MCMC sample to produce a probability distribution for the age of Indo-European. One advantage of the Bayesian framework is that prior knowledge about language relationships can be incorporated into the analysis. In order to eliminate trees that conflict with known Indo-European language groupings, the original 1000-tree sample was filtered using a constraint tree representing these known language groupings [(Anatolian, Tocharian, (Greek, Armenian, Albanian, (Iranian, Indic), (Slavic, Baltic), ((North Germanic, West Germanic), Italic, Celtic)))]. This constraint tree was consistent with the majority-rule consensus tree generated from the entire Bayesian sample distribution. The filtered distribution of divergence time estimates was then used to create a confidence interval for the age of the Indo-European language family. This distribution could then be compared with the age ranges implied by the two main theories of Indo-European origin (see Fig. 8.9). The results are clearly consistent with the Anatolian hypothesis.
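The filter-then-summarize step can be sketched as follows. The sample below is entirely hypothetical (evenly spaced root ages, with every second tree flagged as compatible with the constraint tree), the percentile rule is one simple choice among several, and only the two hypothesis ranges are taken from the text.

```python
import math

KURGAN_RANGE = (5000, 6000)      # years BP, as stated for the Kurgan theory
ANATOLIAN_RANGE = (8000, 9500)   # years BP, as stated for the Anatolian theory

def age_interval(samples, coverage=0.95):
    """Central interval of root ages over trees that pass the constraint.

    samples -- iterable of (root_age, compatible_with_constraint) pairs
    """
    ages = sorted(age for age, compatible in samples if compatible)
    if not ages:
        raise ValueError("no trees are compatible with the constraint tree")
    tail = (1.0 - coverage) / 2.0
    low = ages[math.floor(tail * (len(ages) - 1))]
    high = ages[math.ceil((1.0 - tail) * (len(ages) - 1))]
    return low, high

def overlaps(interval, hypothesis):
    """True if the estimated interval intersects a hypothesized age range."""
    return interval[0] <= hypothesis[1] and hypothesis[0] <= interval[1]

# Hypothetical posterior sample: root ages in years BP plus a filter flag.
toy_sample = [(7800 + 10 * i, i % 2 == 0) for i in range(201)]
interval = age_interval(toy_sample)
supports_anatolian = overlaps(interval, ANATOLIAN_RANGE)
supports_kurgan = overlaps(interval, KURGAN_RANGE)
```

Because the interval is built from the whole filtered sample, it reflects uncertainty in the tree as well as in the dating of any one tree.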
Figure 8.7. The Romance languages (derived from Latin) probably began to diverge prior to the fall of the Roman Empire. We can thus constrain the age of the point on the tree which corresponds to this divergence event (2). Using this rationale 14 nodes were constrained, including an Iberian-French node (1) and a Germanic node (3).

Figure 8.8. Majority-rule consensus tree from the initial Bayesian MCMC sample of 1000 trees (Gray & Atkinson 2003). Values above each branch indicate uncertainty (posterior probability) in the tree as a percentage. Branch lengths are proportional to time. Shaded bars represent the age range proposed by the two main theories: the Anatolian theory (grey bar) and the Kurgan theory (hatched bar). The basal age (8700 BP) supports the Anatolian theory.

Not all historically attested language splits were used in our analysis. One means of validating our methodology is to produce divergence time distributions for nodes that were not constrained in the analysis and compare this to the historically attested time of divergence. For example, Figure 8.10 shows the inferred divergence time distributions for the North and West Germanic subgroups. The grey band in
these figures indicates the likely age of each subgroup based on the historical record. The age estimates for the North Germanic clade correspond with written evidence for the break-up of these languages between AD 900 and AD 1250. Similarly, estimated ages of the West Germanic clade are consistent with historical evidence dating the Anglo-Saxon migration to the British Isles about 1500 years ago.
8. Testing robustness

A key part of any Bayesian phylogenetic analysis is an assessment of the robustness of the inferences. To do this we tested the effect of altering a number of different parameters and assumptions of the method.

8.1. Bayesian priors
Initializing each Bayesian MCMC chain required the specification of a starting tree and prior parameters (priors) for the analysis. The sample Bayesian distribution was the product of ten separate runs from different random starting trees. Divergence time and topology results for each of the separate runs were consistent. Other test analyses were run using a range of priors for parameters controlling the rate matrix, branch lengths, gamma distribution and character state frequencies. The inferred tree phylogeny and branch lengths did not noticeably change when priors were altered.

8.2. Cognacy judgements
The Dyen et al. (1992) database contained information about the certainty of cognacy judgements. Words were coded as cognate or doubtful cognates. In the initial analysis we included all cognate information in an effort to maximize any phylogenetic signal. However, we wanted to test the robustness of our results to changes in the stringency of cognacy decisions. For this reason, the analysis was repeated with doubtful cognates excluded. This produced a similar age range to the initial analysis, indicating that our results were robust to errors in cognacy judgements (see Fig. 8.11).

Figure 8.9. Frequency distribution of basal age estimates from filtered Bayesian MCMC sample of trees for the initial assumption set (n = 435). The majority-rule consensus tree for the entire (unfiltered) sample is shown in the upper left.

Figure 8.10. Frequency distribution of age estimates for the North and West Germanic subgroups across filtered Bayesian MCMC sample of trees (n = 433). The grey bands indicate the historically attested time of divergence.

Figure 8.11. Frequency distribution of basal age estimates from filtered Bayesian MCMC sample of trees for analysis with doubtful cognates excluded (n = 433). The majority-rule consensus tree for the entire sample is shown in the upper left.
8.3. Calibrations and constraint trees
At the conference on which this volume is based, a question was raised about the age constraint used for Insular Celtic. It was suggested that whilst we used a maximum age constraint of 2750 BP, an age constraint of between 2200 BP and 1800 BP would have been more suitable. We do not wish to engage in debates about the correct age of the Insular Celtic divergence; however, a reanalysis of the data using the suggested ages serves to demonstrate the robustness of our results to variations in age constraints. Figure 8.12 shows the distribution of divergence times using the much later Celtic age constraints. Clearly, our results are robust to alterations in this age constraint. In fact, the step-by-step removal of each of the 14 age constraints on the consensus tree revealed that divergence time estimates were robust to calibration errors across the tree. For 13 nodes, the reconstructed age was within 390 years of the original constraint range. Only the reconstructed age for Hittite showed an appreciable variation from the constraint range. This may be attributable to the effect of missing data associated with extinct languages. Reconstructed ages at the base of the tree ranged from 10,400 BP with the removal of the Hittite age constraint, to 8500 BP with the removal of the Iranian group age constraint. The results are highly robust to calibration errors because of the large number of age constraints we used to calibrate rates of lexical evolution across the tree.

We also wanted to be sure that the constraint tree used to filter the Bayesian distribution of trees had not systematically biased our results. Figure 8.13 shows the divergence time distribution for the initial data set after filtering using a minimum set of topological constraints [(Anatolian, Tocharian, (Greek, Armenian, Albanian, (Iranian, Indic), (Slavic, Baltic), (North Germanic, West Germanic), Italic, Celtic))]. Again, the divergence time distribution was consistent with the Anatolian farming theory.
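The step-by-step removal of age constraints described in this section amounts to a leave-one-out check: drop one calibration, re-estimate that node's age from the remaining calibrations, and ask how far the new estimate falls outside the range that was removed. The sketch below shows only the final comparison step. It assumes the re-estimated ages have already been produced by the dating software; the function names, node labels and ages are hypothetical illustrations, not values from the chapter.

```python
def distance_outside_range(age, youngest, oldest):
    """Years by which `age` falls outside [youngest, oldest]; 0 if inside."""
    if age < youngest:
        return youngest - age
    if age > oldest:
        return age - oldest
    return 0


def check_calibrations(reconstructed, constraints, tolerance=390):
    """Map each node to True if its re-estimated age is within `tolerance`
    years of the constraint range that was removed for that run."""
    return {
        node: distance_outside_range(age, *constraints[node]) <= tolerance
        for node, age in reconstructed.items()
    }


# Hypothetical, made-up ages in years BP (NOT results from the chapter).
constraints = {"node_a": (1800, 2200), "node_b": (3000, 3500)}
reconstructed = {"node_a": 2450, "node_b": 4200}

for node, ok in check_calibrations(reconstructed, constraints).items():
    print(node, "within tolerance" if ok else "outside tolerance")
```

The 390-year default mirrors the figure quoted in the text; any other tolerance could be passed in.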
Figure 8.12. Frequency distribution of basal age estimates from filtered Bayesian MCMC sample of trees using revised Celtic age constraint of between 1800 BP and 2200 BP. The majority-rule consensus tree for the entire sample is shown in the upper left.

Figure 8.13. Frequency distribution of basal age estimates from filtered Bayesian MCMC sample of trees using minimum set of topological constraints [(Anatolian, Tocharian, (Greek, Armenian, Albanian, (Iranian, Indic), (Slavic, Baltic), (North Germanic, West Germanic), Italic, Celtic))] (n = 670). The majority-rule consensus tree for the entire sample is shown in the upper left.

Figure 8.14. Frequency distribution of basal age estimates from filtered Bayesian MCMC sample of trees with information about missing cognates included (n = 620). The majority-rule consensus tree for the entire sample is shown in the upper left.

8.4. Missing data
Another possible bias was the effect of missing data. Some of the languages in the Dyen et al. (1992) database may have contained fewer cognates because information about these languages was missing. For example, the three extinct languages (Hittite, Tocharian A & Tocharian B) are derived from a limited range of source texts and it is possible that some cognates were missed because the terms were not referred to in the source text. This may have biased divergence time estimates by falsely increasing basal branch lengths. Nicholls & Gray (Chapter 14 this volume) point out that we should expect fewer cognates to be present in the languages at the base of the tree anyway: the fact that Hittite has 94 cognates whilst most languages have around 200 does not necessarily imply that data is missing. Nonetheless, we tested for the effect of missing data by including information about whether or not the word for a particular term was missing from the database. If we could not rule out the possibility that a cognate was absent from a language because it had not been found or recorded, then that cognate was coded as missing (represented by a '?' in the matrix). Encoding missing cognate information in this way means that we can account for uncertainty in the data itself using the likelihood model: the unknown states become parameters to be estimated. Analyzing this recoded data also produced an age range consistent with the Anatolian theory (see Fig. 8.14).
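The recoding described above can be illustrated with a small sketch. It assumes a two-state (absent/present) character for each cognate set and shows one common way likelihood software handles a '?' cell: the tip is treated as compatible with either state, so the likelihood calculation sums over both. The function names and the example data are hypothetical, and the chapter does not specify the implementation details of the software it used.

```python
def code_cell(word_recorded, has_cognate):
    """Return '1', '0' or '?' for one language / cognate-set cell.

    If the word for the meaning was never recorded, absence of the
    cognate cannot be told apart from a gap in the sources, so the
    cell is coded as missing.
    """
    if not word_recorded:
        return "?"
    return "1" if has_cognate else "0"


def tip_partials(symbol):
    """Conditional likelihoods (state 0, state 1) at a tip.

    A '?' cell is compatible with both states, so both entries are 1.
    """
    return {"0": (1.0, 0.0), "1": (0.0, 1.0), "?": (1.0, 1.0)}[symbol]


# Hypothetical language with three cognate sets: present, absent, unrecorded.
observations = [(True, True), (True, False), (False, False)]
row = "".join(code_cell(rec, has) for rec, has in observations)
print(row)                              # 10?
print([tip_partials(s) for s in row])
```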
8.5. Root of Indo-European
Finally, we tested the effect of the rooting point for the trees. In the previous analyses, trees were rooted with Hittite. Although this is consistent with independent linguistic analyses (Gamkrelidze & Ivanov 1995; Rexová et al. 2003), other potential root points are possible. It could be claimed that a Hittite root biases age estimates in favour of the Anatolian hypothesis. We thus reran the rate-smoothing analysis rooting the consensus tree in Figure 8.8 with Balto-Slavic, Greek, Tocharian and Indo-Iranian groups. In all four cases the estimated divergence time increased to between 9500 BP and 10,700 BP.
9. Discussion
The time-depth estimates reported here are consistent with the times predicted by a spread of language with the expansion of agriculture from Anatolia. The branching pattern and dates of internal nodes are broadly consistent with archaeological evidence indicating that between the tenth and sixth millennia BP a culture based on cereal cultivation and animal husbandry spread from Anatolia into Greece and the Balkans and then out across Europe and the Near East (Gkiasta et al. 2003; Renfrew 1987). Hittite appears to have diverged from the main Proto-Indo-European stock around 8700 years ago, perhaps reflecting the initial migration out of Anatolia. Indeed, this date exactly matches estimates for the age of Europe's first agricultural settlements in southern Greece (Renfrew 1987). Following the initial split, the language tree shows the formation of separate Tocharian, Greek, and then Armenian lineages, all before 6000 BP, with all of the remaining language families formed by 4000 BP.

We note that the received linguistic orthodoxy (Indo-European is only 6000 years old) does approximately fit the divergence dates we obtained for most of the branches of the tree. Only the basal branches leading to Hittite, Tocharian, Greek and Armenian are well beyond this age. Interestingly, the date range hypothesized for the Kurgan expansion does correspond to a rapid period of divergence on the consensus tree. According to the divergence time estimates shown in Figure 8.8, many of the major Indo-European subfamilies (Indo-Iranian, Balto-Slavic, Germanic, Italic and Celtic) diverged between six and seven thousand years ago, intriguingly close to the hypothesized time of the Kurgan expansion. Thus it seems possible that there were two distinct phases in the spread of Indo-European: an initial phase, involving the movement of Indo-European with agriculture, out of Anatolia into Greece and the Balkans some 8500 years ago; and a second phase (perhaps the Kurgan expansion) which saw the subsequent spread of Indo-European languages across the rest of Europe and east, into Persia and Central Asia.
10. Response to our critics

10.1. The potential pitfalls of linguistic palaeontology
A number of linguists have claimed that linguistic palaeontology offers a compelling reason why the arguments we have presented must be wrong: Proto-Indo-Europeans are claimed to have had a word for 'wheel' (*kʷ(e)kʷlo-) but wheels did not exist in Europe 9000 years ago. The case is based on a widespread distribution of apparently related words for 'wheel' in Indo-European languages. This is often presented as a knock-down argument against any age of Indo-European older than 5000 to 6000 years (when wheels first appear in the archaeological record). However, there are at least two alternative explanations for the distribution of terms associated with 'wheel' and wheeled transport.

First, independent semantic innovations from a common root are a likely mechanism by which we can account for the supposed Proto-Indo-European reconstructions associated with wheeled transport (Trask 1996; Watkins 1969). Linguists can reconstruct word forms with much greater certainty than their meanings. For example, upon the development of wheeled transport, words derived from the Proto-Indo-European term *kʷel- (meaning 'to turn, rotate') may have been independently co-opted to describe the wheel. On the basis of the reconstructed ages shown in Figure 8.8, as few as three such semantic innovations around the sixth millennium BP could have accounted for the attested distribution of terms related to *kʷ(e)kʷlo- 'wheel' (one shift just before the break-up of the Italic-Celtic-Germanic-Balto-Slavic-Indo-Iranian lineage, one shift in the Greek-Armenian lineage, and one shift [or borrowing] in the Tocharian lineage).
The second possible explanation for the distribution of terms pertaining to wheeled vehicles is widespread borrowing. Good ideas spread. Terms associated with a new technology are often borrowed along with the technology. The spread of wheeled transport across Europe and the Near East 5000–6000 years ago seems a likely candidate for borrowing of this sort. Linguists are able to identify many borrowings (particularly more recent ones) on the basis of the presence or absence of certain systematic sound correspondences. However, our date estimates suggest that most of the major Indo-European groups were just beginning to diverge when the wheel was introduced. We would thus expect the currently attested forms of any borrowed terms to look as if they were inherited from Proto-Indo-European; they may thus be impossible to reliably identify.

Both of these arguments are discussed in more detail elsewhere (Watkins 1969; Renfrew 1987; Atkinson & Gray 2006). It suffices to say that both the power and the pitfalls of linguistic palaeontology are well known. We are disappointed that in their rush to dismiss our paper in the media, otherwise scholarly and responsible linguists have claimed much greater certainty for their semantic reconstructions than is justifiable. This does not mean that we think there is no issue here. Ideally, we should aim to synthesize all lines of evidence relating to the age of Indo-European. Ringe (unpublished manuscript) presents a careful summary of the terms related to wheeled vehicles in Indo-European. He argues that words for 'thill' (a pole that connects a yoke or harness to a vehicle) and 'yoke' can confidently be reconstructed for Proto-Indo-European. He notes that reflexes of *kʷ(e)kʷlo- 'wheel' have not been found in Anatolian languages but exist in Tocharian A and B and other Indo-European languages, and hence can be reconstructed for the common ancestor of all non-Anatolian Indo-European languages. Ringe claims that the specific forms of these words make parallel semantic changes or borrowing extremely implausible. It would be extremely useful to attempt to quantify just how unlikely such alternative scenarios are. Until all the assumptions of these arguments are formalized, and the probability of alternative scenarios quantified, it will remain difficult to synthesize all the different lines of evidence on the age of Indo-European.
10.2. Independence of characters
As mentioned in section 5, Evans et al. (Chapter 10 this volume) claim our evolutionary model of binary character evolution is 'patently inappropriate' because it assumes independence between characters when our characters are clearly not independent. However, we do not believe that any violation of independence necessarily biases our time-depth estimates. We note that the assumption of independence does not hold for nucleotide or amino acid sequence data either. For example, compensating substitutions in ribosomal RNA sequences result in correlation between paired sites in stem regions (Felsenstein 2004). However, biologists still get reasonably accurate estimates of phylogeny despite violations of this assumption. In fact, nothing in the Evans et al. paper demonstrates that coding the data as binary characters, rather than as multistate characters, will produce biased results. Pagel & Meade (Chapter 15 this volume) demonstrated that, on the contrary, binary and multistate coded data produce trees that differ in length by a constant of proportionality. In other words, the binary and multistate trees are just scaled versions of one another. Since we estimate rates of evolution for each tree using the branch lengths of that tree, scaling the branch lengths does not affect our results. Pagel & Meade (Chapter 15 this volume) also approximated the effect of violations of the independence assumption on the MCMC analysis by heating the likelihood scores. They inferred that violations of independence would produce higher posterior probability values but would have little effect on the consensus tree topology. This means that we may have underestimated the error due to phylogenetic uncertainty but our estimates will not be biased towards any particular date.

Finally, treating cognate sets as the fundamental unit of lexical evolution does not, as Evans et al. (Chapter 10 this volume) argue, constitute an extreme violation of the independence assumption. Almost all of the languages in the Dyen et al. (1992) database contain polymorphisms, meaning that for a given language there exist multiple words of the same meaning. The polymorphisms in our data are a reflection of the nature of lexical evolution. Specifically, they demonstrate a lack of strict dependence between cognate sets within meaning categories, i.e. a word with a given meaning can arise in a language that already has a word of that meaning. Models of lexical evolution that do not allow polymorphisms (e.g. Ringe et al. 2002) could also be labelled as 'patently inappropriate' because they assume that for a word to arise in a language any existing words with that meaning must be concomitantly lost from the language. This is not always the case. Ringe et al. (2002) note that although the words 'small' and 'little' have very similar meanings, they have persisted together in English for over a thousand years. Our binary coding procedure allows us to represent such polymorphisms with ease. The presence of polymorphisms means that dependencies between cognate sets are not as strong as Evans et al. claim.

A further factor that weakens the dependencies between the cognate sets arises from the thinning process that occurs in lexical evolution. The observed cognate sets do not represent the full complement of actual cognates that arose in Indo-European (see Nicholls & Gray Chapter 14 this volume). Some cognates that existed in the past will not have persisted into present-day languages and any unique cognates were not included in the analysis. This thinning of the cognates also acts to reduce dependencies between characters in the analysis and thus further weakens any effect of violations of independence. Further research by Atkinson et al. (2005) using synthetic data has shown that violations of the independence assumption do not significantly affect date estimates.
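The scaling argument in this section (rates are estimated from each tree's own branch lengths, so multiplying every branch by a constant leaves the dates unchanged) can be checked with a deliberately simplified sketch. It assumes a single constant rate and one calibrated node, which is a simplification: the chapter's analyses use rate smoothing rather than a strict clock. The function name and all numbers are hypothetical.

```python
import math


def root_age(calib_depth, calib_age, root_depth):
    """Date the root from one calibrated node under a single constant rate.

    calib_depth: path length (expected changes per character) from the
                 calibrated node to the present
    calib_age:   known age of that node in years
    root_depth:  path length from the root to the present
    """
    rate = calib_depth / calib_age      # changes per character per year
    return root_depth / rate            # years


# Hypothetical branch lengths on a "binary-coded" tree.
binary_estimate = root_age(calib_depth=0.05, calib_age=2000, root_depth=0.2)

# The same tree with every branch multiplied by a constant, as would happen
# if the alternative coding only rescales the tree.
scale = 3.7
scaled_estimate = root_age(0.05 * scale, 2000, 0.2 * scale)

print(binary_estimate, scaled_estimate)
print(math.isclose(binary_estimate, scaled_estimate))
```

The scale factor cancels because it multiplies both the calibrated path length (and hence the estimated rate) and the path length being dated.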
10.3. Confidence in lexical data
From a phylogenetic viewpoint the lexicon is a tremendously attractive source of data because of the large number of possible characters it affords. However, we are aware that many historical linguists are sceptical of inferences based purely on lexical data. Garrett (Chapter 12 this volume) argues that borrowing of lexical terms, or advergence, within the major Indo-European subgroups could have distorted our results. He identifies a number of cases where an ancestral term has been replaced by a different term in all or some of the daughter languages, presumably via borrowing:

Thus Latin ignis 'fire' has been replaced by reflexes of Latin focus 'hearth' throughout Romance, and archaic Sanskrit hanti 'kills' has been replaced by reflexes of a younger Sanskrit form marayati throughout Indo-Aryan.

Garrett argues correctly that, where a word has been borrowed across a subgroup after the initial divergence of the group, our method will infer that the word evolved in the branch leading up to that subgroup (see Latin focus example: Fig. 8.15a). This will falsely inflate the branch lengths below the subgroups and deflate branch lengths within each group. Since we estimate rates of evolution on the basis of within-group branch lengths, it is argued that we will underestimate rates of change and hence overestimate divergence times lower in the tree. However, this argument requires that two special assumptions hold.

First, any borrowing must occur across a whole subgroup and only across a whole subgroup. When terms are not borrowed across the whole group there is no systematic bias to infer changes in the branch leading to the group. Depending on the distribution of borrowed terms, advergence can even produce the opposite effect, falsely inflating branch lengths within subgroups and hence causing us to underestimate divergence times. It seems unlikely that all or even most borrowed terms were borrowed across an entire subgroup. Garrett highlighted 16 instances of borrowing within Indo-European subgroups.² These were presumably selected because they were thought to reflect the sort of advergence pattern that would bias our results. Of these, at least 6 are unlikely to favour inferred language change at the base of a subgroup.³ Figure 8.15b shows the example of the Romance term for 'head'.

Figure 8.15. a) Parsimony character trace for reflexes of Latin focus (originally 'hearth' but borrowed as 'fire') on the Romance consensus tree. Black indicates presence of the character, grey indicates absence and dashed indicates uncertainty. This shows borrowing across the whole Romance subgroup: evolutionary change is inferred at the base of the subgroup with no change within the subgroup, falsely inflating divergence time estimates. b) As with (a) but for reflexes of Latin testa (originally 'cup, jar, shell' but borrowed as 'head'). Here the borrowing is not across the whole Romance subgroup: evolutionary change is inferred within the subgroup.
Second, even if we accept the first assumption, we must assume that the proposed process of advergence is unique to contemporary languages. As Garrett (Chapter 12 this volume) puts it, this requires the unscientific assumption that linguistic change in the period for which we have no direct evidence was radically different from change we can study directly. Rather than arguing that borrowing was rare at one stage and then suddenly became common across all of the major lineages at about the same time, it seems more plausible to suggest that borrowing has always occurred. If the same process of advergence in related languages has always occurred then the effect of shifting implied changes to more ancestral branches will be propagated down the tree such that there should be no net effect on divergence time calculation. For example, borrowing within Italic may shift inferred changes from the more modern branches to the branch leading to Italic, but borrowing between Proto-Italic and its contemporaries will also shift inferred changes from this branch to ancestral branches. This means that whilst we may incorrectly reconstruct some Proto-Indo-European roots, our divergence time calculation will not be affected. We maintain that although advergence has undoubtedly occurred throughout the history of Indo-European, and may have affected our trees, this effect is likely to be random and there is no reason to think it will have significantly biased our results. Atkinson et al. (2005) analyzed synthetic data with simulated borrowing, and found that date estimates were highly robust to even high levels of borrowing.
Ringe et al. (2002) argue that non-lexical characters such as grammatical and phonological features are less likely to be borrowed (although they also note that parallel changes in phonological and morphological characters are possible). To avoid potential problems due to lexical borrowing they coded 15 phonological and 22 morphological characters as strict constraints in their analyses (they did not throw out the remaining 333 lexical characters). While we agree that phonological and morphological characters would be very useful, we believe there are good reasons to trust the inferences based on the lexical data in our case. The Dyen et al. (1992) data has had much of the known borrowing filtered from it. Further, the relationships we infer between Indo-European languages are remarkably similar to those inferred by linguists using the comparative method. Our results are not only consistent with accepted language relationships, but also reflect acknowledged uncertainties, such as the position of Albanian. Our time-depth estimates for internal nodes of the Indo-European tree are also congruent with known historical events (i.e. when constraints were removed step-by-step from each of the 13 internal constraint points, the reconstructed ages were within 390 years of the original constraint range: Gray & Atkinson 2003). Significantly, if we constrain our trees to fit the Ringe et al. (2002) topology we get very similar date estimates to our initial consensus tree topology. In short, there is nothing to indicate that either our tree topologies or date estimates have been seriously distorted by the use of just lexical data.
Determined critics might still claim that the remaining undetected lexical borrowing that undoubtedly exists in the Dyen et al. data (see Nicholls & Gray Chapter 14 this volume) has led us to make erroneous time-depth inferences at the root of the Indo-European tree. The Swadesh 100-word list is expected to be more resistant to change and less prone to borrowing than the 200-word list (Embleton 1991; McMahon & McMahon 2003). If undetected borrowing has biased our tree topology and divergence time estimates then the 100-word list might be expected to produce different estimates. To assess this possibility we repeated the analysis using only the Swadesh 100-word list items. Figure 8.16 shows the results of this analysis. Predictably, with a smaller data set variance in the age estimates increased. However, the resulting age range was still consistent with the Anatolian theory of Indo-European origin. Interestingly, the majority-rule consensus tree (shown in Fig. 8.17) is slightly different to that obtained from the full data set. It contains a Balto-Slavic-Indo-Iranian group and an Italo-Celtic group. Ringe et al.'s (2002) compatibility analysis also found these clades. The low posterior probability values for these groups mean that we should not over-interpret the certainty of these deeper relationships, but clearly the possibility that undetected lexical borrowing is obscuring some of the deeper relationships would repay further attention. We emphasize that this possible borrowing does not appear, however, to affect our time-depth estimates for the root of the tree.

Figure 8.16. Frequency distribution of basal age estimates from filtered Bayesian MCMC sample of trees using Swadesh 100-word list items only (n = 97).
It is interesting to note that whilst our methodology produced consistent results using the Swadesh 100- and 200-word list, Tischler's (1973) glottochronological analysis was affected by the choice of word list. Tischler generated Indo-European divergence times using pairwise distance comparisons between languages under the assumption of constant rates of lexical replacement. Using the Swadesh 200-word list, he calculated that the core Indo-European languages (Greek, Italic, Balto-Slavic, Germanic and Indo-Iranian) diverged around 5500 BP whilst Hittite diverged from the common stock around 8400 BP. This is in striking agreement with the timing depicted in Figure 8.8. However, the same calculation using the Swadesh 100-word list produced a Hittite divergence time of almost 11,000 BP. Other inferred divergence times were also older. Tischler favoured the 200-word list results because they tended to be more consistent and were based on a larger sample size. However, the disparate 100-word list ages led Tischler to conclude that the divergence times for Hittite (and a number of other peripheral Indo-European languages, including Albanian, Armenian and Old Irish) were in fact anomalous and he instead favoured an age for Indo-European of between 5000 and 6000 years, reflecting the break-up of the core languages. He explained the apparent earlier divergence of Hittite, Albanian, Old Irish and Armenian as an artefact of borrowing with non-Indo-European languages or increased rates of change.

Figure 8.17. Majority-rule consensus tree (unfiltered) for Swadesh 100-word list items only. Values above each branch express uncertainty (posterior probability) in the tree as a percentage.
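For readers unfamiliar with pairwise dating under a constant replacement rate, the classical glottochronological calculation can be sketched as follows. This is the textbook formula t = ln(c) / (2 ln(r)), where c is the fraction of shared cognates between two languages and r is the assumed fraction of the list retained per millennium. It is given here only as an illustration of the general approach; the text does not spell out Tischler's exact procedure, and the retention rate and cognate fraction below are illustrative assumptions, not values from the chapter.

```python
import math


def glottochronology_age(shared_fraction, retention_per_millennium):
    """Separation time in millennia: t = ln(c) / (2 * ln(r)).

    The factor 2 reflects that both lineages have been replacing
    vocabulary independently since they split.
    """
    if not 0.0 < shared_fraction <= 1.0:
        raise ValueError("shared_fraction must be in (0, 1]")
    if not 0.0 < retention_per_millennium < 1.0:
        raise ValueError("retention_per_millennium must be in (0, 1)")
    return math.log(shared_fraction) / (2.0 * math.log(retention_per_millennium))


# Illustrative inputs only: 30% shared cognates, 80.5% retention per millennium.
age = glottochronology_age(shared_fraction=0.30, retention_per_millennium=0.805)
print(f"{age:.2f} millennia")
```

Because the result depends directly on the assumed retention rate, switching word lists (and hence retention rates) can shift the inferred ages substantially, which is the sensitivity the paragraph above describes.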
11. Conclusion
The analyses we have presented here are far from the last word on the vexed issue of Indo-European origins. We expect that every species of scholar and would-be savant who can take pen to hand will still be drawn to the question of Indo-European origins. However, in contrast to some of the more pessimistic claims of our critics, we do not think that estimating the age of the Indo-European language family is an intractable problem. Some of these critics have argued that it is hard enough to get the tree topology correct, let alone branch lengths or divergence times. From this point of view all efforts to estimate dates should be abandoned until we can get the tree exactly right. We think that would be a big mistake. It would prematurely close off legitimate scientific inquiry. The probability of getting the one perfect phylogeny from the 6.66 × 10^152 possible unrooted trees for 87 languages is rather small. Fortunately we do not need to get the tree exactly correct in order to make accurate date estimates. Using the Bayesian phylogenetic approach we can calculate divergence dates over a distribution of most probable trees, integrating out uncertainty in the phylogeny. We acknowledge that estimating language divergence dates is difficult, but maintain it is possible if the following conditions are satisfied:
a) a data set of sufficient size and quality can be assembled to enable the tree and its associated branch lengths to be estimated with sufficient accuracy;
b) most of the borrowing is removed from the data;
c) an appropriate statistical model of character evolution is used (it should contain sufficient parameters to give accurate estimates but not be over-parameterized);
d) multiple nodes on the tree are calibrated with reliable age ranges;
e) uncertainty in the estimation of tree topology and branch lengths is incorporated into the analysis;
f) variation in the rate of linguistic evolution is accommodated in the analysis.

The analyses of Indo-European divergence dates we have outlined above go a long way to meeting these requirements. The Dyen et al. (1997) data set we used in our analyses contains over two thousand carefully coded cognate sets (condition a). Dyen et al. excluded known borrowings from these sets (condition b). The two-state, time-reversible model of cognate gains and losses with gamma-distributed rate heterogeneity produced accurate trees (i.e. congruent with the results of the comparative method and known historical relationships)⁴ (condition c). When the branch lengths were combined with the large number of well-calibrated nodes (condition d), the estimated divergence dates were also in line with known historical events. The Bayesian MCMC approach allowed us to incorporate phylogenetic uncertainty into our analyses (condition e), and to investigate the consequences of variations in the priors, tree rooting, and stringency in cognate judgements. Finally, rate smoothing allowed us to estimate divergence dates without the assumption of a strict glottoclock (condition f). We challenge our critics to find any paper on molecular divergence dates that uses as many calibration points, investigates the impact of so many different assumptions, or goes to the same lengths to validate its results.

In the words of W.S. Holt, history is 'a damn dim candle over a damn dark abyss'. Although we see reason for careful scholarship when attempting to estimate language divergence dates, we see no justification for pessimism here. Far from dancing around the question of Indo-European origins like moths around a flame, with the light of computational phylogenetic methods we can illuminate the past.
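The tree count quoted in the conclusion can be reproduced with a short calculation. The number of distinct unrooted, fully bifurcating trees on n labelled tips is the double factorial (2n - 5)!! = 3 × 5 × ... × (2n - 5). The sketch below evaluates it for 87 languages; the function name is an illustrative choice.

```python
def unrooted_tree_count(n_taxa):
    """Number of unrooted, fully bifurcating labelled trees: (2n - 5)!!."""
    if n_taxa < 3:
        raise ValueError("need at least three taxa")
    count = 1
    for odd in range(3, 2 * n_taxa - 4, 2):   # 3, 5, ..., 2n - 5
        count *= odd
    return count


print(unrooted_tree_count(4))                      # 3
print(unrooted_tree_count(5))                      # 15
print(f"{float(unrooted_tree_count(87)):.2e}")     # 6.66e+152
```

Python integers are arbitrary precision, so the exact 153-digit value is available if needed; the float conversion is used only for display.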
Notes
1. Ten million post-burn-in trees were generated using MrBayes (Huelsenbeck & Ronquist 2001). To ensure that consecutive samples were independent, only every 10,000th tree was sampled from this distribution, producing a sample size of 1000.
2. The proposed borrowings were: in Romance 'ear', 'fire', 'liver', 'count', 'eat', 'head' and 'narrow'; in Germanic 'leaf', 'sharp' and 'think'; and in Indic 'kill', 'night', 'play', 'suck', 'flower' and 'liver'. We note that this list was not intended by Garrett to be a comprehensive account of all possible borrowings.
3. Borrowings that are unlikely to favour inferred language change at the base of a subgroup or that would favour inferred language change within a subgroup are: in Romance 'ear', 'head', 'narrow'; in Germanic 'leaf'; and in Indic 'flower' and 'liver'.
4. We do, however, agree that the question of model specification would repay further investigation (see Nicholls & Gray Chapter 14 this volume; Pagel Chapter 15 this volume; Atkinson et al. 2005).
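The thinning arithmetic in note 1 can be verified with a few lines. The sketch below only counts which positions in the chain would be retained when every 10,000th tree is kept; it does not call MrBayes, and the function name is a hypothetical illustration.

```python
def thinned_indices(n_generated, every):
    """1-based positions of the trees kept when retaining every `every`-th one."""
    return range(every, n_generated + 1, every)


kept = thinned_indices(10_000_000, 10_000)
print(len(kept))            # 1000
print(kept[0], kept[-1])    # 10000 10000000
```

Using a range avoids materializing ten million entries just to count the retained sample.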
References

Adams, D.Q., 1999. A Dictionary of Tocharian B. (Leiden Studies in Indo-European 10.) Amsterdam: Rodopi. Available via online database at S. Starostin & A. Lubotsky (eds.), Database Query to A dictionary of Tocharian B. http://iiasnt.leidenuniv.nl/ied/index2.html.
Atkinson, Q.D. & R.D. Gray, 2006. Are accurate dates an intractable problem for historical linguistics? in Mapping our Ancestry: Phylogenetic Methods in Anthropology and Prehistory, eds. C. Lipo, M. O'Brien, S. Shennan & M. Collard. Chicago (IL): Aldine, 269–96.
Atkinson, Q.D., G. Nicholls, D. Welch & R.D. Gray, 2005. From words to dates: water into wine, mathemagic or phylogenetic inference? Transactions of the Philological Society 103(2), 193–219.
Bateman, R., I. Goddard, R. O'Grady, V. Funk, R. Mooi, W. Kress & P. Cannell, 1990. Speaking of forked tongues: The feasibility of reconciling human phylogeny and the history of language. Current Anthropology 31, 1–24.
Bellwood, P., 1991. The Austronesian dispersal and the origin of languages. Scientific American 265, 88–93.
Bellwood, P., 1994. An archaeologist's view of language macrofamily relationships. Oceanic Linguistics 33, 391–406.
Bergsland, K. & H. Vogt, 1962. On the validity of glottochronology. Current Anthropology 3, 115–53.
Blust, R., 2000. Why lexicostatistics doesn't work: the Universal Constant hypothesis and the Austronesian languages, in Time Depth in Historical Linguistics, eds. C. Renfrew, A. McMahon & L. Trask. (Papers in the Prehistory of Languages.) Cambridge: McDonald Institute for Archaeological Research, 311–32.
Bryant, D., F. Filimon & R.D. Gray, 2005. Untangling our past: Pacific settlement, phylogenetic trees and Austronesian languages, in The Evolution of Cultural Diversity: Phylogenetic Approaches, eds. R. Mace, C. Holden & S. Shennan. London: UCL Press, 69–85.
Burnham, K.P. & D.R. Anderson, 1998. Model Selection and Inference: a Practical Information-Theoretic Approach. New York (NY): Springer.
Campbell, L., 2004. Historical Linguistics: an Introduction. 2nd edition. Edinburgh: Edinburgh University Press.
Cavalli-Sforza, L.L., P. Menozzi & A. Piazza, 1994. The History and Geography of Human Genes. Princeton (NJ): Princeton University Press.
Clackson, J., 2000. Time depth in Indo-European, in Time Depth in Historical Linguistics, eds. C. Renfrew, A. McMahon & L. Trask. (Papers in the Prehistory of Languages.) Cambridge: McDonald Institute for Archaeological Research, 441–54.
Devoto, G., 1962. Origini Indeuropeo. Florence: Instituto Italiano di Preistoria Italiana.
Diakonov, I.M., 1984. On the original home of the speakers of Indo-European. Soviet Anthropology and Archaeology 23, 5–87.
Diamond, J. & P. Bellwood, 2003. Farmers and their languages: the first expansions. Science 300, 597.
Dyen, I., J.B. Kruskal & P. Black, 1992. An Indoeuropean Classification: a Lexicostatistical Experiment. (Transactions 82(5).) Philadelphia (PA): American Philosophical Society.
Dyen, I., J.B. Kruskal & P. Black, 1997. FILE IE-DATA1. Available at http://www.ntu.edu.au/education/langs/ielex/IE-DATA1.
Embleton, S., 1986. Statistics in Historical Linguistics. Bochum: Brockmeyer.
Embleton, S.M., 1991. Mathematical methods of genetic classification, in Sprung from Some Common Source, eds. S.L. Lamb & E.D. Mitchell. Stanford (CA): Stanford University Press, 365–88.
Excoffier, L. & Z. Yang, 1999. Substitution rate variation among sites in mitochondrial hypervariable region I of humans and chimpanzees. Molecular Biology and Evolution 16, 1357–68.
Faguy, D.M. & W.F. Doolittle, 2000. Horizontal transfer of catalase-peroxidase genes between archaea and pathogenic bacteria. Trends in Genetics 16, 196–7.
Felsenstein, J., 2004. Inferring Phylogenies. Sunderland (MA): Sinauer.
Gamkrelidze, T.V. & V.V. Ivanov, 1995. Indo-European and the Indo-Europeans: a Reconstruction and Historical Analysis of a Proto-Language and Proto-Culture. Berlin: Mouton de Gruyter.
Gimbutas, M., 1973a. Old Europe c. 7000–3500 BC, the earliest European cultures before the infiltration of the Indo-European peoples. Journal of Indo-European Studies 1, 1–20.
Gimbutas, M., 1973b. The beginning of the Bronze Age in Europe and the Indo-Europeans 3500–2500 BC. Journal of Indo-European Studies 1, 163–214.
Gkiasta, M., T. Russell, S. Shennan & J. Steele, 2003. Neolithic transition in Europe: the radiocarbon record revisited. Antiquity 77, 45–62.
Glover, I. & C. Higham, 1996. New evidence for rice cultivation in S., S.E. and E. Asia, in The Origins and Spread of Agriculture and Pastoralism in Eurasia, ed. D. Harris. Cambridge: Blackwell, 413–42.
Goldman, N., 1993. Statistical tests of models of DNA substitution. Journal of Molecular Evolution 36, 182–98.
Gray, R.D. & Q.D. Atkinson, 2003. Language-tree divergence times support the Anatolian theory of Indo-European origin. Nature 426, 435–9.
Guterbock, H.G. & H.A. Hoffner, 1986. The Hittite Dictionary of the Oriental Institute of the University of Chicago. Chicago (IL): The Institute.
Hillis, D.M., 1992. Experimental phylogenetics: generation of a known phylogeny. Science 255, 589–92.
Hillis, D.M., C. Moritz & B.K. Marble, 1996. Molecular Systematics. 2nd edition. Sunderland (MA): Sinauer.
Hjelmslev, L., 1958. Essai d'une Critique de la Methode dite Glottochronologique. Proceedings of the Thirty-second International Congress of Americanists, Copenhagen, 1956. Copenhagen: Munksgaard.
Hoffner, H.A., 1967. An English-Hittite Dictionary. New Haven (CT): American Oriental Society.
Holden, C.J., 2002. Bantu language trees reflect the spread of farming across Sub-Saharan Africa: a maximum-parsimony analysis. Proceedings of the Royal Society of London Series B 269, 793–9.
Holland, B. & V. Moulton, 2003. Consensus networks: a method for visualising incompatibilities in collections of trees, in Algorithms in Bioinformatics, WABI 2003, eds. G. Benson & R. Page. Berlin: Springer-Verlag, 165–76.
Huelsenbeck, J.P. & F. Ronquist, 2001. MRBAYES: Bayesian inference of phylogeny. Bioinformatics 17, 754–5.
Huelsenbeck, J.P., F. Ronquist, R. Nielsen & J.P. Bollback, 2001. Bayesian inference of phylogeny and its impact on evolutionary biology. Science 294, 2310–14.
Jukes, T.H. & C.R. Cantor, 1969. Evolution of protein molecules, in Mammalian Protein Metabolism, vol. 3, ed. M.N. Munro. New York (NY): Academic Press, 21–132.
Kuhner, M.K. & J. Felsenstein, 1994. A simulation comparison of phylogeny algorithms under equal and unequal evolutionary rates. Molecular Biology and Evolution 11, 459–68.
Kumar, V.K., 1999. Discovery of Dravidian as the Common Source of Indo-European. Retrieved Sept. 27th 2002 from http://www.datanumeric.com/dravidian/.
Levins, R., 1966. The strategy of model building in population biology. American Scientist 54, 421–31.
Mallory, J.P., 1989. In Search of the Indo-Europeans: Languages, Archaeology and Myth. London: Thames & Hudson.
McMahon, A. & R. McMahon, 2003. Finding families: quantitative methods in language classification. Transactions of the Philological Society 101, 7–55.
Metropolis, N., A.W. Rosenbluth, M.N. Rosenbluth, A.H. Teller & E. Teller, 1953. Equations of state calculations by fast computing machines. Journal of Chemical Physics 21, 1087–91.
Otte, M., 1997. The diffusion of modern languages in prehistoric Eurasia, in Archaeology and Language, eds. R. Blench & M. Spriggs. London: Routledge, 74–81.
Pagel, M., 1997. Inferring evolutionary processes from phylogenies. Zoologica Scripta 26, 331–48.
Pagel, M., 1999. Inferring the historical patterns of biological evolution. Nature 401, 877–84.
Pagel, M., 2000. Maximum-likelihood models for glottochronology and for reconstructing linguistic phylogenies, in Time Depth in Historical Linguistics, eds. C. Renfrew, A. McMahon & L. Trask. (Papers in the Prehistory of Languages.) Cambridge: The McDonald Institute for Archaeological Research, 413–39.
Renfrew, C., 1987. Archaeology and Language: the Puzzle of Indo-European Origins. London: Cape.
Rexová, K., D. Frynta & J. Zrzavý, 2003. Cladistic analysis of languages: Indo-European classification based on lexicostatistical data. Cladistics 19, 120–27.
Ringe, D., n.d. Proto-Indo-European Wheeled Vehicle Terminology. Unpublished manuscript.
Ringe, D., T. Warnow & A. Taylor, 2002. Indo-European and computational cladistics. Transactions of the Philological Society 100, 59–129.
Rosser, Z.H., T. Zerjal, M.E. Hurles et al., 2000. Y-chromosomal diversity in Europe is clinal and influenced primarily by geography, rather than by language. American Journal of Human Genetics 67, 1526–43.
Sanderson, M., 2002a. R8s, Analysis of Rates of Evolution, version 1.50. http://ginger.ucdavis.edu/r8s/
Sanderson, M., 2002b. Estimating absolute rates of evolution and divergence times: a penalized likelihood approach. Molecular Biology and Evolution 19, 101–9.
Steel, M., M. Hendy & D. Penny, 1988. Loss of information in genetic distances. Nature 333, 494–5.
Swadesh, M., 1952. Lexico-statistic dating of prehistoric ethnic contacts. Proceedings of the American Philosophical Society 96, 453–63.
Swadesh, M., 1955. Towards greater accuracy in lexicostatistic dating. International Journal of American Linguistics 21, 121–37.
Swofford, D.L., G.J. Olsen, P.J. Waddell & D.M. Hillis, 1996. Phylogenetic Inference, in Molecular Systematics, eds. D.M. Hillis, C. Moritz & B.K. Marble. 2nd edition. Sunderland (MA): Sinauer, 407–514.
Tischler, J., 1973. Glottochronologie und Lexicostatistik. Innsbruck: Innsbrucker Verlag.
Tischler, J., 1997. Hethitisch-Deutsches Worterverzeichnis. Dresden: Probedruck.
Trask, R.L., 1996. Historical Linguistics. New York (NY): Arnold.
Watkins, C., 1969. Indogermanische Grammatik III/1. Geschichte der Indogermanischen Verbalflexion. Heidelberg: Carl Winter Verlag.