In Contextualizing Historical Lexicology, Helsinki May 15-17, 2017
Basic vocabulary and the phylogenetic approach to the study of Uralic language history
Michael Rießler1, Mervi de Heer2, Terhi Honkola3, Unni-Päivä Leino4, Kaj Syrjänen4, Outi Vesakoski3
1 University of Freiburg, Germany; 2 University of Uppsala, Sweden; 3 University of Turku, Finland; 4 University of Tampere, Finland
Senior staff: Terhi Honkola1, Unni-Päivä Leino2, Urho Määttä2, Luke Maurits1, Jenni Santaharju3, Outi Vesakoski1, Niklas Wahlberg
BEDLAN team funded by the Kone Foundation
• BEDLAN 2009-2013, PI Urho Määttä
• URALEX 2013-2016, PI Unni-Päivä Leino
• SumuraSyyni 2014-2016, PI Outi Vesakoski
• Kippo 2017-2020, PI Unni-Päivä Leino
• AikaSyyni 2017-2020, PI Outi Vesakoski
Collaboration & visitors: Rogier Blokland4, Michael Dunn4, Mikko Heikkilä2, Michael Riessler5, Harri Tolvanen1, Sanni Översti3
Doctoral students: Mervi de Heer4, Timo Rantanen1, Kaj Syrjänen2
Assistants: Hilkka Ahola1, Jaakko Helke3, Timo Rantakaulio3, Ilpo Tammi1
Geography, Linguistics, Mathematics, Biology
Themes of the presentation
1. What is “phylogenetic linguistics”?
2. Lexical data
3. The Uralic family analysed with phylogenetic methods & FAQ
4. Added value of phylogenetic linguistics
Phylogenetic linguistics
Comparative method (CM) vs. statistical approaches: hypothesis testing
Exploration of linguistic data with the help of phylogenetic methodology
Based on the idea that linguistic change can be regarded as a type of generalized evolution
• Generalized evolution ≠ biological evolution
A large selection of computational methods based on varying principles
• Often adopted from phylogenetics
• Developed to find signals/patterns in large data sets
Phylogenetic linguistics
Phylogenetic linguistics is not equal to:
● Lexicostatistics ≈ clustering languages based on distances calculated from meaning lists
● Glottochronology ≈ use of lexicostatistical distances to infer chronological dates
● Mass comparison ≈ subjective inspection of large data sets
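To make the contrast concrete, here is a minimal sketch of the two quantities the older methods rely on: a lexicostatistical distance (the share of meanings whose cognate class differs) and a glottochronological date from the classic Swadesh-Lees formula t = ln(c) / (2 ln r). The function names, the toy cognate labels and the default retention rate r = 0.86 per millennium (a value often cited for the 100-item list) are illustrative assumptions. The point is that these methods collapse the data into one number per language pair, which phylogenetic methods avoid.

```python
import math
from typing import Dict


def lexicostatistical_distance(list_a: Dict[str, str], list_b: Dict[str, str]) -> float:
    """Share of shared meanings whose cognate-class labels differ (0.0 = identical)."""
    shared = [meaning for meaning in list_a if meaning in list_b]
    if not shared:
        raise ValueError("the two meaning lists have no meanings in common")
    differing = sum(1 for meaning in shared if list_a[meaning] != list_b[meaning])
    return differing / len(shared)


def glottochronological_date(cognate_share: float, retention_rate: float = 0.86) -> float:
    """Classic Swadesh-Lees estimate t = ln(c) / (2 ln r), in millennia.

    The constant retention rate is the widely criticized core assumption
    of glottochronology; it is hard-coded here only for illustration.
    """
    if not 0.0 < cognate_share <= 1.0:
        raise ValueError("cognate share must lie in (0, 1]")
    return math.log(cognate_share) / (2.0 * math.log(retention_rate))


# Hypothetical cognate-class labels for three meanings in two languages.
lang_a = {"fish": "A", "water": "A", "ear": "A"}
lang_b = {"fish": "A", "water": "A", "ear": "B"}

d = lexicostatistical_distance(lang_a, lang_b)
print(f"distance = {d:.3f}, date = {glottochronological_date(1.0 - d):.2f} millennia")
```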
Data in phylogenetic linguistics
Lexical data, e.g.
● Often basic vocabulary & cognate coding (root–meaning forms)
● The most widely available data type
● Data hygiene varies
Typological (structural) data
● Syntactic, morphological, phonological
● Collection ongoing (e.g. world languages, Uralic languages)
● Hygiene: subjective decisions about traits?
● Limited design space of typological traits!
Lexical data in phylogenetic linguistics
Different basic vocabulary lists are available:
● Swadesh 100 and 200 lists
● Leipzig-Jakarta list, 100 meanings (WOLD 1-100)
● BEDLAN: less stable vocabulary (WOLD 401-500)
WOLD = World Loanword Database (Haspelmath and Tadmor 2009). Basic vocabulary in WOLD:
● widespread concepts
● resistant to borrowing
● unlikely to be replaced
● morphologically simple
213 linguistic traits = map sheets
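| No. | Meaning | Swadesh 200 | Swadesh 100 | WOLD 1-100 |
|---|---|---|---|---|
| 1 | all | X | X | – |
| 2 | and | X | – | – |
| 3 | animal | X | – | – |
| 4 | ashes | X | X | X |
| 5 | at | X | – | – |
| 6 | back | X | – | X |
| 7 | bad | X | – | – |
| 8 | bark | X | X | – |
| 9 | because | X | – | – |
| 10 | belly | X | X | – |
| 11 | big | X | X | X |
| 12 | bird | X | X | X |
| 13 | bite | X | X | X |
| 14 | black | X | X | X |
| 15 | blood | X | X | X |
| 16 | blow | X | – | X |
| 17 | bone | X | X | X |
| 18 | breast | X | X | X |
| 19 | breathe | X | – | – |
| 20 | burn | X | X | X |
| 21 | child | X | – | – |
| 22 | claw | X | X | – |
| 23 | cloud | X | X | – |
| 24 | cold | X | X | – |
| 25 | come | X | X | X |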
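How much the lists overlap can be checked with simple set operations. The sketch below transcribes only the 25 rows shown above, so its counts describe this excerpt, not the full lists; the variable and function names are illustrative.

```python
# Membership flags transcribed from the 25 table rows above:
# (meaning, Swadesh 200, Swadesh 100, WOLD 1-100); 1 = included.
ROWS = [
    ("all", 1, 1, 0), ("and", 1, 0, 0), ("animal", 1, 0, 0), ("ashes", 1, 1, 1),
    ("at", 1, 0, 0), ("back", 1, 0, 1), ("bad", 1, 0, 0), ("bark", 1, 1, 0),
    ("because", 1, 0, 0), ("belly", 1, 1, 0), ("big", 1, 1, 1), ("bird", 1, 1, 1),
    ("bite", 1, 1, 1), ("black", 1, 1, 1), ("blood", 1, 1, 1), ("blow", 1, 0, 1),
    ("bone", 1, 1, 1), ("breast", 1, 1, 1), ("breathe", 1, 0, 0), ("burn", 1, 1, 1),
    ("child", 1, 0, 0), ("claw", 1, 1, 0), ("cloud", 1, 1, 0), ("cold", 1, 1, 0),
    ("come", 1, 1, 1),
]


def members(column: int) -> set:
    """Meanings flagged in one list column (0 = Swadesh 200, 1 = Swadesh 100, 2 = WOLD)."""
    return {meaning for meaning, *flags in ROWS if flags[column]}


sw200, sw100, wold = members(0), members(1), members(2)
in_all_three = sw200 & sw100 & wold
# Jaccard index: shared meanings divided by all meanings in either list.
jaccard_sw100_wold = len(sw100 & wold) / len(sw100 | wold)

print(sorted(in_all_three))
print(f"Jaccard(Swadesh 100, WOLD 1-100) on this excerpt: {jaccard_sw100_wold:.2f}")
```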
Uralic data
Data collection initiated by the BEDLAN group in 2009
26 Uralic languages, 313 meanings (available on request)
Collection described in detail in:
• Syrjänen et al. 2013: Shedding more light on language classification using basic vocabularies and phylogenetic methods. Diachronica 30(3).
• Lehtinen et al. 2014: Behind Family Trees: Secondary Connections in Uralic Language Networks. Language Dynamics and Change 4.
Uralic languages collected
Not yet included: Ludic, Võru, Lule/Akkala/Ter Saami, Moksha, Enets, Nenets, Kamas
Map source: Geographic database of Uralic languages
• Timo Rantanen + BEDLAN
• Jussi Ylikoski
• Language experts: authors of the Oxford Handbook of Uralic Languages
Collection of Uralic data
Data hygiene
• Uniform criteria for the selection of words and cognates (work done by a single person)
• Multiple equivalents for a meaning when relevant
• No cognate hunting!
• 17 of 26 languages checked by an expert and/or a native speaker → refining in progress
Words collected from bilingual dictionaries
Etymological relationships and cognate coding based on the literature
• E.g. Itkonen & Kulonen 1992-2000, Rédei 1988-1991, Sammallahti 1988, Janhunen 1997 (Álgu database)
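Coding cognate relationships

| Language | FISH: SaaS "guelie" etc. | FISH: KomiZ "ćeri" etc. | WATER: Fin "vesi" etc. | WATER: SaaS "tjaetsie" etc. | WATER: KhaV "jəŋk" | EAR: SaaS "bieljie" etc. | EAR: Fin "korva" etc. | EAR: NenT "xa" etc. |
|---|---|---|---|---|---|---|---|---|
| Proto-Uralic (outgroup) | *kala | – | *weti | – | – | [not reconstructable] | [not reconstructable] | [not reconstructable] |
| South Saami | guelie | – | – | tjaetsie | – | bieljie | – | – |
| North Saami | guolli | – | – | čáhci | – | beallji | – | – |
| Inari Saami | kyeli | – | – | čääci | – | pelji | – | – |
| Kildin Saami | kūll’ | – | – | čāʒ’ | – | piellj | – | – |
| Standard Finnish | kala | – | vesi | – | – | – | korva | – |
| Ingrian | kala | – | vezi | – | – | – | korva | – |
| Western Votic | kaлa | – | vesi | – | – | – | ke̮rv | – |
| Standard Estonian | kala | – | vesi | – | – | – | kõrv | – |
| Võro South Estonian | kala | – | vesi | – | – | – | kõrv | – |
| Courland Livonian | kalà | – | veiʾž ~ veʾžʹ | – | – | – | kùora | – |
| Erzya | kal | – | vedʹ | – | – | pilʹe | – | – |
| Meadow Mari | kol | – | βüt | – | – | pə̑lə̑š | – | – |
| Komi-Zyrian | – | ćeri | va | – | – | pelʹ | – | – |
| Udmurt | – | ćori̮g | vu | – | – | pelʹ | – | – |
| Hungarian | hal | – | víz | – | – | fül | – | – |
| Vakh-Vasyugan Khanty | kul | – | – | – | jəŋk | pəl | – | – |
| Tundra Nenets | xalya | – | yiq | – | – | – | – | xa |

…or, better, as root–meaning forms (Chang et al. 2015) in a presence–absence binary matrix:

| Language | FISH guelie | FISH ćeri | WATER vesi | WATER tjaetsie | WATER jəŋk | EAR bieljie | EAR korva | EAR xa |
|---|---|---|---|---|---|---|---|---|
| Proto-Uralic (outgroup) | 1 | 0 | 1 | 0 | 0 | ? | ? | ? |
| South Saami | 1 | 0 | 0 | 1 | 0 | 1 | 0 | 0 |
| North Saami | 1 | 0 | 0 | 1 | 0 | 1 | 0 | 0 |
| Inari Saami | 1 | 0 | 0 | 1 | 0 | 1 | 0 | 0 |
| Kildin Saami | 1 | 0 | 0 | 1 | 0 | 1 | 0 | 0 |
| Standard Finnish | 1 | 0 | 1 | 0 | 0 | 0 | 1 | 0 |
| Ingrian | 1 | 0 | 1 | 0 | 0 | 0 | 1 | 0 |
| Western Votic | 1 | 0 | 1 | 0 | 0 | 0 | 1 | 0 |
| Standard Estonian | 1 | 0 | 1 | 0 | 0 | 0 | 1 | 0 |
| Võro South Estonian | 1 | 0 | 1 | 0 | 0 | 0 | 1 | 0 |
| Courland Livonian | 1 | 0 | 1 | 0 | 0 | 0 | 1 | 0 |
| Erzya | 1 | 0 | 1 | 0 | 0 | 1 | 0 | 0 |
| Meadow Mari | 1 | 0 | 1 | 0 | 0 | 1 | 0 | 0 |
| Komi-Zyrian | 0 | 1 | 1 | 0 | 0 | 1 | 0 | 0 |
| Udmurt | 0 | 1 | 1 | 0 | 0 | 1 | 0 | 0 |
| Hungarian | 1 | 0 | 1 | 0 | 0 | 1 | 0 | 0 |
| Vakh-Vasyugan Khanty | 1 | 0 | 0 | 0 | 1 | 1 | 0 | 0 |
| Tundra Nenets | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 1 |

(Figures by J. Lehtinen)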
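The coding step can be sketched as a small function that turns cognate-class assignments into presence/absence rows. This is a minimal sketch assuming a hypothetical layout in which each language maps every meaning to a set of cognate-class labels (None when missing or not reconstructable); because the value is a set, "multiple equivalents for a meaning" simply produce more than one 1 within that meaning's columns. The six languages below reproduce rows of the matrix above; all names are illustrative.

```python
from typing import Dict, List, Optional, Set, Tuple

# One column per root-meaning form (cognate class), labelled as in the slide.
COLUMNS: List[Tuple[str, str]] = [
    ("FISH", "guelie"), ("FISH", "ćeri"),
    ("WATER", "vesi"), ("WATER", "tjaetsie"), ("WATER", "jəŋk"),
    ("EAR", "bieljie"), ("EAR", "korva"), ("EAR", "xa"),
]

# language -> meaning -> cognate classes attested for that meaning;
# None marks a meaning that is missing or not reconstructable.
Entry = Dict[str, Optional[Set[str]]]
DATA: Dict[str, Entry] = {
    "Proto-Uralic": {"FISH": {"guelie"}, "WATER": {"vesi"}, "EAR": None},
    "South Saami": {"FISH": {"guelie"}, "WATER": {"tjaetsie"}, "EAR": {"bieljie"}},
    "Standard Finnish": {"FISH": {"guelie"}, "WATER": {"vesi"}, "EAR": {"korva"}},
    "Komi-Zyrian": {"FISH": {"ćeri"}, "WATER": {"vesi"}, "EAR": {"bieljie"}},
    "Vakh-Vasyugan Khanty": {"FISH": {"guelie"}, "WATER": {"jəŋk"}, "EAR": {"bieljie"}},
    "Tundra Nenets": {"FISH": {"guelie"}, "WATER": {"vesi"}, "EAR": {"xa"}},
}


def binary_row(entry: Entry) -> str:
    """Presence (1) / absence (0) of each root-meaning form; '?' if unknown."""
    cells = []
    for meaning, cognate_class in COLUMNS:
        classes = entry.get(meaning)
        if classes is None:
            cells.append("?")
        else:
            cells.append("1" if cognate_class in classes else "0")
    return " ".join(cells)


for language, entry in DATA.items():
    print(f"{language:22} {binary_row(entry)}")
```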
Methods in phylogenetic linguistics
A large selection of computational methods based on varying principles
● Character-based and distance-based methods
Character-based methods
● Parsimony (smallest number of evolutionary changes)
● Maximum likelihood (find the best tree and model parameters)
● Bayesian statistics (produce a distribution of likely trees and model parameters)
BEDLAN work so far has mostly used Bayesian statistics
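The parsimony criterion can be illustrated with Fitch's small-parsimony algorithm, which counts the minimum number of 0↔1 changes a fixed tree needs to explain each binary character. This is a minimal sketch, not the BEDLAN workflow: the nested-tuple tree encoding, the function names and the toy taxa A-D are assumptions, and a real parsimony analysis would additionally search over many tree topologies for the lowest score.

```python
from typing import Dict, List, Set, Tuple, Union

# A rooted binary tree: a leaf name, or a pair of subtrees.
Tree = Union[str, Tuple["Tree", "Tree"]]


def fitch(tree: Tree, states: Dict[str, str]) -> Tuple[Set[int], int]:
    """Fitch's small-parsimony pass for one binary character.

    Returns the candidate state set at the node and the number of
    changes counted in the subtree. '?' is treated as 'either state'.
    """
    if isinstance(tree, str):
        s = states[tree]
        return ({0, 1} if s == "?" else {int(s)}), 0
    left, right = tree
    left_set, left_cost = fitch(left, states)
    right_set, right_cost = fitch(right, states)
    common = left_set & right_set
    if common:
        return common, left_cost + right_cost
    # Disjoint child sets: one change is needed on one of the two branches.
    return left_set | right_set, left_cost + right_cost + 1


def parsimony_score(tree: Tree, matrix: Dict[str, List[str]]) -> int:
    """Total number of changes over all characters (columns) of the matrix."""
    n_chars = len(next(iter(matrix.values())))
    return sum(
        fitch(tree, {lang: row[i] for lang, row in matrix.items()})[1]
        for i in range(n_chars)
    )


# Toy matrix with hypothetical taxa A-D and two binary characters.
toy = {"A": ["1", "1"], "B": ["1", "0"], "C": ["0", "1"], "D": ["0", "0"]}
print(parsimony_score((("A", "B"), ("C", "D")), toy))  # prints 3
```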
Cognates = characters
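Treating cognates as characters means each matrix column gets a probability under a model of change along the tree. As a minimal sketch, the code below applies Felsenstein's pruning algorithm under the simplest symmetric two-state model (equal 0→1 and 1→0 rates, equal root frequencies). That model choice is an assumption for illustration: real analyses of cognate data typically use richer models (e.g. unequal gain and loss rates, rate variation, ascertainment correction). Maximum likelihood then searches for the tree and branch lengths that maximize this value, while Bayesian inference samples trees in proportion to likelihood times prior. The tree encoding and names are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple, Union

# A rooted tree: a leaf name, or a sequence of (subtree, branch_length) pairs.
Tree = Union[str, Sequence[Tuple["Tree", float]]]


def transition_prob(same_state: bool, t: float, rate: float = 1.0) -> float:
    """Symmetric two-state model: probability of ending in the same/other state after t."""
    decay = math.exp(-2.0 * rate * t)
    return 0.5 + 0.5 * decay if same_state else 0.5 - 0.5 * decay


def partials(node: Tree, states: Dict[str, str]) -> List[float]:
    """Felsenstein pruning: P(data below node | node state) for states 0 and 1."""
    if isinstance(node, str):
        s = states[node]
        if s == "?":  # missing data is compatible with both states
            return [1.0, 1.0]
        return [1.0 if int(s) == k else 0.0 for k in (0, 1)]
    result = [1.0, 1.0]
    for child, t in node:
        child_partials = partials(child, states)
        for parent in (0, 1):
            result[parent] *= sum(
                transition_prob(parent == x, t) * child_partials[x] for x in (0, 1)
            )
    return result


def log_likelihood(tree: Tree, matrix: Dict[str, List[str]]) -> float:
    """Sum of per-character log-likelihoods with root frequencies (1/2, 1/2)."""
    n_chars = len(next(iter(matrix.values())))
    total = 0.0
    for i in range(n_chars):
        root = partials(tree, {lang: row[i] for lang, row in matrix.items()})
        total += math.log(0.5 * root[0] + 0.5 * root[1])
    return total


# Hypothetical three-leaf tree with made-up branch lengths.
ab = (("A", 0.1), ("B", 0.1))
tree = ((ab, 0.2), ("C", 0.3))
print(log_likelihood(tree, {"A": ["1", "0"], "B": ["1", "0"], "C": ["0", "?"]}))
```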
Language data in numbers?
Basic vocabulary + MrBayes (Syrjänen et al. 2013) compared with traditional trees (Kulonen 2002; Korhonen 1981)
The approach works for the Uralic languages, as the agreement with traditional results shows.
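Agreement between an inferred tree and a traditional classification can also be quantified. One common measure, not necessarily one used by the authors, is the Robinson-Foulds distance: the number of clades (clusters) found in only one of the two trees. This minimal sketch uses the rooted, cluster-based variant and allows multifurcating nodes, since traditional family trees often leave splits unresolved. The tree encoding, the function names and the taxa A-E are hypothetical.

```python
from typing import FrozenSet, Set, Tuple, Union

# A rooted tree: a leaf name, or a tuple of two or more subtrees
# (multifurcations allowed, as in many traditional family trees).
Tree = Union[str, Tuple["Tree", ...]]


def _collect(tree: Tree, found: Set[FrozenSet[str]]) -> FrozenSet[str]:
    """Add every internal node's leaf set to `found`; return this subtree's leaves."""
    if isinstance(tree, str):
        return frozenset([tree])
    leaves = frozenset().union(*(_collect(child, found) for child in tree))
    found.add(leaves)
    return leaves


def clusters(tree: Tree) -> Set[FrozenSet[str]]:
    """Non-trivial clusters (clades) of a rooted tree, excluding the root."""
    found: Set[FrozenSet[str]] = set()
    root = _collect(tree, found)
    found.discard(root)  # every tree on these leaves shares the root cluster
    return found


def rf_distance(a: Tree, b: Tree) -> int:
    """Rooted Robinson-Foulds distance: clusters present in only one tree."""
    return len(clusters(a) ^ clusters(b))


traditional = (("A", "B"), ("C", "D"), "E")  # hypothetical, partly unresolved
inferred = ((("A", "B"), ("C", "D")), "E")   # hypothetical, fully resolved
print(rf_distance(traditional, inferred))  # prints 1
```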
Usage of lexical data in general
Potential
• Data readily available
• Builds upon observations from an earlier research tradition
• Data sets are usually large enough
Two important aspects related to evaluating (un)certainty
● Possibility to compare the likelihoods of alternative scenarios of linguistic history
● Inferring the outcome: likelihood of the tree/network and of its parts
Challenges
• Linguistic: diverse material (e.g. descriptive vs. normative sources)
• Computational: better methods are needed to model different types of linguistic change
• Cultural: an unorthodox approach in historical-comparative linguistics
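Comparing alternative scenarios is often done with Bayes factors: the ratio of the marginal likelihoods of two models, usually reported as 2 ln BF. This minimal sketch assumes the log marginal likelihoods have already been estimated (e.g. by stepping-stone sampling) and maps the result onto the Kass & Raftery (1995) evidence categories. The function name and the example values are hypothetical.

```python
from typing import Tuple

# Kass & Raftery (1995) evidence categories on the 2 ln BF scale (upper bounds).
_CATEGORIES = [
    (2.0, "not worth more than a bare mention"),
    (6.0, "positive"),
    (10.0, "strong"),
]


def compare_models(log_ml_1: float, log_ml_2: float) -> Tuple[float, str, str]:
    """Return (2 ln BF, favoured model, strength) from two log marginal likelihoods."""
    two_ln_bf = 2.0 * (log_ml_1 - log_ml_2)
    favoured = "M1" if two_ln_bf >= 0 else "M2"
    magnitude = abs(two_ln_bf)
    for upper, label in _CATEGORIES:
        if magnitude < upper:
            return two_ln_bf, favoured, label
    return two_ln_bf, favoured, "very strong"


# Hypothetical log marginal likelihoods for two scenarios of linguistic history.
print(compare_models(-10250.4, -10254.4))
```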
Measuring uncertainty & usage of priors
• Tree figures are simplifications of all the alternative trees produced by the algorithm
• (Un)certainty (variation between the trees) is condensed into posterior probability values
Example: timed tree of the Uralic family based on all 313 meanings, inferred with the Bayesian BEAST algorithm under a restricted clock with 2 calibration points as priors (Saami 1300 YBP, SD 30 years; Samoyed languages 2030 YBP, SD 60 years). Manuscript.
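Condensing the variation between sampled trees into posterior probabilities can be sketched as counting how often each clade occurs in the posterior sample. This is roughly what summary tools such as MrBayes's sumt command or BEAST's TreeAnnotator report, but this minimal sketch ignores branch lengths and assumes burn-in has already been discarded. The tree encoding, function names and the four-tree sample are hypothetical; the root clade trivially gets support 1.0.

```python
from collections import Counter
from typing import Dict, FrozenSet, List, Set, Tuple, Union

Tree = Union[str, Tuple["Tree", ...]]


def _clades(tree: Tree, found: Set[FrozenSet[str]]) -> FrozenSet[str]:
    """Record the leaf set of every internal node; return this subtree's leaves."""
    if isinstance(tree, str):
        return frozenset([tree])
    leaves = frozenset().union(*(_clades(child, found) for child in tree))
    found.add(leaves)
    return leaves


def clade_support(trees: List[Tree]) -> Dict[FrozenSet[str], float]:
    """Fraction of sampled trees containing each clade (its posterior probability)."""
    counts: Counter = Counter()
    for tree in trees:
        found: Set[FrozenSet[str]] = set()
        _clades(tree, found)
        counts.update(found)
    return {clade: n / len(trees) for clade, n in counts.items()}


def majority_rule(trees: List[Tree]) -> Set[FrozenSet[str]]:
    """Clades found in more than half of the samples (majority-rule consensus)."""
    return {clade for clade, p in clade_support(trees).items() if p > 0.5}


# Hypothetical posterior sample of four trees.
sample = [(("A", "B"), "C"), (("A", "B"), "C"), (("A", "C"), "B"), (("A", "B"), "C")]
for clade, p in sorted(clade_support(sample).items(), key=lambda kv: -kv[1]):
    print(sorted(clade), p)
```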
Language lineaging as tree models
• Networks illustrate secondary contacts, while trees show inheritance.
Example: distance-based NeighbourNet and character-based MrBayes tree for data with a low number of known borrowings (149 meanings), in Lehtinen et al. 2014
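Distance-based methods such as NeighbourNet start from a matrix of pairwise distances between languages rather than from the characters themselves. This minimal sketch computes one simple choice, the uncorrected p-distance over presence/absence cells, skipping cells that are missing ('?') in either language. The distance choice and the names are assumptions; the network construction itself (implemented, for example, in SplitsTree) is not reproduced here. The rows are copied from the binary cognate matrix above.

```python
from itertools import combinations
from typing import Dict, List, Optional

# Rows of the binary cognate matrix above ('?' = missing).
MATRIX: Dict[str, List[str]] = {
    "Proto-Uralic": "1 0 1 0 0 ? ? ?".split(),
    "South Saami": "1 0 0 1 0 1 0 0".split(),
    "Standard Finnish": "1 0 1 0 0 0 1 0".split(),
    "Standard Estonian": "1 0 1 0 0 0 1 0".split(),
    "Komi-Zyrian": "0 1 1 0 0 1 0 0".split(),
}


def p_distance(a: List[str], b: List[str]) -> Optional[float]:
    """Proportion of differing cells, ignoring cells missing in either row."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "?" and y != "?"]
    if not pairs:
        return None  # no comparable cells
    return sum(x != y for x, y in pairs) / len(pairs)


def distance_matrix(matrix: Dict[str, List[str]]) -> Dict[tuple, Optional[float]]:
    """Pairwise distances for every unordered pair of languages."""
    return {(p, q): p_distance(matrix[p], matrix[q]) for p, q in combinations(matrix, 2)}


for (p, q), d in distance_matrix(MATRIX).items():
    print(f"{p:18} {q:18} {'n/a' if d is None else f'{d:.3f}'}")
```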
Loan words – a problem?
MrBayes analyses of 1. more stable (WOLD 1-100) and 2. less stable (WOLD 401-500) basic vocabulary (Lehtinen et al. 2014)
• Removing known loans still leaves old, unattested loans in the data.
• Horizontal transfer IS an essential part of language lineaging.
• Current models don't differentiate between inherited cognates and lost & replaced root forms.
• How big a problem is this?
(Figure panels: 100 most stable in WOLD list; less stable)
Added value of phylogenetic linguistics?
Technical issues:
• Objective handling of large data sets
• (Un)certainty of the inference (posterior probabilities, model comparison)
• Flexible analyses (the model can be adjusted as needed)
Phylogenetic modelling allows:
● Making trees without prior assumptions (only the data talk)
● Using earlier knowledge as “priors” in evolutionary modelling (earlier knowledge and data talk)
● Running the model without data: do only the priors talk?
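Running the model without data means sampling from the prior alone, which shows what the priors by themselves imply. As a minimal sketch, the code below takes the two calibrations from the timed-tree example (Saami 1300 YBP, SD 30; Samoyed 2030 YBP, SD 60) and assumes they are independent normal distributions on node ages; it evaluates their joint log density and draws prior-only samples. This is a deliberate simplification: in a full BEAST run the tree prior interacts with the calibrations, so the effective prior can differ from the specified one, which is exactly what a no-data run is meant to reveal. All names are hypothetical.

```python
import math
import random
from statistics import mean
from typing import Dict, List

# Calibration priors: (mean, sd) in years before present.
# Assumption: modelled as independent normal distributions on node ages.
CALIBRATIONS: Dict[str, tuple] = {"Saami": (1300.0, 30.0), "Samoyed": (2030.0, 60.0)}


def normal_log_density(x: float, mu: float, sd: float) -> float:
    """Log of the normal probability density at x."""
    return -0.5 * math.log(2.0 * math.pi * sd * sd) - (x - mu) ** 2 / (2.0 * sd * sd)


def log_prior(ages: Dict[str, float]) -> float:
    """Joint log prior of the calibrated node ages."""
    return sum(normal_log_density(ages[node], mu, sd) for node, (mu, sd) in CALIBRATIONS.items())


def sample_without_data(n: int, seed: int = 1) -> List[Dict[str, float]]:
    """With no data the posterior equals the prior, so we simply draw from it."""
    rng = random.Random(seed)
    return [{node: rng.gauss(mu, sd) for node, (mu, sd) in CALIBRATIONS.items()} for _ in range(n)]


draws = sample_without_data(10_000)
for node in CALIBRATIONS:
    print(node, round(mean(d[node] for d in draws)))
```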
Added value of phylogenetic linguistics
Contextual issues:
• The aim is not to create a new linguistic paradigm, but to add to existing paradigms
• Possibility to test the probability of alternative hypotheses
• From basic research to “applied historical linguistics”
• Data and results readily usable for other disciplines
• Language history as a well-studied approximation of human history
• → A stronger role in studies of holistic human prehistory?
Acknowledgements