15 lines representing a bull traditional statistics assumes data is independent comparative methods

Andrew Meade, University of Reading

Post on 28-Mar-2015

15 lines representing a bull

Traditional statistics
Assumes data is independent

Comparative methods

Fish
English: Fish
Danish: Fisk
Dutch: Visch
23 other languages

Ryba
Czech: Ryba
Russian: Ryba
Bulgarian: Riba
34 other languages

1 to 35, average 17
1: “Who”, “Three”
35: “Person”, “Dirty”

Languages  Meanings
English    here      sea (A)          water   when
German     hier      see, meer (A,B)  wasser  wenn
French     ici       mer (B)          eau     quand
Italian    qui, qua  mare (B)         acqua   quando
Greek      edo       thalasa (C)      nero    pote
Hittite    ka        aruna- (D)       watar   kuwapi

         sea (A)  meer (B)  thalasa (C)  aruna- (D)
English  1        0         0            0
German   1        1         0            0
French   0        1         0            0
Italian  0        1         0            0
Greek    0        0         1            0
Hittite  0        0         0            1
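
The coding step from word lists to the 0/1 matrix can be sketched in a few lines. This is only an illustration of the idea, not the software used for the talk; the function name `binary_matrix` and the dictionary layout are assumptions made for the example. The class letters are the ones shown for "sea".

```python
# Cognate class assignments for the meaning "sea", copied from the slide.
# A language may carry more than one class (German has both "see" and "meer").
cognate_classes = {
    "English": {"A"},
    "German": {"A", "B"},
    "French": {"B"},
    "Italian": {"B"},
    "Greek": {"C"},
    "Hittite": {"D"},
}


def binary_matrix(assignments):
    """Return (sorted class labels, {language: [0/1 per class]}).

    Hypothetical helper: one column per cognate class, 1 if the
    language has a word in that class, 0 otherwise.
    """
    classes = sorted(set().union(*assignments.values()))
    rows = {
        language: [1 if c in present else 0 for c in classes]
        for language, present in assignments.items()
    }
    return classes, rows


classes, rows = binary_matrix(cognate_classes)
print("        ", *classes)
for language, row in rows.items():
    print(f"{language:<8}", *row)
```

Each meaning in the word list would contribute its own block of columns in the same way.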

States: 0 = non-cognate, 1 = cognate
Rates: Q01, Q10
Time: 1000 years
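
For a two-state model like this, the chance of finding a site in state 0 or 1 after some elapsed time has a simple closed form. The sketch below assumes Q01 is the rate of change from 0 to 1 and Q10 the rate from 1 to 0 (the usual reading of the notation, but not spelled out on the slide), and uses made-up rate values purely for illustration.

```python
import math


def transition_probabilities(q01, q10, t):
    """Two-state continuous-time Markov chain.

    Returns [[P00, P01], [P10, P11]] after elapsed time t, where
    q01 is the assumed 0 -> 1 rate and q10 the assumed 1 -> 0 rate.
    Requires q01 + q10 > 0.
    """
    total = q01 + q10
    decay = math.exp(-total * t)
    p01 = (q01 / total) * (1.0 - decay)
    p10 = (q10 / total) * (1.0 - decay)
    return [[1.0 - p01, p01], [p10, 1.0 - p10]]


# Hypothetical rates per 1000 years, not values from the talk.
P = transition_probabilities(q01=0.1, q10=0.3, t=1.0)
for row in P:
    print([round(p, 4) for p in row])
```

With very long branches the probabilities settle at q01 / (q01 + q10) for state 1, whatever the starting state.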

Results = Data + Method

Most probable
Random tree
-58204 log units
4.1 x 1014107

Infinite number of poor trees

Tree labels: Outgroup, Greek, Indo-Iranian, Slavic, Germanic, Celtic, Romance

“Name”, 3 cognate classes
Class A: Gypsy (Alav), Persian (Esm)
Class B: Latvian (Vards), Lithuanian (Vardas)
Class C: all the rest, Hindi (Nam), Greek (Onoma), Italian (Nome)

Transitions among Class A, Class B and Class C:
B → A, A → B, C → A, A → C, B → C, C → B

B → A, C → B, etc.: the estimated instantaneous transition rates

Too many parameters, not enough data
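
The parameter problem is easy to quantify: if every ordered pair of cognate classes gets its own rate, the count grows with the square of the number of classes. A minimal sketch (the function name is made up for the example):

```python
def n_transition_rates(n_classes):
    """Number of separate rates when every ordered pair of classes
    (X -> Y with X != Y) has its own parameter."""
    return n_classes * (n_classes - 1)


for k in (2, 3, 10, 20):
    print(f"{k} classes -> {n_transition_rates(k)} rates")
```

Three classes already need six rates, matching the six transitions listed above, and each meaning supplies only one observation per language.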

2 cognate classes

Slow rate Fast rate

Class 1

Class 2

“Red”, “Salt”, “Five”

Mean rates for the 200 words

Mean = 3.05 ± 1.82
Median = 2.74
Min. = 0.09
Max. = 9.27

100-fold difference

Slow: ‘two’, ‘who’, ‘one’, ‘night’, ‘to die’

Fast: ‘dirty’, ‘to turn’, ‘to stab’

Word half-life
50% chance of the word being replaced by a non-cognate form

         Years
Mean     5260
Median   2530
Min      750
Max      76530

Based on IE being 8000 years
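
A half-life follows from a replacement rate in the same way as for radioactive decay: half-life = ln(2) / rate. The sketch below assumes the rates are expressed per 10,000 years (the unit given on the rate-variation tree slide) and that replacement is a constant-rate process; under those assumptions the median rate of 2.74 and the fastest rate of 9.27 from the rates slide land close to the median and minimum half-lives in the table.

```python
import math

# Assumption: rates are replacements per 10,000 years.
YEARS_PER_RATE_UNIT = 10_000


def half_life_years(rate):
    """Time for a 50% chance of replacement under a constant rate."""
    return math.log(2) / rate * YEARS_PER_RATE_UNIT


print(round(half_life_years(2.74)))  # median rate
print(round(half_life_years(9.27)))  # fastest rate
```

The mean half-life is not ln(2) divided by the mean rate: it is an average of per-word half-lives, which slow words pull upward.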

I-E tree showing variation in rates of lexical replacement, per 10k years

“One” 0.43 “Ear” 0.88 “Sand” 4.5

Tree labels: Romance, Germanic, Greek, Slavic, Indo-Iranian

Spoken word frequency, British National Corpus

Histogram: count (0 to 350) against log(10) of spoken word frequency per million (1 to 4.5)

N = 4840 words
mean = 194
geometric mean = 35.94
median = 25
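
The gap between the mean and the geometric mean is what a strongly right-skewed distribution produces: a few very common words inflate the arithmetic mean, while the geometric mean (the average taken on the log scale used for the histogram axis) stays near the median. A small sketch with invented frequencies, not the corpus data:

```python
import math
import statistics


def summarise(frequencies):
    """Arithmetic mean, geometric mean and median of positive values."""
    logs = [math.log10(f) for f in frequencies]
    return {
        "mean": statistics.fmean(frequencies),
        "geometric_mean": 10 ** statistics.fmean(logs),
        "median": statistics.median(frequencies),
    }


# Hypothetical uses per million: mostly rare words plus one very common one.
toy_frequencies = [10, 12, 20, 25, 40, 90, 1500]
summary = summarise(toy_frequencies)
for name, value in summary.items():
    print(f"{name}: {value:.1f}")
```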

Distribution of frequency of word use (20-100 million words)

Most words used < 100 times per million

r=0.87 r=0.88 r=0.87

Frequency of use is very stable throughout IE

Frequency vs rate of lexical evolution

r=-0.37 r=-0.35

r=-0.41 r=-0.32
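
A negative r here means that words used more often tend to have lower replacement rates. As a reminder of what the statistic measures, here is a plain Pearson correlation on invented numbers; the slides do not say how the reported values were computed, and the data below are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical values for illustration only, not the talk's data.
log_frequency = [1.0, 1.5, 2.0, 2.5, 3.0, 3.5]
replacement_rate = [5.1, 3.9, 4.4, 2.0, 2.6, 0.8]
r = pearson_r(log_frequency, replacement_rate)
print(f"r = {r:.2f}")
```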

Parts of speech: conjunctions, prepositions, adjectives, verbs, nouns, special adverbs, pronouns, numbers

R2=0.50 R2=0.48 R2=0.48 R2=0.48

Numbers, pronouns, special adverbs

Stronger selection?

Some similarities between linguistic and genetic systems

Attribute: discrete units
    Genetic systems: nucleotides, genes, individuals
    Languages: words and other linguistic elements

Attribute: replication
    Genetic systems: transcription
    Languages: teaching, learning, imitation

Attribute: dominant mode(s) of inheritance
    Genetic systems: parent-offspring
    Languages: parent-offspring, generational (including teaching)

Attribute: horizontal transmission
    Genetic systems: many mechanisms (e.g., hybridisation, viruses, transposons, insects)
    Languages: borrowing

Attribute: mutation
    Genetic systems: many mechanisms (e.g., slippage, unequal crossing over, point mutations and faulty repair)
    Languages: mistakes, vowel shifts, innovation

Attribute: selection of favoured variants
    Genetic systems: fitness differences among alleles
    Languages: societal trends