Visualizing Verbal Culture

John Nerbonne
j.nerbonne@rug

University of Groningen and Freiburg Institute for Advanced Studies

Digital Humanities: A Dialogue between Visual Arts and Sciences
Venice, 14-16 Oct. 2013

Sections: Motivation, Aggregating Signals, Pronunciation, Geographic Visualizations, Generality?, Italian and Tuscan, Conclusions


Overview

Verbal culture
Popular humanities study
  dialectology, sociolinguistics
Massive variation
Counterindicating signals (visualized)
Aggregating signals
Visualizing the influence of geography
  Seguy’s curve (distance)
A view of standard Italian (Tuscany)
Conclusions and prospects


Verbal culture

When we speak, we don’t communicate just content
We likewise provide signals of identity
  Geographic provenance: boot vs. trunk
  Age, sex/gender, social class
  Hip or traditional
We adopt these consciously but also unconsciously
  Social contact
  Locality


One problem: variability

Pronunciations are very variable: 87 different pronunciations of ich in the PAD

[List of the transcribed variants of ich; the phonetic symbols did not survive text extraction]


A second problem

We receive noisy signals of provenance.

[Four maps: front/low V in Haus; [p] (dark) vs. [pf]; [t] vs. [ts]; [k] vs. [x(c)]]

“Non-overlapping isoglosses”, visualized using Tufte’s principle of “small multiples”


Isoglosses seldom overlap

[Four maps: aggregate 2nd shift; [S] (dark) vs. s (non-initially); [z] (dark) vs. [s] (initially); N d/t (dark) vs. deletion]


Isoglosses seldom overlap, more

[Four maps: apical [r] (dark) vs. uvular [ö]; final [n] drop (dark) vs. retention; medial [t] vs. s; init. lenited /g/]


How to deal with fine detail, noisy signals?

Apply digital techniques
  Abstract from fine detail by MEASURING
  Strengthen geographic signals by AGGREGATING over a sample
Solve problems of earlier variationist study
  Non-overlapping distributions
  Selection of features too arbitrary
  “Atomism” (Coseriu), idiosyncratic words (Bloomfield)
Introduce replicable procedures (DH contribution)
Seeking law-like relations in linguistic variation
  Sublinear distributions of linguistic variation vs. geography


Calculating varietal distances

To determine the aggregate distance between varieties:
We determine the distance between each pair of varieties for every single linguistic element (in the sample, e.g. a dialect atlas)
  Perhaps just same (0) vs. different (1)
  ... but we’ve developed more sensitive measures (below)
We sum these distances for every element (hundreds of them)
Immediate result: place × place table of varietal differences

Seguy (1971), Goebl (1980s and on), many others
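
The summing procedure above can be sketched in a few lines. This is only an illustration under stated assumptions: the three sites, the three items and their variants are invented toy data, and the per-item measure is the simple same (0) vs. different (1) comparison rather than the more sensitive measures mentioned on the slide.

```python
from itertools import combinations

# Hypothetical toy "atlas": one recorded variant per site and item.
atlas = {
    "SiteA": {"item1": "boot", "item2": "lift", "item3": "tap"},
    "SiteB": {"item1": "boot", "item2": "lift", "item3": "faucet"},
    "SiteC": {"item1": "trunk", "item2": "elevator", "item3": "faucet"},
}

def item_distance(a, b):
    """Simplest per-item measure: same (0) vs. different (1)."""
    return 0 if a == b else 1

def aggregate_table(atlas):
    """Sum the per-item distances for every pair of sites (place x place table)."""
    sites = sorted(atlas)
    table = {s: {t: 0 for t in sites} for s in sites}
    for s, t in combinations(sites, 2):
        total = sum(item_distance(atlas[s][i], atlas[t][i]) for i in atlas[s])
        table[s][t] = table[t][s] = total
    return table

table = aggregate_table(atlas)
for site, row in table.items():
    print(site, row)
```

Swapping `item_distance` for a string-distance function would give a more sensitive aggregate without changing the rest of the sketch.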


Aside: more sensitive pronunciation distance measure

Levenshtein distance enables analysis of phonetic transcriptions without manual alignment
  move from categorical to numerical analysis of data.
One of the most successful methods to determine sequence distance (Levenshtein, 1964)
  biological molecules, software engineering, ...
Levenshtein distance: minimum number of insertions, deletions and substitutions to transform one string into the other
Syllabicity constraint added: vowels never substitute for consonants


Example of the Levenshtein distance

mO@lk@   delete @     1
mOlk@    subst. O/E   1
mElk@    delete @     1
mElk     insert @     1
mEl@k
         total        4

Alignment:
m  O  @  l     k  @
m  E     l  @  k
   1  1     1     1
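
A minimal sketch of the measure, assuming unit costs for every operation and a hand-picked vowel set (the `VOWELS` set below is an assumption for this ASCII transcription, not something the slides specify). The syllabicity constraint is modelled by giving vowel/consonant substitutions infinite cost, so the algorithm falls back on a deletion plus an insertion.

```python
# Assumed vowel symbols for the ASCII transcription used on the slide.
VOWELS = set("aeiouyAEIOU@")

def levenshtein(source, target, vowels=VOWELS):
    """Edit distance with unit costs; vowels may not substitute for consonants."""
    prev = list(range(len(target) + 1))  # distances from the empty prefix
    for i, s in enumerate(source, 1):
        curr = [i]
        for j, t in enumerate(target, 1):
            if s == t:
                sub = 0
            elif (s in vowels) == (t in vowels):
                sub = 1
            else:
                sub = float("inf")  # syllabicity constraint
            curr.append(min(prev[j] + 1,         # delete s
                            curr[j - 1] + 1,     # insert t
                            prev[j - 1] + sub))  # substitute or match
        prev = curr
    return prev[-1]

print(levenshtein("mO@lk@", "mEl@k"))
```

For the two pronunciations on the slide this yields the same total of 4.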


Example

Based on Dutch pronunciation data from the Goeman-Taeldeman-Van Reenen-Project (GTRP; Goeman and Taeldeman, 1996)
We use 562 words for 424 varieties in the Netherlands
  Wieling, Heeringa & Nerbonne (2007) An Aggregate Analysis of Pronunciation in the Goeman-Taeldeman-van Reenen-Project Data. In: Taal en Tongval 59(1), 84-116
Calculating Levenshtein distances yields interesting sound correspondences contained in the alignments (more on that later)
Note that a 100-word comparison already yields about 500 sound correspondences


Distribution of sites


Analytical steps

Obtain the distances between each of the ≈ 90,000 pairs of varieties
  n.b. this involves 500 × 5² segment comparisons per pair
  ≈ 1.1 × 10⁹ segment comparisons in total
  Clear need for digital techniques!
Organize these in a 400 × 400 table
Seek groups (dialect areas, social classes) or continuum-like relations, e.g. by applying further (statistical) DH techniques.

Note that no attention has been paid to geography thus far!
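
The order of magnitude can be checked with a few lines of arithmetic. The figures below are a reading of the slide, not stated by it in this form: 424 varieties (from the GTRP example), a round 500 words, and about 5 segments per word, so that each word pair costs roughly 5 × 5 segment comparisons.

```python
from math import comb

varieties = 424         # number of varieties in the GTRP example
words = 500             # round figure, assumed from the slide
segments_per_word = 5   # assumed average word length in segments

pairs = comb(varieties, 2)                   # unordered pairs of varieties
per_pair = words * segments_per_word ** 2    # segment comparisons per pair
total = pairs * per_pair

print(pairs)     # number of variety pairs
print(per_pair)  # comparisons for one pair
print(total)     # comparisons overall
```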


Multi-Dimensional Scaling interprets distance table

[MDS plot of the varieties, with labeled groups: Frisian; Frisian cities, Het Bildt; Westerkwartier; Stellingwerf; Low Saxon; Central Gelderland; Dutch Low Franconian; Flemish Low Franconian]


Visualization indispensable!

[Same MDS plot as on the previous slide, with the same group labels]

Result too complex to be appreciated directly.


MDS dimensions → colors, projected to map
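
One plausible way to turn MDS dimensions into colors is to rescale each of three dimensions to the 0-255 range and read them as red, green and blue. The sketch below assumes exactly that; the site names and coordinates are invented, and the talk does not say which scaling was actually used.

```python
def mds_to_rgb(coords):
    """Map {site: (d1, d2, d3)} to {site: (r, g, b)} by min-max scaling each dimension."""
    dims = list(zip(*coords.values()))                 # one tuple per dimension
    lows = [min(d) for d in dims]
    spans = [(max(d) - min(d)) or 1.0 for d in dims]   # avoid division by zero
    return {
        site: tuple(round(255 * (v - lo) / span)
                    for v, lo, span in zip(vals, lows, spans))
        for site, vals in coords.items()
    }

# Hypothetical three-dimensional MDS coordinates for three sites.
coords = {"A": (0.0, 1.0, -1.0), "B": (1.0, 0.0, 1.0), "C": (0.5, 0.5, 0.0)}
colors = mds_to_rgb(coords)
print(colors)
```

Sites that are close in the MDS space receive similar colors, which is what makes continua and borders visible once the colors are drawn on a map.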


Interpolated, interpreted maps


Noisy Clustering

[Dendrogram of the German sites, with a numeric label between 52 and 100 on each branch]

Seeks groups in data, enabling comparison to (older) work isolating areas
  Note humanistic tradition of comparison to older work
Bootstrap (or noisy) clustering to avoid instability
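
The idea behind noisy clustering can be sketched as follows: cluster the distance table many times, each time with a little random noise added, and report how often each group recurs. Everything specific here is an assumption made for illustration: average linkage, uniform noise of up to 20% of the largest distance, 100 runs, and the four-site toy table.

```python
import random
from collections import Counter
from itertools import combinations

def average_link(dist):
    """Return every group formed while merging sites bottom-up (average linkage)."""
    clusters = [frozenset([site]) for site in dist]
    groups = []
    while len(clusters) > 1:
        # Merge the two clusters with the smallest mean pairwise distance.
        a, b = min(
            combinations(clusters, 2),
            key=lambda pair: sum(dist[x][y] for x in pair[0] for y in pair[1])
            / (len(pair[0]) * len(pair[1])),
        )
        clusters = [c for c in clusters if c != a and c != b] + [a | b]
        groups.append(a | b)
    return groups

def noisy_clustering(dist, runs=100, noise=0.2, seed=0):
    """Percentage of noisy runs in which each group of sites was found."""
    rng = random.Random(seed)
    sites = sorted(dist)
    ceiling = noise * max(dist[a][b] for a, b in combinations(sites, 2))
    counts = Counter()
    for _ in range(runs):
        noisy = {s: {} for s in sites}
        for a, b in combinations(sites, 2):
            noisy[a][b] = noisy[b][a] = dist[a][b] + rng.uniform(0, ceiling)
        counts.update(average_link(noisy))
    return {group: 100 * n // runs for group, n in counts.items()}

# Hypothetical distances: A and B are close, C and D are close.
toy = {
    "A": {"B": 1, "C": 10, "D": 10},
    "B": {"A": 1, "C": 10, "D": 10},
    "C": {"A": 10, "B": 10, "D": 1},
    "D": {"A": 10, "B": 10, "C": 1},
}
support = noisy_clustering(toy)
for group, pct in support.items():
    print(sorted(group), pct)
```

Groups that survive in nearly every run are stable; groups with low support depend on small accidents in the data and deserve less trust.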


Projecting groups to geography

[Map of the Dutch-speaking area with the groups projected onto it; labeled places include Groningen, Leeuwarden, Haarlem, Delft, Brugge, Gent, Mechelen, Venlo and Kerkrade]


How much does Geography Matter?

DH techniques enable more abstract questions
Dependent variable: aggregate varietal distance
Independent variable: geographical distance, i.e. the chance of social contact
Statistical cautions:
  1 correlations involving averages are inflated
    but we’re interested in the entire varieties (dialects)
  2 distances are not independent, so significance may be inflated
    hence Mantel tests
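
A Mantel test addresses the second caution by permuting the sites of one distance table (rows and columns together) and asking how often the permuted correlation reaches the observed one. The sketch below is a simple version under stated assumptions: Pearson correlation, 999 random permutations, and an invented example of five sites on a line whose linguistic distance is the square root of their geographic distance.

```python
import random
from itertools import combinations

def pearson(xs, ys):
    """Plain Pearson correlation of two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

def mantel(ling, geo, permutations=999, seed=0):
    """Return (observed r, one-sided permutation p-value) for two square tables."""
    n = len(ling)
    pairs = list(combinations(range(n), 2))
    geo_vec = [geo[i][j] for i, j in pairs]
    observed = pearson([ling[i][j] for i, j in pairs], geo_vec)
    rng = random.Random(seed)
    hits = 1  # the observed arrangement counts as one permutation
    for _ in range(permutations):
        perm = list(range(n))
        rng.shuffle(perm)  # relabel the sites of the linguistic table
        permuted = [ling[perm[i]][perm[j]] for i, j in pairs]
        if pearson(permuted, geo_vec) >= observed:
            hits += 1
    return observed, hits / (permutations + 1)

# Hypothetical sites on a line; linguistic distance grows sublinearly.
positions = [0, 1, 2, 4, 7]
geo = [[abs(a - b) for b in positions] for a in positions]
ling = [[abs(a - b) ** 0.5 for b in positions] for a in positions]
r, p = mantel(ling, geo)
print(round(r, 3), p)
```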


Inspiration: Jean Seguy

Seguy (1971) La relation entre la distance spatiale et la distance lexicale. Revue de Linguistique Romane 35(138), 335-357:
Aggregate variation increases sublinearly with respect to geography

[Figure from Seguy (1971), “COURSE MOYENNE”: curve $Y = 36\sqrt{\log(x + 1)}$, horizontal axis running from 0 to 110]
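
What “sublinear” means can be made concrete with a small sketch. It takes the curve on Seguy's figure to be Y = 36·sqrt(log(x + 1)); that reading of the partly illegible figure, the natural logarithm, and the sample distances are all assumptions.

```python
from math import log, sqrt

def seguy_curve(distance, scale=36.0):
    """Assumed form of Seguy's curve: scale * sqrt(log(distance + 1))."""
    return scale * sqrt(log(distance + 1))

for d in (10, 20, 40, 80):
    print(d, round(seguy_curve(d), 1))

# Doubling the geographic distance raises the predicted value by far less than double.
ratio = seguy_curve(100) / seguy_curve(50)
print(round(ratio, 2))
```

Under this form most of the differentiation accrues over short distances, and additional distance adds progressively less.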


Sublinear spread is general

[Six panels, one per data set: Bantu, Bulgaria, Germany, LAMSAS / Lowman, The Netherlands, Norway]


Aside: Trudgill’s “Gravity hypothesis”

[Illustration: Sun, Venus, Earth, Moon, Mars, Phobos, Deimos]

According to Trudgill (1972) diffusion follows an inverse square law, with the consequence that linguistic distance should likewise increase with the square of the distance. Population size plays the role of mass.


Trudgill’s “Gravity hypothesis”

Sublinear aggregate relation incompatible with a quadratic influence

J. Nerbonne (2010) Measuring the Diffusion of Linguistic Change. Phil. Transactions of the Royal Society B 365, 3821-3828.


Large body of dialectometric work: positive aspects

Dutch, German, American English, Norwegian, Swedish, Afrikaans, Sardinian, Tuscan, Catalan, Sino-Tibetan, Chinese, Bulgarian, Bantu, Central Asian (Turkic & Indo-Iranian), ...
Development of consistency measure (Cronbach’s α) indicating whether data set is sufficiently large
Novel reflection, work on validation aimed at assessing degree of detection of SIGNALS OF PROVENANCE

Gooskens & Heeringa (2004) Perceptive Evaluation of Levenshtein Dialect Distance Measurements using Norwegian Dialect Data. Language Variation and Change 16(3), 189-207.
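
For readers unfamiliar with the consistency measure, here is a sketch of Cronbach’s α in its textbook variance form. Treating each word as an “item” scored over the same set of site pairs is an assumption about how it would be applied here, and the scores are invented; the talk does not say which variant of the formula was used.

```python
from statistics import variance

def cronbach_alpha(items):
    """items: k lists of scores, one list per item, all over the same cases."""
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]    # per-case sum over items
    item_variance = sum(variance(scores) for scores in items)
    return k / (k - 1) * (1 - item_variance / variance(totals))

# Hypothetical per-word distances for four site pairs.
consistent = [[1, 2, 3, 4], [1, 2, 3, 4], [1, 2, 3, 4]]
mixed = [[1, 2, 3, 4], [2, 1, 4, 3], [1, 3, 2, 4]]
print(cronbach_alpha(consistent))
print(round(cronbach_alpha(mixed), 2))
```

Values near 1 indicate that the individual words point in the same direction, which is the sense in which α signals that a data set is large enough.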


Criticisms of dialectometry, esp. Levenshtein-based work

Measure is too insensitive, 0/1 segment differences
Too little attention to phonetic/phonological conditioning
Too reliant on transcription: what about acoustics?
Where is the sociolinguistics? Isn’t variationist linguistics mostly about sociolinguistics?
“Distance-based” methods yield too little insight into the linguistic basis of differences (concrete differences lost in the aggregate sums)
  the hint is that it may be all smoke & mirrors
So what? Isn’t this all just confirming what we knew earlier?

... progress on all fronts, but presentation would take too long
  question and discussion period for those interested


Atlante Lessicale Toscano

Giacomelli et al. (2000)
2193 respondents, varying in age, socio-economic status
224 settlements, 745 items questioned
Focus on lexical variation (vocabulary)

Tuscan (Florentine) source of standard Italian?


Data collection sites in Tuscany


Generalized Additive Models

Simon N. Wood (2006) Generalized Additive Models: An Introduction with R
Allows regression using a combination of predictors, e.g. longitude and latitude
A more sophisticated notion of geography (than simple distance)


Aggregate dialect similarity to standard Italian

Darker areas are more similar to standard Italian in vocabulary


Visualizing Verbal Culture

DH techniques help analyze verbal culture
  Abstract from fine detail by measuring
  Strengthen signals by aggregating
Digital techniques indispensable
Resultant analyses highly complex, difficult
Visualization used to understand and communicate analyses
DH together with visualization enables more encompassing, but also more abstract questions.

Try Gabmap! www.gabmap.nl


Questions?

Thank You!


How much does distance influence language?

Area            Corr.(l,geo)   r²
Gabon Bantu     0.47           0.22
Bulgaria        0.49           0.24
Germany         0.57           0.32
Eastern U.S.    0.51           0.26
Netherlands     0.62           0.38
Norway          0.41           0.16

Norwegian ling. dist. correlates better w. travel time in 1900 (r = 0.54)
  Gooskens (2005) Dialectologia et Geolinguistica 13.
  very primitive geography!
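
The r² column is the share of variance explained, i.e. the square of the correlation. The short check below recomputes it from the rounded correlations in the table; small mismatches (Norway comes out at about 0.17 rather than 0.16) are presumably due to that rounding, which is an assumption rather than something the slides state.

```python
# Correlations between linguistic and geographic distance, as listed in the table.
correlations = {
    "Gabon Bantu": 0.47,
    "Bulgaria": 0.49,
    "Germany": 0.57,
    "Eastern U.S.": 0.51,
    "Netherlands": 0.62,
    "Norway": 0.41,
}

explained = {area: r * r for area, r in correlations.items()}
for area, share in explained.items():
    print(f"{area:13s} r = {correlations[area]:.2f}  r^2 = {share:.3f}")
```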


Geography’s influence on language

Geography accounts for 22-38% of aggregate linguistic variation.
General, sublinear characterization of the relation between geographical distance and linguistic differences
Like population geneticists’ “isolation by distance” (Wright, 1943; Malecot, 1955)


Wrede’s (1926-56) German Dialect Areas

[Map of the German dialect areas with the sampled sites labeled, alphabetically from Aachen to Zellingen]


Influence of Dialect Areas?

We add to the regression design variables indicating whether two varieties belong to the same or different dialect areas.
Results: Dialect areas contribute substantially to the explanation of aggregate linguistic distance. r² increases from about 32% (based only on geographic distance) to about 47% (based on geographic distance and areal differences).

John Nerbonne (submitted, 2010) How much does Geography Influence Language Variation? Auer et al. (eds.) Proc. of the Freiburg (FRIAS) language and space workshops. Mouton de Gruyter: Berlin.


Visualizing Verbal Culture

Pure distance models explain 22% - 38% of aggregate linguistic variation.
Areal distinctions are somewhat collinear, but nonetheless add substantially to simple models, perhaps as much as 50% (moving 30% to 45%, for example).
Naturally, there is also subdialectal variation (social, sexual, individual), but few systematic data collections.
Emerging questions:
  What is the linguistic structure of the dialect differences we find?
  Do typological constraints play a (confounding) role?
  Can we tease apart geographical and historical explanations, and how?

Try Gabmap! www.gabmap.nl
