goals to advance understanding of the relationship between geographically-coded data and language...


Upload: damon-peters

Post on 29-Jan-2016


Page 1: Goals  To advance understanding of the relationship between geographically-coded data and language data  To transform our notion of dialect and speech

Goals

To advance understanding of the relationship between geographically-coded data and language data

To transform our notion of dialect and speech community based on geographical, demographic and social distribution of multiple features

Overview

We know that /æ/ is changing in this region and this time period. Question: How does that change spread over time and space?

Geo-social structures (the gravity model; Trudgill 1974) can trump straight-line geography (the wave model; Schmidt 1872).

Georeferenced social factors add value (Britain 2002; Lee & Kretzschmar 1993)

The issue of scale

The most important related work (e.g. Trudgill, Lee & Kretzschmar, Labov) has focused on vast areas; Grieve et al., for instance, use North America.

We start from this position: Language and social structure are local.

Use data that is more representative than ANAE and measure diphthongization

Neighborhoods by demographics x distance x linguistic features

Chambers and Trudgill (1998: 178ff): cross-city influence matrix

P=population of geographic center

d=distance between centers

S=index of linguistic similarity
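
The formula for this slide did not survive in the transcript. The sketch below assumes the form in which Trudgill's gravity-model influence index is usually stated, I_ij = S · (P_i · P_j / d_ij²) · (P_i / (P_i + P_j)). The function name and the numbers are illustrative only and do not come from the talk.

```python
def influence(p_i: float, p_j: float, d_ij: float, s: float = 1.0) -> float:
    """Influence of center i on center j (assumed gravity-model form).

    p_i, p_j: populations of the two centers
    d_ij:     distance between the centers
    s:        index of linguistic similarity
    """
    if d_ij <= 0:
        raise ValueError("distance must be positive")
    return s * (p_i * p_j) / d_ij ** 2 * p_i / (p_i + p_j)


# Hypothetical populations (in thousands) and distance (in miles).
big, small, dist = 300.0, 30.0, 80.0
print(influence(big, small, dist))   # larger center acting on the smaller one
print(influence(small, big, dist))   # smaller center acting on the larger one
```

Under this form the index is asymmetric: the ratio of the two directions equals the ratio of the two populations, which is what lets a large center dominate its smaller neighbors.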

Sociolinguistic Literature tells us

… Language varies …

As individuals speak to one another (locality)

Language is a brokered agreement between humans and used for various ends

Both within and across geographic domains (identity) (historical continuity) (translocational communication)

E.g., Blacks share markers with whites within a location, differentiating them from Blacks elsewhere; yet speakers often share pan-AAE markers

Geolinguistic Literature tells us

… Language varies …

By presence/absence of barriers (boundary conditions)

By sphere of influence to immediately smaller locations where similarity and status matter (gravity)

E.g., Chicago to Rockford and St. Louis

By large sweeping patterns where distance matters (wave)

E.g., CAUGHT~COT merger in US

Social Science Literature tells us

… Local knowledge varies …

In a rapidly decaying fashion (rapid decay)

E.g., there is a ‘nearness’ factor and not all data points have equal influence over each other

Multiple factors influence the spread (or not) of local knowledge (regressive covariation) (costly)

E.g., cost involved with transferring information regarding competition and cooperation

Features of Model

Locality, identity & historical continuity by community: geographic and social barriers

Gender, ethnicity, age, immigration, topography

Gravity & rapid decay: attraction by population centers within proximate range based on population

Regressive covariation & cost: varying weights and multiple solutions by location

Wave & measurable features: known markers that spread

Methodology

Speakers

20 speakers from WELS and DARE datasets

1870s: 2
1880s & 1890s: 4 + 2
1900s & 1910s: 4 + 2
1920s, 1930s, & 1940s: 3 + 1 + 2

16 locations in WI

Idealized Model

This model accounts for regressive covariation and cost

For some speech knowledge qua behavior in locale $\ell$:

$K_\ell = \beta_{K1} S_\ell + \beta_{K2} G_\ell + \varepsilon_K$

$K$ is a proxy for knowledge output (acoustic measures)

$S_\ell$ = social factors

$G_\ell$ = geographic factors

Society

Locality, identity & historical continuity by community: geographic and social barriers

Gender, ethnicity, age, immigration, topography

$S_\ell = \beta_{S1} F_\ell + \beta_{S2} E_\ell + \beta_{S3} \log(L_\ell) + \beta_{S4} \log(W_\ell) + \varepsilon_S$

F = % of population, foreign born in 1900

E = % of population, black in 1900

L = value of livestock in 1900

W = total manufacturing wages in 1900

Geography

Gravity & rapid decay: attraction by population centers within proximate range based on population

The geographic factors can be stated similarly:

$G_\ell = \beta_{G1} \log(P_\ell) + \beta_{G2} \log(T_\ell) + \beta_{G3} B_\ell + \varepsilon_G$

P = log (county population per sq. mi.)

T = log (time to Milwaukee per time to Minneapolis)

B = index of public or private transportation costs to MKE

Geographic Measures

Designed to capture gravity and decay

Population density

1900 population / sq mi in county

Measure of time of transportation

$\log(\text{distance to Mke} / \text{distance to Minn})_\ell$

Negative value is beneficial

Measure of manner of transportation

Number of ‘jumps’ in transportation type, and cost of transportation (0-3)

Private is more costly than public

Train is more costly than bus
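
The two gradient measures can be sketched as follows. This assumes a natural logarithm (the slides do not state the base), and the county figures are hypothetical; the function names are illustrative.

```python
import math


def population_density(population: float, area_sq_mi: float) -> float:
    """County population per square mile."""
    return population / area_sq_mi


def milwaukee_ratio(dist_to_mke: float, dist_to_minn: float) -> float:
    """log(distance to Milwaukee / distance to Minneapolis).

    Negative when the locale is closer to Milwaukee than to Minneapolis.
    """
    return math.log(dist_to_mke / dist_to_minn)


# Hypothetical county: 30,000 people on 600 sq mi,
# 60 miles from Milwaukee and 240 miles from Minneapolis.
density = population_density(30_000, 600)
ratio = milwaukee_ratio(60.0, 240.0)
print(density, ratio)
```

A locale equidistant from both cities scores 0, so the sign alone shows which city is nearer.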

Measure of Transportation Distance

So, what’s $K_\ell$?

Ceteris paribus, presence or absence of regional markers

/æ/ class of words

$K_\ell = \beta_{K1} \log(VD_\ell) + \beta_{K2} \log(F1N_\ell) + \beta_{K3} \log(F2N_\ell) + \beta_{K4} \log(TL_\ell) + \beta_{K5} \log(\Theta_\ell) + \varepsilon_K$

Speaker variables: birth year, gender

Vowel Measures

Recordings of “Arthur the Rat”

Extracted from WELS/DARE recordings

Aligned TextGrid for Praat from Penn Aligner

Corrected edges of /æ/ and neighboring segments

Post-processing selection

Pre-obstruent V > 40 msec in the front of the vowel space

/æ/ measures from Praat: VD=vowel duration, F1N=nucleus height, F2N=nucleus backness, TL=trajectory length, Θ=trajectory angle
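
The slides do not say how trajectory length and angle were computed. One common reading, assumed here, treats the vowel as a straight movement from nucleus to offglide in F1-F2 space, measured in Hz. The formant values below are hypothetical.

```python
import math


def trajectory(f1_nucleus: float, f2_nucleus: float,
               f1_glide: float, f2_glide: float) -> tuple[float, float]:
    """Return (length, angle in degrees) of the nucleus-to-glide movement.

    Assumption: Euclidean distance in Hz, with the angle measured
    from the F2 axis using atan2(delta F1, delta F2).
    """
    d_f1 = f1_glide - f1_nucleus
    d_f2 = f2_glide - f2_nucleus
    length = math.hypot(d_f1, d_f2)
    angle = math.degrees(math.atan2(d_f1, d_f2))
    return length, angle


# Hypothetical /æ/ token: nucleus at F1=700, F2=1800; glide at F1=400, F2=2200.
tl, theta = trajectory(700.0, 1800.0, 400.0, 2200.0)
print(tl, theta)
```

A longer trajectory indicates a more diphthongal token; a normalized scale such as Bark could be swapped in for Hz without changing the logic.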

Results

Preliminaries

Problem 1: acoustic similarity and grouping speakers with respect to birth year and gender

Problem 2: Covariance matrix for Geography

Problem 3: Covariance matrix for Society

K = Acoustic similarity

Cluster analysis on individual characteristics

First threw out one speaker as an outlier on vowel height

New N = 19; the excluded speaker came from one of the communities with two speakers

Clusters emerge, but they are driven by birth year and gender
1. males of all ages
2. females born before 1900
3. females born after 1900


Geographic measures

Recall: two gradient measures

Travel time differential to Milwaukee

Population density

Linear covariation is near-significant: R² = 0.15, p = 0.056

One potential outlier; removing it would make the relation significant

Selected transportation time

Transportation captures density
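
A covariation check of this kind reduces to the R² of a simple linear regression between two measures. The sketch below computes only R² (not the p-value) and uses made-up numbers rather than the study's data.

```python
def r_squared(xs: list[float], ys: list[float]) -> float:
    """Squared Pearson correlation, i.e. R² of a one-predictor linear fit."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy ** 2 / (sxx * syy)


# Hypothetical values for two measures across four locales.
travel_time = [1.0, 2.0, 3.0, 4.0]
density = [1.0, 3.0, 2.0, 4.0]
print(r_squared(travel_time, density))
```

When two predictors share a substantial part of their variance, keeping only one of them avoids entering the same information into the model twice.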


Society measures

Recall 4 measures

Urban class, rural class, ethnicity, immigration

Covary?

Rural class with urban class (R² = 0.19, p<0.05)

Rural covaries with transportation time (R² = 0.39, p<0.05); urban doesn’t

Immigrants with rural class (R² = 0.48, p<0.05) and urban class (R² = 0.41, p<0.05)

Ethnicity does not covary with urban class or transportation

Revised (realistic) model

Dependent variable: individual acoustic measures

Independent variables: urban class + ethnicity + transportation time

Weight by speaker class (birth year, gender)
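
One way to read this slide is as a weighted least-squares regression, with a per-speaker weight derived from speaker class. That reading is an assumption, as are the predictor values, weights and coefficients below; the data are constructed so that the fit recovers known coefficients.

```python
def solve(a: list[list[float]], b: list[float]) -> list[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def weighted_least_squares(rows, ys, weights):
    """Fit y = b0 + b1*x1 + ... by minimizing sum(w * residual**2)."""
    design = [[1.0] + list(row) for row in rows]
    k = len(design[0])
    xtwx = [[sum(w * d[i] * d[j] for d, w in zip(design, weights))
             for j in range(k)] for i in range(k)]
    xtwy = [sum(w * d[i] * y for d, y, w in zip(design, ys, weights))
            for i in range(k)]
    return solve(xtwx, xtwy)


# Hypothetical predictors per speaker: (urban class, ethnicity, transportation time).
rows = [(0.0, 0.1, -1.0), (1.0, 0.0, -0.5), (0.5, 0.3, 0.2),
        (0.2, 0.2, 1.0), (0.8, 0.1, 0.4), (0.3, 0.0, -0.2)]
# Hypothetical acoustic measure generated from known coefficients.
ys = [1.0 + 2.0 * u - 1.0 * e + 0.5 * t for u, e, t in rows]
# Hypothetical weights standing in for speaker class (birth year, gender).
weights = [1.0, 2.0, 1.0, 2.0, 1.0, 1.0]

coefs = weighted_least_squares(rows, ys, weights)
print(coefs)  # intercept, urban class, ethnicity, transportation time
```

In practice each acoustic measure (duration, trajectory length, and so on) would be fitted separately as its own dependent variable.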

Not significant

Vowel backness

Vowel height

Angle of trajectory

Significant relation 1

Duration x urban social class

Significant relation 2

Trajectory length x transportation time

Whence straight-line distances?

Longitude is significant for vowel trajectory length and nearly significant for duration

Neither latitude nor longitude is significant for the other three measures

Interpretation

Bias toward westward settlement patterns

For eastward-moving CAUGHT~COT, expect inverse relation
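
For comparison with travel-time measures, a straight-line baseline can be computed directly from coordinates. The sketch below uses the haversine great-circle formula; the choice of formula and the approximate city coordinates are assumptions for illustration, not part of the study.

```python
import math


def great_circle_km(lat1: float, lon1: float, lat2: float, lon2: float,
                    radius_km: float = 6371.0) -> float:
    """Haversine distance between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * radius_km * math.asin(math.sqrt(a))


# Approximate coordinates, for illustration only.
milwaukee = (43.04, -87.91)
madison = (43.07, -89.40)
km = great_circle_km(*milwaukee, *madison)
print(round(km, 1))
```

Unlike longitude alone, such a distance is directionless, but it still ignores roads, rail lines and barriers, which is the gap a transportation-time measure is meant to close.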

Conclusions

Summary

Clarification of the broad sociolinguistic category of “geography”

Parametric power: encodes distance and population

Reduces complex matrix of Chambers & Trudgill

Broadly reconceptualizes the notion of “geography”

Lx measures = urban class + ethnicity + transportation time

Weighted by age, gender

Keeps the focus local

Geographic influence on language variation?

Testing to see if georeferenced data is better than straight-line distance

Knew this going in, but need to demonstrate this because current studies continue to ignore it

Some features do fall out by longitude (duration, trajectory length); how many findings in other studies arise because the source of change sits at a statistical corner of the analysis space?

Transportation time should overcome this problem because it doesn’t matter which direction one comes from.

Future work

Convert county data to more local data (April 2, 2012??)

Will permit more robust GIS computation

Better treatment of biases: ethnicity, immigration, geography

Build continuity with new data collections

Thanks!

UW Graduate School

The Dictionary of American Regional English

Wisconsin Englishes Project (Luke Annear, Trini Stickle, Nick Williams)