Goals
To advance understanding of the relationship between geographically-coded data and language data
To transform our notion of dialect and speech community based on geographical, demographic and social distribution of multiple features
Overview
We know that /æ/ is changing in this region and this time period. Question: How does that change spread over time and space?
Geo-social structures (the gravity model; Trudgill 1974) can trump straight-line geography (the wave model; Schmidt 1872).
Georeferenced social factors add value (Britain 2002; Lee & Kretzschmar 1993).
The issue of scale
The most important related work (e.g. Trudgill, Lee & Kretzschmar, Labov) has focused on vast areas; Grieve et al. use North America.
We start from this position: Language and social structure are local.
Use data that is more representative than ANAE and measure diphthongization
Neighborhoods by demographics x distance x linguistic features
Chambers and Trudgill (1998: 178ff): cross-city influence matrix
P=population of geographic center
d=distance between centers
S=index of linguistic similarity
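The three quantities above are the inputs to the gravity-style influence index. The formula itself is not reproduced in this transcript, so the sketch below assumes the commonly cited form I_ij = S * (P_i * P_j / d_ij^2) * (P_i / (P_i + P_j)). The function name and the figures in the usage note are illustrative and are not data from this study.

```python
def influence(p_i: float, p_j: float, d_ij: float, s: float = 1.0) -> float:
    """Influence of center i on center j under the assumed gravity form.

    p_i, p_j: populations of the two centers
    d_ij:     distance between the centers
    s:        index of linguistic similarity (assumed default of 1.0)
    """
    if d_ij <= 0:
        raise ValueError("distance between centers must be positive")
    interaction = (p_i * p_j) / d_ij ** 2   # mutual interaction, decays with distance squared
    share = p_i / (p_i + p_j)               # center i's share of the combined population
    return s * interaction * share
```

With hypothetical populations of 100 and 50 at distance 10, `influence(100, 50, 10)` is larger than `influence(50, 100, 10)`: under this form the bigger center influences the smaller one more than the reverse.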
Sociolinguistic literature tells us
… Language varies …
As individuals speak to one another (locality)
Language is a brokered agreement between humans and used for various ends
Both within and across geographic domains (identity) (historical continuity) (translocational communication)
E.g., Blacks share markers with whites within a location, differentiating them from Blacks elsewhere; yet speakers often share pan-AAE markers
Geolinguistic literature tells us
… Language varies …
By presence/absence of barriers (boundary conditions)
By sphere of influence to immediately smaller locations where similarity and status matter (gravity)
E.g., Chicago to Rockford and St. Louis
By large sweeping patterns where distance matters (wave)
E.g., CAUGHT~COT merger in US
Social science literature tells us
… Local knowledge varies …
In a rapidly decaying fashion (rapid decay)
E.g., there is a ‘nearness’ factor and not all data points have equal influence over each other
Multiple factors influence the spread (or not) of local knowledge (regressive covariation) (costly)
E.g., cost involved with transferring information regarding competition and cooperation
Features of model
Locality, identity & historical continuity by community: geographic and social barriers
Gender, ethnicity, age, immigration, topography
Gravity & rapid decay: attraction by population centers within proximate range based on population
Regressive covariation & cost: varying weights and multiple solutions by location
Wave & measurable features: known markers that spread
Methodology
Speakers
20 speakers from WELS and DARE datasets
1870s: 2
1880s & 1890s: 4 + 2
1900s & 1910s: 4 + 2
1920s, 1930s, & 1940s: 3 + 1 + 2
16 locations in WI
Idealized model
This model accounts for regressive covariation and cost
For some speech knowledge qua behavior in locale ℓ:
$K_\ell = \beta_{K1} S_\ell + \beta_{K2} G_\ell + \varepsilon_K$
$K$ is a proxy for knowledge output (acoustic measures)
$S_\ell$ = social factors
$G_\ell$ = geographic factors
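As a minimal sketch of how the two weights in the idealized model could be estimated, the function below solves the least-squares normal equations for a model with two predictors and no intercept. The function name is hypothetical, the values in the usage note are made up, and nothing here is claimed to be the estimation procedure used in the study.

```python
def fit_two_factor(k, s, g):
    """Least-squares estimates (b1, b2) for K = b1*S + b2*G + error.

    k, s, g are equal-length sequences with one value per locale.
    """
    # Sums of squares and cross-products that make up the normal equations.
    ss = sum(x * x for x in s)
    gg = sum(x * x for x in g)
    sg = sum(x * y for x, y in zip(s, g))
    sk = sum(x * y for x, y in zip(s, k))
    gk = sum(x * y for x, y in zip(g, k))
    det = ss * gg - sg * sg
    if det == 0:
        # Perfect covariation between S and G: the weights are not unique.
        raise ValueError("S and G are perfectly collinear")
    b1 = (sk * gg - gk * sg) / det
    b2 = (gk * ss - sk * sg) / det
    return b1, b2
```

For made-up inputs `s = [1, 0, 1, 2]`, `g = [0, 1, 1, 1]` and `k = [2, 3, 5, 7]`, the function recovers the weights 2 and 3. The collinearity check shows why covarying predictors matter: when S and G move together, no unique pair of weights exists.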
Society
Locality, identity & historical continuity by community: geographic and social barriers
Gender, ethnicity, age, immigration, topography
$S_\ell = \beta_{S1} F_\ell + \beta_{S2} E_\ell + \beta_{S3} \log(L_\ell) + \beta_{S4} \log(W_\ell) + \varepsilon_S$
F = % of population, foreign born in 1900
E = % of population, black in 1900
L = value of livestock in 1900
W = total manufacturing wages in 1900
Geography
Gravity & rapid decay: attraction by population centers within proximate range based on population
The geographic factors can be stated similarly, as
$G_\ell = \beta_{G1} \log(P_\ell) + \beta_{G2} \log(T_\ell) + \beta_{G3} B_\ell + \varepsilon_G$
P = county population per sq. mi.
T = time to Milwaukee per time to Minneapolis
B = index of public or private transportation costs to MKE
Geographic measures
Designed to capture gravity and decay
Population density: 1900 population / sq mi in county
Measure of time of transportation: log(distance to Mke / distance to Minn) for locale ℓ
Negative value is beneficial
Measure of manner of transportation: number of ‘jumps’ in transportation type, and cost of transportation (0-3)
Private is more costly than public
Train is more costly than bus
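The log-ratio measure of transportation time can be illustrated with a short sketch. The function and parameter names are hypothetical and the distances in the usage note are invented; only the log(Milwaukee / Minneapolis) form comes from the slide.

```python
import math

def travel_differential(to_milwaukee: float, to_minneapolis: float) -> float:
    """log(distance to Milwaukee / distance to Minneapolis) for one locale."""
    if to_milwaukee <= 0 or to_minneapolis <= 0:
        raise ValueError("distances must be positive")
    # A ratio below 1 (locale nearer Milwaukee) gives a negative log.
    return math.log(to_milwaukee / to_minneapolis)
```

A locale 80 units from Milwaukee and 320 from Minneapolis yields a negative value, an equidistant locale yields zero, and swapping the two distances only flips the sign. The log keeps the measure symmetric around the midpoint.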
Measure of transportation
Distance
So, what’s Kℓ?
Ceteris paribus, presence or absence of regional markers
/æ/ class of words
$K_\ell = \beta_{K1} \log(VD_\ell) + \beta_{K2} \log(F1N_\ell) + \beta_{K3} \log(F2N_\ell) + \beta_{K4} \log(TL_\ell) + \beta_{K5} \log(\Theta_\ell) + \varepsilon_K$
Speaker variables: birth year, gender
Vowel measures
Recordings of “Arthur the Rat”
Extracted from WELS/DARE recordings
Aligned TextGrid for Praat from Penn Aligner
Corrected edges of /æ/ and neighboring segments
Post-processing selection: pre-obstruent V > 40 msec in the front of the vowel space
/æ/ measures from Praat: VD=vowel duration, F1N=nucleus height, F2N=nucleus backness, TL=trajectory length, Θ=trajectory angle
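The slides do not say how trajectory length and angle were derived from the Praat measurements. One common approach, assumed here purely for illustration, treats the vowel as a straight line between two points in F1 x F2 space (for example nucleus and offglide) and takes the Euclidean length and the angle of that line. The function name, the choice of two points, and the Hz units are all assumptions.

```python
import math

def trajectory(f1_start, f2_start, f1_end, f2_end):
    """Return (length in Hz, angle in degrees) of a two-point formant trajectory.

    Assumption: the trajectory is approximated by a straight line from the
    start point to the end point in F1 x F2 space.
    """
    d_f1 = f1_end - f1_start   # change in height (F1)
    d_f2 = f2_end - f2_start   # change in backness (F2)
    length = math.hypot(d_f1, d_f2)
    # Angle measured from the F2 axis; negative when F1 falls (vowel raises).
    angle = math.degrees(math.atan2(d_f1, d_f2))
    return length, angle
```

For an invented token moving from (F1 = 700, F2 = 1800) to (F1 = 400, F2 = 2200), the sketch gives a length of 500 Hz and a negative angle, that is, a raising and fronting movement.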
Results
Preliminaries
Problem 1: acoustic similarity and grouping speakers with respect to birth year and gender
Problem 2: Covariance matrix for Geography
Problem 3: Covariance matrix for Society
K = Acoustic similarity
Cluster analysis on individual characteristics
First threw out one speaker as an outlier on vowel height
New N = 19, but the speaker was from one of the communities with two speakers
Clusters, but driven by birth year and gender
1. males of all ages
2. females born before 1900
3. females born after 1900
Geographic measures
Recall: two gradient measures
Travel time differential to Milwaukee
Population density
Linear covariation near significant: R2 = 0.15, p = 0.056
One potential outlier; excluding it would make the covariation significant
Selected transportation time
Transportation captures density
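The covariation check between two gradient measures comes down to the R2 of a simple linear regression. The sketch below computes that statistic in plain Python. The function name is hypothetical, no study data are reproduced, and the p-value is left out because it needs a t distribution that the standard library does not provide.

```python
def r_squared(x, y):
    """Coefficient of determination for a simple linear regression of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Centered sums of squares and cross-products.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("both variables must vary")
    # With one predictor, R2 equals the squared Pearson correlation.
    return (sxy * sxy) / (sxx * syy)
```

When two candidate predictors share much of their variance, keeping only one of them, as the slide does with transportation time, avoids entering nearly redundant terms into the model.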
Society measures
Recall 4 measures: urban class, rural class, ethnicity, immigration
Covary?
Rural class with urban class (R2 = 0.19, p<0.05)
Rural covaries with transportation time (R2 = 0.39, p<0.05); urban doesn’t
Immigrants with rural class (R2 = 0.48, p<0.05) and urban class (R2 = 0.41, p<0.05)
Ethnicity does not covary with urban class or transportation
Revised (realistic) model
Dep var: Indiv acoustic measures
Ind vars: urban class + ethnicity + transportation time
Weight by speaker class (birth year, gender)
Not significant
Vowel backness
Vowel height
Angle of trajectory
Significant relation 1
Duration x urban social class
Significant relation 2
Trajectory length x transportation time
Whence straight-line distances?
Longitude is significant for vowel trajectory and nearly significant for duration
Neither latitude nor longitude is significant for the other three measures
Interpretation
Bias toward westward settlement patterns
For eastward moving CAUGHT~COT expect inverse relation
Conclusions
Summary
Clarification of the broad sociolinguistic category of “geography”
Parametric power: encodes distance and population
Reduces complex matrix of Chambers & Trudgill
Broadly reconceptualizes the notion of “geography”
Lx measures = urban class + ethnicity + transportation time
Weighted by age, gender
Keeps the focus local
Geographic influence on language variation?
Testing to see if georeferenced data is better than straight-line distance
Knew this going in, but need to demonstrate this because current studies continue to ignore this
Some features do fall out by longitude (duration, trajectory length); how many results in other studies are due to the source of change being at the statistical corner of the analysis space?
Transportation time should overcome this problem because it doesn’t matter which direction one comes from.
Future work
Convert county data to more local data (April 2, 2012??)
Will permit more robust GIS computation
Better treatment of biases: ethnicity, immigration, geography
Build continuity with new data collections
Thanks!
UW Graduate School
The Dictionary of American Regional English
Wisconsin Englishes Project (Luke Annear, Trini Stickle, Nick Williams)