Capturing patterns of linguistic interaction in a parsed corpus
A methodological case study

Sean Wallis
Survey of English Usage
University College London
[email protected]

Capturing linguistic interaction...
• Parsed corpus linguistics
• Intra-structural priming
• Experiments
  – Attributive AJPs before a noun
  – Embedded postmodifying clauses
  – Sequential postmodifying clauses
  – Speech vs. writing
• Conclusions
• The handout explains the analytical method in more detail (so read it later!)

Parsed corpus linguistics
• An example tree from ICE-GB (spoken)

[Figure: parse tree for ICE-GB text unit S1A-006 #23]

Parsed corpus linguistics
• Three kinds of evidence may be obtained from a parsed corpus
  – Frequency evidence of a particular known rule, structure or linguistic event
  – Coverage evidence of new rules, etc.
  – Interaction evidence of the relationship between rules, structures and events
• This evidence is necessarily framed within a particular grammatical scheme
  – How might we evaluate this grammar?

Intra-structural priming
• Priming effects within a structure
  – Study repeating an additive step in structures
• Consider
  – a phrase or clause that may (in principle) be extended ad infinitum
    • e.g. an NP with a noun head
  – a single additive step applied to this structure
    • e.g. add an attributive AJP before the head
  – Q. What is the effect of repeatedly applying this operation to the structure?

[Figure: NP tree with noun head "ship", extended one attributive AJP at a time ("tall", "very green", "old")]

Experiment 1: analysis of results
• Sequential probability analysis
  – calculate probability of adding each AJP
  – error bars: Wilson intervals
  – probability falls
    • second < first
    • third < second
  – decisions interact
  – Every AJP added makes it harder to add another

[Figure: probability of adding each successive AJP, with Wilson interval error bars; y-axis "probability", ticks 0.00 to 0.20; x-axis ticks 0 to 5]
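The sequential probability analysis above can be sketched in a few lines. This is a minimal illustration, not the author's implementation: the function names and the counts are assumptions invented for the example (they are not the ICE-GB results), and the interval is the standard Wilson score interval at a 95% level.

```python
import math

def wilson_interval(p, n, z=1.959964):
    """Wilson score interval for an observed proportion p out of n cases."""
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    spread = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return (centre - spread, centre + spread)

def sequential_probabilities(at_least):
    """at_least[k] is the number of NPs with at least k attributive AJPs.

    Returns one (p, lower, upper) triple per additive step: the probability
    of adding AJP k+1 given that k AJPs are already present.
    """
    results = []
    for k in range(len(at_least) - 1):
        n = at_least[k]
        if n == 0:
            break  # no cases left to extend
        p = at_least[k + 1] / n
        lower, upper = wilson_interval(p, n)
        results.append((p, lower, upper))
    return results

# Hypothetical counts, invented for illustration only.
counts = [10000, 1900, 200, 10]
for step, (p, lower, upper) in enumerate(sequential_probabilities(counts), start=1):
    print(f"AJP {step}: p = {p:.4f}  [{lower:.4f}, {upper:.4f}]")
```

Each probability is conditional on the previous step having been taken, so the baseline n shrinks at every step and the intervals widen accordingly.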

Experiment 1: explanations?
• Feedback loop: for each successive AJP, it is more difficult to add a further AJP
  – logical-semantic constraints
    • tend to say the tall green ship
    • do not tend to say tall short ship or green tall ship
  – communicative economy
    • once speaker said tall green ship, tends to only say ship
  – memory/processing constraints
    • unlikely: this is a small structure, as are AJPs

Experiment 1: speech vs. writing
• Spoken vs. written subcorpora
  – Same overall pattern
  – Spoken data tends to have fewer attributive AJPs
    • Support for communicative economy or memory/processing hypotheses?
  – Significance tests
    • Paired 2x1 Wilson tests (Wallis 2011)
    • first and second observed spoken probabilities are significantly smaller than written

[Figure: probability of adding each successive AJP, written vs. spoken; y-axis "probability", ticks 0.00 to 0.25; x-axis ticks 0 to 5]
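One way to check whether a spoken probability differs from the corresponding written one is sketched below. It uses the Newcombe-Wilson interval for the difference between two independent proportions, a related technique built on the same Wilson intervals; it is not necessarily the paired 2x1 test cited on the slide. The function names and the example proportions are assumptions made up for illustration.

```python
import math

def wilson_interval(p, n, z=1.959964):
    """Wilson score interval for an observed proportion p out of n cases."""
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    spread = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return (centre - spread, centre + spread)

def newcombe_wilson_diff(p1, n1, p2, n2, z=1.959964):
    """Return (d, lower, upper) for the difference d = p1 - p2."""
    l1, u1 = wilson_interval(p1, n1, z)
    l2, u2 = wilson_interval(p2, n2, z)
    d = p1 - p2
    lower = d - math.sqrt((p1 - l1) ** 2 + (u2 - p2) ** 2)
    upper = d + math.sqrt((u1 - p1) ** 2 + (p2 - l2) ** 2)
    return d, lower, upper

def significantly_different(p1, n1, p2, n2):
    """True if the interval for p1 - p2 excludes zero."""
    _, lower, upper = newcombe_wilson_diff(p1, n1, p2, n2)
    return lower > 0 or upper < 0

# Hypothetical proportions (written vs. spoken), invented for illustration.
d, lower, upper = newcombe_wilson_diff(0.22, 5000, 0.17, 6000)
print(f"d = {d:.4f}  [{lower:.4f}, {upper:.4f}]")
print(significantly_different(0.22, 5000, 0.17, 6000))
```

The difference is treated as significant when zero lies outside the interval, which matches reading the chart: two points differ when one lies clearly outside the other's combined error range.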

Experiment 2: preverbal AVPs
• Consider adverb phrases before a verb
  – Results very different
    • Probability does not fall significantly between first and second AVP
    • Probability does fall between second and third AVP
  – Possible constraints
    • (weak) communicative
    • (weak) semantic
  – Further investigation needed

[Figure: probability of adding each successive preverbal AVP; y-axis "probability", ticks 0.00 to 0.10; x-axis ticks 0 to 4]

Experiment 3: postmodifying clauses
• Another way to specify nouns in English
  – add clause after noun to explicate it
    • the ship [that was in the port]
    • the ship [called Ariadne]
  – may be embedded
    • the ship [that was in the port [we visited last week]]
  – or successively postmodified
    • the ship [called Ariadne][that was in the port]
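The difference between embedding and sequential postmodification can be made concrete with a small sketch. It treats the square-bracket notation of the examples above as a stand-in for the parsed tree (the actual study works on corpus trees, not bracketed strings), and the function name `bracket_profile` is an assumption introduced for this illustration.

```python
def bracket_profile(text):
    """Return (max embedding depth, number of top-level bracketed clauses)."""
    depth = 0
    max_depth = 0
    top_level = 0
    for ch in text:
        if ch == "[":
            depth += 1
            max_depth = max(max_depth, depth)
            if depth == 1:
                top_level += 1  # a new clause attached to the original head
        elif ch == "]":
            if depth == 0:
                raise ValueError("unbalanced brackets")
            depth -= 1
    if depth != 0:
        raise ValueError("unbalanced brackets")
    return max_depth, top_level

examples = [
    "the ship [called Ariadne]",
    "the ship [that was in the port [we visited last week]]",
    "the ship [called Ariadne][that was in the port]",
]
for np_text in examples:
    depth, sequence = bracket_profile(np_text)
    print(f"depth={depth} sequence={sequence}  {np_text}")
```

Embedding increases the depth while the sequence count stays at one; sequential postmodification increases the sequence count while the depth stays at one. These are the two axes along which the experiment counts successive additions.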

Experiment 3: (i) embedding
• Probability of adding a further embedded postmodifying clause falls with size
  – All data
    • second < first
    • third < first
  – Spoken
    • second < first
  – Written
    • third < second
• Compare with effect of sequential postmodification of same head

[Figure: probability of adding each successive embedded postmodifying clause; series: written, spoken, all; y-axis "probability", ticks 0.00 to 0.10; x-axis ticks 0 to 4]

Experiment 3: (ii) sequential
• Probability of sequential postmodifying falls and, for spoken data, falls, then rises
  – All data
    • second < first
  – Spoken
    • third > second
  – Option: count conjoins separately or treat as single item
    • Either way, results show similar pattern
  – Negative feedback: the ‘in for a penny’ effect

[Figure: probability of adding each successive sequential postmodifying clause; series: written, spoken; y-axis "probability", ticks 0.00 to 0.15; x-axis ticks 0 to 5]

Experiment 3: (iii) embed vs. seq
• Embedded vs. sequential postmodification
  – Second level
    • embedding > sequence
    • It is slightly easier to modify the latest head than a more remote one: semantic constraints? backtracking cost?
  – Third level
    • embedding < sequence (if counting conjoins)
    • long sequences seem to be easier to construct than comparable layers of embedding

[Figure: probability of adding each successive postmodifying clause; series: embedding, sequential; y-axis "probability", ticks 0.00 to 0.15; x-axis ticks 0 to 5]

Conclusions
• A method for evaluating interactions along grammatical axes
  – General purpose, robust, structural
  – More abstract than ‘linguistic choice’ experiments
  – Depends on a concept of grammatical distance along an axis, based on the chosen grammar
• Method has philosophical implications
  – Grammar viewed as outcome of linguistic choices
  – Linguistics as an evaluable observational science
    • Signature (trace) of language production decisions
  – A unification of theoretical and corpus linguistics?

Potential applications
• Corpus linguistics
  – Optimising existing grammatical framework
    • e.g. coordination, compound nouns
  – Comparing genres/languages/periods
• Theoretical linguistics
  – Comparing different grammars, same language
• Psycholinguistics
  – Search for evidence of language production constraints in spontaneous speech corpora
    • speech and language therapy
    • language acquisition and development

References
Nelson, G., Wallis, S. & Aarts, B. (2002) Exploring natural language. Benjamins.
Pickering, M. & Ferreira, V. (2008) Structural priming. Psychological Bulletin 134, 427–459.
Wallis, S.A. (2011) Comparing χ² tests for separability. Survey of English Usage.

• For explanation of the analysis method see the handout!
• For more detail and a draft of the full paper see http://corplingstats.wordpress.com
