1 Syntagmatic Preferences Patrick Hanks Masaryk University In honour of Yorick Wilks BCS, London, June 22, 2007



Syntagmatic Preferences

Patrick Hanks, Masaryk University

In honour of Yorick Wilks

BCS, London, June 22, 2007


What's so important about “My car drinks gasoline”?

• Violation of “selection restrictions” is normal.

• So selectional restrictions aren't restrictions at all

– They are, in fact, selectional preferences

– Different combinations of selectional preferences activate different senses

• Yorick's insights of the 1970s deserve to be followed up more vigorously and systematically than they have been.


A language is a double helix

• Start from the bottom up:

– Let’s look at what the words do.

– How do people use words to make meanings?

• A natural language is a system of norms and exploitations:

– Norms: Animals drink water, people drink beverages

– Exploitations: My car drinks gasoline

• Syntagmatic rules governing normal linguistic behaviour systematically interact with exploitation rules governing how those norms are exploited


Patterns of linguistic behaviour

• Normal linguistic behaviour is highly patterned.

• Words in isolation have meaning potential, not meaning

– A meaning potential is a more or less vague cluster of possibilities. For example, what does fire mean?

– A burning process? (And if so, is it a good thing, under control in a house, or a bad thing, raging out of control in a forest?) An electric heater? A sense of enthusiasm? Dismiss from employment? Operate a gun? Shoot an arrow? Cause to enthuse? Bake?

– All of these and more.

– Much overlap.

– Sense enumeration doesn’t capture this (cf. Pustejovsky’s lexical conceptual paradigms)

• In context, the range of possible interpretations of a word is severely limited:

– People firing guns, ideas that fire people with enthusiasm, employers firing their staff, firing pottery in a kiln


Word Use, Meaning, and Linguistic Theory

• The normal uses of a word can be grouped into patterns, and meanings can be associated with the patterns (rather than the word in isolation)

• So far they haven’t been. Why not?

  – Lack of evidence

    • Lexical analysis can only be done effectively with large corpora

  – Tradition and intuition

    • Direttissima assaults on word meaning

    • No one thought to go the long way round, via patterns

  – The tyranny of “all and only”

    • Lexicographers aimed to cover all possible uses, not just all normal uses

    • NLP and linguistic theory focused on boundary cases

  – Syntactocentrism in linguistic theory

    • It misses the point about syntagmatics

  – Lack of a suitable theory

    • Aha! Preference Semantics provides the basis for such a theory

    • We should take PS seriously and ally it with other relevant theoretical work (Wittgenstein, Putnam, Rosch, Sinclair, Hoey, Pustejovsky, …)


Why is a Pattern Dictionary Necessary?

• Standard dictionaries do not provide the contexts that distinguish one sense of a word from another.

– very poor syntagmatic information

– give equal prominence to normal and merely possible senses

– definitions (and senses) are not mutually exclusive

• WordNet: synsets ≠ word senses!

• FrameNet: frames ≠ word senses!


Identifying norms is hard

• ... and boring

– The painful rediscovery of the obvious, which is only obvious when pointed out

• Norms can be identified only by painstaking corpus analysis.

• What counts as a normal use of any verb, e.g. drink?


Norms for 'drink', v.

1. 55% [[Human]] drink [[{Liquid = Water} | Beverage]]

2. 4% [[Animal]] drink [[Liquid = Water]]

3. 39% [[Human]] drink [NO OBJ]

4. 1% [[Human]] drink [[Experience]] {in}

5. 1% [[Human]] drink ([[Liquid = Beverage]]) {up}
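An entry like this can be held in a program as a list of patterns with their sample frequencies. The sketch below is our own illustration, not the Pattern Dictionary's actual storage format; the `Pattern` record and the helpers `coverage` and `by_frequency` are hypothetical names. It checks that the five norms account for the whole sample and ranks them by frequency.

```python
from typing import List, NamedTuple


class Pattern(NamedTuple):
    number: int    # pattern number in the entry
    percent: int   # share of the analysed sample, as reported on the slide
    text: str      # the pattern in CPA notation, copied verbatim


# The five norms for 'drink' (v.) listed above.
DRINK: List[Pattern] = [
    Pattern(1, 55, "[[Human]] drink [[{Liquid = Water} | Beverage]]"),
    Pattern(2, 4, "[[Animal]] drink [[Liquid = Water]]"),
    Pattern(3, 39, "[[Human]] drink [NO OBJ]"),
    Pattern(4, 1, "[[Human]] drink [[Experience]] {in}"),
    Pattern(5, 1, "[[Human]] drink ([[Liquid = Beverage]]) {up}"),
]


def coverage(patterns: List[Pattern]) -> int:
    """Total percentage of the sample accounted for by the patterns."""
    return sum(p.percent for p in patterns)


def by_frequency(patterns: List[Pattern]) -> List[Pattern]:
    """Patterns sorted from most to least frequent (ties keep entry order)."""
    return sorted(patterns, key=lambda p: p.percent, reverse=True)


if __name__ == "__main__":
    print(f"coverage: {coverage(DRINK)}%")
    for p in by_frequency(DRINK):
        print(f"{p.percent:>3}%  {p.number}. {p.text}")
```

Because Python's sort is stable even with `reverse=True`, the two 1% patterns stay in entry order.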


Some Exploitations of 'drink'

A metaphor (or literary allusion):

• The child of a nonconformist father learnt to drink deep of the Catholic tradition.

– Owen Chadwick, 1991. Michael Ramsey: a life.

A coercion:

• ‘He knows them all,’ she says adoringly, ‘and they all drink shampoo – nearly every night.

– The Guardian, 1989.


How pervasive is ambiguity?

• Not as pervasive as you might think.

– If we attach meanings to patterns, not to words, most “ambiguities” don't get a chance to rear their ugly heads.

• But here's one: He drank.

• It could be a null-object alternation of “he drank [[Beverage]]” (pattern 1)

• or it could mean that he had a problem with alcohol (pattern 3)
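A toy matcher makes the ambiguity concrete. In the sketch below, which is hypothetical and not part of the CPA tooling, each pattern records the semantic type of its subject, the types it admits as object, whether the object may be left unstated (the null-object alternation), and any particle. Object types are simplified, e.g. `[[Liquid = Water]]` becomes `Water`. Given a human subject and no object, both pattern 1 and pattern 3 of the drink entry survive.

```python
from typing import FrozenSet, List, NamedTuple, Optional


class DrinkPattern(NamedTuple):
    number: int
    subject: str                    # semantic type required of the subject
    objects: FrozenSet[str]         # object types allowed; empty set = [NO OBJ]
    object_optional: bool = False   # True if the object may be left unstated
    particle: Optional[str] = None  # "in", "up", or None


# Simplified versions of the five norms for 'drink'.
PATTERNS: List[DrinkPattern] = [
    DrinkPattern(1, "Human", frozenset({"Water", "Beverage"}), object_optional=True),
    DrinkPattern(2, "Animal", frozenset({"Water"})),
    DrinkPattern(3, "Human", frozenset()),
    DrinkPattern(4, "Human", frozenset({"Experience"}), particle="in"),
    DrinkPattern(5, "Human", frozenset({"Beverage"}), object_optional=True, particle="up"),
]


def candidates(subject: str, obj: Optional[str] = None,
               particle: Optional[str] = None) -> List[int]:
    """Numbers of the patterns compatible with a clause's argument types."""
    matches = []
    for p in PATTERNS:
        if p.subject != subject or p.particle != particle:
            continue
        if obj is None:
            # No object in the clause: fits [NO OBJ] patterns and optional-object ones.
            if not p.objects or p.object_optional:
                matches.append(p.number)
        elif obj in p.objects:
            matches.append(p.number)
    return matches


if __name__ == "__main__":
    print(candidates("Human"))              # "He drank."      -> [1, 3]
    print(candidates("Human", "Beverage"))  # "He drank beer." -> [1]
```

Stating the object removes the ambiguity: only pattern 1 admits a `Beverage` object.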


Getting the right level of generalization is hard

“John fired at a line of stags”

• Corpus evidence shows that fire at does not prefer ANIM in the prepositional object slot. Any PHYSOBJ will do.

• Building a pattern dictionary is a constant struggle to get “the right level” (or at least an acceptable level) of generalization

• Art is required to choose a level.

• There are no right answers (no absolutes).

– But plenty of wrong ones!


Semantic Types and Semantic Roles

• fire at assigns the semantic role “Target” to words of semantic type [[Physical Object]]

• Semantic types are the intrinsic prototypical values of nouns – their essences

• Semantic roles are assigned by context
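The division of labour between types and roles can be sketched with a toy fragment of a shallow ontology. The type names, the hierarchy, and the functions `is_a` and `fire_at_role` below are illustrative assumptions, not the actual CPA ontology. A noun's type is intrinsic and looked up in the hierarchy; the role Target comes from the pattern fire at, which accepts any type at or below Physical Object, so stags (an Animal) qualify without a special ANIM restriction.

```python
from typing import Dict, Optional

# Hypothetical fragment of a shallow ontology: each semantic type maps to its parent.
PARENT: Dict[str, Optional[str]] = {
    "Entity": None,
    "Physical Object": "Entity",
    "Animate": "Physical Object",
    "Animal": "Animate",
    "Human": "Animate",
    "Artifact": "Physical Object",
    "Abstract Entity": "Entity",
}


def is_a(semantic_type: str, ancestor: str) -> bool:
    """True if semantic_type equals ancestor or lies below it in the hierarchy."""
    current: Optional[str] = semantic_type
    while current is not None:
        if current == ancestor:
            return True
        current = PARENT[current]
    return False


def fire_at_role(object_type: str) -> Optional[str]:
    """Role that 'fire at' assigns to its prepositional object, or None if no match."""
    # The role comes from the pattern (context); the type is intrinsic to the noun.
    return "Target" if is_a(object_type, "Physical Object") else None


if __name__ == "__main__":
    for noun, sem_type in [("stags", "Animal"), ("the car", "Artifact"),
                           ("an idea", "Abstract Entity")]:
        print(noun, "->", fire_at_role(sem_type))
```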


Word Meaning: a complex linguistic Gestalt

• In the mind of an English speaker, the verb land is primed for any or all of the following:

– passengers land from a plane

– the pilot lands the plane

– the plane lands

– we landed at Heathrow

– passengers land from a boat (but more probably they are soldiers)

– a commander lands his troops (but not from a plane)

– a boat lands its cargo

– a trawler lands its catch

– an angler lands a fish

– Yorick landed the role of Caliban

– He landed a job in Sheffield

– someone else may land in trouble

– or be landed with a problem

– and someone may even land a blow on your nose


Imposing order on chaos

In the Pattern Dictionary:

• Verbs are sorted into patterns

• Exploitations are flagged for later analysis

• Nouns (“lexical sets”) are clustered into an ontology

• The ontology is “distorted” by usage

• Lexical sets “shimmer”


Lexical Sets “shimmer”

• [[Human]] attend [[Event]]

– Lexical set [[Event]] = {meeting, conference, funeral, ceremony, course, school, seminar, lecture, session, class, rally, dinner, hearing, briefing, reception, workshop, wedding, inquest, summit, concert, event, premiere, …}

• [[Human]] participate {in [[Event]]}

– Lexical set [[Event]] = {debate, election, exercise, coup, demonstration, activity, process, conference, consultation, selection, meeting, …}

• [[Human]] hail [[Event]]

– Lexical set [[Event]] = {victory, success, agreement, vote, opening, development, result, start, resurgence, …}
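The shimmer can be measured crudely by comparing these lexical sets. The sketch below uses only the members listed on the slide (each list there is open-ended, marked with …), so the numbers are illustrative only; `EVENT_SETS` and `jaccard` are hypothetical names, and Jaccard similarity is our choice of measure, not one proposed in the talk.

```python
from itertools import combinations
from typing import Dict, Set

# Members of [[Event]] listed on the slide for each verb (the slide's lists are truncated).
EVENT_SETS: Dict[str, Set[str]] = {
    "attend": {"meeting", "conference", "funeral", "ceremony", "course", "school",
               "seminar", "lecture", "session", "class", "rally", "dinner", "hearing",
               "briefing", "reception", "workshop", "wedding", "inquest", "summit",
               "concert", "event", "premiere"},
    "participate in": {"debate", "election", "exercise", "coup", "demonstration",
                       "activity", "process", "conference", "consultation",
                       "selection", "meeting"},
    "hail": {"victory", "success", "agreement", "vote", "opening", "development",
             "result", "start", "resurgence"},
}


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Size of the intersection divided by size of the union (0.0 for two empty sets)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


if __name__ == "__main__":
    for v1, v2 in combinations(EVENT_SETS, 2):
        shared = sorted(EVENT_SETS[v1] & EVENT_SETS[v2])
        score = jaccard(EVENT_SETS[v1], EVENT_SETS[v2])
        print(f"{v1} / {v2}: shared={shared}, jaccard={score:.2f}")
```

On these partial lists, attend and participate in share only meeting and conference (2 of 31 distinct nouns), and hail shares nothing with either: the same label [[Event]] picks out visibly different nouns depending on the verb.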


Patterns are contrastive

• 2% [[Human]] launch [[Boat]]

• 7% [[Human]] launch [[Projectile]]

• 58% [[Human | Institution]] launch [[Activity | Plan]]

• 24% [[Institution]] launch [[{Artifact = Product} | {Activity = Service}]]
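Contrastiveness can be checked mechanically: if no combination of subject type and object type fits two patterns at once, the argument types alone select the sense. The sketch below is a hypothetical illustration in which the slide's compound types are flattened into plain labels (e.g. `{Artifact = Product}` becomes `Product`), so subtype relations such as Service being an Activity are ignored; the glosses are our own wording. It also shows that the four listed patterns cover 91% of the sample; the slide does not say what the remaining 9% are.

```python
from itertools import combinations
from typing import FrozenSet, List, NamedTuple


class Pattern(NamedTuple):
    percent: int
    subjects: FrozenSet[str]   # semantic types allowed as subject
    objects: FrozenSet[str]    # semantic types allowed as direct object
    gloss: str                 # informal label for the sense (our own wording)


LAUNCH: List[Pattern] = [
    Pattern(2, frozenset({"Human"}), frozenset({"Boat"}), "put a boat into the water"),
    Pattern(7, frozenset({"Human"}), frozenset({"Projectile"}), "send a projectile"),
    Pattern(58, frozenset({"Human", "Institution"}), frozenset({"Activity", "Plan"}),
            "start an activity"),
    Pattern(24, frozenset({"Institution"}), frozenset({"Product", "Service"}),
            "bring out a product or service"),
]


def overlap(p: Pattern, q: Pattern) -> bool:
    """True if some (subject, object) combination fits both patterns."""
    return bool(p.subjects & q.subjects) and bool(p.objects & q.objects)


def is_contrastive(patterns: List[Pattern]) -> bool:
    """True if no two patterns can match the same pair of argument types."""
    return not any(overlap(p, q) for p, q in combinations(patterns, 2))


def select(patterns: List[Pattern], subject: str, obj: str) -> List[str]:
    """Glosses of all patterns compatible with the given argument types."""
    return [p.gloss for p in patterns if subject in p.subjects and obj in p.objects]


if __name__ == "__main__":
    print("coverage:", sum(p.percent for p in LAUNCH), "%")
    print("contrastive:", is_contrastive(LAUNCH))
    print(select(LAUNCH, "Institution", "Product"))
```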


What is a Pattern Dictionary?

• an inventory of all normal patterns of verb use

– not all possible uses.

• an ontology of “shimmering” lexical sets (clusters of nouns according to semantic type and argument roles)

• an inventory of semantically motivated syntagmatic distinctions


Tools needed to build a Pattern Dictionary

• A balanced corpus of the language (i.e. general language)

• A theory

– An initial lexical architecture that guides clustering (Wilks, Pustejovsky, Sinclair, …)

– A lexical model that distinguishes norms from exploitations

• A methodology: Corpus Pattern Analysis

– Hanks 2004, Hanks and Pustejovsky 2005

– Including statistical corpus analysis: Church and Hanks 1989, Kilgarriff et al. 2004, 2005

• A shallow ontology

– A hierarchical organization of semantic types, reflecting word groupings, not scientific conceptualization of the universe

• A suite of corpus tools: Manatee, Bonito, Word Sketch Engine (Kilgarriff, Rychlý)


CPA procedure

• Create a sample concordance (KWIC index) for a word:

– 250 examples of actual uses of the word

• Identify the typical syntagmatic patterns.

• Assign each line of the sample to one of the patterns.

• Take further samples if necessary.

– Introspection is used to interpret data, but not to create data.

• Store the pattern in the entry manager.
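The first step, building a KWIC concordance and drawing a sample, can be sketched in a few lines. This is a toy stand-in for corpus tools such as Manatee and Bonito, not their API: it matches a single word form, so drink will not find drank or drinks as a lemma-based corpus query would. The names `kwic` and `sample`, the context width, and the fixed random seed are our assumptions.

```python
import random
import re
from typing import List


def kwic(text: str, node: str, width: int = 30) -> List[str]:
    """Key Word In Context lines for each whole-word, case-insensitive match of node."""
    flat = " ".join(text.split())  # collapse newlines and runs of spaces
    lines = []
    for m in re.finditer(rf"\b{re.escape(node)}\b", flat, flags=re.IGNORECASE):
        left = flat[max(0, m.start() - width):m.start()]
        right = flat[m.end():m.end() + width]
        lines.append(f"{left:>{width}} [{m.group(0)}] {right}")
    return lines


def sample(lines: List[str], size: int = 250, seed: int = 0) -> List[str]:
    """Random sample of at most `size` lines (the procedure above uses 250)."""
    rng = random.Random(seed)
    return rng.sample(lines, min(size, len(lines)))


if __name__ == "__main__":
    corpus = "Animals drink water. People drink beer.\nHe drank. They drink shampoo."
    for line in sample(kwic(corpus, "drink", width=15)):
        print(line)
```

Fixing the seed makes a sample reproducible, which helps when a second analyst needs to check the same 250 lines.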


In CPA, every line in the sample must be classified

The choices are:

• Norms

• Exploitations

• Alternations

• Names (Midnight Storm: name of a horse, not a storm)

• Mentions (to mention a word or phrase is not to use it)

• Errors (e.g. learned mistyped as leaned)

• Unassignables

– See Proceedings of the Eleventh EURALEX International Congress, pages 105–116, Lorient, France, 2004.
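The rule that every line must be classified can be enforced in code. The sketch below is a hypothetical illustration, not part of the CPA tools: `LineClass` enumerates the seven choices listed above, and `tally` refuses a sample containing any unlabelled line, then reports the share of each category as a percentage.

```python
from collections import Counter
from enum import Enum
from typing import Dict, List, Optional


class LineClass(Enum):
    NORM = "norm"
    EXPLOITATION = "exploitation"
    ALTERNATION = "alternation"
    NAME = "name"
    MENTION = "mention"
    ERROR = "error"
    UNASSIGNABLE = "unassignable"


def tally(labels: List[Optional[LineClass]]) -> Dict[LineClass, float]:
    """Percentage of sample lines in each class; every line must carry a label."""
    unlabelled = [i for i, label in enumerate(labels) if label is None]
    if unlabelled:
        raise ValueError(f"unclassified sample lines: {unlabelled}")
    if not labels:
        raise ValueError("empty sample")
    counts = Counter(labels)
    return {cls: 100.0 * counts[cls] / len(labels) for cls in LineClass}


if __name__ == "__main__":
    labels = [LineClass.NORM] * 6 + [LineClass.EXPLOITATION, LineClass.NAME]
    for cls, pct in tally(labels).items():
        print(f"{cls.value:<13} {pct:5.1f}%")
```

Treating Unassignable as an explicit label, rather than leaving a line blank, keeps "could not decide" distinct from "not yet looked at".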


How normal are norms? How frequent are exploitations?

• Roughly 75% of all clauses activate “primary norms”

• About 20% activate secondary norms

– including conventional metaphors

– and some expressions that may once have been exploitations themselves

• About 4% of all clauses involve exploitations of various sorts

– dynamic metaphors, other tropes, coercions, ellipsis, etc.

• About 1% of all clauses are unclassifiable


Browsing and Feedback

• The English Pattern Dictionary: http://nlp.fi.muni.cz/projects/cpa/

• Browse the first 50 verbs at https://apollo.fi.muni.cz:8007/

– Login and password are both “guest”

– Click on the pattern number to see the whole pattern

– Click on “lines” to see supporting corpus evidence

• 50 verb entries have been completed and released

– Feedback, please!

• 400 additional entries have been analysed and are awaiting release

– A shallow ontology has been drafted and is being edited

– But not populated with nouns yet

– 6500 verbs remain to be analysed

• EPD will not include rare words like saltate or saccharify