[A Recursive Annotation Scheme [for Referential Information Status] ]

Arndt Riester¹, David Lorenz², Nina Seemann¹

¹ Institute for Natural Language Processing (IMS) & SFB 732, University of Stuttgart
² English Department, University of Freiburg

19.5.2010, LREC Malta
www.uni-stuttgart.de

Information Status

Describes the cognitive activation of nominal expressions.
Distinguishes between GIVEN and NEW items,
or between GIVEN, ACCESSIBLE and NEW items (Chafe 1976, 1994),
or between EVOKED, INFERRABLE and NEW items (Prince 1981),
or, in finer-grained schemes such as Prince (1992), Nissim et al. (2004) and Dipper et al. (2007), between many more classes:

[Figure: label inventories of these earlier schemes, including BRAND-NEW ANCHORED, BRAND-NEW UNANCHORED, CONTAINING INFERRABLE, HEARER NEW, DISCOURSE OLD, DISCOURSE NEW, TEXTUALLY EVOKED, SITUATIONALLY EVOKED, BRIDGING, UNUSED, OLD-IDENTITY, OLD-RELATIVE, OLD-GENERIC, OLD-ID-GENERIC, OLD-GENERAL, OLD-EVENT, MEDIATED-SITUATION, MEDIATED-PART, MEDIATED-GENERAL, MEDIATED-AGGREGATED, MEDIATED-FUNC_VALUES, MEDIATED-POSSESSIVE, MEDIATED-EVENT, ACCESSIBLE-INFERABLE, ACCESSIBLE-SITUATION, ACCESSIBLE-GENERAL]

Desiderata

A simple scheme based on clear theoretical assumptions
Good inter-coder agreement for different textual genres
Full coverage of all nominal expressions
Capable of dealing with recursive embeddings

(1) [the red gem [in [the Queen's]A crown]B ]C

3 referents, hence 3 nested labels for information status (A, B, C)
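The bracket notation used in these slides, where each markable's label follows its closing bracket, can be read back into nested spans with a simple stack. The following is a minimal Python sketch, assuming that a label is a run of uppercase letters, digits, hyphens and parentheses attached directly after `]`; `parse_markables` and `Markable` are hypothetical names, not part of SALTO or any other tool mentioned in this talk.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Assumption: a label directly follows "]", e.g. "]GIVEN-PRONOUN" or
# "]INDEF-NEW(PREDICATE)"; an unlabeled "]" yields label None.
LABEL_RE = re.compile(r"[A-Z][A-Z0-9_()-]*")


@dataclass
class Markable:
    text: str              # surface text of the span, brackets and labels removed
    label: Optional[str]   # information-status label after the closing bracket
    depth: int             # nesting depth (0 = outermost markable)


def parse_markables(annotated: str) -> list[Markable]:
    """Parse nested [ ... ]LABEL annotation; markables come out innermost first."""
    plain: list[str] = []   # characters of the unannotated text seen so far
    stack: list[int] = []   # offsets into `plain` where open brackets started
    result: list[Markable] = []
    i = 0
    while i < len(annotated):
        ch = annotated[i]
        if ch == "[":
            stack.append(len(plain))
            i += 1
        elif ch == "]":
            if not stack:
                raise ValueError(f"unbalanced ']' at position {i}")
            start = stack.pop()
            match = LABEL_RE.match(annotated, i + 1)
            label = match.group(0) if match else None
            i = match.end() if match else i + 1
            text = "".join(plain[start:]).strip()
            result.append(Markable(text, label, len(stack)))
        else:
            plain.append(ch)
            i += 1
    if stack:
        raise ValueError("unclosed '['")
    return result


for m in parse_markables("[the red gem [in [the Queen's]A crown]B ]C"):
    print(m.depth, m.label, m.text)
# 2 A the Queen's
# 1 B in the Queen's crown
# 0 C the red gem in the Queen's crown
```

Each of the three referents in (1) yields its own markable and label; the span for B includes "in" only because the slide places the bracket there.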

Two levels of givenness

Givenness of words: repetition, synonymy, hypernymy
(2) {On my way home, I saw a poodle.}
a. It reminded me of Anna's poodle.
b. It reminded me of Anna's dog.

Givenness of referents: coreference
(3) {On my way home, I saw a poodle.}
a. The poodle / It tried to bite me.
b. The stupid beast tried to bite me.

Keep the two apart (this matters for prosody)! In the following, GIVEN ≡ coreferential. But see Baumann & Riester (2010) for a two-level scheme.
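The difference between the two levels can be made concrete by checking the same mention twice: once against earlier head nouns, once against earlier coreference-chain IDs. This is a minimal Python sketch under two loud assumptions: head nouns and entity IDs are assigned by hand (no coreference resolution is performed), and `LEXICALLY_RELATED` is a hypothetical toy lexicon, not a resource used by the authors.

```python
from typing import NamedTuple


class Mention(NamedTuple):
    text: str
    head: str    # head noun, lower-cased (assigned by hand in this sketch)
    entity: str  # coreference-chain ID (assigned by hand in this sketch)


# Hypothetical toy lexicon: heads that count as lexically given after a
# given earlier head (repetition, synonymy, hypernymy).
LEXICALLY_RELATED = {"poodle": {"poodle", "dog"}}


def lexically_given(m: Mention, earlier: list[Mention]) -> bool:
    """Word-level givenness: the head repeats or is related to an earlier head."""
    return any(m.head in LEXICALLY_RELATED.get(e.head, {e.head}) for e in earlier)


def referentially_given(m: Mention, earlier: list[Mention]) -> bool:
    """Referent-level givenness (GIVEN in this scheme): coreferent with an earlier mention."""
    return any(m.entity == e.entity for e in earlier)


context = [Mention("a poodle", "poodle", "e1")]
annas_dog = Mention("Anna's dog", "dog", "e2")            # (2b)
stupid_beast = Mention("the stupid beast", "beast", "e1")  # (3b)

print(lexically_given(annas_dog, context), referentially_given(annas_dog, context))
# True False
print(lexically_given(stupid_beast, context), referentially_given(stupid_beast, context))
# False True
```

Under GIVEN ≡ coreferential, only "the stupid beast" would be labeled GIVEN, even though "Anna's dog" is the lexically given one.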

Context Theory

Discourse context (e.g. DRT; Kamp & Reyle 1993): what has been explicitly stated before
Utterance context (indexicality; e.g. Kaplan 1989): speaker, location, time; entities in the visual environment
Frame context (e.g. Fillmore 1985): plausible protagonists in a scenario
Encyclopaedic context (e.g. Kamp, to appear): world knowledge of an expected audience

A Simple Rule for Definite Expressions

Definite descriptions, demonstratives, proper names and pronouns trigger the presupposition that their referent should be identified in "the" context (e.g. Heim 1983; van der Sandt 1992).

Claim: Information status classes should directly reflect the four context components.

Definite identified in → information status class
discourse context → GIVEN
utterance context → SITUATIVE
frame context → BRIDGING
encyclopaedic context → UNUSED
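The four-way correspondence can be read as a lookup from the context in which an annotator locates a definite's referent to its class. This Python sketch assumes a tie-breaking rule that the slide does not state: contexts are checked in the order of the table, so a referent that was mentioned before and is also known from world knowledge comes out GIVEN. `Context`, `classify_definite` and `CHECK_ORDER` are hypothetical names chosen for illustration.

```python
from enum import Enum
from typing import Optional


class Context(Enum):
    DISCOURSE = "discourse"          # explicitly stated before
    UTTERANCE = "utterance"          # speaker, location, time, visual environment
    FRAME = "frame"                  # plausible protagonist of an evoked scenario
    ENCYCLOPAEDIC = "encyclopaedic"  # world knowledge of the expected audience


# Correspondence from the slide: context of identification -> class
IS_CLASS = {
    Context.DISCOURSE: "GIVEN",
    Context.UTTERANCE: "SITUATIVE",
    Context.FRAME: "BRIDGING",
    Context.ENCYCLOPAEDIC: "UNUSED",
}

# Assumed precedence when a referent is identifiable in several contexts
CHECK_ORDER = [Context.DISCOURSE, Context.UTTERANCE, Context.FRAME, Context.ENCYCLOPAEDIC]


def classify_definite(identifiable_in: set[Context]) -> Optional[str]:
    """Return the class of the first context (in CHECK_ORDER) that contains the referent."""
    for ctx in CHECK_ORDER:
        if ctx in identifiable_in:
            return IS_CLASS[ctx]
    return None  # no context found; left undecided in this sketch


# e.g. "the referee" in a football report is identified via the match frame
print(classify_definite({Context.FRAME}))  # BRIDGING
```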

Annotating Hearer Knowledge (UNUSED)

Prince (1981): the choice of referring expression reflects the speaker's/writer's assumptions concerning the hearer's knowledge (assumed familiarity).
There is no access to the speaker's mind.
Simplification: as an annotator, decide on the basis of your own expectations whether a (non-GIVEN) item is known to an intended audience.

Will they know this?
YES → UNUSED-KNOWN, e.g. "Barack Obama" (encyclopaedic knowledge)
NO → UNUSED-UNKNOWN, e.g. "the woman Max went out with last night" (accommodation)

News Example (USA Today, 17.5.10)

[...] [Protestants]INDEF-RESUMPTIVE still account [for about 55% [of the 111th Congress]UNUSED-UNKNOWN]INDEF-PARTITIVE-CONTAINED, but [a recent flurry of Catholic and Jewish appointments]INDEF-NEW has turned [them]GIVEN-PRONOUN [into a minority of one [on the Supreme Court]BRIDGING]INDEF-NEW(PREDICATE). Should [Kagan]GIVEN-SHORT be confirmed [next week]SITUATIVE, [[the nation's]GIVEN-EPITHET highest court]GIVEN-EPITHET would be [a Protestant-free zone]INDEF-GENERIC [for the first time since [John Jay, [the nation's]GIVEN-REPEATED first chief justice (and an Episcopalian)]UNUSED-UNKNOWN]UNUSED-UNKNOWN, banged [[his]GIVEN-PRONOUN gavel]UNUSED-UNKNOWN [in 1790]UNUSED-KNOWN.
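To see how the fine-grained labels in such a passage collapse into main classes, one can keep only the part of each label before the first hyphen and drop parenthesised qualifiers such as `(PREDICATE)`. This Python sketch assumes that labels follow `]` directly and that the OTHER subclasses would be written as bare EXPLETIVE, NULL, RELATIVE or CATAPHOR labels; both are assumptions about the notation, and `core_class` and `tally` are hypothetical helpers.

```python
import re
from collections import Counter

# Assumption: a label is an uppercase run directly after "]"
LABEL_RE = re.compile(r"\]([A-Z][A-Z0-9_()-]*)")

# Assumed spelling of the subclasses grouped under OTHER
OTHER_SUBCLASSES = {"EXPLETIVE", "NULL", "RELATIVE", "CATAPHOR"}


def core_class(label: str) -> str:
    """Collapse e.g. INDEF-NEW(PREDICATE) to INDEF, GIVEN-PRONOUN to GIVEN."""
    main = label.split("(")[0].split("-")[0]
    return "OTHER" if main in OTHER_SUBCLASSES else main


def tally(annotated: str) -> Counter:
    """Count the main classes of all labels in an annotated passage."""
    return Counter(core_class(lbl) for lbl in LABEL_RE.findall(annotated))


first_sentence = (
    "[Protestants]INDEF-RESUMPTIVE still account [for about 55% "
    "[of the 111th Congress]UNUSED-UNKNOWN]INDEF-PARTITIVE-CONTAINED, but "
    "[a recent flurry of Catholic and Jewish appointments]INDEF-NEW has turned "
    "[them]GIVEN-PRONOUN [into a minority of one "
    "[on the Supreme Court]BRIDGING]INDEF-NEW(PREDICATE)."
)
print(tally(first_sentence))
```

Applied to the first sentence of the passage, it counts four INDEF labels and one each of UNUSED, GIVEN and BRIDGING.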

Data

Transcripts of German radio news bulletins (three full days of hourly news)
About 3000 sentences
Parsed with XLE / German LFG grammar (Rohrer & Forst 2006)
Annotated with an extended version of the SALTO tool (Burchardt et al. 2006); TigerXML format
Two annotators; verification and final decision by a third annotator

Annotation using SALTO (Burchardt et al. 2006)

"...said Kirchner in Cordoba..." / "...the Argentinian head of state..."

Inter-Annotator Agreement (Cohen 1960)

Evaluation was performed on a subset of 1149 nominal expressions, which the annotators had to identify themselves.

1100 expressions were identified by both annotators; 757 of these were labeled identically.
Agreement: κ = .66 (full scheme: 21 subclasses)
κ = .78 (core scheme comprising 6 classes: GIVEN, SITUATIVE, BRIDGING, UNUSED, INDEF, OTHER)

Comparison:
Dipper et al. (2007): κ = .55 (newspaper commentaries)
Nissim et al. (2004): κ = .79 (full); κ = .85 (core) (dialogue; fewer embeddings; pre-exclusion of "difficult" cases)

(Source: Ritz et al. 2008)
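Cohen's kappa corrects observed agreement p_o for the agreement p_e expected by chance, κ = (p_o − p_e) / (1 − p_e), where p_e is estimated from each annotator's label distribution. If κ was computed over the 1100 commonly identified expressions, the observed agreement is 757/1100 ≈ 0.69, and kappa is somewhat lower because it discounts chance matches. The Python sketch below is a minimal implementation of that formula; `cohens_kappa` is a hypothetical helper and the toy labels are invented, not taken from the corpus. scikit-learn's `sklearn.metrics.cohen_kappa_score` computes the same statistic.

```python
from collections import Counter
from typing import Sequence


def cohens_kappa(a: Sequence[str], b: Sequence[str]) -> float:
    """Cohen's (1960) kappa for two annotators who labeled the same items."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("need two non-empty label sequences of equal length")
    n = len(a)
    # observed agreement: share of items with identical labels
    p_o = sum(x == y for x, y in zip(a, b)) / n
    # chance agreement: product of both annotators' label proportions, summed over labels
    ca, cb = Counter(a), Counter(b)
    p_e = sum(ca[label] * cb[label] for label in ca) / (n * n)
    if p_e == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_o - p_e) / (1 - p_e)


# Invented toy data: 3 of 4 items agree (p_o = 0.75), p_e = 5/16
a = ["GIVEN", "GIVEN", "BRIDGING", "UNUSED"]
b = ["GIVEN", "BRIDGING", "BRIDGING", "UNUSED"]
print(round(cohens_kappa(a, b), 3))  # 0.636
```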

Conclusion

The scheme enables fast, comprehensible and reliable annotation of nested expressions in arbitrary text genres.

Useful for:
a. Computational linguists: e.g. creating a gold standard for anaphora resolution and related tasks
b. Theoretical linguists: empirical data for investigations into the form of referring expressions, (non-)restrictivity of modification, word order, grammatical role, discourse structure, etc.
c. Phoneticians: investigating prosody in spoken corpora

Learn more: http://www.ims.uni-stuttgart.de/~arndt

Thank you!

Details: GIVEN

Subclasses: PRONOUN, REFLEXIVE, SHORT, REPEATED, EPITHET

(1) Both had the blessings of Dr. Richard Klausner. But even [Klausner]GIVEN-SHORT had to be persuaded at first.

(2) Before the European Union's ban on incandescent lightbulbs went into effect on Sept. 1, consumers across Europe raided stores to stockpile [the familiar bulbs]GIVEN-EPITHET.

Details: BRIDGING

Subclasses: 0 (no subclass), TEXT, CONTAINED

(1) Germany lost the football match against England because [the audience]BRIDGING was against them.

(2) United were trailing 3-1 when Fletcher was felled [in the area]BRIDGING-TEXT by Aleksei Berezutski. The Scotland midfielder was then yellow-carded by [the referee]BRIDGING-TEXT.

Details: BRIDGING-CONTAINED vs. UNUSED-UNKNOWN

(1) The Republicans won [the governorship of Virginia]BRIDGING-CONTAINED.
(expected / prototypical relationship)

(2) He was convicted of helping to organise [the seizure [of Osama Moustafa Nasr]]UNUSED-UNKNOWN from a Milan street in February 2003.
(non-prototypical relationship; the two parts cannot be separated, cf. (3))

(3) # Speaking of Osama Moustafa Nasr, [the seizure] happened in 2003.

Details: INDEF

Subclasses: NEW, GENERIC, PARTITIVE, RESUMPTIVE

(1) [A man]INDEF-NEW came in. He bought a pair of shoes.
(2) [Serious beer drinkers]INDEF-GENERIC should head straight to this 550-year-old institution.
(3) In violent clashes between the police and demonstrating Kurds, [three demonstrators]INDEF-PARTITIVE were injured.
(4) That's close to how a cancer vaccine works, but not precisely. Most experts see [cancer vaccines]INDEF-RESUMPTIVE as a hybrid of treatment and prevention.

Other

EXPLETIVE
NULL: nobody, nothing
RELATIVE: non-restrictive relative clause
CATAPHOR: can be indefinite or definite