
1

A Method for Automatically Constructing Case Frames for English

Daisuke Kawahara and Kiyotaka Uchimoto

(LREC2008, 2008/05/29)

National Institute of Information and Communications Technology

2

Background

• NLP analyzers so far
  – (Mainly) supervised, (relatively) knowledge-poor
    • e.g., PP-attachment or parsing
      Mary ate the salad with a fork
      Mary ate the salad with mushrooms
  – Only 1.5% of bilexical dependencies were learned [Bikel, 04]

Toward knowledge-oriented NLP
  – Automatically compile case frames and integrate them into NLP analyzers/applications

3

Related work

• Subcategorization frames
  – [Brent, 93] [Ushioda et al., 93] [Manning, 93] [Briscoe and Carroll, 97] [Korhonen, 02] …

e.g., She greeted me.
  • NP(sbj) greet NP(obj)

e.g., She gave him a book.
  • NP(sbj) give NP(obj) NP(obj)

                           # of SCFs  # of verbs  Corpus size  Acc.
[Brent, 1993]                      6          63         1.2M   85%
[Ushioda et al., 1993]             6          33         0.3M   86%
[Manning, 1993]                   19         200         4.1M   82%
[Ersan & Charniak, 1996]          16          30          36M   70%
[Carroll & Rooth, 1998]           15         100          30M   77%
[Briscoe & Carroll, 1997]        161           7         1.2M   81%
[Sarkar & Zeman, 2000]           137         914         0.3M   88%

4

Related work

• Subcategorization frames
  – [Brent, 93] [Ushioda et al., 93] [Manning, 93] [Briscoe and Carroll, 97] [Korhonen, 02] …
• (Handmade) frames
  – FrameNet [Baker et al., 98], PropBank [Palmer et al., 05]
• Japanese case frames
  – Semantics-based: [Haruno, 95] [Utsuro et al., 96]
  – Example-based: [Kawahara and Kurohashi, 06]

5

Construction of case frames for Japanese [Kawahara and Kurohashi, LREC2006]

                            CS   examples (in English)
yaku (1) (bake)             ga   I:18, person:15, craftsman:10, …
                            wo   bread:2484, meat:1521, cake:1283, …
                            de   oven:1630, frying pan:1311, …
yaku (2) (have difficulty)  ga   teacher:3, government:3, person:3, …
                            wo   hand:2950
                            ni   attack:18, action:15, son:15, …
yaku (3) (burn)             ga   company:1, distributor:1, …
                            wo   data:178, file:107, copy:9, …
                            ni   R:1583, CD:664, CDR:3, …

ga: nominative, wo: accusative, ni: dative, de: instrument

6

Construction of case frames for English

100M sentences (English Gigaword)
  → Filtering and parsing (MSTParser): 47M sents.
  → Predicate-argument structures
  → Clustering (WordNet)
  → Case frames for 10K predicates

sbj:you pred:borrow obj:idea pp:from:artist
sbj:she pred:borrow obj:idea pp:over:year
sbj:i pred:borrow obj:dollar pp:from:friend
sbj:farmer pred:borrow obj:money pp:for:supply
sbj:he pred:borrow obj:money pp:from:company
  ↓
sbj:{you,she} pred:borrow obj:idea pp:from:artist pp:over:year
sbj:i pred:borrow obj:dollar pp:from:friend
sbj:{farmer,he} pred:borrow obj:money pp:for:supply pp:from:company
  ↓
sbj:{you,she} pred:borrow obj:idea pp:from:artist pp:over:year
sbj:{farmer,he} pred:borrow obj:{money,dollar} pp:for:supply pp:from:{company,friend}
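
The grouping step illustrated by the borrow example can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `parse_pa`, `dominant_arg`, and `group_by_dominant_slot` are hypothetical, and the dominant argument is assumed to be the first one found in the order obj, sbj, pp:*. The later WordNet-based clustering is not covered.

```python
from collections import defaultdict

def parse_pa(line):
    """Parse 'sbj:you pred:borrow obj:idea pp:from:artist' into (pred, [(slot, word), ...])."""
    pred = None
    args = []
    for token in line.split():
        # Split at the last colon: 'pp:from:artist' -> slot 'pp:from', word 'artist'.
        slot, _, word = token.rpartition(":")
        if slot == "pred":
            pred = word
        else:
            args.append((slot, word))
    return pred, args

def dominant_arg(args):
    """Pick the dominant argument in the order obj, sbj, pp:* (first match wins; assumed)."""
    for wanted in ("obj", "sbj"):
        for slot, word in args:
            if slot == wanted:
                return slot, word
    for slot, word in args:
        if slot.startswith("pp:"):
            return slot, word
    return None

def group_by_dominant_slot(lines):
    """Merge structures sharing a predicate and dominant argument into slot -> word -> count."""
    frames = defaultdict(lambda: defaultdict(lambda: defaultdict(int)))
    for line in lines:
        pred, args = parse_pa(line)
        key = (pred, dominant_arg(args))
        for slot, word in args:
            frames[key][slot][word] += 1
    return frames

examples = [
    "sbj:you pred:borrow obj:idea pp:from:artist",
    "sbj:she pred:borrow obj:idea pp:over:year",
    "sbj:i pred:borrow obj:dollar pp:from:friend",
    "sbj:farmer pred:borrow obj:money pp:for:supply",
    "sbj:he pred:borrow obj:money pp:from:company",
]
frames = group_by_dominant_slot(examples)
for (pred, dom), slots in frames.items():
    print(pred, dom, {slot: dict(words) for slot, words in slots.items()})
```

Keying each group on the predicate plus its dominant argument keeps usages such as "borrow an idea" and "borrow money" apart before any similarity-based merging is attempted.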

7

Specification of our case frames

• Case slots
  – surface cases (dependency labels) and prepositions
    • sbj, obj, obj2, pp:for, pp:in, …
• Instances
  – words
  – several semantic markers
    • <time>, <num>, <clause>
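
One way to picture this specification is as a small Python data structure: a predicate with named case slots, each holding counted instances. The sketch below is only an illustration. The class `CaseFrame`, the helper `normalize`, and in particular the rule that maps digit strings to `<num>` are assumptions; the slides do not say how semantic markers are assigned.

```python
from collections import Counter
from dataclasses import dataclass, field

def normalize(instance):
    """Map an instance to a word or a semantic marker (assumed rule, for illustration only)."""
    # Treat strings such as '2003', '1,500' or '3.5' as numbers.
    if instance.replace(",", "").replace(".", "", 1).isdigit():
        return "<num>"
    return instance.lower()

@dataclass
class CaseFrame:
    predicate: str
    # Case slot name (e.g. 'sbj', 'obj', 'pp:in') -> Counter of instances.
    slots: dict = field(default_factory=dict)

    def add(self, slot, instance, count=1):
        """Record `count` occurrences of `instance` in `slot`."""
        self.slots.setdefault(slot, Counter())[normalize(instance)] += count

# Hypothetical usage; the counts are made up for the example.
cf = CaseFrame("burn")
cf.add("sbj", "they", 262)
cf.add("obj", "flag", 247)
cf.add("pp:in", "2003", 29)
cf.add("pp:in", "Ramallah", 14)
print(cf.slots["pp:in"].most_common())
```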

8

Details of case frame construction

• Use only reliable parses
  – Sentence length <= 20 words
  – MSTParser [McDonald et al., 06]
• Extract predicate-argument structures
  – From labeled dependency parses
• Group and cluster p-a structures
  – Grouping by a dominant case slot
    • pre-defined order: obj, sbj, pp:*
  – Clustering based on WordNet

• Labeled dependency acc.: 89.9% → 91.5%
• Complete rate: 36.3% → 56.4%
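
The first two steps (length filtering and extraction from a labeled dependency parse) might look like the following Python sketch. Everything about the token format is an assumption made for the example: tokens are dicts with hypothetical keys `id`, `lemma`, `pos`, `head`, and `label`, verbs are recognized by a POS tag starting with "V", prepositions by the tag "IN", and the label names are invented. The actual MSTParser output format and label set are not given in the slides.

```python
MAX_LEN = 20  # sentence length threshold stated on the slide

def is_reliable(tokens):
    """Keep only sentences of at most MAX_LEN words."""
    return len(tokens) <= MAX_LEN

def extract_pa(tokens):
    """Return [(predicate, [(slot, word), ...]), ...] for each verb in a parsed sentence."""
    children = {}
    for t in tokens:
        children.setdefault(t["head"], []).append(t)
    structures = []
    for t in tokens:
        if not t["pos"].startswith("V"):
            continue
        args = []
        for c in children.get(t["id"], []):
            if c["label"] in ("sbj", "obj", "obj2"):
                args.append((c["label"], c["lemma"]))
            elif c["pos"] == "IN":
                # A preposition under the verb yields a pp:<prep> slot for each of its dependents.
                for g in children.get(c["id"], []):
                    args.append(("pp:" + c["lemma"], g["lemma"]))
        if args:
            structures.append((t["lemma"], args))
    return structures

# "He borrowed money from the company", with a hand-written (hypothetical) parse.
sentence = [
    {"id": 1, "lemma": "he", "pos": "PRP", "head": 2, "label": "sbj"},
    {"id": 2, "lemma": "borrow", "pos": "VBD", "head": 0, "label": "root"},
    {"id": 3, "lemma": "money", "pos": "NN", "head": 2, "label": "obj"},
    {"id": 4, "lemma": "from", "pos": "IN", "head": 2, "label": "adv"},
    {"id": 5, "lemma": "the", "pos": "DT", "head": 6, "label": "det"},
    {"id": 6, "lemma": "company", "pos": "NN", "head": 4, "label": "pmod"},
]
if is_reliable(sentence):
    print(extract_pa(sentence))
```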

9

Clustering of case frames

CF1   sbj: { i }            obj: { dollar }   pp:from: { friend }
CF2   sbj: { farmer, he }   obj: { money }    pp:from: { company }   pp:for: { supply }

similarity between case frames
  • similarity between instances (words): word similarities 0.82, 0.73, 1.0
  • ratio of common cases: (3+8+1) / (3+8+1) for CF1, (5+10+(1+1)) / (5+10+3+(1+1)) for CF2

[Figure: worked computation of the similarity between CF1 and CF2 from the similarity between instances and the ratio of common cases]
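
The slide names two ingredients of the similarity between case frames: the similarity between instances (words) and the ratio of common cases. The Python sketch below shows one plausible way to combine them. The exact formula is an assumption: here the two parts are multiplied, the ratio is the square root of the product of each frame's share of instances in shared slots, and the instance similarity is an average of word similarities weighted by the square root of the frequency product. The frequencies and the `WORD_SIM` table are illustrative stand-ins, not values from the slides, and a real system would obtain word similarity from WordNet.

```python
import math

def common_ratio(cf1, cf2):
    """Share of instances that fall in slots present in both frames (assumed definition)."""
    common = set(cf1) & set(cf2)
    def share(cf):
        total = sum(sum(words.values()) for words in cf.values())
        in_common = sum(sum(cf[slot].values()) for slot in common)
        return in_common / total if total else 0.0
    return math.sqrt(share(cf1) * share(cf2))

def instance_similarity(cf1, cf2, word_sim):
    """Frequency-weighted average word similarity over shared slots (assumed definition)."""
    num = den = 0.0
    for slot in set(cf1) & set(cf2):
        for w1, f1 in cf1[slot].items():
            for w2, f2 in cf2[slot].items():
                weight = math.sqrt(f1 * f2)
                num += weight * word_sim(w1, w2)
                den += weight
    return num / den if den else 0.0

def frame_similarity(cf1, cf2, word_sim):
    """Combine the two parts by multiplication (an assumption, not confirmed by the slides)."""
    return instance_similarity(cf1, cf2, word_sim) * common_ratio(cf1, cf2)

# Illustrative word similarities; placeholders for a WordNet-based measure.
WORD_SIM = {
    frozenset(("dollar", "money")): 0.8,
    frozenset(("friend", "company")): 0.7,
    frozenset(("i", "he")): 1.0,
    frozenset(("i", "farmer")): 0.7,
}

def word_sim(w1, w2):
    return 1.0 if w1 == w2 else WORD_SIM.get(frozenset((w1, w2)), 0.0)

# Illustrative frequencies.
cf1 = {"sbj": {"i": 1}, "obj": {"dollar": 8}, "pp:from": {"friend": 3}}
cf2 = {"sbj": {"farmer": 1, "he": 1}, "obj": {"money": 10},
       "pp:from": {"company": 5}, "pp:for": {"supply": 3}}
print(frame_similarity(cf1, cf2, word_sim))
```

Two frames would then be merged when this score exceeds some threshold; the threshold itself is not stated in the slides.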

10

Results

• Obtained case frames for 9,300 verbs

• Evaluated case frames of 20 verbs
  – Criteria:
    • Verb usage is disambiguated by dominant arguments
    • Case frames must have obligatory case slots
    • Case slots, except a dominant one, may contain an ineligible example
  – Accuracy: 88.4%

11

Examples of obtained case frames

          CS       examples
burn (1)  sbj      they:262, it:113, protester:99, …
          obj      flag:247, effigy:81, house:67, …
          pp:in    <num>:29, ramallah:14, brisbane:11, …
          pp:for   week:15, hour:6, month:5, …
burn (2)  sbj      candle:26, lamp:5
          pp:on    motor-scooter:7, altar:3, platform:1, …
          pp:for   day:2, steinhaeuser:1

12

Conclusion and future work

• Constructed broad-coverage case frames for English
  – Described real use of English verbs
• Future work
  – Use more sophisticated methods for extracting reliable parses [Kawahara and Uchimoto, 08]
  – Integrate case frames into parsing (and other applications)
    • cf. [Zeman, 02] for subcategorization frames, [Kawahara and Kurohashi, 06] for case frames