software and tools for corpus pattern analysis · naacl 2015 three tasks cpa parsing cpa clustering...

SOFTWARE AND TOOLS FOR CORPUS PATTERN ANALYSIS Vít Baisa, Ismaïl El Maarouf, Adam Rambousek, Pavel Rychlý





SOFTWARE AND TOOLS FOR CORPUS PATTERN ANALYSIS

Vít Baisa, Ismaïl El Maarouf, Adam Rambousek, Pavel Rychlý


OUTLINE

Corpus Pattern Analysis
Annotation in Sketch Engine
CPA editor
Public access
SemEval 2015
LEMON API


INTRODUCTION

tools and datasets to support Pattern Dictionary of English Verbs (PDEV)
since 2006
DVC project 2012–2015


CPA

associating word meaning with word use by an analysis of phraseological patterns and collocations
meaning is associated with prototypical sentence contexts
concordance lines are grouped into semantically motivated syntagmatic patterns
hard problem: granularity
Subj, Obj, Complement, Adverbial, Indirect Obj
British National Corpus (written part)
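
A minimal sketch of the grouping step named on this slide: concordance lines that carry a pattern label are collected per pattern, and each pattern's share of the sample is computed. The function names, the pattern labels "1" and "2", and the example sentences are illustrative assumptions, not PDEV data or Sketch Engine code.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def group_by_pattern(annotated: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Collect concordance lines under the pattern label assigned to them."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for text, label in annotated:
        groups[label].append(text)
    return dict(groups)


def pattern_shares(groups: Dict[str, List[str]]) -> Dict[str, float]:
    """Relative frequency of each pattern among the annotated lines."""
    total = sum(len(lines) for lines in groups.values())
    if total == 0:
        return {}
    return {label: len(lines) / total for label, lines in groups.items()}


# Invented example lines for the verb "reap"; labels are hypothetical pattern numbers.
sample = [
    ("farmers reap the harvest in August", "1"),
    ("they reaped the harvest early this year", "1"),
    ("the village reaps the harvest by hand", "1"),
    ("those who sow the wind reap the whirlwind", "2"),
]

groups = group_by_pattern(sample)
shares = pattern_shares(groups)
```

With this sample, three of the four lines fall under the hypothetical pattern "1", so `shares` maps "1" to 0.75 and "2" to 0.25.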


CPA II

determiners: take place vs. take his place
semantic types: build [[Machine]] vs. build [[Relationship]]
contextual roles: [[Human = Director]] shoot vs. [[Human = Sports Player]] shoot
lexical sets: reap {the whirlwind} vs. reap {the harvest}
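
The bracket notation on this slide can be read mechanically. The sketch below extracts semantic types (`[[Type]]`), contextual roles (`[[Type = Role]]`) and lexical sets (`{...}`) from a pattern string. The `Slot` type, the regular expression and `parse_slots` are assumptions made for illustration; they are not the parser used by the CPA editor.

```python
import re
from typing import List, NamedTuple, Optional


class Slot(NamedTuple):
    kind: str             # "semantic_type" or "lexical_set"
    value: str            # e.g. "Human" or "the harvest"
    role: Optional[str]   # contextual role such as "Director", or None


# Matches [[Type]], [[Type = Role]] or {lexical item}
SLOT_RE = re.compile(
    r"\[\[\s*([^\]=]+?)\s*(?:=\s*([^\]]+?)\s*)?\]\]"  # semantic type, optional role
    r"|\{\s*([^}]+?)\s*\}"                             # lexical set
)


def parse_slots(pattern: str) -> List[Slot]:
    """Return the typed slots found in a CPA-style pattern string."""
    slots = []
    for match in SLOT_RE.finditer(pattern):
        sem_type, role, lexical = match.groups()
        if sem_type is not None:
            slots.append(Slot("semantic_type", sem_type, role))
        else:
            slots.append(Slot("lexical_set", lexical, None))
    return slots


director = parse_slots("[[Human = Director]] shoot")
machine = parse_slots("build [[Machine]]")
harvest = parse_slots("reap {the harvest}")
```

Here `director` holds one slot with type "Human" and role "Director", `machine` holds a plain semantic type, and `harvest` holds a lexical set.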


ANNOTATION IN SKETCH ENGINE

not so well-known feature of SkE
no paper published (until now)
not documented :)

commit 7b682e7473d935b14f48b7b5352838318bfda523
Author: pary
Date: Sat Sep 2 22:34:10 2006 +0000

[bonito2 @ 2006-09-02 22:34:10 by pary]
added linegroup/annotconc


ANNOTATION IN SKETCH ENGINE II

features for lexicographers
annotation with word sketches
bootstrapping of partial annotation
automatic patterns
training mode
multi-line labelling
custom labels


ANNOTATION IN SKETCH ENGINE III

synchronization with CPA editor
basic statistics


CPA EDITOR

JavaScript (jQuery), standalone
connected to SkE and DEB server
creating and managing PDEV entries, patterns, ontology
code used by Ken Litkowski for PDEP


CPA PUBLIC ACCESS

simplified interface
live data, complete verbs


SEMEVAL DATASET

NAACL 2015
three tasks

CPA parsing
CPA clustering
CPA pattern editing

Microcheck, Wingspread
auto-cpa user


LEMON API

an official release of PDEV as linked data
RDF scheme, used by WordNet, DBpedia, ...
17,634 triples
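
To make "PDEV as linked data" concrete, the sketch below builds a few subject-predicate-object triples for one verb entry and prints them as N-Triples using only the standard library. The base URI `http://example.org/pdev/` is hypothetical, and the choice of the lemon terms `LexicalEntry`, `canonicalForm` and `writtenRep` (with the namespace shown) is an assumption about how an entry might be modelled, not a description of the released data set.

```python
from typing import List, NamedTuple, Tuple, Union


class Literal(NamedTuple):
    """A plain string literal, as opposed to a URI."""
    value: str


Triple = Tuple[str, str, Union[str, Literal]]

LEMON = "http://lemon-model.net/lemon#"   # lemon vocabulary namespace (assumed)
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
BASE = "http://example.org/pdev/"          # hypothetical base URI for entries


def entry_triples(lemma: str) -> List[Triple]:
    """Describe one verb as a lexical entry with a written canonical form."""
    entry = BASE + lemma
    form = entry + "#form"
    return [
        (entry, RDF_TYPE, LEMON + "LexicalEntry"),
        (entry, LEMON + "canonicalForm", form),
        (form, LEMON + "writtenRep", Literal(lemma)),
    ]


def to_ntriples(triples: List[Triple]) -> str:
    """Serialize triples in N-Triples syntax, one statement per line."""
    lines = []
    for subject, predicate, obj in triples:
        if isinstance(obj, Literal):
            escaped = obj.value.replace("\\", "\\\\").replace('"', '\\"')
            rendered = f'"{escaped}"'
        else:
            rendered = f"<{obj}>"
        lines.append(f"<{subject}> <{predicate}> {rendered} .")
    return "\n".join(lines)


triples = entry_triples("reap")
ntriples = to_ntriples(triples)
print(ntriples)
```

Each printed line is one triple; a full release would consist of many such statements covering entries, patterns and their links.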


CONCLUSION, FUTURE WORK

a reference for future articles

consolidation of code (merge)

other projects are planned (CPA for nouns, adjectives)

linking English, Italian, Spanish pattern dictionaries (EURALEX)

full CPA bibliography in the proceedings