Introduction to JAPE
Mark A. Greenwood
University of Sheffield NLP
Recap
• Installed and ran GATE
• Understood the idea of
  ◦ LR – Language Resources
  ◦ PR – Processing Resources
  ◦ ANNIE
• Understood the goals of information extraction
• Loaded ANNIE into GATE
• Constructed one or more gazetteer lists
Overview
• Limitations of Gazetteer Lists
• High Level Overview of Pattern Matching
• What is JAPE?
• Learn JAPE by Example
  ◦ Input Specifications
  ◦ Left Hand Side
  ◦ Macros
  ◦ Right Hand Side
  ◦ Phases
  ◦ Loading JAPE into GATE
• Hands On – Extending the IE example
Limitations of Gazetteer Lists
• Gazetteer lists are designed for annotating simple, regular features
  ◦ Some flexibility is provided by matching
    ◦ Word roots
    ◦ Whole/part words
• For example, recognising e-mail addresses using just a gazetteer would be impossible, since the set of possible addresses is open-ended and cannot be listed in advance
High Level Overview of Pattern Matching
• The early components in the ANNIE pipeline produce simple annotations
  ◦ Token, Sentence, Lookup
• These annotations have features
  ◦ Token kind, part of speech, Lookup major type...
• Patterns in these annotations and features can suggest more complex information (for example, a number Token followed by a currency Lookup suggests an amount of money)
What is JAPE?
• JAPE (Java Annotation Patterns Engine) provides pattern matching over annotations in GATE
• Each JAPE rule consists of
  ◦ an LHS (left-hand side), which contains the patterns to match
  ◦ an RHS (right-hand side), which details the annotations (and optionally features) to be created
• JAPE rules combine to create a phase
• Phases combine to create a grammar
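The following minimal sketch of a one-rule phase makes the LHS/RHS split concrete. The phase name Numbers, the rule name NumberToken and the output type Number are illustrative choices, not part of GATE. The only input assumed is the Token annotation, with its kind feature, produced by the ANNIE tokeniser.

```jape
// A phase containing a single rule (all names are illustrative)
Phase: Numbers
Input: Token
Options: control = appelt

// LHS: one Token whose kind feature is number, labelled num
Rule: NumberToken
(
  {Token.kind == number}
):num
-->
// RHS: create a Number annotation over the span labelled num
:num.Number = {rule = "NumberToken"}
```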
Learn JAPE By Example

Phase: EMail
Input: Token SpaceToken
Options: control = appelt

Macro: WORD_OR_NUMBER
(
  ({Token.kind == word} | {Token.kind == number})
)

Rule: emailaddress
Priority: 50
(
  (WORD_OR_NUMBER)+
  ({Token.string == "."} (WORD_OR_NUMBER)+)*
  {Token.string == "@"}
  (WORD_OR_NUMBER)+
  ({Token.string == "."} (WORD_OR_NUMBER)+)*
)
:email
-->
:email.EMail = {rule = "emailaddress"}
Learn JAPE By Example: Input Specifications
• Each JAPE file defines a phase of the grammar
• The header specifies how the rules within the phase will be applied to the documents
• The input to the rules within this phase is the subset of annotations specified in the header
• The rules within a single phase compete based on the control option
Learn JAPE By Example: Input Specifications
• Five control styles:
  ◦ Appelt (only one rule fires for a given stretch of text, chosen using the priorities below)
  ◦ Once (as soon as one rule fires, the whole phase stops matching)
  ◦ First (a rule fires on the first match found, so the shortest match wins)
  ◦ Brill (fire every rule that applies)
  ◦ All (like Brill, but matching restarts at every position, so overlapping matches are found)
• Appelt priority is applied in the following order
  ◦ Longest pattern
  ◦ Explicit priority (default = -1)
  ◦ First defined rule
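The following sketch illustrates the Appelt ordering. The rule and annotation names are hypothetical, and the majorType value location is assumed to come from the ANNIE gazetteer. Both rules can match exactly the same single Lookup, so the match lengths are equal and the explicit Priority decides: a location Lookup becomes a Location, while any other Lookup is handled by the lower-priority rule. A longer match from some other rule would still beat both, because length is compared first.

```jape
Phase: PriorityDemo
Input: Lookup
Options: control = appelt

// The higher Priority value wins when match lengths are equal
Rule: LocationLookup
Priority: 20
(
  {Lookup.majorType == location}
):loc
-->
:loc.Location = {rule = "LocationLookup"}

// This rule also matches location Lookups, but loses to the rule above on priority
Rule: AnyLookup
Priority: 10
(
  {Lookup}
):any
-->
:any.Entity = {rule = "AnyLookup"}
```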
Learn JAPE By Example: Input Specifications
[Figure: the pattern {A}+ applied to the input A A A under each control style (Appelt, Once, Brill, First, All)]
Learn JAPE By Example: Left Hand Side Patterns
• The LHS is expressed in terms of existing annotations, and optionally features and their values
• Any annotation to be used must be included in the input header
• Any annotation not included in the input header will be ignored (e.g. whitespace, when SpaceToken is not listed)
• Each annotation is enclosed in curly braces
• Annotations may be combined using the traditional Kleene (regular-expression) operators: | (or), * (zero or more), + (one or more), ? (optional)
• Each pattern to be matched is enclosed in round brackets and can have a label attached
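The following sketch combines several of these points: an optional element (?), alternatives (|), repetition (+) and a label on the whole pattern. The names are illustrative. It assumes ANNIE's Token orth feature (values such as upperInitial and allCaps) and a gazetteer Lookup with majorType title. Both annotation types must be listed in the Input header to be visible to the rule.

```jape
Phase: Names
Input: Token Lookup
Options: control = appelt

Rule: TitledName
(
  // optional title from the gazetteer, e.g. "Dr"
  ({Lookup.majorType == title})?
  // one or more capitalised or all-caps words
  ({Token.orth == upperInitial} | {Token.orth == allCaps})+
):name
-->
:name.CandidateName = {rule = "TitledName"}
```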
Learn JAPE By Example: Left Hand Side Patterns
• As well as matching against the presence of an annotation, JAPE rules can access annotation features
  {Token.kind == "number"}
• Features can be compared with ==, !=, >, <, =~, !~, ==~ and !=~ (=~ tests whether a regular expression matches part of the value and ==~ whether it matches the whole value; !~ and !=~ are their negations)
• Ranges can be specified: ({Token})[1,3] matches between one and three Tokens, ({Token})[3] exactly three
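The following sketch shows feature comparisons and ranges, assuming only ANNIE Token annotations. The rule and annotation names are illustrative. Several constraints on the same annotation go inside one pair of braces, separated by commas. The rule uses ==~ so that the regular expression must match the whole string rather than part of it.

```jape
Phase: FeatureDemo
Input: Token
Options: control = appelt

// A number Token whose whole string looks like a year from 1900 to 2099
Rule: Year
(
  {Token.kind == number, Token.string ==~ "(19|20)[0-9][0-9]"}
):year
-->
:year.Year = {rule = "Year"}

// Between one and three consecutive word Tokens
Rule: ShortRun
(
  ({Token.kind == word})[1,3]
):run
-->
:run.ShortRun = {rule = "ShortRun"}
```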
Learn JAPE By Example: Left Hand Side Patterns
• Contextual information can be specified in the same way, but has no label
• Contextual information will be consumed by the rule
  ({Annotation1})
  ({Annotation2}):match
  ({Annotation3})
• Here Annotation1 and Annotation3 must be present for the rule to match, but only the labelled Annotation2 can be referred to on the RHS
• There are other constructs that can be used; for details see the GATE user guide
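The following is a concrete, hypothetical version of the context pattern. The Token "Mr" is required and consumed, but because it has no label, only the capitalised word that follows it is annotated. The sketch assumes ANNIE Token annotations with the orth feature. The rule and annotation names are made up for illustration.

```jape
Phase: ContextDemo
Input: Token
Options: control = appelt

Rule: SurnameAfterMr
(
  {Token.string == "Mr"}          // context: must match, but has no label
)
(
  {Token.orth == upperInitial}    // the part we want to annotate
):surname
-->
// only the labelled span becomes a Surname annotation
:surname.Surname = {rule = "SurnameAfterMr"}
```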
Learn JAPE By Example: Macros
• Macros look like the LHS of a rule, but they never have a label
• They are used in rules by enclosing the macro name in round brackets
• By convention, macros are named in uppercase letters
• Macros hold across an entire set of grammar phases
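The following sketch defines one macro and reuses it in two rules. The macro, rule and annotation names are hypothetical, and only ANNIE Token annotations are assumed. The macro definition itself carries no label, while each rule that uses it labels its own pattern.

```jape
Phase: MacroDemo
Input: Token
Options: control = appelt

// Macro: a pattern fragment with no label, named in upper case
Macro: DIGITS
(
  ({Token.kind == number})+
)

Rule: RoomNumber
(
  {Token.string == "Room"} (DIGITS)
):room
-->
:room.Room = {rule = "RoomNumber"}

Rule: FlatNumber
(
  {Token.string == "Flat"} (DIGITS)
):flat
-->
:flat.Flat = {rule = "FlatNumber"}
```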
Learn JAPE By Example: Right Hand Side Annotations
• The LHS and RHS are separated by -->
• The label matches the one on the LHS
• The annotation to be created follows the label
  ({Annotation1}):match -->
  :match.NewAnnotName = {feature1 = value1, feature2 = value2}
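A rule can also label parts of its pattern and create several annotations at once, with the RHS actions separated by commas. The following is a sketch under stated assumptions: it expects Person and Organization annotations from ANNIE's named-entity grammar, and the output types Affiliation, Member and Employer are invented for the example.

```jape
Phase: RHSDemo
Input: Token Person Organization
Options: control = appelt

Rule: PersonOfOrg
(
  ({Person}):who
  {Token.string == "of"}
  ({Organization}):org
):all
-->
// one annotation over the whole match, plus one for each labelled part
:all.Affiliation = {rule = "PersonOfOrg"},
:who.Member = {rule = "PersonOfOrg"},
:org.Employer = {rule = "PersonOfOrg"}
```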
Learn JAPE By Example: Multiple Phases
• Grammars usually consist of several phases, which are run sequentially
• A definition file (conventionally called main.jape) lists the phases to be used, in order
• Only the definition file needs to be loaded
• Temporary annotations may be created in early phases and used as input for later phases
• Annotations from earlier phases may need to be combined or modified
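The following is a minimal sketch of a two-phase grammar, shown as three files in one listing. The grammar name, file names and the TempChange type are hypothetical, and the majorType value change is assumed to come from the share-price gazetteer built in the previous session. The first phase turns those Lookups into a temporary annotation, the second uses it as input, and only main.jape is loaded. The entries under Phases: are the names of the phase files (without the .jape extension), kept alongside main.jape.

```jape
// ---- main.jape: the only file you load ----
MultiPhase: ShareGrammar
Phases:
changes
shares

// ---- changes.jape: phase 1 creates a temporary annotation ----
Phase: Changes
Input: Lookup
Options: control = appelt

Rule: ChangeWord
(
  {Lookup.majorType == "change"}
):c
-->
:c.TempChange = {rule = "ChangeWord"}

// ---- shares.jape: phase 2 takes TempChange as input ----
Phase: Shares
Input: Token Organization TempChange
Options: control = appelt

Rule: OrgChange
(
  {Organization}
  ({Token})[0,3]
  {TempChange}
):oc
-->
:oc.OrgChange = {rule = "OrgChange"}
```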
Learn JAPE By Example: Loading Grammars into GATE
• Load a JAPE transducer, setting its grammarURL parameter to the .jape file you have created (for a multi-phase grammar, the main.jape file)
• Add it to the application and run it
• Inspect the results
Hands On: Extending the IE Example
• The best way to learn JAPE is to try writing rules yourself
• In the previous session you should have added a new gazetteer to look for words that might signify a change in share price
Hands On: Extending the IE Example
• Use the Lookup annotations from your gazetteer along with named entities annotated by ANNIE
  ◦ Organization
  ◦ Money
  ◦ Percent
  ◦ ...
• Annotate the documents to associate a company with a change in share price:
  ◦ Shares in Scoot rose 9 per cent on the announcement...
  ◦ Whitbread shares closed up 2p at 645p.
  ◦ ...
Your Turn!
Feel Free to Refer to the User Guide and to Ask for Help
Hands On: Extending the IE Example
Phase: Shares
Input: Token Organization Lookup Money Percent
Options: control = appelt

Rule: ShareChange
(
  {Organization}
  ({Token})[0,3]
  {Lookup.majorType == "change"}
  ({Token})[0,3]
  ({Money} | {Percent})
):change
-->
:change.ShareChange = {rule = "ShareChange"}