Post on 19-Dec-2015
![Page 1: Relational Learning of Pattern-Match Rules for Information Extraction Mary Elaine Califf Raymond J. Mooney](https://reader031.vdocuments.net/reader031/viewer/2022032800/56649d405503460f94a19a73/html5/thumbnails/1.jpg)
Relational Learning of Pattern-Match Rules for Information Extraction
Mary Elaine Califf
Raymond J. Mooney
![Page 2: Relational Learning of Pattern-Match Rules for Information Extraction Mary Elaine Califf Raymond J. Mooney](https://reader031.vdocuments.net/reader031/viewer/2022032800/56649d405503460f94a19a73/html5/thumbnails/2.jpg)
Motivation
A growing number of electronic documents contain a large amount of information
Building IE systems is time-consuming and requires highly domain-specific components
![Page 3: Relational Learning of Pattern-Match Rules for Information Extraction Mary Elaine Califf Raymond J. Mooney](https://reader031.vdocuments.net/reader031/viewer/2022032800/56649d405503460f94a19a73/html5/thumbnails/3.jpg)
RAPIER
Uses relational learning to construct unbounded pattern-match rules, given a database of texts and filled templates
Primarily consists of a bottom-up search
Employs limited syntactic and semantic information
Learns rules for the complete IE task
![Page 4: Relational Learning of Pattern-Match Rules for Information Extraction Mary Elaine Califf Raymond J. Mooney](https://reader031.vdocuments.net/reader031/viewer/2022032800/56649d405503460f94a19a73/html5/thumbnails/4.jpg)
Filled template of RAPIER
![Page 5: Relational Learning of Pattern-Match Rules for Information Extraction Mary Elaine Califf Raymond J. Mooney](https://reader031.vdocuments.net/reader031/viewer/2022032800/56649d405503460f94a19a73/html5/thumbnails/5.jpg)
Relational learning and Inductive Logic Programming (ILP)
Allow induction over structured examples that can include first-order logical representations and unbounded data structures
Have worked well for text categorization and for generating the past tense of English verbs
![Page 6: Relational Learning of Pattern-Match Rules for Information Extraction Mary Elaine Califf Raymond J. Mooney](https://reader031.vdocuments.net/reader031/viewer/2022032800/56649d405503460f94a19a73/html5/thumbnails/6.jpg)
Other ILP Systems
GOLEM, CHILLIN, PROGOL
![Page 7: Relational Learning of Pattern-Match Rules for Information Extraction Mary Elaine Califf Raymond J. Mooney](https://reader031.vdocuments.net/reader031/viewer/2022032800/56649d405503460f94a19a73/html5/thumbnails/7.jpg)
RAPIER’s rule representation
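Indexed by template name and slot name
Consists of three parts:
1. A pre-filler pattern (matches the text immediately before the filler)
2. A filler pattern (matches the actual slot filler)
3. A post-filler pattern (matches the text immediately after the filler)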
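To make the representation concrete, here is a minimal Python sketch of a rule base indexed by template name and slot name, with each rule split into pre-filler, filler, and post-filler patterns. The class and field names (`Element`, `Rule`, `words`, `tags`, `semantic`, `max_len`) and the `job_posting`/`city` keys are hypothetical, chosen for illustration; they are not RAPIER's actual data structures.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Tuple


@dataclass(frozen=True)
class Element:
    """One pattern element (hypothetical). Each non-empty set is a disjunctive constraint."""
    words: FrozenSet[str] = frozenset()
    tags: FrozenSet[str] = frozenset()
    semantic: FrozenSet[str] = frozenset()
    max_len: int = 0  # 0: pattern item (one word); N > 0: pattern list (0..N words)


@dataclass
class Rule:
    pre_filler: List[Element]   # matches text just before the filler
    filler: List[Element]       # matches the slot filler itself
    post_filler: List[Element]  # matches text just after the filler


# Rule base keyed by (template name, slot name).
RuleBase = Dict[Tuple[str, str], List[Rule]]


def add_rule(base: RuleBase, template: str, slot: str, rule: Rule) -> None:
    base.setdefault((template, slot), []).append(rule)


def rules_for(base: RuleBase, template: str, slot: str) -> List[Rule]:
    return base.get((template, slot), [])


# Usage with a hypothetical rule for a "city" slot.
base: RuleBase = {}
city_rule = Rule(
    pre_filler=[Element(words=frozenset({"in"}))],
    filler=[Element(tags=frozenset({"NNP"}), max_len=2)],
    post_filler=[Element(words=frozenset({","}))],
)
add_rule(base, "job_posting", "city", city_rule)
print(len(rules_for(base, "job_posting", "city")))    # 1
print(len(rules_for(base, "job_posting", "salary")))  # 0
```

Keying on the (template, slot) pair keeps each slot's rules separate, which matches learning and applying rules one slot at a time.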
![Page 8: Relational Learning of Pattern-Match Rules for Information Extraction Mary Elaine Califf Raymond J. Mooney](https://reader031.vdocuments.net/reader031/viewer/2022032800/56649d405503460f94a19a73/html5/thumbnails/8.jpg)
Pattern
Pattern item: matches exactly one word
Pattern list: has a maximum length N and matches 0..N words
Each matched word must satisfy a set of constraints:
1. Specific word, POS tag, or semantic class
2. Disjunctive lists of such values
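The matching semantics of pattern items and pattern lists can be sketched as a small backtracking matcher. This is an illustrative Python sketch under stated assumptions: each word is a hypothetical `(word, POS tag, semantic class)` triple, an empty constraint set means "unconstrained", a non-empty set is a disjunction, and words are compared in lowercase. Because a pattern list may stop after 0..N words, the matcher returns every position where a pattern can end.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Set, Tuple

Token = Tuple[str, str, str]  # (word, POS tag, semantic class); "" if no class


@dataclass(frozen=True)
class Element:
    words: FrozenSet[str] = frozenset()     # allowed lowercase words (disjunction)
    tags: FrozenSet[str] = frozenset()      # allowed POS tags (disjunction)
    semantic: FrozenSet[str] = frozenset()  # allowed semantic classes (disjunction)
    max_len: int = 0  # 0: pattern item; N > 0: pattern list of 0..N words


def satisfies(el: Element, tok: Token) -> bool:
    """A word satisfies an element if every non-empty constraint holds."""
    word, tag, sem = tok
    return ((not el.words or word.lower() in el.words)
            and (not el.tags or tag in el.tags)
            and (not el.semantic or sem in el.semantic))


def match_ends(pattern: List[Element], tokens: List[Token], start: int) -> Set[int]:
    """All positions where `pattern` can finish when it starts at `start`."""
    if not pattern:
        return {start}
    first, rest = pattern[0], pattern[1:]
    ends: Set[int] = set()
    if first.max_len == 0:
        # Pattern item: exactly one word.
        if start < len(tokens) and satisfies(first, tokens[start]):
            ends |= match_ends(rest, tokens, start + 1)
        return ends
    # Pattern list: try stopping after 0, 1, ..., max_len matching words.
    pos = start
    ends |= match_ends(rest, tokens, pos)
    for _ in range(first.max_len):
        if pos >= len(tokens) or not satisfies(first, tokens[pos]):
            break
        pos += 1
        ends |= match_ends(rest, tokens, pos)
    return ends


tokens = [("in", "IN", ""), ("Kansas", "NNP", ""), ("City", "NNP", ""), (",", ",", "")]
pattern = [Element(words=frozenset({"in"})), Element(tags=frozenset({"NNP"}), max_len=2)]
print(sorted(match_ends(pattern, tokens, 0)))  # [1, 2, 3]
```

The list element may consume zero, one, or two proper nouns after "in", so three end positions are possible.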
![Page 9: Relational Learning of Pattern-Match Rules for Information Extraction Mary Elaine Califf Raymond J. Mooney](https://reader031.vdocuments.net/reader031/viewer/2022032800/56649d405503460f94a19a73/html5/thumbnails/9.jpg)
An example rule
Sold to the bank for an undisclosed amount
Paid Honeywell an undisclosed price
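A rule of the kind this slide illustrates can be sketched in the same style. Assumptions: the POS tags (Penn Treebank style) and the semantic class `price` on *amount* and *price* are assigned by hand, and the rule below (a noun or proper noun, then up to two arbitrary words, then the filler *undisclosed*, then a word of class `price`) is a hypothetical reconstruction for illustration, not a copy of the slide's image. The pre-filler is matched leftward from the filler by reversing both the pattern and the preceding words.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Set, Tuple

Token = Tuple[str, str, str]  # (word, POS tag, semantic class); tags/classes assumed


@dataclass(frozen=True)
class Element:
    words: FrozenSet[str] = frozenset()
    tags: FrozenSet[str] = frozenset()
    semantic: FrozenSet[str] = frozenset()
    max_len: int = 0  # 0: pattern item; N > 0: pattern list of 0..N words


def satisfies(el: Element, tok: Token) -> bool:
    word, tag, sem = tok
    return ((not el.words or word.lower() in el.words)
            and (not el.tags or tag in el.tags)
            and (not el.semantic or sem in el.semantic))


def match_ends(pattern: List[Element], tokens: List[Token], start: int) -> Set[int]:
    if not pattern:
        return {start}
    first, rest = pattern[0], pattern[1:]
    ends: Set[int] = set()
    if first.max_len == 0:
        if start < len(tokens) and satisfies(first, tokens[start]):
            ends |= match_ends(rest, tokens, start + 1)
        return ends
    pos = start
    ends |= match_ends(rest, tokens, pos)
    for _ in range(first.max_len):
        if pos >= len(tokens) or not satisfies(first, tokens[pos]):
            break
        pos += 1
        ends |= match_ends(rest, tokens, pos)
    return ends


def extract(pre, filler, post, tokens: List[Token]) -> List[str]:
    """Return every filler whose left context matches `pre` and right context `post`."""
    found = []
    for s in range(len(tokens)):
        for e in sorted(match_ends(filler, tokens, s)):
            if e == s:
                continue  # skip empty fillers
            # Match the pre-filler leftward by reversing pattern and preceding words.
            left_ok = match_ends(pre[::-1], tokens[:s][::-1], 0)
            if left_ok and match_ends(post, tokens, e):
                found.append(" ".join(t[0] for t in tokens[s:e]))
    return found


# Hypothetical rule modeled on the slide's example.
PRE = [Element(tags=frozenset({"NN", "NNP"})), Element(max_len=2)]
FILLER = [Element(words=frozenset({"undisclosed"}), tags=frozenset({"JJ"}))]
POST = [Element(semantic=frozenset({"price"}))]

s1 = [("Sold", "VBN", ""), ("to", "TO", ""), ("the", "DT", ""), ("bank", "NN", ""),
      ("for", "IN", ""), ("an", "DT", ""), ("undisclosed", "JJ", ""),
      ("amount", "NN", "price")]
s2 = [("Paid", "VBD", ""), ("Honeywell", "NNP", ""), ("an", "DT", ""),
      ("undisclosed", "JJ", ""), ("price", "NN", "price")]
print(extract(PRE, FILLER, POST, s1))  # ['undisclosed']
print(extract(PRE, FILLER, POST, s2))  # ['undisclosed']
```

In the first phrase the pattern list absorbs "for an" before reaching the noun "bank"; in the second it absorbs only "an" before "Honeywell". A phrase such as "an undisclosed location" would be rejected because its post-filler word lacks the `price` class.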
![Page 10: Relational Learning of Pattern-Match Rules for Information Extraction Mary Elaine Califf Raymond J. Mooney](https://reader031.vdocuments.net/reader031/viewer/2022032800/56649d405503460f94a19a73/html5/thumbnails/10.jpg)
RAPIER’S Learning Algorithm
Begins with the most specific definition (one rule per training example) and compresses it by replacing specific rules with more general ones
Attempts to compress the rules for each slot
Preferring more specific rules
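A heavily simplified Python sketch of the compression idea follows, assuming each rule is just a set of `(attribute, value)` constraints on a single filler word with no pre-filler or post-filler context. It starts from one most-specific rule per positive example, repeatedly replaces a pair of rules by their generalization (the constraints both share) when that generalization covers no negative example, and drops every rule the new one subsumes. The names and the toy city data are hypothetical; RAPIER's actual generalization operates on whole three-part patterns.

```python
from itertools import combinations
from typing import FrozenSet, List, Set, Tuple

Token = Tuple[str, str]            # (word, POS tag); tags assumed
Rule = FrozenSet[Tuple[str, str]]  # set of (attribute, value) constraints


def most_specific(tok: Token) -> Rule:
    """The most specific rule for one example: constrain every attribute."""
    word, tag = tok
    return frozenset({("word", word.lower()), ("tag", tag)})


def covers(rule: Rule, tok: Token) -> bool:
    attrs = {"word": tok[0].lower(), "tag": tok[1]}
    return all(attrs[a] == v for a, v in rule)


def compress(positives: List[Token], negatives: List[Token]) -> Set[Rule]:
    """Greedy bottom-up compression of a rule set (simplified sketch)."""
    rules = {most_specific(t) for t in positives}
    improved = True
    while improved:
        improved = False
        for r1, r2 in combinations(sorted(rules, key=sorted), 2):
            g = r1 & r2  # generalization: keep only the constraints both rules share
            if g in rules or any(covers(g, n) for n in negatives):
                continue  # no gain, or the generalization is too general
            # Add g and drop every rule it subsumes (its constraints are a superset of g).
            rules = {r for r in rules if not g <= r} | {g}
            improved = True
            break
    return rules


cities = [("Atlanta", "NNP"), ("Austin", "NNP"), ("Dallas", "NNP")]
print(compress(cities, [("in", "IN"), ("offices", "NNS")]))
# {frozenset({('tag', 'NNP')})}
```

If a negative example such as a state name were also tagged `NNP`, every generalization would be rejected and the three specific rules would remain, which is where RAPIER's extra context patterns become necessary.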
![Page 11: Relational Learning of Pattern-Match Rules for Information Extraction Mary Elaine Califf Raymond J. Mooney](https://reader031.vdocuments.net/reader031/viewer/2022032800/56649d405503460f94a19a73/html5/thumbnails/11.jpg)
Implementation
Uses least general generalization (LGG) to generalize pairs of rules
Starts with rules containing only generalizations of the filler patterns
Employs a top-down beam search to add pre-filler and post-filler patterns
Orders rules using an information gain metric weighted by the size of the rule (preferring smaller rules)
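Two pieces of this slide can be sketched directly in Python. `lgg_items` generalizes two pattern items: identical constraints are kept, and differing ones either become a disjunction or are dropped; this sketch returns both alternatives as candidates, which is an assumption about how candidates are generated. `rule_score` is one plausible form of an information-based metric weighted by rule size, assumed here for illustration since the slide gives no formula: -log2 of a Laplace-corrected precision plus the rule size divided by the number of positives covered, with lower scores preferred.

```python
import math
from typing import Dict, FrozenSet, List

Element = Dict[str, FrozenSet[str]]  # attribute -> allowed values; missing = unconstrained


def lgg_items(a: Element, b: Element) -> List[Element]:
    """Candidate generalizations of two pattern items (hypothetical helper)."""
    disjunctive: Element = {}
    dropped: Element = {}
    for attr in a.keys() & b.keys():  # a constraint missing on either side is dropped
        if a[attr] == b[attr]:
            disjunctive[attr] = a[attr]
            dropped[attr] = a[attr]
        else:
            disjunctive[attr] = a[attr] | b[attr]  # keep both values as a disjunction
    return [disjunctive] if disjunctive == dropped else [disjunctive, dropped]


def rule_score(pos: int, neg: int, size: float) -> float:
    """Assumed metric, lower is better: information term plus size per positive."""
    if pos == 0:
        return math.inf
    return -math.log2((pos + 1) / (pos + neg + 2)) + size / pos


georgia = {"word": frozenset({"georgia"}), "tag": frozenset({"NNP"})}
missouri = {"word": frozenset({"missouri"}), "tag": frozenset({"NNP"})}
for cand in lgg_items(georgia, missouri):
    print(cand)
```

Under this assumed metric, covering negatives raises the information term and a larger rule raises the size term, so with equal coverage the smaller rule wins.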
![Page 12: Relational Learning of Pattern-Match Rules for Information Extraction Mary Elaine Califf Raymond J. Mooney](https://reader031.vdocuments.net/reader031/viewer/2022032800/56649d405503460f94a19a73/html5/thumbnails/12.jpg)
Example
Located in Atlanta, Georgia.
Offices in Kansas City, Missouri.
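How these two phrases generalize can be worked through in a short Python sketch. Assumptions: the POS tags are assigned by hand, each example keeps one word of left context and two of right context (the final period after *Georgia* is ignored), and all three parts are generalized in one pass, whereas RAPIER generalizes the filler first and adds context by beam search. The fillers *Atlanta* and *Kansas City* differ in length, so they collapse into a pattern list of maximum length 2 constrained to proper nouns. The helper names are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Token = Tuple[str, str]  # (word, POS tag); the tags are assumed, not taken from the slides


def generalize_items(a: Token, b: Token) -> Dict[str, Set[str]]:
    """Pattern item covering both words: equal values collapse to one,
    differing values become a disjunction."""
    return {"words": {a[0].lower(), b[0].lower()}, "tags": {a[1], b[1]}}


def generalize_segments(s1: List[Token], s2: List[Token]) -> List[dict]:
    if len(s1) == len(s2):
        # Same length: generalize position by position into pattern items.
        return [generalize_items(a, b) for a, b in zip(s1, s2)]
    # Different lengths: one pattern list, long enough for the longer segment,
    # keeping only a (disjunctive) tag constraint.
    return [{"max_len": max(len(s1), len(s2)), "tags": {t for _, t in s1 + s2}}]


ex1 = {"pre": [("in", "IN")], "filler": [("Atlanta", "NNP")],
       "post": [(",", ","), ("Georgia", "NNP")]}
ex2 = {"pre": [("in", "IN")], "filler": [("Kansas", "NNP"), ("City", "NNP")],
       "post": [(",", ","), ("Missouri", "NNP")]}

rule = {part: generalize_segments(ex1[part], ex2[part])
        for part in ("pre", "filler", "post")}
for part, pattern in rule.items():
    print(part, pattern)
```

The result keeps "in" as the pre-filler, turns the filler into a list of up to two proper nouns, and keeps the comma followed by a proper noun (Georgia or Missouri) as the post-filler.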
![Page 13: Relational Learning of Pattern-Match Rules for Information Extraction Mary Elaine Califf Raymond J. Mooney](https://reader031.vdocuments.net/reader031/viewer/2022032800/56649d405503460f94a19a73/html5/thumbnails/13.jpg)
Example (cont)
![Page 14: Relational Learning of Pattern-Match Rules for Information Extraction Mary Elaine Califf Raymond J. Mooney](https://reader031.vdocuments.net/reader031/viewer/2022032800/56649d405503460f94a19a73/html5/thumbnails/14.jpg)
Example (cont.)
Final best rule:
![Page 15: Relational Learning of Pattern-Match Rules for Information Extraction Mary Elaine Califf Raymond J. Mooney](https://reader031.vdocuments.net/reader031/viewer/2022032800/56649d405503460f94a19a73/html5/thumbnails/15.jpg)
Experimental Evaluation
A set of 300 computer-related job postings from the austin.jobs newsgroup
A set of 485 seminar announcements from CMU
Three different versions of RAPIER were tested, using as constraints:
1. Words, POS tags, and semantic classes
2. Words and POS tags
3. Words only
![Page 16: Relational Learning of Pattern-Match Rules for Information Extraction Mary Elaine Califf Raymond J. Mooney](https://reader031.vdocuments.net/reader031/viewer/2022032800/56649d405503460f94a19a73/html5/thumbnails/16.jpg)
Other learning IE systems
Naïve Bayes: uses the words in a fixed-length window to locate slot fillers
SRV: a top-down, set-covering rule learner that uses four predetermined predicates
WHISK: learns pattern-match rules written in a restricted form of regular expressions
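The fixed-length-window idea behind the Naïve Bayes system can be sketched as follows. This is an illustrative Python sketch, not that system's implementation: features are the words within `w` positions of a candidate token tagged with their relative offset (an assumption; the slide says only "words in a fixed-length window"), each candidate is a single token, and probabilities use add-one smoothing. The class name `WindowNB` and the training sentences are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Set, Tuple


def window(tokens: List[str], i: int, w: int) -> List[str]:
    """Words within w positions of token i (inclusive), tagged with their offset."""
    return [f"{j - i}:{tokens[j].lower()}"
            for j in range(max(0, i - w), min(len(tokens), i + w + 1))]


class WindowNB:
    def __init__(self, w: int = 1):
        self.w = w
        self.counts: Dict[bool, Counter] = {True: Counter(), False: Counter()}
        self.n: Counter = Counter()  # number of filler / non-filler positions

    def fit(self, data: List[Tuple[List[str], Set[int]]]) -> "WindowNB":
        for tokens, fillers in data:
            for i in range(len(tokens)):
                y = i in fillers
                self.n[y] += 1
                self.counts[y].update(window(tokens, i, self.w))
        self.vocab = set(self.counts[True]) | set(self.counts[False])
        return self

    def log_odds(self, tokens: List[str], i: int) -> float:
        """log P(filler, window) - log P(not filler, window), add-one smoothed."""
        score = math.log((self.n[True] + 1) / (self.n[False] + 1))
        v = len(self.vocab) + 1  # one extra slot for unseen features
        for f in window(tokens, i, self.w):
            p_t = (self.counts[True][f] + 1) / (sum(self.counts[True].values()) + v)
            p_f = (self.counts[False][f] + 1) / (sum(self.counts[False].values()) + v)
            score += math.log(p_t / p_f)
        return score


train = [(["located", "in", "Atlanta", ","], {2}),
         (["offices", "in", "Austin", ","], {2})]
model = WindowNB(w=1).fit(train)
test = ["jobs", "in", "Dallas", ","]
best = max(range(len(test)), key=lambda i: model.log_odds(test, i))
print(test[best])  # Dallas
```

Here the offset features "-1:in" and "1:," were seen only around fillers, so the token between them scores highest even though "Dallas" never appeared in training.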
![Page 17: Relational Learning of Pattern-Match Rules for Information Extraction Mary Elaine Califf Raymond J. Mooney](https://reader031.vdocuments.net/reader031/viewer/2022032800/56649d405503460f94a19a73/html5/thumbnails/17.jpg)
Performance on job postings
![Page 18: Relational Learning of Pattern-Match Rules for Information Extraction Mary Elaine Califf Raymond J. Mooney](https://reader031.vdocuments.net/reader031/viewer/2022032800/56649d405503460f94a19a73/html5/thumbnails/18.jpg)
Results for seminar announcement task
![Page 19: Relational Learning of Pattern-Match Rules for Information Extraction Mary Elaine Califf Raymond J. Mooney](https://reader031.vdocuments.net/reader031/viewer/2022032800/56649d405503460f94a19a73/html5/thumbnails/19.jpg)
Conclusion
Pros
1. Has the potential to help automate the development of IE systems
2. Works well at locating specific data in newsgroup messages
3. Identifies potential slot fillers and their surrounding context using limited syntactic and semantic information
4. Learns rules from relatively small sets of examples in some specific domains
Cons
1. Extracts single slots only
2. Rules are restricted to regular-expression-like patterns
3. Performance in more complicated situations is unknown
![Page 20: Relational Learning of Pattern-Match Rules for Information Extraction Mary Elaine Califf Raymond J. Mooney](https://reader031.vdocuments.net/reader031/viewer/2022032800/56649d405503460f94a19a73/html5/thumbnails/20.jpg)